The field from a grid

As part of his presentation of indirect methods for finding the field, Feynman presents an interesting argument on the electrostatic field of a grid. It’s just another indirect method to arrive at meaningful conclusions on what a field is supposed to look like, but it’s quite remarkable, and that’s why I am expanding it here. Feynman’s presentation is extremely succinct indeed and, hence, I hope the elaboration below will help you to understand it somewhat quicker than I did. :-)

The grid is shown below: it’s just a uniformly spaced array of parallel wires in a plane. We are looking at the field above the plane of wires here, and the dotted lines represent equipotential surfaces above the grid.


As you can see, for larger distances above the plane, we see a constant electric field, just as though the charge were uniformly spread over a sheet of charge, rather than over a grid. However, as we approach the grid, the field begins to deviate from the uniform field.

Let’s analyze it by assuming the wires lie in the xy-plane, running parallel to the y-axis. The distance between the wires is measured along the x-axis, and the distance to the grid is measured along the z-axis, as shown in the illustration above. We assume the wires are infinitely long and, hence, the electric field does not depend on y. So the component of E in the y-direction is 0: E_y = –∂Φ/∂y = 0. Therefore, ∂²Φ/∂y² = 0 and our Poisson equation above the wires (where there are no charges, so the right-hand side is zero) is reduced to ∂²Φ/∂x² + ∂²Φ/∂z² = 0. What’s next?

Let’s look at the field of two positive wires first. The plot below comes from the Wolfram Demonstrations Project. I recommend you click the link and play with it: you can vary the charges and the distance, and the tool will redraw the equipotentials and the field lines accordingly. It will give you a better feel for the (a)symmetries involved. The equipotential lines are the gray contours: they are cross-sections of equipotential surfaces. The red curves are the field lines, which are always orthogonal to the equipotentials.

Wolfram

The point at the center is really interesting: the straight horizontal and vertical red lines through it are limits really. Feynman’s illustration below shows the point represents an unstable equilibrium: the hollow tube prevents the charge from going sideways. So if the tube weren’t there, the charge would go sideways, of course! So it’s some kind of saddle point. Onward!

hollow tube

Look at the illustration below and try to imagine what the field looks like by thinking about the value of the potential as you move along one of the two blue lines below: the potential goes down as we move to the right, reaches a minimum in the middle, and then goes up again. Also think about the difference between the lighter and darker blue line: going along the light-blue line, we start at a lower potential, and its minimum will also be lower than that of the dark-blue line.


So you can start drawing curves. However, I have to warn you: the graphs are not so simple. Look at the detail below. The potential along the blue line goes slightly up before it decreases, so the graph of the potential may resemble the green curve on the right of the image. I did an actual calculation here. :-) If there are only two charges, the formula for the potential is quite simple: Φ = (1/4πε₀)·(q₁/r₁) + (1/4πε₀)·(q₂/r₂). Briefly forgetting about the (1/4πε₀) and equating q₁ and q₂ to +1, we get Φ = 1/r₁ + 1/r₂ = (r₁ + r₂)/(r₁·r₂). That looks like an easy function, but we need to express it as a function of x, keeping z (i.e. the ‘vertical’ coordinate) constant. That’s what I did to get the graphs below. It’s easy to see that 1/r₁ = (x² + z²)^(−1/2), while 1/r₂ = [(a−x)² + z²]^(−1/2). Assuming a = 2 and z = 0.8, the contribution from the first charge is given by the blue curve, the contribution of the second charge is represented by the red curve, and the green curve adds both and, hence, represents the potential generated by both charges, i.e. q₁ at x = 0 and q₂ at x = a. OK… Onward!

lines 3

graph 2

The point to note is that we have an extremely simple situation here – two charges only, or two wires, I should say – but a potential function that is surely not some simple sinusoidal function. To drive the point home, I plotted a few more curves below, keeping a at a = 2, but equating z with 0.4, 0.7 and 1.7 respectively. The z = 1.7 curve shows that, at larger distances, the potential actually increases slightly as we move from left to right along the z = 1.7 line. Note the remarkable symmetry of the curves and the equipotential lines: there should be some obvious mathematical explanation for that but, unfortunately, not obvious enough for me to find it, so please let me know if you see it! :-)
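For readers who want to reproduce these curves, here is a minimal sketch. It assumes the same simplified model as the text (two unit point charges at x = 0 and x = a, with the 1/4πε₀ factor dropped); the function names are my own and purely illustrative. It evaluates the potential along a line of constant z and checks whether the midpoint x = a/2 is a local minimum or a local maximum.

```python
import math

def potential(x, z, a=2.0):
    """Potential of two unit charges at x = 0 and x = a, evaluated at (x, z).

    The 1/(4*pi*eps0) factor is dropped, as in the text.
    """
    r1 = math.hypot(x, z)        # distance to the charge at x = 0
    r2 = math.hypot(a - x, z)    # distance to the charge at x = a
    return 1.0 / r1 + 1.0 / r2

def midpoint_curvature(z, a=2.0, h=1e-4):
    """Central-difference estimate of d2(Phi)/dx2 at the midpoint x = a/2."""
    mid = a / 2.0
    return (potential(mid + h, z, a) - 2.0 * potential(mid, z, a)
            + potential(mid - h, z, a)) / h**2

for z in (0.4, 0.7, 0.8, 1.7):
    kind = "minimum" if midpoint_curvature(z) > 0 else "maximum"
    print(f"z = {z}: Phi at midpoint = {potential(1.0, z):.4f} (local {kind})")
```

In this model the second derivative of (x² + z²)^(−1/2) with respect to x is (2x² − z²)/(x² + z²)^(5/2), so the midpoint switches from a minimum to a maximum at z = a/√2 (about 1.41 for a = 2), which is why the z = 1.7 curve rises toward the middle. The mirror symmetry of the curves about x = a/2 also follows directly from the formula: replacing x by a − x just swaps r₁ and r₂.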

graph 3

OK. Let’s get back to our grid. For your convenience, I copied it once more below.


Feynman’s approach to calculating the variations is quite original. He also duly notes that the potential function is surely not some simple sinusoidal function. However, he also notes that, when everything is said and done, it is some periodic quantity, in one way or another, and, therefore, we should be able to do a Fourier analysis and express it as a sum of sinusoidal waves. To be precise, we should be able to write Φ(x, z) as a sum of harmonics.

[…] I know. […] Now you say: Oh sh**! And you’ll just turn off. That’s OK, but why don’t you give it a try? I promise to be lengthy. :-)

Before we get too much into the weeds, let’s briefly recall how it works for our classical guitar string. That post explained how the wavelengths of the harmonics of a string depended on its length. If we denote the various harmonics by their harmonic number n = 1, 2, 3 etcetera, and the length of the string by L, we have λ₁ = 2L = (1/1)·2L, λ₂ = L = (1/2)·2L, λ₃ = (1/3)·2L,… λₙ = (1/n)·2L. In short, the harmonics – i.e. the components of our waveform – look like this:


etcetera (1/8, 1/9,…,1/n,… 1/∞)

Beautiful, isn’t it? As I explained in that post, it’s so beautiful it triggered a misplaced fascination with harmonic ratios. It was misplaced because the Pythagorean theory was a bit too simple to be true. However, their intuition was right, and they set the stage for guys like Copernicus, Fourier and Feynman, so that was good! :-)

Now, as you know, we’ll usually substitute wavelength and frequency by wavenumber and angular frequency so as to convert all to something expressed in radians, which we can then use as the argument in the sine and/or cosine component waves. [Yes, the Pythagoreans once again! :-)] The wavenumber k is equal to k = 2π/λ, and the angular frequency is ω = 2π·f = 2π/T (in case you doubt it, you can quickly check that the speed of a wave is equal to the product of the wavelength and its frequency by substituting: c = λ·f = (2π/k)·(ω/2π) = ω/k, which gives you the phase velocity vₚ = c). To make a long story short, we wrote k = k₁ = 2π·1/(2L), k₂ = 2π·2/(2L) = 2k, k₃ = 2π·3/(2L) = 3k,… kₙ = 2π·n/(2L) = nk,… to arrive at the grand result, and that’s our wave F(x) expressed as the sum of an infinite number of simple sinusoids:

F(x) = a₁cos(kx) + a₂cos(2kx) + a₃cos(3kx) + … + aₙcos(nkx) + … = ∑ aₙcos(nkx)

That’s easy enough. The problem is to find those amplitudes a1, a2, a3,… of course, but the great French mathematician who gave us the Fourier series also gave us the formulas for that, so we should be fine! Can we use them here? Should we use them here? Let’s see…

The a in the analysis, i.e. the spacing of the wires, is the physical quantity that corresponds to the length of our guitar string in our musical sound problem. In fact, a corresponds to 2L, because guitar strings are fixed at two ends and, hence, the two ends have to be nodes and, therefore, the wavelength of our first harmonic is twice the length of the string. Huh? Well… Something like that. As you can see from the illustration of the grid, a, in contrast to L, does correspond to one full wavelength of our periodic function. So we write:

Φ(x) = ∑ aₙcos(n·k·x) = ∑ aₙcos(2π·n·x/a) (n = 1, 2, 3,…)

Now, that’s the formula for Φ(x) assuming we’re fixing z, so it’s Φ(x) at some fixed distance from the grid. Let’s think about those amplitudes aₙ now. They should not depend on x, because the harmonics themselves (i.e. the cos(2π·n·x/a) components) are all that varies with x. So they have to be some function of n and, most importantly, some function of z also. So we denote them by Fₙ(z) and re-write the equation above as:

Φ(x, z) = ∑ Fₙ(z)·cos(2π·n·x/a) (n = 1, 2, 3,…)

Now, the rest of Feynman’s analysis speaks for itself, so I’ll just shamelessly copy it:


What did he find here? What is he saying, really? :-) First note that the derivation above has been done for one term in the Fourier sum only, so we’re talking a specific harmonic here. That harmonic is a function of z which – let me remind you – is the distance from the grid. To be precise, the function is Fₙ(z) = Aₙ·e^(−z/z₀), with z₀ = a/2πn. [In case you wonder how Feynman goes from equation (7.43) to (7.44), he’s just solving a second-order linear differential equation here. :-)]
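In case the step is not obvious, here is a sketch of it in my own notation (Φₙ for a single harmonic and Bₙ for the second integration constant are labels I introduce here, not Feynman’s). We substitute one harmonic into the two-dimensional equation ∂²Φ/∂x² + ∂²Φ/∂z² = 0 and solve the resulting ordinary differential equation for Fₙ(z):

```latex
\begin{aligned}
\Phi_n(x,z) &= F_n(z)\cos\!\left(\frac{2\pi n x}{a}\right),\\
\frac{\partial^2\Phi_n}{\partial x^2}+\frac{\partial^2\Phi_n}{\partial z^2}
  &= \left[-\frac{4\pi^2 n^2}{a^2}\,F_n(z)+\frac{d^2F_n}{dz^2}\right]
     \cos\!\left(\frac{2\pi n x}{a}\right)=0,\\
\frac{d^2F_n}{dz^2} &= \frac{4\pi^2 n^2}{a^2}\,F_n(z),\\
F_n(z) &= A_n\,e^{-z/z_0}+B_n\,e^{+z/z_0},
  \qquad z_0=\frac{a}{2\pi n}.
\end{aligned}
```

The growing exponential has to be dropped (Bₙ = 0), because the oscillating part of the field cannot blow up as we move away from the grid. That leaves the decaying solution quoted above.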

Now, you’ve seen the graph of that function a zillion times before: it starts at Aₙ for z = 0 and goes to zero as z goes to infinity, as shown below. :-)

graph 4

Now, that’s the case for all Fₙ(z) coefficients of course. As Feynman writes:

“We have found that if there is a Fourier component of the field of harmonic n, that component will decrease exponentially with a characteristic distance z₀ = a/2πn. For the first harmonic (n=1), the amplitude falls by the factor e^(−2π) (i.e. a large decrease) each time we increase z by one grid spacing a. The other harmonics fall off even more rapidly as we move away from the grid. We see that if we are only a few times the distance a away from the grid, the field is very nearly uniform, i.e., the oscillating terms are small. There would, of course, always remain the “zero harmonic” field, i.e. Φ₀ = −E₀·z, to give the uniform field at large z. Of course, for the complete solution, the sum needs to be made, and the coefficients Aₙ would need to be adjusted so that the total sum, when differentiated, gives an electric field that would fit the charge density of the grid wires.”
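To get a feel for how fast “exponentially” is here, the following sketch tabulates the factor e^(−z/z₀) = e^(−2πn·z/a) for the first three harmonics at a few heights. The function name and the chosen heights are illustrative assumptions; only the formula comes from the quote.

```python
import math

def harmonic_decay(n, z_over_a):
    """Factor exp(-z/z0) with z0 = a/(2*pi*n), written in terms of z/a."""
    return math.exp(-2.0 * math.pi * n * z_over_a)

for z_over_a in (0.5, 1.0, 2.0):
    factors = ", ".join(
        f"n={n}: {harmonic_decay(n, z_over_a):.2e}" for n in (1, 2, 3)
    )
    print(f"z = {z_over_a} a -> {factors}")
```

One grid spacing above the wires, the first harmonic is already down to roughly 0.2% of its amplitude at the grid, and the higher harmonics are smaller still.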

Phew! Quite something, isn’t it? But that’s it really, and it’s actually simpler than the ‘direct’ calculations of the field that I googled. Those calculations involve complicated series and logs and what have you, to arrive at the same result: the field away from a grid of charged wires is very nearly uniform.

Let me conclude this post by noting Feynman’s explanation of shielding by a screen. It’s quite terse:

“The method we have just developed can be used to explain why electrostatic shielding by means of a screen is often just as good as with a solid metal sheet. Except within a distance from the screen a few times the spacing of the screen wires, the fields inside a closed screen are zero. We see why copper screen—lighter and cheaper than copper sheet—is often used to shield sensitive electrical equipment from external disturbing fields.”

Hmm… So how does that work? The logic should be similar to the logic I explained when discussing shielding in one of my previous posts. Have a look—if only because it’s a lot easier to understand than the rather convoluted business I presented above. :-) But then I guess it’s all par for the course, isn’t it? :-)


The capacity of a capacitor

This post briefly explores the properties of capacitors. Why? Well… Just because they’re an element in electric circuits, and so we should try to fully understand how they function.

Feynman introduces condensers – now referred to as capacitors – right from the start, as he explains Maxwell’s fourth equation, which is written as c²∇×B = ∂E/∂t + j/ε₀ in differential form, but easier to read when integrating over a surface S bounded by a curve C:

c²·∮ B·ds (around C) = (d/dt)·∫ E·n da (over S) + (1/ε₀)·∫ j·n da (over S)

The ∂E/∂t term implies that changing electric fields produce magnetic effects (i.e. some circulation of B, i.e. the c²∇×B on the left-hand side). We need this term because, without it, there could be no currents in circuits that are not complete loops, like the circuit below, which is just a circuit with a capacitor made of two flat plates. The capacitor is charged by a current that flows toward one plate and away from the other. It looks messy because of the complicated drawing: we have a curve C around one of the wires defining two surfaces: S₁ is a surface that just fills the loop and, hence, crosses the wire, while S₂ is a bowl-shaped surface which passes between the plates of the capacitor (so it does not cross the wire).

condensor

If we look at C and S₁ only, then the circulation of B around C is explained by the current through the wire, so that’s the j/ε₀ term in Maxwell’s equation, which is probably how you understood magnetism during your high-school days. However, no current goes through the S₂ surface, so if we look at C and S₂ only, we need the ∂E/∂t term to explain the magnetic field. Indeed, as Feynman points out, changing the location of an imaginary surface should not change a real magnetic field! :-)

Let’s look at those charged sheets. For a single sheet of charge, we found two opposite fields of magnitude E = (1/2)·σ/ε0. Now, it is easy to see that we can superimpose the solutions for two parallel sheets with equal and opposite charge densities +σ and −σ, so we get:

E between the sheets = σ/ε₀ and E outside = 0

Charged sheet

capacitor

Now, actual capacitors are not made of some infinitely thin sheet of charge: they are made of some conductor and, hence, we get that shielding effect and we’re talking surface charge densities +σ and −σ, so the actual picture is more like the one below. Having said that, the formula above is still correct: E is σ/ε0 between the plates, and zero everywhere else (except at the edge, but I’ll talk about that later).

capacitor 2

We’re now ready to discuss what we want to discuss here, i.e. the concept of the capacity of a capacitor. We know the two plates are both equipotentials but with different potential, obviously! If we denote these two potentials as Φ₁ and Φ₂ respectively, we can define their difference Φ₁ − Φ₂ as the voltage between the two plates. Its unit is the same as the unit for potential which, as you may or may not remember, is potential energy per unit charge, so that’s newton·meter/coulomb. [In honor of the guy who invented the first battery, 1 N·m/C is usually referred to as one volt, which – quite annoyingly – is also abbreviated as V, even if the voltage and the volt are two very different things: the volt is the unit of voltage.]

Now, it’s easy to see that the voltage, or potential difference, is the amount of work that’s required to carry one unit charge from one plate to the other. To be precise, because the coulomb is a huge unit − it’s equivalent to the combined charge of some 6.241×10¹⁸ protons − we should say that the voltage is the work per unit charge required to carry a small charge from one plate to the other. Hence, if d is the distance between the two plates (as shown in the illustration above), we can write:

V = Φ₁ − Φ₂ = E·d = (σ/ε₀)·d = (d/ε₀A)·Q

Q is the total charge on each plate (so it’s positive on one, and negative on the other), A is the area of each plate, and d is the separation between the two plates. What the equation says is that the voltage is proportional to the charge, and the constant of proportionality is d over ε₀A. Now, the proportionality between V and Q is there for any two conductors in space (provided we have a plus charge on one, and a minus charge on the other, and so we assume there are no other charges around). Why? It’s just the logic of the superposition of fields: we double the charges, so we double the fields, and so the work done in carrying a unit charge from one point to the other is also doubled! So that’s why the potential difference between any two points is proportional to the charges.

Now, if we write that proportionality as Q = C·V, the constant C is called the capacity or capacitance of the system. In other words, it’s defined as C = Q/V. [Again, it’s a bit of a nuisance the symbol (C) is the same as the symbol that is used for the unit of charge, but don’t worry about it.] To put it simply, the capacitance is the ability of a body to store electric charge. For our parallel-plate condenser, it is equal to C = ε₀A/d. Its unit is coulomb/volt, obviously, but – again in honor of some other guy – it’s referred to as the farad: 1 F = 1 C/V.
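To put some numbers on this, here is a small sketch of the parallel-plate formulas C = ε₀A/d and V = Q·d/(ε₀A). The plate size, separation and charge are hypothetical values picked for illustration, and edge effects are ignored, as in the text.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m

def parallel_plate_capacitance(area_m2, separation_m):
    """C = eps0 * A / d for an ideal parallel-plate capacitor (no edge effects)."""
    return EPS0 * area_m2 / separation_m

def plate_voltage(charge_c, area_m2, separation_m):
    """V = E * d = (sigma / eps0) * d = Q * d / (eps0 * A)."""
    return charge_c * separation_m / (EPS0 * area_m2)

# Hypothetical example: 10 cm x 10 cm plates, 1 mm apart, charged to 1 nC.
area, d, charge = 0.01, 1e-3, 1e-9
C = parallel_plate_capacitance(area, d)
V = plate_voltage(charge, area, d)
print(f"C = {C * 1e12:.1f} pF, V = {V:.2f} V, Q/V = {charge / V * 1e12:.1f} pF")
```

Even fairly large plates at a small separation give a capacitance of only some tens of picofarads, which shows how large a unit the farad is.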

To build a fairly high-capacity condenser, one could put waxed paper between sheets of aluminium and roll it up. Sealed in plastic, that made a typical radio-type condenser. The principle used today is still the same. In order to reduce the risk of breakdown (which occurs when the field strength becomes so large that it pulls electrons from the dielectric between the plates, thus causing conduction), higher capacity is generally better, so the voltage developed across the condenser will be smaller. Condensers used to be fairly big, but modern capacitors are actually as small as other computer card components. It’s all interesting stuff, but I won’t elaborate on it here, because I’d rather focus on the physics and the math behind the engineering in this blog. :-)


The method of images

In my previous post, I mentioned the so-called method of images, but didn’t elaborate much. Let’s recall the problem. As you know, the whole subject of electrostatics is governed by one equation: the so-called Poisson equation:

∇²Φ = ∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² = −ρ/ε₀

We get this equation by combining Maxwell’s first law (∇·E = ρ/ε₀) and the E = −∇Φ formula. Now, if we know the distribution of charges, then we don’t need that Poisson equation: we can calculate the potential at every point – denoted by (1) below – using the following formulas:


And if we have Φ, we have E, because E = –∇Φ. But, in most actual situations, we don’t know the charge distribution, and then we need to work with that Poisson equation. Of course, you’ll say: if you don’t know the charge distribution, then you don’t know the ρ in the equation, and so what use is it really?

The answer is: most problems will involve conductors, and we do know that their surface is an equipotential surface. We also know that the electric field just outside the surface must be normal to the surface. Let’s take the example of the grounded conducting sheet once again, as depicted below. We know the image charge and the field lines on the left-hand side are not there. In fact, because the sheet is grounded, it stays at zero potential, and the conductor acts as a shield.

image 3

We do have a real field on the right-hand side though, and it’s exactly the same as that of a dipole: we only need to cross out the left-hand half of the picture. What charges are responsible for it? It surely cannot be the +q charge alone, and it isn’t: we also have induced local charges on the sheet. Indeed, the positive charge will attract negative charges to the surface (the ground connection can supply them) and, hence, the surface charge density is not zero. We can calculate it. How? It’s quite complicated, but let’s give it a try.

Look at the detail below. Let’s forget about the induced charges for a while, and analyze the field produced by the positive charge in the absence of induced charges, so that’s the E field at point P. The magnitude of its normal component is Eₙ₊ = E·cosθ, with θ the angle between the two vectors.


θ is an angle of a right triangle, and it’s easy to see that cosθ is equal to a/(a² + ρ²)^(1/2). Now, Coulomb’s Law tells us that E = (1/4πε₀)·q/[(a² + ρ²)^(1/2)]² = (1/4πε₀)·q/(a² + ρ²). Hence, we can write:

Eₙ₊ = (1/4πε₀)·a·q/(a² + ρ²)^(3/2)

[A quick note on the symbols used here: we use ρ (rho) to denote a distance here. That’s somewhat confusing because it usually denotes a volume density. However, we’re interested in a surface density here, for which the σ (sigma) symbol is used. So don’t worry about it. Just note that ρ is some distance here, instead of a charge density.]

Now we know that the induced charges will arrange themselves in such a way that the addition of their field makes the field at P look like there was a negative charge of the same magnitude as q at the other side of the sheet. If there were such a charge −q, then we could do the same analysis, as shown below. It’s easy to see that the component of the imaginary field along the sheet (i.e. the component that’s perpendicular to the normal) cancels the actual component along the sheet of the field created by +q, while its normal component adds to the normal component of the +q field. To make a long story short, the actual field at P is equal to E(ρ) = (1/4πε₀)·2a·q/(a² + ρ²)^(3/2), and it is the sum of two normal components of strength (1/4πε₀)·a·q/(a² + ρ²)^(3/2) each.

snip 2

To put it differently, the actual field can be thought of as two parts: (1) the (normal) component of the field caused by +q, and (2) the field caused by the surface charge density σ at P, which we denote as σ(ρ). Let’s see what we can do with this.

The analysis of the field of a sheet of charge on a conductor is quite complicated, and not quite like the analysis of just a sheet of charge. The analysis for just a sheet of charge was based on the theoretical situation depicted below. We imagined some box with two Gaussian surfaces of area A, and we then used Gauss’ Law to deduce that, if σ was the charge per unit area (i.e. the surface density), the total flux out of the box should be equal to EA + EA = σA/ε0 and, hence, E = (1/2)·σ/ε0. The illustration below shows we should think of two fields with opposite direction, and with a magnitude of (1/2)·σ/ε0 each.

Charged sheet

That’s simple enough. However, a sheet of charge on a conductor produces a different field, as shown below. Because of the shielding effect, we have flux on one side of the box only, and the field strength of this flux is σ/ε0, so that’s two times the (1/2)·σ/ε0 magnitude described above. However, as mentioned, it’s zero on the other side, i.e. the inside of the conductor shown below.

Flux out of a conductor

So what happens here? The charges in the neighborhood of a point P on the surface actually do produce a local field (Elocal), both inside and outside of the surface, which respects the Elocal = (1/2)·σ/ε₀ = σ/2ε₀ equality, but all the rest of the charges on the conductor “conspire” to produce an additional field at the point P with the same magnitude of (1/2)·σ/ε₀. That additional field has the same direction on both sides of the surface, so it cancels the local field on the inside and doubles it on the outside. So the net result is that the total field inside goes to zero, and the field outside is equal to E = σ/ε₀, so E = 2·Elocal. Note that the example above assumes a positively charged conductor: if the charge on the conductor were negative, the direction of the field would be inwards, but we’d still have a field on and outside of the surface only.

I know you’ve switched off already but − just in case you didn’t − what equality should we use to find σ in this case, i.e. the grounded sheet with some (negative) induced surface charge density? Well… We’re talking a surface density, and a conductor, and, therefore, I would think it’s the E = σ/ε₀, i.e. the formula for a charged sheet on a conductor. So we write:

E = σ(ρ)/ε0 ⇔ σ(ρ) = ε0E

But what E do we take to continue our calculation? The whole field or (1/4πε₀)·a·q/(a² + ρ²)^(3/2) only? The analysis above may make you think that we should take (1/4πε₀)·a·q/(a² + ρ²)^(3/2) only, so that’s the component that’s related to the imaginary charge only, but… No! We’re talking one actual field here, which is produced by the positive charge as well as by the induced charges. So we should not cut it for the purpose of calculating σ(ρ)! So the grand result is:

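σ(ρ) = ε₀·E(ρ) = −(1/4π)·2a·q/(a² + ρ²)^(3/2) (the minus sign is there because the field points into the sheet, so the induced charge is negative for a positive q)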

The shape of this function should not surprise us: it’s shown below for some different values of q (1 and 2 respectively) and a (1, 2 and 3 respectively).


How do we know our solution is correct? We can check it: if we integrate σ over the whole surface, we should find that the total induced charge is equal to −q. So… Well… I’ll let you do that. Feynman also notes the induced charges should exert a force on our point charge, which we can calculate by summing the forces between the surface charges and the charge. It’s again an integral, and it should be equal to

F = (1/4πε₀)·q²/(2a)²


Lo and behold! The force acting on the positive charge is exactly the same as it would be with the negative image charge instead of the plate. Why? Well… Because the fields are the same!
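Both checks can be done numerically. The sketch below is only an illustration under stated assumptions: it takes σ(ρ) = −2aq/[4π(a² + ρ²)^(3/2)] with the sign of the induced charge made explicit, drops the 1/4πε₀ factor in the force, and uses a simple midpoint rule with the substitution ρ = a·tan(t) to handle the infinite sheet. All function names are my own.

```python
import math

def sigma(rho, q, a):
    """Induced surface charge density at distance rho from the foot of q."""
    return -2.0 * a * q / (4.0 * math.pi * (a * a + rho * rho) ** 1.5)

def integrate_over_plane(f, a, steps=20000):
    """Integrate f(rho) * 2*pi*rho d(rho) from 0 to infinity (midpoint rule).

    The substitution rho = a * tan(t) maps the infinite range onto 0 <= t < pi/2.
    """
    dt = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        rho = a * math.tan(t)
        drho_dt = a / math.cos(t) ** 2
        total += f(rho) * 2.0 * math.pi * rho * drho_dt * dt
    return total

def induced_charge(q, a):
    """Total charge induced on the sheet."""
    return integrate_over_plane(lambda rho: sigma(rho, q, a), a)

def force_on_charge(q, a):
    """Normal force on q from the induced charges, 1/(4*pi*eps0) dropped.

    Each ring contributes q * dQ / r**2 times cos(theta) = a / r,
    with r**2 = a**2 + rho**2. A negative result means attraction.
    """
    return integrate_over_plane(
        lambda rho: q * sigma(rho, q, a) * a / (a * a + rho * rho) ** 1.5, a
    )

q, a = 2.0, 1.5
print(f"induced charge = {induced_charge(q, a):.6f} (expected {-q})")
print(f"force = {force_on_charge(q, a):.6f} (expected {-q * q / (2 * a) ** 2:.6f})")
```

The second line compares the integrated force with −q²/(2a)², i.e. the Coulomb attraction between q and an image charge −q at a distance 2a.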

The results we obtained are quite wonderful! Indeed, we said we did not know the charge distribution, and so we used a very different method to find the field: the method of images, which consists of computing the field due to q and some imaginary point charge –q somewhere else. Feynman summarizes the method of images as follows:

“The point charge we “imagine” existing behind the conducting surface is called an image charge. In books you can find long lists of solutions for hyperbolic-shaped conductors and other complicated looking things, and you wonder how anyone ever solved these terrible shapes. They were solved backwards! Someone solved a simple problem with given charges. He then saw that some equipotential surface showed up in a new shape, and he wrote a paper in which he pointed out that the field outside that particular shape can be described in a certain way.”

However, as you can see, the method is actually quite powerful, because we got a substantial bonus here: we calculated the field indeed, but then we could also calculate the charge distribution afterwards, so we got it all! Let’s see if we master the topic by looking at some other applications of the method of images.

Point charges near conducting spheres

For a grounded conducting sphere, we get the result shown below: the point charge q will induce charges on it whose fields are those of an image charge q’ = −aq/b placed at the point below.

charged sphere

You can check the details in Feynman’s Lecture on it, in which you will also find a more general formula for spheres that are not at zero potential. The more general formula involves a third charge q” at the center of the sphere, with charge q” = −q’ = aq/b.
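A quick way to convince yourself that the image charge does its job is to check that the potential vanishes everywhere on the sphere. The sketch below assumes the usual setup from Feynman’s Lecture, which the text above does not spell out: a is the radius of the sphere, b is the distance of q from its center, and the image charge q′ = −aq/b sits at a distance a²/b from the center on the line toward q. The 1/4πε₀ factor is dropped and the function name is hypothetical.

```python
import math

def sphere_surface_potential(theta, q=1.0, a=1.0, b=3.0):
    """Potential of q plus its image at a point on the sphere surface.

    theta is the polar angle of the surface point, measured from the line
    joining the center of the sphere to q. 1/(4*pi*eps0) is dropped.
    """
    q_image = -a * q / b          # image charge from the text
    d_image = a * a / b           # assumed image position (Feynman's Lecture)
    px, py = a * math.cos(theta), a * math.sin(theta)
    r_q = math.hypot(px - b, py)
    r_image = math.hypot(px - d_image, py)
    return q / r_q + q_image / r_image

for degrees in (0, 45, 90, 135, 180):
    phi = sphere_surface_potential(math.radians(degrees))
    print(f"theta = {degrees:3d} deg: potential = {phi:.2e}")
```

The potential comes out as zero (up to rounding) at every angle, because the distance to the image is always a/b times the distance to q, so the two terms cancel.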

Again, we’ll have a force of attraction between the sphere and the point charge, even if the net charge on the sphere is zero (which is the case for an isolated, uncharged sphere rather than a grounded one). Indeed, the positive charge q attracts negative charges to the side closer to itself and, hence, leaves positive charges on the surface of the far side. As the attraction by the negative charges exceeds the repulsion from the positive charges, we end up with some net attraction. Feynman leaves us with an interesting challenge here:

“Those who were entertained in childhood by the baking powder box which has on its label a picture of a baking powder box which has on its label a picture of a baking powder box which has … may be interested in the following problem. Two equal spheres, one with a total charge of +Q and the other with a total charge of −Q, are placed at some distance from each other. What is the force between them? The problem can be solved with an infinite number of images. One first approximates each sphere by a charge at its center. These charges will have image charges in the other sphere. The image charges will have images, etc., etc., etc. The solution is like the picture on the box of baking powder—and it converges pretty fast.”

Well… I’ll leave it to you to take up that challenge. :-)

Direct and indirect methods

Let me end this post by noting that I started out with that Poisson equation, but that I actually didn’t use it. Having said that, this method of images did result in some solutions for it. It is what Feynman calls an indirect method of solving some problems, and he writes the following on it:

“If the problem to be solved does not belong to the class of problems for which we can construct solutions by the indirect method, we are forced to solve the problem by a more direct method. The mathematical problem of the direct method is the solution of Laplace’s equation ∇2Φ = 0 subject to the condition that Φ is a suitable constant on certain boundaries—the surfaces of the conductors. [Note that Laplace’s equation is Poisson’s equation with a zero on the right-hand side.] Problems which involve the solution of a differential field equation subject to certain boundary conditions are called boundary-value problems. They have been the object of considerable mathematical study. In the case of conductors having complicated shapes, there are no general analytical methods. Even such a simple problem as that of a charged cylindrical metal can closed at both ends—a beer can—presents formidable mathematical difficulties. It can be solved only approximately, using numerical methods. The only general methods of solution are numerical.”

Well… That says it all, I guess. There are other indirect methods, i.e. other than the method of images, but I won’t present these here. I may write something about it in some other post, perhaps. :-)


The electric field in various circumstances

This post summarizes two of what may well be Feynman’s most tedious Lectures. Their title is the same: the electric field “in various circumstances.” At first, I wanted to skip them, but then I found some unifying principle: the fields involved are all quite simple. In fact, except in chapter seven, it’s only about (a) the field of a single charge and (b) the field of a so-called dipole, i.e. the field of two opposite charges next to each other. Both are depicted below, and the dipole field can actually be derived by adding the fields of the two single charges.

Radial field

dipole field

So… In a way, these two Lectures are just a bunch of formulas repeating the same thing over and over again. The thing to remember is that a complicated but neutral mess of charges will also create a dipole field and, if that mess were not neutral as a whole, then the field of our lump of charge will look like that of a point charge, provided we look at it from a large enough distance (i.e. a distance that is large relative to the separation of the elementary charges involved). So the situation we’re looking at is the one depicted below, which is really quite general.

lump of charge

Before going into the nitty-gritty, it is probably good to review one of the points I made in my previous post: the field inside of a spherical shell of charge (like the one below) is zero everywhere, i.e. for any point P inside the shell.

spherical shell of charge

This has nothing to do with the phenomenon of shielding, which is a consequence of free electrons re-arranging themselves so as to cancel the field inside. If we were able to build the cage below from protons only, so we’d have a fixed distribution of charges, the inside would not be shielded from the external electrical field. [Credit for the animation must go to Wikipedia.]


Because of the symmetry of the situation, however, the field at the center of a rectangular, fixed and uniform distribution of charges would also be zero (away from the center, the exact cancellation relies on the spherical geometry worked out below). Let me quickly go over the math for the example of the spherical shell. The randomly chosen point P defines small cones extending to the surface of the sphere, with their apex at P and cutting out some surface area Δa. In the illustration above, we have two symmetrical cones defining two surfaces Δa₁ and Δa₂ respectively. It is easy to see that:

Δa₂/Δa₁ = r₂²/r₁²

Note that r₂²/r₁² is equal to (r₂/r₁)² but that (r₂/r₁)² is not equal to r₂/r₁. The square matters, and the square of a ratio is different from the ratio itself! In fact, it’s because of the inverse square law that the fields cancel exactly. Indeed, if the surface of the sphere is uniformly charged (which is the key assumption here), then the charge Δq on each of the area elements will be proportional to the area, so Δq₂/Δq₁ = Δa₂/Δa₁. Now, Coulomb’s Law also says that the magnitudes of the fields produced at P by these two surface elements are in the ratio of:


Huh? Yes. $E_2/E_1 = (\Delta a_2/\Delta a_1)\cdot(r_1^2/r_2^2) = (\Delta a_2/\Delta a_1)\cdot(\Delta a_1/\Delta a_2) = 1$, according to the above. So… Yes, the fields cancel exactly, and because all parts of the surface can be paired off in the same way, the total field at P is zero, indeed! But what if we’d put a charge with equal sign at the center? Because the field inside is zero everywhere, the shell exerts no force on it, at the center or anywhere else, so nothing pushes it back when it drifts away: it is not in stable equilibrium. Hence, this does not contradict Feynman’s statement that a charge in an electrostatic field in free space can only be in stable equilibrium if there are mechanical constraints − as illustrated below – although – I should add – the whole argument that follows has no relevance whatsoever for the quantum-mechanical model of an atom. But that’s a somewhat separate story which I’ll touch upon at the end of this post. Let me get back to the dipole problem.

hollow tube
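The pairing argument for the spherical shell above can also be checked by brute force. The sketch below is only a numerical illustration, not something taken from Feynman: it chops a uniformly charged sphere into small patches and adds up their Coulomb fields at a chosen point. It assumes units in which 1/4πε0 = 1 and a total charge of 1; the function name `shell_field` and the grid sizes are my own choices.

```python
import math

def shell_field(point, radius=1.0, n_theta=200, n_phi=400):
    """Field at `point` from a uniformly charged spherical shell.

    Assumed units: 1/(4*pi*eps0) = 1 and total charge on the shell = 1.
    The shell is split into n_theta * n_phi patches (midpoint rule).
    """
    px, py, pz = point
    ex = ey = ez = 0.0
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        # Charge on one patch = patch area / total area, because Q = 1.
        dq = math.sin(theta) * d_theta * d_phi / (4.0 * math.pi)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            sx = radius * math.sin(theta) * math.cos(phi)
            sy = radius * math.sin(theta) * math.sin(phi)
            sz = radius * math.cos(theta)
            # Vector from the patch to the field point.
            rx, ry, rz = px - sx, py - sy, pz - sz
            r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
            ex += dq * rx / r3
            ey += dq * ry / r3
            ez += dq * rz / r3
    return ex, ey, ez

inside = shell_field((0.2, -0.1, 0.3))    # some off-centre point inside
outside = shell_field((0.0, 0.0, 2.0))    # a point outside, for contrast
print("inside :", inside)
print("outside:", outside)
```

Inside, the three components should come out as practically zero, whatever point is picked. Outside, the shell should behave like a point charge at its center, so the field at distance 2 should be close to 1/2² = 0.25 along the z-axis.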

Dipole fields

The model of a dipole is illustrated below. We have two opposite charges separated by a distance d. The so-called dipole moment is defined as p = q·d, and we also have an associated vector $\mathbf{p}$, whose magnitude is p (so that’s the product of q and d) and whose direction is that of the dipole axis from −q to +q. We could also define a vector $\mathbf{d}$, pointing from −q to +q, and write $\mathbf{p} = q\cdot\mathbf{d}$. Just think about it. I am sure you’ll figure it out. :-)

dipole model

Now, Feynman derives the formula for the dipole potential in various ways: first in an easy way, and then in a not-so-easy way. :-) The not-so-easy way is the most interesting (in this case, that is)! He first notes the general formula for the potential, at some point P = (x, y, z), of a point charge q at the origin. You’ve seen that before: it’s Φ = q/r. [Forget about the constant of proportionality (I mean that 1/4πε0 factor in Coulomb’s Law) for a while. We can stick it back in at the end of the argument.] What it says is that, while the field follows an inverse square law, the potential has a 1/r dependence only (so when you double the distance, you halve the potential). Now, if we move the charge q along the z-axis, up a distance Δz, then the potential at P will change a little, by, say, ΔΦ+. How much exactly? Well, Feynman notes that “it is just the amount that the potential would change if we were to leave the charge at the origin and move P downward by the same distance Δz.” His illustration and the associated formula below speak for themselves:

dipole moment one charge

formula potential change

Now I’ll refer you to Feynman himself for the detail of the whole argument. The bottom line is that he gets the following formula for the dipole potential:

$$\Phi = -\mathbf{p}\cdot\nabla\phi_0$$

We have a vector dot product here of that dipole vector we defined above (p) and the gradient of φ0, which is the potential of a unit point of charge: φ0 = 1/4πε0r. So what? Well… We can re-write this as:

$$\Phi = -\frac{1}{4\pi\varepsilon_0}\,\mathbf{p}\cdot\nabla\left(\frac{1}{r}\right)$$

Isn’t that great? For point charges, we have a field that’s the gradient of a potential that has a 1/r dependence, but so… Well… Here we have the potential of a dipole that’s the gradient of… Well… Just a number that has a 1/r dependence. :-)

It explains why the dipole field $\mathbf{E} = -\nabla\Phi$ varies inversely not as the square but as the cube of the distance from a dipole. I could give you the formula for E but, again, I don’t want to copy all of Feynman here and so I’ll just assume you believe me. Let me just wrap up this section with the graph of the electric field, and note how the field vector E can be analyzed as the sum of a transverse component (i.e. the component in the x-y plane) and its component along the dipole axis (i.e. the component along the z-axis).

dipole field 2
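If you’d rather check than believe, here is a small sketch that compares the exact potential of two opposite charges with the dipole formula, and looks at how the field on the axis falls off. It is only an illustration under my own assumptions: units with 1/4πε0 = 1, the charges sitting on the z-axis at ±d/2, and hypothetical helper names (`exact_potential`, `dipole_potential`, `axial_field`).

```python
import math

def exact_potential(x, y, z, q=1.0, d=0.01):
    """Potential of +q at z = +d/2 and -q at z = -d/2 (1/(4*pi*eps0) = 1)."""
    r_plus = math.sqrt(x * x + y * y + (z - d / 2) ** 2)
    r_minus = math.sqrt(x * x + y * y + (z + d / 2) ** 2)
    return q / r_plus - q / r_minus

def dipole_potential(x, y, z, q=1.0, d=0.01):
    """Dipole formula: -p . grad(1/r) = p * z / r**3, with p = q*d along z."""
    p = q * d
    r = math.sqrt(x * x + y * y + z * z)
    return p * z / r ** 3

def axial_field(z, q=1.0, d=0.01, h=1e-4):
    """E_z on the dipole axis, from a central difference of the exact potential."""
    up = exact_potential(0.0, 0.0, z + h, q, d)
    down = exact_potential(0.0, 0.0, z - h, q, d)
    return -(up - down) / (2 * h)

# Far from the charges (distance >> d) the two potentials should agree.
ratio_phi = exact_potential(1.0, 0.5, 2.0) / dipole_potential(1.0, 0.5, 2.0)
# Doubling the distance should cut the field by about 2**3 = 8.
ratio_e = axial_field(1.0) / axial_field(2.0)
print(ratio_phi)
print(ratio_e)
```

The first ratio should be very close to 1 and the second very close to 8, which is the inverse-cube behaviour in a nutshell.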

The dipole field of a lump of charges

The only thing that’s left is to define the p vector for a lump (or a mess, as Feynman calls it) of charges. Note that the lump should be neutral: if it isn’t, then it will look like a point charge from a distance. But if it is neutral, then its field will be a dipole field. So the same formula applies but p is defined as $\mathbf{p} = \sum_i q_i\mathbf{d}_i$. I copy the illustration above below so you can see what is what. :-)

lump of charge
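The sum that defines p for a lump is easy to put into code. The sketch below uses a made-up neutral lump of three charges (the numbers are purely illustrative) and hypothetical helper names. It also checks a standard property that the text does not spell out: for a neutral lump, p does not depend on where we put the origin.

```python
def total_charge(charges):
    """Sum of all charges in the lump; `charges` is a list of (q, (x, y, z))."""
    return sum(q for q, _ in charges)

def dipole_moment(charges, origin=(0.0, 0.0, 0.0)):
    """p = sum_i q_i * d_i, with d_i the position of charge i relative to `origin`."""
    ox, oy, oz = origin
    px = sum(q * (x - ox) for q, (x, y, z) in charges)
    py = sum(q * (y - oy) for q, (x, y, z) in charges)
    pz = sum(q * (z - oz) for q, (x, y, z) in charges)
    return px, py, pz

# A made-up neutral lump: one charge of +2 and two charges of -1.
lump = [
    (+2.0, (0.1, 0.0, 0.2)),
    (-1.0, (0.0, 0.1, -0.1)),
    (-1.0, (-0.1, -0.1, 0.0)),
]

p_here = dipole_moment(lump)
p_elsewhere = dipole_moment(lump, origin=(5.0, -3.0, 2.0))
print(total_charge(lump), p_here, p_elsewhere)
```

Both calls should give the same vector, (0.3, 0, 0.5) up to rounding, because the shift of origin multiplies the total charge, which is zero.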

So… Is that it? Well… Yes. And… Well… No. All of the above assumes we know the charge distribution from the start. If we do, then my little summary above pretty much covers the whole subject. :-) However, we’ll often be talking about some conductor with some total charge Q, without being able to say where the charges are, exactly. All that we know is that they will be spread out on the surface in some way.

Now… Well… That’s not quite exact. We also know they will distribute themselves so that the potential of the surface is constant, and that helps us solve some practical problems at least. What problems? Well… The problem of finding the field of charged conductors, which is the second topic that Feynman deals with in his two Lectures on the field “in various circumstances.”

However, that story risks becoming as tedious as Feynman’s Lectures on it, and so I’d rather not copy him here. Just look at the following illustrations. The first one gives the field lines and equipotentials for two point charges once again. It highlights two equipotentials in particular: A and B. Now look at the second illustration: we have a curved conductor with a given potential near a point charge and – lo and behold! – the field looks the same: we replace A by the surface of our conductor and all the rest vanishes. In fact, the illustration shows that we could just put an imaginary point charge q at a suitable point and get the same field.

image 1 image 2 image 3

Now that’s what’s referred to as the method of images, and it’s illustrated in the third graph, where we have an “image charge” indeed. We see the equipotential halfway between the two charges which, in this case, is a grounded conducting sheet. Why grounded? Because the plane had zero potential in our dipole field, as it was halfway between the two charges indeed.
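The zero-potential claim for the plane halfway between the charge and its image is easy to verify numerically. The sketch below is just that check, under my own assumptions: units with 1/4πε0 = 1, a charge of +1 at height a above the plane z = 0, its image of −1 at the mirror point, and a hypothetical helper `potential`.

```python
import math

def potential(point, charges):
    """Sum of q/r over a list of (q, position) point charges (1/(4*pi*eps0) = 1)."""
    return sum(q / math.dist(point, pos) for q, pos in charges)

a = 1.0  # height of the real charge above the plane z = 0
real_and_image = [(+1.0, (0.0, 0.0, a)), (-1.0, (0.0, 0.0, -a))]

# Sample a few points on the plane: the potential should vanish at all of them.
on_plane = [potential((x, y, 0.0), real_and_image)
            for x in (-2.0, 0.0, 0.7) for y in (-1.0, 0.3, 4.0)]
# Above the plane, closer to the positive charge, it should be positive.
above = potential((0.5, 0.0, 0.5), real_and_image)
print(max(abs(v) for v in on_plane))
print(above)
```

Every point on the plane is equally far from the charge and from its image, so the two contributions cancel there, which is why a grounded sheet can take the place of the image charge.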


Well… It doesn’t matter all that much. This is, indeed, the really boring stuff one just has to grind through in order to understand the next thing, which is hopefully somewhat more exciting.

Quadrupole fields

Because you’re interested in physics, you probably know a thing or two about those quadrupole magnets used to focus particle beams in accelerators. They’re also referred to as lenses. The illustration below shows a quadrupole electric field, but a quadrupole magnetic field looks the same.

quadrupole field

The point is: these lenses focus in one direction only and, hence, in an actual accelerator or cyclotron, the Q-magnets will be arranged so as to alternately focus horizontally and vertically. Why can’t we build magnets so as to focus electrically or magnetically charged particles simultaneously in two directions?

Well… It would require a tube built of protons, or electrons, in a stable configuration. We can’t do that. Technology just isn’t ready for it: we’re not able to build stable tubes of protons, or of electrons. :-) So the so-called Theorem of Earnshaw is still valid. Earnshaw’s Theorem says just that: focusing in two directions at once is impossible. It applies to classical inverse-square law forces, such as the electric and gravitational force, but also the magnetic forces created by permanent magnets.

However, the theorem is subject to constraints, and these constraints can be exploited to create very interesting exceptions, like magnetic levitation. I warmly recommend the link. :-)
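To see why focusing in one direction comes with defocusing in the other, one can look at the simplest idealised quadrupole potential near the axis, Φ = (k/2)·(x² − y²). That specific form is my assumption here (it is the usual textbook idealisation, not something taken from the illustration above), and the helper names are hypothetical. The sketch checks that this potential satisfies Laplace’s equation and that the force is restoring along x but pushes away along y.

```python
def quad_potential(x, y, k=1.0):
    """Idealised quadrupole potential near the axis: (k/2) * (x**2 - y**2)."""
    return 0.5 * k * (x * x - y * y)

def force(x, y, q=1.0, h=1e-5):
    """Force on a charge q, F = -q * grad(Phi), by central differences."""
    fx = -q * (quad_potential(x + h, y) - quad_potential(x - h, y)) / (2 * h)
    fy = -q * (quad_potential(x, y + h) - quad_potential(x, y - h)) / (2 * h)
    return fx, fy

def laplacian(x, y, h=1e-3):
    """Five-point finite-difference estimate of d2Phi/dx2 + d2Phi/dy2."""
    return (quad_potential(x + h, y) + quad_potential(x - h, y)
            + quad_potential(x, y + h) + quad_potential(x, y - h)
            - 4 * quad_potential(x, y)) / (h * h)

fx, _ = force(0.1, 0.0)   # positive charge displaced along x
_, fy = force(0.0, 0.1)   # positive charge displaced along y
print(fx, fy, laplacian(0.3, -0.2))
```

The force along x should point back to the axis (negative for a positive displacement), while the force along y should point away from it. In a charge-free region the curvatures of the potential must add up to zero, so a restoring force in one direction is paid for in another.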

The electric field in various circumstances

The electric field in (and from) a conductor

This is just a quick post to answer a question of my 16-year-old son, Vincent: why are we safe in a car when lightning strikes? What’s the Faraday cage effect really?

He wants to become an engineer, and so I told him what I knew: the electric charges reside at the surface of a conductor and, therefore, a fully-enclosed, all-metallic vehicle is safe. One should just not touch the interior metallic areas, surely not during the strike, but also not after the strike. Why? Because there may still be some residual charge left on the vehicle, even if the metal frame should direct all lightning currents to the ground.

Through the rubber of the tyres? Yes. In fact, it’s the rubber and other insulators that explain why some residual charge might be left. Indeed, the common assumption that, somehow, it’s the rubber that protects the occupants of a car (or that, somehow, rubber soles would insulate us in an electric storm and, hence, make us less likely to get hit) is ridiculous: completely false, really! The following quote from the US National Weather Service is clear enough on that:

“While rubber is an electric insulator, it’s only effective to a certain point. The average lightning bolt carries about 30,000 amps of charge, has 100 million volts of electric potential, and is about 50,000°F. These amounts are several orders of magnitude higher than what humans use on a daily basis and can burn through any insulator—even the ceramic insulators on power lines! Besides, the lightning bolt may just have traveled many miles through the atmosphere, which is a good insulator. Half an inch (or less) of rubber will make no difference.”

So that’s what I told him—sort of. However, I felt my answer (which I tried to get across as I was driving the car, in fact) was superficial and incomplete. So…

Vincent, here’s the full answer! I promise, no integrals or complex numbers. At the same time, it will be not so easy as the physics you learned in school, because I want to teach you something new. :-) Just try it. What I want to explain to you is Gauss’ Law. If you manage to go through it, you’ll know all you need to know about electrostatics, and it will make your first undergrad year a lot easier. [Especially that vector equation, as I always felt my math teacher never told me what a vector really was: it’s something physical. :-)]

Forces and fields

You’ve surely seen Coulomb’s Law:

$$F = k_e\cdot\frac{q_1 q_2}{r_{12}^2}$$

The ke factor is Coulomb’s constant: it is just a constant of proportionality, so it’s there to make the units come out alright. Indeed, Coulomb’s formula is simple enough: it says that the force is directly proportional to the amount of charge and inversely proportional to the square of the distance. That’s all. However, the units in which we measure stuff are not necessarily compatible: we measure distance in meters, electric charge in coulombs, and force in newtons. So, if we’d define the newton as the force between two charges of one coulomb separated by a distance of one meter, then we wouldn’t need to put that ke factor there. But the newton has another definition: one newton is the force needed to accelerate 1 kg at a rate of 1 m/s per second.

Coulomb’s constant is usually written as ke = 1/4πε0 in more serious textbooks. Why? Well… You can read my note at the end of this post, but it doesn’t matter right now. It’s much more important to try to understand the vector form of Coulomb’s Law, which is written as:

Coulomb's Law

I used boldface to denote F1 and F2 because they are force vectors. Vectors are physical ‘quantities’ with a magnitude (denoted by F1 and F2, so no boldface here) and a direction. That direction is given by the unit vector e12 in the equation: it’s a unit vector (so its length is one) from q2 to q1. Read again: from q2 to q1, not from q1 to q2. It’s important to get this one thing right, otherwise you’ll make a mess of the signs. Indeed, in the example below, q1 and q2 have the same sign (+) but their signs may differ (so we have a plus and a minus), and the formula above should still work. Check it yourself by doing the drawing for opposite charges.

Coulomb's Law

In fact, my drawing above has a small mistake: F2 is the same as F1 in magnitude, but I forgot to put the minus sign: the force on q2 is F2 = –F1. It’s the action = reaction principle, really.
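If you’d like to check the signs without doing the drawing, the vector form of Coulomb’s Law fits in a few lines of code. The sketch below is only an illustration: the function name `coulomb_force_on_1` and the example charges (one microcoulomb each, one meter apart) are my own choices, and the unit vector e12 is built exactly as described above, pointing from q2 to q1.

```python
import math

K_E = 8.9875517923e9  # Coulomb's constant in N·m²/C²

def coulomb_force_on_1(q1, pos1, q2, pos2):
    """Force vector on q1 due to q2: F1 = k_e * q1*q2 / r12**2 * e12."""
    sep = [a - b for a, b in zip(pos1, pos2)]      # vector from q2 to q1
    r12 = math.sqrt(sum(c * c for c in sep))       # distance between the charges
    e12 = [c / r12 for c in sep]                   # unit vector from q2 to q1
    signed_magnitude = K_E * q1 * q2 / r12 ** 2    # positive for like charges
    return [signed_magnitude * c for c in e12]

p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)

f1_like = coulomb_force_on_1(1e-6, p1, 1e-6, p2)       # force on q1, like charges
f2_like = coulomb_force_on_1(1e-6, p2, 1e-6, p1)       # force on q2: roles swapped
f1_opposite = coulomb_force_on_1(1e-6, p1, -1e-6, p2)  # force on q1, opposite charges
print(f1_like, f2_like, f1_opposite)
```

With like charges, the force on q1 points away from q2 (along +x here), and the force on q2 is its exact opposite. Flip one sign and the force on q1 points towards q2, with no change to the formula.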

OK. That’s clear. Now you need to learn about the concept of a field: the field is the force per unit charge. So the field at q1, or the field at point (1), is the force on q1 divided by q1. For example, if q1 is three coulomb, we divide by three. More generally, we write:

$$\mathbf{E}(1) = \mathbf{F}_1/q_1$$


So now you know what the field vector E stands for: it is the force on a unit charge we would place in the field. To be clear, a unit charge is +1 unit. We can measure it in coulomb, or the proton charge, or the charge of a quark, or in whatever unit we want, but we’ve been using coulomb so far so let’s stick to that. Just in case you wonder: one coulomb is the charge of approximately $6.241\times10^{18}$ protons, so… Yes. That’s quite a lot. :-)

OK. Next thing.

Gauss’ Law

The field is real. We don’t have to put any charge there. The field is there, and it has energy. [There’s a formula for the energy, but I won’t bother you with that here, because we don’t need it.] The magnitude of the electric field, i.e. the field strength $E = |\mathbf{E}|$, is measured in newton (N) per coulomb (C), so in N/C. In physics, we’ll multiply the field strength with a surface area so we get the so-called flux of the field, which is measured in $(\mathrm{N/C})\cdot\mathrm{m}^2$. The illustration below (which I took from Feynman’s Lectures) is just as good as any. In fact, we have several surfaces here: we have a closed surface S with several faces, including surface a and b, which are spherical surfaces. The other surfaces of this box are so-called radial faces. The E field coming out of the charge is like a flow, and so the flow going through face a is the same as the flow going through face b: the face is larger, but the field strength is less.


It is easy to show that the net flux is zero: Coulomb’s Law tells us that the magnitude of E decreases as $1/r^2$ while, from our geometry classes, we know that the surface area increases as $r^2$, so their product is the same. So, if the surface area of a is Δa, and the surface area of b is Δb, then Ea·Δa = Eb·Δb and so the net flux through the box is equal to Eb·Δb − Ea·Δa = 0. So the flux of E into face a is just cancelled by the flux out of face b. Needless to say, there is no flux through the radial surfaces. Why? Because the electric force is a radial force.

OK. Let’s look at a more complicated situation:

Flux box

When calculating the flux through a surface, we need to take the component of E that is normal to the surface, so that’s $E_n = \mathbf{E}\cdot\mathbf{n} = |\mathbf{E}|\cdot|\mathbf{n}|\cdot\cos\theta = |\mathbf{E}|\cdot\cos\theta$. I am sure you’ve seen that much in your math classes: n is the so-called normal vector, so its length is one and it’s perpendicular to the surface. In any case, the point is: the net flux through this closed surface will still be zero.

Now it’s time for the Big Move. Look at the volume enclosed by the surface S below: we can think of it as completely made up of infinitesimal truncated cones and, for each of these cones, the flux of E from one end of each conical segment will be equal and opposite to the flux from the other end. So the total net flux from the surface S is still zero!

Flux any volume

So we have a very general result here:

The (net) flux out of a volume that has no charge(s) in it is zero, always!

You’ll say: so what? Well… It’s a most remarkable result, really. First, it’s not what you’d expect intuitively, and, second, we can now use a clever trick to calculate the flux out of a volume that has some charge(s) in it. Let’s be clever about it. Look at the surface S below: it’s got a point charge q in it. Now we imagine another surface S’ around it: we imagine a little sphere centered on the charge.

Flux with charge

From Coulomb’s Law, we know that, if the radius of our little sphere is equal to r, then the field strength E, everywhere on its surface, is equal to:

formula 1

From your geometry class, you also know that the surface of a sphere is equal to $4\pi r^2$, so the flux from the surface of our little sphere is just the product of the field and the surface, so we write:

formula 2
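Written out, the two steps above presumably amount to the following. This is my own transcription from the surrounding text (Coulomb’s Law for the field strength on the little sphere, and field times surface area for the flux), so take it as a sketch rather than a copy of the original formulas:

```latex
E = \frac{1}{4\pi\varepsilon_0}\,\frac{q}{r^2}
\qquad\Longrightarrow\qquad
\text{flux} = E \cdot 4\pi r^2
            = \frac{1}{4\pi\varepsilon_0}\,\frac{q}{r^2}\cdot 4\pi r^2
            = \frac{q}{\varepsilon_0}
```

Note how the radius r drops out: the flux does not depend on the size of the little sphere, only on the charge inside.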

Now, the nice thing is that we can generalize this result for many charges, or for charge distributions, because we can simply add the fields for each of them: $\mathbf{E} = \mathbf{E}_1 + \mathbf{E}_2 + \dots$ That gives us Gauss’ Law:

The flux from any closed surface S = $Q_{\text{inside}}/\varepsilon_0$

Qinside is, obviously, the sum of the charges inside the volume enclosed by the surface.
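Gauss’ Law can also be tested by brute force on a surface that is not a sphere. The sketch below adds up the flux of a point charge’s field through the six faces of a cube, once with the charge inside and once with the charge outside. It is only a numerical illustration under my own assumptions: units in which ε0 = 1 (so the predicted flux is simply q or 0), a hypothetical function name `flux_through_cube`, and a plain midpoint rule on each face.

```python
import math

def flux_through_cube(charge_pos, q=1.0, half_side=1.0, n=100):
    """Outward flux of a point charge's field through a cube centred on the origin.

    Assumed units: eps0 = 1, so E = q * r_vec / (4*pi*r**3). Gauss' Law then
    predicts a flux of q if the charge is inside the cube and 0 if it is outside.
    """
    cx, cy, cz = charge_pos
    step = 2.0 * half_side / n
    d_area = step * step
    total = 0.0
    for axis in range(3):            # faces perpendicular to x, y and z
        for sign in (-1.0, 1.0):     # the two opposite faces along that axis
            for i in range(n):
                u = -half_side + (i + 0.5) * step
                for j in range(n):
                    v = -half_side + (j + 0.5) * step
                    point = [u, v]
                    point.insert(axis, sign * half_side)   # point on the face
                    rx, ry, rz = point[0] - cx, point[1] - cy, point[2] - cz
                    r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
                    # Component of E along the outward normal of this face.
                    e_normal = sign * q * (rx, ry, rz)[axis] / (4.0 * math.pi * r3)
                    total += e_normal * d_area
    return total

flux_inside = flux_through_cube((0.3, 0.1, -0.2))   # charge inside the cube
flux_outside = flux_through_cube((3.0, 0.0, 0.0))   # charge outside the cube
print(flux_inside, flux_outside)
```

The first number should be close to 1 (that’s q/ε0 in these units) even though the charge is off-center, and the second close to 0: what goes in on one side comes out on the other.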

OK. That’s Gauss’ Law. Let’s go back to our car. :-)

The field in (and from) a conductor

An electrical conductor is a solid that contains many free electrons. Free electrons can move freely around, but cannot leave the surface. When we charge a conductor, the electrons will move around until they have arranged themselves to produce a zero electric field everywhere inside the conductor. It’s the corollary of Gauss’ Law: the (net) flux out of a volume that has no charge(s) in it is zero, always! And so the electrons will arrange themselves in order to make sure that happens.

Think about the dynamics of the situation: as long as there’s some field inside, the charges will keep moving. Fortunately (especially if you’re in a car or a plane hit by lightning!), the re-arrangement happens in a fraction of a second. Hence, if we have some kind of shell, then the field everywhere inside of the shell will be zero, always. In addition, when we charge a conductor, the electrons will push each other away and try to spread as much as possible, so they will reside at the surface of the conductor. In fact, the excess charge of any conductor is, on the average, within one or two atomic layers of the surface only. The situation is illustrated below:

Flux out of a conductor

Let me sum up the main conclusions:

  1. The electric field inside the conductor (E1) is zero. In other words, if a cavity is completely enclosed by a conductor, no distribution of charges outside can ever produce any field inside. But no field is no force, so that’s how the shielding really works!
  2. The electric field just outside the surface of a conductor (E2) is normal to the surface. There can be no tangential component. If there were a tangential component, the electrons would move along the surface until it was gone.

To be fully complete, the formula for the field just outside the surface of the conductor is E = σ/ε0, where σ is the local surface charge density. That local surface charge density can be quite high, of course, especially when lightning is involved—but it works! You’re safe in a car!
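For one simple shape we can check the E = σ/ε0 formula against Coulomb’s Law directly. On a conducting sphere the charge spreads evenly, and outside the sphere the field is the same as that of a point charge at the center (that last bit is a standard result I am assuming here, not something derived above). The numbers in the sketch (one microcoulomb on a sphere with a radius of half a meter) and the function name are made up for illustration.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant in F/m

def surface_field_of_sphere(total_charge, radius):
    """Return (sigma, E from sigma/eps0, E from Coulomb's Law at r = radius)."""
    sigma = total_charge / (4.0 * math.pi * radius ** 2)   # charge per unit area
    e_from_sigma = sigma / EPS0
    e_from_coulomb = total_charge / (4.0 * math.pi * EPS0 * radius ** 2)
    return sigma, e_from_sigma, e_from_coulomb

sigma, e_sigma, e_coulomb = surface_field_of_sphere(1e-6, 0.5)
print(sigma, e_sigma, e_coulomb)
```

Both routes should give the same field strength, roughly 36,000 N/C for these numbers.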

There’s one more point. You may think that you’ve seen a formula like E = σ/ε0 before: the formula for the field from a charged sheet, which is easy to calculate from Gauss’ Law. Indeed, if we look at some imaginary rectangular box that cuts through the sheet, as shown below (it’s referred to as a Gaussian surface), then the flux through each of the two faces parallel to the sheet is, once again, the field times the area. Now, if the charge density (so the charge per unit area) is σ, then the total charge enclosed in the box is σA. So the total flux, through both faces, must be equal to 2·E·A = σA/ε0, from which we get: E = σ/2ε0. So the sheet has a field left and right, and it’s only half as strong. For our conductor, we only have a field outside, and it’s E = σ/ε0. So how does it work really?

Charged sheet

We only have a field outside the conductor – and, hence, no field inside – because all the other charges on the conductor will arrange themselves in such a way as to produce a field at P that neutralizes, on the inside, the field of the charges in the immediate neighborhood of P, and reinforces it on the outside. So we have ‘other charges’ here that come into play. The mechanics behind are similar to the mechanics behind the polarization phenomenon. If we have a negative charge density on the surface, we’ll have a positive charge density in the layer below. However, it’s quite complicated and, to analyze it properly, we’d need to analyze the electric properties of matter in more detail, which we won’t do here.

So… When everything is said and done, the phenomenon of ‘shielding’ is extremely complex indeed: it’s all about charges arranging themselves in patterns, and the result is truly remarkable: the fields on the two sides of a closed conducting shell are completely independent—zero on the inside, and E = σ/ε0 on the outside, with σ the local surface charge density. And it also works the other way around: if we’d have some distribution of charges inside of a closed conductor, those charges would not produce any field outside. So shielding works both ways!

Some closing remarks

A car is not a sphere. Some surfaces may have points or sharp ends, like the object sketched below. Again, the charges will try to spread out as much as possible on the surface, and the tip of a sharp point is as far away as it is possible from most of the surface. Therefore, we should expect the surface density to be very high there. Now, a high charge density means a high field just outside. In fact, if the electric field is too great, air will break down, so we get a discharge. As Feynman explains it:

“Air will break down if the electric field is too great. What happens is that a loose charge (electron, or ion) somewhere in the air is accelerated by the field, and if the field is very great, the charge can pick up enough speed before it hits another atom to be able to knock an electron off that atom. As a result, more and more ions are produced. Their motion constitutes a discharge, or spark. If you want to charge an object to a high potential and not have it discharge itself by sparks in the air, you must be sure that the surface is smooth, so that there is no place where the field is abnormally large.”

Sharp tip

It explains why lightning is attracted to pointy objects, so you should stay away from them.

What about planes and lightning? Well… There’s a nice article on that on the Scientific American website. Let me quote a paragraph that sort of sums up what actually happens:

“Although passengers and crew may see a flash and hear a loud noise if lightning strikes their plane, nothing serious should happen because of the careful lightning protection engineered into the aircraft and its sensitive components. Initially, the lightning will attach to an extremity such as the nose or wing tip. The airplane then flies through the lightning flash, which reattaches itself to the fuselage at other locations while the airplane is in the electric “circuit” between the cloud regions of opposite polarity. The current will travel through the conductive exterior skin and structures of the aircraft and exit off some other extremity, such as the tail. Pilots occasionally report temporary flickering of lights or short-lived interference with instruments.”

One more thing perhaps: isn’t it incredible that, even when lightning goes through a car or a plane, it’s only the surface that’s being affected? I mean… It’s fairly easy to see the equilibrium situation, which has the charges on the surface only. But what about the dynamics indeed? 30,000 amps, 100 million volts, and 25,000 to 30,000 degrees Celsius… As lightning strikes, that must go everywhere, no? Well… Yes and no. If there are pointy objects, lightning will effectively burn through them. For an example of the damage of lightning on the nose of an airplane, click this link. :-) But then… Well… Let me copy Feynman as he introduces the electric force:

“Consider a force like gravitation which varies predominantly inversely as the square of the distance, but which is about a billion-billion-billion-billion times stronger. And with another difference. There are two kinds of “matter,” which we can call positive and negative. Like kinds repel and unlike kinds attract—unlike gravity where there is only attraction. What would happen? A bunch of positives would repel with an enormous force and spread out in all directions. A bunch of negatives would do the same.”

So that’s what happens. The charges spread out, in a fraction of a second, all away from each other, and so they stay on the surface only, because that’s as far away as they can get from each other. As mentioned above, we’re talking atomic or molecular layers really, so they don’t penetrate, despite the incredible charges and voltages involved. Let me continue the quote—just to illustrate the strength of the forces involved:

“But an evenly mixed bunch of positives and negatives would do something completely different. The opposite pieces would be pulled together by the enormous attractions. The net result would be that the terrific forces would balance themselves out almost perfectly, by forming tight, fine mixtures of the positive and the negative, and between two separate bunches of such mixtures there would be practically no attraction or repulsion at all. […] There is such a force: the electrical force. And all matter is a mixture of positive protons and negative electrons which are attracting and repelling with this great force. So perfect is the balance, however, that when you stand near someone else you don’t feel any force at all. If there were even a little bit of unbalance you would know it. If you were standing at arm’s length from someone and each of you had one percent more electrons than protons, the repelling force would be incredible. How great? Enough to lift the Empire State Building? No! To lift Mount Everest? No! The repulsion would be enough to lift a “weight” equal to that of the entire earth!”

So… Well… That’s it. I’ll close this post with the promised note on Coulomb’s constant and the electric constant, but it’s just an addendum, so you don’t have to read it if you don’t feel like it, Vincent. :-)

Addendum 1: Coulomb’s constant and the electric constant

The ke = 1/4πε0 factor in Coulomb’s Law is just a constant of proportionality. Coulomb’s formula is simple enough – it says that the force is directly proportional to the amount of charge and inversely proportional to the square of the distance – but it would be a miracle if the units came out alright, wouldn’t it? Indeed, we measure distance in meters, charge in coulombs, and force in newtons. Now, we could re-define one of those units so as to get rid of the 1/4πε0 factor, but that’s not what we’re going to do. Why not? First, the constant of proportionality depends on the medium. Indeed, ε0 is the so-called permittivity of the vacuum, so that’s in empty space. The constant of proportionality will be different in a gas, and it will be different for different gases and different temperatures and at different pressures. You can check it online if you want – just click the link here for some examples – but I guess you’ll believe me. So, if we write 1/4πε instead of ke then we can put in a different ε for each medium and our formula is still OK.

Now, because you’re a smart kid, you’ll say that doesn’t quite answer the question: why do we write it as 1/4πε? Why don’t we simply write μ instead of 1/4πε, or just k or a or something? Well… There is an answer to that, but it’s complicated. First, the μ and μ0 symbols are already used for something else: it’s something similar to ε and ε0 but then for magnetic fields. To be precise, μ0 is referred to as the permeability of the vacuum (and μ is just the permeability of some non-vacuum medium, of course). Now, because electricity and magnetism are part of one and the same phenomenon in Nature (when you’re going for engineer, you’ll get one course on electromagnetism, not two separate ones), ε0 and μ0 are related. In fact, they’re related through a marvelous formula: a formula like $E = mc^2$ in physics or, in math, $e^{i\pi} + 1 = 0$. Don’t try to understand it. Just look at it:

$$c^2\varepsilon_0\mu_0 = (c\varepsilon_0)(c\mu_0) = 1$$

Amazing, isn’t it? The c here is the speed of light in a vacuum, obviously. So it’s a physical constant. In other words, unlike ε0 or μ0, it’s got nothing to do with proportionality or units: the speed of light is the speed of light no matter what units we use—meters or light-seconds or whatever. OK. Just swallow this and don’t pay too much attention. It’s just a digression, but let me finish it.
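You don’t have to take the formula on faith: plugging in the measured values does the job. The sketch below uses the CODATA 2018 values for ε0 and μ0 as I remember them (so double-check them if you need more digits) and the exact defined value of c.

```python
import math

C = 299_792_458.0          # speed of light in m/s (exact by definition)
EPS0 = 8.8541878128e-12    # electric constant in F/m (CODATA 2018)
MU0 = 1.25663706212e-6     # magnetic constant in N/A² (CODATA 2018)

product = C ** 2 * EPS0 * MU0          # should be 1, up to rounding of the inputs
k_e = 1.0 / (4.0 * math.pi * EPS0)     # Coulomb's constant in N·m²/C²
mu0_ratio = MU0 / (4.0 * math.pi * 1e-7)

print(product)
print(k_e)
print(mu0_ratio)
```

The product comes out as 1 to within the precision of the inputs, Coulomb’s constant comes out at about 8.99×10⁹ N·m²/C², and μ0 is indeed 4π×10⁻⁷ N/A² to many digits.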

The equivalent of Coulomb’s Law in magnetism is Ampère’s Law, and it involves the circulation of a field, as illustrated below. So that’s why Ampère’s Law involves a 2π factor.


In fact, because we’re talking two wires (or two conductors) with currents going through them (I1 and I2 respectively), the proportionality constant in Ampère’s Law is written as 2kA.

Ampere Law

Now, I won’t go too much into the detail but the thing about the circulation and that factor 2 in Ampère’s Law result in μ0 being written as $\mu_0 = 4\pi\times10^{-7}\ \mathrm{N/A^2}$. As for the units: N is newton and A is ampere obviously. And so that’s why we have the 4π in the proportionality constant for Coulomb’s Law as well. And, of course, the (cε0)(cμ0) = 1 equation makes it obvious that cε0 and cμ0 are reciprocal numbers, so that’s why we write 1/4πε0 for the proportionality constant in Coulomb’s Law, rather than ke or a or whatever other simple thing. […] Well… Sort of. In any case, nothing to worry about. :-)


The Uncertainty Principle and the stability of atoms

The Model of the Atom

In one of my posts, I explained the quantum-mechanical model of an atom. Feynman sums it up as follows:

“The electrostatic forces pull the electron as close to the nucleus as possible, but the electron is compelled to stay spread out in space over a distance given by the Uncertainty Principle. If it were confined in too small a space, it would have a great uncertainty in momentum. But that means it would have a high expected energy—which it would use to escape from the electrical attraction. The net result is an electrical equilibrium not too different from the idea of Thompson—only it is the negative charge that is spread out, because the mass of the electron is so much smaller than the mass of the proton.”

This explanation is a bit sloppy, so we should add the following clarification: “The wave function Ψ(r) for an electron in an atom does not describe a smeared-out electron with a smooth charge density. The electron is either here, or there, or somewhere else, but wherever it is, it is a point charge.” (Feynman’s Lectures, Vol. III, p. 21-6)

The two quotes are not incompatible: it is just a matter of defining what we really mean by ‘spread out’. Feynman’s calculation of the Bohr radius of an atom in his introduction to quantum mechanics clears all confusion in this regard:

Bohr radius

It is a nice argument. One may object that he gets the right thing out because he puts the right things in – such as the values of e and m, for example :-) − but it’s nice nevertheless!
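For those who want to redo the arithmetic: as I read Feynman’s argument, the energy of an electron spread over a distance a is roughly E = ħ²/(2ma²) − e²/a, with e² shorthand for the square of the electron charge divided by 4πε0, and the atom settles at the a that minimizes E. The sketch below is my own rendering of that estimate, with constants quoted from memory (CODATA 2018) and variable names of my choosing.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant in J·s
M_E = 9.1093837015e-31      # electron mass in kg
Q_E = 1.602176634e-19       # elementary charge in C
EPS0 = 8.8541878128e-12     # electric constant in F/m
E2 = Q_E ** 2 / (4.0 * math.pi * EPS0)   # the shorthand e², in J·m

def energy(a):
    """Kinetic energy from p ~ hbar/a plus the electrostatic energy -e²/a."""
    return HBAR ** 2 / (2.0 * M_E * a ** 2) - E2 / a

a0 = HBAR ** 2 / (M_E * E2)     # the value of a at which dE/da = 0
e0_ev = energy(a0) / Q_E        # the minimum energy, converted to electronvolts

print(a0)
print(e0_ev)
```

This should give a radius of about 0.53×10⁻¹⁰ m and an energy of about −13.6 eV, the familiar Bohr radius and hydrogen ground-state energy.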

Mass as a Scale Factor for Uncertainty

Having complimented Feynman, I should add that the calculation above does raise an obvious question: why is it that we cannot confine the electron in “too small a space” but that we can do so for the nucleus (which is just one proton in the example of the hydrogen atom here)? Feynman gives the answer above: because the mass of the electron is so much smaller than the mass of the proton.

Huh? What’s the mass got to do with it? The uncertainty is the same for protons and electrons, isn’t it?

Well… It is, and it isn’t. :-) The Uncertainty Principle – usually written in its more accurate σxσp ≥ ħ/2 expression – applies to both the electron and the proton – of course! – but the momentum is the product of mass and velocity (p = m·v), and so it’s the proton’s mass that makes the difference here. To be specific, the mass of a proton is about 1836 times that of an electron. Now, as long as the velocities involved are non-relativistic (and they are non-relativistic in this case: the (relative) speed of electrons in atoms is given by the fine-structure constant α = v/c ≈ 0.0073, so the Lorentz factor is very close to 1), we can treat the m in the p = m·v identity as a constant and, hence, we can also write: Δp = Δ(m·v) = m·Δv. So all of the uncertainty of the momentum goes into the uncertainty of the velocity. Hence, the mass acts like a reverse scale factor for the uncertainty. To appreciate what that means, let me write ΔxΔp = ħ as:

ΔxΔv = ħ/m

It is an interesting point, so let me expand the argument somewhat. We actually use a more general mathematical property of the standard deviation here: the standard deviation of a variable scales directly with the scale of the variable. Hence, we can write: σ(k·x) = k·σ(x), with k > 0. So the uncertainty is, indeed, smaller for larger masses. Larger masses are associated with smaller uncertainties in their position x. To be precise, the uncertainty is inversely proportional to the mass and, hence, the mass number effectively acts like a reverse scale factor for the uncertainty.
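Some numbers may help here. The sketch below takes ΔxΔv = ħ/m at face value and confines both particles to the same Δx, for which I picked roughly the Bohr radius (my choice, purely for illustration). The constants are CODATA 2018 values quoted from memory and the function name is hypothetical.

```python
HBAR = 1.054571817e-34      # reduced Planck constant in J·s
M_E = 9.1093837015e-31      # electron mass in kg
M_P = 1.67262192369e-27     # proton mass in kg
C = 299_792_458.0           # speed of light in m/s

def velocity_spread(mass, dx):
    """Delta-v from Delta-x * Delta-v = hbar / m."""
    return HBAR / (mass * dx)

dx = 5.29e-11               # about one Bohr radius, in m
dv_electron = velocity_spread(M_E, dx)
dv_proton = velocity_spread(M_P, dx)

print(dv_electron, dv_electron / C)
print(dv_proton)
print(dv_electron / dv_proton)
```

For the electron, the velocity spread comes out at about 2.2×10⁶ m/s, which is about 0.0073 times the speed of light: the fine-structure constant again. For the proton, it is smaller by a factor of about 1836, the mass ratio.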

Of course, you’ll say that the uncertainty still applies to both factors on the left-hand side of the equation, and so you’ll wonder: why can’t we keep Δx the same and multiply Δv by m, so the product yields ħ again? In other words, why can’t we have an uncertainty in velocity for the proton that is 1836 times larger than the uncertainty in velocity for the electron? The answer to that question should be obvious: the uncertainty should not be greater than the expected value. When everything is said and done, we’re talking about a distribution of some variable here (the velocity variable, to be precise) and, hence, that distribution is likely to be the Maxwell-Boltzmann distribution we introduced in previous posts. Its formula and graph are given below:

[Figure: formula and graph of the probability density function of the Maxwell-Boltzmann distribution]

In statistics (and in probability theory), they call this a chi distribution with three degrees of freedom and a scale parameter which is equal to a = √(kT/m). The formula for the scale parameter shows how the mass of a particle indeed acts as a reverse scale parameter. The graph above shows three curves, for a = 1, 2 and 5 respectively. Note the square root though: quadrupling the mass (keeping kT the same) amounts to going from a = 2 to a = 1, so that’s halving a. Indeed, √[kT/(4m)] = (1/2)·√(kT/m). So we can’t just do what we want with Δv (like multiplying it by 1836, as suggested). In fact, the graph and the formulas show that Feynman’s assumption that we can equate p with Δp (i.e. his assumption that “the momenta must be of the order p = ħ/Δx, with Δx the spread in position”), more or less at least, is quite reasonable.

Of course, you are very smart and so you’ll have yet another objection: why can’t we associate a much higher momentum with the proton, as that would allow us to associate higher velocities with the proton? Good question. My answer to that is the following (and it might be original, as I didn’t find this anywhere else). When everything is said and done, we’re talking two particles in some box here: an electron and a proton. Hence, we should assume that the average kinetic energy of our electron and our proton is the same (if not, they would be exchanging kinetic energy until it’s more or less equal), so we write <me·ve²/2> = <mp·vp²/2>. We can re-write this as me/mp = 1/1836 = <vp²>/<ve²> and, therefore, <ve²> = 1836·<vp²>. Now, <v²> ≠ <v>² and, hence, <v> ≠ √<v²>. So the equality does not imply that the expected velocity of the electron is √1836 ≈ 43 times the expected velocity of the proton. Indeed, because of the particularities of the distribution, there is a difference between (a) the most probable speed, which is equal to √2·a ≈ 1.414·a, (b) the root mean square speed, which is equal to √<v²> = √3·a ≈ 1.732·a, and, finally, (c) the mean or expected speed, which is equal to <v> = 2·√(2/π)·a ≈ 1.596·a.
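The three characteristic speeds quoted above can be checked numerically. The sketch below is only an illustration: it assumes the standard textbook form of the Maxwell-Boltzmann speed density with scale parameter a, and the function names and the integration cut-off are my own choices.

```python
import math

def mb_pdf(v, a):
    """Maxwell-Boltzmann speed density with scale parameter a = sqrt(kT/m)."""
    return math.sqrt(2.0 / math.pi) * v**2 * math.exp(-v**2 / (2.0 * a**2)) / a**3

def characteristic_speeds(a, n=20000):
    """Return (most probable, mean, rms) speed using a midpoint-rule integration."""
    v_max = 12.0 * a          # assumption: the density is negligible beyond 12·a
    dv = v_max / n
    mean = 0.0
    mean_sq = 0.0
    mode_v, mode_f = 0.0, 0.0
    for i in range(n):
        v = (i + 0.5) * dv
        f = mb_pdf(v, a)
        mean += v * f * dv
        mean_sq += v * v * f * dv
        if f > mode_f:        # keep track of the grid point with the highest density
            mode_v, mode_f = v, f
    return mode_v, mean, math.sqrt(mean_sq)

mode, mean, rms = characteristic_speeds(a=1.0)
print(f"most probable: {mode:.3f}, mean: {mean:.3f}, rms: {rms:.3f}")
```

All three values scale linearly with a, which is why any of them can serve as a rough measure of the spread.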

However, we are not far off. We could use any of these three values to roughly approximate Δv, as well as the scale parameter a itself: our answers would all be of the same order. However, to keep the calculations simple, let’s use the most probable speed. Let’s equate our electron mass with unity, so the mass of our proton is 1836. Now, such mass implies a scale factor (i.e. a) that’s √1836 ≈ 43 times smaller. So the most probable speed of the proton and, therefore, its spread, would be about √2/√1836 = √(2/1836) ≈ 0.033 that of the electron, so we write: Δvp ≈ 0.033·Δve. Now we can insert this in our ΔxΔv = ħ/m = ħ/1836 identity. We get: ΔxpΔvp = Δxp·√(2/1836)·Δve = ħ/1836. That, in turn, implies that √(2·1836)·Δxp = ħ/Δve, which we can re-write as: Δxp = Δxe/√(2·1836) ≈ Δxe/60. In other words, the expected spread in the position of the proton is about 60 times smaller than the expected spread of the electron. More in general, we can say that the spread in position of a particle, keeping all else equal, is inversely proportional to √(2m). Indeed, in this case, we multiplied the mass by about 1800, and we found that the uncertainty in position went down by a factor 1/60 = 1/√3600. Not bad as a result! Is it precise? Well… It could be like √3·√m or 2·√(2/π)·√m depending on our definition of ‘uncertainty’, but it’s all of the same order. So… Yes. Not bad at all… :-)
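For those who want to re-do the arithmetic, here is a small sketch. It only reproduces the numbers in the paragraph above (electron mass set to unity, proton mass 1836, and the √(2/1836) ratio for the velocity spreads); the variable names are mine.

```python
import math

MASS_RATIO = 1836.0  # proton mass in units of the electron mass (value used in the text)

# Ratio of the velocity spreads as assumed in the text: dv_p ≈ sqrt(2/1836)·dv_e
dv_ratio = math.sqrt(2.0 / MASS_RATIO)

# From dx_p·dv_p = ħ/1836 and dx_e·dv_e = ħ (electron mass = 1):
# dx_p/dx_e = 1/(1836·dv_ratio) = 1/sqrt(2·1836)
dx_ratio = 1.0 / (MASS_RATIO * dv_ratio)

print(f"dv_p/dv_e ≈ {dv_ratio:.3f}")
print(f"dx_e/dx_p ≈ {1.0 / dx_ratio:.1f}")
```

The second number comes out at about 60, which is the factor quoted above.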

You’ll raise a third objection now: the radius of a proton is measured using the femtometer scale, so that’s expressed in 10⁻¹⁵ m, which is not 60 but a million times smaller than the nanometer (i.e. 10⁻⁹ m) scale used to express the Bohr radius as calculated by Feynman above. You’re right, but the 10⁻¹⁵ m number is the charge radius, not the uncertainty in position. Indeed, the so-called classical electron radius is also measured in femtometers and, hence, the Bohr radius is also like a million times that number. OK. That should settle the matter. I need to move on.

Before I do move on, let me relate the observation (i.e. the fact that the uncertainty in regard to position decreases as the mass of a particle increases) to another phenomenon. As you know, the interference of light beams is easy to observe. Hence, the interference of photons is easy to observe: Young’s experiment involved a slit of 0.85 mm (so almost 1 mm) only. In contrast, the 2012 double-slit experiment with electrons involved slits that were 62 nanometers wide, i.e. 62 billionths of a meter! That’s because the associated frequencies are so much higher and, hence, the wave zone is much smaller. So much, in fact, that Feynman could not imagine technology would ever be sufficiently advanced so as to actually carry out the double-slit experiment with electrons. It’s an aspect of the same thing: the uncertainty in position is much smaller for electrons than it is for photons. Who knows: perhaps one day, we’ll be able to do the experiment with protons. :-) For further detail, I’ll refer you to one of my posts on this.

What’s Explained, and What’s Left Unexplained?

There is another obvious question: if the electron is still some point charge, and going around as it does, why doesn’t it radiate energy? Indeed, the Rutherford-Bohr model had to be discarded because this ‘planetary’ model involved circular (or elliptical) motion and, therefore, some acceleration. According to classical theory, the electron should thus emit electromagnetic radiation, as a result of which it would radiate its kinetic energy away and, therefore, spiral in toward the nucleus. The quantum-mechanical model doesn’t explain this either, does it?

I can’t answer this question as yet, as I still need to go through all of Feynman’s Lectures on quantum mechanics. You’re right. There’s something odd about the quantum-mechanical idea: it still involves an electron moving in some kind of orbital – although I hasten to add that the wavefunction is a complex-valued function, not some real function – but it does not involve any loss of kinetic energy due to circular motion, apparently!

There are other unexplained questions as well. For example, the idea of an electrical point charge still needs to be reconciled with the mathematical inconsistencies it implies, as Feynman points out himself in yet another of his Lectures. :-/

Finally, you’ll wonder as to the difference between a proton and a positron: if a positron and an electron annihilate each other in a flash, why do we have a hydrogen atom at all? Well… The proton is not the electron’s anti-particle. For starters, it’s made of quarks, while the positron is made of… Well… A positron is a positron: it’s elementary. But, yes, interesting question, and the ‘mechanics’ behind the mutual destruction are quite interesting and, hence, surely worth looking into—but not here. :-)

Having mentioned a few things that remain unexplained, the model does have the advantage of solving plenty of other questions. It explains, for example, why the electron and the proton are actually right on top of each other, as they should be according to classical electrostatic theory, and why they are not at the same time: the electron is still a sort of ‘cloud’ indeed, with the proton at its center.

The quantum-mechanical ‘cloud’ model of the electron also explains why “the terrific electrical forces balance themselves out, almost perfectly, by forming tight, fine mixtures of the positive and the negative, so there is almost no attraction or repulsion at all between two separate bunches of such mixtures” (Richard Feynman, Introduction to Electromagnetism, p. 1-1) or, to quote from one of his other writings, why we do not fall through the floor as we walk:

“As we walk, our shoes with their masses of atoms push against the floor with its mass of atoms. In order to squash the atoms closer together, the electrons would be confined to a smaller space and, by the uncertainty principle, their momenta would have to be higher on the average, and that means high energy; the resistance to atomic compression is a quantum-mechanical effect and not a classical effect. Classically, we would expect that if we were to draw all the electrons and protons closer together, the energy would be reduced still further, and the best arrangement of positive and negative charges in classical physics is all on top of each other. This was well known in classical physics and was a puzzle because of the existence of the atom. Of course, the early scientists invented some ways out of the trouble—but never mind, we have the right way out, now!”

So that’s it, then. Except… Well…

The Fine-Structure Constant

When talking about the stability of atoms, one cannot escape a short discussion of the so-called fine-structure constant, denoted by α (alpha). I discussed it in another post of mine, so I’ll refer you there for a more comprehensive overview. I’ll just remind you of the basics:

(1) α is the square of the electron charge expressed in Planck units: α = eP².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re/r). You’ll see this more often written as re = α²r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 2.998×10⁸ m/s = c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) Finally, α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. [I think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics.]

Note that, from (2) and (4), we also find that:

(5) The electron mass (in Planck units) is equal to me = α/re = α/(α²r) = 1/(αr). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to me = α/re = eP²/re. Using the Bohr radius, we get me = 1/(αr) = 1/(eP²·r).
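Relations (2) to (5) are easy to check with actual numbers. The sketch below is only an illustration: the constants are rounded CODATA 2018 values that I am supplying myself (they are not taken from this post), and the variable names are mine.

```python
import math

# Rounded CODATA 2018 values: inputs assumed for this illustration only.
ALPHA = 7.2973525693e-3        # fine-structure constant
R_E = 2.8179403262e-15         # classical electron radius, m
R_BOHR = 5.29177210903e-11     # Bohr radius, m
M_E = 9.1093837015e-31         # electron mass, kg
L_PLANCK = 1.616255e-35        # Planck length, m
T_PLANCK = 5.391247e-44        # Planck time, s
M_PLANCK = 2.176434e-8         # Planck mass, kg

# (2) alpha as the square root of the ratio of the two radii (unit-independent)
alpha_from_radii = math.sqrt(R_E / R_BOHR)

# (3) the natural unit of speed: one Planck length per Planck time, in m/s
c_from_planck = L_PLANCK / T_PLANCK

# Convert mass and lengths to Planck units
m_e_planck = M_E / M_PLANCK
r_e_planck = R_E / L_PLANCK
r_bohr_planck = R_BOHR / L_PLANCK

# (4) alpha as the product of electron mass and classical electron radius
alpha_from_mass_radius = m_e_planck * r_e_planck

# (5) electron mass from alpha and the Bohr radius
m_e_from_bohr = 1.0 / (ALPHA * r_bohr_planck)

print(alpha_from_radii, alpha_from_mass_radius, ALPHA)
print(c_from_planck)
print(m_e_planck, m_e_from_bohr)
```

The small differences you will see come from the limited precision of the Planck units, which depend on the measured value of the gravitational constant.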

In addition, in the mentioned post, I also related α to the so-called coupling constant determining the strength of the interaction between electrons and photons. So… What a magical number indeed ! It suggests some unity that our little model of the atom above doesn’t quite capture. As far as I am concerned, it’s one of the many other ‘unexplained questions’, and one of my key objectives, as I struggle through Feynman’s Lectures, is to understand it all. :-) One of the issues is, of course, how to relate this coupling constant to the concept of a gauge, which I briefly discussed in my previous post. In short, I’ve still got a long way to go… :-(

Post Scriptum: The de Broglie relations and the Uncertainty Principle

My little exposé on mass being nothing but a scale factor in the Uncertainty Principle is a good occasion to reflect on the Uncertainty Principle once more. Indeed, what’s the uncertainty about, if it’s not about the mass? It’s about the position in space and velocity, i.e. it’s movement and time. Velocity or speed (i.e. the magnitude of the velocity vector) is, in turn, defined as the distance traveled divided by the time of travel, so the uncertainty is about time as well, as evidenced from the ΔEΔt = h expression of the Uncertainty Principle. But how does it work exactly?

Hmm… Not sure. Let me try to remember the context. We know that the de Broglie relation, λ = h/p, which associates a wavelength (λ) with the momentum (p) of a particle, is somewhat misleading, because we’re actually associating a (possibly infinite) bunch of component waves with a particle. So we’re talking some range of wavelengths (Δλ) and, hence, assuming all these component waves travel at the same speed, we’re also talking a frequency range (Δf). The bottom line is that we’ve got a wave packet and we need to distinguish the velocity of its phase (vp) versus the group velocity (vg), which corresponds to the classical velocity of our particle.

I think I explained that pretty well in one of my previous posts on the Uncertainty Principle, so I’d suggest you have a look there. The mentioned post explains how the Uncertainty Principle relates position (x) and momentum (p) as a Fourier pair, and it also explains that general mathematical property of Fourier pairs: the more ‘concentrated’ one distribution is, the more ‘spread out’ its Fourier transform will be. In other words, it is not possible to arbitrarily ‘concentrate’ both distributions, i.e. both the distribution of x (which I denoted as Ψ(x)) as well as its Fourier transform, i.e. the distribution of p (which I denoted by Φ(p)). So, if we’d ‘squeeze’ Ψ(x), then its Fourier transform Φ(p) will ‘stretch out’.

That was clear enough—I hope! But how do we go from ΔxΔp = h to ΔEΔt = h? Why are energy and time another Fourier pair? To answer that question, we need to clearly define what energy and what time we are talking about. The argument revolves around the second de Broglie relation: E = h·f. How do we go from the momentum p to the energy E? And how do we go from the wavelength λ to the frequency f?

The answer to the first question is the energy-mass equivalence: E = mc², always. This formula is relativistic, as m is the relativistic mass, so it includes the rest mass m₀ as well as the equivalent mass of its kinetic energy m₀v²/2 + … [Note, indeed, that the kinetic energy – defined as the excess energy over its rest energy – is a rapidly converging series of terms, so only the m₀v²/2 term is mentioned.] Likewise, momentum is defined as p = mv, always, with m the relativistic mass, i.e. m = m₀/√(1 − v²/c²) = γ·m₀, with γ the Lorentz factor. The E = mc² and p = mv relations combined give us the E/c = m·c = p·c/v or E·v/c = p·c relationship, which we can also write as E/p = c²/v. However, we’ll need to write E as a function of p for the purpose of a derivation. You can verify that E² − p²c² = m₀²c⁴ and, hence, that E = √(p²c² + m₀²c⁴).

Now, to go from a wavelength to a frequency, we need the wave velocity, and we’re obviously talking the phase velocity here, so we write: vp = λ·f. That’s where the de Broglie hypothesis comes in: de Broglie just assumed the Planck-Einstein relation E = h·ν, in which ν is the frequency of a massless photon, would also be valid for massive particles, so he wrote: E = h·f. It’s just a hypothesis, of course, but it makes everything come out alright. More in particular, the phase velocity vp = λ·f can now be re-written, using both de Broglie relations (i.e. λ = h/p and f = E/h) as vp = (h/p)·(E/h) = E/p = c²/v. Now, because v is always smaller than c for massive particles (and usually very much smaller), we’re talking a superluminal phase velocity here! However, because it doesn’t carry any signal, it’s not inconsistent with relativity theory.

Now what about the group velocity? To calculate the group velocity, we need the frequencies and wavelengths of the component waves. The dispersion relation assumes the frequency of each component wave can be expressed as a function of its wavelength, so f = f(λ). Now, it takes a bit of wave mechanics (which I won’t elaborate on here) to show that the group velocity is the derivative of f with respect to the reciprocal of the wavelength (1/λ), so we write vg = ∂f/∂(1/λ). Using the two de Broglie relations (f = E/h and 1/λ = p/h), we get: vg = ∂f/∂(1/λ) = ∂(E/h)/∂(p/h) = ∂E/∂p = ∂[√(p²c² + m₀²c⁴)]/∂p. Now, when you write it all out, you should find that vg = pc²/E = c²/vp = v, so that’s the classical velocity of our particle once again.
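If you don’t feel like writing it all out, a numerical check does the job too. The sketch below is an illustration only: it uses natural units (c = 1 and rest mass 1, which is my choice), computes the phase velocity as E/p, and approximates the group velocity ∂E/∂p with a central difference.

```python
import math

C = 1.0    # speed of light (natural units, assumed here for readability)
M0 = 1.0   # rest mass in the same units

def energy(p):
    """Relativistic energy E = sqrt(p²c² + m0²c⁴)."""
    return math.sqrt((p * C) ** 2 + (M0 * C**2) ** 2)

def velocities(v):
    """Return (phase velocity, group velocity) for a particle with classical speed v."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    p = gamma * M0 * v                      # relativistic momentum
    v_phase = energy(p) / p                 # E/p, which should equal c²/v
    dp = 1e-6 * p
    v_group = (energy(p + dp) - energy(p - dp)) / (2.0 * dp)   # numerical dE/dp
    return v_phase, v_group

v = 0.1 * C
v_phase, v_group = velocities(v)
print(f"v = {v}, phase velocity = {v_phase:.6f}, group velocity = {v_group:.6f}")
print(f"product of the two = {v_phase * v_group:.6f} (compare with c² = {C**2})")
```

The group velocity comes out equal to v, and the product of the phase and group velocity equals c², so the phase velocity is superluminal whenever v is smaller than c.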

Phew! Complicated! Yes. But so we still don’t have our ΔEΔt = h expression! All of the above tells us how we can associate a range of momenta (Δp) with a range of wavelengths (Δλ) and, in turn, with a frequency range (Δf) which then gives us some energy range (ΔE), so the logic is like:

Δp ⇒ Δλ ⇒ Δf ⇒ ΔE

Somehow, the same sequence must also ‘transform’ our Δx into Δt. I googled a bit, but I couldn’t find any clear explanation. Feynman doesn’t seem to have one in his Lectures either so, frankly, I gave up. What I did do in one of my previous posts, is to give some interpretation. However, I am not quite sure if it’s really the interpretation: there are probably several ones. It must have something to do with the period of a wave, but I’ll let you break your head over it. :-) As far as I am concerned, it’s just one of the other unexplained questions I have as I sort of close my study of ‘classical’ physics. So I’ll just make a mental note of it. [Of course, please don’t hesitate to send me your answer, if you’d have one!] Now it’s time to really dig into quantum mechanics, so I should really stay silent for quite a while now! :-)

The Uncertainty Principle and the stability of atoms

Maxwell, Lorentz, gauges and gauge transformations

I’ve done quite a few posts already on electromagnetism. They were all focused on the math one needs to understand Maxwell’s equations. Maxwell’s equations are a set of (four) differential equations, so they relate some function with its derivatives. To be specific, they relate E and B, i.e. the electric and magnetic field vector respectively, with their derivatives in space and in time. [Let me be explicit here: E and B have three components, but depend on both space as well as time, so we have three dependent and four independent variables for each function: E = (Ex, Ey, Ez) = E(x, y, z, t) and B = (Bx, By, Bz) = B(x, y, z, t).] That’s simple enough to understand, but the dynamics involved are quite complicated, as illustrated below.

[Figure: Maxwell interaction]

I now want to do a series on the more interesting stuff, including an exploration of the concept of gauge in field theory, and I also want to show how one can derive the wave equation for electromagnetic radiation from Maxwell’s equations. Before I start, let’s recall the basic concept of a field.

The reality of fields

I said a couple of times already that (electromagnetic) fields are real. They’re more than just a mathematical structure. Let me show you why. Remember the formula for the electrostatic potential caused by some charge q at the origin:

Φ(r) = q/(4πε₀r)

We know that the (negative) gradient of this function, at any point in space, gives us the electric field vector at that point: E = –∇Φ. [The minus sign is there because of convention: we take the reference point Φ = 0 at infinity.] Now, the electric field vector gives us the force on a unit charge (i.e. the charge of a proton) at that point. If q is some positive charge, the force will be repulsive, and the unit charge will accelerate away from our q charge at the origin. Hence, energy will be expended, as force over distance implies work is being done: as the charges separate, potential energy is converted into kinetic energy. Where does the energy come from? The energy conservation law tells us that it must come from somewhere.

It does: the energy comes from the field itself. Bringing in more or bigger charges (from infinity, or just from further away) requires more energy. So the new charges change the field and, therefore, its energy. How exactly? That’s given by Gauss’ Law: the total flux out of a closed surface is equal to:

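∮ En da = (Σ qi)/ε₀, with the integral taken over the closed surface and the sum over all charges qi inside it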

You’ll say: flux and energy are two different things. Well… Yes and no. The energy in the field depends on E. Indeed, the formula for the energy density in space (i.e. the energy per unit volume) is

u = (ε₀/2)·E·E = ε₀E²/2

Getting the energy over a larger space is just another integral, with the energy density as the integral kernel:

U = ∫ u dV = (ε₀/2)·∫ E·E dV

Feynman’s illustration below is not very sophisticated but, as usual, enlightening. :-)

[Figure: energy in the field]

Gauss’ Theorem connects both the math as well as the physics of the situation and, as such, underscores the reality of fields: the energy is not in the electric charges. The energy is in the fields they produce. Everything else is just the principle of superposition of fields – i.e. E = E1 + E2 – coming into play. I’ll explain Gauss’ Theorem in a moment. Let me first make some additional remarks.

First, the formulas are valid for electrostatics only (so E and B only vary in space, not in time), so they’re just a piece of the larger puzzle. :-) As for now, however, note that, if a field is real (or, to be precise, if its energy is real), then the flux is equally real.

Second, let me say something about the units. Field strength (E or, in this case, its normal component En = E·n) is measured in newton (N) per coulomb (C), so in N/C. The integral above implies that flux is measured in (N/C)·m². It’s a weird unit because one associates flux with flow and, therefore, one would expect flux to be some quantity per unit time and per unit area, so we’d have the m² unit (and the second) in the denominator, not in the numerator. That’s true for heat transfer, for mass transfer, for fluid dynamics (e.g. the amount of water flowing through some cross-section) and many other physical phenomena. But for electric flux, it’s different. You can do a dimensional analysis of the expression above: the sum of the charges is expressed in coulomb (C), and the electric constant (i.e. the vacuum permittivity) is expressed in C²/(N·m²), so, yes, it works: C/[C²/(N·m²)] = (N/C)·m². To make sense of the units, you should think of the flux as the total flow, and of the field strength as a surface density, so that’s the flux divided by the total area, so (field strength) = (flux)/(area). Conversely, (flux) = (field strength)×(area). Hence, the unit of flux is [flux] = [field strength]×[area] = (N/C)·m².
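Gauss’ Law, and the (N/C)·m² unit of flux, can be made tangible with a quick numerical experiment. The sketch below is an illustration under my own assumptions: a single point charge of 1 nC, a sphere of radius 1 m centred at the origin, and a simple midpoint rule for the surface integral of En. It shows that the flux equals q/ε₀ when the charge sits anywhere inside the sphere, and vanishes when it sits outside.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, C²/(N·m²)

def flux_through_sphere(q, charge_pos, radius=1.0, n=200):
    """Midpoint-rule estimate of the flux of E, in (N/C)·m², out of a sphere centred at the origin."""
    k = q / (4.0 * math.pi * EPS0)
    d_theta = math.pi / n
    d_phi = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        for j in range(n):
            phi = (j + 0.5) * d_phi
            # outward unit normal n at the surface point radius·n
            nx = math.sin(theta) * math.cos(phi)
            ny = math.sin(theta) * math.sin(phi)
            nz = math.cos(theta)
            # vector from the charge to the surface point
            rx = radius * nx - charge_pos[0]
            ry = radius * ny - charge_pos[1]
            rz = radius * nz - charge_pos[2]
            r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
            e_normal = k * (rx * nx + ry * ny + rz * nz) / r3       # En = E·n
            total += e_normal * radius**2 * math.sin(theta) * d_theta * d_phi
    return total

q = 1.0e-9   # 1 nC, an arbitrary test charge
inside = flux_through_sphere(q, (0.2, 0.1, 0.3))
outside = flux_through_sphere(q, (2.0, 0.0, 0.0))
print(f"charge inside:  {inside:.3f} (N/C)·m², q/eps0 = {q / EPS0:.3f}")
print(f"charge outside: {outside:.6f} (N/C)·m²")
```

Note that the position of the charge inside the sphere doesn’t matter: only the total charge enclosed does.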

OK. Now we’re ready for Gauss’ Theorem. :-) I’ll also say something about its corollary, Stokes’ Theorem. It’s a bit of a mathematical digression but necessary, I think, for a better understanding of all those operators we’re going to use.

Gauss’ Theorem

The concept of flux is related to the divergence of a vector field through Gauss’ Theorem. Gauss’ Theorem has nothing to do with Gauss’ Law, except that both are associated with the same genius. Gauss’ Theorem is:

∮S C·n da = ∫V ∇·C dV, with S the closed surface bounding the volume V

The ∇·C in the integral on the right-hand side is the divergence of a vector field. It’s the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.

Huh? What’s a volume density? Good question. Just substitute C for E in the surface and volume integral above (the integral on the left is a surface integral, and the one on the right is a volume integral), and think about the meaning of what’s written. To help you, let me also include the concept of linear density, so we have (1) linear, (2) surface and (3) volume density. Look at that representation of a vector field once again: we said the density of lines represented the magnitude of E. But what density? The representation hereunder is flat, so we can think of a linear density indeed, measured along the blue line: so the flux would be six (that’s the number of lines), and the linear density (i.e. the field strength) is six divided by the length of the blue line.

[Figure: linear density]

However, we defined field strength as a surface density above, so that’s the flux (i.e. the number of field lines) divided by the surface area (i.e. the area of a cross-section): think of the square of the blue line, and field lines going through that square. That’s simple enough. But what’s volume density? How do we count the number of lines inside of a box? The answer is: mathematicians actually define it for an infinitesimally small cube by adding the fluxes out of the six individual faces of an infinitesimally small cube:

(flux out of the cube) = (∂Cx/∂x + ∂Cy/∂y + ∂Cz/∂z)·ΔxΔyΔz

So, the truth is: volume density is actually defined as a surface density, but for an infinitesimally small volume element. That, in turn, gives us the meaning of the divergence of a vector field. Indeed, the sum of the derivatives above is just ∇·C (i.e. the divergence of C), and ΔxΔyΔz is the volume of our infinitesimal cube, so the divergence of some field vector C at some point P is the flux – i.e. the outgoing ‘flow’ of C – per unit volume, in the neighborhood of P, as evidenced by writing

∫ C·n da (over the surface of the small volume) = (∇·C)·ΔV

Indeed, just bring ΔV to the other side of the equation to check the ‘per unit volume’ aspect of what I wrote above. The whole idea is to determine whether the small volume is like a sink or like a source, and to what extent. Think of the field near a point charge, as illustrated below. Look at the black lines: they are the field lines (the dashed lines are equipotential lines) and note how the positive charge is a source of flux, obviously, while the negative charge is a sink.


Now, the next step is to acknowledge that the total flux from a volume is the sum of the fluxes out of each part. Indeed, the flux through the part of the surfaces common to two parts will cancel each other out. Feynman illustrates that with a rough drawing (below) and I’ll refer you to his Lecture on it for more detail.


So… Combining all of the gymnastics above – and integrating the divergence over an entire volume, indeed –  we get Gauss’ Theorem:

∮S C·n da = ∫V ∇·C dV
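To see the theorem at work, one can just compute both sides for some simple field. The sketch below does that for the unit cube, using a sample field C = (x²y, y²z, z²x) that I picked for illustration only; both integrals are approximated with a midpoint rule.

```python
def field(x, y, z):
    """Sample vector field C = (x²y, y²z, z²x), chosen only for illustration."""
    return (x * x * y, y * y * z, z * z * x)

def divergence(x, y, z):
    """Analytic divergence of the sample field: 2xy + 2yz + 2zx."""
    return 2 * x * y + 2 * y * z + 2 * z * x

def volume_integral(n=20):
    """Integral of the divergence over the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(divergence(x, y, z) for x in pts for y in pts for z in pts) * h**3

def surface_flux(n=20):
    """Outward flux of C through the six faces of the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    flux = 0.0
    for u in pts:
        for w in pts:
            flux += field(1.0, u, w)[0] - field(0.0, u, w)[0]   # faces x = 1 and x = 0
            flux += field(u, 1.0, w)[1] - field(u, 0.0, w)[1]   # faces y = 1 and y = 0
            flux += field(u, w, 1.0)[2] - field(u, w, 0.0)[2]   # faces z = 1 and z = 0
    return flux * h * h

lhs = surface_flux()
rhs = volume_integral()
print(f"surface integral: {lhs:.6f}, volume integral: {rhs:.6f}")
```

Both sides should come out at 1.5 for this particular field, because each of the three faces at x = 1, y = 1 and z = 1 contributes 1/2 and the three faces through the origin contribute nothing.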

Stokes’ Theorem

There is a similar theorem involving the circulation of a vector, rather than its flux. It’s referred to as Stokes’ Theorem. Let me jot it down:

∮Γ C·ds = ∫S (∇×C)n da, with S any surface bounded by the loop Γ

We have a contour integral here (left) and a surface integral (right). The reasoning behind it is quite similar: a surface bounded by some loop Γ is divided into infinitesimally small squares, and the circulation around Γ is the sum of the circulations around the little loops. We should take care though: the surface integral takes the normal component of ∇×C, so that’s (∇×C)n = (∇×C)·n. The illustrations below should help you to understand what’s going on.

[Figures: Stokes Theorem 1 and Stokes Theorem 2]
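Stokes’ Theorem can be checked in the same hands-on way. The sketch below is an illustration only: it takes the unit square in the xy-plane as the surface, a sample field C = (−y³, x³, 0) of my own choosing, and compares the circulation around the square with the integral of the normal (z) component of the curl, which is computed with central differences.

```python
def field(x, y):
    """Sample planar field (Cx, Cy) = (-y³, x³), chosen only for illustration."""
    return (-y**3, x**3)

def curl_z(x, y, eps=1e-5):
    """z-component of the curl, ∂Cy/∂x − ∂Cx/∂y, by central differences."""
    dcy_dx = (field(x + eps, y)[1] - field(x - eps, y)[1]) / (2 * eps)
    dcx_dy = (field(x, y + eps)[0] - field(x, y - eps)[0]) / (2 * eps)
    return dcy_dx - dcx_dy

def circulation(n=200):
    """Line integral of C·ds, counter-clockwise around the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += field(t, 0.0)[0] * h       # bottom edge, moving in +x
        total += field(1.0, t)[1] * h       # right edge, moving in +y
        total -= field(t, 1.0)[0] * h       # top edge, moving in -x
        total -= field(0.0, t)[1] * h       # left edge, moving in -y
    return total

def curl_flux(n=200):
    """Surface integral of the normal component of the curl over the unit square."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(curl_z(x, y) for x in pts for y in pts) * h * h

loop = circulation()
surface = curl_flux()
print(f"circulation: {loop:.5f}, flux of the curl: {surface:.5f}")
```

For this field the curl is 3x² + 3y² in the z-direction, so both numbers should be close to 2.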

The electric versus the magnetic force

There’s more than just the electric force: we also have the magnetic force. The so-called Lorentz force is the combination of both. The formula, for some charge q in an electromagnetic field, is equal to:

F = q(E + v×B)

Hence, if the velocity vector v is not equal to zero, we need to look at the magnetic field vector B too! The simplest situation is magnetostatics, so let’s first have a look at that.

Magnetostatics implies that the flux of E doesn’t change, so Maxwell’s third equation reduces to c²∇×B = j/ε₀. So we just have a steady electric current (j): no accelerating charges. Maxwell’s fourth equation, ∇·B = 0, remains what it was: there’s no such thing as a magnetic charge. The Lorentz force also remains what it is, of course: F = q(E + v×B) = qE + qv×B. Also note that the v, j and the lack of a magnetic charge all point to the same thing: magnetism is just a relativistic effect of electricity.

What about units? Well… While the unit of E, i.e. the electric field strength, is pretty obvious from the F = qE term – hence, E = F/q, and so the unit of E must be [force]/[charge] = N/C – the unit of the magnetic field strength is more complicated. Indeed, the F = qv×B identity tells us it must be (N·s)/(m·C), because 1 N = 1 C·(m/s)·(N·s)/(m·C). Phew! That’s as horrendous as it looks, and that’s why it’s usually expressed using its shorthand, i.e. the tesla: 1 T = 1 (N·s)/(m·C). Magnetic flux is the same concept as electric flux, so it’s (field strength)×(area). However, now we’re talking magnetic field strength, so its unit is T·m² = (N·s·m²)/(m·C) = (N·s·m)/C, which is referred to as the weber (Wb). Remembering that 1 volt = 1 N·m/C, it’s easy to see that a weber is also equal to 1 Wb = 1 V·s. In any case, it’s a unit that is not so easy to interpret.

Magnetostatics is a bit of a weird situation. It assumes steady fields, so the ∂E/∂t and ∂B/∂t terms in Maxwell’s equations can be dropped. In fact, c²∇×B = j/ε₀ implies that ∇·(c²∇×B) = ∇·(j/ε₀) and, therefore, because the divergence of a curl is always zero, that ∇·j = 0. Now, ∇·j = –∂ρ/∂t and, therefore, magnetostatics is a situation which assumes ∂ρ/∂t = 0. So we have electric currents but no change in charge densities. To put it simply, we’re not looking at a condenser that is charging or discharging, although that condenser may act like the battery or generator that keeps the charges flowing! But let’s go along with the magnetostatics assumption. What can we say about it? Well… First, we have the equivalent of Gauss’ Law, i.e. Ampère’s Law:

∮Γ B·ds = (current through the loop Γ)/(ε₀c²)

We have a line integral here around a closed curve, instead of a surface integral over a closed surface (Gauss’ Law), but it’s pretty similar: instead of the sum of the charges inside the volume, we have the current through the loop, and then an extra c² factor in the denominator, of course. Combined with the ∇·B = 0 equation, this equation allows us to solve practical problems. But I am not interested in practical problems. What’s the theory behind it?

The magnetic vector potential

The ∇·B = 0 equation is true, always, unlike the ∇×E = 0 expression, which is true for electrostatics only (no moving charges). It says the divergence of B is zero, always, and, hence, it means we can represent B as the curl of another vector field, always. That vector field is referred to as the magnetic vector potential, and we write:

∇·B = ∇·(∇×A) = 0 and, hence, B = ∇×A

In electrostatics, we had the other theorem: if the curl of a vector field is zero (everywhere), then the vector field can be represented as the gradient of some scalar function, so if ∇×C = 0, then there is some Ψ for which C = ∇Ψ. Substituting E for C, and taking into account our conventions on charge and the direction of flow, we get E = –∇Φ. Substituting E in Maxwell’s first equation (∇·E = ρ/ε₀) then gave us the so-called Poisson equation: ∇²Φ = –ρ/ε₀, which sums up the whole subject of electrostatics really! It’s all in there!

Except magnetostatics, of course. Using the (magnetic) vector potential A, all of magnetostatics is reduced to another expression:

∇²A = −j/(ε₀c²), with ∇·A = 0

Note the qualifier: ∇·A = 0. Why should the divergence of A be equal to zero? You’re right. It doesn’t have to be that way. We know that ∇·(∇×C) = 0, for any vector field C, and always (it’s a mathematical identity, in fact, so it’s got nothing to do with physics), but choosing A such that ∇·A = 0 is just a choice. In fact, as I’ll explain in a moment, it’s referred to as choosing a gauge. The ∇·A = 0 choice is a very convenient choice, however, as it simplifies our equations. Indeed, c²∇×B = j/ε₀ = c²∇×(∇×A), and – from our vector calculus classes – we know that ∇×(∇×C) = ∇(∇·C) – ∇²C. Combining that with our choice of A (which is such that ∇·A = 0, indeed), we get the ∇²A = −j/(ε₀c²) expression indeed, which sums up the whole subject of magnetostatics!

The point is: if the time derivatives in Maxwell’s equations, i.e. ∂E/∂t and ∂B/∂t, are zero, then Maxwell’s four equations can be nicely separated into two pairs: the electric and magnetic field are not interconnected. Hence, as long as charges and currents are static, electricity and magnetism appear as distinct phenomena, and the interdependence of E and B does not appear. So we re-write Maxwell’s set of four equations as:

  1. Electrostatics: ∇·E = ρ/ε0 and ∇×E = 0
  2. Magnetostatics: ∇×B = j/c²ε0 and ∇·B = 0

Note that electrostatics is a neat example of a vector field with zero curl and a given divergence (ρ/ε0), while magnetostatics is a neat example of a vector field with zero divergence and a given curl (j/c2ε0).


But reality is usually not so simple. With time-varying fields, Maxwell’s equations are what they are, and so there is interdependence, as illustrated in the introduction of this post. Note, however, that the magnetic field remains divergence-free in dynamics too! That’s because there is no such thing as a magnetic charge: we only have electric charges. So ∇·B = 0 and we can define a magnetic vector potential A and re-write B as B = ∇×A, indeed.

I am writing a vector potential field because, as I mentioned a couple of times already, we can choose A. Indeed, adding the gradient of any scalar field to A does not change its curl, so we can add curl-free components to the magnetic potential: it won’t make a difference. This property is referred to as gauge invariance. I’ll come back to that, and also show why this is what it is.

While we can easily get B from A because of the B = ∇×A expression, getting E from some potential is a different matter altogether. It turns out we can get E using the following expression, which involves both Φ (i.e. the electric or electrostatic potential) as well as A (i.e. the magnetic vector potential):

E = –∇Φ – ∂A/∂t

Likewise, one can show that Maxwell’s equations can be re-written in terms of Φ and A, rather than in terms of E and B. The expression looks rather formidable, but don’t panic:

∇²Φ − c⁻²·∂²Φ/∂t² = −ρ/ε0
∇²A − c⁻²·∂²A/∂t² = −j/ε0c²

Just look at it. We have two ‘variables’ here (Φ and A) and two equations, so the system is fully defined. [Of course, the second equation is three equations really: one for each component x, y and z.] What’s the point? Why would we want to re-write Maxwell’s equations? The first equation makes it clear that the scalar potential (i.e. the electric potential) is a time-varying quantity, so things are not, somehow, simpler. The answer is twofold. First, re-writing Maxwell’s equations in terms of the scalar and vector potential makes sense because we have (fairly) easy expressions for their value in time and in space as a function of the charges and currents. For statics, these expressions are:

Φ(1) = ∫ ρ(2)·dV2/(4πε0·r12) and A(1) = ∫ j(2)·dV2/(4πε0c²·r12)

So it is, effectively, easier to first calculate the scalar and vector potential, and then get E and B from them. For dynamics, the expressions are similar:

Φ(1, t) = ∫ ρ(2, t′)·dV2/(4πε0·r12) and A(1, t) = ∫ j(2, t′)·dV2/(4πε0c²·r12)

Indeed, they are like the integrals for statics, but with “a small and physically appealing modification”, as Feynman notes: when doing the integrals, we must use the so-called retarded time t′ = t − r12/c. The illustration below shows how it works: the influences propagate from point (2) to point (1) at the speed c, so we must use the values of ρ and j at the time t′ = t − r12/c indeed!
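To get a feel for how small that delay usually is, here is a minimal Python sketch. The function name `retarded_time` and the one-metre example are my own illustrative assumptions; only the t′ = t − r12/c rule itself comes from the text.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)

def retarded_time(t, r12):
    """Return t' = t - r12/c: the moment at which the source at point (2)
    must be evaluated to get the potentials at point (1) at time t."""
    return t - r12 / C

# Hypothetical example: the source sits one metre away from the field point.
delay = 0.0 - retarded_time(0.0, 1.0)
print(f"delay over 1 m: {delay:.3e} s")
```

So for everyday distances the correction is a matter of nanoseconds, which is why the static integrals work so well when charges and currents change slowly.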

Retarded time

The second aspect of the answer to the question of why we’d be interested in Φ and A has to do with the topic I wanted to write about here: the concept of a gauge and a gauge transformation.

Gauges and gauge transformations in electromagnetics

Let’s see what we’re doing really. We calculate some A and then solve for B by writing: B = ∇×A. Now, I say some A because any A′ = A + ∇Ψ, with Ψ any scalar field really, gives the same B. Why? Because the curl of the gradient of Ψ – i.e. curl(gradΨ) = ∇×(∇Ψ) – is equal to 0. Hence, ∇×(A + ∇Ψ) = ∇×A + ∇×(∇Ψ) = ∇×A.

So we have B, and now we need E. So the next step is to take Faraday’s Law, which is Maxwell’s second equation: ∇×E = –∂B/∂t. Why this one? It’s a simple one, as it does not involve currents or charges. So we combine this equation and our B = ∇×A expression and write:

∇×E = –∂(∇×A)/∂t

Now, these operators are tricky but you can verify this can be re-written as:

∇×(E + ∂A/∂t) = 0

Looking carefully, we see this expression says that E + ∂A/∂t is some vector whose curl is equal to zero. Hence, this vector must be the gradient of something. When we worked on electrostatics, we only had E, not the ∂A/∂t bit, and we said that E tout court was the gradient of something, so we wrote E = −∇Φ. We now do the same thing for E + ∂A/∂t, so we write:

E + ∂A/∂t = −∇Φ

So we use the same symbol Φ but it’s a bit of a different animal, obviously. However, it’s easy to see that, if the ∂A/∂t term disappears (as it does in electrostatics, where nothing changes with time), we get our ‘old’ E = −∇Φ. Now, E + ∂A/∂t = −∇Φ can be written as:

E = −∇Φ – ∂A/∂t

So, what’s the big deal? We wrote B and E as a function of Φ and A. Well, we said we could replace A by any A′ = A + ∇Ψ but, obviously, such substitution would not yield the same E. To get the same E, we need some substitution rule for Φ as well. Now, you can verify we will get the same E if we replace Φ by Φ′ = Φ – ∂Ψ/∂t. You should check it by writing it all out:

E = −∇Φ′ – ∂A′/∂t = −∇(Φ – ∂Ψ/∂t) – ∂(A + ∇Ψ)/∂t

= −∇Φ + ∇(∂Ψ/∂t) – ∂A/∂t – ∂(∇Ψ)/∂t = −∇Φ – ∂A/∂t = E
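If you’d rather let a computer do the checking, the sketch below does it numerically with central finite differences, using only the Python standard library. The potentials Φ and A and the gauge function Ψ are arbitrary, made-up examples (they are not solutions of Maxwell’s equations for any particular source), and the helper names `partial` and `fields` are mine. The point is only that E = −∇Φ − ∂A/∂t and B = ∇×A come out the same before and after the transformation A′ = A + ∇Ψ, Φ′ = Φ − ∂Ψ/∂t.

```python
import math

H = 1e-4  # step for central finite differences

def partial(f, i, p):
    """Central-difference derivative of f(x, y, z, t) along coordinate i at point p."""
    lo, hi = list(p), list(p)
    lo[i] -= H
    hi[i] += H
    return (f(*hi) - f(*lo)) / (2 * H)

def fields(phi, A, p):
    """Return (E, B) at p from E = -grad(phi) - dA/dt and B = curl(A)."""
    E = [-partial(phi, i, p) - partial(A[i], 3, p) for i in range(3)]
    B = [partial(A[2], 1, p) - partial(A[1], 2, p),
         partial(A[0], 2, p) - partial(A[2], 0, p),
         partial(A[1], 0, p) - partial(A[0], 1, p)]
    return E, B

# Arbitrary (made-up) potentials, chosen only for illustration.
phi = lambda x, y, z, t: x * y * math.cos(t) + z
A = [lambda x, y, z, t: y * z * t,
     lambda x, y, z, t: math.sin(x) * t ** 2,
     lambda x, y, z, t: x * y + math.cos(z * t)]

# Arbitrary gauge function psi = sin(x)*y*t + z^2*t^2, derivatives written out by hand.
grad_psi = [lambda x, y, z, t: math.cos(x) * y * t,
            lambda x, y, z, t: math.sin(x) * t,
            lambda x, y, z, t: 2 * z * t ** 2]
dpsi_dt = lambda x, y, z, t: math.sin(x) * y + 2 * z ** 2 * t

# Gauge transformation: A' = A + grad(psi), phi' = phi - dpsi/dt.
A_new = [lambda x, y, z, t, a=a, g=g: a(x, y, z, t) + g(x, y, z, t)
         for a, g in zip(A, grad_psi)]
phi_new = lambda x, y, z, t: phi(x, y, z, t) - dpsi_dt(x, y, z, t)

p = (0.3, -0.7, 1.1, 0.5)  # an arbitrary point (x, y, z, t)
E_old, B_old = fields(phi, A, p)
E_new, B_new = fields(phi_new, A_new, p)
max_diff = max(abs(u - v) for u, v in zip(E_old + B_old, E_new + B_new))
print(f"largest change in any component of E or B: {max_diff:.2e}")
```

The printed difference is not exactly zero because of the finite-difference approximation, but it should be many orders of magnitude smaller than the field components themselves.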

Again, the operators are a bit tricky, but the +∇(∂Ψ/∂t) and –∂(∇Ψ)/∂t terms do cancel out. Where are we heading to? When everything is said and done, we do need to relate it all to the currents and the charges, because that’s the real stuff out there. So let’s take Maxwell’s ∇·E = ρ/ε0 equation, which has the charges in it, and let’s replace E by −∇Φ – ∂A/∂t. We get:

∇·(−∇Φ – ∂A/∂t) = ρ/ε0

That equation can be re-written as:

−∇²Φ – ∂(∇·A)/∂t = ρ/ε0

So we have one equation here relating Φ and A to the sources. We need another one, and we also need to separate Φ and A somehow. How do we do that?

Maxwell’s fourth equation, i.e. c²∇×B = j/ε0 + ∂E/∂t can, obviously, be written as c²∇×B − ∂E/∂t = j/ε0. Substituting both E and B yields the following monstrosity:

c²∇×(∇×A) − ∂(−∇Φ – ∂A/∂t)/∂t = j/ε0

We can now apply the general ∇×(∇×C) = ∇(∇·C) – ∇²C identity to the first term to get:

−c²∇²A + c²∇(∇·A) + ∂(∇Φ)/∂t + ∂²A/∂t² = j/ε0

It’s equally monstrous, obviously, but we can simplify the whole thing by choosing Φ and A in a clever way. For the magnetostatic case, we chose A such that ∇·A = 0. We could have chosen something else. Indeed, it’s not because B is divergence-free, that A has to be divergence-free too! For example, I’ll leave it to you to show that choosing ∇·A such that

∇·A = −c⁻²·∂Φ/∂t

also respects the general condition that any A and Φ we choose must respect, i.e. the A′ = A + ∇Ψ and Φ′ = Φ – ∂Ψ/∂t substitution rules. Now, if we choose ∇·A such that ∇·A = −c⁻²·∂Φ/∂t indeed, then the two middle terms in our monstrosity cancel out, and we’re left with a much simpler equation for A:

∇²A − c⁻²·∂²A/∂t² = −j/ε0c²

In addition, doing the substitution in our other equation relating Φ and A to the sources yields an equation for Φ that has the same form:

∇²Φ − c⁻²·∂²Φ/∂t² = −ρ/ε0

What’s the big deal here? Well… Let’s write it all out. The equation above becomes:

∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² − c⁻²·∂²Φ/∂t² = −ρ/ε0

That’s a wave equation in three dimensions. In case you wonder, just check one of my posts on wave equations. The one-dimensional equivalent for a wave propagating in the x direction at speed c (like a sound wave, for example) is ∂²Φ/∂x² = c⁻²·∂²Φ/∂t², indeed. The equation for A above yields similar wave equations for A’s components Ax, Ay, and Az.
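As a quick sanity check on the one-dimensional case, the sketch below takes a wave of the form Φ(x, t) = sin(k·(x − c·t)) and verifies numerically that ∂²Φ/∂x² − c⁻²·∂²Φ/∂t² vanishes. The values of c and k are toy numbers I picked (c is not the speed of light here), and the helper names are my own.

```python
import math

C = 2.0   # toy wave speed in arbitrary units (an assumption, not the speed of light)
K = 1.5   # toy wave number
H = 1e-3  # step for central finite differences

def phi(x, t):
    """A wave travelling in the +x direction: a function of (x - c*t) only."""
    return math.sin(K * (x - C * t))

def second_derivative(f, u):
    """Central-difference estimate of f''(u)."""
    return (f(u + H) - 2 * f(u) + f(u - H)) / H ** 2

def wave_residual(x, t):
    """Return d2(phi)/dx2 - (1/c^2) * d2(phi)/dt2, which should be (nearly) zero."""
    d2x = second_derivative(lambda u: phi(u, t), x)
    d2t = second_derivative(lambda u: phi(x, u), t)
    return d2x - d2t / C ** 2

print(wave_residual(0.4, 0.9))
```

Any other smooth function of (x − c·t) or (x + c·t) should pass the same test, which is exactly why we call it a wave equation.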

So, yes, it is a big deal. We’ve written Maxwell’s equations in terms of the scalar (Φ) and vector (A) potential and in a form that makes immediately apparent that we’re talking electromagnetic waves moving out at the speed c. Let me copy them again:

∇²Φ − c⁻²·∂²Φ/∂t² = −ρ/ε0
∇²A − c⁻²·∂²A/∂t² = −j/ε0c²

You may, of course, say that you’d rather have a wave equation for E and B, rather than for A and Φ. Well… That can be done. Feynman gives us two derivations that do so. The first derivation is extremely simple and assumes our electromagnetic wave moves in one direction only. The second derivation is much more complicated and gives an equation for E that, if you’ve read the first volume of Feynman’s Lectures, you’ll surely remember:

equation for E

The links are there, and so I’ll let you have fun with those Lectures yourself. I am finished here, indeed, in terms of what I wanted to do in this post, and that is to say a few words about gauges in field theory. It’s nothing much, really, and so we’ll surely have to discuss the topic again, but at least you now know what a gauge actually is in classical electromagnetic theory. Let’s quickly go over the concepts:

  1. Choosing the ∇·A is choosing a gauge, or a gauge potential (because we’re talking scalar and vector potential here). The particular choice is also referred to as gauge fixing.
  2. Changing A by adding ∇Ψ is called a gauge transformation, and the scalar function Ψ is referred to as a gauge function. The fact that we can add curl-free components to the magnetic potential without them making any difference is referred to as gauge invariance.
  3. Finally, the ∇·A = −c⁻²·∂Φ/∂t gauge is referred to as a Lorentz gauge.

Just to make sure you understand: why is that Lorentz gauge so special? Well… Look at the whole argument once more: isn’t it amazing we get such beautiful (wave) equations if we stick it in? Also look at the functional shape of the gauge itself: it looks like a wave equation itself! […] Well… No… It doesn’t. I am a bit too enthusiastic here. We do have the same 1/c2 and a time derivative, but it’s not a wave equation. :-) In any case, it all confirms, once again, that physics is all about beautiful mathematical structures. But, again, it’s not math only. There’s something real out there. In this case, that ‘something’ is a traveling electromagnetic field. :-)

But why do we call it a gauge? That should be equally obvious. It’s really like choosing a gauge in another context, such as measuring the pressure of a tyre, as shown below. :-)


Gauges and group theory

You’ll usually see gauges mentioned with some reference to group theory. For example, you will see or hear phrases like: “The existence of arbitrary numbers of gauge functions ψ(r, t) corresponds to the U(1) gauge freedom of the electromagnetic theory.” The U(1) notation stands for a unitary group of degree n = 1. It is also known as the circle group. Let me copy the introduction to the unitary group from the Wikipedia article on it:

In mathematics, the unitary group of degree n, denoted U(n), is the group of n × n unitary matrices, with the group operation that of matrix multiplication. The unitary group is a subgroup of the general linear group GL(n, C). In the simple case n = 1, the group U(1) corresponds to the circle group, consisting of all complex numbers with absolute value 1 under multiplication. All the unitary groups contain copies of this group.

The unitary group U(n) is a real Lie group of dimension n². The Lie algebra of U(n) consists of n × n skew-Hermitian matrices, with the Lie bracket given by the commutator. The general unitary group (also called the group of unitary similitudes) consists of all matrices A such that A*A is a nonzero multiple of the identity matrix, and is just the product of the unitary group with the group of all positive multiples of the identity matrix.

Phew! Does this make you any wiser? If anything, it makes me realize I’ve still got a long way to go. :-) The Wikipedia article on gauge fixing notes something that’s more interesting (if only because I more or less understand what it says):

Although classical electromagnetism is now often spoken of as a gauge theory, it was not originally conceived in these terms. The motion of a classical point charge is affected only by the electric and magnetic field strengths at that point, and the potentials can be treated as a mere mathematical device for simplifying some proofs and calculations. Not until the advent of quantum field theory could it be said that the potentials themselves are part of the physical configuration of a system. The earliest consequence to be accurately predicted and experimentally verified was the Aharonov–Bohm effect, which has no classical counterpart.

This confirms, once again, that the fields are real. In fact, what this says is that the potentials are real: they have a meaningful physical interpretation. I’ll leave it to you to explore that Aharonov–Bohm effect. In the meanwhile, I’ll study what Feynman writes on potentials and all that as used in quantum physics. It will probably take a while before I’ll get into group theory though. :-/

Indeed, it’s probably best to study physics at a somewhat less abstract level first, before getting into the more sophisticated stuff.


The blackbody radiation problem revisited: quantum statistics

The equipartition theorem – which states that the energy levels of the modes of any (linear) system, in classical as well as in quantum physics, are always equally spaced – is deep and fundamental in physics. In my previous post, I presented this theorem in a very general and non-technical way: I did not use any exponentials, complex numbers or integrals. Just simple arithmetic. Let’s go a little bit beyond now, and use it to analyze that blackbody radiation problem which bothered 19th century physicists, and which led Planck to ‘discover’ quantum physics. [Note that, once again, I won’t use any complex numbers or integrals in this post, so my kids should actually be able to read through it.]

Before we start, let’s quickly introduce the model again. What are we talking about? What’s the black box? The idea is that we add heat to atoms (or molecules) in a gas. The heat results in the atoms acquiring kinetic energy, and the kinetic theory of gases tells us that the mean value of the kinetic energy for each independent direction of motion will be equal to kT/2. The blackbody radiation model analyzes the atoms (or molecules) in a gas as atomic oscillators. Oscillators have both kinetic as well as potential energy and, on average, the kinetic and potential energy is the same. Hence, the energy in the oscillation is twice the kinetic energy, so its average energy is 〈E〉 = 2·kT/2 = kT. However, oscillating atoms implies oscillating electric charges. Now, electric charges going up and down radiate light and, hence, as light is emitted, energy flows away.

How exactly? It doesn’t matter. It is worth noting that 19th century physicists had no idea about the inner structure of an atom. In fact, at that time, the term electron had not yet been invented: the first atomic model involving electrons was the so-called plum pudding model, which J.J. Thomson advanced in 1904, and he called electrons “negative corpuscles”. And the Rutherford-Bohr model, which is the first model one can actually use to explain how and why excited atoms radiate light, came in 1913 only, so that’s long after Planck’s solution for the blackbody radiation problem, which he presented to the scientific community in December 1900. It’s really true: it doesn’t matter. We don’t need to know about the specifics. The general idea is all that matters. As Feynman puts it: it’s how “A hot stove cools on a cold night, by radiating the light into the sky, because the atoms are jiggling their charge and they continually radiate, and slowly, because of this radiation, the jiggling motion slows down.” :-)

His subsequent description of the black box is equally simple: “If we enclose the whole thing in a box so that the light does not go away to infinity, then we can eventually get thermal equilibrium. We may either put the gas in a box where we can say that there are other radiators in the box walls sending light back or, to take a nicer example, we may suppose the box has mirror walls. It is easier to think about that case. Thus we assume that all the radiation that goes out from the oscillator keeps running around in the box. Then, of course, it is true that the oscillator starts to radiate, but pretty soon it can maintain its kT of energy in spite of the fact that it is radiating, because it is being illuminated, we may say, by its own light reflected from the walls of the box. That is, after a while there is a great deal of light rushing around in the box, and although the oscillator is radiating some, the light comes back and returns some of the energy that was radiated.”

So… That’s the model. Don’t you just love the simplicity of the narrative here? :-) Feynman then derives Rayleigh’s Law, which gives us the frequency spectrum of blackbody radiation as predicted by classical theory, i.e. the intensity (I) of the light as a function of (a) its (angular) frequency (ω) and (b) the average energy of the oscillators, which is nothing but the temperature of the gas (Boltzmann’s constant k is just what it is: a proportionality constant which makes the units come out alright). The other stuff in the formula, given hereunder, are just more constants (and, yes, the c is the speed of light!). The grand result is:

I(ω) = ω²·kT/(π²c²)

The formula looks formidable but the function is actually very simple: it’s quadratic in ω and linear in 〈E〉 = kT. The rest is just a bunch of constants which ensure all of the units we use to measure stuff come out alright. As you may suspect, the derivation of the formula is not so simple as the narrative of the black box model, and so I won’t copy it here (you can check yourself). Indeed, let’s focus on the results, not on the technicalities. Let’s have a look at the graph.

Rayleigh's law graph

The I(ω) graphs for T = T0 and T = 2T0 are given by the solid black curves. They tell us how much light we should have at different frequencies. They just go up and up and up, so Rayleigh’s Law implies that, when we open our stove – and, yes, I know, some kids don’t know what a stove is :-/ – and take a look, we should burn our eyes from x-rays. We know that’s not the case, in reality, so our theory must be wrong. An even bigger problem is that the curve implies that the total energy in the box, i.e. the total of all this intensity summed up over all frequencies, is infinite: we’ve got an infinite curve here indeed, and so an infinite area under it. Therefore, as Feynman puts it: “Rayleigh’s Law is fundamentally, powerfully, and absolutely wrong.” The actual graphs, indeed, are the dashed curves. I’ll come back to them.

The blackbody radiation problem is history, of course. So it’s no longer a problem. Let’s see how the equipartition theorem solved it. We assume our oscillators can only take on equally spaced energy levels, with the space between them equal to h·f = ħ·ω. The frequency f (or ω = 2π·f) is the fundamental frequency of our oscillator, and you know h and ħ = h/2π, of course: Planck’s constant. Hence, the various energy levels are given by the following formula: En = n·ħ·ω = n·h·f. The first five are depicted below.

energy levels

Next to the energy levels, we write the probability of an oscillator occupying that energy level, which is given by Boltzmann’s Law. I wrote about Boltzmann’s Law in another post too, so I won’t repeat myself here, except for noting that Boltzmann’s Law says that the probabilities of different conditions of energy are given by e^(−energy/kT) = 1/e^(energy/kT). Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. Here we have a probability Pn as a function of the energy En = n·ħ·ω, so we write: Pn = A·e^(−En/kT) = A·e^(−n·ħ·ω/kT). [Note that P0 is equal to A, as a consequence.]

Now, we need to determine how many oscillators we have in each of the various energy states, so that’s N0, N1, N2, etcetera. We’ve done that before: N1/N0 = P1/P0 = (A·e^(−ħω/kT))/A = e^(−ħω/kT). Hence, N1 = N0·e^(−ħω/kT). Likewise, it’s not difficult to see that N2 = N0·e^(−2ħω/kT) or, more in general, that Nn = N0·e^(−nħω/kT) = N0·[e^(−ħω/kT)]ⁿ. To make the calculations somewhat easier, Feynman temporarily writes x for e^(−ħω/kT). Hence, we write: N1 = N0·x, N2 = N0·x²,…, Nn = N0·xⁿ, and the total number of oscillators is obviously Ntot = N0+N1+…+Nn+… = N0·(1+x+x²+…+xⁿ+…).

What about their energy? The energy of all oscillators in state 0 is, obviously, zero. The energy of all oscillators in state 1 is N1·ħω = ħω·N0·x. Adding it all up for state 2 yields N2·2·ħω = 2·ħω·N0·x². More generally, the energy of all oscillators in state n is equal to Nn·n·ħω = n·ħω·N0·xⁿ. So now we can write the total energy of the whole system as Etot = E0+E1+…+En+… = 0+ħω·N0·x+2·ħω·N0·x²+…+n·ħω·N0·xⁿ+… = ħω·N0·(x+2x²+…+nxⁿ+…). The average energy of one oscillator, for the whole system, is therefore:

〈E〉 = Etot/Ntot = ħω·(x+2x²+…+nxⁿ+…)/(1+x+x²+…+xⁿ+…)

Now, Feynman leaves the exercise of simplifying that expression to the reader and just says it’s equal to:

〈E〉 = ħω/(e^(ħω/kT) – 1)
I should try to figure out how he does that. It’s something like Horner’s rule but that’s not easy with infinite polynomials. Or perhaps it’s just some clever way of factoring both polynomials. I didn’t break my head over it but just checked if the result is correct. [I don’t think Feynman would dare to joke here, but one could never be sure with him it seems. :-)] Note that x stands for e^(−ħω/kT), not e^(+ħω/kT), so there is a minus sign there, which we don’t have in the formula above. Hence, the denominator is e^(ħω/kT) – 1 = (1/x) – 1 = (1–x)/x, and 1/(e^(ħω/kT) – 1) = x/(1–x). Now, if (x+2x²+…+nxⁿ+…)/(1+x+x²+…+xⁿ+…) = x/(1–x), then (x+2x²+…+nxⁿ+…)·(1–x) must be equal to x·(1+x+x²+…+xⁿ+…). Just write it out: (x+2x²+…+nxⁿ+…)·(1–x) = x+2x²+…+nxⁿ+… − x²−2x³−…−nxⁿ⁺¹−… = x+x²+…+xⁿ+… Likewise, we get x·(1+x+x²+…+xⁿ+…) = x+x²+…+xⁿ+… So, yes, done.
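For what it’s worth, one standard route to the closed form uses the geometric series and its derivative. I am not claiming this is how Feynman did it; it is just a derivation that works, assuming 0 < x < 1 (which is always the case here, because x is e to some negative power):

```latex
\begin{align}
1 + x + x^2 + \dots &= \frac{1}{1-x}, \qquad 0 < x < 1 \\
x + 2x^2 + 3x^3 + \dots &= x\,\frac{d}{dx}\left(\frac{1}{1-x}\right) = \frac{x}{(1-x)^2} \\
\langle E \rangle &= \hbar\omega\,\frac{x/(1-x)^2}{1/(1-x)}
  = \hbar\omega\,\frac{x}{1-x}
  = \frac{\hbar\omega}{e^{\hbar\omega/kT} - 1},
  \qquad x = e^{-\hbar\omega/kT}
\end{align}
```

The second line follows from differentiating the first line term by term and multiplying by x, which is allowed inside the interval where the series converges.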

Now comes the Big Trick, the rabbit out of the hat, so to speak. :-) We’re going to substitute the classical expression for 〈E〉 (i.e. kT) in Rayleigh’s Law for its quantum-mechanical equivalent (i.e. 〈E〉 = ħω/[e^(ħω/kT) – 1]).

What’s the logic behind? Rayleigh’s Law gave the intensity for the various frequencies that are present as a function of (a) the frequency (of course!) and (b) the average energy of the oscillators, which is kT according to classical theory. Now, our assumption that an oscillator cannot take on just any energy value but that the energy levels are equally spaced, combined with Boltzmann’s Law, gives us a very different formula for the average energy: it’s a function of the temperature, but it’s a function of the fundamental frequency too! I copied the graph below from the Wikipedia article on the equipartition theorem. The black line is the classical value for the average energy as a function of the thermal energy. As you can see, it’s one and the same thing, really (look at the scales: they happen to be both logarithmic but that’s just to make them more ‘readable’). Its quantum-mechanical equivalent is the red curve. At higher temperatures, the two agree nearly perfectly, but at low temperatures (with low being defined as the range where kT << ħ·ω, written as h·ν in the graph), the quantum mechanical value decreases much more rapidly. [Note the energy is measured in units equivalent to h·ν: that’s a nice way to sort of ‘normalize’ things so as to compare them.]
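The comparison between the classical and the quantum-mechanical average energy is easy to reproduce yourself. The sketch below computes 〈E〉 twice, once from the (truncated) sums over the energy levels and once from the closed formula ħω/(e^(ħω/kT) − 1), and prints both next to the classical value kT. All energies are measured in units of ħω, and the function names and the truncation at 2000 levels are my own choices.

```python
import math

def average_energy_series(hw, kT, n_max=2000):
    """<E> from the truncated sums hw * (x + 2x^2 + ...) / (1 + x + x^2 + ...),
    with x = exp(-hw/kT) and hw standing for the level spacing h-bar*omega."""
    x = math.exp(-hw / kT)
    numerator = sum(n * x ** n for n in range(n_max))
    denominator = sum(x ** n for n in range(n_max))
    return hw * numerator / denominator

def average_energy_closed(hw, kT):
    """<E> from the closed form hw / (exp(hw/kT) - 1)."""
    return hw / math.expm1(hw / kT)

hw = 1.0  # measure all energies in units of h-bar*omega
for kT in (0.1, 1.0, 10.0, 100.0):
    print(kT, average_energy_series(hw, kT), average_energy_closed(hw, kT))
```

For kT much larger than ħω the quantum value should come out very close to kT, while for kT much smaller than ħω it should drop far below it.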


So, without further ado, let’s take Rayleigh’s Law again and just substitute kT (i.e. the classical formula for the average energy) for the ‘quantum-mechanical’ formula for 〈E〉, i.e. ħω/[eħω/kT–1]. Adding the dω factor to emphasize we’re talking some continuous distribution here, we get the even grander result (Feynman calls it the first quantum-mechanical formula ever known or discussed):

I(ω)·dω = ħω³·dω/[π²c²·(e^(ħω/kT) – 1)]

So this function is the dashed I(ω) curve (I copied the graph below again): this curve does not ‘blow up’. The math behind the curve is the following: even for large ω, leading that ω³ factor in the numerator to ‘blow up’, we also have Euler’s number being raised to a tremendous power in the denominator. Therefore, the curves come down again, and so we don’t get those incredible amounts of UV light and x-rays.
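To see the two curves part ways, here is a small Python sketch. It assumes the two laws have the form given in Feynman’s Lectures, i.e. I(ω) = ω²·kT/(π²c²) for Rayleigh and the same expression with kT replaced by ħω/(e^(ħω/kT) − 1) for Planck, and it uses toy units in which ħ = c = 1. The function names are mine.

```python
import math

def rayleigh(omega, kT):
    """Classical intensity omega^2 * kT / (pi^2 * c^2), in units where c = 1."""
    return omega ** 2 * kT / math.pi ** 2

def planck(omega, kT):
    """Same formula with kT replaced by h-bar*omega / (exp(h-bar*omega/kT) - 1),
    in units where h-bar = c = 1."""
    return omega ** 2 * (omega / math.expm1(omega / kT)) / math.pi ** 2

kT = 1.0
for omega in (0.01, 0.1, 1.0, 5.0, 20.0):
    print(f"omega={omega:6.2f}  Rayleigh={rayleigh(omega, kT):10.4e}  "
          f"Planck={planck(omega, kT):10.4e}")
```

At low frequencies the two columns should be nearly identical; at high frequencies the Rayleigh column keeps growing while the Planck column collapses.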

Rayleigh's law graph

So… That’s how Max Planck solved the problem and how he became the ‘reluctant father of quantum mechanics.’ The formula is not as simple as Rayleigh’s Law (we have a cubic function in the numerator, and an exponential in the denominator), but its advantage is that it’s correct. Indeed, when everything is said and done, we do want our formulas to describe something real, don’t we? :-)

Let me conclude by looking at that ‘quantum-mechanical’ formula for the average energy once more:

〈E〉 = ħω/[e^(ħω/kT) – 1]

It’s not a distribution function (the formula for I(ω) is the distribution function), but the –1 term in the denominator does tell us already we’re talking Bose-Einstein statistics. In my post on quantum statistics, I compared the three distribution functions. Let’s quickly look at them again:

  • Maxwell-Boltzmann (for classical particles): f(E) = 1/[A·e^(E/kT)]
  • Fermi-Dirac (for fermions): f(E) = 1/[A·e^(E/kT) + 1]
  • Bose-Einstein (for bosons): f(E) = 1/[A·e^(E/kT) − 1]

So here we simply substitute ħω for E, which makes sense, as the Planck-Einstein relation tells us that the energy of the particles involved is, indeed, equal to E = ħω . Below, you’ll find the graph of these three functions, first as a function of E, so that’s f(E), and then as a function of T, so that’s f(T) (or f(kT) if you want).

graph energy graph temperature

The first graph, for which E is the variable, is the more usual one. As for the interpretation, you can see what’s going on: bosonic particles (or bosons, I should say) will crowd the lower energy levels (the associated probabilities are much higher indeed), while for fermions, it’s the opposite: they don’t want to crowd together and, hence, the associated probabilities are much lower. So fermions will spread themselves over the various energy levels. The distribution for ‘classical’ particles is somewhere in the middle.

In that post of mine, I gave an actual example involving nine particles and the various patterns that are possible, so you can have a look there. Here I just want to note that the math behind is easy to understand when dropping the A (that’s just another normalization constant anyway) and re-writing the formulas as follows:

  • Maxwell-Boltzmann (for classical particles): f(E) = e^(−E/kT)
  • Fermi-Dirac (for fermions): f(E) = e^(−E/kT)/[1 + e^(−E/kT)]
  • Bose-Einstein (for bosons): f(E) = e^(−E/kT)/[1 − e^(−E/kT)]

Just use Feynman’s substitution x = e^(−ħω/kT): the Bose-Einstein distribution then becomes 1/[1/x–1] = 1/[(1–x)/x] = x/(1–x). Now it’s easy to see that the denominator of the formula of both the Fermi-Dirac as well as the Bose-Einstein distribution will approach 1 (i.e. the ‘denominator’ of the Maxwell-Boltzmann formula) if e^(−E/kT) approaches zero, so that’s when E becomes larger and larger. Hence, for higher energy levels, the probability densities of the three functions approach each other indeed, as they should.
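You can check that convergence with a few lines of Python. The sketch below evaluates the three functions with the normalization constant A set to 1 (that is an assumption made only to keep things simple) for a few values of E/kT. The function names are mine.

```python
import math

def maxwell_boltzmann(E, kT):
    """f(E) = exp(-E/kT), with the constant A set to 1."""
    return math.exp(-E / kT)

def fermi_dirac(E, kT):
    """f(E) = 1 / (exp(E/kT) + 1), with the constant A set to 1."""
    return 1.0 / (math.exp(E / kT) + 1.0)

def bose_einstein(E, kT):
    """f(E) = 1 / (exp(E/kT) - 1), with the constant A set to 1."""
    return 1.0 / math.expm1(E / kT)

kT = 1.0
for E in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"E/kT={E:5.1f}  FD={fermi_dirac(E, kT):.6f}  "
          f"MB={maxwell_boltzmann(E, kT):.6f}  BE={bose_einstein(E, kT):.6f}")
```

At every energy the Fermi-Dirac value should sit below the Maxwell-Boltzmann one and the Bose-Einstein value above it, with the gap closing as E/kT grows.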

Now what’s the second graph about? Here we’re looking at one energy level only, but we let the temperature vary from 0 to infinity. The graph says that, at low temperature, the probabilities will also be more or less the same, and the three distributions only differ at higher temperatures. That makes sense too, of course!

Well… That says it all, I guess. I hope you enjoyed this post. As I’ve sort of concluded Volume I of Feynman’s Lectures with this, I’ll be silent for a while… […] Or so I think. :-)


Strings in classical and quantum physics

This post is not about string theory. The goal of this post is much more limited: it’s to give you a better understanding of why the metaphor of the string is so appealing. Let’s recapitulate the basics by seeing how it’s used in classical as well as in quantum physics.

In my posts on music and math, or music and physics, I described how a simple single string always vibrates in various modes at the same time: every tone is a mixture of an infinite number of elementary waves. These elementary waves, which are referred to as harmonics (or as (normal) modes, indeed) are perfectly sinusoidal, and their amplitude determines their relative contribution to the composite waveform. So we can always write the waveform F(t) as the following sum:

F(t) = a1sin(ωt) + a2sin(2ωt) + a3sin(3ωt) + … + ansin(nωt) + …

[If this is your first reading of my post, and the formula scares you away, please try again. I am writing most of my posts with teenage kids in mind, and especially this one. So I will not use anything else than simple arithmetic in this post: no integrals, no complex numbers, no logarithms. Just a bit of geometry. That’s all. So, yes, you should go through the trouble of trying to understand this formula. The only thing that you may have some trouble with is ω, i.e. angular frequency: it’s the frequency expressed in radians per time unit, rather than oscillations per second, so ω = 2π·f = 2π/T, with f the frequency as you know it (i.e. oscillations per second) and T the period of the wave.]
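If you like to play with numbers, the sum of harmonics above is easy to compute. The sketch below adds up a finite number of sine waves; the fundamental frequency of 110 Hz and the three amplitudes are made-up values, and the function name `waveform` is mine.

```python
import math

def waveform(t, amplitudes, omega):
    """F(t) = a1*sin(omega*t) + a2*sin(2*omega*t) + ... for a finite list of amplitudes."""
    return sum(a * math.sin(n * omega * t)
               for n, a in enumerate(amplitudes, start=1))

f = 110.0                      # hypothetical fundamental frequency in Hz
omega = 2 * math.pi * f        # angular frequency in radians per second
amplitudes = [1.0, 0.5, 0.25]  # made-up amplitudes for the first three harmonics
period = 1 / f                 # T = 1/f = 2*pi/omega

# Sample one full period in eight steps.
samples = [waveform(k * period / 8, amplitudes, omega) for k in range(9)]
print([round(s, 3) for s in samples])
```

Whatever amplitudes you choose, the composite waveform repeats itself after one period T of the fundamental, because every harmonic fits a whole number of times into that period.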

I also noted that the wavelength of these component waves (λ) is determined by the length of the string (L), and by its length only: λ1 = 2L, λ2 = L, λ3 = (2/3)·L. So these wavelengths do not depend on the material of the string, or its tension. At any point in time (so keeping t constant, rather than x, as we did in the equation above), the component waves look like this:


etcetera (1/8, 1/9,…,1/n,… 1/∞)

That the wavelengths of the harmonics of any actual string only depend on its length is an amazing result in light of the complexities behind: a simple wound guitar string, for example, is not simple at all (just click the link here for a quick introduction to guitar string construction). Simple piano wire isn’t simple either: it’s made of high-carbon steel, i.e. a very complex metallic alloy. In fact, you should never think any material is simple: even the simplest molecular structures are very complicated things. Hence, it’s quite amazing all these systems are actually linear systems and that, despite the underlying complexity, those wavelength ratios form a simple harmonic series, i.e. a simple reciprocal function y = 1/x, as illustrated below.


A simple harmonic series? Hmm… I can’t resist noting that the harmonic series is, in fact, a mathematical beast. While its terms approach zero as x (or n) increases, the series itself is divergent. So it’s not like 1+1/2+1/4+1/8+…+1/2ⁿ+…, which adds up to 2. Divergent series don’t add up to any specific number. Even Leonhard Euler – the most famous mathematician of all times, perhaps – struggled with this. In fact, as late as in 1826, another famous mathematician, Niels Henrik Abel (in light of the fact he died at age 26 (!), his legacy is truly amazing), exclaimed that a series like this was “an invention of the devil”, and that it should not be used in any mathematical proof. But then God intervened through Abel’s contemporary Augustin-Louis Cauchy :-) who finally cracked the nut by rigorously defining the mathematical concept of both convergent as well as divergent series, and equally rigorously determining their possibilities and limits in mathematical proofs. In fact, while medieval mathematicians had already grasped the essentials of modern calculus and, hence, had already given some kind of solution to Zeno’s paradox of motion, Cauchy’s work is the full and final solution to it. But I am getting distracted, so let me get back to the main story.

More remarkable than the wavelength series itself, is its implication for the respective energy levels of all these modes. The material of the string, its diameter, its tension, etc. will determine the speed with which the wave travels up and down the string. [Yes, that’s what it does: you may think the string oscillates up and down, and it does, but the waveform itself travels along the string. In fact, as I explained in my previous post, we’ve got two waves traveling simultaneously: one going one way and the other going the other.] For a specific string, that speed (i.e. the wave velocity) is some constant, which we’ll denote by c. Now, c is, obviously, the product of the wavelength (i.e. the distance that the wave travels during one oscillation) and its frequency (i.e. the number of oscillations per time unit), so c = λ·f. Hence, f = c/λ and, therefore, f1 = (1/2)·c/L, f2 = (2/2)·c/L, f3 = (3/2)·c/L, etcetera. More in general, we write fn = (n/2)·c/L. In short, the frequencies are equally spaced. To be precise, they are all (1/2)·c/L apart.
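Here is the same arithmetic as a short Python sketch. The string length (0.65 m) and the wave speed (143 m/s) are hypothetical values I picked so that the fundamental comes out at 110 Hz; the function name `harmonics` is mine.

```python
def harmonics(c, L, n_max):
    """Return (n, wavelength, frequency) for the first n_max modes of a string of
    length L on which waves travel at speed c: wavelength = 2L/n, frequency = c/wavelength."""
    modes = []
    for n in range(1, n_max + 1):
        wavelength = 2 * L / n
        modes.append((n, wavelength, c / wavelength))
    return modes

# Hypothetical string: 0.65 m long, wave speed 143 m/s.
modes = harmonics(143.0, 0.65, 5)
for n, wavelength, frequency in modes:
    print(n, round(wavelength, 4), round(frequency, 1))

# The spacing between successive frequencies is always c/(2L).
spacings = [modes[i + 1][2] - modes[i][2] for i in range(len(modes) - 1)]
print([round(s, 6) for s in spacings])
```

Changing c (i.e. the material or the tension) rescales all frequencies by the same factor, but it does not touch the wavelengths, which depend on L only.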

Now, the energy of a wave is directly proportional to its frequency, always, in classical as well as in quantum mechanics. For example, for photons, we have the Planck-Einstein relation: E = h·f = ħ·ω. So that relation states that the energy is proportional to the (light) frequency of the photon, with h (i.e. the Planck constant) as the constant of proportionality. [Note that ħ is not some different constant. It’s just the ‘angular equivalent’ of h, so we have to use ħ = h/2π when frequencies are expressed in angular frequency, i.e. radians per second rather than hertz.] Because of that proportionality, the energy levels of our simple string are also equally spaced and, hence, inserting another proportionality constant, which I’ll denote by a instead of h (because it’s some other constant, obviously), we can write:

En = a·fn = (n/2)·a·c/L

Now, if we denote the fundamental frequency f1 = (1/2)·c/L, quite simply, by f (and, likewise, its angular frequency as ω), then we can re-write this as:

En = n·a·f = n·ā·ω (ā = a/2π)

This formula is exactly the same as the formula used in quantum mechanics when describing atoms as atomic oscillators, and why and how they radiate light (think of the blackbody radiation problem, for example), as illustrated below: En = n·ħ·ω = n·h·f. The only difference between the formulas is the proportionality constant: instead of a, we have Planck’s constant here: h, or ħ when the frequency is expressed as an angular frequency.

[Illustration: quantum energy levels]

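This grand result – that the energy levels associated with the various states or modes of a system are equally spaced – is what I refer to as the equipartition theorem in this post (strictly speaking, that name belongs to a different result in statistical mechanics, on how thermal energy is shared equally among degrees of freedom), and it is what connects classical and quantum physics in a very deep and fundamental way.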

In fact, because they’re nothing but proportionality constants, the value of both a and h depends on our units. If we’d use the so-called natural units, i.e. equating ħ to 1, the energy formula becomes En = n·ω, and, hence, our unit of energy and our unit of frequency become one and the same. In fact, we can, of course, also re-define our time unit such that the fundamental frequency ω is one, i.e. one oscillation per (re-defined) time unit, so then we have the following remarkable formula:

En = n

Just think about it for a moment: what I am writing here is E0 = 0, E1 = 1, E2 = 2, E3 = 3, E4 = 4, etcetera. Isn’t that amazing? I am describing the structure of a system here – be it an atom emitting or absorbing photons, or a macro-thing like a guitar string – in terms of its basic components (i.e. its modes), and it’s as simple as counting: 0, 1, 2, 3, 4, etc.

You may think I am not describing anything real here, but I am. We cannot do whatever we wanna do: some stuff is grounded in reality, and in reality only – not in the math. Indeed, the fundamental frequency of our guitar string – which we used as our energy unit – is a property of the string, so that’s real: it’s not just some mathematical shape. It depends on the string’s length (which determines its wavelength), and it also depends on the propagation speed of the wave, which depends on other basic properties of the string, such as its material, its diameter, and its tension. Likewise, the fundamental frequency of our atomic oscillator is a property of the atomic oscillator or, to use a much grander term, a property of the Universe. That’s why h is a fundamental physical constant. So it’s not like π or e. [When reading physics as a freshman, it’s always useful to clearly distinguish physical constants (like Avogadro’s number, for example) from mathematical constants (like Euler’s number).]

The theme that emerges here is what I’ve been saying a couple of times already: it’s all about structure, and the structure is amazingly simple. It’s really that equipartition theorem only: all you need to know is that the energy levels of the modes of a system – any system really: an atom, a molecular system, a string, or the Universe itself – are equally spaced, and that the space between the various energy levels depends on the fundamental frequency of the system. Moreover, if we use natural units, and also re-define our time unit so the fundamental frequency is equal to 1 (so the frequencies of the other modes are 2, 3, 4 etc), then the energy levels are just 0, 1, 2, 3, 4 etc. So, yes, God kept things extremely simple. :-)

In order to not cause too much confusion, I should add that you should read what I am writing very carefully: I am talking about the modes of a system. The system itself can have any energy level, of course, so there is no discreteness at the level of the system. I am not saying that we don’t have a continuum there. We do. What I am saying is that its energy level can always be written as a (potentially infinite) sum of the energies of its components, i.e. its fundamental modes, and those energy levels are discrete. In quantum-mechanical systems, their spacing is h·f, so that’s the product of Planck’s constant and the fundamental frequency. For our guitar, the spacing is a·f (or, using angular frequency, ā·ω: it’s the same amount). But that’s it really. That’s the structure of the Universe. :-)

Let me conclude by saying something more about a. What information does it capture? Well… All of the specificities of the string (like its material or its tension) determine the fundamental frequency f and, hence, the energy levels of the basic modes of our string. So a has nothing to do with the particularities of our string, of our system in general. However, we can, of course, pluck our string very softly or, conversely, give it a big jolt. So our a coefficient is not related to the string as such, but to the total energy of our string. In other words, a is related to those amplitudes a1, a2, etc. in our F(t) = a1·sin(ωt) + a2·sin(2ωt) + a3·sin(3ωt) + … + an·sin(nωt) + … wave equation.

How exactly? Well… Based on the fact that the total energy of our wave is equal to the sum of the energies of all of its components, I could give you some formula. However, that formula does use an integral. It’s an easy integral: energy is proportional to the square of the amplitude, and so we’re integrating the square of the wave function over the length of the string. But then I said I would not have any integral in this post, and so I’ll stick to that. In any case, even without the formula, you know enough now. For example, one of the things you should be able to reflect on is the relation between a and h. It’s got to do with structure, of course. :-) But I’ll let you think about that yourself.

[…] Let me help you. Think of the meaning of Planck’s constant h. Let’s suppose we’d have some elementary ‘wavicle’, like that elementary ‘string’ that string theorists are trying to define: the smallest ‘thing’ possible. It would have some energy, i.e. some frequency. Perhaps it’s just one full oscillation. Just enough to define some wavelength and, hence, some frequency indeed. Then that thing would define the smallest time unit that makes sense: it would be the time corresponding to one oscillation. In turn, because of the E = h·f relation, it would define the smallest energy unit that makes sense. So, yes, h is the quantum (or fundamental unit) of energy. It’s very small indeed (h = 6.626070040(81)×10^−34 J·s, so the first significant digit appears only after 33 zeroes behind the decimal point) but that’s because we’re living at the macro-scale and, hence, we’re measuring stuff in huge units: the joule (J) for energy, and the second (s) for time. In natural units, h would be one. [To be precise, physicists prefer to equate ħ, rather than h, to one when talking natural units. That’s because angular frequency is more ‘natural’ as well when discussing oscillations.]

What’s the conclusion? Well… Our a will be some integer multiple of h. Some incredibly large multiple, of course, but a multiple nevertheless. :-)

Post scriptum: I didn’t say anything about strings in this post or, let me qualify, about those elementary ‘strings’ that string theorists try to define. Do they exist? Feynman was quite skeptical about it. He was happy with the so-called Standard Model of physics, and he would have been very happy to know that the existence of the Higgs field has been confirmed experimentally (that discovery is what prompted my blog!), because that confirms the Standard Model. The Standard Model distinguishes two types of wavicles: fermions and bosons. Fermions are matter particles, such as quarks and electrons. Bosons are force carriers, like photons and gluons. I don’t know anything about string theory, but my gut instinct tells me there must be more than just one mathematical description of reality. It’s the principle of duality: concepts, theorems or mathematical structures can be translated into other concepts, theorems or structures. But… Well… We’re not talking equivalent descriptions here: string theory is a different theory, it seems. For a brief but totally incomprehensible overview (for novices at least), click on the following link, provided by the C.N. Yang Institute for Theoretical Physics. If anything, it shows I’ve got a lot more to study as I am inching forward on the difficult Road to Reality. :-)

Strings in classical and quantum physics

Modes in classical and in quantum physics


Waves are peculiar: there is one single waveform, i.e. one motion only, but that motion can always be analyzed as the sum of the motions of all the different wave modes, combined with the appropriate amplitudes and phases. Saying the same thing using different words: we can always analyze the wave function as the sum of a (possibly infinite) number of components, i.e. a so-called Fourier series:

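[Image: Fourier series, f(t) = a0 + a1·cos(ωt) + b1·sin(ωt) + a2·cos(2ωt) + b2·sin(2ωt) + … = a0 + Σ [an·cos(n·ωt) + bn·sin(n·ωt)], summed over n = 1, 2, 3, …]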

The f(t) function can be any wave, but the simple examples in physics textbooks usually involve a string or, in two dimensions, some vibrating membrane, and I’ll stick to those examples too in this post. Feynman calls the Fourier components harmonic functions, or harmonics tout court, but the term ‘harmonic’ refers to so many different things in math that it may be better not to use it in this context. The component waves are sinusoidal functions, so sinusoidals might be a better term but it’s not in use, because a more general analysis will use complex exponentials, rather than sines and/or cosines. Complex exponentials (e.g. 10^{ix}) are periodic functions too, so they are totally unlike real exponential functions (e.g. 10^x). Hence, Feynman also uses the term ‘exponentials’. At some point, he also writes that the pattern of motion (of a mode) varies ‘exponentially’ but, of course, he’s thinking of complex exponentials, and, therefore, we should substitute ‘sinusoidally’ for ‘exponentially’ when talking real-valued wave functions.

[…] I know. I am already getting into the weeds here. As I am a bit off-track anyway now, let me make another remark here. You may think that we have two types of sinusoidals, or two types of functions, in that Fourier decomposition: sines and cosines. You should not think of it that way: the sine and cosine function are essentially the same. I know your old math teacher in high school never told you that, but it’s true. They both come with the same circle (yes, I know that’s a ridiculous statement but I don’t know how to phrase it otherwise): the difference between a sine and a cosine is just a phase shift: cos(ωt) = sin(ωt + π/2) and, conversely, sin(ωt) = cos(ωt − π/2). If the starting phases of all of the component waves would be the same, we’d have a Fourier decomposition involving cosines only, or sines only – whatever you prefer. Indeed, because they’re the same function except for that phase shift (π/2), we can always go from one to the other by shifting our origin of space (x) and/or time (t). However, we cannot assume that all of the component waves have the same starting phase and, therefore, we should write each component as cos(n·ωt + Φn), or a sine with a similar argument. Now, you’ll remember – because your math teacher in high school told you that at least :-) – that there’s a formula for the cosine (and sine) of the sum of two angles: we can write cos(n·ωt + Φn) as cos(n·ωt + Φn) = [cos(Φn)·cos(n·ωt) – sin(Φn)·sin(n·ωt)]. Substituting an and bn for cos(Φn) and –sin(Φn) respectively gives us the an·cos(n·ωt) + bn·sin(n·ωt) expressions above. In addition, the component waves may not only differ in phase, but also in amplitude, and, hence, the an and bn coefficients do more than only capturing the phase differences. But let me get back on the track. :-)
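If you don’t trust that angle-sum business, a quick numerical check may help. The sketch below compares the ‘phase’ form cos(n·ωt + Φn) with the an·cos(n·ωt) + bn·sin(n·ωt) form for unit amplitude, taking an = cos(Φn) and bn = –sin(Φn). The function names and the sample values are just my own illustrative choices.

```python
import math

def component_phase_form(n, omega, t, phi):
    """One Fourier component written with an explicit starting phase."""
    return math.cos(n * omega * t + phi)

def component_ab_form(n, omega, t, phi):
    """The same component written as a_n*cos(n*omega*t) + b_n*sin(n*omega*t)."""
    a_n = math.cos(phi)    # coefficient of the cosine term
    b_n = -math.sin(phi)   # coefficient of the sine term (note the minus sign)
    return a_n * math.cos(n * omega * t) + b_n * math.sin(n * omega * t)

# Hypothetical sample values for the mode number, frequency, time and phase.
samples = [(1, 2.0, 0.3, 0.5), (3, 1.5, 1.7, -1.2), (5, 0.7, 4.0, 2.9)]
max_difference = max(
    abs(component_phase_form(*s) - component_ab_form(*s)) for s in samples
)
print(max_difference)  # should be zero up to floating-point rounding
```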

Those sinusoidals have a weird existence: they are not there, physically—or so it seems. Indeed, there is one waveform only, i.e. one motion only—and, if it’s any real wave, it’s most likely to be non-sinusoidal. At the same time, I noted, in my previous post, that, if you pluck a string or play a chord on your guitar, some string you did not pluck may still pick up one or more of its harmonics (i.e. one or more of its overtones) and, hence, start to vibrate too! It’s the resonance phenomenon. If you have a grand piano, it’s even more obvious: if you’d press the C4 key on a piano, a small hammer will strike the C4 string and it will vibrate—but the C5 string (one octave higher) will also vibrate, although nothing touched it—except for the air transmitting the sound wave (including the harmonics causing the resonance) from the C4 string, of course! So the component waves are there and, at the same time, they’re not. Whatever they are, they are more than mathematical forms: the so-called superposition principle (on which the Fourier analysis is based) is grounded in reality: it’s because we can add forces. I know that sounds extremely obvious – or ridiculous, you might say :-) – but it is actually not so obvious. […] I am tempted to write something about conservative forces here but… Well… I need to move on.

Let me show that diagram of the first seven harmonics of an ideal string once again. All of them, and the higher ones too, would be in our wave function. Hence, assuming there’s no phase difference between the harmonics, we’d write:

f(t) = sin(ωt) + sin(2ωt) + sin(3ωt) + … + sin(nωt) + …


The frequencies of the various modes of our ideal string are all simple multiples of the fundamental frequency ω, as evidenced from the argument in our sine functions (ω, 2ω, 3ω, etcetera). Conversely, the respective wavelengths are λ, λ/2, λ/3, etcetera. [Remember: the speed of the wave is fixed, and frequency and wavelength are inversely proportional: c = λ·f = λ/T = λ·(ω/2π).] So, yes, these frequencies and wavelengths can all be related to each other in terms of equally simple harmonic ratios: 1:2, 2:3, 3:4, 4:5 etcetera. I explained in my previous posts why that does not imply that the musical notes themselves are related in such way: the musical scale is logarithmic. So I won’t repeat myself. All of the above is just an introduction to the more serious stuff, which I’ll talk about now.

Modes in two dimensions

An analysis of waves in two dimensions is often done assuming some drum membrane. The Great Teacher played drums, as you can see from his picture in his Lectures, and there are also videos of him performing on YouTube. So that’s why the drum is used in almost all textbooks now. :-)

The illustration of one of the normal modes of a circular membrane comes from the Wikipedia article on modes. There are many other normal modes – some of them with a simpler shape, but some of them more complicated too – but this is a nice one as it also illustrates the concept of a nodal line, which is closely related to the concept of a mode. Huh? Yes. The modes of a one-dimensional string have nodes, i.e. points where the displacement is always zero. Indeed, as you can see from the illustration above (not below), the first overtone has one node, the second two, etcetera. So the equivalent of a node in two dimensions is a nodal line: for the mode shown below, we have one bisecting the disc and then another one: a circle about halfway between the edge and center. The third nodal line is the edge itself, obviously. [The author of the Wikipedia article notes that the animation isn’t perfect, because the nodal line and the nodal circle halfway between the edge and the center both move a little bit. In any case, it’s pretty good, I think. I should also learn how to make animations like that. :-)]

[Illustrations: mode shape of a round plate with node lines; drum vibration mode]

What’s a mode?

How do we find these modes? And how are they defined really? To explain that, I have to briefly return to the one-dimensional example. The key to solving the problem (i.e. finding the modes, and defining their characteristics) is the following fact: when a wave reaches the clamped end of a string, it will be reflected with a change in sign, as illustrated below: we’ve got that F(x+ct) wave coming in, and then it goes back indeed, but with the sign reversed.


It’s a complicated illustration because it also shows some hypothetical wave coming from the other side, where there is no string to vibrate. That hypothetical wave is the same wave, but travelling in the other direction and with the sign reversed (–F). So what’s that all about? Well… I never gave any general solution for a waveform traveling up and down a string: I just said the waveform was traveling up and down the string (now that is obvious: just look at that diagram with the seven first harmonics once again, and think about how that oscillation goes up and down with time), but so I did not really give any general solution for them (the sine and cosine functions are specific solutions). So what is the general solution?

Let’s first assume the string is not held anywhere, so that we have an infinite string along which waves can travel in either direction. In fact, the most general functional form to capture the fact that a waveform can travel in any direction is to write the displacement y as the sum of two functions: one wave traveling one way (which we’ll denote by F), and the other wave (which we’ll denote by G) traveling the other way. From the illustration above, it’s obvious that the F wave is traveling towards the negative x-direction and, hence, its argument will be x + ct. Conversely, the G wave travels in the positive x-direction, so its argument is x – ct. So we write:

y = F(x + ct) + G(x – ct)

[I’ve explained this thing about directions and why the argument in a wavefunction (x ± ct) is what it is before. You should look it up in case you don’t understand. As for the c in this equation, that’s the wave velocity once more, which is constant and which depends, as always, on the medium, so that’s the material and the diameter and the tension and whatever of the string.]

So… We know that the string is actually not infinite, but that it’s fixed to some ‘infinitely solid wall’ (as Feynman puts it). Hence, y is equal to zero there: y = 0. Now let’s choose the origin of our x-axis at the fixed end so as to simplify the analysis. Hence, where y is zero, x is also zero. Now, at x = 0, our general solution above for the infinite string becomes y = F(ct) + G(−ct) = 0, for all values of t. Of course, that means G(−ct) must be equal to –F(ct). Now, that equality is there for all values of t. So it’s there for all values of ct and −ct. In short, that equality is valid for whatever value of the argument of G and –F. As Feynman puts it: “G of anything must be –F of minus that same thing.” Now, the ‘anything’ in G is its argument: x – ct, so ‘minus that same thing’ is –(x – ct) = −x + ct. Therefore, our equation becomes:

y = F(x + ct) − F(−x + ct)

So that’s what’s depicted in the diagram above: the F(x + ct) wave ‘vanishes’ behind the wall as the − F(−x + ct) wave comes out of it. Conversely, the − F(−x + ct) is hypothetical indeed until it reaches the origin, after which it becomes the real wave. Their sum is only relevant near the origin x = 0, and on the positive side only (on the negative side of the x-axis, the F and G functions are both hypothetical). [I know, it’s not easy to follow, but textbooks are really short on this—which is why I am writing my blog: I want to help you ‘get’ it.]
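If the diagram doesn’t do it for you, here is a small numerical sketch of that y = F(x + ct) − F(−x + ct) solution. I am assuming a Gaussian bump for F and a wave speed of 1, which are arbitrary illustrative choices: any other shape for F should work just as well.

```python
import math

def pulse(u):
    """Hypothetical waveform F(u): a Gaussian bump centered on u = 0."""
    return math.exp(-u * u)

def displacement(x, t, c=1.0, F=pulse):
    """y = F(x + ct) - F(-x + ct): incoming wave plus its sign-reversed mirror image."""
    return F(x + c * t) - F(-x + c * t)

# The clamped end (x = 0) never moves, whatever the time t.
wall_values = [displacement(0.0, t) for t in (-5.0, -1.0, 0.0, 0.5, 3.0)]

# Before the reflection (t = -5), the bump sits at x = 5 with a positive sign...
before = displacement(5.0, -5.0)
# ...and after the reflection (t = +5), it is back at x = 5, but upside down.
after = displacement(5.0, 5.0)

print(wall_values, before, after)
```

So the F(x + ct) term does the work on the real string before the reflection, and the −F(−x + ct) term takes over afterwards.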

Now, the results above are valid for any wave, periodic or not. Let’s now confine the analysis to periodic waves only. In fact, we’ll limit the analysis to sinusoidal wavefunctions only. So that should be easy. Yes. Too easy. I agree. :-)

So let’s make things difficult again by introducing the complex exponential notation, so that’s Euler’s formula: e^{iθ} = cosθ + i·sinθ, with i the imaginary unit, and i·sinθ the imaginary component of our wave. So the only thing that is real, is cosθ.

What the heck? Just bear with me. It’s good to make the analysis somewhat more general, especially because we’ll be talking about the relevance of all of this to quantum physics, and in quantum physics the waves are complex-valued indeed! So let’s get on with it. To use Euler’s formula, we need to substitute x + ct for the phase of the wave, so that involves the angular frequency and the wavenumber. Let me just write it down:

F(x + ct) = e^{iω(t+x/c)} and F(−x + ct) = e^{iω(t−x/c)}

Huh? Yeah. Sorry. I’ll resist the temptation to go off-track here, because I really shouldn’t be copying what I wrote in other posts. Most of what I write above is really those simple relations: c = λ·f = ω/k, with k, i.e. the wavenumber, being defined as k = 2π/λ. For details, go to one of my other posts indeed, in which I explain how that works in very much detail: just click on the link here, and scroll down to the section on the phase of a wave, in which I explain why the phase of a wave is equal to θ = ωt–kx = ω(t–x/c). And, yes, I know: the thing with the wave directions and the signs is quite tricky. Just remember: for a wave traveling in the positive x-direction, the signs in front of x and t are each other’s opposite but, if the wave’s traveling in the negative x-direction, they are the same. As mentioned, all the rest is usually a matter of shifting the phase, which amounts to shifting the origin of either the x- or the t-axis. I need to move on. Using the exponential notation for our sinusoidal wave, y = F(x + ct) − F(−x + ct) becomes:

y = e^{iω(t+x/c)} − e^{iω(t−x/c)}

I can hear you sigh again: Now what’s that for? What can we do with this? Just continue to bear with me for a while longer. Let’s factor the e^{iωt} term out. [Why? Patience, please!] So we write:

y = e^{iωt}·[e^{iωx/c} − e^{−iωx/c}]

Now, you can just use Euler’s formula again to double-check that e^{iθ} − e^{−iθ} = 2i·sinθ. [To get that result, you should remember that cos(−θ) = cosθ, but sin(−θ) = −sin(θ).] So we get:

y = e^{iωt}·[e^{iωx/c} − e^{−iωx/c}] = 2i·e^{iωt}·sin(ωx/c)

Now, we’re only interested in the real component of this amplitude of course – but that’s only because we’re in the classical world here, not in the real world, which is quantum-mechanical and, hence, involves the imaginary stuff also :-) – so we should write this out using Euler’s formula again to convert the exponential to sinusoidals again. Hence, remembering that i² = −1, we get:

y = 2i·e^{iωt}·sin(ωx/c) = 2i·cos(ωt)·sin(ωx/c) – 2·sin(ωt)·sin(ωx/c)


OK. You need a break. So let me pause here for a while. What the hell are we doing? Is this legit? I mean… We’re talking about some real wave here, aren’t we? We are. So is this conversion from/to real amplitudes to/from complex amplitudes legit? It is. And, in this case (i.e. in classical physics), it’s true that we’re interested in the real component of y only. But then it’s nice the analysis is valid for complex amplitudes as well, because we’ll be talking complex amplitudes in quantum physics.

[…] OK. I acknowledge it all looks very tricky so let’s see what we’d get using our old-fashioned sine and/or cosine function. So let’s write F(x + ct) as cos(ωt+ωx/c) and F(−x + ct) as cos(ωt−ωx/c). So we write y = cos(ωt+ωx/c) − cos(ωt−ωx/c). Now work on this using the cos(α+β) = cosα·cosβ − sinα·sinβ formula and the cos(−α) = cosα and sin(−α) = −sinα identities. You (should) get: y = −2sin(ωt)·sin(ωx/c). So that’s the real component in our y function above indeed. So, yes, we do get the same results when doing this funny business using complex exponentials as we’d get when sticking to real stuff only! Fortunately! :-)
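For those who prefer a computer to do the double-checking, the sketch below compares the real part of the complex expression with the −2·sin(ωt)·sin(ωx/c) result, using Python’s built-in complex numbers. The values for ω and c are arbitrary assumptions of mine.

```python
import cmath
import math

def y_complex(x, t, omega=2.0, c=1.5):
    """Complex form: exp(i*omega*(t + x/c)) - exp(i*omega*(t - x/c))."""
    return cmath.exp(1j * omega * (t + x / c)) - cmath.exp(1j * omega * (t - x / c))

def y_real(x, t, omega=2.0, c=1.5):
    """Real form obtained from the cosines: -2*sin(omega*t)*sin(omega*x/c)."""
    return -2.0 * math.sin(omega * t) * math.sin(omega * x / c)

# Hypothetical sample points (x, t) along the string and in time.
points = [(0.0, 0.0), (0.4, 1.1), (1.3, 2.7), (2.0, -0.6)]
max_error = max(abs(y_complex(x, t).real - y_real(x, t)) for x, t in points)
print(max_error)  # should be zero up to floating-point rounding
```

The imaginary part of y_complex is the 2·cos(ωt)·sin(ωx/c) term, which we simply discard in the classical analysis.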

[Why did I get off-track again? Well… It’s true these conversions from real to complex amplitudes should not be done carelessly. It is tricky and non-intuitive, to say the least. The weird thing about it is that, if we multiply two imaginary components, we get a real component, because i² is a real number: it’s −1! So it’s fascinating indeed: we add an imaginary component to our real-valued function, do all kinds of manipulations with it – including stuff that involves the use of i² = −1 – and, when done, we just take out the real component and it’s alright: we know that the result is OK because of the ‘magic’ of complex numbers! In any case, I need to move on so I can’t dwell on this. I also explained much of the ‘magic’ in other posts already, so I shouldn’t repeat myself. If you’re interested, click on this link, for instance.]

Let’s go back to our y = – 2sin(ωt)·sin(ωx/c) function. So that’s the oscillation. Just look at the equation and think about what it tells us. Suppose we fix x, so we’re looking at one point on the string only and only let t vary: then sin(ωx/c) is some constant and it’s our sin(ωt) factor that goes up and down. So our oscillation has frequency ω, at every point x, so that’s everywhere!

Of course, this result shouldn’t surprise us, should it? That’s what we put in when we wrote F as F(x + ct) = e^{iω(t+x/c)} or as cos(ωt+ωx/c), isn’t it? Well… Yes and no. Yes, because you’re right: we put in that angular frequency. But then, no, because we’re talking a composite wave here: a wave traveling up and down, with the components traveling in opposite directions. Indeed, we’ve also got that G(x) = −F(–x) function here. So, no, it’s not quite the same.

Let’s fix t now, and take a snapshot of the whole wave, so now we look at x as the variable and sin(ωt) is some constant. What we see is a sine wave, with an amplitude set by that sin(ωt) factor. Again, you’ll say: of course! Well… Yes. The thing is: the point where the amplitude of our oscillation is equal to zero, is always the same, regardless of t. So we have fixed nodes indeed. Where are they? The nodes are, obviously, the points where sin(ωx/c) = 0, so that’s when ωx/c is equal to 0, obviously, or – more importantly – whenever ωx/c is equal to π, 2π, 3π, 4π, etcetera. More generally, we can say whenever ωx/c = n·π with n = 0, 1, 2,… etc. Now, that’s the same as writing x = n·π·c/ω = n·π/k = n·π·λ/2π = n·λ/2.

Now let’s remind ourselves of what λ really is: for the fundamental frequency it’s twice the length of the string, so λ = 2·L. For the next mode (i.e. the second harmonic), it’s the length itself: λ = L. For the third, it’s λ = (2/3)·L, etcetera. So, in general, it’s λ = (2/m)·L with m = 1, 2, etcetera. [We may or may not want to include a zero mode by allowing m to equal zero as well, so then there’s no oscillation and y = 0 everywhere. :-) But that’s a minor point.] In short, our grand result is:

x = n·λ/2 = n·(2/m)·L/2 = (n/m)·L

Of course, we have to exclude the x points lying outside of our string by imposing that n/m ≤ 1, i.e. the condition that n ≤ m. So for m = 1, n is 0 or 1, so the nodes are, effectively, both ends of the string. For m = 2, n can be 0, 1 and 2, so the nodes are the ends of the string and its middle point L/2. And so on and so on.
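The x = (n/m)·L rule is easy to tabulate. The little sketch below lists the nodes of the first few modes, using exact fractions so the positions come out as fractions of the string length. The function name and the choice of L = 1 are illustrative assumptions only.

```python
from fractions import Fraction

def node_positions(m, L=Fraction(1)):
    """Nodes of mode m on a string of length L: x = (n/m)*L for n = 0, 1, ..., m."""
    return [Fraction(n, m) * L for n in range(0, m + 1)]

# Print the nodes of the first four modes, as fractions of the string length.
for m in range(1, 5):
    print(m, [str(x) for x in node_positions(m)])
```

For m = 1 we only get the two ends, for m = 2 we get the middle point as well, and each next mode adds one more node.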

I know that, by now, you’ve given up. So no one is reading anymore and so I am basically talking to myself now. What’s the point? Well… I wanted to get here in order to define the concept of a mode: a mode is a pattern of motion, which has the property that, at any point, the object moves perfectly sinusoidally, and that all points move at the same frequency (though some will move more than others). Modes also have nodes, i.e. points that don’t move at all, and above I showed how we can find the nodes of the modes of a one-dimensional string.

Also note how remarkable that result actually is: we didn’t specify anything about that string, so we don’t care about its material or diameter or tension or whatever. Still, we know its fundamental (or normal) modes, and we know their nodes: they’re a function of the length of the string, and the number of the mode only: x = (n/m)·L. While an oscillating string may seem to be the most simple thing on earth, it isn’t: think of all the forces between the molecules, for instance, as that string is vibrating. Still, we’ve got this remarkably simple formula. Don’t you find that amazing?

[…] OK… If you’re still reading, I know you want me to move on, so I’ll just do that.

Back to two dimensions

The modes are all that matters: when linear forces (i.e. linear systems) are involved, any motion can be analyzed as the sum of the motions of all the different modes, combined with appropriate amplitudes and phases. Let me reproduce the Fourier series once more (the more you see, the better you’ll understand it, I should hope!):

[Image: Fourier series]

Of course, we should generalize this to also include x as a variable which, again, is easier if we’d use complex exponentials instead of the sinusoidal components. The nice illustration on Fourier analysis from Wikipedia shows how it works, in essence, that is. The red function below consists of six of those modes.


OK. Enough of this. Let’s go to the two-dimensional case now. To simplify the analysis, Feynman invented a rectangular drum. A rectangular drum is probably more difficult to play, but it’s easier to analyze—as compared to a circular drum, that is! :-)


In two dimensions, our sinusoidal one-dimensional e^{i(ωt − kx)} waveform becomes e^{i(ωt − k_x·x − k_y·y)}. So we have a wavenumber for the x and y directions, and the sign in front is determined by the direction of the wave, so we need to check whether it moves in the positive or negative direction of the x- and y-axis respectively. Now, we can rewrite e^{i(ωt + k_x·x + k_y·y)} as e^{iωt}·e^{i(k_x·x + k_y·y)}, of course, which is what you see in the diagram above, except that the wave is moving in the negative y direction and, hence, we’ve got a + sign in front of our k_y·y term. All the rest is rather well explained in Feynman, so I’ll refer you to the textbook here.

We basically need to ensure that we have a nodal line at x = 0 and at x = a, and then we do the same for y = 0 and y = b. Then we apply exactly the same logic as for the one-dimensional string: the wave needs to be coherently reflected. The analysis is somewhat more complicated because it involves some angle of incidence now, i.e. the θ in the diagram above, so that’s another page in Feynman’s textbook. And then we have the same gymnastics for finding wavelengths in terms of the dimensions a and b, as well as in terms of n and m, where n is the number of the mode involved when fixing the nodal lines at x = 0 and x = a, and m is the number of the mode involved when fixing the nodal lines at y = 0 and y = b. Sounds difficult? Well… Yes. But I won’t copy Feynman here. Just go and check for yourself.
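To get a feel for what those boundary conditions imply, here is a sketch that assumes the usual standing-wave shape for a rectangular membrane with fixed edges, i.e. sin(n·π·x/a)·sin(m·π·y/b). That shape is the standard textbook form that I am assuming here (it’s not copied from Feynman’s pages), and the dimensions and mode numbers are just examples.

```python
import math

def mode_shape(x, y, n, m, a, b):
    """Assumed spatial pattern of mode (n, m): sin(n*pi*x/a) * sin(m*pi*y/b)."""
    return math.sin(n * math.pi * x / a) * math.sin(m * math.pi * y / b)

a, b = 2.0, 1.0   # hypothetical drum dimensions
n, m = 2, 1       # hypothetical mode numbers

# The displacement on the four edges of the drum: it should vanish on all of them.
edge_values = [
    mode_shape(0.0, 0.3, n, m, a, b),  # edge x = 0
    mode_shape(a, 0.3, n, m, a, b),    # edge x = a
    mode_shape(0.7, 0.0, n, m, a, b),  # edge y = 0
    mode_shape(0.7, b, n, m, a, b),    # edge y = b
]

# For n = 2, there is an extra nodal line halfway along x, away from the edges.
interior_value = mode_shape(a / 2.0, 0.3, n, m, a, b)

print(edge_values, interior_value)
```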

The grand result is that we do get some formula for a wavelength λ of what satisfies the definition of a mode: a perfectly sinusoidal motion, that has all points on the drum move at the same frequency, though some move more than others. Also, as evidenced from my illustration for the circular disk: we’ve got nodal lines, and then I mean other nodal lines, different from the edges! I’ll just give you that formula here (again, for the detail, go and check Feynman yourself):


Feynman also works out an example for a = 2b. I’ll just copy the results hereunder, which is a formula for the (angular) frequencies ω, and a table of the mode shapes in a qualitative way (I’ll leave it to you to google animations that match the illustration).



Again, we should note the amazing simplicity of the result: we don’t care about the type of membrane or whatever other material the drum is made of. Its proportions are all that matter.

Finally, you should also note the last two columns in the table above: these just go to show that, unlike our modes in the one-dimensional case, the natural frequencies here are not multiples of the fundamental frequency. As Feynman notes, we should not be led astray by the example of the one-dimensional ideal string. It’s again a departure from the Pythagorean idea, that all in Nature respects harmonic ratios. It’s just not true. Let me quote Feynman, as I have no better summary: “The idea that the natural frequencies are harmonically related is not generally true. It is not true for a system with more than one dimension, nor is it true for one-dimensional systems which are more complicated than a string with uniform density and tension.”
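To see those non-harmonic ratios for yourself, here is a minimal sketch for the a = 2b drum. It assumes the standard textbook result for an ideal rectangular membrane with fixed edges, ω(n, m) = π·c·√((n/a)² + (m/b)²); the function and variable names are just my own illustration, and the wave speed c drops out of the ratios.

```python
import math

def omega_ratio(n, m, a=2.0, b=1.0):
    """Return ω(n, m) / ω(1, 1) for a rectangular membrane with fixed edges.

    Assumes ω(n, m) = π·c·sqrt((n/a)² + (m/b)²); π and the wave speed c
    cancel in the ratio, so only the proportions a : b matter.
    """
    def shape_factor(p, q):
        return math.sqrt((p / a) ** 2 + (q / b) ** 2)

    return shape_factor(n, m) / shape_factor(1, 1)

# Lowest few modes for a = 2b, sorted by frequency.
modes = [(n, m) for n in (1, 2, 3) for m in (1, 2)]
ratios = {mode: omega_ratio(*mode) for mode in modes}
for mode, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(mode, round(ratio, 2))
```

Under that assumption, only the (2, 2) mode comes out at exactly twice the fundamental; the other low modes sit at irrational multiples of it.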

So… That says it all, I’d guess. Maybe I should just quote his example of a one-dimensional system that does not obey Pythagoras’ prescription: a hanging chain which, because of the weight of the chain, has higher tension at the top than at the bottom. If such a chain is set in oscillation, there are various modes and frequencies, but the frequencies will not be simply multiples of each other, nor of any other number. It is also interesting to note that the mode shapes will also not be sinusoidal. However, here we’re getting into non-linear dynamics, and so I’ll let you read about that elsewhere too: once again, Feynman’s analysis of non-linear systems is very accessible and an interesting read. Hence, I warmly recommend it.

Modes in three dimensions and in quantum mechanics.

Well… Unlike what you might expect, I won’t bury you under formulas this time. Let me refer you, instead, to Wikipedia’s article on the so-called Leidenfrost effect. Just do it. Don’t bother too much about the text, scroll down a bit, and play the video that comes with it. I saw it, sort of by accident, and, at first, I thought it was something very high-tech. But no: it’s just a drop of water skittering around in a hot pan. It takes on all kinds of weird forms and oscillates in the weirdest of ways, but all is nothing but an excitation of the various normal modes of it, with various amplitudes and phases, of course, as a Fourier analysis of the phenomenon dictates.

There’s plenty of other stuff around to satisfy your curiosity, all quite understandable and fun—because you now understand the basics of it for the one- and two-dimensional case.

So… Well… I’ve kept this section extremely short, because now I want to say a few words about quantum-mechanical systems. Well… In fact, I’ll simply quote Feynman on it, because he writes about it in a style that’s unsurpassed. He also nicely sums up the previous conversation. Here we go:

The ideas discussed above are all aspects of what is probably the most general and wonderful principle of mathematical physics. If we have a linear system whose character is independent of the time, then the motion does not have to have any particular simplicity, and in fact may be exceedingly complex, but there are very special motions, usually a series of special motions, in which the whole pattern of motion varies exponentially with the time. For the vibrating systems that we are talking about now, the exponential is imaginary, and instead of saying “exponentially” we might prefer to say “sinusoidally” with time. However, one can be more general and say that the motions will vary exponentially with the time in very special modes, with very special shapes. The most general motion of the system can always be represented as a superposition of motions involving each of the different exponentials.

This is worth stating again for the case of sinusoidal motion: a linear system need not be moving in a purely sinusoidal motion, i.e., at a definite single frequency, but no matter how it does move, this motion can be represented as a superposition of pure sinusoidal motions. The frequency of each of these motions is a characteristic of the system, and the pattern or waveform of each motion is also a characteristic of the system. The general motion in any such system can be characterized by giving the strength and the phase of each of these modes, and adding them all together. Another way of saying this is that any linear vibrating system is equivalent to a set of independent harmonic oscillators, with the natural frequencies corresponding to the modes.

In quantum mechanics the vibrating object, or the thing that varies in space, is the amplitude of a probability function that gives the probability of finding an electron, or system of electrons, in a given configuration. This amplitude function can vary in space and time, and satisfies, in fact, a linear equation. But in quantum mechanics there is a transformation, in that what we call frequency of the probability amplitude is equal, in the classical idea, to energy. Therefore we can translate the principle stated above to this case by taking the word frequency and replacing it with energy. It becomes something like this: a quantum-mechanical system, for example an atom, need not have a definite energy, just as a simple mechanical system does not have to have a definite frequency; but no matter how the system behaves, its behavior can always be represented as a superposition of states of definite energy. The energy of each state is a characteristic of the atom, and so is the pattern of amplitude which determines the probability of finding particles in different places. The general motion can be described by giving the amplitude of each of these different energy states. This is the origin of energy levels in quantum mechanics. Since quantum mechanics is represented by waves, in the circumstance in which the electron does not have enough energy to ultimately escape from the proton, they are confined waves. Like the confined waves of a string, there are definite frequencies for the solution of the wave equation for quantum mechanics. The quantum-mechanical interpretation is that these are definite energies. Therefore a quantum-mechanical system, because it is represented by waves, can have definite states of fixed energy; examples are the energy levels of various atoms.

Isn’t that great? What a summary! It also shows a deeper understanding of classical physics makes it sooooo much better to read something about quantum mechanics. In any case, as for the examples, I should add – because that’s what you’ll often find when you google for quantum-mechanical modes – the vibrational modes of molecules. There’s tons of interesting analysis out there, and so I’ll let you now have fun with it yourself! :-)

Modes in classical and in quantum physics

Music and Math

I ended my previous post, on Music and Physics, by emphatically making the point that music is all about structure, about mathematical relations. Let me summarize the basics:

1. The octave is the musical unit, defined as the interval between two pitches with the higher frequency being twice the frequency of the lower pitch. Let’s denote the lower and higher pitch by a and b respectively, so we say that b’s frequency is twice that of a.

2. We then divide the [a, b] interval (whose length is unity) into twelve equal sub-intervals, which define eleven notes in-between a and b. The pitch of the notes in-between is defined by the exponential function connecting a and b. What exponential function? The exponential function with base 2, so that’s the function y = 2^x.

Why base 2? Because of the doubling of the frequencies when going from a to b, and when going from b to b + 1, and from b + 1 to b + 2, etcetera. In music, we give a, b, b + 1, b + 2, etcetera the same name, or symbol: A, for example. Or Do. Or C. Or Re. Whatever. If we have the unit and the number of sub-intervals, all the rest follows. We just add a number to distinguish the various As, or Cs, or Gs, so we write A1, A2, etcetera. Or C1, C2, etcetera. The graph below illustrates the principle for the interval between C4 and C5. Don’t think the function is linear. It’s exponential: note the logarithmic frequency scale. To make the point, I also inserted another illustration (credit for that graph goes to another blogger).



You’ll wonder: why twelve sub-intervals? Well… That’s random. Non-Western cultures use a different number. Eight instead of twelve, for example – which is more logical, at first sight at least: eight intervals amounts to dividing the interval in two equal halves, and the halves in halves again, and then once more: so the length of the sub-interval is then 1/2·1/2·1/2 = (1/2)^3 = 1/8. But why wouldn’t we divide by three, so we have 9 = 3·3 sub-intervals? Or by 27 = 3·3·3? Or by 16? Or by 5?

The answer is: we don’t know. The limited sensitivity of our ear demands that the intervals be cut up somehow. [You can do tests of the sensitivity of your ear to relative frequency differences online: it’s fun. Just try them! Some of the sites may recommend a hearing aid, but don’t take that crap.] So… The bottom line is that, somehow, mankind settled on twelve sub-intervals within our musical unit – or our sound unit, I should say. So it is what it is, and the ratio of the frequencies between two successive (semi)tones (e.g. C and C#, or E and F, as E and F are also separated by one half-step only) is 2^(1/12) = 1.059463… Hence, the pitch of each note is about 6% higher than the pitch of the previous note. OK. Next thing.

3. What’s the similarity between C1, C2, C3 etcetera? Or between A1, A2, A3 etcetera? The answer is: harmonics. The frequency of the first overtone of a string tuned at pitch A3 (i.e. 220 Hz) is equal to the fundamental frequency of a string tuned at pitch A4 (i.e. 440 Hz). Likewise, the frequency of the (pitch of the) C4 note above (which is the so-called middle C) is 261.626 Hz, while the frequency of the (pitch of the) next C note (C5) is twice that frequency: 523.251 Hz. [I should quickly clarify the terminology here: a tone consists of several harmonics, with frequencies f, 2·f, 3·f,… n·f,… The first harmonic is referred to as the fundamental, with frequency f. The second, third, etc harmonics are referred to as overtones, with frequency 2·f, 3·f, etc.]
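The numbers quoted above can be reproduced in a few lines. This is just a sketch of twelve-tone equal temperament with A4 = 440 Hz as the reference pitch; the names `A4_HZ`, `SEMITONE_RATIO` and `equal_tempered_pitch` are my own, not some standard library.

```python
A4_HZ = 440.0                   # reference pitch, as quoted in the text
SEMITONE_RATIO = 2 ** (1 / 12)  # frequency ratio between successive semitones

def equal_tempered_pitch(semitones_from_a4):
    """Frequency (Hz) of the note that many semitones above (+) or below (-) A4."""
    return A4_HZ * SEMITONE_RATIO ** semitones_from_a4

c4 = equal_tempered_pitch(-9)   # middle C is nine semitones below A4
c5 = equal_tempered_pitch(3)    # the next C is three semitones above A4
a3 = equal_tempered_pitch(-12)  # one octave below A4

print(round(SEMITONE_RATIO, 6), round(c4, 3), round(c5, 3), round(a3, 3))
```

Twelve steps of 2^(1/12) multiply out to exactly 2, which is why C5 lands at twice C4 and A3 at half of A4.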

To make a long story short: our ear is able to identify the individual harmonics in a tone, and if the frequency of the first harmonic of one tone (i.e. the fundamental) is the same frequency as the second harmonic of another, then we feel they are separated by one musical unit.

Isn’t that most remarkable? Why would it be that way?

My intuition tells me I should look at the energy of the components. The energy theorem tells us that the total energy in a wave is just the sum of the energies in all of the Fourier components. Surely, the fundamental must carry most of the energy, and then the first overtone, and then the second. Really? Is that so?

Well… I checked online to see if there’s anything on that, but my quick check reveals there’s nothing much out there in terms of research: if you’d google ‘energy levels of overtones’, you’ll get hundreds of links to research on the vibrational modes of molecules, but nothing that’s related to music theory. So… Well… Perhaps this is my first truly original post! :-) Let’s go for it. :-)

The energy in a wave is proportional to the square of its amplitude, and we must integrate over one period (T) of the oscillation. The illustration below should help you to understand what’s going on. The fundamental mode of the wave is an oscillation with a wavelength (λ1) that is twice the length of the string (L). For the second mode, the wavelength (λ2) is just L. For the third mode, we find that λ3 = (2/3)·L. More in general, the wavelength of the nth mode is λn = (2/n)·L.


The illustration above shows that we’re talking sine waves here, differing in their frequency (or wavelength) only. [The speed of the wave (c), as it travels back and forth along the string, is constant, so frequency and wavelength are in that simple relationship: c = f·λ.] Simplifying and normalizing (i.e. choosing the ‘right’ units by multiplying scales with some proportionality constant), the energy of the first mode would be (proportional to):

Integral 1

What about the second and third modes? For the second mode, we have two oscillations per cycle, but we still need to integrate over the period of the first mode T = T1, which is twice the period of the second mode: T1 = 2·T2. Hence, T2 = (1/2)·T1. Therefore, the argument of the sine wave (i.e. the x variable in the integral above) should go from 0 to 4π. However, we want to compare the energies of the various modes, so let’s substitute cleverly. We write:

Integral 2

The period of the third mode is equal to T3 = (1/3)·T1. Conversely, T1 = 3·T3. Hence, the argument of the sine wave should go from 0 to 6π. Again, we’ll substitute cleverly so as to make the energies comparable. We write:

Integral 3

Now that is interesting! For a so-called ideal string, whose motion is the sum of a sinusoidal oscillation at the fundamental frequency f, another at the second harmonic frequency 2·f, another at the third harmonic 3·f, etcetera, we find that the energies of the various modes are proportional to the values in the harmonic series 1, 1/2, 1/3, 1/4,… 1/n, etcetera. Again, Pythagoras’ conclusion was wrong (the ratios of the frequencies of individual notes do not respect simple ratios), but his intuition was right: the harmonic series ∑ 1/n (n = 1, 2,…,∞) is very relevant in describing natural phenomena. It gives us the respective energies of the various natural modes of a vibrating string! In the graph below, the values are represented as areas. It is all quite deep and mysterious really!


So now we know why we feel C4 and C5 have so much in common that we call them by the same name: C, or Do. It also helps us to understand why the E and A tones have so much in common: the third harmonic of the 110 Hz A2 string (nearly) corresponds to the fundamental frequency of the E4 string: the former is 330 Hz, and the latter is about 329.63 Hz in equal temperament! Hence, E and A have ‘energy in common’, so to speak, but less ‘energy in common’ than two successive E notes, or two successive A notes, or two successive C notes (like C4 and C5).
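A quick check of that A-and-E relation, as a sketch: it compares the third harmonic of the 110 Hz A2 string with an equal-tempered E4, assuming A4 = 440 Hz and measuring the mismatch in cents (hundredths of a semitone). The helper `cents` is my own naming.

```python
import math

A2_HZ = 110.0
third_harmonic = 3 * A2_HZ                  # third harmonic of the A2 string
e4_equal_tempered = 440.0 * 2 ** (-5 / 12)  # E4 is five semitones below A4

def cents(f1, f2):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200 * math.log2(f1 / f2)

mismatch = cents(third_harmonic, e4_equal_tempered)
print(third_harmonic, round(e4_equal_tempered, 2), round(mismatch, 2))
```

So the match is very close but not exact: the equal-tempered E4 sits about two cents below the third harmonic of A2.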

[…] Well… Sort of… In fact, the analysis above is quite appealing but – I hate to say it – it’s wrong, as I explain in my post scriptum to this post. It’s like Pythagoras’ number theory of the Universe: the intuition behind is OK, but the conclusions aren’t quite right. :-)

Ideality versus reality

We’ve been talking ideal strings. Actual tones coming out of actual strings have a quality, which is determined by the relative amounts of the various harmonics that are present in the tone, which is not some simple sum of sinusoidal functions. Actual tones have a waveform that may resemble something like the wavefunction I presented in my previous post, when discussing Fourier analysis. Let me insert that illustration once again (and let me also acknowledge its source once more: it’s Wikipedia). The red waveform is the sum of six sine functions, with harmonically related frequencies, but with different amplitudes. Hence, the energy levels of the various modes will not be proportional to the values in that harmonic series ∑ 1/n, with n = 1, 2,…,∞.


Das wohltemperierte Klavier

Nothing in what I wrote above is related to questions of taste like: why do I seldom select a classical music channel on my online radio station? Or why am I not into hip hop, even if my taste for music is quite similar to that of the common crowd (as evidenced from the fact that I like ‘Listeners’ Top’ hit lists)?

Not sure. It’s an unresolved topic, I guess—involving rhythm and other ‘structures’ I did not mention. Indeed, all of the above just tells us a nice story about the structure of the language of music: it’s a story about the tones, and how they are related to each other. That relation is, in essence, an exponential function with base 2. That’s all. Nothing more, nothing less. It’s remarkably simple and, at the same time, endlessly deep. :-) But so it is not a story about the structure of a musical piece itself, of a pop song of Ellie Goulding, for instance, or one of Bach’s preludes or fugues.

That brings me back to the original question I raised in my previous post. It’s a question which was triggered, a long time ago, when I tried to read Douglas Hofstadter’s Gödel, Escher, Bach, frustrated because my brother seemed to understand it, and I didn’t. So I put it down, and never ever looked at it again. So what is it really about that famous piece of Bach?

Frankly, I’m still not sure. As I mentioned in my previous post, musicians were struggling to find a tuning system that would allow them to easily transpose musical compositions. Transposing music amounts to changing the so-called key of a musical piece, so that’s moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. It’s a piece of cake now. In fact, increasing or decreasing the playback speed of a recording also amounts to transposing a piece: an increase or decrease of the playback speed by 6% will shift the pitch up or down by about one semitone. Why? Well… Go back to what I wrote above about that 12th root of 2. We’ve got the right tuning system now, and so everything is easy. Logarithms are great! :-)
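The playback-speed remark is easy to check. A minimal sketch, assuming that speeding up a recording by some factor multiplies every frequency by that same factor (the function name `semitone_shift` is my own):

```python
import math

def semitone_shift(speed_ratio):
    """Pitch shift, in equal-tempered semitones, for a given playback-speed ratio."""
    return 12 * math.log2(speed_ratio)

shift = semitone_shift(1.06)   # a 6% faster playback
print(round(shift, 3))
```

A 6% speed-up comes out at just over one semitone, and a doubling of the speed (a ratio of 2) at exactly twelve semitones, i.e. one octave.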

Back to Bach. Despite their admiration for the Greek ideas around aesthetics – and, most notably, their fascination with harmonic ratios! – (almost) all Renaissance musicians were struggling with the so-called Pythagorean tuning system, which was used until the 18th century. It was based on a correct observation (similar strings, under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etcetera) but a wrong conclusion (the frequencies of musical tones should also obey the same harmonic ratios). Bach’s so-called ‘good’ temperament tuning system was designed such that the piece could, indeed, be played in most keys without sounding… well… out of tune. :-)

Having said that, the modern ‘equal temperament’ tuning system, which prescribes that tuning should be done such that the notes are in the above-described simple logarithmic relation to each other, had already been invented. So the true question is: why didn’t Bach embrace it? Why did he stick to ratios? Why did it take so long for the right system to be accepted?

I don’t know. If you google, you’ll find a zillion possible explanations. As far as I can see, most are rather mystical. More importantly, most of them do not mention many facts. My explanation is rather simple: while Bach was, obviously, a musical genius, he may not have understood what an exponential, or a logarithm, is all about. Indeed, a quick read of summary biographies reveals that Bach studied a wide range of topics, like Latin and Greek, and theology – of course! But math is not mentioned. He didn’t write about tuning and all that: all of his time went to writing musical masterpieces!

What the biographies do mention is that he always found other people’s tunings unsatisfactory, and that he tuned his harpsichords and clavichords himself. Now that is quite revealing, I’d say! In my view, Bach couldn’t care less about the ratios. He knew something was wrong with the Pythagorean system (or the variants that were then used, which are referred to as meantone temperament) and, as a musical genius, he probably ended up tuning by ear. [For those who’d wonder what I am talking about, let me quickly insert a Wikipedia graph illustrating the difference between the Pythagorean system (and two of these meantone variants) and the equal temperament tuning system in use today.]


So… What’s the point I am trying to make? Well… Frankly, I’d bet Bach’s own tuning was actually equal temperament, and so he should have named his masterpiece Das gleichtemperierte Klavier. Then we wouldn’t have all that ‘noise’ around it. :-)

Post scriptum: Did you like the argument on the respective energy levels of the harmonics of an ideal string? Too bad. It’s wrong. I made a common mistake: when substituting variables in the integral, I ‘forgot’ to substitute the lower and upper bound of the interval over which I was integrating the function. The calculation below corrects the mistake, and so it does the required substitutions—for the first three modes at least. What’s going on here? Well… Nothing much… I just integrate over the length L taking a snapshot at t = 0 (as mentioned, we can always shift the origin of our independent variable, so here we do it for time and so it’s OK). Hence, the argument of our wave function sin(kx−ωt) reduces to kx, with k = 2π/λ, and λ = 2L, λ = L, λ = (2/3)·L for the first, second and third mode respectively. [As for solving the integral of the sine squared, you can google the formula, and please do check my substitutions. They should be OK, but… Well… We never know, do we? :-)]

energy integrals

[…] No… This doesn’t make all that much sense either. Those integrals yield the same energy for all three modes. Something must be wrong: shorter wavelengths (i.e. higher frequencies) are associated with higher energy levels. Full stop. So the ‘solution’ above can’t be right… […] You’re right. That’s where the time aspect comes into play. We were taking a snapshot, indeed, and the mean value of the sine squared function is 1/2 = 0.5, as should be clear from Pythagoras’ theorem: cos²x + sin²x = 1. So what I was doing is like integrating a constant function over the same-length interval. So… Well… Yes: no wonder I get the same value again and again.
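If you’d rather not trust my substitutions, you can check the snapshot integrals numerically. The sketch below integrates sin²(kx) over the length of the string with k = n·π/L (which follows from k = 2π/λ and λ = 2L/n), using a plain midpoint rule; the function name and the step count are arbitrary choices of mine.

```python
import math

def snapshot_integral(n, L=1.0, steps=10_000):
    """Midpoint-rule estimate of the integral of sin²(n·π·x/L) for x from 0 to L."""
    dx = L / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx            # midpoint of the i-th slice
        total += math.sin(n * math.pi * x / L) ** 2
    return total * dx

values = [snapshot_integral(n) for n in (1, 2, 3)]
print([round(v, 6) for v in values])
```

All three modes give L/2: the mean of the sine squared is 1/2 over any whole number of half-wavelengths, so the snapshot cannot tell the modes apart.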


We need to integrate over the same time interval. You could do that, as an exercise, but there’s a more direct approach to it: the energy of a wave is directly proportional to its frequency, so we write: E ∼ f. If the frequency doubles, triples, quadruples etcetera, then its energy doubles, triples, quadruples etcetera too. But – remember – we’re talking one string only here, with a fixed wave speed c = λ·f – so f = c/λ (read: the frequency is inversely proportional to the wavelength) – and, therefore (assuming the same (maximum) amplitude), we get that the energy level of each mode is inversely proportional to the wavelength, so we find that E ∼ 1/λ.

Now, with direct or inverse proportionality relations, we can always invent some new unit that makes the relationship an identity, so let’s do that and turn it into an equation indeed. [And, yes, sorry… I apologize again to your old math teacher: he may not quite agree with the shortcut I am taking here, but he’ll justify the logic behind.] So… Remembering that λ1 = 2L, λ2 = L, λ3 = (2/3)·L, etcetera, we can then write:

E1 = (1/2)/L, E2 = (2/2)/L, E3 = (3/2)/L, E4 = (4/2)/L, E5 = (5/2)/L,…, En = (n/2)/L,…
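The list above is just E = 1/λ applied to λ = (2/n)·L. Here is a small sketch that reproduces it with exact fractions, in the same made-up units (so the proportionality constant is set to 1, which is an assumption of convenience, not physics); the function names are mine.

```python
from fractions import Fraction

def wavelength(n, L=Fraction(1)):
    """Wavelength of the nth mode of a string of length L: (2/n)·L."""
    return Fraction(2, n) * L

def energy(n, L=Fraction(1)):
    """Energy of the nth mode in units where E = 1/λ."""
    return 1 / wavelength(n, L)

levels = [energy(n) for n in range(1, 6)]
spacings = [b - a for a, b in zip(levels, levels[1:])]
print(levels, spacings)
```

With L = 1 the levels are 1/2, 1, 3/2, 2, 5/2, and every spacing equals 1/2, i.e. (1/2)/L.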

That’s a really nice result, because… Well… In quantum theory, we have this result that the permitted energy levels of a harmonic oscillator are equally spaced, with the interval between them equal to h·f or ħ·ω (if you use the angular frequency to describe a wave (so that’s ω = 2π·f), then Planck’s constant (h) becomes ħ = h/2π). So here we’ve got equally spaced levels too, with the interval between the various energy levels equal to (1/2)/L.

You’ll say: So what? Frankly, if this doesn’t amaze you, stop reading—but if this doesn’t amaze you, you actually stopped reading a long time ago. :-) Look at what we’ve got here. We didn’t specify anything about that string, so we didn’t care about its materials or diameter or tension or how it was made (a wound guitar string is a terribly complicated thing!) or about whatever. Still, we know its fundamental (or normal) modes, and their frequency or nodes or energy or whatever depend on the length of the string only, with the ‘fundamental’ unit of energy being equal to the reciprocal length. Full stop. So all is just a matter of size and proportions. In other words, it’s all about structure. Absolute measurements don’t matter.

You may say: Bull****. What’s the conclusion? You still didn’t tell me anything about how the total energy of the wave is supposed to be distributed over its normal modes! 

That’s true. I didn’t. Why? Well… I am not sure, really. I presented a lot of stuff here, but I did not present a clear and unambiguous answer as to how the total energy of a string is distributed over its modes. Not for actual strings, nor for ideal strings. Let me be honest: I don’t know. I really don’t. Having said that, my gut instinct that most of the energy – of, let’s say, a C4 note – should be in the primary mode (i.e. in the fundamental frequency) must be right: otherwise we would not call it a C4 note. So let’s try to make some assumptions. However, before doing so, let’s first briefly touch base with reality.

For actual strings (or actual musical sounds), I suspect the analysis can be quite complicated, as evidenced by the following illustration, which I took from one of the many interesting sites on this topic. Let me quote the author: “A flute is essentially a tube that is open at both ends. Air is blown across one end and sound comes out the other. The harmonics are all whole number multiples of the fundamental frequency (436 Hz, a slightly flat A4 — a bit lower in frequency than is normally acceptable). Note how the second harmonic is nearly as intense as the fundamental. [My = blog writer’s :-) italics] This strong second harmonic is part of what makes a flute sound like a flute.”

Hmmm… What I see in the graph is a second harmonic (i.e. the first overtone) that is actually more intense than the fundamental, so what’s that all about? So can we actually associate a specific frequency to that tone? Not sure. :-/ So we’re in trouble already.


If reality doesn’t match our thinking, what about ideality? Hmmm… What to say? As for ideal strings – or ideal flutes :-) – I’d venture to say that the most obvious distribution of energy over the various modes (or harmonics, when we’re talking sound) would be the Boltzmann distribution.

Huh? Yes. Have a look at one of my posts on statistical mechanics. It’s a weird thing: the distribution of molecular speeds in a gas, or the density of the air in the atmosphere, or whatever involving many particles and/or a great degree of complexity (so many, or such a degree of complexity, that only some kind of statistical approach to the problem works) – all that involves Boltzmann’s Law, which basically says the distribution function will be a function of the energy levels involved: f ∝ e^(−energy). So… Well… Yes. It’s the logarithmic scale again. It seems to govern the Universe. :-)

Huh? Yes. That’s what I think: the distribution of the total energy of the oscillation should be some Boltzmann function, so it should depend on the energy of the modes: most of the energy will be in the lower modes, and most of that in the fundamental. […] Hmmm… It again begs the question: how much exactly?

Well… The Boltzmann distribution strongly resembles the ‘harmonic’ distribution shown above (1, 1/2, 1/3, 1/4 etc), but it’s not quite the same. The graph below shows how they are similar and dissimilar in shape. You can experiment yourself with coefficients and all that, but your conclusion will be the same. As they say in Asia: they are “same-same but different.” :-) […] It’s like the ‘good’ and ‘equal’ temperament used when tuning musical instruments: the ‘good’ temperament – which is based on harmonic ratios – is good, but not good enough. Only the ‘equal’ temperament obeys the logarithmic scale and, therefore, is perfect. So, as I mentioned already, while my assumption isn’t quite right (the distribution is not harmonic, in the Pythagorean sense), the intuition behind is OK. So it’s just like Pythagoras’ number theory of the Universe. Having said that, I’ll leave it to you to draw the correct conclusions from it. :-)
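If you want to experiment with the coefficients yourself, here is a minimal sketch. It compares the harmonic weights 1/n with an exponentially decaying weight e^(−α·(n−1)) for the nth mode, both scaled so the fundamental gets weight 1. The decay coefficient `alpha` is purely hypothetical: it lumps together the units and whatever plays the role of ‘temperature’, and I have no argument for any particular value.

```python
import math

def harmonic_weights(n_modes):
    """Weights 1, 1/2, 1/3, ... for modes 1..n_modes."""
    return [1 / n for n in range(1, n_modes + 1)]

def boltzmann_weights(n_modes, alpha=0.5):
    """Weights e^(-alpha·(n-1)) for modes 1..n_modes (alpha is a made-up coefficient)."""
    return [math.exp(-alpha * (n - 1)) for n in range(1, n_modes + 1)]

harmonic = harmonic_weights(10)
boltzmann = boltzmann_weights(10)
for n, (h, b) in enumerate(zip(harmonic, boltzmann), start=1):
    print(n, round(h, 3), round(b, 3))
```

With this particular alpha, the exponential starts out above the harmonic values and ends up well below them: same-same in shape, but different in how fast the tail dies out.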



Music and physics

My first working title for this post was Music and Modes. Yes. Modes. Not moods. The relation between music and moods is an interesting research topic as well, but it’s not what I am going to write about. :-)

It started with me thinking I should write something on modes indeed, because the concept of a mode of a wave, or any oscillator really, is quite central to physics, both in classical and in quantum physics (quantum-mechanical systems are analyzed as oscillators too!). But I wondered how to approach it, as it’s a rather boring topic if you look at the math only. But then I was flying back from Europe, to Asia, where I live and, as I am also playing a bit of guitar, I suddenly wanted to know why we like music. And then I thought that’s a question you may have asked yourself at some point of time too! And so then I thought I should write about modes as part of a more interesting story: a story about music – or, to be precise, a story about the physics behind music. So… Let’s go for it.

Philosophy versus physics

There is, of course, a very simple answer to the question of why we like music: we like music because it is music. If it were not music, we would not like it. That’s a rather philosophical answer, and it probably satisfies most people. However, for someone studying physics, that answer can surely not be sufficient. What’s the physics behind? I reviewed Feynman’s Lecture on sound waves on the plane, combined it with some other stuff I googled when I arrived, and then I wrote this post, which gives you a much less philosophical answer. :-)

The observation at the center of the discussion is deceptively simple: why is it that similar strings (i.e. strings made of the same material, with the same thickness, etc), under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etc (i.e. like whatever other ratio of two small integers)?

You probably wonder: is that the question, really? It is. The question is deceptively simple indeed because, as you will see in a moment, the answer is quite complicated. So complicated, in fact, that the Pythagoreans didn’t have any answer. Nor did anyone else for that matter—until the 18th century or so, when musicians, physicists and mathematicians alike started to realize that a string (of a guitar, or a piano, or whatever instrument Pythagoras was thinking of at the time), or a column of air (in a pipe organ or a trumpet, for example), or whatever other thing that actually creates the musical tone, actually oscillates at numerous frequencies simultaneously.

The Pythagoreans did not suspect that a string, in itself, is a rather complicated thing – something which physicists refer to as a harmonic oscillator – and that its sound, therefore, is actually produced by many frequencies, instead of only one. The concept of a pure note, i.e. a tone that is free of harmonics (i.e. free of all other frequencies, except for the fundamental frequency) also didn’t exist at the time. And if it did, they would not have been able to produce a pure tone anyway: producing pure tones – or notes, as I’ll call them, somewhat inaccurately (I should say: a pure pitch) – is remarkably complicated, and they do not exist in Nature. If the Pythagoreans had been able to produce pure tones, they would have observed that pure tones do not give any sensation of consonance or dissonance, even if their relative frequencies respect those simple ratios. Indeed, repeated experiments, in which such pure tones are being produced, have shown that human beings can’t really say whether it’s a musical sound or not: it’s just sound, and it’s neither pleasant (or consonant, we should say) nor unpleasant (i.e. dissonant).

The Pythagorean observation is valid, however, for actual (i.e. non-pure) musical tones. In short, we need to distinguish between tones and notes (i.e. pure tones): they are two very different things, and the gist of the whole argument is that musical tones coming out of one (or more) string(s) under tension are full of harmonics and, as I’ll explain in a minute, that’s what explains the observed relation between the lengths of those strings and the phenomenon of consonance (i.e. sounding ‘pleasant’) or dissonance (i.e. sounding ‘unpleasant’).

Of course, it’s easy to say what I say above: it’s 2015 now, and so we have the benefit of hindsight. Back then –  so that’s more than 2,500 years ago! – the simple but remarkable fact that the lengths of similar strings should respect some simple ratio if they are to sound ‘nice’ together, triggered a fascination with number theory (in fact, the Pythagoreans actually established the foundations of what is now known as number theory). Indeed, Pythagoras felt that similar relationships should also hold for other natural phenomena! To mention just one example, the Pythagoreans believed that the orbits of the planets would also respect such simple numerical relationships, which is why they talked of the ‘music of the spheres’ (Musica Universalis).

We now know that the Pythagoreans were wrong. The proportions in the movements of the planets around the Sun do not respect simple ratios and, with the benefit of hindsight once again, it is regrettable that it took many courageous and brilliant people, such as Galileo Galilei and Copernicus, to convince the Church of that fact. :-( Also, while Pythagoras’ observations in regard to the sounds coming out of whatever strings he was looking at were correct, his conclusions were wrong: the observation does not imply that the frequencies of musical notes should all be in some simple ratio one to another.

Let me repeat what I wrote above: the frequencies of musical notes are not in some simple relationship one to another. The frequency scale for all musical tones is logarithmic and, while that implies that we can, effectively, do some tricks with ratios based on the properties of the logarithmic scale (as I’ll explain in a moment), the so-called ‘Pythagorean’ tuning system, which is based on simple ratios, was plain wrong, even if it – or some variant of it (instead of the 3:2 ratio, musicians used the 5:4 ratio from about 1510 onwards) – was generally used until the 18th century! In short, Pythagoras was wrong indeed—in this regard at least: we can’t do much with those simple ratios.

Having said that, Pythagoras’ basic intuition was right, and that intuition is still very much what drives physics today: it’s the idea that Nature can be described, or explained (whatever that means), by quantitative relationships only. Let’s have a look at how it actually works for music.

Tones, noise and notes

Let’s first define and distinguish tones and notes. A musical tone is the opposite of noise, and the difference between the two is that musical tones are periodic waveforms, so they have a period T, as illustrated below. In contrast, noise is a non-periodic waveform. It’s as simple as that.

noise versus music

Now, from previous posts, you know we can write any periodic function as the sum of a potentially infinite number of simple harmonic functions, and that this sum is referred to as the Fourier series. I am just noting it here, so don’t worry about it for now. I’ll come back to it later.

You also know we have seven musical notes: Do-Re-Mi-Fa-Sol-La-Si or, more common in the English-speaking world, C-D-E-F-G-A-B. And then it starts again with C (or Do). So we have two notes, separated by an interval which is referred to as an octave (from the Latin octavus, i.e. eighth), with six notes in-between, so that’s eight notes in total. However, you also know that there are notes in-between, except between E and F and between B and C. They are referred to as semitones or half-steps. I prefer the term ‘half-step’ over ‘semitone’, because we’re talking notes really, not tones.

We have, for example, F-sharp (denoted by F#), which we can also call G-flat (denoted by Gb). It’s the same thing: a sharp # raises a note by a semitone (aka half-step), and a flat b lowers it by the same amount, so F# is Gb. That’s what is shown below: in an octave, we have eight notes but twelve half-steps.


Let’s now look at the frequencies. The frequency scale above (expressed in oscillations per second, so that’s the hertz unit) is a logarithmic scale: frequencies double as we go from one octave to another: the frequency of the C4 note above (the so-called middle C) is 261.626 Hz, while the frequency of the next C note (C5) is double that: 523.251 Hz. [Just in case you’d want to know: the numbers 4 and 5 refer to the position of the note on a standard 88-key piano keyboard: C4 is the fourth C key on the piano.]

Now, if we equate the interval between C4 and C5 with 1 (so the octave is our musical ‘unit’), then each of the twelve half-steps is, obviously, an interval of 1/12. Why? Because we have 12 half-steps in our musical unit. You can also easily verify that, because of the way logarithms work, the ratio of the frequencies of two notes that are separated by one half-step (between D# and E, for example) will be equal to 2^(1/12). Likewise, the ratio of the frequencies of two notes that are separated by n half-steps is equal to 2^(n/12). [In case you’d doubt, just do an example. For instance, if we’d denote the frequency of C4 as f0, and the frequency of C# as f1 and so on (so the frequency of D is f2, the frequency of C5 is f12, and everything else is in-between), then we can write the f2/f0 ratio as f2/f0 = (f2/f1)·(f1/f0) = 2^(1/12)·2^(1/12) = 2^(2/12) = 2^(1/6). I must assume you’re smart enough to generalize this result yourself, and that f12/f0 is, obviously, equal to 2^(12/12) = 2^1 = 2, which is what it should be!]
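If you'd rather let a computer do the checking, here is a minimal sketch of the half-step rule in Python. The function name `half_step_ratio` is mine (it is not from any library), and the 261.626 Hz value for C4 is simply the one quoted above.

```python
# Frequency ratio between two notes that are n half-steps apart,
# assuming the 12-step logarithmic scale described in the text.
def half_step_ratio(n: int) -> float:
    return 2 ** (n / 12)

C4 = 261.626  # Hz, middle C as quoted in the text

# f1/f0, f2/f0 and f12/f0 in the notation used above
for n, name in [(1, "C#4"), (2, "D4"), (12, "C5")]:
    ratio = half_step_ratio(n)
    print(f"{name}: ratio = {ratio:.6f}, frequency = {C4 * ratio:.3f} Hz")

# Two successive half-steps multiply: (f2/f1)*(f1/f0) = f2/f0
two_steps = half_step_ratio(1) * half_step_ratio(1)
print(f"two half-steps: {two_steps:.6f} vs 2^(1/6) = {2 ** (1 / 6):.6f}")
```

The point of the last two lines is just that the ratios multiply while the half-steps add, which is what 'logarithmic' means here.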

Now, because the frequencies of the various C notes are expressed as a number involving some decimal fraction (like 523.251 Hz, and the 0.251 is actually an approximation only), and because they are, therefore, a bit hard to read and/or work with, I’ll illustrate the next idea – i.e. the concept of harmonics – with the A instead of the C. :-)


The lowest A on a piano is denoted by A0, and its frequency is 27.5 Hz. Lower A notes exist (we have one at 13.75 Hz, for instance) but we don’t use them, because they are near (or actually beyond) the limit of the lowest frequencies we can hear. So let’s stick to our grand piano and start with that 27.5 Hz frequency. The next A note is A1, and its frequency is 55 Hz. We then have A2, which is like the A on my (or your) guitar: its frequency is equal to 2×55 = 110 Hz. The next is A3, for which we double the frequency once again: we’re at 220 Hz now. The next one is the A in the illustration of the C scale above: A4, with a frequency of 440 Hz.

[Let me, just for the record, note that the A4 note is the standard tuning pitch in Western music. Why? Well… There’s no good reason really, except convention. Indeed, we can derive the frequency of any other note from that A4 note using our formula for the ratio of frequencies but, because of the properties of a logarithmic function, we could do the same using whatever other note really. It’s an important point: there’s no such thing as an absolute reference point in music: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and how many steps we want to have in-between (so that’s 12 steps—again, in Western music, that is), we get all the rest. That’s just how logarithms work. So music is all about structure, i.e. mathematical relationships. Again, Pythagoras’ conclusions were wrong, but his intuition was right.]
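To make the 'no absolute reference point' remark concrete, here is a small sketch that derives other notes from A4 = 440 Hz. The helper `note_frequency` and the half-step offsets (C4 sits 9 half-steps below A4, C5 sits 3 above, A0 sits 48 below) are my own bookkeeping, so treat them as assumptions to check against your keyboard.

```python
A4 = 440.0  # Hz, the conventional tuning pitch mentioned in the text

# Frequency of the note that lies a given number of half-steps above
# (positive) or below (negative) a reference note.
def note_frequency(half_steps: int, reference_hz: float = A4) -> float:
    return reference_hz * 2 ** (half_steps / 12)

c4 = note_frequency(-9)    # middle C, 9 half-steps below A4
c5 = note_frequency(3)     # 3 half-steps above A4
a0 = note_frequency(-48)   # four octaves below A4

print(f"C4 = {c4:.3f} Hz, C5 = {c5:.3f} Hz, A0 = {a0:.3f} Hz")

# Any other note works as the reference: going back up 9 half-steps
# from the derived C4 returns the original A4.
a4_again = note_frequency(9, reference_hz=c4)
print(f"A4 recovered from C4: {a4_again:.3f} Hz")
```

Nothing in the calculation singles out A4: swap in any other note as `reference_hz` and the same ratios give you the rest of the scale.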

Now, the notes we are talking about here are all so-called pure tones. In fact, when I say that the A on our guitar is referred to as A2 and that it has a frequency of 110 Hz, then I am actually making a huge simplification. Worse, I am lying when I say that: when you play a string on a guitar, or when you strike a key on a piano, all kinds of other frequencies – so-called harmonics – will resonate as well, and that’s what gives the quality to the sound: it’s what makes it sound beautiful. So the fundamental frequency (aka the first harmonic) is 110 Hz alright but we’ll also have second, third, fourth, etc harmonics with frequency 220 Hz, 330 Hz, 440 Hz, etcetera. In music, the basic or fundamental frequency is referred to as the pitch of the tone and, as you can see, I often use the term ‘note’ (or pure tone) as a synonym for pitch, which is more or less OK, but not quite correct actually. [However, don’t worry about it: my sloppiness here does not affect the argument.]

What’s the physics behind it? Look at the illustration below (I borrowed it from the Physics Classroom site). The thick black line is the string, and the wavelength of its fundamental frequency (i.e. the first harmonic) is twice its length, so we write λ1 = 2·L or, the other way around, L = (1/2)·λ1. Now that’s the so-called first mode of the string. [One often sees the term fundamental or natural or normal mode, but the adjective is not necessary really. In fact, I find it confusing, although I sometimes find myself using it too.]


We also have a second, third, etc mode, depicted below, and these modes correspond to the second, third, etc harmonic respectively.


For the second, third, etc mode, the relationship between the wavelength and the length of the string is, obviously, the following: L = (2/2)·λ2 = λ2, L = (3/2)·λ3, etc. More in general, for the nth mode, L will be equal to L = (n/2)·λn, with n = 1, 2, etcetera. In fact, because L is supposed to be some fixed length, we should write it the other way around: λn = (2/n)·L.

What does it imply for the frequencies? We know that the speed of the wave – let’s denote it by c – as it travels up and down the string, is a property of the string, and it’s a property of the string only. In other words, it does not depend on the frequency. Now, the wave velocity is equal to the frequency times the wavelength, always, so we have c = f·λ. To take the example of the (classical) guitar string: its length is 650 mm, i.e. 0.65 m. Hence, the identities λ1 = (2/1)·L, λ2 = (2/2)·L, λ3 = (2/3)·L etc become λ1 = (2/1)·0.65 = 1.3 m, λ2 = (2/2)·0.65 = 0.65 m, λ3 = (2/3)·0.65 = 0.433.. m and so on. Now, combining these wavelengths with the above-mentioned frequencies, we get the wave velocity c = (110 Hz)·(1.3 m) = (220 Hz)·(0.65 m) = (330 Hz)·(0.433.. m) = 143 m/s.
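The same arithmetic in a few lines of Python, as a sketch only: the 0.65 m length and the 110 Hz fundamental are the values used above, and the function names are mine.

```python
STRING_LENGTH = 0.65   # m, the classical guitar string from the text
FUNDAMENTAL = 110.0    # Hz, first harmonic of the A string

# Wavelength of the nth mode: lambda_n = (2/n) * L
def wavelength(n: int, length: float = STRING_LENGTH) -> float:
    return 2 * length / n

# Frequency of the nth harmonic: n times the fundamental
def harmonic_frequency(n: int, fundamental: float = FUNDAMENTAL) -> float:
    return n * fundamental

# c = f * lambda should come out the same for every mode
for n in range(1, 5):
    lam = wavelength(n)
    f = harmonic_frequency(n)
    print(f"mode {n}: lambda = {lam:.4f} m, f = {f:.0f} Hz, c = {f * lam:.1f} m/s")
```

The factor n in the frequency cancels the factor 1/n in the wavelength, so the product is 2·L·f1 for every mode: that is the sense in which the wave speed is a property of the string only.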

Let me now get back to Pythagoras’ string. You should note that the frequencies of the harmonics produced by a simple guitar string are related to each other by simple whole number ratios. Indeed, the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1). The second and third harmonics have a 3:2 frequency ratio. The third and fourth harmonics a 4:3 ratio. The fifth and fourth harmonic 5:4, and so on and so on. They have to be. Why? Because the harmonics are simple multiples of the basic frequency. Now that is what’s really behind Pythagoras’ observation: when he was sounding similar strings with the same tension but different lengths, he was making sounds with the same harmonics. Nothing more, nothing less. 

Let me be quite explicit here, because the point that I am trying to make here is somewhat subtle. Pythagoras’ string is Pythagoras’ string: he talked about similar strings. So we’re not talking some actual guitar or a piano or whatever other string instrument. The strings on (modern) string instruments are not similar, and they do not have the same tension. For example, the six strings of a guitar do not differ in length (they’re all 650 mm) but they’re different in tension. The six strings on a classical guitar also have a different diameter, and the first three strings are plain strings, as opposed to the bottom strings, which are wound. So the strings are not similar but very different indeed. To illustrate the point, I copied the values below for just one of the many commercially available guitar string sets.

tension

It’s the same for piano strings. While they are somewhat simpler (they’re all made of piano wire, which is very high quality steel wire basically), they also differ, not only in length but in diameter as well, typically ranging from 0.85 mm for the highest treble strings to 8.5 mm (so that’s ten times 0.85 mm) for the lowest bass notes.

In short, Pythagoras was not playing the guitar or the piano (or whatever other more sophisticated string instrument that the Greeks surely must have had too) when he was thinking of these harmonic relationships. The physical explanation behind his famous observation is, therefore, quite simple: musical tones that have the same harmonics sound pleasant, or consonant, we should say—from the Latin con-sonare, which, literally, means ‘to sound together’ (from sonare = to sound and con = with). And otherwise… Well… Then they do not sound pleasant: they are dissonant.

To drive the point home, let me emphasize that, when we’re plucking a string, we produce a sound consisting of many frequencies, all in one go. One can see it in practice: if you strike a lower A string on a piano – let’s say the 110 Hz A2 string – then its second harmonic (220 Hz) will make the A3 string vibrate too, because it’s got the same frequency! And then its fourth harmonic will make the A4 string vibrate too, because they’re both at 440 Hz. Of course, the strength of these other vibrations (or their amplitude we should say) will depend on the strength of the other harmonics and we should, of course, expect that the fundamental frequency (i.e. the first harmonic) will absorb most of the energy. So we pluck one string, and so we’ve got one sound, one tone only, but numerous notes at the same time!

In this regard, you should also note that the third harmonic of our 110 Hz A2 string corresponds to the fundamental frequency of the E4 tone: both are 330 Hz! And, of course, the harmonics of E, such as its second harmonic (2·330 Hz = 660 Hz) correspond to higher harmonics of A too! To be specific, the second harmonic of our E string is equal to the sixth harmonic of our A2 string. If your guitar is any good, and if your strings are of reasonable quality too, you’ll actually see it: the (lower) E and A strings co-vibrate if you play the A major chord, but by striking the upper four strings only. So we’ve got energy – motion really – being transferred from the four strings you do strike to the two strings you do not strike! You’ll say: so what? Well… If you’ve got any better proof of the actuality (or reality) of various frequencies being present at the same time, please tell me! :-)

So that’s why A and E sound very well together (A, E and C#, played together, make up the so-called A major chord): our ear likes matching harmonics. And so that’s why we like musical tones, or why we define those tones as being musical! :-) Let me summarize it once more: musical tones are composite sound waves, consisting of a fundamental frequency and so-called harmonics (so we’ve got many notes or pure tones altogether in one musical tone). Now, when other musical tones have harmonics that are shared, and we sound those notes too, we get the sensation of harmony, i.e. the combination sounds consonant.
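A quick way to see those matching harmonics is to list them. The sketch below uses the idealized values from the text (110 Hz for A2 and exactly 330 Hz for the E); the cut-off at twelve harmonics is an arbitrary choice of mine.

```python
# First `count` harmonics of a tone: integer multiples of its fundamental.
def harmonics(fundamental_hz: int, count: int) -> list[int]:
    return [n * fundamental_hz for n in range(1, count + 1)]

a_harmonics = harmonics(110, 12)  # the A2 string
e_harmonics = harmonics(330, 12)  # the E tone, idealized at 330 Hz

# Frequencies that appear in both lists
shared = sorted(set(a_harmonics) & set(e_harmonics))
print("shared harmonics (Hz):", shared)

for f in shared:
    print(f"{f} Hz = harmonic {f // 110} of A and harmonic {f // 330} of E")
```

Every third harmonic of the A lines up with a harmonic of the E, which includes the pairing mentioned above: the second harmonic of E is the sixth harmonic of A.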

Now, it’s not difficult to see that we will always have such shared harmonics if we have similar strings, with the same tension but different lengths, being sounded together. In short, what Pythagoras observed has nothing much to do with notes, but with tones. Let’s go a bit further in the analysis now by introducing some more math. And, yes, I am very sorry: it’s the dreaded Fourier analysis indeed! :-)

Fourier analysis

You know that we can decompose any periodic function into a sum of a (potentially infinite) series of simple sinusoidal functions, as illustrated below. I took the illustration from Wikipedia: the red function s6(x) is the sum of six sine functions of different amplitudes and (harmonically related) frequencies. The so-called Fourier transform S(f) (in blue) relates the six frequencies with the respective amplitudes.


In light of the discussion above, it is easy to see what this means for the sound coming from a plucked string. Using the angular frequency notation (so we write everything using ω instead of f), we know that the normal or natural modes of oscillation have frequencies ω = 2π/T = 2πf  (so that’s the fundamental frequency or first harmonic), 2ω (second harmonic), 3ω (third harmonic), and so on and so on.

Now, there’s no reason to assume that all of the sinusoidal functions that make up our tone should have the same phase: some phase shift Φ may be there and, hence, we should write our sinusoidal function  not as cos(ωt), but as cos(ωt + Φ) in order to ensure our analysis is general enough. [Why not a sine function? It doesn’t matter: the cosine and sine function are the same, except for another phase shift of 90° = π/2.] Now, from our geometry classes, we know that we can re-write cos(ωt + Φ) as

cos(ωt + Φ) = [cos(Φ)cos(ωt) – sin(Φ)sin(ωt)]

We have a lot of these functions of course – one for each harmonic, in fact – and, hence, we should use subscripts, which is what we do in the formula below, which says that any function f(t) that is periodic with the period T can be written mathematically as:

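Fourier series: f(t) = a0 + a1·cos(ωt) + b1·sin(ωt) + a2·cos(2ωt) + b2·sin(2ωt) + a3·cos(3ωt) + b3·sin(3ωt) + …, with ω = 2π/T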

You may wonder: what’s that period T? It’s the period of the fundamental mode, i.e. the first harmonic. Indeed, the period of the second, third, etc harmonic will only be one half, one third etcetera of the period of the first harmonic. Indeed, T2 = (2π)/(2ω) = (1/2)·(2π)/ω = (1/2)·T1, and T3 = (2π)/(3ω) = (1/3)·(2π)/ω = (1/3)·T1, and so on. However, it’s easy to see that these functions also repeat themselves after two, three, etc periods respectively. So all is alright, and the general idea behind the Fourier analysis is further illustrated below. [Note that both the formula as well as the illustration below (which I took from Feynman’s Lectures) add a ‘zero-frequency term’ a0 to the series. That zero-frequency term will usually be zero for a musical tone, because the ‘zero’ level of our tone will be zero indeed. Also note that the an and bn coefficients are, of course, equal to an = cos Φn and bn = –sin Φn, so you can relate the illustration and the formula easily.]

Fourier 2

You’ll say: What the heck! Why do we need the mathematical gymnastics here? It’s just to understand that other characteristic of a musical tone: its quality (as opposed to its pitch). A so-called rich tone will have strong harmonics, while a pure tone will only have the first harmonic. All other characteristics – the difference between a tone produced by a violin as opposed to a piano – are then related to the ‘mix’ of all those harmonics.

So we have it all now, except for loudness which is, of course, related to the magnitude of the air pressure changes as our waveform moves through the air. Pitch, loudness and quality: that’s what makes a musical tone. :-)
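For those who like to see the gymnastics at work, here is a rough sketch that builds a tone from a handful of harmonics, once in the cos(nωt + Φn) form and once in the an·cos + bn·sin form. Note my assumptions: I give each harmonic its own amplitude An (so an = An·cos Φn and bn = –An·sin Φn, which reduces to the coefficients above when An = 1), and the amplitudes and phases of the ‘rich’ tone are made-up numbers, not measurements of any instrument.

```python
import math

# Tone as a sum of phase-shifted cosines: sum of A_n * cos(n*omega*t + phi_n)
def tone_phase_form(t, fundamental_hz, amplitudes, phases):
    omega = 2 * math.pi * fundamental_hz
    return sum(
        amp * math.cos(n * omega * t + phi)
        for n, (amp, phi) in enumerate(zip(amplitudes, phases), start=1)
    )

# Same tone written as a_n*cos(n*omega*t) + b_n*sin(n*omega*t)
def tone_ab_form(t, fundamental_hz, amplitudes, phases):
    omega = 2 * math.pi * fundamental_hz
    total = 0.0
    for n, (amp, phi) in enumerate(zip(amplitudes, phases), start=1):
        a_n = amp * math.cos(phi)
        b_n = -amp * math.sin(phi)
        total += a_n * math.cos(n * omega * t) + b_n * math.sin(n * omega * t)
    return total

F1 = 110.0                       # Hz, fundamental of the A string
PERIOD = 1 / F1                  # period T of the first harmonic
RICH_AMPS = [1.0, 0.5, 0.3, 0.2]     # hypothetical 'rich' mix of harmonics
RICH_PHASES = [0.0, 0.4, 1.1, 2.0]   # hypothetical phase shifts (radians)
PURE_AMPS, PURE_PHASES = [1.0], [0.0]  # a pure tone: first harmonic only

t = 0.00123  # some arbitrary instant (seconds)
print("phase form:", tone_phase_form(t, F1, RICH_AMPS, RICH_PHASES))
print("a/b form:  ", tone_ab_form(t, F1, RICH_AMPS, RICH_PHASES))
print("one period later:", tone_phase_form(t + PERIOD, F1, RICH_AMPS, RICH_PHASES))
print("pure tone:", tone_phase_form(t, F1, PURE_AMPS, PURE_PHASES))
```

Both tones have the same pitch (the same period T); only the mix of harmonics, i.e. the quality, differs.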


As mentioned above, if the sounds are not consonant, they’re dissonant. But what is dissonance really? What’s going on? The answer is the following: when two frequencies are near to a simple fraction, but not exact, we get so-called beats, which our ear does not like.

Huh? Relax. The illustration below, which I copied from the Wikipedia article on piano tuning, illustrates the phenomenon. The blue wave is the sum of the red and the green wave, which are originally identical. But then the frequency of the green wave is increased, and so the two waves are no longer in phase, and the interference results in a beating pattern. Of course, our musical tone involves different frequencies and, hence, different periods T1, T2, T3, etcetera, but you get the idea: the higher harmonics also oscillate with period T1, and if the frequencies are not in some exact ratio, then we’ll have a similar problem: beats, and our ear will not like the sound.
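The beating pattern follows from a standard trigonometric identity: cos(a) + cos(b) = 2·cos((a–b)/2)·cos((a+b)/2). Here is a minimal numerical check, with two frequencies (440 Hz and 444 Hz) that I picked purely for illustration.

```python
import math

# Sum of two pure tones of equal amplitude
def two_tone_sum(t: float, f1: float, f2: float) -> float:
    return math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)

# Same signal as a fast oscillation at the average frequency, multiplied
# by a slowly varying envelope 2*cos(pi*(f1 - f2)*t)
def envelope_form(t: float, f1: float, f2: float) -> float:
    return 2 * math.cos(math.pi * (f1 - f2) * t) * math.cos(math.pi * (f1 + f2) * t)

# The loudness swells and fades |f1 - f2| times per second
def beat_frequency(f1: float, f2: float) -> float:
    return abs(f1 - f2)

F_RED, F_GREEN = 440.0, 444.0  # illustrative values, not from the figure

for t in [0.0, 0.01, 0.0625, 0.125]:
    print(f"t = {t:.4f} s: sum = {two_tone_sum(t, F_RED, F_GREEN):+.6f}, "
          f"envelope form = {envelope_form(t, F_RED, F_GREEN):+.6f}")

print("beats per second:", beat_frequency(F_RED, F_GREEN))
```

The closer the two frequencies, the slower the beat; when they coincide exactly, the envelope is constant and the beating disappears.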


Of course, you’ll wonder: why don’t we like beats in tones? We can ask that, can’t we? It’s like asking why we like music, isn’t it? […] Well… It is and it isn’t. It’s like asking why our ear (or our brain) likes harmonics. We don’t know. That’s how we are wired. The ‘physical’ explanation of what is musical and what isn’t only goes so far, I guess. :-(

Pythagoras versus Bach

From all of what I wrote above, it is obvious that the frequencies of the harmonics of a musical tone are, indeed, related by simple ratios of small integers: the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1); the second and third harmonics have a 3:2 frequency ratio; the third and fourth harmonics a 4:3 ratio; the fifth and fourth harmonic 5:4, etcetera. That’s it. Nothing more, nothing less.

In other words, Pythagoras was observing musical tones: he could not observe the pure tones behind, i.e. the actual notes. However, aesthetics led Pythagoras, and all musicians after him – until the mid-18th century – to also think that the ratio of the frequencies of the notes within an octave should also be simple ratios. From what I explained above, it’s obvious that it should not work that way: the ratio of the frequencies of two notes separated by n half-steps is 2^(n/12), and, for most values of n, 2^(n/12) is not some simple ratio. [Why? Just take your pocket calculator and calculate the value of 2^(1/12): it’s 2^0.08333… = 1.0594630943… and so on… It’s an irrational number: there are no repeating decimals. Now, 2^(n/12) is equal to 2^(1/12)·2^(1/12)·…·2^(1/12) (n times). Why would you expect that product to be equal to some simple ratio?]

So – I said it already – Pythagoras was wrong, not only in this but also in other regards, such as when he espoused his views on the solar system, for example. Again, I am sorry to have to say that, but it is what it is: the Pythagoreans did seem to prefer mathematical ideas over physical experiment. :-) Having said that, musicians obviously didn’t know about any alternative to Pythagoras, and they had surely never heard about logarithmic scales at the time. So… Well… They did use the so-called Pythagorean tuning system. To be precise, they tuned their instruments by equating the frequency ratio between the first and the fifth tone in the C scale (i.e. the C and G, as they did not include the C#, D# and F# semitones when counting) with the ratio 3/2, and then they used other so-called harmonic ratios for the notes in-between.

Now, the 3/2 ratio is actually almost correct, because the actual frequency ratio is 2^(7/12) (we have seven half-steps, counting the semitones, not five!), and so that’s 1.4983, approximately. Now, that’s pretty close to 3/2 = 1.5, I’d say. :-) Using that approximation (which, I admit, is fairly accurate indeed), the tuning of the other strings would then also be done assuming certain ratios should be respected, like the ones below.
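How close is ‘pretty close’? The sketch below compares a few simple ratios with their 2^(n/12) counterparts. The pairing of each ratio with a number of half-steps (3/2 with 7, 4/3 with 5, 5/4 with 4) is the conventional one, but it is my addition: it is not necessarily the list in the illustration.

```python
# (numerator, denominator) of a simple ratio, keyed by the number of
# half-steps of the interval it is usually matched with (my assumption)
SIMPLE_RATIOS = {7: (3, 2), 5: (4, 3), 4: (5, 4)}

def tempered_ratio(n: int) -> float:
    return 2 ** (n / 12)

# Relative deviation, in percent, of the logarithmic-scale ratio
# from the simple ratio p/q
def deviation_percent(n: int, p: int, q: int) -> float:
    simple = p / q
    return 100 * (tempered_ratio(n) - simple) / simple

for n, (p, q) in SIMPLE_RATIOS.items():
    print(f"{n} half-steps: 2^({n}/12) = {tempered_ratio(n):.4f}, "
          f"{p}/{q} = {p / q:.4f}, deviation = {deviation_percent(n, p, q):+.2f}%")
```

The 3/2 ratio is off by roughly a tenth of a percent only, while the 5/4 ratio misses by noticeably more, so not all simple ratios are equally good approximations.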


So it was all quite good. Having said that, good musicians, and some great mathematicians, felt something was wrong—if only because there were several so-called just intonation systems around (for an overview, check out the Wikipedia article on just intonation). More importantly, they felt it was quite difficult to transpose music using the Pythagorean tuning system. Transposing music amounts to changing the so-called key of a musical piece: what one does, basically, is moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. Today, transposing music is a piece of cake—Western music at least. But that’s only because all Western music is played on instruments that are tuned using that logarithmic scale (technically, it’s referred to as the 12-tone equal temperament (12-TET) system). When you’d use one of the Pythagorean systems for tuning, a transposed piece does not sound quite right. 
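One way to see why exact 3/2 ratios get you into trouble when you move around the scale: twelve steps of seven half-steps each add up to 84 half-steps, i.e. exactly seven octaves, but twelve exact 3/2 ratios do not multiply out to 2^7. This is a sketch of that arithmetic only, not a model of any particular historical tuning.

```python
from fractions import Fraction

PURE_FIFTH = Fraction(3, 2)

# Stack twelve exact 3/2 ratios and compare with seven octaves (2^7)
twelve_pure_fifths = PURE_FIFTH ** 12
seven_octaves = Fraction(2) ** 7
mismatch = twelve_pure_fifths / seven_octaves

print("twelve 3/2 ratios:", float(twelve_pure_fifths))
print("seven octaves:    ", float(seven_octaves))
print("mismatch factor:  ", mismatch, "=", float(mismatch))

# With the logarithmic scale, the same stack closes (up to rounding)
twelve_tempered = (2 ** (7 / 12)) ** 12
print("twelve 2^(7/12) ratios:", twelve_tempered)
```

The leftover factor is a bit more than one percent, and it has to be hidden somewhere in the scale, which is why the result depends on the key you start from.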

The first mathematician who really seemed to know what was wrong (and, hence, who also knew what to do) was Simon Stevin, who wrote a manuscript based on the ’12th root of 2 principle’ around AD 1600. It shouldn’t surprise us: the thinking of this mathematician from Bruges would inspire John Napier’s work on logarithms. Unfortunately, while that manuscript describes the basic principles behind the 12-TET system, it didn’t get published (Stevin had to run away from Bruges, to Holland, because he was protestant and the Spanish rulers at the time didn’t like that). Hence, musicians, while not quite understanding the math (or the physics, I should say) behind their own music, kept trying other tuning systems, as they felt it made their music sound better indeed.

One of these ‘other systems’ is the so-called ‘good’ temperament, which you surely heard about, as it’s referred to in Bach’s famous composition, Das Wohltemperierte Klavier, which he finalized in the first half of the 18th century. What is that ‘good’ temperament really? Well… It is what it is: it’s one of those tuning systems which made musicians feel better about their music for a number of reasons, all of which are well described in the Wikipedia article on it. But the main reason is that the tuning system that Bach recommended was a great deal better when it came to playing the same piece in another key. However, it still wasn’t quite right, as it wasn’t the equal temperament system (i.e. the 12-TET system) that’s in place now (in the West at least—the Indian music scale, for instance, is still based on simple ratios).

Why do I mention this piece of Bach? The reason is simple: you probably heard of it because it’s one of the main reference points in a rather famous book: Gödel, Escher, Bach: an Eternal Golden Braid. If not, then just forget about it. I am mentioning it because one of my brothers loves it. It’s on artificial intelligence. I haven’t read it, but I must assume Bach’s masterpiece is analyzed there because of its structure, not because of the tuning system that one’s supposed to use when playing it. So… Well… I’d say: don’t make that composition any more mystic than it already is. :-) The ‘magic’ behind it is related to what I said about A4 being the ‘reference point’ in music: since we’re using a universal logarithmic scale now, there’s no such thing as an absolute reference point any more: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and also define how many steps we want to have in-between (so that’s 12, in Western music, that is), we get all the rest. That’s just how logarithms work.

So, in short, music is all about structure, i.e. it’s all about mathematical relations, and about mathematical relations only. Again, Pythagoras’ conclusions were wrong, but his intuition was right. And, of course, it’s his intuition that gave birth to science: the simple ‘models’ he made – of how notes are supposed to be related to each other, or about our solar system – were, obviously, just the start of it all. And what a great start it was! Looking back once again, it’s rather sad conservative forces (such as the Church) often got in the way of progress. In fact, I suddenly wonder: if scientists would not have been bothered by those conservative forces, could mankind have sent people around the time that Charles V was born, i.e. around A.D. 1500 already? :-)

Post scriptum: My example of the (lower) E and A guitar strings co-vibrating when playing the A major chord striking the upper four strings only, is somewhat tricky. The (lower) E and A strings are associated with lower pitches, and we said overtones (i.e. the second, third, fourth, etc harmonics) are multiples of the fundamental frequency. So why is it that the lower strings co-vibrate? The answer is easy: they oscillate at the higher frequencies only. If you have a guitar: just try it. The two strings you do not pluck do vibrate, and very visibly so, but the low fundamental frequencies that come out of them when you’d strike them, are not audible. In short, they resonate at the higher frequencies only. :-)

The example that Feynman gives is much more straightforward: his example mentions the lower C (or A, B, etc) notes on a piano causing vibrations in the higher C strings (or the higher A, B, etc string respectively). For example, striking the C2 key (and, hence, the C2 string inside the piano) will make the (higher) C3 string vibrate too. But few of us have a grand piano at home, I guess. That’s why I prefer my guitar example. :-)

Music and physics

Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics

I’ve discussed statistics, in the context of quantum mechanics, a couple of times already (see, for example, my post on amplitudes and statistics). However, I never took the time to properly explain those distribution functions which are referred to as the Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac distribution functions respectively. Let me try to do that now—without, hopefully, getting lost in too much math! It should be a nice piece, as it connects quantum mechanics with statistical mechanics, i.e. two topics I had nicely separated so far. :-)

You know the Boltzmann Law now, which says that the probabilities of different conditions of energy are given by e^(−energy/kT) = 1/e^(energy/kT). Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. The point is: we have some probability density function f, and it’s a function of the energy E, so we write:

f(E) = C·e^(−energy/kT) = C/e^(energy/kT)

C is just a normalization constant (all probabilities have to add up to one, so the integral of this function over its domain must be one), and k and T are also usual suspects: T is the (absolute) temperature, and k is the Boltzmann constant, which relates the temperature to the kinetic energy of the particles involved. We also know the shape of this function. For example, when we applied it to the density of the atmosphere at various heights (which are related to the potential energy, as P.E. = m·g·h), assuming constant temperature, we got the following graph. The shape of this graph is that of an exponential decay function (we’ll encounter it again, so just take a mental note of it).


A more interesting application is the quantum-mechanical approach to the theory of gases, which I introduced in my previous post. To explain the behavior of gases under various conditions, we assumed that gas molecules are like oscillators but that they can only take on discrete levels of energy. [That’s what quantum theory is about!] We denoted the various energy levels, i.e. the energies of the various molecular states, by E0, E1, E2,…, Ei,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state Ei is proportional to e^(−Ei/kT). We can then calculate the relative probabilities, i.e. the probability of being in state Ei, relative to the probability of being in state E0, is:

Pi/P0 = e^(−Ei/kT)/e^(−E0/kT) = e^(−(Ei−E0)/kT) = 1/e^((Ei−E0)/kT)

Now, Pi obviously equals ni/N, so it is the ratio of the number of molecules in state Ei (ni) and the total number of molecules (N). Likewise, P0 = n0/N and, therefore, we can write:

ni/n0 = e^(−(Ei−E0)/kT) = 1/e^((Ei−E0)/kT)

This formulation is just another Boltzmann Law, but it’s nice in that it introduces the idea of a ground state, i.e. the state with the lowest energy level. We may or may not want to equate E0 with zero. It doesn’t matter really: we can always shift all energies by some arbitrary constant because we get to choose the reference point for the potential energy.
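As a small numerical illustration of that ratio, and of the remark that shifting all energies by a constant changes nothing, here is a sketch in Python. The value of k is the SI value of the Boltzmann constant; the energies and the 300 K temperature are arbitrary numbers I picked for the example.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K, Boltzmann constant (SI value)

# n_i / n_0 = exp(-(E_i - E_0) / kT): population of state i relative
# to the ground state, for energies in joule and temperature in kelvin
def relative_population(e_i: float, e_0: float, temperature: float) -> float:
    return math.exp(-(e_i - e_0) / (BOLTZMANN_K * temperature))

T = 300.0                # K, illustrative temperature
kT = BOLTZMANN_K * T

# A state exactly kT above the ground state is populated 1/e as much
print("E_i - E_0 = kT  :", relative_population(kT, 0.0, T))
print("E_i - E_0 = 3kT :", relative_population(3 * kT, 0.0, T))

# Shifting both energies by the same constant leaves the ratio unchanged
shift = 5e-21  # J, arbitrary
print("shifted         :", relative_population(kT + shift, 0.0 + shift, T))
```

Only the energy difference Ei – E0 enters, which is why the choice of zero for the energy scale does not matter.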

So that’s the so-called Maxwell-Boltzmann distribution. Now, in my post on amplitudes and statistics, I had jotted down the formulas for the other distributions, i.e. the distributions when we’re not talking classical particles but fermions and/or bosons. As you know, fermions are particles governed by the Pauli exclusion principle: identical fermions cannot be together in the same state. For bosons, it’s the other way around: having one in some quantum state actually increases the chance of finding another one there, and we can actually have an infinite number of them in the same state.

We also know that fermions and bosons are the real world: fermions are the matter-particles, bosons are the force-carriers, and our ‘Boltzmann particles’ are nothing but a classical approximation of the real world. Hence, even if we can’t see them in the actual world, the Fermi-Dirac and Bose-Einstein distributions are the real-world distributions. :-) Let me jot down the equations once again:

Fermi-Dirac (for fermions): f(E) = 1/[A·e^{(E − E_F)/kT} + 1]

Bose-Einstein (for bosons): f(E) = 1/[A·e^{E/kT} − 1]

We’ve got some other normalization constant here (A), which we shouldn’t be too worried about, for the time being, that is. Now, to see how these distributions are different from the Maxwell-Boltzmann distribution (which we should re-write as f(E) = C·e^{−E/kT} = 1/[A·e^{E/kT}] so as to make all formulas directly comparable), we should just make a graph. Please go online to find a graph tool (I found a new one recently, and it’s really easy to use), and just do it. You’ll see they are all like that exponential decay function. However, in order to make a proper comparison, we would actually need to calculate the normalization coefficients and, for the Fermi-Dirac distribution, we would also need the Fermi energy E_F (note that, for simplicity, we did equate E_0 with zero). Now, we could give it a try, but it’s much easier to google and find an example online.

The HyperPhysics website of Georgia State University gives us one: the example assumes 6 particles sharing 9 units of energy (so the energy levels run from 0 to 9), and the table and graph below compare the Maxwell-Boltzmann and Bose-Einstein distributions for the model.

Graph Table

Now that is an interesting example, isn’t it? In this example (but all depends on its assumptions, of course), the Maxwell-Boltzmann and Bose-Einstein distributions are almost identical. Having said that, we can clearly see that the lower energy states are, indeed, more probable with Bose-Einstein statistics than with the Maxwell-Boltzmann statistics. While the difference is not dramatic at all in this example, the difference does become very dramatic, in reality, with large numbers (i.e. high matter density) and, more importantly, at very low temperatures, at which bosons can condense into the lowest energy state. This phenomenon is referred to as Bose-Einstein condensation: it causes superfluidity and superconductivity, and it’s real indeed: it has been observed with supercooled He-4, which is not an everyday substance, but real nevertheless!

What about the Fermi-Dirac distribution for this example? The Fermi-Dirac distribution is given below: the lowest energy state is now less probable, the mid-range energies much more, and none of the six particles occupy any of the four highest energy levels. Again, while the difference is not dramatic in this example, it can become very dramatic, in reality, with large numbers (read: high matter density) and very low temperatures: at absolute zero, all of the possible energy states up to the Fermi energy level will be occupied, and all the levels above the Fermi energy will be vacant.

graph 2 Table 2

What can we make out of all of this? First, you may wonder why we actually have more than one particle in one state above: doesn’t that contradict the Pauli exclusion principle? No. We need to distinguish micro- and macro-states. In fact, the example assumes we’re talking electrons here, and so we can have two particles in each energy state, provided they have opposite spin. At the same time, it’s true we cannot have three, or more, in any state. That results, in the example we’re looking at here, in five possible distributions only, as shown below.

Table 3

The diagram is an interesting one: if the particles were to be classical particles, or bosons, then 26 combinations are possible, including the five Fermi-Dirac combinations, as shown above. Note the little numbers above the 26 possible combinations (e.g. 6, 20, 30,… 180): they are proportional to the likelihood of occurring under the Maxwell-Boltzmann assumption (so if we assume the particles are ‘classical’ particles). Let me introduce you to the math behind the example by using the diagram below, which shows three possible distributions/combinations (I know the terminology is quite confusing—sorry for that!).

table 4

If we could distinguish the particles, then we’d have 2002 micro-states, which is the total of all those little numbers on top of the combinations that are shown (6+60+180+…). However, the assumption is that we cannot distinguish the particles. Therefore, the first combination in the diagram above, with five particles in the zero energy state and one particle in state 9, accounts for 6 micro-states out of 2002 and, hence, it has a probability of 6/2002 ≈ 0.003 only. In contrast, the second combination is 10 times more likely, and the third one is 30 times more likely! In any case, the point is, in the classical situation (and in the Bose-Einstein hypothesis as well), we have 26 possible macro-states, as opposed to 5 only for fermions, and so that leads to a very different density function. Capito?

No? Well, this blog is not a textbook on physics and, therefore, I should refer you to the mentioned site once again, which references a 1992 textbook on physics (Frank Blatt, Modern Physics, 1992) as the source of this example. However, I won’t do that: you’ll find the details in the Post Scriptum to this post. :-)

Let’s first focus on the fundamental stuff, however. The most burning question is: if the real world consists of fermions and bosons, why is it that we only see the Maxwell-Boltzmann distribution in our actual (non-real?) world? :-) The answer is that both the Fermi-Dirac and Bose-Einstein distributions approach the Maxwell–Boltzmann distribution if higher temperatures and lower particle densities are involved. In other words, we cannot see the Fermi-Dirac distributions (all matter is fermionic, except for weird stuff like superfluid helium-4 at 1 or 2 kelvin), but they are there!

Let’s approach it mathematically: the most general formula, encompassing both Fermi-Dirac and Bose-Einstein statistics, is:

N_i(E_i) ∝ 1/[e^{(E_i − μ)/kT} ± 1]

If you’d google, you’d find a formula involving an additional coefficient, g_i, which is the so-called degeneracy of the energy level E_i. I included it in the formula I used in the above-mentioned post of mine. However, I don’t want to make it any more complicated than it already is and, therefore, I omitted it this time. What you need to look at are the two terms in the denominator: e^{(E_i − μ)/kT} and ±1.

From a math point of view, it is obvious that the values of e^{(E_i − μ)/kT} + 1 (Fermi-Dirac) and e^{(E_i − μ)/kT} − 1 (Bose-Einstein) will approach each other if e^{(E_i − μ)/kT} is much larger than ±1, so if e^{(E_i − μ)/kT} >> 1. That’s the case, obviously, if the (E_i − μ)/kT ratio is large, so if (E_i − μ) >> kT. In fact, (E_i − μ) should, obviously, be much larger than kT for the lowest energy levels too! Now, the conditions under which that is the case are associated with the classical situation (such as a cylinder filled with gas, for example). Why?
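A quick numerical check of that statement, as a rough sketch: I write x for the ratio (E_i − μ)/kT, set any normalization constant to 1 (that is my assumption, purely to keep things simple), and compare 1/(e^x + 1), 1/e^x and 1/(e^x − 1) for a few values of x.

```python
import math

def occupation(x, sign):
    """Return 1/(e^x + sign), with x = (E - mu)/kT.

    sign = +1 gives Fermi-Dirac, sign = -1 gives Bose-Einstein,
    and sign = 0 gives the Maxwell-Boltzmann form 1/e^x.
    """
    return 1.0 / (math.exp(x) + sign)

rows = []
for x in (0.5, 1.0, 2.0, 5.0, 10.0):
    fd = occupation(x, +1)
    mb = occupation(x, 0)
    be = occupation(x, -1)
    rows.append((x, fd, mb, be))
    print(f"x={x:5.1f}  FD={fd:.6f}  MB={mb:.6f}  BE={be:.6f}")
```

For small x the three values differ substantially, with the Bose-Einstein value the largest and the Fermi-Dirac value the smallest. As x grows, the ±1 becomes negligible next to e^x and the three columns converge.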

Well… […] Again, I have to say that this blog can’t substitute for a proper textbook. Hence, I am afraid I have to leave it to you to do the necessary research to see why. :-) The non-mathematical approach is to simply note that quantum effects, i.e. the ±1 term, only matter if the concentration of particles is high enough. Indeed, quantum effects appear if the concentration of particles is higher than the so-called quantum concentration. Only when the quantum concentration is reached will particles start behaving according to what they are, i.e. as bosons or as fermions. At higher temperatures, that concentration will not be reached, except in massive objects such as a white dwarf (white dwarfs are stellar remnants with a mass like that of the Sun but a volume like that of the Earth). So, in general, we can say that at higher temperatures and at low concentrations we will not have any quantum effects. That should settle the matter, for now at least.

You’ll have one last question: we derived Boltzmann’s Law from the kinetic theory of gases, but how do we derive that N_i(E_i) = 1/[A·e^{(E_i − μ)/kT} ± 1] expression? Good question but, again, we’d need more than a few pages to explain that! The answer is: quantum mechanics, of course! Go check it out in Feynman’s third Volume of Lectures! :-)

Post scriptum: combinations, permutations and multiplicity

The mentioned example from HyperPhysics is really interesting, if only because it shows you also need to master a bit of combinatorics to get into quantum mechanics. Let’s go through the basics. If we have n distinct objects, we can order them in n! ways, with n! (read: n factorial) equal to n·(n–1)·(n–2)·…·3·2·1. Note that 0! is equal to 1, per definition. We’ll need that definition.

For example, a red, blue and green ball can be ordered in 3·2·1 = 6 ways. Each way is referred to as a permutation.

Besides permutations, we also have the concept of a k-permutation, which we can denote in a number of ways but let’s choose P(n, k). [The P stands for permutation here, not for probability.] P(n, k) is the number of ways to pick k objects out of a set of n objects. Again, the objects are supposed to be distinguishable. The formula is P(n, k) = n·(n–1)·(n–2)·…·(n–k+1) = n!/(n–k)!. That’s easy to understand intuitively: on your first pick you have n choices; on your second, n–1; on your third, n–2, etcetera. When n = k, we obviously get n! again.

There is a third concept: the k-combination (as opposed to the k-permutation), which we’ll denote by C(n, k). That’s when the order within our subset doesn’t matter: an ace, a queen and a jack taken out of some card deck are the same hand as a queen, a jack, and an ace: we don’t care about the order. If we have k objects, there are k! ways of ordering them and, hence, we just have to divide P(n, k) by k! to get C(n, k). So we write: C(n, k) = P(n, k)/k! = n!/[(n–k)!k!]. You recognize C(n, k): it’s the binomial coefficient.
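If you want to check these three counting rules by brute force, here is a small sketch using Python’s standard library. The three balls are the ones from the example above; choosing k = 2 is my own pick for illustration.

```python
import math
from itertools import combinations, permutations

balls = ["red", "blue", "green"]

orderings = list(permutations(balls))          # all n! orderings
ordered_pairs = list(permutations(balls, 2))   # k-permutations, k = 2
unordered_pairs = list(combinations(balls, 2)) # k-combinations, k = 2

n, k = len(balls), 2
print(len(orderings), math.factorial(n))       # n!
print(len(ordered_pairs), math.perm(n, k))     # P(n, k) = n!/(n-k)!
print(len(unordered_pairs), math.comb(n, k))   # C(n, k) = P(n, k)/k!
```

Each line prints the brute-force count next to the value from the formula, so the two numbers on a line should agree.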

Now, the HyperPhysics example illustrating the three mentioned distributions (Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac) is a bit more complicated: we need to distribute q units of energy over N particles. Every possible configuration is referred to as a micro-state, and the total number of possible micro-states is referred to as the multiplicity of the system, denoted by Ω(N, q). The formula for Ω(N, q) is another binomial coefficient: Ω(N, q) = (q+N–1)!/[q!(N–1)!]. For our example, Ω(6, 9) = (9+6–1)!/[9!(6–1)!] = 14!/(9!·5!) = 2002.

In our example, however, we do not have distinct particles and, therefore, we only have 26 macro-states (as opposed to 2002 micro-states), which are also referred to, confusingly, as distributions or combinations.

Now, the number of micro-states associated with the same macro-state is given by yet another formula: it is equal to N!/[n_0!·n_1!·n_2!·…·n_q!], with n_i the number of particles in level i. [See why we need the 0! = 1 definition? It ensures unoccupied states do not affect the calculation.] So that’s how we get those numbers 6, 60 and 180 for those three macro-states.
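The counting can be checked with a short script. This is my own sketch, not code from HyperPhysics: it assumes six particles sharing nine units of energy, represents a macro-state as a non-increasing tuple of the energy level of each particle, and the function names are made up for the occasion.

```python
import math
from collections import Counter

N_PARTICLES = 6
TOTAL_ENERGY = 9

def macro_states(n, energy, max_level=None):
    """Yield non-increasing tuples of n energy levels that sum to `energy`."""
    if max_level is None:
        max_level = energy
    if n == 0:
        if energy == 0:
            yield ()
        return
    for level in range(min(energy, max_level), -1, -1):
        for rest in macro_states(n - 1, energy - level, level):
            yield (level,) + rest

def multiplicity(state):
    """Micro-states per macro-state: N!/(n_0!·n_1!·...), particles distinguishable."""
    result = math.factorial(len(state))
    for count in Counter(state).values():
        result //= math.factorial(count)
    return result

states = list(macro_states(N_PARTICLES, TOTAL_ENERGY))
weights = [multiplicity(s) for s in states]

print(len(states))                          # number of macro-states
print(sum(weights))                         # total number of micro-states
print(math.comb(TOTAL_ENERGY + N_PARTICLES - 1, TOTAL_ENERGY))  # Ω(N, q)
print(multiplicity((9, 0, 0, 0, 0, 0)))     # five particles in level 0, one in level 9
```

The sum of the multiplicities over all macro-states should reproduce the binomial coefficient Ω(6, 9), which is a nice consistency check on both formulas.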

But how do we calculate those average numbers of particles for each energy level? In other words, how do we calculate the probability densities under the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein hypothesis respectively?

For the Maxwell-Boltzmann distribution, we proceed as follows: for each energy level j (or E_j, I should say), we calculate n_j = ∑ n_ij·P_i over all macro-states i. In this summation, n_ij is the number of particles in energy level j in macro-state i, while P_i is the probability of macro-state i, calculated as the ratio of (i) the number of micro-states associated with macro-state i and (ii) the total number of micro-states. For P_i, we gave the example of 6/2002 ≈ 0.3%. For 60 and 180, we get 60/2002 ≈ 3% and 180/2002 ≈ 9%. Calculating all the n_j’s for j ranging from 0 to 9 should yield the numbers and the graph below indeed.

M-B graph

OK. That’s how it works for Maxwell-Boltzmann. Now, it is obvious that the Fermi-Dirac and the Bose-Einstein distribution should not be calculated in the same way because, if they were, they would not be different from the Maxwell-Boltzmann distribution! The trick is as follows.

For the Bose-Einstein distribution, we give all macro-states equal weight, so that’s a weight of one, as shown below. Hence, the probability P_i is, quite simply, 1/26 ≈ 3.85% for all 26 macro-states. So we use the same n_j = ∑ n_ij·P_i formula but with P_i = 1/26.


Finally, I already explained how we get the Fermi-Dirac distribution: we can only have (i) one, (ii) two, or (iii) zero fermions for each energy level—not more than two! Hence, out of the 26 macro-states, only five are actually possible under the Fermi-Dirac hypothesis, as illustrated below once more. So it’s a very different distribution indeed!
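Here is a minimal sketch that puts the three recipes side by side. It assumes, as in the example, six particles sharing nine units of energy; it also assumes that the allowed Fermi-Dirac macro-states all get equal weight, which is how I read the example. The variable and function names are mine.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

N, Q = 6, 9  # six particles sharing nine units of energy

# A macro-state is an unordered assignment of levels 0..Q to the N particles.
macro = [s for s in combinations_with_replacement(range(Q + 1), N) if sum(s) == Q]

def multiplicity(state):
    """Number of micro-states N!/(n_0!·n_1!·...) behind one macro-state."""
    w = math.factorial(len(state))
    for count in Counter(state).values():
        w //= math.factorial(count)
    return w

def average_occupation(states, weights):
    """Return n_j = sum_i n_ij·P_i for j = 0..Q, with P_i = weight_i / total."""
    total = sum(weights)
    avg = [0.0] * (Q + 1)
    for state, w in zip(states, weights):
        for level, count in Counter(state).items():
            avg[level] += count * w / total
    return avg

# Maxwell-Boltzmann: weight each macro-state by its number of micro-states.
mb = average_occupation(macro, [multiplicity(s) for s in macro])
# Bose-Einstein: every macro-state gets the same weight.
be = average_occupation(macro, [1] * len(macro))
# Fermi-Dirac: keep only macro-states with at most two particles per level.
fd_states = [s for s in macro if max(Counter(s).values()) <= 2]
fd = average_occupation(fd_states, [1] * len(fd_states))

for level in range(Q + 1):
    print(level, round(mb[level], 3), round(be[level], 3), round(fd[level], 3))
```

Each column should add up to six particles, and the ground-state row shows the ordering discussed above: Bose-Einstein highest, Maxwell-Boltzmann in between, Fermi-Dirac lowest.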

Table 3

Now, you’ll probably still have questions. For example, why does the assumption, for the Bose-Einstein analysis, that macro-states have equal probability favor the lower energy states? The answer is that the model also integrates other constraints: first, when associating a particle with an energy level, we do not favor one energy level over another, so all energy levels have equal probability. However, at the same time, the whole system has some fixed energy level, and so we cannot put the particles in the higher energy levels only! At the same time, we know that, if we have N particles, and the probability of a particle having some energy level j is the same for all j, then they are likely not to be all at the same energy level: they’ll be distributed, effectively, as evidenced by the very low chance (0.3% only) of having 5 particles in the ground state and 1 particle at a higher level, as opposed to the 3% and 9% chance of the other two combinations shown in that diagram with three possible Maxwell-Boltzmann (MB) combinations.

So what happens when assigning an equal probability to all 26 possible combinations (with value 1/26) is that the combinations that were previously rather unlikely – because they did have a rather heavy concentration of particles in the ground state only – are now much more likely. So that’s why the Bose-Einstein distribution, in this example at least, is skewed towards the lowest energy level—as compared to the Maxwell-Boltzmann distribution, that is.

So that’s what’s behind it, and that should also answer the other question you surely have when looking at those five acceptable Fermi-Dirac configurations: why don’t we have the same five configurations starting from the top down, rather than from the bottom up? Now you know: such a configuration would have much higher energy overall, and so that’s not allowed under this particular model.

There’s also this other question: we said the particles were indistinguishable, but so then we suddenly say there can be two at any energy level, because their spin is opposite. It’s obvious this is rather ad hoc as well. However, if we’d allow only one particle at any energy level, we’d have no allowable combinations and, hence, we’d have no Fermi-Dirac distribution at all in this example.

In short, the example is rather intuitive, which is actually why I like it so much: it shows how bosonic and fermionic behavior appear rather gradually, as a consequence of variables that are defined at the system level, such as density, or temperature. So, yes, you’re right if you think the HyperPhysics example lacks rigor. That’s why I think it’s such a wonderful pedagogic device. :-)

The Quantum-Mechanical Gas Law

In my previous posts, it was mentioned repeatedly that the kinetic theory of gases is not quite correct: the experimentally measured values of the so-called specific heat ratio (γ) vary with temperature and, more importantly, their values differ, in general, from what classical theory would predict. It works, more or less, for noble gases, which do behave as ideal gases and for which γ is what the kinetic theory of gases would want it to be: γ = 5/3—but we get in trouble immediately, even for simple diatomic gases like oxygen or hydrogen, as illustrated below: the theoretical value is 9/7 (so that’s 1.286, more or less), but the measured value is very different.

Heat ratio

Let me quickly remind you how we get the theoretical number. According to classical theory, a diatomic molecule like oxygen can be represented as two atoms connected by a spring. Each of the atoms absorbs kinetic energy, and for each direction of motion (x, y and z), that energy is equal to kT/2, so the kinetic energy of both atoms – added together – is 2·3·kT/2 = 3kT. However, I should immediately add that not all of that energy is to be associated with the center-of-mass motion of the whole molecule, which determines the temperature of the gas: that energy is and remains equal to 3kT/2, always.

We also have rotational and vibratory motion. The molecule can rotate in two independent directions (and any combination of these directions, of course) and, hence, rotational motion is to absorb an amount of energy equal to 2·kT/2 = kT. Finally, the vibratory motion is to be analyzed as any other oscillation, so like a spring really. There is only one dimension involved and, hence, the kinetic energy here is just kT/2. However, we know that the total energy in an oscillator is the sum of the kinetic and potential energy, which adds another kT/2 term. Putting it all together, we find that the average energy for each diatomic particle is (or should be) equal to 3kT/2 + kT + kT/2 + kT/2 = 7·kT/2 = (7/2)kT.

Now, as mentioned above, the temperature of the gas (T) is proportional to the mean molecular energy of the center-of-mass motion only (in fact, that’s how temperature is defined), with the constant of proportionality equal to 3k/2. Hence, for monatomic ideal gases, we can write: U = N·(3k/2)T and, therefore, PV = NkT = (2/3)·U. Now, γ appears as follows in the ideal gas law: PV = (γ–1)U. Therefore, γ = 2/3 + 1 = 5/3, but so that’s for monatomic ideal gases only! The total internal energy of our diatomic gas is U = N·(7k/2)T and, therefore, PV = (2/7)·U. So γ must be γ = 2/7 + 1 = 9/7 ≈ 1.286 for diatomic gases, like oxygen and hydrogen.
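The arithmetic generalizes nicely: if each molecule carries f energy terms of kT/2 each, then U = N·(f/2)·kT and PV = NkT = (γ–1)U gives γ = 1 + 2/f. A tiny sketch with exact fractions (the function name is mine):

```python
from fractions import Fraction

def gamma_from_energy_terms(f):
    """Return gamma when U = N·(f/2)·kT and PV = NkT = (gamma - 1)·U."""
    return 1 + Fraction(2, f)

monatomic = gamma_from_energy_terms(3)  # three translational terms
diatomic = gamma_from_energy_terms(7)   # 3 translation + 2 rotation + 2 vibration

print(monatomic, diatomic, float(diatomic))
```

The more kT/2 terms a molecule can absorb, the closer γ gets to one.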

Phew! So that’s the theory. However, as we can see from the diagram, γ approaches that value only when we heat the gas to a few thousand degrees! So what’s wrong? One assumption is that certain kinds of motions “freeze out” as the temperature falls—although it’s kinda weird to think of something ‘freezing out’ at a thousand degrees Kelvin! In any case, at the end of the 19th century, that was the assumption that was advanced, very reluctantly, by scientists such as James Jeans. However, the mystery was about to be solved then, as Max Planck, even more reluctantly, presented his quantum theory of energy at the turn of the century itself.

But the quantum theory was confirmed and so we should now see how we can apply it to the behavior of gas. In my humble view, it’s a really interesting analysis, because we’re applying quantum theory here to a phenomenon that’s usually being analyzed as a classical problem only.

Boltzmann’s Law

We derived Boltzmann’s Law in our post on the First Principles of Statistical Mechanics. To be precise, we gave Boltzmann’s Law for the density of a gas (which we denoted by n = N/V)  in a force field, like a gravitational field, or in an electromagnetic field (assuming our gas particles are electrically charged, of course). We noted, however, Boltzmann’s Law was also applicable to much more complicated situations, like the one below, which shows a potential energy function for two molecules that is quite characteristic of the way molecules actually behave: when they come very close together, they repel each other but, at larger distances, there’s a force of attraction. We don’t really know the forces behind but we don’t need to: as long as these forces are conservative, they can combine in whatever way they want to combine, and Boltzmann’s Law will be applicable. [It should be obvious why. If you hesitate, just think of the definition of work and how it affects potential energy and all that. Work is force times distance, but when doing work, we’re also changing potential energy indeed! So if we’ve got a potential energy function, we can get all the rest.]

Boltzmann’s Law itself is illustrated by the graph below, which also gives the formula for it: n = n_0·e^{−P.E./kT}.


It’s a graph starting at n = n0 for P.E. = 0, and it then decreases exponentially. [Funny expression, isn’t it? So as to respect mathematical terminology, I should say that it decays exponentially.] In any case, if anything, Boltzmann’s Law shows the natural exponential function is quite ‘natural’ indeed, because Boltzmann’s Law pops up in Nature everywhere! Indeed, Boltzmann’s Law is not limited to functions of potential energy only. For example, Feynman derives another Boltzmann Law for the distribution of molecular speeds or, so as to ensure the formula is also valid in relativity, the distribution of molecular momenta. In case you forgot, momentum (p) is the product of mass (m) and velocity (u), and the relevant Boltzmann Law is:

f(p)·dp = C·e^{−K.E./kT}·dp

The argument is not terribly complicated but somewhat lengthy, and so I’ll refer you to the link for more details. As for the f(p) function (and the dp factor on both sides of the equation), that’s because we’re not talking exact values of p but some range equal to dp and some probability of finding particles that have a momentum within that range. The principle is illustrated below for molecular speeds (denoted by u = p/m), so we have a velocity distribution below. The illustration for p would look the same: just substitute p for u.


Boltzmann’s Law can be stated, much more generally, as follows:

The probability of different conditions of energy (E), potential or kinetic, is proportional to e^{−E/kT}

As Feynman notes, “This is a rather beautiful proposition, and a very easy thing to remember too!” It is, and we’ll need it for the next bit.

The quantum-mechanical theory of gases

According to quantum theory, energy comes in discrete packets, quanta, and any system, like an oscillator, will only have a discrete set of energy levels, i.e. states of different energy. An energy state is, obviously, a condition of energy and, hence, Boltzmann’s Law applies. More specifically, if we denote the various energy levels, i.e. the energies of the various molecular states, by E_0, E_1, E_2,…, E_i,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state E_i will be proportional to e^{−E_i/kT}.

Now, we know we’ve got some constant there, but we can get rid of that by calculating relative probabilities. For example, the probability of being in state E1, relative to the probability of being in state E0, is:

P_1/P_0 = e^{−E_1/kT}/e^{−E_0/kT} = e^{−(E_1 − E_0)/kT}

But the probability P_1 should, obviously, also be equal to the ratio n_1/N, i.e. the ratio of the number of molecules in state E_1 and the total number of molecules. Likewise, P_0 = n_0/N. Hence, P_1/P_0 = n_1/n_0 and, therefore, we can write:

n_1 = n_0·e^{−(E_1 − E_0)/kT}

What can we do with that? Remember we want to explain the behavior of non-monatomic gas—like diatomic gas, for example. Now we need some other assumption, obviously. As it turns out, the assumption that we can represent a system as some kind of oscillation still makes sense! In fact, the assumption that our diatomic molecule is like a spring is equally crucial to our quantum-theoretical analysis of gases as it is to our classical kinetic theory of gases. To be precise, in both theories, we look at it as a harmonic oscillator.

Don’t panic. A harmonic oscillator is, quite simply, a system that, when displaced from its equilibrium position, experiences some kind of restoring force. Now, for it to be harmonic, the force needs to be linear. (For example, when talking springs, the restoring force F will be proportional to the displacement x.) It basically means we can use a linear differential equation to analyze the system, like m·(d^2x/dt^2) = −kx. […] I hope you recognize this equation, because you should! It’s Newton’s Law: F = m·a with F = −k·x. If you remember the equation, you’ll also remember that harmonic oscillations were sinusoidal oscillations with a constant amplitude and a constant frequency. That frequency did not depend on the amplitude: because of the sinusoidal function involved, it was easier to write that frequency as an angular frequency, which we denoted by ω_0 and which, in the case of our spring, was equal to ω_0 = (k/m)^{1/2}. So it’s a property of the system. Indeed, ω_0 is the square root of the ratio of (1) k, which characterizes the spring (it’s its stiffness), and (2) m, i.e. the mass on the spring. Solving the differential equation yielded x = A·cos(ω_0·t + Δ) as a general solution, with A the (maximum) amplitude, and Δ some phase shift determined by our t = 0 point.

Let me quickly jot down two more formulas: the potential energy in the spring is kx^2/2, while its kinetic energy is mv^2/2, as usual (so the kinetic energy depends on the mass and its velocity, while the potential energy only depends on the displacement and the spring’s stiffness). Of course, kinetic and potential energy add up to the total energy of the system, which is constant and proportional to the square of the (maximum) amplitude: K.E. + P.E. = E ∝ A^2. To be precise, E = kA^2/2.
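A quick way to convince yourself that the total energy is constant and equal to kA^2/2 is to evaluate the solution at a few moments in time. The sketch below does that; the stiffness, mass, amplitude and phase are hypothetical values chosen only for illustration.

```python
import math

def oscillator_state(t, k, m, amplitude, phase):
    """Position and velocity for x(t) = A·cos(w0·t + phase), with w0 = sqrt(k/m)."""
    w0 = math.sqrt(k / m)
    x = amplitude * math.cos(w0 * t + phase)
    v = -amplitude * w0 * math.sin(w0 * t + phase)  # time derivative of x(t)
    return x, v

# Hypothetical values in SI units: stiffness, mass, amplitude, phase shift.
k, m, A, delta = 4.0, 1.0, 0.5, 0.3

energies = []
for t in (0.0, 0.7, 1.9, 3.2):
    x, v = oscillator_state(t, k, m, A, delta)
    total = 0.5 * k * x**2 + 0.5 * m * v**2  # P.E. + K.E.
    energies.append(total)
    print(f"t={t:.1f}  E={total:.6f}  kA^2/2={0.5 * k * A**2:.6f}")
```

The potential and kinetic parts trade places as the mass moves, but their sum stays put.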

That’s simple enough. Let’s get back to our molecular oscillator. While the total energy of an oscillator in classical theory can take on any value, Planck challenged that assumption: according to quantum theory, it can only take up energies equal to ħω at a time. [Note that we use the so-called reduced Planck constant here (i.e. h-bar), because we’re dealing with angular frequencies.] Hence, according to quantum theory, we have an oscillator with equally spaced energy levels, and the difference between them is ħω. Now, ħω is terribly tiny—but it’s there. Let me visualize what I just wrote:


So our expression for P_1/P_0 becomes P_1/P_0 = e^{−ħω/kT}/e^{−0/kT} = e^{−ħω/kT}. More generally, we have P_i/P_0 = e^{−i·ħω/kT}. So what? Well… We’ve got a function here which gives the chance of finding a molecule in state E_i relative to that of finding it in state E_0, and it’s a function of temperature. Now, the graph below illustrates the general shape of that function. It’s a bit peculiar, but you can see that the relative probability goes up and down with temperature. The graph makes it clear that, at extremely low temperatures, most particles will be in state E_0 and, of course, the internal energy of our body of gas will be close to nil.


Now, we can look at the oscillators in the bottom state (i.e. particles in the molecular energy state E0) as being effectively ‘frozen’: they don’t contribute to the specific heat. However, as we increase the temperature, our molecules gradually begin to have an appreciable probability to be in the second state, and then in the next state, and so on, and so the internal energy of the gas increases effectively. Now, when the probability is appreciable for many states, the quantized states become nearly indistinguishable and, hence, the situation is like classical physics: it is nearly indistinguishable from a continuum of energies.
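To see the ‘freezing out’ in numbers, here is a rough sketch. It works with the dimensionless ratio x = ħω/kT, so no actual molecular frequency is assumed, and it cuts the ladder of levels off at 200 states, which is my own simplification (the neglected tail is negligible for the values of x used here).

```python
import math

def level_probabilities(x, n_levels=200):
    """Normalized probabilities for levels E_i = i·ħω, with x = ħω/kT."""
    weights = [math.exp(-i * x) for i in range(n_levels)]
    total = sum(weights)
    return [w / total for w in weights]

results = {}
for x in (10.0, 1.0, 0.1):  # from cold (kT << ħω) to hot (kT >> ħω)
    p = level_probabilities(x)
    results[x] = p
    print(f"ħω/kT={x:5.1f}  P0={p[0]:.4f}  P1/P0={p[1] / p[0]:.4f}")
```

When kT is small compared to ħω, practically every oscillator sits in the bottom state. When kT is large compared to ħω, the probability is spread thinly over many levels, which is the near-continuum situation described above.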

Now, while you can imagine such analysis should explain why the specific heat ratio for oxygen and hydrogen varies as it does in the very first graph of this post, you can also imagine the details of that analysis fill quite a few pages! In fact, even Feynman doesn’t include it in his Lectures. What he does include is the analysis of the blackbody radiation problem, which is remarkably similar. So… Well… For more details on that, I’ll refer you to Feynman indeed. :-)

I hope you appreciated this little ‘lecture’, as it sort of wraps up my ‘series’ of posts on statistical mechanics, thermodynamics and, central to both, the classical theory of gases. Have fun with it all!

Entropy, energy and enthalpy

Phew! I am quite happy I got through Feynman’s chapters on thermodynamics. Now is a good time to review the math behind it. We thoroughly understand the gas equation now:

PV = NkT = (γ–1)U

The gamma (γ) in this equation is the specific heat ratio: it’s 5/3 for monatomic ideal gases (so that’s about 1.667) and, theoretically, 4/3 ≈ 1.333 or 9/7 ≈ 1.286 for diatomic gases, depending on the degrees of freedom we associate with diatomic molecules. More complicated molecules have even more degrees of freedom and, hence, can absorb even more energy, so γ gets closer to one, according to the kinetic gas theory, that is. While we know that the kinetic gas theory is not quite accurate – an approach involving molecular energy states is a better match for reality – that doesn’t matter here. As for the term (specific heat ratio), I’ll explain that later. [I promise. :-) You’ll see it’s quite logical.]

The point to note is that this body of gas (or whatever substance) stores an amount of energy U that is directly proportional to the temperature (T), and Nk/(γ–1) is the constant of proportionality. We can also phrase it the other way around: the temperature is directly proportional to the energy, with (γ–1)/Nk the constant of proportionality. It means temperature and energy are in a linear relationship. [Yes, direct proportionality implies linearity.] The graph below shows the T = [(γ–1)/Nk]·U relationship for three different values of γ, ranging from 5/3 (i.e. the maximum value, which characterizes monatomic noble gases such as helium, neon or krypton) to a value close to 1, which is characteristic of more complicated molecular arrangements indeed, such as heptane (γ = 1.06) or methyl butane (γ = 1.08). The illustration shows that, unlike monatomic gas, more complicated molecular arrangements allow the gas to absorb a lot of (heat) energy with a relatively moderate rise in temperature only.
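To put some numbers on that, here is a small sketch of the T = [(γ–1)/Nk]·U relationship. Taking one mole of gas and 1000 joules of internal energy is my own choice for illustration; the γ values are the ones quoted above.

```python
K_B = 1.380649e-23   # Boltzmann constant, J/K
N_A = 6.02214076e23  # Avogadro constant, 1/mol

def temperature(u, gamma, n_molecules=N_A):
    """T = (gamma - 1)·U/(N·k), for one mole of gas by default."""
    return (gamma - 1.0) * u / (n_molecules * K_B)

gases = (("monatomic", 5 / 3), ("methyl butane", 1.08), ("heptane", 1.06))
temps = {}
for name, gamma in gases:
    temps[name] = temperature(1000.0, gamma)
    print(f"{name:14s} gamma={gamma:.3f}  T={temps[name]:.1f} K for U = 1000 J")
```

For the same amount of stored energy, the gas with γ close to one ends up at a much lower temperature than the monatomic gas.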

We’ll soon encounter another variable, enthalpy (H), which is also linearly related to energy: H = γU. From a math point of view, these linear relationships don’t mean all that much: they just show these variables – temperature, energy and enthalpy – are all directly related and, hence, can be defined in terms of each other.

We can invent other variables, like the Gibbs energy, or the Helmholtz energy. In contrast, entropy, while often being mentioned as just some other state function, is something different altogether. In fact, the term ‘state function’ causes a lot of confusion: pressure and volume are state variables too. The term is used to distinguish these variables from so-called process functions, notably heat and work. Process functions describe how we go from one equilibrium state to another, as opposed to the state variables, which describe the equilibrium situation itself. Don’t worry too much about the distinction—for now, that is.

Let’s look at non-linear stuff. The PV = NkT = (γ–1)U equation says that, for a given U (or T), pressure (P) and volume (V) are inversely proportional one to another, and so that’s a non-linear relationship. [Yes, inverse proportionality is non-linear.] To help you visualize things, I inserted a simple volume-pressure diagram below, which shows how pressure and volume are related for three different values of U (or, what amounts to the same, three different values of T).

graph 2

The curves are simple hyperbolas which have the x- and y-axis as horizontal and vertical asymptote respectively. If you’ve studied social sciences (like me!) – so if you know a tiny little bit of the ‘dismal science’, i.e. economics (like me!) – you’ll note they look like indifference curves. The x- and y-axis then represent the quantity of some good X and some good Y respectively, and the curves closer to the origin are associated with lower utility. How much X and Y we will buy then, depends on (a) their price and (b) our budget, which we represented by a linear budget line tangent to the curve we can reach with our budget, and then we are a little bit happy, very happy or extremely happy, depending on our budget. Hence, our budget determines our happiness. From a math point of view, however, we can also look at it the other way around: our happiness determines our budget. [Now that‘s a nice one, isn’t it? Think about it! :-) And, in the process, think about hyperbolas too: the y = 1/x function holds the key to understanding both infinity and nothingness. :-)]

U is a state function but, as mentioned above, we’ve got quite a few state variables in physics. Entropy, of course, denoted by S—and enthalpy too, denoted by H. Let me remind you of the basics of the entropy concept:

  1. The internal energy U changes because (a) we add or remove some heat from the system (ΔQ), (b) because some work is being done (by the gas on its surroundings or the other way around), or (c) because of both. Using the differential notation, we write: dU = dQ – dW, always. The (differential) work that’s being done is PdV. Hence, we have dU = dQ – PdV.
  2. When transferring heat to a system at a certain temperature, there’s a quantity we refer to as the entropy. Remember that illustration of Feynman’s in my post on entropy: we go from one point to another on the temperature-volume diagram, taking infinitesimally small steps along the curve, and, at each step, an infinitesimal amount of work dW is done, and an infinitesimal amount of entropy dS = dQ/T is being delivered.
  3. The total change in entropy, ΔS, is a line integral: ΔS = ∫dQ/T = ∫dS.
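To make the line integral a bit less abstract, here is a small Python sketch for the simplest case I can think of: an ideal gas expanding at constant temperature. On an isotherm the internal energy does not change, so dQ = PdV = (NkT/V)dV and, hence, dS = Nk·dV/V. The choice of units (Nk = 1) and the doubling of the volume are just assumptions for the example. Summing the little dS steps should give Nk·ln(V2/V1).

```python
import math


def entropy_change_isothermal(nk: float, v1: float, v2: float, steps: int = 100_000) -> float:
    """Sum dS = dQ/T along an isotherm of an ideal gas (midpoint rule).

    At constant T, dU = 0, so dQ = P dV = (nk*T/V) dV and dS = nk * dV / V.
    """
    dv = (v2 - v1) / steps
    total = 0.0
    for i in range(steps):
        v_mid = v1 + (i + 0.5) * dv   # volume in the middle of this small step
        total += nk * dv / v_mid
    return total


# Illustrative units with Nk = 1: the volume doubles at constant temperature.
delta_s = entropy_change_isothermal(nk=1.0, v1=1.0, v2=2.0)
print(delta_s, math.log(2.0))  # the two numbers should agree closely
```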

That’s somewhat tougher to understand than economics, and so that’s why it took me more time to come to terms with it. :-) Just go through Feynman’s Lecture on it, or through that post I referenced above. If you don’t want to do that, then just note that, while entropy is a very mysterious concept, it’s deceptively simple from a math point of view: ΔS = ΔQ/T, so the (infinitesimal) change in entropy is, quite simply, the ratio of (1) the (infinitesimal or incremental) amount of heat that is being added or removed as the system goes from one state to another through a reversible process and (2) the temperature at which the heat is being transferred. However, I am not writing this post to discuss entropy once again. I am writing it to give you an idea of the math behind the system.

So dS = dQ/T. Hence, we can re-write dU = dQ – dW as:

dU = TdS – PdV ⇔ dU + d(PV) = TdS – PdV + d(PV)

⇔ d(U + PV) = dH = TdS – PdV + PdV + VdP = TdS + VdP

The U + PV quantity on the left-hand side of the equation is the so-called enthalpy of the system, which I mentioned above. It’s denoted by H indeed, and it’s just another state variable, like energy: same-same but different, as they say in Asia. We encountered it in our previous post also, where we said that chemists prefer to analyze the behavior of substances using temperature and pressure as ‘independent variables’, rather than temperature and volume. Independent variables? What does that mean, exactly?

According to the PV = NkT equation, we only have two independent variables: if we assign some value to two variables, we’ve got a value for the third one. Indeed, remember that other equation we got when we took the total differential of U. We wrote U as U(V, T) and, taking the total differential, we got:

dU = (∂U/∂T)dT + (∂U/∂V)dV

We did not need to add a (∂U/∂P)dP term, because the pressure is determined by the volume and the temperature. We could also have written U = U(P, T) and, therefore, that dU = (∂U/∂T)dT + (∂U/∂P)dP. However, when working with temperature and pressure as the ‘independent’ variables, it’s easier to work with H rather than U. The point to note is that it’s all quite flexible really: we have two independent variables in the system only. The third one (and all of the other variables really, like energy or enthalpy or whatever) depends on the other two. In other words, from a math point of view, we only have two degrees of freedom in the system here: only two variables are actually free to vary. :-)

Let’s look at that dH = TdS + VdP equation. That’s a differential equation in which not temperature and pressure, but entropy (S) and pressure (P) are ‘independent’ variables, so we write:

dH(S, P) = TdS + VdP

Now, it is not very likely that we will have some problem to solve with data on entropy and pressure. At our level of understanding, any problem that’s likely to come our way will probably come with data on more common variables, such as the heat, the pressure, the temperature, and/or the volume. So we could continue with the expression above but we don’t do that. It makes more sense to re-write the expression substituting TdS for dQ once again, so we get:

dH = dQ + VdP

That resembles our dU = dQ – PdV expression: it just substitutes V for –P. And, yes, you guessed it: it’s because the two expressions resemble each other that we like to work with H now. :-) Indeed, we’re talking the same system and the same infinitesimal changes and, therefore, we can use all the formulas we derived already by just substituting H for U, V for –P, and dP for dV. Huh? Yes. It’s a rather tricky substitution. If we switch V for –P (or vice versa) in a partial derivative involving T, we also need to include the minus sign. However, we do not need to include the minus sign when substituting dV and dP, and we also don’t need to change the sign of the partial derivatives of U and H when going from one expression to another! It’s a subtle and somewhat weird point, but a very important one! I’ll explain it in a moment. Just continue to read as for now. Let’s do the substitution using our rules:

dU = (∂Q/∂T)_V·dT + [T·(∂P/∂T)_V − P]·dV becomes:

dH = (∂Q/∂T)_P·dT + (∂H/∂P)_T·dP = CPdT + [–T·(∂V/∂T)_P + V]·dP

Note that, just as we referred to (∂Q/∂T)_V as the specific heat capacity of a substance at constant volume, which we denoted by CV, we now refer to (∂Q/∂T)_P as the specific heat capacity at constant pressure, which we’ll denote, logically, as CP. Dropping the subscripts of the partial derivatives, we re-write the expression above as:

dH = CPdT + [–T·(∂V/∂T) + V]·dP

So we’ve got what we wanted: we switched from an expression involving derivatives assuming constant volume to an expression involving derivatives assuming constant pressure. [In case you wondered what we wanted, this is it: we wanted an equation that helps us to solve another type of problem—another formula for a problem involving a different set of data.]

As mentioned above, it’s good to use subscripts with the partial derivatives to emphasize what changes and what is constant when calculating those partial derivatives but, strictly speaking, it’s not necessary, and you will usually not find the subscripts when googling other texts. For example, in the Wikipedia article on enthalpy, you’ll find the expression written as:

dH = CPdT + V(1–αT)dP with α = (1/V)(∂V/∂T)

Just write it all out and you’ll find it’s the same thing, exactly. It just introduces another coefficient, α, i.e. the coefficient of (cubic) thermal expansion. If you find this formula is easier to remember, then please use this one. It doesn’t matter.

Now, let’s explain that funny business with the minus signs in the substitution. I’ll do so by going back to that infinitesimal analysis of the reversible cycle in my previous post, in which we had that formula involving ΔQ for the work done by the gas during an infinitesimally small reversible cycle: ΔW = ΔVΔP = ΔQ·(ΔT/T). Now, we can either write that as:

  1. ΔQ = T·(ΔP/ΔT)·ΔV = dQ = T·(∂P/∂T)_V·dV, which is what we did for our analysis of (∂U/∂V)_T or, alternatively, as
  2. ΔQ = T·(ΔV/ΔT)·ΔP = dQ = T·(∂V/∂T)_P·dP, which is what we’ve got to do here, for our analysis of (∂H/∂P)_T.

Hence, dH = dQ + VdP becomes dH = T·(∂V/∂T)_P·dP + V·dP, and dividing all by dP gives us what we want to get: dH/dP = (∂H/∂P)_T = T·(∂V/∂T)_P + V.

[…] Well… NO! We don’t have the minus sign in front of T·(∂V/∂T)_P, so we must have done something wrong or, else, that formula above is wrong.

The formula is right (it’s in Wikipedia, so it must be right :-)), so we are wrong. Indeed! The thing is: substituting dT, dV and dP for ΔT, ΔV and ΔP is somewhat tricky. The geometric analysis (illustrated below) makes sense but we need to watch the signs.

Carnot 2

We’ve got a volume increase, a temperature drop and, hence, also a pressure drop over the cycle: the volume goes from V to V+ΔV (and then back to V, of course), while the pressure and the temperature go from P to P–ΔP and T to T–ΔT respectively (and then back to P and T, of course). Hence, we should write: ΔV = dV, –ΔT = dT, and –ΔP = dP. Therefore, as we replace the ratio of the infinitesimal changes of volume and temperature, ΔV/ΔT, by a proper derivative (i.e. ∂V/∂T), we should add a minus sign: ΔV/ΔT = –∂V/∂T. [The ΔP/ΔT ratio we used for dU picks up two minus signs, which cancel, and that’s why the sign did not change there.] Now that gives us what we want: dH/dP = (∂H/∂P)_T = –T·(∂V/∂T)_P + V, and, therefore, we can, indeed, write what we wrote above:

dU = (∂Q/∂T)_V·dT + [T·(∂P/∂T)_V − P]·dV becomes:

dH = (∂Q/∂T)_P·dT + [–T·(∂V/∂T)_P + V]·dP = CPdT + [–T·(∂V/∂T)_P + V]·dP
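If you don’t trust the sign argument, a quick numerical sanity check may help. The sketch below assumes an ideal gas (V = NkT/P, in illustrative units with Nk = 1), for which the enthalpy depends on the temperature only, so the coefficient of dP must vanish. It estimates the derivative of V with respect to T at constant P by a finite difference and compares the bracket with the minus sign to the one with the plus sign.

```python
def volume(nk: float, t: float, p: float) -> float:
    """Ideal-gas volume from PV = NkT."""
    return nk * t / p


def dv_dt_at_constant_p(nk: float, t: float, p: float, h: float = 1e-3) -> float:
    """Central finite difference for the derivative of V to T at constant P."""
    return (volume(nk, t + h, p) - volume(nk, t - h, p)) / (2 * h)


nk, t, p = 1.0, 300.0, 2.0       # illustrative values, units with Nk = 1
v = volume(nk, t, p)
slope = dv_dt_at_constant_p(nk, t, p)

with_minus = -t * slope + v      # the bracket as written in the dH formula
with_plus = t * slope + v        # the bracket we first (wrongly) obtained
print(with_minus, with_plus)     # close to zero versus close to 2*V
```

Only the version with the minus sign gives zero, which is what we need for an ideal gas.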

Now, in case you still wonder: what’s the use of all these different expressions stating the same? The answer is simple: it depends on the problem and what information we have. Indeed, note that all derivatives we use in our expression for dH assume constant pressure, so if we’ve got that kind of data, we’ll use the chemists’ representation of the system. If we’ve got data describing performance at constant volume, we’ll need the physicists’ formulas, which are given in terms of derivatives assuming constant volume. It all looks complicated but, in the end, it’s the same thing: the PV = NkT equation gives us two ‘independent’ variables and one ‘dependent’ variable. Which one is which will determine our approach.

Now, we left one thing unexplained. Why do we refer to γ as the specific heat ratio? The answer is: it is the ratio of the specific heat capacities indeed, so we can write:

γ = CP/CV

However, it is important to note that that’s valid for ideal gases only. In that case, we know that the (∂U/∂V)_T derivative in our dU = (∂U/∂T)_V·dT + (∂U/∂V)_T·dV expression is zero: we can change the volume, but if the temperature remains the same, the internal energy remains the same. Hence, dU = (∂U/∂T)_V·dT = CVdT, and dU/dT = CV. Likewise, the (∂H/∂P)_T derivative in our dH = (∂H/∂T)_P·dT + (∂H/∂P)_T·dP expression is zero, for ideal gases, that is. Hence, dH = (∂H/∂T)_P·dT = CPdT, and dH/dT = CP. Hence,

CP/CV = (dH/dT)/(dU/dT) = dH/dU

Does that make sense? If dH/dU = γ, then H must be some linear function of U. More specifically, H must be some function H = γU + c, with c some constant (it’s the so-called constant of integration). Now, γ is supposed to be constant too, of course. That’s all perfectly fine: indeed, combining the definition of H (H = U + PV), and using the PV = (γ–1)U relation, we have H = U + (γ–1)U = γU (hence, c = 0). So, yes, dH/dU = γ, and γ = CP/CV.
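The same check can be done with numbers. The sketch below assumes a constant γ and units in which Nk = 1 (both are assumptions for the illustration): it builds U from PV = NkT = (γ–1)U, builds H as U + PV, and then compares the ratio of the two slopes (CP/CV) and the ratio H/U with γ itself.

```python
def internal_energy(nk: float, t: float, gamma: float) -> float:
    """U from PV = NkT = (gamma - 1) * U."""
    return nk * t / (gamma - 1.0)


def enthalpy(nk: float, t: float, gamma: float) -> float:
    """H = U + PV, with PV = NkT."""
    return internal_energy(nk, t, gamma) + nk * t


nk, t, dt = 1.0, 300.0, 1.0      # illustrative values, units with Nk = 1
for gamma in (5.0 / 3.0, 4.0 / 3.0):
    # U and H are linear in T here, so a simple difference gives the exact slope
    c_v = (internal_energy(nk, t + dt, gamma) - internal_energy(nk, t, gamma)) / dt
    c_p = (enthalpy(nk, t + dt, gamma) - enthalpy(nk, t, gamma)) / dt
    h_over_u = enthalpy(nk, t, gamma) / internal_energy(nk, t, gamma)
    print(gamma, c_p / c_v, h_over_u)   # all three numbers should coincide
```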

Note the qualifier, however: we’re assuming γ is constant (which does not imply the gas has to be ideal, so the interpretation is less restrictive than you might think it is). If γ is not a constant, it’s a different ballgame. […] So… Is γ actually constant? The illustration below shows γ is not constant for common diatomic gases like hydrogen or (somewhat less common) oxygen. It’s the same for other gases: when mentioning γ, we need to state the temperature at which we measured it too. :-( However, the illustration also shows the assumption of γ being constant holds fairly well if temperature varies only slightly (like plus or minus 100° C), so that’s OK. :-)

Heat ratio

I told you so: the kinetic gas theory is not quite accurate. An approach involving molecular energy states works much better (and is actually correct, as it’s consistent with quantum theory). But so we are where we are and I’ll save the quantum-theoretical approach for later. :-)

So… What’s left? Well… If you google the Wikipedia article on enthalpy in order to check that I am not writing nonsense, you’ll find it gives γ as the ratio of H and U itself: γ = H/U. That’s not wrong, obviously (γ = H/U = γU/U = γ), but that formula doesn’t really explain why γ is referred to as the specific heat ratio, which is what I wanted to do here.

OK. We’ve covered a lot of ground, but let’s reflect some more. We did not say a lot about entropy, and/or the relation between energy and entropy. Too bad… The relationship between entropy and energy is obviously not so simple as between enthalpy and energy. Indeed, because of that easy H = γU relationship, enthalpy emerges as just some auxiliary variable: some temporary variable we need to calculate something. Entropy is, obviously, something different. Unlike enthalpy, entropy involves very complicated thinking, involving (ir)reversibility and all that. So it’s quite deep, I’d say – but I’ll write more about that later. I think this post has gone as far as it should. :-)

Entropy, energy and enthalpy

Is gas a reversible engine?

We’ve worked on very complicated matters in the previous posts. In this post, I am going to tie up a few loose ends, not only about the question in the title but also other things. Let me first review a few concepts and constructs.


We’ve talked a lot about temperature, but what is it really? You have an answer ready of course: it is the mean kinetic energy of the molecules of a gas or a substance. You’re right. To be precise, it is the mean kinetic energy of the center-of-mass (CM) motions of the gas molecules.

The added precision in the definition above already points out temperature is not just mean kinetic energy or, to put it differently, that the concept of mean kinetic energy itself is not so simple when we are not talking ideal gases. So let’s be precise indeed. First, let me jot down the formula for the mean kinetic energy of the CM motions of the gas particles:

(K.E.)CM = <(1/2)·mv²>

Now let’s recall the most fundamental law in the kinetic theory of gases, which states that the mean value of the kinetic energy for each independent direction of motion will be equal to kT/2. [I know you know the kinetic theory of gases itself is not accurate – we should be talking about molecular energy states – but let’s go along with it.] Now, because we have only three independent directions of motions (the x, y and z directions) for ideal gas molecules (or atoms, I should say), the mean kinetic energy of the gas particles is kT/2 + kT/2 + kT/2 = 3kT/2.

What’s going on here is that we are actually defining temperature here: we basically say that the kinetic energy is linearly proportional to something that we define as the temperature. For practical reasons, that constant of proportionality is written as 3k/2, with k the Boltzmann constant. So we write our definition of temperature as:

(K.E.)CM = 3kT/2 ⇔ T = (3k/2)⁻¹·<(1/2)·mv²> = [2/(3k)]·(K.E.)CM
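Here is how that definition works as a recipe, in a rough Python sketch. Note the assumptions: I draw each velocity component from a Gaussian distribution with variance kT/m (the usual Maxwell-Boltzmann picture, which the text above does not derive), and I take a particle mass close to that of a helium atom. The function then recovers the temperature from nothing but the mean kinetic energy of the sample.

```python
import math
import random

K_BOLTZMANN = 1.380649e-23   # Boltzmann constant, J/K
MASS = 6.6464731e-27         # kg, roughly one helium atom (illustrative choice)


def sample_velocity(temperature: float, rng: random.Random) -> tuple:
    """Draw one velocity vector; each component is Gaussian with variance kT/m (assumed)."""
    sigma = math.sqrt(K_BOLTZMANN * temperature / MASS)
    return tuple(rng.gauss(0.0, sigma) for _ in range(3))


def temperature_from_velocities(velocities: list) -> float:
    """T = [2/(3k)] * <(1/2) m v^2>, the definition used in the text."""
    mean_ke = sum(
        0.5 * MASS * (vx * vx + vy * vy + vz * vz) for vx, vy, vz in velocities
    ) / len(velocities)
    return 2.0 * mean_ke / (3.0 * K_BOLTZMANN)


rng = random.Random(0)       # fixed seed so the run is repeatable
velocities = [sample_velocity(300.0, rng) for _ in range(50_000)]
estimated = temperature_from_velocities(velocities)
print(estimated)             # should land near 300 K, up to sampling noise
```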

What happens with temperature when considering more complex gases, such as diatomic gases? Nothing. The temperature will still be proportional to the kinetic energy of the center-of-mass motions, but we should just note it’s the (K.E.)CM of the whole diatomic molecule, not of the individual atoms. The thing with more complicated arrangements is that, when adding or removing heat, we’ve got something else going on too: part of the energy will go into the rotational and vibratory motions inside the molecule, which is why we’ll need to add a lot more heat in order to achieve the same change in temperature or, vice versa, we’ll be able to extract a lot more heat out of the gas – as compared to an ideal gas, that is – for the same drop in temperature. [When talking molecular energy states, rather than independent directions of motions, we’re saying the same thing: energy does not only go in center-of-mass motion but somewhere else too.]

You know the ideal gas law is based on the reasoning above and the PV = NkT equation, which is always valid. For ideal gases, we write:

PV = NkT = Nk·(3k/2)⁻¹·<(1/2)·mv²> = (2/3)·N·<(1/2)·mv²> = (2/3)·U

For diatomic gases, we have to use another coefficient. According to our theory above, which distinguishes 6 independent directions of motions, the mean kinetic energy is twice 3kT/2 now, so that’s 3kT, and, hence, we write T = (3k)⁻¹·<K.E.> and, therefore:

PV = NkT = Nk·(3k)⁻¹·<K.E.> = (1/3)·N·<K.E.> = (1/3)·U

The two equations above will usually be written as PV = (γ–1)U, so γ, which is referred to as the specific heat ratio, would be equal to 5/3 ≈ 1.67 for ideal gases and 4/3 ≈ 1.33 for diatomic gases. [If you read my previous posts, you’ll note I used 9/7 ≈ 1.286, but that’s because Feynman suddenly decides to add the potential energy of the oscillator as another ‘independent direction of motion’.]
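The three values of γ follow from one and the same bit of arithmetic. If there are f independent directions of motion, then U = N·f·kT/2, so PV = NkT = (2/f)·U and, therefore, γ – 1 = 2/f. The helper name below is my own; the little Python sketch just does the fractions exactly.

```python
from fractions import Fraction


def gamma_from_directions(f: int) -> Fraction:
    """Specific heat ratio from PV = (gamma - 1) U with U = N * f * kT / 2."""
    return 1 + Fraction(2, f)


for f in (3, 6, 7):
    # 3: point-like atoms, 6: diatomic as counted in this post,
    # 7: diatomic when the oscillator's potential energy is counted too
    print(f, gamma_from_directions(f), float(gamma_from_directions(f)))
```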

Now, if we’re not adding or removing heat to/from the gas, we can do a differential analysis yielding a differential equation (what did you expect?), which we can then integrate to find that P = C/V^γ relationship. You’ve surely seen it before. The C is some constant related to the energy and/or the state of the gas. It is actually interesting to plot the pressure-volume relationship using that P = C/V^γ relationship for various values of γ. The blue graph below assumes γ = 5/3 ≈ 1.667, which is the theoretical value for ideal gases (γ for helium or krypton comes pretty close to that), while the red graph gives the same relationship for γ = 4/3 ≈ 1.33, which is the theoretical value for diatomic gases (gases like bromine and iodine have a γ that’s close to that).

graph 1

Let me repeat that this P = C/V^γ relationship is only valid for adiabatic expansion or compression: we do not add or remove heat and, hence, this P = C/V^γ function gives us the adiabatic segments only in a Carnot cycle (i.e. the adiabatic lines in a pressure-volume diagram). Now, it is interesting to observe that the slope of the adiabatic line for the ideal gas is more negative than the slope of the adiabatic line for the diatomic gas: the blue curve is the steeper one. That’s logical: for the same volume change, we should get a bigger drop in pressure for the ideal gas, as compared to the diatomic gas, because… Well… You see the logic, don’t you?
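A few numbers make the comparison concrete. The sketch below runs two adiabats through the same starting point (P0 = 1, V0 = 1, arbitrary units chosen for the illustration) and looks at the pressure after the volume doubles, next to what an isotherm would give. The slope formula dP/dV = –γ·P/V is just the derivative of P = C/V^γ.

```python
def adiabat_pressure(p0: float, v0: float, v: float, gamma: float) -> float:
    """Pressure on the adiabat P = C / V**gamma that passes through (v0, p0)."""
    return p0 * (v0 / v) ** gamma


def adiabat_slope(p: float, v: float, gamma: float) -> float:
    """dP/dV on an adiabat: differentiating P = C / V**gamma gives -gamma * P / V."""
    return -gamma * p / v


p0, v0 = 1.0, 1.0                     # common starting point, arbitrary units
for gamma in (5.0 / 3.0, 4.0 / 3.0):
    p_after = adiabat_pressure(p0, v0, 2.0 * v0, gamma)
    print(gamma, adiabat_slope(p0, v0, gamma), p_after)

print("isotherm:", p0 * v0 / (2.0 * v0))  # pressure after doubling V at constant T
```

The larger γ gives the more negative slope and the lower pressure after the expansion, and both adiabats drop below the isotherm.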

Let’s freewheel a bit and see what it implies for our Carnot cycle.

Carnot engines with ideal and non-ideal gas

We know that, if we could build an ideal frictionless gas engine (using a cylinder with a piston or whatever other device we can think of), its efficiency will be determined by the amount of work it can do over a so-called Carnot cycle, which consists of four steps: (1) isothermal expansion (gas absorbs heat and the volume expands at constant temperature), (2) adiabatic expansion (the volume expands while the temperature drops), (3) isothermal compression (the volume decreases at constant temperature, so heat is taken out), and (4) adiabatic compression (the volume decreases as we bring the gas back to its original temperature).

Capture Carnot cycle graph

It is important to note that work is being done, by the gas on its surroundings, or by the surroundings on the gas, during each step of the cycle: work is being done by the gas as it expands, always, and work is done on the gas as it is being compressed, always.

You also know that there is only one Carnot efficiency, which is defined as the ratio of (a) the net amount of work we get out of our machine in one such cycle, which we’ll denote by W, and (b) the amount of heat we have to put in to get it (Q1). We’ve also shown that W is equal to the difference between the heat we put in during the first step (isothermal expansion) and the heat that’s taken out in the third step (isothermal compression): W = Q1 − Q2, which basically means that all of the net heat (Q1 − Q2) is converted into useful work, with nothing lost to friction, which is why it’s an efficient engine! We also know that the formula for the efficiency is given by:

W/Q1 = (T1 − T2)/T1.

Where’s Q2 in this formula? It’s there, implicitly, as the efficiency of the engine depends on T2. In fact, that’s the crux of the matter: for efficient engines, we also have the same Q1/T1 = Q2/T2 ratio, which we define as the entropy S = Q1/T1 = Q2/T2. We’ll come back to this.

Now how does it work for non-ideal gases? Can we build an equally efficient engine with actual gases? This was, in fact, Carnot’s original question, and we haven’t really answered it in our previous posts, because we weren’t quite ready for it. Let’s consider the various elements to the answer:

  1. Because we defined temperature the way we defined it, it is obvious that the gas law PV = NkT still holds for diatomic gases, or whatever gas (such as steam vapor, for example, the stuff which was used in Carnot’s time). Hence, the isothermal lines in our pressure-volume diagrams don’t change. For a given temperature T, we’ll have the same green and red isothermal line in the diagram above.
  2. However, the adiabatic lines (i.e. the blue and purple lines in the diagram above) for the non-ideal gas are much flatter than the ones for an ideal gas. Now, just take that diagram and draw two flatter curves through points a and c indeed, but not as flat as the isothermal segments, of course! What you’ll notice is that the area of useful work becomes much smaller.

What does that imply in terms of efficiency? Well… Also consider the areas under the graph which, as you know, represent the amount of work done during each step (and you really need to draw the graph here, otherwise you won’t be able to follow my argument):

  1. The phase of isothermal expansion will be associated with a smaller volume change, because our adiabatic line for the diatomic gas intersects the T = T1 isothermal line at a smaller value for V. Hence, less work is being done during that stage.
  2. However, more work will be done during adiabatic expansion, and the associated volume change is also larger.
  3. The isothermal compression phase is also associated with a smaller volume change, because our adiabatic line for the diatomic gas intersects the T = T2 isothermal line at a larger value for V.
  4. Finally, adiabatic compression requires more work to be done to get from T2 to T1 again, and the associated volume change is also larger.

The net result is clear from the graph: the net amount of work that’s being done over the complete cycle is less for our engine working with non-ideal gas than for our engine working with ideal gas. But, again, the question here is: what does it imply in terms of efficiency? What about the W/Q1 ratio?

The problem is that we cannot see how much heat is being put in (Q1) and how much heat is being taken out (Q2) from the graph. The only thing we know is that we have an engine working here between the same temperatures T1 and T2. Hence, if we use subscript A for the ideal gas engine and subscript B for the one working with ordinary (i.e. non-ideal) gas, and if both engines are to have the same efficiency W/Q1 = WA/Q1A = WB/Q1B, then it’s obvious that,

if WA > WB, then Q1A > Q1B.

Is that consistent with what we wrote above for each of the four steps? It is. Heat energy is taken in during the first step only, as the gas expands isothermally. Now, because the temperature stays the same, there is no change in internal energy, and that includes no change in the internal vibrational and rotational energy. All of the heat energy is converted into work. Now, because the volume change is less, the work will be less and, hence, the heat that’s taken in must also be less. The same goes for the heat that’s being taken out during the third step, i.e. the isothermal compression stage: we’ve got a smaller volume change here and, hence, the surroundings of the gas do less work, and a lesser amount of heat energy is taken out.
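If you’d rather compute than draw, the argument can be sketched in Python. The assumptions are mine: both gases obey PV = NkT with a constant γ, so the adiabats satisfy T·V^(γ–1) = constant (which follows from P = C/V^γ), the corners a and c of the cycle are kept fixed, and the temperatures and volumes are illustrative numbers in units with Nk = 1. Along the isothermal steps the heat equals the work, NkT·ln(V_end/V_start).

```python
import math


def carnot_cycle(gamma: float, t1: float, t2: float, v_a: float, v_c: float, nk: float = 1.0) -> dict:
    """Heat and work for a Carnot cycle through fixed corners a and c.

    a: start of isothermal expansion at T1; c: end of adiabatic expansion at T2.
    Adiabats obey T * V**(gamma - 1) = constant (assumes constant gamma).
    """
    ratio = (t2 / t1) ** (1.0 / (gamma - 1.0))
    v_b = v_c * ratio                      # adiabat through c meets the T1 isotherm
    v_d = v_a / ratio                      # adiabat through a meets the T2 isotherm
    q1 = nk * t1 * math.log(v_b / v_a)     # heat taken in along a -> b
    q2 = nk * t2 * math.log(v_c / v_d)     # heat taken out along c -> d
    work = q1 - q2
    return {"v_b": v_b, "v_d": v_d, "q1": q1, "q2": q2,
            "work": work, "efficiency": work / q1}


# Illustrative numbers only: T1 = 400, T2 = 300, V_a = 1, V_c = 20.
engine_a = carnot_cycle(gamma=5.0 / 3.0, t1=400.0, t2=300.0, v_a=1.0, v_c=20.0)
engine_b = carnot_cycle(gamma=4.0 / 3.0, t1=400.0, t2=300.0, v_a=1.0, v_c=20.0)

for name, engine in (("A", engine_a), ("B", engine_b)):
    print(name, {key: round(value, 4) for key, value in engine.items()})
```

Engine B takes in less heat and does less work, but both engines come out at the same W/Q1 = (T1 − T2)/T1.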

So what’s the grand conclusion? It’s that we can build an ideal engine working between the same temperatures T1 and T2, and with exactly the same efficiency W/Q1 = (T1 − T2)/T1, using non-ideal gas. Of course, there must be some difference! You’re right: there is. While the ordinary gas machine will be as efficient as the ideal gas machine, it will not do the same amount of work. The key to understanding this is to remember that efficiency is a ratio, not some absolute number. Let’s go through it. Because their efficiency is the same, we know that the W/Q1 ratios for both engines (A and B) are the same and, hence, we can write:

WA/Q1A = WB/Q1B ⇔ WA/WB = Q1A/Q1B

What about the entropy? The entropy S = Q1/T1 = Q2/T2 is not the same for both machines. For example, if the engine with ideal gas (A) does twice the work of the engine with ordinary gas (B), then Q1A will also be twice the amount Q1B. Indeed, SA = Q1A/T1 and SB = Q1B/T1. Hence, SA/SB = Q1A/Q1B. For example, if Q1A = 2·Q1B, then engine A’s entropy will also be twice that of engine B. [Now that we’re here, I should also note you’ll have the same ratio for Q2A and Q2B. Indeed, we know that, for an efficient machine, we have: Q1/T1 = Q2/T2. Hence, Q1A/Q2A = T1/T2 and Q1B/Q2B = T1/T2. So Q1A/Q2A = Q1B/Q2B and, therefore, Q1A/Q1B = Q2A/Q2B.]

Why would the entropy be any different? We’ve got the same number of particles, the same volume and the same working temperatures, and so the only difference is that the particles in engine B are diatomic: the molecules consist of two atoms, rather than one only. An intuitive answer to the question as to why the entropy is different can be given by comparing it to another example, which I mentioned in a previous post, for which the entropy is also different for some non-obvious reason. Indeed, we can think of the two atoms as the equivalent of the white and black particles in the box (see my previous post on entropy): if we allow the white and black particles to mix in the same volume, rather than separate them in two compartments, then the entropy goes up (we calculated the increase as equal to k·ln2). Likewise, the entropy is much lower if all particles have to come in pairs, which is the case for a diatomic gas. Indeed, if they have to come in pairs, we significantly reduce the number of ways all particles can be arranged, or the ‘disorder’, so to say. As the entropy is a measure of that number (one can loosely define entropy as the logarithm of the number of ways), the entropy must go down as well. Can we illustrate that using the ΔS = Nk·ln(V2/V1) formula we introduced in our previous post, or our more general S(V, T) = Nk·[lnV + (1/(γ–1))·lnT] + a formula? Maybe. Let’s give it a try.

We know that our diatomic molecules have an average kinetic energy equal to 3kT/2. Well… Sorry. I should be precise: that’s the kinetic energy of their center-of-mass motion only! Now, let us suppose all our diatomic molecules split up. We know the average kinetic energy of the constituent parts will also equal 3kT/2. Indeed, if a gas molecule consists of two atoms (let’s just call them atom A and B respectively), and if their combined mass is M = mA + mB, we know that:

<mA·vA²/2> = <mB·vB²/2> = <M·vCM²/2> = 3kT/2

Hence, if they split, we’ll have twice the number of particles (2N) in the same volume with the same average kinetic energy: 3kT/2. Hence, we double the energy, but the average kinetic energy of the particles is the same, so the temperature should be the same. Hmm… You already feel something is wrong here… What about the energy that we associated with the internal motions within the molecule, i.e. the internal rotational and vibratory motions of the atoms, when they were still part of the same molecule? That was also equal to 3kT/2, wasn’t it? It was. Yes. In case you forgot why, let me remind you: the total energy is the sum of the (average) kinetic energy of the two atoms, so that’s <mA·vA²/2> + <mB·vB²/2> = 3kT/2 + 3kT/2 = 3kT. Now, that sum is also equal to the sum of the center-of-mass motion (which is 3kT/2) and the average kinetic energy of the rotational and vibratory motions. Hence, the average kinetic energy of the rotational and vibratory motions is 3kT – 3kT/2 = 3kT/2. It’s all part of the same theorem: the average kinetic energy for each independent direction of motion is kT/2, and the number of degrees of freedom for a molecule consisting of r atoms is 3r, because each atom can move in three directions. For our diatomic molecule, that’s six: three for the center-of-mass motion, while rotation involves another two independent motions (in three dimensions, we’ve got two axes of rotation only), and vibration another one. So the kinetic energy going into rotation is kT/2 + kT/2 = kT and for vibration it’s kT/2. Adding all yields 3kT/2 + kT + kT/2 = 3kT.
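The bookkeeping is easy to get wrong, so here it is once more as a tiny Python sketch, with all energies expressed in units of kT. It follows the counting used in this post (kinetic energy only: three directions for the center of mass, two for rotation, one for vibration); the function name is just something I made up for the example.

```python
from fractions import Fraction

HALF_KT = Fraction(1, 2)   # energy per independent direction of motion, in units of kT


def diatomic_energy_split() -> dict:
    """Kinetic energy per diatomic molecule, split by type of motion (units of kT)."""
    return {
        "center_of_mass": 3 * HALF_KT,   # three directions of translation
        "rotation": 2 * HALF_KT,         # two axes of rotation
        "vibration": 1 * HALF_KT,        # kinetic part of the vibration only
    }


split = diatomic_energy_split()
total_per_molecule = sum(split.values())   # should equal 3 (i.e. 3kT)
per_atom = total_per_molecule / 2          # two atoms share it: 3/2 (i.e. 3kT/2)
print(split, total_per_molecule, per_atom)
```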

The arithmetic is quite tricky. Indeed, you may think that, if we split the molecule, the rotational and vibratory energy has to go somewhere, and that it is only natural to assume that, when we split the diatomic molecule, the individual atoms have to absorb it. Hence, you may think that the temperature of the gas will be higher. How much higher? We had an average energy of 3kT per molecule in the diatomic situation, but so now we have twice as many particles, and hence, the average energy per particle now is… Re-read what I wrote above: it’s just 3kT/2 again. The energy that’s associated with the center-of-mass motions and the rotational and vibratory motions is not something extra: it’s part of the average kinetic energy of the atoms themselves. So no rise in temperature!

Having said that, our PV = NkT = (2/3)U equation obviously doesn’t make any sense anymore, as we’ve got twice as many particles now. While the temperature has not gone up, both the internal energy and the pressure have doubled, as we’ve got twice as many particles hitting the walls of our cylinder now. To restore the pressure to its ex ante value, we need to increase the volume. Remember, however, that pressure is force per unit surface area, not per volume unit: P = F/A. So we don’t have to double the volume: we only have to double the surface area. Now, it all depends on the shape of the volume: are we thinking of a box or of some sphere? One thing we know though: if we calculate the volume using some radius r, which may also be the length of the edge of a cube, then we know the volume is going to be proportional to r³, while the surface area is going to be proportional to r². Hence, the surface area is going to be proportional to V^(2/3). So that’s another 2/3 ratio which pops up here, as an exponent this time. It’s not a coincidence, obviously.

Hmm… Interesting exercise. I’ll let you work it out. I am sure you’ll find some sensible value for the new volume, so you should be able to use that ΔS = Nk·ln(V2/V1) formula. However, you also need to think about the comparability of the two situations. We wanted to compare two equal volumes with an equal number of particles (diatomic molecules versus atoms), and so you’ll need to move back in that direction to get a final answer to your question. Please do mail me the answer: I hope it makes sense. :-)

Inefficient engines

When trying to understand efficient engines, it’s interesting to also imagine how inefficient engines work, so as to see what they imply for our Carnot diagram. Suppose we’ve tried to build a Carnot engine in our kitchen, and we end up with one that is fairly frictionless, and fairly well insulated, so there is little heat loss during the heat transfer steps. We also have good contact surfaces so we think the heat transfer processes will also be fairly frictionless, so to speak. So we did our calculations and built the engine using the best kitchen design and engineering practices. Now it’s the time for the test. Will it work?

What might happen is the following: while we’ve designed the engine to get some net amount of work out of it (in each and every cycle) that is given by the isothermal and adiabatic lines below, we may find that we’re not able to keep the temperature constant. So we try to follow the green isothermal line alright, but we can’t. We may also find that, when our heat counter tells us we’ve put Q1 in already, our piston hasn’t moved out quite as far as we thought it would. So… Damn, we’re never going to get to c. What’s the reason? Some heat loss, because our insulation wasn’t perfect, and friction.

Inefficient engine

So we’re likely to have followed an actual path that’s closer to the red arrow, which brings us near point d. So we’ve missed point c. We have no choice, however: the temperature has dropped to T2 and, hence, we need to start with the next step. Which one? The second? The third? It’s not quite clear, because our actual path on the pressure-volume diagram doesn’t follow any of our ideal isothermal or adiabatic lines. What to do? Let’s just take some heat out and start compressing to see what happens. If we’ve followed a path like the red arrow, we’re likely to be on something like the black arrow now. Indeed, if we’ve got a problem with friction or heat loss, we’ll continue to have that problem, and so the temperature will drop much faster than we think it should, and so we will not have the expected volume decrease. In fact, we’re not able to maintain the temperature even at T2. What horror! We can’t repeat our process and, hence, it is surely not reversible! All our work for nothing! We have to start all over and re-examine our design.

So our kitchen machine goes nowhere. But then how do actual engines work? The answer is: they put much more heat in, and they also take much more heat out. More importantly, they’re also working much below the theoretical efficiency of an ideal engine, just like our kitchen machine. So that’s why we’ve got the valves and all that in a steam engine. Also note that a car engine works entirely differently: it converts chemical energy into heat energy by burning fuel inside of the cylinder. Do we get any useful work out? Of course! My Lamborghini is fantastic. :-) Is it efficient? Nope. We’re converting huge amounts of heat energy into a very limited amount of useful work, i.e. the type of energy we need to drive the wheels of my car, or a dynamo. Actual engines are only a shadow of ideal engines. So what’s the Carnot cycle really? What does it mean in practice? Does the mathematical model have any relevance at all?

The Carnot cycle revisited

Let’s look at those differential equations once again. [Don’t be scared by the concept of a differential equation. I’ll come back to it. Just keep reading.] Let’s start with the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV equation, which mathematical purists would probably prefer to write as:

dU = (∂U/∂T)dT + (∂U/∂V)dV

I find Feynman’s use of the Δ symbol more appropriate, because, when dividing by dV or dT, we get dU/dV and dU/dT, which makes us think we’re dealing with ordinary derivatives here, and we are not: it’s partial derivatives that matter here. [I’ll illustrate the usefulness of distinguishing the Δ and d symbols in a moment.] Feynman is even more explicit about that as he uses subscripts for the partial derivatives, so he writes the equation above as:

ΔU = (∂U/∂T)_V·ΔT + (∂U/∂V)_T·ΔV

However, partial derivatives always assume the other variables are kept constant and, hence, the subscript is not needed. It makes the notation rather cumbersome and, hence, I think it makes the analysis even more unreadable than it already is. In any case, it is obvious that we’re looking at a situation in which all changes: the volume, the temperature and the pressure. However, in the PV = NkT equation (which, I repeat, is valid for all gases, ideal or not, and in all situations, be it adiabatic or isothermal expansion or compression), we have only two independent variables for a given number of particles. We can choose: volume and temperature, or pressure and temperature, or volume and pressure. The third variable depends on the two other variables and, hence, is referred to as dependent. Now, one should not attach too much importance to the terms (dependent or independent does not mean more or less fundamental) but, when everything is said and done, we need to make a choice when approaching the problem. In physics, we usually look at the volume and the temperature as the ‘independent’ variables but the partial derivative notation makes it clear it doesn’t matter. With three variables, we’ll have three partial derivatives: ∂P/∂T, ∂V/∂T and ∂P/∂V, and their reciprocals ∂T/∂P, ∂T/∂V and ∂V/∂P too, of course!

Having said that, when calculating the value of derived variables like energy, or entropy, or enthalpy (which is a state variable used in chemistry), we’ll use two out of the three mentioned variables only, because the third one is redundant, so to speak. So we’ll have some formula for the internal energy of a gas that depends on temperature and volume only, so we write:

U = U(V, T)

Now, in physics, one will often only have a so-called differential equation for a variable, i.e. something that is written in terms of differentials and derivatives, so we’ll do that here too. But let me give some other example first. You may or may not remember that we had this differential equation telling us how the density (n = N/V) of the atmosphere changes with the height (h), as a function of the molecular mass (m), the temperature (T) and the density (n) itself: dn/dh = –(mg/kT)·n, with g the gravitational acceleration and k the Boltzmann constant. Now, it is not always easy to go from a differential equation to a proper formula, but this one can be solved rather easily. Indeed, a function which has a derivative that is proportional to itself (that’s what this differential equation says really) is an exponential, and the solution was n = n0·e^(–mgh/kT), with n0 some other constant (the density at h = 0, which can be chosen anywhere). This explicit formula for n says that the density goes down exponentially with height, which is what we would expect.
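To make the step from differential equation to explicit formula a bit more tangible, here is a minimal sketch that integrates dn/dh = –(mg/kT)·n numerically and compares the result with the exponential solution. The molecular mass (nitrogen), the temperature of 300 K and the assumption of a constant temperature over the whole height are my own illustrative choices, not something taken from Feynman.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
G_ACCEL = 9.81                  # m/s^2, gravitational acceleration near the surface
M_N2 = 28 * 1.66054e-27         # kg, mass of one N2 molecule (illustrative choice)
T_ATM = 300.0                   # K, assumed constant with height

def scale_height(m, T):
    """Height kT/(mg) over which the density drops by a factor e."""
    return K_BOLTZMANN * T / (m * G_ACCEL)

def density_exact(n0, h, m, T):
    """Explicit solution n = n0 * exp(-mgh/kT)."""
    return n0 * math.exp(-h / scale_height(m, T))

def density_numeric(n0, h, m, T, steps=1000):
    """Integrate dn/dh = -n/H from 0 to h with a 4th-order Runge-Kutta scheme."""
    H = scale_height(m, T)
    dh = h / steps
    n = n0
    for _ in range(steps):
        k1 = -n / H
        k2 = -(n + 0.5 * dh * k1) / H
        k3 = -(n + 0.5 * dh * k2) / H
        k4 = -(n + dh * k3) / H
        n += dh * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return n

# Relative density at 5 km, taking n0 = 1 at h = 0
n_numeric = density_numeric(1.0, 5000.0, M_N2, T_ATM)
n_exact = density_exact(1.0, 5000.0, M_N2, T_ATM)
print(n_numeric, n_exact)
```

The two numbers should agree to many digits, which is just another way of saying that the exponential is indeed the function whose derivative is proportional to itself.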

Let’s get back to our gas though. We also have differentials here, which are infinitesimally small changes in variables. As mentioned above, we prefer to write them with a Δ in front (rather than using the d symbol), i.e. we write ΔT, ΔV, ΔU, or ΔQ. When we have two variables only, say x and y, we can use the d symbol itself and, hence, write Δx and Δy as dx and dy. However, it’s still useful to distinguish, in order to write something like this:

Δy = (dy/dx)Δx

This says we can approximate the change in y at some point x when we know the derivative there. For a function in two variables, we can write the same, which is what we did at the very start of this analysis:

ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV

Note that the first term assumes constant volume (because of the ∂U/∂T derivative), while the second assumes constant temperature (because of the ∂U/∂V derivative).
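A quick numerical illustration may help here. The sketch below compares the exact change ΔU with the approximation (∂U/∂T)ΔT + (∂U/∂V)ΔV. The energy function U = c·T – a/V, and the values of c and a, are purely hypothetical: I picked them only because both partial derivatives are easy to write down.

```python
def internal_energy(T, V, c=1.5, a=0.2):
    """Hypothetical energy function U(T, V) = c*T - a/V (arbitrary units)."""
    return c * T - a / V

def dU_dT(T, V, c=1.5, a=0.2):
    """Partial derivative with respect to T, keeping V constant."""
    return c

def dU_dV(T, V, c=1.5, a=0.2):
    """Partial derivative with respect to V, keeping T constant."""
    return a / V**2

def delta_U_exact(T, V, dT, dV):
    """Exact change U(T + dT, V + dV) - U(T, V)."""
    return internal_energy(T + dT, V + dV) - internal_energy(T, V)

def delta_U_linear(T, V, dT, dV):
    """First-order approximation using the two partial derivatives."""
    return dU_dT(T, V) * dT + dU_dV(T, V) * dV

# Shrinking the steps by a factor 10 shrinks the error by roughly a factor 100
for step in (0.01, 0.001):
    exact = delta_U_exact(300.0, 2.0, step, step)
    approx = delta_U_linear(300.0, 2.0, step, step)
    print(step, exact, approx, abs(exact - approx))
```

The error of the approximation is of second order in the step size, which is why the formula becomes exact in the limit of infinitesimally small changes.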

Now, we also have a second equation for ΔU, expressed in differentials only (so no partial derivatives here):

ΔU = ΔQ – PΔV


This equation basically states that the internal energy of a gas can change because (a) some heat is added or removed or (b) some work is being done by or on the gas as its volume gets bigger or smaller. Note the minus sign in front of PΔV: it’s there to ensure the signs come out alright. For example, when compressing the gas (so ΔV is negative), ΔU = – PΔV will be positive. Conversely, when letting the gas expand (so ΔV is positive), ΔU = – PΔV will be negative, as it should be.

What’s the relation between these two equations? Both are valid, but you should surely not think that, just because we have a ΔV in the second term of each equation, we can write –P = ∂U/∂V. No.

Having said that, let’s look at the first term of the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV equation and analyze it using the ΔU = ΔQ – PΔV equation. We know (∂U/∂T)ΔT assumes we keep the volume constant, so ΔV = 0 and, hence, ΔU = ΔQ: all the heat goes into changing the internal energy; none goes into doing some work. Therefore, we can write:

(∂U/∂T)ΔT = (∂Q/∂T)ΔT = C_V·ΔT

You already know that we’ve got a name for that C_V function (remember: a derivative is a function too!): it’s the (specific) heat capacity of the gas (or whatever substance) at constant volume. For ideal gases, C_V is some constant but, remember, we’re not limiting ourselves to analyzing ideal gases only here!

So we’re done with the first term in that ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV. Now it’s time for the second one: (∂U/∂V)ΔV. Now both ΔQ and –PΔV are relevant: the internal energy changes because (a) some heat is being added and (b) because the volume changes and, hence, some work is being done. You know what we need to find. It’s that weird formula:

∂U/∂V = T(∂P/∂T) – P

But how do we get there? We can visualize what’s going on as a tiny Carnot cycle. So we think of gas as an ideal engine itself: we put some heat in (ΔQ) which gets an isothermal expansion at temperature T going, during a tiny little instant, doing a little bit of work. But then we stop adding heat and, hence, we’ll have some tiny little adiabatic expansion, during which the gas keeps going and also does a tiny amount of work as it pushes against the surrounding gas molecules. However, this step involves an infinitesimally small temperature drop—just a little bit, to T–ΔT. And then the surrounding gas will start pushing back and, hence, we’ve got some isothermal compression going, at temperature T–ΔT, which is then followed, once again, by adiabatic compression as the temperature goes back to T. The last two steps involve the surroundings of the tiny little volume of gas we’re looking at, doing work on the gas, instead of the other way around.

Carnot 2 equivalence

You’ll say this sounds very fishy. It does, but it is Feynman’s analysis, so who am I to doubt it? You’ll ask: where does the heat go, and where does the work go? Indeed, if ΔQ is Q1, what about Q2? Also, we can sort of imagine that the gas can sort of store the energy of the work that’s being done during step 1 and 2, to then give (most of it) back during step 3 and 4, but what about the net work that’s being done in this cycle, which is (see the diagram) equal to W = Q1 – Q2 = ΔPΔV? Where does that go? In some kind of flywheel or something? Obviously not! Hmm… Not sure. In any case, Q1 is infinitesimally small and, hence, nearing zero. Q2 is even smaller, so perhaps we should equate it to zero and just forget about it. As for the net work done by the cycle, perhaps this may just go into moving the gas molecules in the equally tiny volume of gas we’re looking at. Hence, perhaps there’s nothing left to be transferred to the surrounding gas. In short, perhaps we should look at ΔQ as the energy that’s needed to do just one cycle.

Well… No. If gas is an ideal engine, we’re talking elastic collisions and, hence, it’s not like a transient, like something that peters out. The energy has to go somewhere—and it will. The tiny little volume we’re looking at will come back to its original state, as it should, because we’re looking at (∂U/∂V)ΔV, which implies we’re doing an analysis at constant temperature, but the energy we put in has got to go somewhere: even if Q2 is zero, and all of ΔQ goes into work, it’s still energy that has to go somewhere!

It does go somewhere, of course! It goes into the internal energy of the gas we’re looking at. It adds to the kinetic energy of the surrounding gas molecules. The thing is: when doing such infinitesimal analysis, it becomes difficult to imagine the physics behind it. All is blurred. Indeed, if we’re talking a very small volume of gas, we’re talking a limited number of particles also and, hence, these particles doing work on other gas particles, or these particles getting warmer or colder as they collide with the surrounding body of gas, it all becomes more or less the same. To put it simply: they’re more likely to follow the direction of the red and black arrows in our diagram above. So, yes, the theoretical analysis is what it is: a mathematical idealization, and so we shouldn’t think that’s what is actually going on in a gas, even if Feynman tries to think of it in that way. So, yes, I agree with some critics, but to a very limited extent only, who say that Feynman’s Lectures on thermodynamics aren’t the best in the Volume: it may be simpler to just derive the equation we need from some Hamiltonian or whatever other mathematical relationship involving state variables like entropy or what have you. However, I do appreciate Feynman’s attempt to connect the math with the physics, which is what he’s doing here. If anything, it’s sure got me thinking!

In any case, we need to get on with the analysis, so let’s wrap it up. We know the net amount of work that’s being done is equal to W = Q1(T1 – T2)/ T1 = ΔQ(ΔT/T). So that’s equal to ΔPΔV and, hence, we can write:

net work done by the gas = ΔPΔV = ΔQ(ΔT/T)

This implies ΔQ = T(ΔP/ΔT)ΔV. Now, looking at the diagram, we can appreciate ΔP/ΔT is equal to ∂P/∂T (ΔP is the change in pressure at constant volume). Hence, ΔQ = T(∂P/∂T)ΔV. Now we have to add the work, so that’s −PΔV. We get:

ΔU = ΔQ − PΔV = T(∂P/∂T)ΔV − PΔV ⇔ ΔU/ΔV = ∂U/∂V = T(∂P/∂T) − P

So… We are where we wanted to be. :-) It’s a rather surprising analysis, though. Is the Q2 = 0 assumption essential? It is, as part of the analysis of the second term in the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV expression, that is. Make no mistake: the W = Q1(T1−T2)/ T1 = ΔQ(ΔT/T) formula is valid, always, and the Q2 is taken into account in it implicitly, because of the ΔT (which is defined using T2). However, if Q2 were not zero, it would add to the internal energy without doing any work and, as such, it would be part of the first term in the ΔU = (∂U/∂T)ΔT + (∂U/∂V)ΔV expression: we’d have heat that is not changing the volume (and, hence, that is not doing any work) but that’s just… Well… Heat that’s just adding heat to the gas. :-)
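As a sanity check on the ∂U/∂V = T(∂P/∂T) – P formula, we can plug in an equation of state and see what comes out. The sketch below does that numerically for the ideal gas law P = NkT/V and, as an assumption of mine that is not part of the analysis above, for the van der Waals equation of state P = NkT/(V – Nb) – aN²/V². The values of a and b are illustrative only.

```python
K_BOLTZMANN = 1.380649e-23   # J/K

def pressure_ideal(T, V, N):
    """Ideal gas law P = NkT/V."""
    return N * K_BOLTZMANN * T / V

def pressure_vdw(T, V, N, a, b):
    """Van der Waals equation of state (an assumption, not from the text above)."""
    return N * K_BOLTZMANN * T / (V - N * b) - a * N**2 / V**2

def energy_slope(pressure, T, V, dT=1e-3):
    """Return T*(dP/dT at constant V) - P, which should equal dU/dV at constant T."""
    dP_dT = (pressure(T + dT, V) - pressure(T - dT, V)) / (2 * dT)
    return T * dP_dT - pressure(T, V)

N = 1.0e23                   # number of molecules
T, V = 300.0, 1.0e-3         # K and m^3
A_ILLUSTRATIVE = 1.0e-48     # J*m^3 per molecule squared, illustrative value
B_ILLUSTRATIVE = 7.0e-29     # m^3 per molecule, illustrative value

slope_ideal = energy_slope(lambda t, v: pressure_ideal(t, v, N), T, V)
slope_vdw = energy_slope(
    lambda t, v: pressure_vdw(t, v, N, A_ILLUSTRATIVE, B_ILLUSTRATIVE), T, V)
expected_vdw = A_ILLUSTRATIVE * N**2 / V**2

print(slope_ideal, slope_vdw, expected_vdw)
```

For the ideal gas, the result is zero up to rounding: the internal energy does not depend on the volume at constant temperature. For the van der Waals gas, we are left with aN²/V², i.e. only the attraction term survives.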

To wrap everything up, let me jot down the whole thing now:

ΔU = (∂Q/∂T)·ΔT + [T(∂P/∂T) − P]·ΔV

Now, strangely enough, while we started off saying the second term in our ΔU expression assumed constant temperature (because of the ∂U/∂V derivative), we now re-write that second term using the ∂P/∂T derivative, which assumes constant volume! Now, our first term assumes constant volume too, and so we end up with an expression which assumes constant volume throughout! At the same time, we do have that ΔV factor of course, which implies we do not really assume volume is constant. On the contrary: the question we started off with was about how the internal energy changes with temperature and volume. Hence, the assumptions of constant temperature and volume only concern the partial derivatives that we are using to calculate that change!

Now, as for the model itself, let me repeat: when doing such analysis, it is very difficult to imagine the physics behind it. All is blurred. When talking infinitesimally small volumes of gas, one cannot really distinguish between particles doing work on other gas particles, or these particles getting warmer or colder as they collide with them. It’s all the same. So, in reality, the actual paths are more like the red and black arrows in our diagram above. Even for larger volumes of gas, we’ve got a problem: one volume of gas is not thermally isolated from another and, hence, ideal gas is not some Carnot engine. A Carnot engine is this theoretical construct, which assumes we can nicely separate isothermal from adiabatic expansion/compression. In reality, we can’t. Even to get the isothermal expansion started, we need a temperature difference in order to get the energy flow going, which is why the assumption of frictionless heat transfer is so important. But what’s frictionless, and what’s an infinitesimal temperature difference? In the end, it’s a difference, right? So we already have some entropy increase: some heat (let’s say ΔQ) leaves the reservoir, which has temperature T, and enters the cylinder, which has to have a temperature that’s just a wee bit lower, let’s say T – ΔT. Hence, the entropy of the reservoir is reduced by ΔQ/T, and the entropy of the cylinder is increased by ΔQ/(T – ΔT). Hence, ΔS = ΔQ/(T–ΔT) – ΔQ/T = ΔQΔT/[T(T–ΔT)].

You’ll say: sure, but then the temperature in the cylinder must go up to T and… No. Why? We don’t have any information on the volume of the cylinder here. We should also involve the time derivatives, so we should start asking questions like: how much power goes into the cylinder, so what’s the energy exchange per unit time here? The analysis will become endlessly more complicated of course – it may have played a role in Sadi Carnot suffering from “mania” and “general delirium” when he got older :-) – but you should arrive at the same conclusion: when everything is said and done, the model is what it is, and that’s a mathematical model of some ideal engine – i.e. an idea of a device we don’t find in Nature, and which we’ll never be able to actually build – that shows how we could, potentially, get some energy out of a gas when using some device built to do just that. As mentioned above, thinking in terms of actual engines – like steam engines or, worse, combustion engines – does not help. Not at all really: just try to understand the Carnot cycle as it’s being presented, and that’s usually a mathematical presentation, which is why textbooks always remind the reader to not take the cylinder and piston thing too literally.

Let me note one more thing. Apart from the heat or energy loss question, there’s another unanswered question: from what source do we take the energy to move our cylinder from one heat reservoir to the other? We may imagine it all happens in space so there’s no gravity and all that (so we do not really have to spend some force just holding it) but even then: we have to move it from one place to another, and so that involves some acceleration and deceleration and, hence, some force times a distance. In short, the conclusion is all the same: the reversible Carnot cycle does not really exist and entropy increases, always.

With this, you should be able to solve some practical problems, which should help you to get the logic of it all. Let’s start with one.

Feynman’s rubber band engine

Feynman’s rubber band engine shows the model is quite general indeed, so it’s not limited to some Carnot engine only. A rubber band engine? Yes. When we heat a rubber band, it does not expand: it contracts, as shown below.

rubber band engine

Why? It’s not very intuitive: heating a metal bar makes it longer, not shorter. It’s got to do with the fact that rubber consists of an enormous tangle of long chains of molecules: think of molecular spaghetti. But don’t worry about the details: just accept we could build an engine using the fact, as shown above. It’s not a very efficient machine (Feynman thinks he’d need heating lamps delivering 400 watts of power to lift a fly with it), but let’s apply our thermodynamic relations:

  1. When we heat the rubber band, it will pull itself in, thereby doing some work. We can write that amount of work as FΔL. So that’s like –PΔV in our ΔU = ΔQ – PΔV equation, but note that F has a direction that’s opposite to the direction of the pressure, so we don’t have the minus sign.
  2. So here we can write: ΔU = ΔQ + FΔL.

So what? Well… We can re-write all of our gas equations by substituting –F for P and L for V, and they’ll apply! For example, when analyzing that infinitesimal Carnot cycle above, we found that ΔQ = T(∂P/∂T)ΔV, with ΔQ the heat that’s needed to change the volume by ΔV at constant temperature. So now we can use the above-mentioned substitution (P becomes –F and V becomes L) to calculate the heat that’s needed to change the length of the rubber band by ΔL at constant temperature: it is equal to ΔQ = –T(∂F/∂T)ΔL. The result may not be what we like (if we want the length to change significantly, we’re likely to need a lot of heat and, hence, we’re likely to end up melting the rubber), but it is what it is. :-)
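To see the ΔQ = –T(∂F/∂T)ΔL formula at work, we need some force law F(T, L) for the band. Feynman does not give one here, so the sketch below simply assumes a tension that is proportional to the temperature, F = c·T·(L – L0), with made-up values for c and L0. Take it as an illustration of the substitution only, not as a model of real rubber.

```python
def tension(T, L, c=0.05, L0=0.10):
    """Hypothetical force law F = c*T*(L - L0), in newton (c and L0 are made up)."""
    return c * T * (L - L0)

def heat_for_length_change(T, L, dL, dT=1e-3):
    """Heat needed to change the length by dL at constant T: dQ = -T*(dF/dT)*dL."""
    dF_dT = (tension(T + dT, L) - tension(T - dT, L)) / (2 * dT)
    return -T * dF_dT * dL

T, L, dL = 300.0, 0.15, 0.001       # K, m, m
dQ = heat_for_length_change(T, L, dL)
dU = dQ + tension(T, L) * dL        # the rubber band version of dU = dQ - P*dV

print(dQ, dU)
```

With this particular force law, ΔQ comes out negative when we stretch the band (ΔL > 0): heat flows out, and it matches the work FΔL we put in, so ΔU is zero at constant temperature. That is a consequence of the assumed force law, of course: another F(T, L) would give another split between heat and internal energy.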

As Feynman notes: the power of these thermodynamic equations is that we can apply them to situations that are very different from a gas. Another example is a reversible electric cell, like a rechargeable storage battery. That said, the assumption that these devices are all efficient is a rather theoretical one and, hence, that constrains the usefulness of our equations significantly. Still, engineers have to start somewhere, and the efficient Carnot cycle is the obvious point of departure. It is also a theoretical reference point to calculate actual efficiencies of actual engines, of course.

Post scriptum: Thermodynamic temperature

Let me quickly say something about an alternative definition of temperature: it’s what Feynman refers to as the thermodynamic definition. It’s equivalent to the kinetic definition really, but let me quickly show why. As we think about efficient engines, it would be good to have some reference temperature T2, so we can drop the subscripts and have our engines run between T and that reference temperature, which we’ll simply call ‘one degree’ (1°). The amount of heat that an ideal engine will deliver at that reference temperature is denoted by QS, so we can drop the subscript for Q1 and denote it, quite simply, as Q.

We’ve defined entropy as S = Q/T, so Q = ST and QS = S·1°. So what? Nothing much. Just note we can use the S = Q/T and QS = S·1° equations to define temperature in terms of entropy. This definition is referred to as the thermodynamic definition, and it is fully equivalent to our kinetic definition. It’s just a different approach. Feynman makes kind of a big deal out of this but, frankly, there’s nothing more to it.

Just note that the definition also works for our ideal engine with non-ideal gas: the amounts of heat involved for the engine with non-ideal gas, i.e. Q and QS, will be proportionally less than the Q and QS amounts for the reversible engine with ideal gas. [Remember that Q1A/Q1 = Q2A/Q2B equation, in case you have doubts.] Hence, we do not get some other thermodynamic temperature! All makes sense again, as it should! :-)

Is gas a reversible engine?

The Ideal versus the Actual Gas Law

In previous posts, we referred, repeatedly, to the so-called ideal gas law, for which we have various expressions. The expression we derived from analyzing the kinetics involved when individual gas particles (atoms or molecules) move and collide was P·V = N·k·T, in which the variables are P (pressure), V (volume), N (the number of particles in the given volume), T (temperature) and k (the Boltzmann constant). We also wrote it as P·V = (2/3)·U, in which U represents the total energy, i.e. the sum of the energies of all gas particles. We also said the P·V = (2/3)·U formula was only valid for monatomic gases, in which case U is the kinetic energy of the center-of-mass motion of the atoms.

In order to provide some more generality, the equation is often written as P·V = (γ–1)·U. Hence, for monatomic gases, we have γ = 5/3. For a diatomic gas, we’ll also have vibrational and rotational kinetic energy. As we pointed out in a previous post, each independent direction of motion, i.e. each degree of freedom in the system, will absorb an amount of energy equal to k·T/2. For monatomic gases, we have three independent directions of motion (x, y, z) and, hence, the energy per atom is 3·k·T/2, so the total energy is U = (3/2)·N·k·T and P·V = N·k·T = (2/3)·U.
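The link between the number of degrees of freedom and γ is simple enough to put in a few lines. If each of the f degrees of freedom takes k·T/2 per molecule, then U = (f/2)·N·k·T, and combining that with P·V = N·k·T and P·V = (γ–1)·U gives γ = 1 + 2/f. The function name below is mine, and counting the vibration as two degrees of freedom (kinetic plus potential energy) is the classical counting.

```python
from fractions import Fraction

def gamma_from_dof(f):
    """Classical gamma for f degrees of freedom: PV = NkT and U = (f/2)NkT."""
    return 1 + Fraction(2, f)

# 3 translations only (monatomic gas)
print(gamma_from_dof(3))
# 3 translations + 2 rotations + vibration (kinetic and potential energy)
print(gamma_from_dof(7))
```

So the classical count for a diatomic molecule, with f = 7, gives γ = 9/7, which is the theoretical value that gets us into trouble further down.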

Finally, when we’re considering adiabatic expansion/compression only – so when we do not add or remove any heat to/from the gas – we can also write the ideal gas law as P·V^γ = C, with C some constant. [It is important to note that this P·V^γ = C relation can be derived from the more general P·V = (γ–1)·U expression, but that the two expressions are not equivalent. Please have a look at the P.S. to this post on this, which shows how we get that P·V^γ = constant expression, and talks a bit about its meaning.]
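We can also let a computer do the adiabatic expansion in many tiny steps and see whether P·V^γ stays constant. The sketch below uses only two ingredients: no heat is added, so ΔU = –PΔV in each step, and the pressure follows from P·V = (γ–1)·U. The starting pressure and volumes are arbitrary illustrative values, and the simple step-by-step scheme is my choice, so expect a small numerical error.

```python
def adiabatic_expansion(P0, V0, V1, gamma, steps=100000):
    """Expand from V0 to V1 without heat exchange; return the final pressure."""
    dV = (V1 - V0) / steps
    V = V0
    U = P0 * V0 / (gamma - 1)        # from P*V = (gamma - 1)*U
    for _ in range(steps):
        P = (gamma - 1) * U / V      # current pressure
        U -= P * dV                  # no heat added: dU = -P*dV
        V += dV
    return (gamma - 1) * U / V

GAMMA = 5.0 / 3.0                    # monatomic gas
P0, V0, V1 = 1.0e5, 1.0e-3, 2.0e-3   # Pa, m^3, m^3 (illustrative values)
P1 = adiabatic_expansion(P0, V0, V1, GAMMA)

print(P0 * V0**GAMMA, P1 * V1**GAMMA)
```

The two products agree up to the (small) discretization error, and the agreement improves as the number of steps goes up.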

So what’s the gas law for diatomic gas, like O2, i.e. oxygen? The key to the analysis of diatomic gases is, basically, a model which represents the oxygen molecule as two atoms connected by a spring, but with a force law that’s not as simplistic as Hooke’s law: we’re not looking at some linear force, but a force that’s referred to as a van der Waals force. The image below gives a vague idea of what that might imply. Remember: when moving an object in a force field, we change its potential energy, and the work done, as we move with or against the force, is equal to the change in potential energy. The graph below shows the force is anything but linear.

The illustration above is a graph of potential energy for two molecules, but we can also apply it for the ‘spring’ model for two atoms within a single molecule. For the detail, I’ll refer you to Feynman’s Lecture on this. It’s not that the full story is too complicated: it’s just too lengthy to reproduce it in this post. Just note the key point of the whole story: one arrives at a theoretical value for γ that is equal to γ = 9/7 ≈ 1.286. Wonderful! Yes. Except for the fact that value does not correspond to what is measured in reality: the experimentally confirmed value for γ for oxygen (O2) is about 1.40.

What about other gases? When measuring the value for other diatomic gases, like iodine (I2) or bromine (Br2), we get a value closer to the theoretical value (1.30 and 1.32 respectively) but, still, there’s a variation to be explained here. The value for hydrogen H2 is about 1.4, so that’s like oxygen again. For other gases, we again get different values. Why? What’s the problem?

It cannot be explained using classical theory. In addition, doing the measurements for oxygen and hydrogen at various temperatures also reveals that γ is a function of temperature, as shown below. Now that’s another experimental fact that does not line up with our kinetic theory of gases!

Heat ratio

Reality is right, always. Hence, our theory must be wrong. Our analysis of the independent directions of motion inside of a molecule doesn’t work, even for the simple case of a diatomic molecule. Great minds such as James Clerk Maxwell couldn’t solve the puzzle in the 19th century and, hence, had to admit classical theory was in trouble. Indeed, popular belief has it that the black-body radiation problem was the only thing classical theory couldn’t explain in the late 19th century but that’s not true: there were many more problems keeping physicists awake. But so we’ve got a problem here. As Feynman writes: “We might try some force law other than a spring but it turns out that anything else will only make γ higher. If we include more forms of energy, γ approaches unity more closely, contradicting the facts. All the classical theoretical things that one can think of will only make it worse. The fact is that there are electrons in each atom, and we know from their spectra that there are internal motions; each of the electrons should have at least kT/2 of kinetic energy, and something for the potential energy, so when these are added in, γ gets still smaller. It is ridiculous. It is wrong.”

So what’s the answer? The answer is to be found in quantum mechanics. Indeed, one can develop a model distinguishing various molecular states with various energy levels E0, E1, E2,…, Ei,…, and then associate a probability distribution which gives us the probability of finding a molecule in a particular state. Some more assumptions, all quite similar to the assumptions used by Planck when he solved the black-body radiation problem, then give us what we want: to put it simply, it is like some of the motions ‘freeze out’ at lower temperatures. As a result, γ goes up as we go down in temperature.

Hence, quantum mechanics saves the day, again. However, that’s not what I want to write about here. What I want to do here is to give you an equation for the internal energy of a gas which is based on what we can actually measure, so that’s pressure, volume and temperature. I’ll refer to it as the Actual Gas Law, because it takes into account that γ is not some fixed value (so it’s not some natural constant, like Planck’s or Boltzmann’s constant), and it also takes into account that we’re not always dealing with gas (ideal or actual) but also with liquids and solids.

Now, we have many inter-connected variables here, and so the analysis is quite complicated. In fact, it’s a great opportunity to learn more about partial derivatives and how we can use them. So the lesson is as much about math as it is about physics. In fact, it’s probably more about math. :-) Let’s see what we can make out of it.

Energy, work, force, pressure and volume

First, I should remind you that work is something that is done by a force on some object in the direction of the displacement of that object. Hence, work is force times distance. Now, because the force may actually vary as our object is being displaced and while the work is being done, we represent work as a line integral:

W = ∫F·ds

We write F and s in bold-face and, hence, we’ve got a vector dot product here, which ensures we only consider the component of the force in the direction of the displacement: F·Δs = |F|·|Δs|·cosθ, with θ the angle between the force and the displacement.

As for the relationship between energy and work, you know that one: as we do work on an object, we change its energy, and that’s what we are looking at here: the (internal) energy of our substance. Indeed, when we have a volume of gas exerting pressure, it’s the same thing: some force is involved (pressure is the force per unit area, so we write: P = F/A) and, using the model of the box with the frictionless piston (illustrated below), we write:

dW = F(–dx) = – PAdx = – PdV


The dW = – PdV formula, which gives the work done on the gas, is the one we use when looking at infinitesimal changes. When going through the full thing, we should integrate, as the volume (and the pressure) changes over the trajectory. For the work done by the gas, we write:

W = ∫PdV

Now, it is very important to note that the formulas above (dW = – PdV and W = ∫PdV) are always valid. Always? Yes. We don’t care whether or not the compression (or expansion) is adiabatic or isothermal. [To put it differently, we don’t care whether or not heat is added to (or removed from) the gas as it expands (or decreases in volume).] We also don’t keep track of the temperature here. It doesn’t matter. Work is work.

Now, as you know, an integral is some area under a graph so I can rephrase our result as follows: the work that is being done by a gas, as it expands (or the work that we need to put in in order to compress it), is the area under the pressure-volume graph, always.
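The ‘area under the graph’ statement can be checked with a simple numerical integration. The sketch below takes an isothermal expansion of an ideal gas as an example, for which the area is known to be N·k·T·ln(V2/V1). The number of molecules, the temperature and the volumes are illustrative values I made up.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K

def work_by_gas(pressure, V_start, V_end, steps=10000):
    """Area under the pressure-volume curve, using the trapezoid rule."""
    dV = (V_end - V_start) / steps
    work = 0.0
    for i in range(steps):
        V_left = V_start + i * dV
        V_right = V_left + dV
        work += 0.5 * (pressure(V_left) + pressure(V_right)) * dV
    return work

N, T = 1.0e23, 300.0         # number of molecules, temperature in K (illustrative)

def isotherm(V):
    """Pressure along an isothermal line of an ideal gas: P = NkT/V."""
    return N * K_BOLTZMANN * T / V

w_expand = work_by_gas(isotherm, 1.0e-3, 2.0e-3)     # gas doubles its volume
w_compress = work_by_gas(isotherm, 2.0e-3, 1.0e-3)   # and is compressed back
w_closed_form = N * K_BOLTZMANN * T * math.log(2.0)

print(w_expand, w_compress, w_closed_form)
```

The work done by the gas is positive for the expansion and negative for the compression, as it should be. Any other pressure function can be passed in: the area under the curve is the work, whatever the path.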

Of course, as we go through a so-called reversible cycle, getting work out of it, and then putting some work back in, we’ll have some overlapping areas cancelling each other. That’s how we derived the amount of useful (i.e. net) work that can be done by an ideal gas engine (illustrated below) as it goes through a Carnot cycle, taking in some amount of heat Q1 from one reservoir (which is usually referred to as the boiler) and delivering some other amount of heat (Q2) to another reservoir (usually referred to as the condenser). As I don’t want to repeat myself too much, I’ll refer you to one of my previous posts for more details. Hereunder, I just present the diagram once again. If you want to understand anything of what follows, you need to understand it thoroughly.

Carnot cycle graph

It’s important to note that work is being done in each of the four steps of the cycle, and that the work done by the gas is positive when it expands, and negative when its volume is being reduced. So, let me repeat: the W = ∫PdV formula is valid for both adiabatic and isothermal expansion/compression. We just need to be careful about the sign and see in which direction it goes. Having said that, it’s obvious adiabatic and isothermal expansion/compression are two very different things and, hence, their impact on the (internal) energy of the gas is quite different:

  1. Adiabatic compression/expansion assumes that no (external) heat energy (Q) is added or removed and, hence, all the work done goes into changing the internal energy (U). Hence, we can write: W = PΔV = –ΔU and, therefore, ΔU = –PΔV. Of course, adiabatic compression/expansion must involve a change in temperature, as the kinetic energy of the gas molecules is being transferred from/to the piston. Hence, the temperature (which is nothing but the average kinetic energy of the molecules) changes.
  2. In contrast, isothermal compression/expansion (i.e. a volume change without any change in temperature) must involve an exchange of heat energy with the surroundings so to allow the temperature to remain constant. So ΔQ ≠ 0 in this case.

The grand but simple formula capturing all is, obviously:

ΔU = ΔQ – PΔV


It says what we’ve said already: the internal energy of a substance (a gas) changes because some work is being done as its volume changes and/or because some heat is added or removed.

Now we have to get serious about partial derivatives, which relate one variable (the so-called ‘dependent’ variable) to another (the ‘independent’ variable). Of course, in reality, all depends on all and, hence, the distinction is quite artificial. Physicists tend to treat temperature and volume as the ‘independent’ variables, while chemists seem to prefer to think in terms of pressure and temperature. In math, it doesn’t matter all that much: we simply take the reciprocal and there you go: dy/dx = 1/(dx/dy). We go from one to another. Well… OK… We’ve got a lot of variables here, so… Yes. You’re right. It’s not going to be that simple, obviously! :-)

Differential analysis

If we have some function f in two variables, x and y, then we can write: Δf = f(x + Δx, y + Δy) –  f(x, y). We can then write the following clever thing:

Δf = [f(x + Δx, y + Δy) – f(x, y + Δy)] + [f(x, y + Δy) – f(x, y)] ≈ Δx·(∂f/∂x)y + Δy·(∂f/∂y)x

What’s being said here is that we can approximate Δf using the partial derivatives ∂f/∂x and ∂f/∂y. Note that the formula above actually implies that we’re evaluating the (partial) ∂f/∂x derivative at point (x, y+Δy), rather than the point (x, y) itself. It’s a minor detail, but I think it’s good to signal it: this ‘clever thing’ is just pedagogical. [Feynman is the greatest teacher of all time! :-)] The mathematically correct approach is to simply give the formal definition of partial derivatives, and then just get on with it:

∂f/∂x = lim(Δx→0) [f(x + Δx, y) – f(x, y)]/Δx and ∂f/∂y = lim(Δy→0) [f(x, y + Δy) – f(x, y)]/Δy

Now, let us apply that Δf formula to what we’re interested in, and that’s the change in the (internal) energy U. So we write:

ΔU = ΔT·(∂U/∂T)V + ΔV·(∂U/∂V)T

Now, we can’t do anything with this, in practice, because we cannot directly measure the two partial derivatives. So, while this is an actual gas law (which is what we want), it’s not a practical one, because we can’t use it. :-) Let’s see what we can do about that. We need to find some formula for those partial derivatives. Let’s have a look at the (∂U/∂T)V factor first. That factor is defined and referred to as the specific heat capacity at constant volume, and it’s usually denoted by CV. Hence, we write:

CV = specific heat capacity at constant volume = (∂U/∂T)V

Heat capacity? But we’re talking internal energy here? It’s the same. Remember that ΔU = ΔQ – PΔV formula: if we keep the volume constant, then ΔV = 0 and, hence, ΔU = ΔQ. Hence, all of the change in internal energy (and I really mean all of the change) is the heat energy we’re adding or removing from the gas. Hence, we can also write CV in its more usual definitional form:

CV = (∂Q/∂T)V

As for its interpretation, you should look at it as a ratio: CV is the amount of heat one must put into (or remove from) a substance in order to change its temperature by one degree with the volume held constant. Note that the term ‘specific heat capacity’ is usually shortened to ‘specific heat’, as that’s shorter and simpler. However, you can see it’s some kind of ‘capacity’ indeed. More specifically, it’s a capacity of a substance to absorb heat. Now that’s stuff we can actually measure and, hence, we’re done with the first term in that ΔU = ΔT·(∂U/∂T)V + ΔV·(∂U/∂V)T expression, which we can now write as:

ΔT·(∂U/∂T)V = ΔT·(∂Q/∂T)V = ΔT·CV

OK. So we’re done with the first term. Just to make sure we’re on the right track here, let’s have a quick look at the units: the unit in which we should measure CV is, obviously, joule per degree (Kelvin), i.e. J/K. And then we multiply with ΔT, which is measured in degrees Kelvin, and we get some amount in joule. Fine. We’re done, indeed. :-)
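If you want to see the ΔU = ΔT·(∂U/∂T)V + ΔV·(∂U/∂V)T approximation at work, here is a small numerical sketch. It needs some U(T, V) that actually depends on both variables, so I am assuming the textbook internal energy of one mole of a monatomic van der Waals gas, U = (3/2)·RT – a/V, with a value for a that is roughly that of argon. That model and its constant are my assumptions for illustration only; they are not part of the argument above.

```python
R = 8.314462618   # gas constant in J/(mol K)
A = 0.1355        # assumed van der Waals constant a (roughly argon), in Pa m^6 / mol^2

def internal_energy(t, v):
    """U(T, V) in joule for one mole of a monatomic van der Waals gas (assumed model)."""
    return 1.5 * R * t - A / v

def partial(f, args, index, h):
    """Central-difference estimate of the partial derivative of f in args[index]."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

T, V = 300.0, 1.0e-3       # state: 300 K and one litre
dT, dV = 0.5, 1.0e-5       # small changes in temperature and volume

c_v = partial(internal_energy, (T, V), 0, 1e-3)     # (dU/dT)_V, in J/K
du_dv = partial(internal_energy, (T, V), 1, 1e-8)   # (dU/dV)_T, in J/m^3

exact = internal_energy(T + dT, V + dV) - internal_energy(T, V)
linear = dT * c_v + dV * du_dv

print("C_V              =", c_v, "J/K")
print("exact delta U    =", exact, "J")
print("linear estimate  =", linear, "J")
```

The two ΔU values agree to within a fraction of a percent, and the gap shrinks as ΔT and ΔV get smaller, which is all the ‘clever thing’ claims.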

Let’s look at the second term now, i.e. the ΔV·(∂U/∂V)T term. Now, you may think that we could define CT = (∂U/∂V)T as the specific heat capacity at constant temperature because… Well… Hmm… It is the amount of heat one must put into (or remove from) a substance in order to change its volume by one unit with the temperature held constant, isn’t it? So we write CT = (∂U/∂V)T = (∂Q/∂V)T and we’re done here too, aren’t we?


It’s not that simple. Two very different things are happening here. Indeed, the change in (internal) energy ΔU, as the volume changes by ΔV while keeping the temperature constant (we’re looking at that (∂U/∂V)T factor here, and I’ll remind you of that subscript T a couple of times), consists of two parts:

  1. First, the volume is not being kept constant and, hence, the internal energy (U) changes because work is being done.
  2. Second, the internal energy (U) also changes because heat is being put in, so the temperature can be kept constant indeed.

So we cannot simplify. We’re stuck with the full thing: ΔU = ΔQ – PΔV, in which – PΔV is the (infinitesimal amount of) work that’s being done on the substance, and ΔQ is the (infinitesimal amount of) heat that’s being put in. What can we do? How can we relate this to actual measurables?

Now, the logic is quite abstruse, so please be patient and bear with me. The key to the analysis is that diagram of the reversible Carnot cycle, with the shaded area representing the net work that’s being done, except that we’re now talking infinitesimally small changes in volume, temperature and pressure. So we redraw the diagram and get something like this:

[Figure: the infinitesimal Carnot cycle]

Now, you can easily see the equivalence between the shaded area and the ΔPΔV rectangle below:

[Figure: equivalence of the shaded area and the ΔPΔV rectangle]

So the work done by the gas is the shaded area, whose surface is equal to ΔPΔV. […] But… Hey, wait a minute! You should object: we are not talking ideal engines here and, hence, we are not going through a full Carnot cycle, are we? We’re calculating the change in internal energy when the temperature changes by ΔT, the volume changes by ΔV, and the pressure changes by ΔP. Full stop. So we’re not going back to where we came from and, hence, we should not be analyzing this thing using the Carnot cycle, should we? Well… Yes and no. More yes than no. Remember we’re looking at the second term only here: ΔV·(∂U/∂V)T. So we are changing the volume (and, hence, the internal energy) but the subscript in the (∂U/∂V)T term makes it clear we’re doing so at constant temperature. In practice, that means we’re looking at a theoretical situation here that assumes a complete and fully reversible cycle indeed. Hence, the conceptual idea is, indeed, that we put some heat in, that the gas does some work as it expands, and that we then are actually putting some work back in to bring the gas back to its original temperature T. So, in short, yes, the reversible cycle idea applies.

[…] I know, it’s very confusing. I am actually struggling with the analysis myself, so don’t be too hard on yourself. Think about it, but don’t lose sleep over it. :-) I added a note on it in the P.S. to this post, so you can check that out too. However, I need to get back to the analysis itself here. From our discussion of the Carnot cycle and ideal engines, we know that the work done is equal to the difference between the heat that’s being put in and the heat that’s being delivered: W = Q1 – Q2. Now, because we’re talking reversible processes here, we also know that Q1/T1 = Q2/T2. Hence, Q2 = (T2/T1)·Q1 and, therefore, the work done is also equal to W = Q1 – (T2/T1)·Q1 = Q1·(1 – T2/T1) = Q1·[(T1 – T2)/T1] = Q1·(ΔT/T1). Let’s now drop the subscripts by equating Q1 with ΔQ, so we have:

W = ΔQ(ΔT/T)

You should note that ΔQ is not the difference between Q1 and Q2. It is not. ΔQ is the heat we put in as it expands isothermally from volume V to volume V + ΔV. I am explicit about it because the Δ symbol usually denotes some difference between two values. In case you wonder how we can do away with Q2, think about it. […] The answer is that we did not really get away with it: the information is captured in the ΔT factor, as T–ΔT is the final temperature reached by the gas as it expands adiabatically on the second leg of the cycle, and the change in temperature obviously depends on Q2! Again, it’s all quite confusing because we’re looking at infinitesimal changes only, but the analysis is valid. [Again, go through the P.S. of this post if you want more remarks on this, although I am not sure they’re going to help you much. The logic is really very deep.]

[…] OK… I know you’re getting tired, but we’re almost done. Hang in there. So what do we have now? The work done by the gas as it goes through this infinitesimally small cycle is the shaded area in the diagram above, and it is equal to:

W = ΔP·ΔV = ΔQ·(ΔT/T)

From this, it follows that ΔQ = T·ΔV·ΔP/ΔT. Now, you should look at the diagram once again to check what ΔP actually stands for: it’s the change in pressure when the temperature changes at constant volume. Hence, using our partial derivative notation, we write:

ΔP/ΔT = (∂P/∂T)V
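If you don’t trust the ‘shaded area equals rectangle’ argument, you can check it numerically. The sketch below walks an ideal monatomic gas through a very thin Carnot cycle between T and T – ΔT and integrates P·dV along all four legs. It then compares the net work with ΔQ·(ΔT/T) and with the ΔP·ΔV rectangle, taking ΔP = (∂P/∂T)V·ΔT = Nk·ΔT/V. The ideal gas, the lumped constant NK and all the numbers are my own illustrative assumptions.

```python
import math

NK = 1.0              # N*k lumped together, in J/K (illustrative value)
GAMMA = 5.0 / 3.0     # monatomic ideal gas

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule estimate of the integral of f from a to b (b may be below a)."""
    h = (b - a) / steps
    return math.fsum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def tiny_carnot_cycle(t, d_t, v, d_v):
    """Return (net work, heat taken in at T) for a cycle between T and T - dT."""
    t_low = t - d_t
    # On an adiabat T * V**(GAMMA - 1) stays constant, which fixes the far corners
    stretch = (t / t_low) ** (1.0 / (GAMMA - 1.0))
    v_a, v_b = v, v + d_v
    v_c, v_d = v_b * stretch, v_a * stretch

    def isotherm(temp):
        return lambda vol: NK * temp / vol

    def adiabat(temp0, vol0):
        return lambda vol: (NK * temp0 / vol0) * (vol0 / vol) ** GAMMA

    heat_in = integrate(isotherm(t), v_a, v_b)         # isothermal leg: Q = W
    work = heat_in
    work += integrate(adiabat(t, v_b), v_b, v_c)       # adiabatic expansion
    work += integrate(isotherm(t_low), v_c, v_d)       # isothermal compression
    work += integrate(adiabat(t_low, v_d), v_d, v_a)   # adiabatic compression
    return work, heat_in

T, DT, V, DV = 300.0, 1.0, 1.0, 0.01
work, heat_in = tiny_carnot_cycle(T, DT, V, DV)
carnot_estimate = heat_in * DT / T       # W = dQ * (dT / T)
rectangle = (NK * DT / V) * DV           # dP * dV with dP = (dP/dT)_V * dT

print("net work around the cycle:", work)
print("dQ * dT / T              :", carnot_estimate)
print("dP * dV rectangle        :", rectangle)
```

The first two numbers coincide, as they should for any Carnot cycle, while the rectangle is off by about half a percent for these step sizes. That gap closes as ΔV and ΔT shrink, which is what ‘infinitesimal’ is supposed to buy us.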

We can now write ΔQ = T·ΔV·(∂P/∂T)V and, therefore, we can re-write ΔU = ΔQ – PΔV as:

ΔU = T·ΔV·(∂P/∂T)V – PΔV

Now, dividing both sides by ΔV, and writing all using the partial derivative notation, we get:

ΔU/ΔV = (∂U/∂V)T = T·(∂P/∂T)V – P

So now we know how to calculate the (∂U/∂V)T factor, from measurable stuff, in that ΔU = ΔT·(∂U/∂T)V + ΔV·(∂U/∂V)T expression, and so we’re done. Let’s write it all out:

ΔU = ΔT·(∂U/∂T)V + ΔV·(∂U/∂V)T = ΔT·CV + ΔV·[T·(∂P/∂T)V – P]

Phew! That was tough, wasn’t it? It was. Very tough. As far as I am concerned, this is probably the toughest of all I’ve written so far.
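A formula like (∂U/∂V)T = T·(∂P/∂T)V – P is easier to believe once you have seen it hold for a gas that is not ideal. As a sketch, I am assuming a van der Waals gas here, with the standard textbook expressions for one mole, P = RT/(V – b) – a/V² and U = (3/2)·RT – a/V, and constants that are roughly those of argon. None of that comes from the derivation above: it is just a convenient test case.

```python
R = 8.314462618   # gas constant in J/(mol K)
A = 0.1355        # assumed van der Waals a (roughly argon), in Pa m^6 / mol^2
B = 3.2e-5        # assumed van der Waals b (roughly argon), in m^3 / mol

def pressure(t, v):
    """Van der Waals equation of state for one mole (assumed model)."""
    return R * t / (v - B) - A / v**2

def internal_energy(t, v):
    """Internal energy of one mole of a monatomic van der Waals gas (assumed model)."""
    return 1.5 * R * t - A / v

def d_dt(f, t, v, h=1e-3):
    """Partial derivative in T at constant V (central difference)."""
    return (f(t + h, v) - f(t - h, v)) / (2.0 * h)

def d_dv(f, t, v, h=1e-8):
    """Partial derivative in V at constant T (central difference)."""
    return (f(t, v + h) - f(t, v - h)) / (2.0 * h)

T, V = 300.0, 1.0e-3
left_side = d_dv(internal_energy, T, V)                   # (dU/dV)_T
right_side = T * d_dt(pressure, T, V) - pressure(T, V)    # T*(dP/dT)_V - P

print("(dU/dV)_T        =", left_side)
print("T*(dP/dT)_V - P  =", right_side)
```

Both sides come out at a/V², i.e. the attraction between the molecules is the only reason the energy depends on volume at all. Set a and b to zero and you are back to the ideal gas, for which both sides vanish.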

Dependent and independent variables 

Let’s pause to take stock of what we’ve done here. The expressions above should make it clear we’re actually treating temperature and volume as the independent variables, and pressure and energy as the dependent variables, or as functions of (other) variables, I should say. Let’s jot down the key equations once more:

  1. ΔU = ΔQ – PΔV
  2. ΔU = ΔT·(∂U/∂T)V + ΔV·(∂U/∂V)T
  3. (∂U/∂T)V = (∂Q/∂T)V = CV
  4. (∂U/∂V)T = T·(∂P/∂T)V – P

It looks like Chinese, doesn’t it? :-) What can we do with this? Plenty. Especially the first equation is really handy for analyzing and solving various practical problems. The second equation is much more difficult and, hence, less practical. But let’s try to apply this equation for actual gases to an ideal gas, just to see if we’re getting our ideal gas law once again. :-) We know that, for an ideal gas, the internal energy depends on temperature, not on V. Indeed, if we change the volume but we keep the temperature constant, the internal energy should be the same, as it only depends on the motion of the molecules and their number. Hence, (∂U/∂V)T must equal zero and, hence, T·(∂P/∂T)V – P = 0. Replacing the partial derivative with an ordinary one (not forgetting that the volume is kept constant), we get:

T·(dP/dT) – P = 0 (constant volume)

⇔ (1/P)·(dP/dT) = 1/T (constant volume)

Integrating both sides yields: lnP = lnT + constant. This, in turn, implies that P = T × constant. [Just re-write the first constant as the (natural) logarithm of some other constant, i.e. the second constant, obviously.] Now that’s consistent with our ideal gas law P = NkT/V, because N, k and V are all constant. So, yes, the ideal gas law is a special case of our more general thermodynamical expression. Fortunately! :-)

That’s not very exciting, you’ll say—and you’re right. You may be interested – although I doubt it :-) – in the chemists’ world view: they usually have performance data (read: values for derivatives) measured under constant pressure. The equations above then transform into:

  1. ΔH = Δ(U + P·V) = ΔQ + VΔP
  2. ΔH = ΔT·(∂H/∂T)P + ΔP·(∂H/∂P)T
  3. (∂H/∂P)T = –T·(∂V/∂T)P + V

H? Yes. H is another so-called state variable, so it’s like entropy or internal energy but different. As they say in Asia: “Same-same but different.” :-) It’s defined as H = U + PV and its name is enthalpy. Why do we need it? Because some clever man noted that, if you take the total differential of P·V, i.e. Δ(P·V) = P·ΔV + V·ΔP, and our ΔU = ΔQ – P·ΔV expression, and you add both sides of both expressions, you get Δ(U + P·V) = ΔQ + VΔP. So, to please the chemists, we’ve swapped the roles of P and V: all our equations hold provided we substitute H for U, P for V and, importantly, –V for P. [Note the sign switch is to be applied to derivatives as well: if we substitute –V for P, then ∂P/∂T becomes ∂(–V)/∂T = –(∂V/∂T)!]

So that’s the chemists’ model of the world, and they’ll usually measure the specific heat capacity at constant pressure, rather than at constant volume. Indeed, one can show the following:

(∂H/∂T)P = (∂Q/∂T)P = CP = the specific heat capacity at constant pressure

In short, while we referred to γ as the specific heat ratio in our previous posts, assuming we’re talking ideal gases only, we can now appreciate the fact there is actually no such thing as the specific heat: there are various variables and, hence, various definitions. Indeed, it’s not only pressure or volume: the specific heat capacity of some substance will usually also be expressed per unit of mass (i.e. per kg), per number of particles involved (i.e. per mole), or per unit of volume (i.e. per m³). In the latter two cases, we talk about the molar or volumetric heat capacity respectively. The name for the same thing expressed in joule per degree Kelvin and per kg (J/kg·K) is the same: specific heat capacity. So we’ve got three different concepts here, and two ways of measuring them: at constant pressure or at constant volume. No wonder one gets quite confused when googling tables listing the actual values! :-)

Now, there’s one question left: why is γ being referred to as the specific heat ratio? The answer is simple: it actually is the ratio of the specific heat capacities CP and CV. Hence, γ is equal to:

γ = CP/CV

I could show you how that works. However, I would just be copying the Wikipedia article on it, so I won’t do that: you’re sufficiently knowledgeable now to check it out yourself, and verify it’s actually true. Good luck with it! In the process, please also do check why CP is always larger than CV so you can explain why γ is always larger than one. :-)
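As a head start, here is a minimal numerical sketch for the simplest case only: one mole of a monatomic ideal gas, for which U = (3/2)·NkT and PV = NkT, so H = U + PV = (5/2)·NkT. Both U and H then depend on temperature alone, which lets me use ordinary derivatives for (∂U/∂T)V and (∂H/∂T)P. The choice of gas and of one mole is my assumption, not something the formulas above require.

```python
K_B = 1.380649e-23     # Boltzmann constant in J/K
N = 6.02214076e23      # number of atoms in one mole

def internal_energy(t):
    """U = (3/2) N k T for a monatomic ideal gas."""
    return 1.5 * N * K_B * t

def enthalpy(t):
    """H = U + PV, with PV = N k T for an ideal gas."""
    return internal_energy(t) + N * K_B * t

def derivative(f, x, h=1e-3):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

c_v = derivative(internal_energy, 300.0)   # heat capacity at constant volume, J/K
c_p = derivative(enthalpy, 300.0)          # heat capacity at constant pressure, J/K
gamma = c_p / c_v

print("C_V =", c_v, "J/K")
print("C_P =", c_p, "J/K")
print("gamma = C_P / C_V =", gamma)
```

The difference CP – CV comes out as Nk: at constant pressure, part of the heat goes into the work the gas does as it expands, so you need more heat for the same temperature rise. That is why γ ends up above one (5/3 in this case).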

Post scriptum: As usual, Feynman’s Lectures were the inspiration here, once more. Now, Feynman has a habit of ‘integrating’ expressions and, frankly, I never found a satisfactory answer to a pretty elementary question: integration in regard to what variable? His exposé on both the ideal as well as the actual gas law has enlightened me. The answer is simple: it doesn’t matter. :-) Let me show that by analyzing the following argument of Feynman:


So… What is that ‘integration’ that ‘yields’ that γlnV + lnP = lnC expression? Are we solving some differential equation here? Well… Yes. But let’s be practical and take the derivative of the expression in regard to V, P and T respectively. Let’s first see where we come from. The fundamental equation is PV = (γ–1)U. That means we’ve got two ‘independent’ variables, and one that ‘depends’ on the others: if we fix P and V, we have U, or if we fix U, then P and V are in an inversely proportional relationship. That’s easy enough. We’ve got three ‘variables’ here: U, P and V—or, in differential form, dU, dP and dV. However, Feynman eliminates one by noting that dU = –PdV. He rightly notes we can only do that because we’re talking adiabatic expansion/compression here: all the work done while expanding/compressing the gas goes into changing the internal energy: no heat is added or removed. Hence, there is no dQ term here.

So we are left with two ‘variables’ only now: P and V, or dP and dV when talking differentials. So we can choose: P depends on V, or V depends on P. If we think of V as the independent variable, we can write:

d[γ·lnV + lnP]/dV = γ·(1/V)·(dV/dV) + (1/P)·(dP/dV), while d[lnC]/dV = 0

So we have γ·(1/V)·(dV/dV) + (1/P)·(dP/dV) = 0, and we can then multiply sides by dV to get:

(γ·dV/V) + (dP/P) = 0,

which is the core equation in this argument, so that’s the one we started off with. Picking P as the ‘independent’ variable and, hence, differentiating with respect to P yields the same:

d[γ·lnV + lnP]/dP = γ·(1/V)·(dV/dP) + (1/P)·(dP/dP), while d[lnC]/dP = 0

Multiplying both sides by dP yields the same thing: (γ·dV/V) + (dP/P) = 0. So it doesn’t matter, indeed. But let’s be smart and assume both P and V, or dP and dV, depend on some implicit variable—a parameter really. The obvious candidate is temperature (T). So we’ll now integrate and differentiate in regard to T. We get:

d[γ·lnV + lnP]/dT = γ·(1/V)·(dV/dT) + (1/P)·(dP/dT), while d[lnC]/dT = 0

We can, once again, multiply both sides with dT and – surprise, surprise! – we get the same result: 

(γ·dV/V) + (dP/P) = 0

The point is that the γlnV + lnP = lnC expression is damn valid, and C or lnC or whatever is ‘the constant of integration’ indeed, in regard to whatever variable: it doesn’t matter. So then we can, indeed, take the exponential of both sides (which is much more straightforward than ‘integrating both sides’), so we get:

e^(γ·lnV + lnP) = e^(lnC) = C

It then doesn’t take too much intelligence to see that e^(γ·lnV + lnP) = e^(γ·lnV)·e^(lnP) = V^γ·P = P·V^γ. So we’ve got the grand result we wanted: P·V^γ = C, with C some constant determined by the situation we’re in (think of the size of the box, or the density of the gas).
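The ‘it doesn’t matter which variable’ point can also be checked by brute force. The sketch below integrates (γ·dV/V) + (dP/P) = 0 numerically in two ways: once with V itself as the independent variable, and once with an arbitrary parameter s that drives the volume through V(s) = V0·(1 + s²). The Runge-Kutta helper, the parametrization and the numbers are all my own illustrative choices.

```python
GAMMA = 5.0 / 3.0                        # monatomic ideal gas
P0, V0, V1 = 1.0e5, 1.0e-3, 2.0e-3       # start pressure, start and end volume (assumed)

def rk4(f, y, s0, s1, steps=2000):
    """Integrate dy/ds = f(s, y) from s0 to s1 with the classical Runge-Kutta scheme."""
    h = (s1 - s0) / steps
    s = s0
    for _ in range(steps):
        k1 = f(s, y)
        k2 = f(s + h / 2.0, y + h * k1 / 2.0)
        k3 = f(s + h / 2.0, y + h * k2 / 2.0)
        k4 = f(s + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        s += h
    return y

# (a) V is the independent variable: dP/dV = -gamma * P / V
p_by_volume = rk4(lambda v, p: -GAMMA * p / v, P0, V0, V1)

# (b) a parameter s runs from 0 to 1 and the volume follows V(s) = V0 * (1 + s**2)
def dp_ds(s, p):
    v = V0 * (1.0 + s * s)
    dv_ds = 2.0 * V0 * s
    return -GAMMA * p * dv_ds / v        # dP = -gamma * P * dV / V, with dV = V'(s) ds

p_by_parameter = rk4(dp_ds, P0, 0.0, 1.0)

closed_form = P0 * (V0 / V1) ** GAMMA    # what P * V**gamma = C predicts

print("integrated over V:", p_by_volume)
print("integrated over s:", p_by_parameter)
print("from P*V^gamma = C:", closed_form)
```

All three end pressures agree, so the constant of integration really does not care how we label the path.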

So, yes, we’ve got a ‘law’ here. We should just remind ourselves, always, that it’s only valid when we’re talking adiabatic compression or expansion: we do not add or remove heat energy or, as Feynman puts it, much more succinctly, “no heat is being lost”. And, of course, we’re also talking ideal gases only, which excludes a number of real substances. :-)

It’s a weird formula: the pressure times the volume to the 5/3 power is a constant for monatomic gas. But it works: as long as individual atoms are not bound to each other, the law holds. As mentioned above, when various molecular states, with associated energy levels are at play, it becomes an entirely different ballgame. :-)

I should add one final note as to the functional form of P·V^γ = C. We can re-write it as P = C/V^γ. The shape of that graph is similar to the P = NkT/V relationship we started off with. Putting the two equations side by side makes it clear our constant and temperature are obviously related one to another, but they are not directly proportional to each other. In fact, as the graphs below clearly show, the P = NkT/V equation gives us the isothermal lines on the pressure-volume graph (i.e. they show how P and V are related at constant temperature), while the P = C/V^γ equation gives us the adiabatic lines. Just google an online function graph tool, and you can now draw your own diagrams of the Carnot cycle! Just change the numerator (i.e. the constants C and T in both equations). :-)
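If you prefer a few lines of code over a graph tool, the sketch below tabulates one isothermal line and one adiabatic line that pass through the same starting point, so you can see how they fan out. The lumped constant NK, the temperature and the volumes are illustrative assumptions, and γ = 5/3 assumes a monatomic gas.

```python
GAMMA = 5.0 / 3.0    # monatomic ideal gas
NK = 1.0             # N*k lumped together, in J/K (illustrative value)

def isotherm_pressure(v, t):
    """P = NkT/V: pressure along a line of constant temperature."""
    return NK * t / v

def adiabat_pressure(v, c):
    """P = C / V**gamma: pressure along a line with no heat exchange."""
    return c / v ** GAMMA

T = 300.0
V_REF = 1.0
# Choose C so that both curves pass through the same point (V_REF, NK*T/V_REF)
C = isotherm_pressure(V_REF, T) * V_REF ** GAMMA

volumes = [V_REF * (1.0 + 0.25 * i) for i in range(9)]
table = [(v, isotherm_pressure(v, T), adiabat_pressure(v, C)) for v in volumes]

for v, p_iso, p_adi in table:
    print(f"V = {v:5.2f}   isotherm P = {p_iso:8.2f}   adiabat P = {p_adi:8.2f}")
```

The adiabat drops faster than the isotherm: at the common point its slope is –γ·P/V against –P/V for the isotherm. That difference in steepness is what lets two isotherms and two adiabats close into a loop on the pressure-volume graph.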

[Figure: isothermal and adiabatic lines on the pressure-volume graph]

Now, I promised I would say something more about that infinitesimal Carnot cycle: why is it there? Why don’t we limit the analysis to just the first two steps? In fact, the shortest and best explanation I can give is something like this: think of the whole cycle as the first step in a reversible process really. We put some heat in (ΔQ) and the gas does some work, but so that heat has to go through the whole body of gas, and the energy has to go somewhere too. In short, the heat and the work are not being absorbed by the surroundings but it all stays in the ‘system’ that we’re analyzing, so to speak, and that’s why we’re going through the full cycle, not the first two steps only. Now, this ‘answer’ may or may not satisfy you, but I can’t do better. You may want to check Feynman’s explanation itself, but he’s very short on this and, hence, I think it won’t help you much either. :-(

The Ideal versus the Actual Gas Law


The two previous posts were quite substantial. Still, they were only the groundwork for what we really want to talk about: entropy, and the second law of thermodynamics, which you probably know as follows: all of the energy in the universe is constant, but its entropy is always increasing. But what is entropy really? And what’s the nature of this so-called law?

Let’s first answer the second question: Wikipedia notes that this law is more like an empirical finding that has been accepted as an axiom. That probably sums it up best. That description does not downplay its significance. In fact, Newton’s laws of motion, or Einstein’s relativity principle, have the same status: axioms in physics – as opposed to those in math – are grounded in reality. At the same time, and just like in math, one can often choose alternative sets of axioms. In other words, we can derive the law of ever-increasing entropy from other principles, notably the Carnot postulate, which basically says that, if the whole world were at the same temperature, it would be impossible to reversibly extract and convert heat energy into work. I talked about that in my previous post, and so I won’t go into more detail here. The bottom line is that we need two separate heat reservoirs at different temperatures, denoted by T1 and T2, to convert heat into useful work.

Let’s go to the first question: what is entropy, really?

Defining entropy

Feynman, the Great Teacher, defines entropy as part of his discussion on Carnot’s ideal reversible heat engine, so let’s have a look at it once more. Carnot’s ideal engine can do some work by taking an amount of heat equal to Q1 out of one heat reservoir and putting an amount of heat equal to Q2 into the other one (or, because it’s reversible, it can also go the other way around, i.e. it can absorb Q2 and put Q1 back in, provided we do the same amount of work W on the engine).

The work done by such machine, or the work that has to be done on the machine when reversing the cycle, is equal to W = Q1 – Q2 (the equation shows the machine is as efficient as it can be, indeed: all of the difference in heat energy is converted into useful work, and vice versa: nothing gets ‘lost’ in frictional energy or whatever else!). Now, because it’s a reversible thermodynamic process, one can show that the following relationship must hold:

Q1/T1 = Q2/T2

This law is valid, always, for any reversible engine and/or for any reversible thermodynamic process, for any Q1, Q2, T1 and T2. [Ergo, it is not valid for non-reversible processes and/or non-reversible engines, i.e. real machines.] Hence, we can look at Q/T as some quantity that remains unchanged: an equal ‘amount’ of Q/T is absorbed and given back, and so there is no gain or loss of Q/T (again, if we’re talking reversible processes, of course). [I need to be precise here: there is no net gain or loss in the Q/T of the substance of the gas. The first reservoir obviously loses Q1/T1, and the second reservoir gains Q2/T2. The whole environment only remains unchanged if we’d reverse the cycle.]

In fact, this Q/T ratio is the entropy, which we’ll denote by S, so we write:

S = Q1/T1 = Q2/T2

What the above says, is basically the following: whenever the engine is reversible, this relationship between the heats must follow: if the engine absorbs Q1 at T1 and delivers Q2 at T2, then Q1 is to T1 as Q2 is to T2 and, therefore, we can define the entropy S as S = Q/T. That implies, obviously:

Q = S·T

From these relations (S = Q/T and Q = S·T), it is obvious that the unit of entropy has to be joule per degree (Kelvin), i.e. J/K. As such, it has the same dimension as the Boltzmann constant, k ≈ 1.38×10^−23 J/K, which we encountered in the ideal gas formula PV = NkT, and which relates the mean kinetic energy of atoms or molecules in an ideal gas to the temperature. However, while k is, quite simply, a constant of proportionality, S is obviously not a constant: its value depends on the system or, to continue with the mathematical model we’re using, the heat engine we’re looking at.

Still, this definition and relationships do not really answer the question: what is entropy, really? Let’s further explore the relationships so as to try to arrive at a better understanding.

I’ll continue to follow Feynman’s exposé here, so let me use his illustrations and arguments. The first argument revolves around the following set-up, involving three reversible engines (1, 2 and 3), and three temperatures (T1 > T2 > T3):

[Figure: three engines]

Engine 1 runs between T1 and T3 and delivers W13 by taking in Q1 at T1 and delivering Q3 at T3. Similarly, engines 2 and 3 deliver or absorb W32 and W12 respectively by running between T3 and T2 and between T2 and T1 respectively. Now, if we let engines 1 and 2 work in tandem, so engine 1 produces W13 and delivers Q3, which is then taken in by engine 2, using an amount of work W32, the net result is the same as what engine 3 is doing: it runs between T1 and T2 and delivers W12, so we can write:

W12 = W13 – W32
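A quick numerical sketch of that bookkeeping, using nothing but the Q1/T1 = Q2/T2 rule for reversible engines. The three temperatures and the heat Q1 are arbitrary illustrative numbers of mine.

```python
def reversible_engine(q_in, t_in, t_out):
    """Heat delivered and work done by a reversible engine: Q_in/T_in = Q_out/T_out."""
    q_out = q_in * t_out / t_in
    return q_out, q_in - q_out

T1, T2, T3 = 500.0, 400.0, 300.0    # three temperatures, T1 > T2 > T3 (assumed)
Q1 = 1000.0                         # heat taken in at T1, in joule (assumed)

# Engine 1 runs from T1 down to T3
Q3, W13 = reversible_engine(Q1, T1, T3)

# Engine 2 runs backwards: it absorbs Q3 at T3 and needs work W32 to deliver Q2 at T2
Q2 = Q3 * T2 / T3
W32 = Q2 - Q3

# Engine 3 runs directly from T1 down to T2
Q2_direct, W12 = reversible_engine(Q1, T1, T2)

print("W13 - W32 =", W13 - W32)
print("W12       =", W12)
print("Q2 via engines 1 and 2:", Q2, " Q2 via engine 3:", Q2_direct)
```

Engines 1 and 2 in tandem take in the same Q1 at T1 and deliver the same Q2 at T2 as engine 3 does on its own, so the net work has to be the same too.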

This result illustrates that there is only one Carnot efficiency, which Carnot’s Theorem expresses as follows:

  1. All reversible engines operating between the same heat reservoirs are equally efficient.
  2. No actual engine operating between two heat reservoirs can be more efficient than a Carnot engine operating between the same reservoirs.

Now, it’s obvious that it would be nice to have some kind of gauge – or a standard, let’s say – to describe the properties of ideal reversible engines in order to compare them. We can define a very simple gauge by assuming the lowest temperature in the diagram above (T3) is one degree. One degree what? Whatever: we’re working in Kelvin for the moment, but any absolute temperature scale will do. [An absolute temperature scale uses an absolute zero. The Kelvin scale does that, but the Rankine scale does so too: it just uses different units than the Kelvin scale (the Rankine units correspond to Fahrenheit units, while the Kelvin units correspond to Celsius degrees).] So what we do is to let our ideal engines run between some temperature T – at which it absorbs or delivers a certain heat Q – and 1° (one degree), at which it delivers or absorbs an amount of heat which we’ll denote by QS. [Of course, I note this assumes that ideal engines are able to run between one degree Kelvin (i.e. minus 272.15 degrees Celsius) and whatever other temperature. Real (man-made) engines are obviously likely to not have such tolerance. :-)] Then we can apply the Q = S·T equation and write:

QS = S·1°

Like that we solve the gauge problem when measuring the efficiency of ideal engines, for which the formula is W/Q1 = (T1 – T2)/T1. In my previous post, I illustrated that equation with some graphs for various values of T2 (e.g. T2 = 4, 1, or 0.3). [In case you wonder why these values are so small, it doesn’t matter: we can scale the units, or assume 1 unit corresponds to 100 degrees, for example.] These graphs all look the same but cross the x-axis (i.e. the T1-axis) at different points (at T1 = 4, 1, and 0.3 respectively, obviously). But let us now use our gauge and, hence, standardize the measurement by setting T2 to 1. Hence, the blue graph below is now the efficiency graph for our engine: it shows how the efficiency (W/Q1) depends on its working temperature T1 only. In fact, if we drop the subscripts, and define Q as the heat that’s taken in (or delivered when we reverse the machine), we can simply write:

 W/Q = (T – 1)/T = 1 – 1/T


Note the formula allows for negative values of the efficiency W/Q: if T were lower than one degree, we’d have to put work in and, hence, our ideal engine would have negative efficiency indeed. Hence, the formula is consistent over the whole temperature domain T > 0. Also note that, coincidentally, the three-engine set-up and the W/Q formula also illustrate the scalability of our theoretical reversible heat engines: we can think of one machine substituting for two or three others, or any combination really: we can have several machines of equal efficiency working in parallel, thereby doubling, tripling, quadrupling, etcetera, the output as well as the heat that’s being taken in. Indeed, W/Q = 2W/2Q = 3W/3Q = 4W/4Q and so on.
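For what it’s worth, the gauge is trivial to put in code. The sketch below is just the W/Q = 1 – 1/T formula plus the heat QS = Q/T that ends up at the unit temperature; the function names and sample values are mine.

```python
def efficiency(t, t_unit=1.0):
    """W/Q for a reversible engine taking in heat at T and rejecting it at the unit temperature."""
    if t <= 0.0:
        raise ValueError("absolute temperature must be positive")
    return 1.0 - t_unit / t

def heat_at_unit_temperature(q, t, t_unit=1.0):
    """Heat QS delivered at the unit temperature when Q is absorbed at T (Q/T = QS/1)."""
    return q * t_unit / t

for temperature in (0.5, 1.0, 4.0):
    q = 100.0                                   # heat taken in at T (arbitrary)
    q_s = heat_at_unit_temperature(q, temperature)
    print("T =", temperature,
          " W/Q =", efficiency(temperature),
          " QS =", q_s,
          " W = Q - QS =", q - q_s)
```

Below one degree the efficiency turns negative, i.e. W = Q – QS is negative and we have to put work in, exactly as the formula says. And doubling Q doubles both QS and W, so the ratio W/Q does not notice how many engines we run in parallel.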

Also, looking at that three-engine model once again, we can set T3 to one degree and re-state the result in terms of our standard temperature:

If one engine, absorbing heat Q1 at T1, delivers the heat QS at one degree, and if another engine, absorbing heat Q2 at T2, will also deliver the same heat QS at one degree, then it follows that an engine which absorbs heat Q1 at the temperature T1 will deliver heat Q2 if it runs between T1 and T2.

That’s just stating what we showed, but it’s an important result. All these machines are equivalent, so to say, and, as Feynman notes, all we really have to do is to find how much heat (Q) we need to put in at the temperature T in order to deliver a certain amount of heat QS at the unit temperature (i.e. one degree). If we can do that, then we have everything. So let’s go for it.

Measuring entropy

We already mentioned that we can look at the entropy S = Q/T as some quantity that remains unchanged as long as we’re talking reversible thermodynamic processes. Indeed, as much Q/T is absorbed as is given back in a reversible cycle or, in other words: there is no net change in entropy in a reversible cycle. But what does it mean really?

Well… Feynman defines the entropy of a system, or a substance really (think of that body of gas in the cylinder of our ideal gas engine), as a function of its condition, so it is a quantity which is similar to pressure (which is a function of density, volume and temperature: P = NkT/V), or internal energy (which is a function of pressure and volume (U = (3/2)·PV) or, substituting the pressure function, of density and temperature: U = (3/2)·NkT). That doesn’t bring much clarification, however. What does it mean? We need to go through the full argument and the illustrations here.

Suppose we have a body of gas, i.e. our substance, at some volume Va and some temperature Ta (i.e. condition a), and we bring it into some other condition (b), so it now has volume Vb and temperature Tb, as shown below. [Don’t worry about the ΔS = Sb – Sa and ΔS = Sa – Sb formulas for now. I’ll explain them in a minute.]

[Figure: entropy change]

You may think that a and b are, once again, steps in the reversible cycle of a Carnot engine, but no! What we’re doing here is something different altogether: we’ve got the same body of gas at point b but in a completely different condition: indeed, both the volume and temperature (and, hence, its pressure) of the gas are different in b as compared to a. What we do assume, however, is that the gas went from condition a to condition b through a completely reversible process. Cycle, process? What’s the difference? What do we mean by that?

As Feynman notes, we can think of going from a to b through a series of steps, during which tiny reversible heat engines take out an infinitesimal amount of heat dQ in tiny little reservoirs at the temperature corresponding to that point on the path. [Of course, depending on the path, we may have to add heat (and, hence, do work rather than getting work out). However, in this case, we see a temperature rise but also an expansion of volume, the net result of which is that the substance actually does some (net) work from a to b, rather than us having to put (net) work in.] So the process consists, in principle, of a (potentially infinite) number of tiny little cycles. The thinking is illustrated below. 

[Figure: entropy change 2, the path from a to b broken up into tiny reversible engines and reservoirs]

Don’t panic. It’s one of the most beautiful illustrations in all of Feynman’s Lectures, IMHO. Just analyze it. We’ve got the same horizontal and vertical axis here, showing volume and temperature respectively, and the same points a and b showing the condition of the gas before and after and, importantly, also the same path from condition a to condition b, as in the previous illustration. It takes a pedagogic genius like Feynman to think of this: he just draws all those tiny little reservoirs and tiny engines on a mathematical graph to illustrate what’s going on: at each step, an infinitesimal amount of work dW is done, and an infinitesimal amount of entropy dS = dQ/T is being delivered at the unit temperature.

As mentioned, depending on the path, some steps may involve doing some work on those tiny engines, rather than getting work out of them, but that doesn’t change the analysis. Now, we can write the total entropy that is taken out of the substance (or the little reservoirs, as Feynman puts it), as we go from condition a to b, as:

ΔS = Sb – Sa

Now, in light of all the above, it’s easy to see that this ΔS can be calculated using the following integral:

ΔS = Sb – Sa = ∫(a→b) dQ/T

So we have a function S here which depends on the ‘condition’ indeed—i.e. the volume and the temperature (and, hence, the pressure) of the substance. Now, you may or may not notice that it’s a function that is similar to our internal energy formula (i.e. the formula for U). At the same time, it’s not internal energy. It’s something different. We write:

S = S(V, T)

So now we can rewrite our integral formula for change in S as we go from a to b as:

S(Vb, Tb) – S(Va, Ta) = ∫ dQ/T (integrated along the path from a to b)

Now, a similar argument as the one we used when discussing Carnot’s postulate (all ideal reversible engines operating between two temperatures are essentially equivalent) can be used to demonstrate that the change in entropy does not depend on the path: only the start and end point (i.e. point a and b) matter. In fact, the whole discussion is very similar to the discussion of potential energy when conservative force fields are involved (e.g. gravity or electromagnetism): the difference between the values for our potential energy function at different points was absolute. The paths we used to go from one point to another didn’t matter. The only thing we had to agree on was some reference point, i.e. a zero point. For potential energy, that zero point is usually infinity. In other words, we defined zero potential energy as the potential energy of a charge or a mass at an infinite distance away from the charge or mass that’s causing the field.

Here we need to do the same: we need to agree on a zero point for S, because the formula above only gives the difference of entropy between two conditions. Now, that’s where the third law of thermodynamics comes in, which simply states that the entropy of any substance at the absolute zero temperature (T = 0) is zero, so we write:

S = 0 at T = 0

That’s easy enough, isn’t it?

Now, you’ll wonder whether we can actually calculate something with that. We can. Let me simply reproduce Feynman’s calculation of the entropy function for an ideal gas. You’ll need to pull all that I wrote in this and my previous posts together, but you should be able to follow his line of reasoning:

Entropy for ideal gas

Huh? I know. At this point, you’re probably suffering from formula overkill. However, please try again. Just go over the text and the formulas above, and try to understand what they really mean. [In case you wonder about the formula with the ln[Vb/Va] factor (i.e. the reference to section 44.4), you can check it in my previous post.] So just try to read the S(V, T) formula: it says that a substance (a gas, liquid or solid) consisting of N atoms or molecules, at some temperature T and with some volume V, is associated with some exact value for its entropy S(V, T). The constant, a, should, of course, ensure that S(V, T) = 0 at T = 0.
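If you want to convince yourself that the path does not matter, you can let a computer do the integral. The sketch below is mine, not Feynman's: it assumes an ideal gas with U = N·k·T/(γ – 1) and P = N·k·T/V, adds up dQ/T = (dU + P·dV)/T in small steps along two different (made-up) paths from a to b, and compares the result with the N·k·[ln(Vb/Va) + ln(Tb/Ta)/(γ – 1)] difference that follows from the S(V, T) formula. The function names, the step count and the numbers for a and b are illustrative assumptions only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def entropy_change_along_path(path, n_particles, gamma, steps=20_000):
    """Sum dQ/T along a path s -> (V, T), s in [0, 1], for an ideal gas.

    Assumes dQ = dU + P*dV with U = N*k*T/(gamma - 1) and P = N*k*T/V.
    Each small step is evaluated at its midpoint.
    """
    total = 0.0
    v_prev, t_prev = path(0.0)
    for i in range(1, steps + 1):
        v, t = path(i / steps)
        v_mid, t_mid = 0.5 * (v + v_prev), 0.5 * (t + t_prev)
        d_u = n_particles * K_B * (t - t_prev) / (gamma - 1)
        p_dv = n_particles * K_B * t_mid / v_mid * (v - v_prev)
        total += (d_u + p_dv) / t_mid
        v_prev, t_prev = v, t
    return total


def straight_path(va, ta, vb, tb):
    """Volume and temperature change together, along a straight line."""
    return lambda s: (va + s * (vb - va), ta + s * (tb - ta))


def two_leg_path(va, ta, vb, tb):
    """First expand at constant temperature, then heat at constant volume."""
    def path(s):
        if s <= 0.5:
            return (va + 2 * s * (vb - va), ta)
        return (vb, ta + (2 * s - 1) * (tb - ta))
    return path


def closed_form(va, ta, vb, tb, n_particles, gamma):
    """Entropy difference from S(V, T) = N*k*[ln(V) + ln(T)/(gamma - 1)] + a."""
    return n_particles * K_B * (
        math.log(vb / va) + math.log(tb / ta) / (gamma - 1)
    )


if __name__ == "__main__":
    # Hypothetical conditions: one mole of a monatomic gas, V and T both doubled.
    va, ta, vb, tb = 1.0, 300.0, 2.0, 600.0
    n, gamma = 6.02214076e23, 5 / 3
    print("straight path:", entropy_change_along_path(straight_path(va, ta, vb, tb), n, gamma))
    print("two-leg path: ", entropy_change_along_path(two_leg_path(va, ta, vb, tb), n, gamma))
    print("S(V, T) formula:", closed_form(va, ta, vb, tb, n, gamma))
```

The three numbers should agree to many decimal places, which is just the path-independence argument in numerical form.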

The first thing you can note is that S is an increasing function of V at constant temperature T. Conversely, decreasing the volume results in a decrease of entropy. To be precise, using the formula for S, we can derive the following formula for the difference in entropy when keeping the temperature constant at some value T:

Sb – Sa = S(Vb, T) – S(Va, T)

= ΔS = N·k·ln[Vb/Va]

What this formula says, for example, is that, if we do nothing but double the volume of a gas (while keeping the temperature constant) when going from a to b (hence, Vb/Va = 2), the entropy will change by N·k·ln(2) ≈ 0.7·N·k. Conversely, if we halve the volume (again, assuming the temperature remains constant), then the change in entropy will be N·k·ln(0.5) ≈ –0.7·N·k.
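To put some actual numbers on this, here is a minimal sketch of the ΔS = N·k·ln[Vb/Va] formula. The function name and the choice of one mole of gas are my own assumptions, just for illustration.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant in J/K
N_ONE_MOLE = 6.02214076e23  # number of particles in one mole (illustrative choice)


def isothermal_entropy_change(n_particles, v_before, v_after):
    """Entropy change N*k*ln(Vb/Va) at constant temperature, in J/K."""
    return n_particles * K_B * math.log(v_after / v_before)


doubling = isothermal_entropy_change(N_ONE_MOLE, 1.0, 2.0)
halving = isothermal_entropy_change(N_ONE_MOLE, 1.0, 0.5)

# In units of N*k the changes are ln(2) and ln(0.5), i.e. about +0.69 and -0.69.
print(doubling / (N_ONE_MOLE * K_B), halving / (N_ONE_MOLE * K_B))
# In ordinary units, for one mole: about +5.76 J/K and -5.76 J/K.
print(doubling, halving)
```

Only the ratio of the volumes matters here, so the units you use for V drop out.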

The graph below shows how it works. It’s quite simple really: it’s just the ln(x) function, and I just inserted it here so you have an idea of how the entropy changes with volume. [In case you would think it looks the same as that efficiency graph, i.e. the graph of the W/Q = (T – 1)/T = 1 – 1/T function, think again: the efficiency graph has a horizontal asymptote (y = 1), while the logarithmic function does not have any horizontal asymptote.]

Capture 2

Now, you may think entropy changes only marginally as we keep increasing the volume, but you should also think twice here. It’s just the nature of the logarithmic scale. Indeed, when we double the volume, going from V = 1 to V = 2, for example, the change in entropy will be equal to N·k·ln(2) ≈ 0.7·N·k. Now, that’s the same change as going from V = 2 to V = 4, and the same as going from V = 4 to V = 8. So, if we double the volume three times in a row, the total change in entropy will be that of going from V = 1 to V = 8, which is equal to N·k·ln(8) = N·k·ln(2³) = 3·N·k·ln(2). So, yes, looking at the intervals here that are associated with the same N·k·ln(2) increase in entropy, i.e. [1, 2], [2, 4] and [4, 8] respectively, you may think that the increase in entropy is marginal only, as it’s the same increase but the length of each interval is double that of the previous one. However, when reducing the volume, the logic works the other way around, and so the logarithmic function ensures the change is anything but marginal. Indeed, if we halve the volume, going from V = 1 to V = 1/2, and then halve it again, to V = 1/4, and then again, to V = 1/8, we get the same change in entropy once more, but with a minus sign in front, of course: N·k·ln(2⁻³) = –3·N·k·ln(2). However, the same ln(2) change is now associated with intervals on the x-axis (between 1 and 0.5, 0.5 and 0.25, and 0.25 and 0.125 respectively) that are getting smaller and smaller as we further reduce the volume. In fact, the length of each interval is now half of that of the previous interval. Hence, the change in entropy is anything but marginal now!
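The bookkeeping of those intervals is easy to tabulate. The little sketch below (my own helper, with entropy expressed in units of N·k) lists, for each step, the length of the volume interval and the entropy change that goes with it.

```python
import math


def entropy_steps(volumes):
    """For successive volumes, return (interval length, entropy change / (N*k))."""
    return [
        (abs(v_next - v), math.log(v_next / v))
        for v, v_next in zip(volumes, volumes[1:])
    ]


doubling = entropy_steps([1, 2, 4, 8])
halving = entropy_steps([1, 0.5, 0.25, 0.125])

for label, steps in (("doubling", doubling), ("halving", halving)):
    total = sum(change for _, change in steps)
    print(label)
    for length, change in steps:
        print(f"  interval length {length:<6} entropy change {change:+.3f} N*k")
    print(f"  total {total:+.3f} N*k")
```

Each step changes the entropy by the same ln(2) in absolute value, while the interval lengths grow (1, 2, 4) when doubling and shrink (0.5, 0.25, 0.125) when halving.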

[In light of the fact that the (negative) change in entropy becomes larger and larger as we further reduce the volume, and in a way that’s anything but marginal, you may now wonder, for a very brief moment, whether or not the entropy might actually take on a negative value. The answer is obviously no. The change in entropy can take on a large negative value but the S(V, T) = N·k·[ln(V) + ln(T)/(γ–1)] + a formula, with the constant a ensuring that the entropy is zero at T = 0, ensures things come out alright, as it should, of course!]

Now, as we continue to try to understand what entropy really means, it’s quite interesting to think of what this formula implies at the level of the atoms or molecules that make up the gas: the entropy change per molecule is k·ln(2) when doubling the volume – or k·ln(1/2) when compressing the gas to half its volume at the same temperature. Now, its kinetic energy remains the same – because – don’t forget! – we’re changing the volume at constant temperature here. So what causes the entropy change here really? Think about it: the only thing that changed, physically, is how much room the molecule has to run around in, as Feynman puts it aptly. Hence, while everything stays the same (atoms or molecules with the same temperature and energy), we still have an entropy increase (or decrease) when the distribution of the molecules changes.

This remark brings us to the connection between order and entropy, which you vaguely know, for sure, but probably never quite understood because, if you did, you wouldn’t be reading this post. :-) So I’ll talk about it in a moment. I first need to wrap up this section, however, by showing why all of the above is, somehow, related to that ever-increasing entropy law. :-)

However, before doing that, I want to quickly note something about that assumption of constant temperature here. How can it remain constant? When a body of gas expands, its temperature should drop, right? Well… Yes. But only if it is pushing against something, like in a cylinder with a piston indeed, or as air escapes from a tyre and pushes against the (lower-pressure) air outside of the tyre. What happens here is that the kinetic energy of the gas molecules is being transferred (to the piston, or to the gas molecules outside of the tyre) and, hence, temperature decreases indeed. In such a case, the assumption is that we add heat to (or remove heat from) our body of gas as we expand (or decrease) its volume. Having said that, in a more abstract analysis, we could envisage a body of gas that has nothing to push against, except for the walls of its container, which have the same temperature. In such a more abstract analysis, we need not worry about how we keep temperature constant: the point here is just to compare the ex post and ex ante entropy of the volume. That’s all.

The Law of Ever-Increasing Entropy 

With all of the above, we’re finally armed to ‘prove’ the second law of thermodynamics which we can also state as follows indeed: while the energy of the universe is constant, its entropy is always increasing. Why is this so? Out of respect, I’ll just quote Feynman once more, as I can’t see how I could possibly summarize it better:

Universe of entropy

So… That should sum it all up. You should re-read the above a couple of times, so you’re sure you grasp it. I’ll also let Feynman summarize all of those ‘laws’ of thermodynamics that we have just learned as, once more, I can’t see how I could possibly write more clearly or succinctly. His statement is much more precise than the statement we started out with: the energy of the universe is always constant but its entropy is always increasing. As Feynman notes, this version of the two laws of thermodynamics doesn’t say that entropy stays the same in a reversible cycle, and also doesn’t say what entropy actually is. So Feynman’s summary is much more precise and, hence, much better indeed:

Laws of thermodynamics

Entropy and order

What I wrote or reproduced above may not have satisfied you. So we’ve got this funny number, S, describing some condition or state of a substance, but you may still feel you don’t really know what it means. Unfortunately, I cannot do all that much about that. Indeed, technically speaking, a quantity like entropy (S) is a state function, just like internal energy (U), or like enthalpy (usually denoted by H), a related concept which you may remember from chemistry and which is defined as H = U + PV. As such, you may just think of S as some number that pops up in thermodynamical equations. It’s perfectly fine to think of it like that. However, if you’re reading this post, then it’s likely you do so because some popular science book mentioned entropy and related it to order and/or disorder indeed. However, I need to disappoint you here: that relationship is not as straightforward as you may think it is. To get some idea, let’s go through another example, which I’ll also borrow from Feynman.

Let’s go back to that relationship between volume and entropy, keeping temperature constant:

ΔS = N·k·ln[Vb/Va]

We discussed, rather at length, how entropy increases as we allow a body of gas to expand. As the formula shows, it increases logarithmically with the ratio of the ex post and ex ante volume. Now, let us think about two gases, which we can think of as ‘white’ and ‘black’ respectively. Or neon and argon. Whatever. Two different gases. Let’s suppose we’ve kept them in two separate compartments of a box, with some barrier in-between them.

Now, you know that, if we’d take out the barrier, they’ll mix. That’s just a fact of life. As Feynman puts it: somehow, the whites will worm their way across in the space of blacks, and the blacks will worm their way, by accident, into the space of whites. [There’s a bit of a racist undertone in this, isn’t there? But then I am sure Feynman did not intend it that way.] Also, as he notes correctly: we’ve got a very simple example here of an irreversible process which is completely composed of reversible events. We know this mixing will not affect the kinetic (or internal) energy of the gas. Having said that, both the white and the black molecules now have ‘much more room to run around in’. So is there a change in entropy? You bet.

If we take away that barrier, it’s just similar to moving that piston out when we were discussing one volume of gas only. Indeed, we effectively double the volume for the whites, and we double the volume for the blacks, while keeping all at the same temperature. Hence, the entropy of both the white and the black gas increases. By how much? Look at the formula: for each gas, the amount is given by the product of the number of molecules (N), the Boltzmann constant (k), and ln(2), i.e. the natural logarithm of the ratio of the ex post and ex ante volumes: ΔS = N·k·ln[Vb/Va].
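Adding up the two contributions is straightforward. The sketch below is my own illustration, assuming two ideal gases at the same temperature that each spread from their own compartment over the whole box. The function name is made up, and allowing for compartments of unequal size is a small generalization that is not in Feynman's example.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def mixing_entropy_change(n_white, n_black, v_white=1.0, v_black=1.0):
    """Total entropy change when the barrier is removed, in J/K.

    Each gas goes from its own compartment to the full volume at constant
    temperature, so each contributes N*k*ln(V_total / V_compartment).
    """
    v_total = v_white + v_black
    ds_white = n_white * K_B * math.log(v_total / v_white)
    ds_black = n_black * K_B * math.log(v_total / v_black)
    return ds_white + ds_black


# Equal compartments with N molecules each: the total is 2*N*k*ln(2).
N = 6.02214076e23  # illustrative choice: one mole of each gas
print(mixing_entropy_change(N, N) / (N * K_B))  # 2*ln(2), about 1.386
```

Reading the result with a minus sign gives the entropy decrease you would need to un-mix the two gases again.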

So, yes, entropy increases as the molecules are now distributed over a much larger space. Now, if we stretch our mind a bit, we could define it as a measure of order, or disorder, especially when considering the process going the other way: suppose the gases were mixed up to begin with and, somehow, we manage to neatly separate them in two separate volumes, each half of the original. You’d agree that amounts to an increase in order and, hence, you’d also agree that, if entropy is, somehow, some measure for disorder, entropy should decrease–which it obviously does using that ΔS = N·k·ln[Vb/Va] formula. Indeed, we calculated ΔS as –0.7·N·k for halving the volume.

However, the interpretation is quite peculiar and, hence, not as straightforward as popular science books suggest. Indeed, from that S(V, T) = N·k·[ln(V) + ln(T)/(γ–1)] + a formula, it’s obvious we can also decrease entropy by decreasing the number of molecules, or by decreasing the temperature. You’ll have to admit that in both cases (decrease in N, or decrease in T), you’ll have to be somewhat creative in interpreting such decrease as a decrease in disorder.

So… What more can we say? Nothing much. However, in order to be complete, I should add a final note on this discussion of entropy measuring order (or, to be more precise, measuring disorder). It’s about another concept of entropy, the so-called Shannon entropy. It’s a concept from information theory, and our entropy and the Shannon entropy do have something in common: in both, we see that logarithm pop up. It’s quite interesting but, as you might expect, complicated. Hence, I should just refer you to the Wikipedia article on it, from which I took the illustration and text below.

coin flip

We’ve got two coins with two faces here. They can, obviously, be arranged in 2² = 4 ways. Now, back in 1948, the so-called father of information theory, Claude Shannon, thought it was nonsensical to just use that number (4) to represent the complexity of the situation. Indeed, if we’d take three coins, or four, or five, respectively, then we’d have 2³ = 8, 2⁴ = 16, and 2⁵ = 32 ways, respectively, of combining them. Now, you’ll agree that, as a measure of the complexity of the situation, the exponents 1, 2, 3, 4 etcetera describe the situation much better than 2, 4, 8, 16 etcetera.

Hence, Shannon defined the so-called information entropy as, in this case, the base 2 logarithm of the number of possibilities. To be precise, the information entropy of the situation which we’re describing here (i.e. the ways a set of N coins can be arranged) is equal to S = log₂(2^N) = N = 1, 2, 3, 4 etcetera for N = 1, 2, 3, 4 etcetera. In honor of Shannon, the unit is shannons. [I am not joking.] However, information theorists usually talk about bits, rather than shannons. [We’re not talking a computer bit here, although the two are obviously related, as computer bits are binary too.]
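You can check the counting by brute force. The sketch below (my own illustration, with made-up function names) simply lists every heads/tails arrangement of N coins and takes the base 2 logarithm of how many there are.

```python
import math
from itertools import product


def arrangements(n_coins):
    """All heads/tails arrangements of n_coins distinguishable coins."""
    return list(product("HT", repeat=n_coins))


def information_entropy_bits(n_coins):
    """Base 2 logarithm of the number of equally likely arrangements."""
    return math.log2(len(arrangements(n_coins)))


for n in range(1, 6):
    print(n, len(arrangements(n)), information_entropy_bits(n))
```

The number of arrangements doubles with every coin we add, while the entropy in bits just goes up by one each time. Note that this simple count assumes all arrangements are equally likely.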

Now, one of the many nice things of logarithmic functions is that it’s easy to switch bases. Hence, instead of expressing information entropy in bits, we can also express it in trits (for base 3 logarithms), nats (for base e logarithms, so that’s the natural logarithmic function ln), or dits (for base 10 logarithms). So… Well… Feynman is right in noting that “the logarithm of the number of ways we can arrange the molecules is (the) entropy”, but that statement needs to be qualified: the concepts of information entropy and entropy tout court, as used in the context of thermodynamical analysis, are related but, as usual, they’re also different. :-) Bridging the two concepts involves probability distributions and other stuff. One extremely simple popular account illustrates the principle behind it as follows:

Suppose that you put a marble in a large box, and shook the box around, and you didn’t look inside afterwards. Then the marble could be anywhere in the box. Because the box is large, there are many possible places inside the box that the marble could be, so the marble in the box has a high entropy. Now suppose you put the marble in a tiny box and shook up the box. Now, even though you shook the box, you pretty much know where the marble is, because the box is small. In this case we say that the marble in the box has low entropy.

Frankly, examples like this make only very limited sense. They may, perhaps, help us imagine, to some extent, how probability distributions of atoms or molecules might change as the atoms or molecules get more space to move around in. Having said that, I should add that examples like this are, at the same time, also so simplistic they may confuse us more than they enlighten us. In any case, while all of this discussion is highly relevant to statistical mechanics and thermodynamics, I am afraid I have to leave it at this one or two remarks. Otherwise this post risks becoming a course! :-)

Now, there is one more thing we should talk about here. As you’ve read a lot of popular science books, you probably know that the temperature of the Universe is decreasing because it is expanding. However, from what you’ve learnt so far, it is hard to see why that should be the case. Indeed, it is easy to see why the temperature should drop/increase when there’s adiabatic expansion/compression: momentum and, hence, kinetic energy, is being transferred from/to the piston indeed, as it moves out or into the cylinder while the gas expands or is being compressed. But the expanding universe has nothing to push against, does it? So why should its temperature drop? It’s only the volume that changes here, right? And so its entropy (S) should increase, in line with the ΔS = Sb – Sa = S(Vb, T) – S(Va, T) = N·k·ln[Vb/Va] formula, but not its temperature (T), which is nothing but the (average) kinetic energy of all of the particles it contains. Right? Maybe.

[By the way, in case you wonder why we believe the Universe is expanding, that’s because we see it expanding: an analysis of the redshifts and blueshifts of the light we get from other galaxies reveals the distance between galaxies is increasing. The expansion model is often referred to as the raisin bread model: one doesn’t need to be at the center of the Universe to see all others move away: each raisin in a rising loaf of raisin bread will see all other raisins moving away from it as the loaf expands.]

Why is the Universe cooling down?

This is a complicated question and, hence, the answer is also somewhat tricky. Let’s look at the entropy formula for an increasing volume of gas at constant temperature once more. Its entropy must change as follows:

ΔS = Sb – Sa = S(Vb, T) – S(Va, T) = N·k·ln[Vb/Va]

Now, the analysis usually assumes we have to add some heat to the gas as it expands in order to keep the temperature (T) and, hence, its internal energy (U) constant. Indeed, you may or may not remember that the internal energy is nothing but the product of the number of gas particles and their average kinetic energy, so we can write:

U = N·〈m·v²/2〉

In my previous post, I also showed that, for an ideal gas (i.e. no internal motion inside of the gas molecules), the following equality holds: PV = (2/3)U. For a non-ideal gas, we’ve got a similar formula, but with a different coefficient: PV = (γ−1)U. However, all these formulas were based on the assumption that ‘something’ is containing the gas, and that ‘something’ involves the external environment exerting a force on the gas, as illustrated below.


As Feynman writes: “Suppose there is nothing, a vacuum, on the outside of the piston. What of it? If the piston were left alone, and nobody held onto it, each time it got banged it would pick up a little momentum and it would gradually get pushed out of the box. So in order to keep it from being pushed out of the box, we have to hold it with a force F.” We know that the pressure is the force per unit area: P = F/A. So can we analyze the Universe using these formulas?

Maybe. The problem is that we’re analyzing limiting situations here, and that we need to re-examine our concepts when applying them to the Universe. :-)

The first question, obviously, is about the density of the Universe. You know it’s close to a vacuum out there. Close. Yes. But how close? If you google a bit, you’ll find lots of hard-to-read articles on the density of the Universe. If there’s one thing you need to pick up from them, it is that, in order for the Universe to expand forever, it should have some critical density (denoted by ρc), which is like a watershed point between an expanding and a contracting Universe.

So what about it? According to Wikipedia, the critical density is estimated to be approximately five atoms (of monatomic hydrogen) per cubic metre, whereas the average density of (ordinary) matter in the Universe is believed to be 0.2 atoms per cubic metre. So that’s OK, isn’t it?

Well… Yes and no. We also have non-ordinary matter in the Universe, which is usually referred to as dark matter. The existence of dark matter, and its properties, are inferred from its gravitational effects on visible matter and radiation. In addition, we’ve got dark energy as well. I don’t know much about it, but it seems the dark energy and the dark matter bring the actual density (ρ) of the Universe much closer to the critical density. In fact, cosmologists seem to agree that ρ ≈ ρc and, according to a very recent scientific research mission involving an ESA space observatory doing very precise measurements of the Universe’s cosmic background radiation, the Universe should consist of 4.82 ± 0.05% ordinary matter, 25.8 ± 0.4% dark matter and 69 ± 1% dark energy. I’ll leave it to you to challenge that. :-)

OK. Very low density. So that means very low pressure obviously. But what’s the temperature? I checked on the Physics Stack Exchange site, and the best answer is pretty nuanced: it depends on what you want to average. To be precise, the quoted answer is:

  1. If one averages by volume, then one is basically talking about the ‘temperature’ of the photons that reach us as cosmic background radiation—which is the temperature of the Universe that those popular science books refer to. In that case, we get an average temperature of 2.72 degrees Kelvin. So that’s pretty damn cold!
  2. If we average by observable mass, then our measurement is focused mainly on the temperature of all of the hydrogen gas (most matter in the Universe is hydrogen), which has a temperature of a few 10s of Kelvin. Only one tenth of that mass is in stars, but their temperatures are far higher: in the range of 10⁴ to 10⁵ degrees. Averaging gives a range of 10³ to 10⁴ degrees Kelvin. So that’s pretty damn hot!
  3. Finally, including dark matter and dark energy, which is supposed to have even higher temperature, we’d get an average by total mass in the range of 10⁷ Kelvin. That’s incredibly hot!

This is enlightening, especially the first point: we’re not measuring the average kinetic energy of matter particles here but some average energy of (heat) radiation per unit volume. This ‘cosmological’ definition of temperature is quite different from the ‘physical’ definition that we have been using and the observation that this ‘temperature’ must decrease is quite logical: if the energy of the Universe is a constant, but its volume becomes larger and larger as the Universe expands, then the energy per unit volume must obviously decrease.

So let’s go along with this definition of ‘temperature’ and look at an interesting study of how the Universe is supposed to have cooled down in the past. It basically measures the temperature of that cosmic background radiation, i.e. a remnant of the Big Bang, a few billion years ago, which was a few degrees warmer then than it is now. To be precise, it was measured as 5.08 ± 0.1 degrees Kelvin, and this decrease has nothing to do with our simple ideal gas laws but with the Big Bang theory, according to which the temperature of the cosmic background radiation should, indeed, drop smoothly as the universe expands.

Going through the same logic but the other way around, if the Universe had the same energy at the time of the Big Bang, it was all focused in a very small volume. Now, very small volumes are associated with very small entropy according to that S(V, T) = N·k·[ln(V) + ln(T)/(γ–1)] + a formula, but then temperature was not the same obviously: all that energy has to go somewhere, and a lot of it was obviously concentrated in the kinetic energy of its constituent particles (whatever they were) and, hence, a lot of it was in their temperature. 

So it all makes sense now. It was good to check it out, as it reminds us that we should not try to analyze the Universe as a simple body of gas that’s not contained in anything in order to then apply our equally simple ideal gas formulas. Our approach needs to be much more sophisticated. Cosmologists need to understand physics (and thoroughly so), but there’s a reason why it’s a separate discipline altogether. :-)


First Principles of Thermodynamics

Thermodynamics is not an easy topic, but one can’t avoid it in physics. The main obstacle, probably, is that we very much like to think in terms of dependent and independent variables. While that approach is still valid in thermodynamics, it is more complicated, because it is often not quite clear what the dependent and independent variables are. We’ve got a lot of quantities in thermodynamics indeed: volume, pressure, internal energy, temperature and – soon to be defined – entropy, which are all some function of each other. Hence, the math involves partial derivatives and other subtleties. Let’s try to get through the basics.

Volume, pressure, temperature and the ideal gas law

We all know what a volume is. That’s an unambiguous quantity. Pressure and temperature are not so unambiguous. In fact, as far as I am concerned, the key to understanding thermodynamics is to be able to not only distinguish but also relate pressure and temperature.

The pressure of a gas or a liquid (P) is the force, per unit area, exerted by the atoms or molecules in that gas or liquid as they hit a surface, such as a piston, or the wall of the body that contains it. Hence, pressure is expressed in newton per square meter: 1 pascal (Pa) = 1 N/m². It’s a small unit for daily use: the standard atmospheric pressure is 1 atm = 101,325 Pa = 1.01325×10⁵ Pa = 1.01325 bar. We derived the formula for pressure in the previous post:

P = F/A = (2/3)·n·〈m·v²/2〉

This formula shows that the pressure depends on two variables:

  1. The density of the gas or the liquid (i.e. the number of particles per unit volume, so it’s two variables really: a number and a volume), and
  2. Their average kinetic energy.

Now, this average kinetic energy of the particles is nothing but the temperature (T), except that, because of historical reasons, we define temperature (expressed in degrees Kelvin) using a constant of proportionality: the Boltzmann constant k = kB. In addition, in order to get rid of that ugly 2/3 factor in our next formula, we’ll also throw in a 3/2 factor. Hence, we re-write the average kinetic energy 〈m·v²/2〉 as:

〈m·v²/2〉 = (3/2)·k·T

Now we substitute that definition into the first equation (while also noting that, if n is the number of particles in a unit volume, we will have N = n·V atoms in a volume V) to get what we want: the so-called ideal gas law, which you should remember from your high-school days:

PV = NkT
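A quick numerical sanity check helps here. The sketch below is mine: it plugs one mole of gas at 0 °C (273.15 K) in a volume of 22.414 litres into P = N·k·T/V. Those three input numbers are the usual textbook ‘standard conditions’, which I am assuming here; they are not taken from the derivation above. The function names are made up too.

```python
K_B = 1.380649e-23   # Boltzmann constant in J/K
N_A = 6.02214076e23  # particles in one mole (assumed amount of gas)


def pressure(n_particles, temperature, volume):
    """Ideal gas law solved for pressure: P = N*k*T/V, in pascal."""
    return n_particles * K_B * temperature / volume


def mean_kinetic_energy(temperature):
    """Average kinetic energy per particle, (3/2)*k*T, in joule."""
    return 1.5 * K_B * temperature


p = pressure(N_A, 273.15, 0.022414)  # volume in cubic metres
print(f"P = {p:.0f} Pa, i.e. {p / 101325:.4f} atm")
print(f"<m*v^2/2> = {mean_kinetic_energy(273.15):.3e} J per particle")
```

The pressure comes out very close to 1 atm, as it should, and the tiny number for the kinetic energy per particle shows why k is such a small constant.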

The equation implies that, for a given number of particles (for some given substance, that is), and for some given temperature, pressure and volume are inversely proportional to one another: P = NkT/V. The curve representing that relationship between P and V has the same shape as the reciprocal function y = 1/x. To be precise, it has the same shape as a rectangular hyperbola with the center at the origin, i.e. the shape of a y = m/x curve, assuming non-negative values for x and y only. The illustration below shows that graph for m = 1, 3 and 0.3 respectively. We’ll need that graph later when looking at more complicated graphs depicting processes during which we will not keep temperature constant, so that’s why I quickly throw it in here.


Of course, N·〈m·v²/2〉 (with N = n·V) is the number of atoms times the average kinetic energy of each and, therefore, it is also the internal energy U of the gas. Hence, we can also write the PV = NkT equation as:

PV = (2/3)·U

We should immediately note that we’re considering an ideal gas here, so we disregard any possibility of excitation or motion inside the atoms or molecules. It matters because, if we’re decreasing the volume and, hence, increasing the pressure, we’ll be doing work, and the energy needs to go somewhere. The equation above assumes it all goes into that 〈m·v²/2〉 factor and, hence, into the temperature. Hence, it is obvious that, if we were to allow for all kinds of rotational and vibratory motions inside of the atoms or molecules as well, then the analysis would become more complicated. Having said that, in my previous post I showed that the complications are limited: we can account for all kinds of internal motion by inserting another coefficient, i.e. one other than 2/3. For example, Feynman calculates it as 2/7, rather than 2/3, for the diatomic oxygen molecule. That is why we usually see a much more general expression of the equation above. We will write:

PV = (γ – 1)·U

The gamma (γ) in the equation above is the rather infamous specific heat ratio, and so it’s equal to 5/3 for the ideal gas (5/3 – 1 = 2/3). I call γ infamous because its theoretical value does not match the experimental value for most gases. For example, while I just noted γ’s theoretical value for O₂ (i.e. the diatomic oxygen molecule), which is 9/7 ≈ 1.286 (because 9/7 – 1 = 2/7), the experimentally measured value for O₂ is 1.399. The difference can only be explained using quantum mechanics, which is obviously not the topic of this post, and so we won’t write much about γ. However, I need to say one or two things about it, which I’ll do by showing how we could possibly measure it. Let me reproduce the illustration in my previous post here.

Gas pressure

The pressure is the force per unit area (P = F/A and, hence, F = P·A), and compressing the gas amounts to applying a force over some (infinitesimal) distance dx. Hence, the (differential) work done is equal to dW = F·(−dx) = – P·A·dx = – P·dV, as A·dx = dV, obviously (the area A times the distance dx is the volume change). Now, all the work done goes into changing the internal energy U: there is no heat energy that’s being added or removed here, and no other losses of energy. That’s why it’s referred to as a so-called adiabatic compression, from the Greek a (not), dia (through) and bainein (to go): no heat is going through. The cylinder is thermally insulated. Hence, we write:

dU = – P·dV

This is a very simple differential equation. Note the minus sign: the volume is going to decrease while we do work by compressing the piston, thereby increasing the internal energy. [If you are clever (which, of course, you are), you’ll immediately say that, with increasing internal energy, we should also have an increase in pressure and, hence, we shouldn’t treat P as some constant. You’re right, but so we’re doing a marginal analysis only here: we’ll deal with the full thing later. As mentioned above, the complete picture involves partial derivatives and other mathematical tricks.]

Taking the total differential of U = PV/(γ – 1), we also have another equation:

dU = (P·dV + V·dP)/(γ – 1)

Hence, we have – P·dV = (P·dV + V·dP)/(γ – 1). Multiplying both sides by (γ – 1) and collecting the terms gives γ·P·dV + V·dP = 0 or, dividing by P·V:

γ·dV/V + dP/P = 0

Assuming that γ is constant (which is true in theory but not in practice—another reason why this γ is rather infamous), we can integrate this. It gives γlnV + lnP = lnC, with lnC the constant of integration. Now we take the exponential of both sides to get that other formulation of the gas law, which you also may or may not remember from your high-school days:

PV^γ = C (a constant)

So here you have the answer to the question as to how we can measure γ: the pressure times the volume to the γth power must be some constant. To be precise, for monatomic gases the pressure times the volume to the 5/3 ≈ 1.67 power must be a constant. The formula works for gases like helium, krypton and argon. However, the issue is more complicated when looking at more complex molecules. You should also note the serious limitation in this analysis: we should not think of P as a constant in the dU = – P·dV equation! But I’ll come back to this. As for now, just take note of it and move on to the next topic.
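The integration above can also be checked numerically. What follows is only a minimal sketch, assuming a monatomic gas (γ = 5/3) and arbitrary units; the function name `adiabatic_pressure` and the starting values are made up for illustration. It steps through dP/P = –γ·dV/V in tiny increments, treating P as constant within each step (the marginal analysis), and then compares P·V^γ before and after.

```python
def adiabatic_pressure(p0, v0, v1, gamma=5.0 / 3.0, steps=100_000):
    """Step dP = -gamma * P * dV / V from volume v0 to v1; return the final P."""
    p, v = p0, v0
    dv = (v1 - v0) / steps
    for _ in range(steps):
        p += -gamma * p * dv / v   # P is held fixed within each tiny step
        v += dv
    return p


gamma = 5.0 / 3.0                  # assumption: monatomic ideal gas
p0, v0, v1 = 1.0, 1.0, 0.5         # arbitrary units; compress to half the volume
p1 = adiabatic_pressure(p0, v0, v1, gamma)

c_before = p0 * v0 ** gamma        # P*V^gamma at the start
c_after = p1 * v1 ** gamma         # P*V^gamma at the end
print(c_before, c_after)           # the two should agree closely
```

The smaller the steps, the closer the two constants get, which is just another way of saying that treating P as constant is fine as long as the step is infinitesimal.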

The Carnot heat engine

The definitions above should help us to understand and distinguish isothermal expansion and compression versus adiabatic expansion and compression which, in turn, should help us to understand what the Carnot cycle is all about. We’re looking at a so-called reversible engine here: there is no friction, and we also assume heat flows ‘frictionless’. The cycle is illustrated below: this so-called heat engine takes an amount of heat (Q1) from a high temperature (T1) heat pad (often referred to as the furnace or the boiler or, more generally, the heat source) and uses it to make some body (i.e. a piston in a cylinder in Carnot’s example) do some work, while some other amount of heat (Q2) goes back into some cold sink (usually referred to as the condenser), which is nothing but a second pad at much lower temperature (T2).

Carnot cycle

The four steps involved are the following:

(1) Isothermal expansion: The gas absorbs heat and expands while keeping the same temperature (T1). As the number of gas atoms or molecules, and their temperature, stays the same, the heat does work, as the gas expands and pushes the piston upwards. So that’s isothermal expansion. The next is different.

(2) Adiabatic expansion: The cylinder and piston are now removed from the heat pad, and the gas continues to expand, thereby doing even more work by pushing the piston further upwards. However, as the piston and cylinder are assumed to be thermally insulated, they neither gain nor lose heat. So it is the gas that loses internal energy: its temperature drops. So the gas cools. How much? It depends on the temperature of the condenser, i.e. T2, or – if there’s no condenser – the temperature of the surroundings. Whatever, the temperature cannot fall below T2.

(3) Isothermal compression: Now we (or the surroundings) will be doing work on the gas (as opposed to the gas doing work on its surroundings). The piston is being pushed back, and so the gas is slowly being compressed while, importantly, keeping it at the same temperature T2. Therefore, it delivers, through the heat pad, a heat amount Q2 to the second heat reservoir (i.e. the condenser).

(4) Adiabatic compression: We take the cylinder off the heat pad and continue to compress it, without letting any heat flow out this time around. Hence, the temperature must rise, back to T1. At that point, we can put it back on the first heat pad, and start the Carnot cycle all over again.

The graph below shows the relationship between P and V, and temperature (T), as we move through this cycle. For each cycle, we put in Q1 at temperature T1, and take out Q2 at temperature T2, and then the gas does some work, some net work, or useful work as it’s labeled below.

Carnot cycle graph

Let’s go step by step once again:

  1. Isothermal expansion: Our engine takes in Q1 at temperature T1 from the heat source (isothermal expansion), as we move along line segment (1) from point a to point b on the graph above: the pressure drops, the volume increases, but the temperature stays the same.
  2. Adiabatic expansion: We take the cylinder off the heat pad and continue to let the gas expand. Hence, it continues to push the piston, and we move along line segment (2) from point b to c: the pressure further drops, and the volume further increases, but the temperature drops too, from T1 to T2 to be precise.
  3. Isothermal compression: Now we bring the cylinder in touch with the T2 reservoir (the condenser or cold sink) and we now compress the gas (so we do work on the gas, instead of letting the gas do work on its surroundings). As we compress the gas, we reduce the volume and increase the pressure, moving along line segment (3) from c to d, while the temperature of the gas stays at T2.
  4. Adiabatic compression: Finally, we take the cylinder off the cold sink, but we further compress the gas. As its volume further decreases, its pressure and, importantly, its temperature too rise, from T2 to T1 – so we move along line segment (4) from d to a – and then we put it back on the heat source to start another cycle.

We could also reverse the cycle. In that case, the steps would be the following:

  1. Our engine would first take in Q2 at temperature T2 (isothermal expansion). We move along line segment (3) here but in the opposite direction: from d to c.
  2. Then we would push the piston to compress the gas (so we’d be doing some work on the gas, rather than have the gas do work on its surroundings) so as to increase the temperature from T2 to T1 (adiabatic compression). On the graph, we go from c to b along line segment (2).
  3. Then we would bring the cylinder in touch with the T1 reservoir and further compress the gas so an amount of heat equal to Q1 is being delivered to the boiler at (the higher) temperature T1 (isothermal compression). So we move along line segment (1) from b to a.
  4. Finally, we would let the gas expand, adiabatically, so the temperature drops, back to T2 (line segment (4), from a to d), so we can put it back on the T2 reservoir, on which we will let it further expand to take in Q2 again.

It’s interesting to note that the only reason why we can get the machine to do some net work (or why, in the reverse case, we are able to transfer heat by putting some work into some machine) is that there is some mechanism here that allows the machine to take in and transfer heat through isothermal expansion and compression. If we would only have adiabatic expansion and compression, then we’d just be going back and forth between temperature T1 and T2 without getting any net work out of the engine. The shaded area in the graph above then collapses into a line. That is why actual steam engines are very complicated and involve valves and other engineering tricks, such as multiple expansion. Also note that we need two heat reservoirs: we can imagine isothermal expansion and compression using one heat reservoir only but then the engine would also not be doing any net work that is useful to us.

Let’s analyze the work that’s being done during such a Carnot cycle in somewhat more detail.

The work done when compressing a gas, or the work done by a gas as it expands, is an integral. I won’t explain in too much detail here but just remind you of that dW = F·(−dx) = – P·A·dx = – P·dV formula. From this, it’s easy to see that the integral is ∫ PdV.

An integral is an area under a curve: just substitute P for y = f(x) and V for x, and think of ∫ f(x)dx = ∫ y dx. So the area under each of the numbered curves is the work done by or on the gas in the corresponding step. Hence, the net work done (i.e. the so-called useful work) is the shaded area of the picture.

So what is it exactly?

Well… Assuming there are no other losses, the work done should, of course, be equal to the difference in the heat that was put in, and the heat that was taken out, so we write:

W = Q1 – Q2

So that’s key to understanding it all: an efficient (Carnot) heat engine is one that converts all of the heat energy (i.e. Q1 – Q2) into useful work or, conversely, which converts all of the work done on the gas into heat energy.
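To see the area argument at work, here is a rough numerical sketch. It rests on assumptions that are not part of the discussion above: an ideal monatomic gas with P·V = N·k·T, N·k set to 1 in arbitrary units, and made-up temperatures and volumes. The corner points a, b, c and d are those of the P-V graph, and all function names are hypothetical. It integrates ∫ P·dV along the four segments and compares the net work with the heat taken in minus the heat given off.

```python
GAMMA = 5.0 / 3.0   # assumption: monatomic ideal gas
NK = 1.0            # assumption: N*k = 1 in arbitrary units


def integrate(pressure, v_start, v_end, steps=20_000):
    """Midpoint-rule approximation of the work integral of P dV."""
    dv = (v_end - v_start) / steps
    return sum(pressure(v_start + (i + 0.5) * dv) for i in range(steps)) * dv


def isotherm(temp):
    """P(V) at constant temperature, from P*V = N*k*T."""
    return lambda v: NK * temp / v


def adiabat(p_start, v_start):
    """P(V) along an adiabat through (p_start, v_start), from P*V^gamma = C."""
    return lambda v: p_start * (v_start / v) ** GAMMA


def carnot_cycle(t1, t2, v_a, v_b):
    """Return (net work, Q1, Q2) for one cycle a -> b -> c -> d -> a."""
    ratio = (t1 / t2) ** (1.0 / (GAMMA - 1.0))   # from T*V^(gamma-1) = constant
    v_c, v_d = v_b * ratio, v_a * ratio
    w1 = integrate(isotherm(t1), v_a, v_b)                  # (1) isothermal expansion
    w2 = integrate(adiabat(NK * t1 / v_b, v_b), v_b, v_c)   # (2) adiabatic expansion
    w3 = integrate(isotherm(t2), v_c, v_d)                  # (3) isothermal compression
    w4 = integrate(adiabat(NK * t2 / v_d, v_d), v_d, v_a)   # (4) adiabatic compression
    q1 = w1     # on an isotherm U does not change, so heat in = work out
    q2 = -w3    # heat given off to the cold reservoir
    return w1 + w2 + w3 + w4, q1, q2


net_work, q1, q2 = carnot_cycle(t1=400.0, t2=300.0, v_a=1.0, v_b=2.0)
print(net_work, q1 - q2)   # the two numbers should agree closely
```

The work on the compression steps comes out negative because the volume decreases, so the sum of the four integrals is the enclosed (shaded) area.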

Schematically, Carnot’s reversible heat engine is represented as follows:

Heat engine

So what? You may think we’ve got it all now, and that there’s nothing to add to the topic. But that’s not the case. No. We will want to know more about the exact relationship between Q1, Q2, T1 and T2. Why? Because we want to be able to answer the very same questions Sadi Carnot wanted to answer, like whether or not the engine could be made more efficient by using another liquid or gas. Indeed, as a young military engineer, fascinated by the steam engines that had – by then – become quite common, Carnot wanted to find an unambiguous answer to two questions:

  1. How much work can we get out of a heat source? Can all heat be used to do useful work?
  2. Could we improve heat engines by replacing the steam with some other working fluid or gas?

These questions obviously make sense, especially in regard to the relatively limited efficiency of steam engines. Indeed, the actual efficiency of the best steam engines at the time was only 10 to 20 percent, and that’s under favorable conditions!

Sadi Carnot attempted to answer these in a memoir, published as a popular work in 1824 when he was only 28 years old. It was entitled Réflexions sur la Puissance Motrice du Feu (Reflections on the Motive Power of Fire). Let’s see if we can make sense of it using more modern and common language. [As for Carnot’s young age, like so many, he was not destined to live long: he was interned in a private asylum in 1832 suffering from ‘mania’ and ‘general delirium’, and died of cholera shortly after, aged 36.]

Carnot’s Theorem

You may think that both questions have easy answers. The first question is, obviously, related to the principle of conservation of energy. So… Well… If we’d be able to build a frictionless Carnot engine, including a ‘frictionless’ heat transfer mechanism, then, yes, we’d be able to convert all heat energy into useful work. But that’s an ideal only.

The second question is more difficult. The formal answer is the following: if an engine is reversible, then it makes no difference how it is designed. In other words, the amount of work that we’ll get out of a reversible Carnot heat engine as it absorbs a given amount of heat (Q1) at temperature T1 and delivers some other amount of heat (Q2) at some other temperature T2 does not depend on the design of the machine. More formally, Carnot’s Theorem can be expressed as follows:

  1. All reversible engines operating between the same heat reservoirs are equally efficient.
  2. No actual engine operating between two heat reservoirs can be more efficient than a Carnot engine operating between the same reservoirs.

Feynman sort of ‘proves’ this Theorem from what he refers to as Carnot’s postulate. However, I feel his ‘proof’ is not a real proof, because Carnot’s postulate is too closely related to the Theorem, and so I feel he’s basically proving something using the result of the proof! However, in order to be complete, I did reproduce Feynman’s ‘proof’ of Carnot’s Theorem in the post scriptum to this post.

So… That’s it. What’s left to do is to actually calculate the efficiency of an ideal reversible Carnot heat engine, so let’s do that now. In fact, the calculation below is much more of a real proof of Carnot’s Theorem and, hence, I’d recommend you go through it.

The efficiency of an ideal engine

Above, I said I would need the result that PV^γ is equal to some constant. We need it in the proof that, for an ideal engine, the following relationship holds, always, for any Q1, Q2, T1 and T2:

Q1/T1 = Q2/T2
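For reference, one standard way to arrive at this relation is sketched below. It is a textbook-style outline rather than a reproduction of any particular proof, and it assumes an ideal gas obeying P·V = N·k·T, with the corner points a, b, c and d labeled as in the P-V graph. On each isotherm the internal energy does not change, so the heat exchanged equals the work done; the PV^γ = C result then ties the volumes on the two isotherms together.

```latex
\begin{align*}
Q_1 &= \int_{V_a}^{V_b} P\,dV = N k T_1 \ln\frac{V_b}{V_a} \\
Q_2 &= \int_{V_d}^{V_c} P\,dV = N k T_2 \ln\frac{V_c}{V_d} \\
T V^{\gamma-1} &= \text{constant along an adiabat (from } P V^{\gamma} = C \text{ and } P V = N k T\text{)} \\
T_1 V_b^{\gamma-1} &= T_2 V_c^{\gamma-1}, \qquad T_1 V_a^{\gamma-1} = T_2 V_d^{\gamma-1} \\
\frac{V_b}{V_a} &= \frac{V_c}{V_d} \\
\frac{Q_1}{T_1} &= N k \ln\frac{V_b}{V_a} = N k \ln\frac{V_c}{V_d} = \frac{Q_2}{T_2}
\end{align*}
```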


Now, we still don’t have the efficiency with this. The efficiency of an ideal engine is the ratio of the amount of work done and the amount of heat it takes in:

Efficiency = W/Q1

But W is equal to Q1 – Q2. Hence, re-writing the equation with the two heat/temperature ratios above as Q2 = (T2/T1)·Q1, we get: W = Q1(1 – T2/T1) = Q1(T1 – T2)/T1. The grand result is:

Efficiency = W/Q1 = (T1 – T2)/T1

Let me help you to interpret this result by inserting a graph for T1 going from zero to 20 degrees, and for T2 set at 0.3, 1 and 4 degrees respectively.

graph efficiency

The graph makes it clear we need some kind of gauge so as to be able to actually compare the efficiency of ideal engines. I’ll come back to that in my next post. However, in the meanwhile, please note that the result makes sense: T1 needs to be higher than T2 for the efficiency to be positive (of course, we can interpret negative values for the efficiency just as well, as they imply we need to do work on the engine, rather than the engine doing work for us), and the efficiency is always less than unity, getting closer to one as the working temperature of the engine goes up.
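The numbers behind such a graph are easy to tabulate. The snippet below is a small sketch: the function name `carnot_efficiency` is made up, and the T1 sample points (5, 10 and 20) are chosen arbitrarily, while the T2 values are the ones mentioned for the graph.

```python
def carnot_efficiency(t1, t2):
    """Efficiency (T1 - T2)/T1 of an ideal engine, temperatures on an absolute scale."""
    if t1 <= 0:
        raise ValueError("t1 must be positive")
    return (t1 - t2) / t1


for t2 in (0.3, 1.0, 4.0):
    # One row per cold-reservoir temperature, three sample values of T1.
    row = [round(carnot_efficiency(t1, t2), 3) for t1 in (5.0, 10.0, 20.0)]
    print(t2, row)
```

A negative return value (T1 below T2) corresponds to the case where work has to be done on the engine.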

Where does the power go?

So we have an engine that does useful work – so it works, literally – and we know where it gets its energy for that: it takes in more heat than it returns. But where is the work going? It is used to do something else, of course—like moving a car. Now how does that work, exactly? The gas exerts a force on the piston, thereby giving it an acceleration a = F/m, in accordance with Newton’s Law: F = m·a.

That’s all great. But then we need to re-compress the gas and, therefore, we need to (a) decelerate the piston, (b) reverse its direction and (c) push it back in. So that should cancel all of the work, shouldn’t it?

Well… No.

Let’s look at the Carnot cycle once more to show why. The illustrations below reproduce the basic steps in the cycle and the diagram relating pressure, volume and temperature for each of the four steps once more.

Carnot cycle Carnot cycle graph

Above, I wrote that the only reason why we can get the machine to do some net work (or why, in the reverse case, we are able to transfer heat from lower to higher temperature by doing some (net) work on it) is that there is some mechanism here that allows the machine to take in and transfer heat through isothermal expansion and compression and that, if we would only have adiabatic expansion and compression, then we’d just be going back and forth between temperature T1 and T2 without getting any net work out of the engine.

Now, that’s correct and incorrect at the same time. Just imagine a cylinder and a piston in equilibrium, i.e. the pressure on the inside and the outside of the piston are the same. Then we could push it in a bit but, as soon as we release, it would come back to its equilibrium situation. In fact, as we assume the piston can move in and out without any friction whatsoever, we’d probably have a transient response before the piston settles back into the steady state position (see below). Hence, we’d be moving back and forth on segment (2), or segment (4), in that P-V-T diagram above.


The point is: segment (2) and segment (4) are not the same. Points a and b, and points c and d, are marked by the same temperature (T1 and T2 respectively), but their pressure and volume are very different. Why? Because we had a step in-between step (2) and (4): isothermal compression, which reduced the volume, i.e. step (3). Hence, these two segments cover different ranges of pressure and volume. Indeed, you’ll remember we can write dW = F·(−dx) = – P·A·dx = – P·dV and, hence, the work done (or put in) during each step of the cycle is equal to the integral ∫ PdV, so that’s the area under each of the line segments. So it’s not like we are just retracing one and the same curve. One caveat: for an ideal gas, whose internal energy depends on temperature only, the work done by the gas along (2) and the work put in along (4) are equal in size, because both correspond to the same change in U between T1 and T2. The net work then comes from steps (1) and (3), which are not each other’s mirror image either: they take place at different volume and pressure and, more importantly, at different temperature, and so they involve different amounts of heat (Q1 and Q2 respectively).

But, again, what happens to the work? When everything is said and done, the piston does move up and down over the same distance in each cycle, and we know that work is force times distance. Hence, if the distance is the same… Yes. You’re right: the piston must exert some net force on something or, to put it differently, the energy W = Q1 − Q2 must go somewhere. Now that’s where the time variable comes in, which we’ve neglected so far.

Let’s assume we connect the piston to a flywheel, as illustrated below. There had better be some friction on it because, if not, the flywheel would spin faster and faster and, eventually, spin out of control and all would break down. Indeed, each cycle would transfer additional kinetic energy to the flywheel. When talking work and kinetic energy, one usually applies the following formula: W = Q1 − Q2 = Δ[mv²/2] = [mv²/2](after) − [mv²/2](before). However, we’re talking rotational kinetic energy so we should use the rotational equivalent for mv²/2, which is Iω²/2, in which I is the moment of inertia of the mass about the center of rotation and ω is the angular velocity.


You get the point. As we’re talking time now, we should also remind you of the concept of power. Power is the amount of work or energy being delivered, or consumed, per unit of time (i.e. per second). So we can write it as P(t) = dW/dt. For linear motion, P(t) can be written as the vector product (I mean the scalar, inner or dot product here) of the force and velocity vectors, so P(t) = F·v. Again, when rotation is involved, we’ve got an equivalent formula: P(t) = τ·ω, in which τ represents the torque and ω is, once again, the angular velocity of the flywheel. Again, we’d better ensure some load is placed on the engine, otherwise it will spin out of control as v and/or ω get higher and higher and, hence, the power involved gets higher and higher too, until all breaks down.
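To put a number on the runaway flywheel, here is a toy sketch. It assumes a frictionless flywheel without any load, so that the full net work W of every cycle ends up as rotational kinetic energy Iω²/2; the function name and all numerical values are invented for illustration.

```python
import math


def omega_after_cycles(omega0, inertia, work_per_cycle, cycles):
    """Angular velocity of an unloaded, frictionless flywheel after some cycles."""
    # Rotational energy I*omega^2/2 grows by W with every cycle.
    energy = 0.5 * inertia * omega0 ** 2 + cycles * work_per_cycle
    return math.sqrt(2.0 * energy / inertia)


for n in (1, 4, 100, 10_000):
    print(n, omega_after_cycles(omega0=0.0, inertia=2.0, work_per_cycle=100.0, cycles=n))
```

The angular velocity grows with the square root of the number of cycles and never levels off, which is why some load or friction is needed to reach a steady state.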

So… Now you know it all. :-)

Post scriptum: The analysis of the Carnot cycle involves some subtleties which I left out. For example, you may wonder why the gas would actually expand isothermally in the first step of the Carnot cycle. Indeed, if it’s at the same temperature T1 as the heat source, there should be no heat flow between the heat pad and the gas and, hence, no gas expansion, no? Well… No. :-) The gas particles pound on every wall, but only the piston can move. As the piston moves out, frictionless, inside of the cylinder, kinetic energy is being transferred from the gas particles to the piston and, hence, the gas temperature will want to drop, but then that temperature drop will immediately cause a heat transfer. That’s why the description of a Carnot engine also postulates ‘frictionless’ heat transfer.

In fact, I note that Feynman himself struggles a bit to correctly describe what’s going on here, as his description of the Carnot cycle suggests some active involvement is needed to make the piston move and ensure the temperature does not drop too fast. Indeed, he actually writes the following: “If we pull the piston out too fast, the temperature of the gas will fall too much below T1 and then the process will not quite be reversible.” This sounds, and actually is, a bit nonsensical: no pulling is needed, as the gas does all of the work while pushing the piston and, while it does, its temperature tends to drop, so it will suck in heat in order to equalize its temperature with its surroundings (i.e. the heat source). The situation is, effectively, similar to that of a can with compressed air: we can let the air expand, and thereby we let it do some work. However, the air will not re-compress itself by itself. To re-compress the air, you’ll need to apply the same force (or pressure I should say) but in the reverse direction.

Finally, I promised I would reproduce Feynman’s ‘proof’ of Carnot’s Theorem. This ‘proof’ involves the following imaginary set-up (see below): we’ve got two engines, A and B. We assume A is an ideal reversible engine, while B may or may not be reversible. We don’t care about its design. We just assume that both can do work by taking a certain amount of heat out of one reservoir and putting another amount of heat back into another reservoir. In fact, in this set-up, we assume both engines share a large enough reservoir so as to be able to transfer heat through that reservoir.

Ideal engine

Engine A can take an amount of heat equal to Q1 at temperature T1 from the first reservoir, do an amount of work equal to W, and then deliver an amount of heat equal to Q2 = Q1 – W at temperature T2 to the second reservoir. However, because it’s a reversible machine, it can also go the other way around, i.e. it can take Q2 = Q1 – W from the second reservoir, have the surroundings do an amount of work W on it, and then deliver Q1 = Q2 + W at temperature T1. We know that engine B can do the same, except that, because it’s different, the work might be different as well, so we’ll denote it by W’.

Now, let us suppose that the design of engine B is, somehow, more efficient, so we can get more work out of B for the same Q1 and the same temperatures T1 and T2. What we’re saying, then, is that W’ – W is some positive non-zero amount. If that would be true, we could combine both machines. Indeed, we could have engine B take Q1 from the reservoir at temperature T1, do an amount of work equal to W on engine A so it delivers the same amount Q1 back to the reservoir at the same temperature T1, and we’d still be left with some positive amount of useful work W’ – W. In fact, because the amount of heat in the first reservoir is restored (in each cycle, we take Q1 out but we also put the same amount of heat back in), we could include it as part of the machine. It would no longer need to be some huge external thing with unlimited heat capacity.

So it’s great! Each cycle gives us an amount of useful work equal to W’ – W. What about the energy conservation law? Well… engine A takes Q1 – W from the reservoir at temperature T2, and engine B gives Q1 – W’ back to it, so we’re taking a net amount of heat equal to (Q1 – W) – (Q1 – W’) = W’ – W out of the T2 reservoir. So that works out too! So we’ve got a combined machine converting thermal energy into useful work. It looks like a nice set-up, doesn’t it?
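The bookkeeping of this combined machine can be written out in a few lines. This is just an illustrative sketch with invented numbers (Q1 = 100, W = 25, W’ = 30) and a hypothetical function name; it only tallies the heat and work flows described above and says nothing about whether such a machine can actually exist.

```python
def combined_machine(q1, w_a, w_b):
    """Per-cycle flows when B runs forward and drives the reversible engine A backwards."""
    heat_from_hot = q1 - q1                    # B takes Q1 out, A puts Q1 back in
    heat_from_cold = (q1 - w_a) - (q1 - w_b)   # A takes Q1 - W, B returns Q1 - W'
    net_work = w_b - w_a                       # W' from B, minus W spent on A
    return heat_from_hot, heat_from_cold, net_work


hot, cold, work = combined_machine(q1=100, w_a=25, w_b=30)
print(hot, cold, work)
```

The net heat drawn from the T2 reservoir equals the net work delivered, so energy is conserved; the objection raised below concerns Carnot’s postulate, not energy conservation.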

Yes. The problem is that, according to Feynman, it cannot work. Why not? Because it violates Carnot’s postulate. The reasoning here is not easy. Let me try to do my best to present the argument correctly. What’s the problem? The problem is that we’ve got an engine here that operates at one temperature only. Now, according to Carnot’s postulate, it is not possible to extract the energy of heat at a single temperature with no other change in the system or the surroundings. Why not? Feynman gives the example of the can with compressed air. Imagine a can of compressed air indeed, and imagine we let the air expand, to drive a piston, for example. Now, we can imagine that our can with compressed air was in touch with a large heat reservoir at the same temperature, so its temperature doesn’t drop. So we’ve done work with that can at a single temperature. However, this doesn’t violate Carnot’s postulate because we’ve also changed the system: the air has expanded. It would only violate Carnot’s postulate if we’d find a way to put the air back in using exactly the same amount of work, so the process would be fully reversible. Now, Carnot’s postulate says that’s not possible at the same temperature. If the whole world is at the same temperature, then it is not possible to reversibly extract and convert heat energy into work.

I am not sure the example of the can with compressed air helps, but Feynman obviously thinks it should. He then phrases Carnot’s postulate as follows: “It is not possible to obtain useful work from a reservoir at a single temperature with no other changes.” He therefore claims that the combined machine as described above cannot exist. Ergo, W’ cannot be greater than W. Switching the role of A and B (so B becomes reversible too now), he concludes that W can also not be greater than W’. Hence, W and W’ have to be equal.

Hmm… I know that both philosophers and engineers have worked tirelessly to try to disprove Carnot’s postulate, and that they all failed. Hence, I don’t want to try to disprove Carnot’s postulate. In fact, I don’t doubt its truth at all. All that I am saying here is that I do have my doubts on the logical rigor of Feynman’s ‘proof’. It’s like… Well… It’s just a bit too tautological I’d say.

First Principles of Thermodynamics

First Principles of Statistical Mechanics

Feynman seems to mix statistical mechanics and thermodynamics in his chapters on it. At first, I thought all was rather messy but, as usual, after re-reading it a couple of times, it all makes sense. Let’s have a look at the basics. We’ll start by talking about gas first.

The ideal gas law

The pressure P is the force we have to apply to the piston containing the gas (see below)—per unit area, that is. So we write: P = F/A. Compressing the gas amounts to applying a force over some (infinitesimal) distance dx. Hence, we can write:

dU = F·(−dx) = – P·A·dx = – P·dV

Gas pressure

However, before looking at the dynamics, let’s first look at the stationary situation: let’s assume the volume of the gas does not change, and so we just have the gas atoms bouncing off the piston and, hence, exerting pressure on it. Every gas atom or particle delivers a momentum 2mvx to the piston (the factor 2 is there because the atom bounces back: its x-momentum goes from +mvx to −mvx, while the piston itself stays put). If there are N atoms in the volume V, then there are n = N/V in each unit volume. Of course, only the atoms within a distance vx·t are going to hit the piston within the time t and, hence, the number of atoms hitting the piston within that time is n·A·vx·t. Per unit time (i.e. per second), it’s n·A·vx·t/t = n·A·vx. Hence, the total momentum that’s being transferred per second is n·A·vx·2mvx.

So far, so good. Indeed, we know that the force is equal to the amount of momentum that’s being transferred per second. If you forget, just check the definitions and units: a force of 1 newton gives a mass of 1 kg an acceleration of 1 m/s per second, so 1 N = 1 kg·m/s² = 1 kg·(m/s)/s. [The kg·(m/s) unit is the unit of momentum (mass times velocity), obviously. So there we are.] Hence,

P = F/A = n·A·vx·2mvx/A = 2nmvx²

Of course, we need to take an average 〈vx²〉 here, and we should drop the factor 2 because half of the atoms/particles move away from the piston, rather than towards it. In short, we get:

P = F/A = nm〈vx²〉

Now, the motion in the x-, y- and z-directions is statistically the same (and uncorrelated), so 〈vx²〉 = 〈vy²〉 = 〈vz²〉 = [〈vx²〉 + 〈vy²〉 + 〈vz²〉]/3 = 〈v²〉/3. So we don’t worry about any direction and simply write:

P = F/A = (2/3)·n·〈m·v²/2〉

[As Feynman notes, the math behind this is not difficult but, at the same time, also less straightforward than you may think.] The last factor is, obviously, the kinetic energy of the (center-of-mass) motion of the atom or particle. Multiplying by V gives:

P·V = (2/3)·N·〈m·v²/2〉 = (2/3)U
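The step from n·m·〈vx²〉 to (2/3)·n·〈m·v²/2〉 only relies on the three directions being statistically equivalent, and that can be illustrated with a quick Monte Carlo sketch. The assumptions here go beyond the text: each velocity component is drawn from the same Gaussian distribution, n = m = 1 in arbitrary units, and the function name is invented.

```python
import random


def pressure_two_ways(samples=100_000, n=1.0, m=1.0, sigma=1.0, seed=1):
    """Estimate the pressure from <vx^2> and from the mean kinetic energy."""
    rng = random.Random(seed)
    sum_vx2 = 0.0
    sum_ke = 0.0
    for _ in range(samples):
        # Assumption: identical, independent Gaussian components (isotropic gas).
        vx, vy, vz = (rng.gauss(0.0, sigma) for _ in range(3))
        sum_vx2 += vx * vx
        sum_ke += 0.5 * m * (vx * vx + vy * vy + vz * vz)
    p_from_vx = n * m * sum_vx2 / samples            # P = n*m*<vx^2>
    p_from_ke = (2.0 / 3.0) * n * sum_ke / samples   # P = (2/3)*n*<m*v^2/2>
    return p_from_vx, p_from_ke


p_x, p_ke = pressure_two_ways()
print(p_x, p_ke)   # equal up to sampling noise
```

The two estimates differ only by statistical noise, which shrinks as the number of samples goes up.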

Now, that’s not a law you’ll remember from your high school days because… Well… The internal energy of a gas – how do you measure that? We should link it to a measure we do know, and that’s temperature. The atoms or molecules in a gas will have an average kinetic energy which we could define as… Well… That average should have been defined as the temperature but, for historical reasons, the scale of what we know as the ‘temperature’ variable (T) is different. We need to apply a conversion factor, which is usually written as k. To be precise, we’ll write the mean atomic or molecular energy as (3/2)·kT = 3kT/2. The 3/2 factor has been thrown in here to get rid of it later (in a few seconds, that is), and you should also remember that we have three independent directions of motion, and that the mean kinetic energy of the motion in any of the three directions x, y or z is (3kT/2)/3 = kT/2.

I said we’d get rid of that 3/2 factor. Indeed, applying the above-mentioned definition of temperature, we get:

P·V = N·k·T

That k factor is a constant of proportionality, which makes the units come out alright. U is energy, indeed, and, hence, measured in joule (J). N is a pure number, so k is expressed in joule per degree (Kelvin). To be precise, k is (about) 1.38×10^−23 joule for every degree Kelvin, so it’s a very tiny constant: it’s referred to as the Boltzmann constant and it’s usually denoted with a little B as subscript (kB). As for how the product of pressure and volume can (also) yield something in joule, you can work that out for yourself, remembering the definition of a joule.

One immediate implication of the formula above is that gases at the same temperature and pressure, in the same volume, must consist of an equal number of atoms/molecules. You’ll say: of course – because you remember that from your high school classes. However, thinking about it some more – and also in light of what we’ll be learning a bit later on gases composed of more complex molecules (diatomic molecules, for example) – you’ll have to admit it’s not all that obvious as a result.

Now, the number of atoms/molecules is usually measured in moles: one mole (or mol) is 6.02×10^23 units (more or less, that is). To be somewhat more precise, its 2010 CODATA value is 6.02214129(27)×10^23. The mole used to be defined as the amount of any substance that contains as many elementary entities (e.g. atoms, molecules, ions or electrons) as there are atoms in 12 grams of pure carbon-12 (¹²C), the isotope of carbon with relative atomic mass of exactly 12 (also by definition); since the 2019 revision of the SI, the number is fixed at exactly 6.02214076×10^23 instead. The number corresponds to the Avogadro constant, usually denoted by NA or N0, and the mole is one of the base units in the International System of Units.

Now, if we reinterpret N as the number of moles, rather than the number of atoms, ions or molecules in a gas, we can re-write the same equation using the so-called universal or ideal gas constant, which is equal to R = (1.38×10^−23 joule)×(6.02×10^23/mol) per degree Kelvin ≈ 8.314 J·K^−1·mol^−1. In short, the ideal gas constant is the product of two other constants: the Boltzmann constant (kB) and the Avogadro number (N0). So we get:

P·V = N·R·T with N = no. of moles and R = kB·N0
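As a quick sanity check on R = kB·N0, the sketch below uses the exact values of both constants from the 2019 SI definitions (these are not the rounded values quoted above) and verifies that counting particles with kB and counting moles with R give the same pressure. The function names and the sample state (one mole at 273.15 K in 22.4 litres) are illustrative choices.

```python
K_B = 1.380649e-23     # Boltzmann constant in J/K (exact in the 2019 SI)
N_0 = 6.02214076e23    # Avogadro constant in 1/mol (exact in the 2019 SI)
R = K_B * N_0          # ideal gas constant in J/(K*mol)


def pressure_from_particles(particles, temperature_k, volume_m3):
    """P = N*k*T/V with N the number of atoms or molecules."""
    return particles * K_B * temperature_k / volume_m3


def pressure_from_moles(moles, temperature_k, volume_m3):
    """P = N*R*T/V with N the number of moles."""
    return moles * R * temperature_k / volume_m3


print(R)
print(pressure_from_particles(N_0, 273.15, 0.0224))
print(pressure_from_moles(1.0, 273.15, 0.0224))
```

Both calls describe the same gas, so they print the same pressure in pascal, close to one standard atmosphere.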

The ideal gas law and internal motion

There’s an interesting and essential remark to be made in regard to complex molecules in a gas. The simplest example of a complex molecule is a diatomic molecule, consisting of two parts, which we’ll denote by A and B, with mass mA and mB respectively. A and B are together but are able to oscillate or move relative to one another. In short, we also have some internal motion here, in addition to the motion of the whole thing, which also has some kinetic energy. Hence, the kinetic energy of the gas consists of two parts:

  1. The kinetic energy of the so-called center-of-mass motion of the whole thing (i.e. the molecule), whose total mass we’ll denote by M = mA + mB, and
  2. The kinetic energy of the rotational and vibratory motions of the two atoms (A and B) inside the molecule.

We noted that for single atoms the mean value of the kinetic energy in one direction is kT/2 and that the total kinetic energy is 3kT/2, i.e. three times as much. So what do we have here? Well… The reasoning we followed for the single atoms is also valid for the diatomic molecule considered as a single body of total mass M and with some center-of-mass velocity vCM. Hence, we can write that

〈M·vCM²/2〉 = (3/2)·kT

So that’s the same, regardless of whether or not we’re considering the separate pieces or the whole thing. But let’s look at the separate pieces now. We need some vector analysis here, because A and B can move in separate directions, so we have the velocity vectors vA and vB. So what’s the relation between vA and vB on the one hand, and vCM on the other? The analysis is somewhat tricky here but – assuming that the vA and vB representations themselves are some idealization of the actual rotational and vibratory movements of the A and B atoms – we can write:

   vCM = (mA·vA + mB·vB)/M

Now we need to calculate 〈vCM²〉, of course, i.e. the average velocity squared. I’ll refer you to Feynman for the details which, in the end, do lead to that 〈M·vCM²/2〉 = (3/2)·kT equation. The whole calculation depends on the assumption that the relative velocity w = vA − vB is not any more likely to point in one direction than another, so its average component in any direction is zero. Indeed, the interim result is that

〈M·vCM²/2〉 = (3/2)·kT + mA·mB·〈vA·vB〉/M

Hence, one needs to prove, somehow, that 〈vA·vB〉 is zero in order to get the result we want, which is what that assumption about the relative velocity w ensures. Now, we still don’t have the kinetic energy of the A and B parts of the molecule. Because A and B can move in all three directions in space, their average kinetic energies 〈mA·vA²/2〉 and 〈mB·vB²/2〉 are each 3·k·T/2 as well. Now, adding 3·k·T/2 and 3·k·T/2 yields 3·k·T. So now we have what we wanted:

  1. The kinetic energy of the center-of-mass motion of the diatomic molecule is (3/2)·k·T.
  2. The total kinetic energy of the diatomic molecule is the sum of the kinetic energies of A and B, and so that’s 3·k·T/2 + 3·k·T/2 = 3·k·T.
  3. The kinetic energy of the internal rotational and vibratory motions of the two atoms (A and B) inside the molecule is the difference, so that’s 3·k·T – (3/2)·k·T = (3/2)·k·T.
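
The center-of-mass result can be checked numerically. The snippet below is only a sketch under stated assumptions: it draws the velocity components of A and B independently from Gaussian (Maxwell) distributions with variance kT/m, which builds in 〈vA·vB〉 = 0 rather than proving it. The function name, the masses and the units (kT = 1) are made up for the illustration.

```python
import math
import random


def mean_cm_kinetic_energy(m_a, m_b, kT, samples=100_000, seed=1):
    """Estimate <M*v_CM^2/2> for two atoms with independent thermal velocities."""
    rng = random.Random(seed)
    M = m_a + m_b
    # Assumption: each velocity component is Gaussian with variance kT/m.
    sigma_a = math.sqrt(kT / m_a)
    sigma_b = math.sqrt(kT / m_b)
    total = 0.0
    for _ in range(samples):
        v_cm_squared = 0.0
        for _axis in range(3):
            v_a = rng.gauss(0.0, sigma_a)
            v_b = rng.gauss(0.0, sigma_b)
            # Center-of-mass velocity component: (mA*vA + mB*vB)/M
            v_cm = (m_a * v_a + m_b * v_b) / M
            v_cm_squared += v_cm * v_cm
        total += 0.5 * M * v_cm_squared
    return total / samples


# Hypothetical masses (arbitrary units) with kT = 1.
estimate = mean_cm_kinetic_energy(m_a=1.0, m_b=16.0, kT=1.0)
print(estimate)  # should come out close to 1.5, i.e. (3/2)*kT
```

The estimate does not depend on the two masses, which is the point of the argument: the molecule as a whole carries (3/2)·k·T, whatever it is made of.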

The more general result can be stated as follows:

  1. An r-atom molecule in a gas will have a kinetic energy of (3/2)·r·k·T, on average, of which
  2. (3/2)·k·T is the kinetic energy of the center-of-mass motion of the entire molecule,
  3. The rest, (3/2)·(r−1)·k·T, is internal vibrational and rotational kinetic energy.

Another way to state this is that, for an r-atom molecule, we find that the average energy for each ‘independent direction of motion’, i.e. for each degree of freedom in the system, is kT/2, with the number of degrees of freedom being equal to 3r.

So in this particular case (example of a diatomic molecule), we have 6 degrees of freedom (two times three), because we have three directions in space for each of the two atoms. A common error is to consider the center-of-mass energy as something separate, rather than including it as a part of the total energy. Remember: the total kinetic energy is, quite simply, the sum of the kinetic energies of the separate atoms, which can be separated into (1) the kinetic energy associated with the center-of-mass motion and (2) the kinetic energy of the internal motions.
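
The bookkeeping is easy to put in a few lines of code. This is just a sketch of the classical counting rule described above; the function name and the dictionary keys are made up for the illustration, and energies are expressed in whatever units kT is given in.

```python
def energy_partition(r, kT=1.0):
    """Split the average kinetic energy of an r-atom molecule (classical counting)."""
    if r < 1:
        raise ValueError("a molecule needs at least one atom")
    total = 1.5 * r * kT          # kT/2 for each of the 3r degrees of freedom
    center_of_mass = 1.5 * kT     # motion of the molecule as a whole
    internal = total - center_of_mass  # rotation and vibration
    return {
        "degrees_of_freedom": 3 * r,
        "total": total,
        "center_of_mass": center_of_mass,
        "internal": internal,
    }


print(energy_partition(2))  # the diatomic molecule, in units of kT
```

For r = 2 this gives 6 degrees of freedom, a total of 3·k·T, and 1.5·k·T each for the center-of-mass and the internal motion, matching the three numbered results above.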

You see? It’s not that difficult. Let’s move on to the next topic.

The exponential atmosphere

Feynman uses this rather intriguing title to introduce Boltzmann’s Law, which is a law about densities. Let’s jot it down first:

n = n0·e^(−P.E./kT)

In this equation, P.E. is the potential energy, with k and T our Boltzmann constant and the temperature expressed in Kelvin. As for n0, that’s just a constant which depends on the reference point (P.E. = 0). What are we calculating here? Densities, so that’s the relative or absolute number of molecules per unit volume, so we look for a formula for a variable like n = N/V.

Let’s do an example: the ‘exponential’ atmosphere. :-) Feynman models our ‘atmosphere’ as a huge column of gas (see below). To simplify the analysis, we make silly assumptions. For example, we assume the temperature is the same at all heights. That’s assured by the mechanism for equalizing temperature: if the molecules at the top had less energy than those at the bottom, the molecules at the bottom would shake the molecules at the top, via the rod and the balls. That’s a very theoretical set-up, of course, but let’s just go along with it. The idea is that the average kinetic energy of all molecules is the same. It makes for a much easier mathematical analysis.


So what’s different? The pressure, of course, which is determined by the number of molecules per unit volume. The pressure must increase with lower altitude because it has to hold, so to speak, the weight of all the gas above it. Conversely, as we go higher, the atmosphere becomes more tenuous. So what’s the ‘law’ or formula here?

We’ll use our gas law: PV = NkT, which we can re-write as P = nkT with n = N/V, so n is the number of molecules per unit volume indeed. What’s stated here is that the pressure (P) and the number of molecules per unit volume (n) are directly proportional, with kT the proportionality factor. Now we add gravity (with acceleration g) and do a differential analysis: what happens when we go from h to h + dh? If m is the mass of each molecule, and if we assume we’re looking at unit areas (both at h and at h + dh), then the gravitational force on each molecule will be m·g, and n·dh will be the total number of molecules in that ‘unit section’.

Now, we can write dP as dP = P(h+dh) − P(h) and, of course, we know that the difference in pressure must be sufficient to hold, so to speak, the molecules in that small unit section dh. So we can write the following:

dP = P(h+dh) − P(h) = −m·g·n·dh

Now, P is P = nkT and, hence, because we assume T to be constant, we can write the whole equation as dP = k·T·dn = −m·g·n·dh. From that, we get a differential equation:

dn/dh = −[(m·g)/(k·T)]·n

We all hate differential equations, of course, but this one has an easy solution: the equation basically states we should find a function for n whose derivative is proportional to itself. The exponential function is such a function, so the solution of the differential equation is:

n = n0·e^(−mgh/kT)

n0 is the constant of integration and is, as mentioned above, the density at h = 0. Also note that mgh is, indeed, the potential energy of the molecules, increasing with height. So we have a Boltzmann Law indeed here, which we can write as n = n0·e^(−P.E./kT). Done! The illustration below was also taken from Feynman, and illustrates the ‘exponential atmosphere’ for two gases: oxygen and hydrogen. Because their mass is very different, the curve is different too: it shows how, in theory and in practice, lighter gases will dominate at great heights, because the exponentials for the heavier stuff have all died out.
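
To get a feel for the numbers, here is a small sketch that evaluates n/n0 = e^(−mgh/kT) for oxygen and hydrogen. The temperature of 300 K, the constant g = 9.81 m/s² and the molecular masses of 32 and 2 atomic mass units are assumptions made for this illustration, not values taken from Feynman’s figure; the function names are made up as well.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
G = 9.81              # assumed constant gravitational acceleration in m/s^2
AMU = 1.66053907e-27  # atomic mass unit in kg


def relative_density(height_m, mass_amu, temperature_k=300.0):
    """Return n/n0 = exp(-m*g*h / (k*T)) for an isothermal column of gas."""
    m = mass_amu * AMU
    return math.exp(-m * G * height_m / (K_B * temperature_k))


def scale_height(mass_amu, temperature_k=300.0):
    """Height kT/(m*g) over which the density drops by a factor e."""
    return K_B * temperature_k / (mass_amu * AMU * G)


for name, mass in (("O2", 32.0), ("H2", 2.0)):
    print(name, "scale height (m):", round(scale_height(mass)))
    print(name, "n/n0 at 10 km:", relative_density(10_000.0, mass))
```

Under these assumptions the oxygen density drops by a factor e roughly every 8 km, while hydrogen, being sixteen times lighter, needs sixteen times that height. That is the ‘lighter gases dominate at great heights’ statement in numbers.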



It is easy to show that we’ll have a Boltzmann Law in any situation where the force comes from a potential. In other words, we’ll have a Boltzmann Law in any situation for which the work done when taking a molecule from x to x + dx can be represented as potential energy. An example would be molecules that are electrically charged and attracted by some electric field or another charge that attracts them. In that case, we have an electric force of attraction which varies with position and acts on all molecules. So we could take two parallel planes in the gas, separated by a distance dx indeed, and we’d have a similar situation: the force on each atom, times the number of atoms in the unit section that’s delineated by dx, would have to be balanced by the pressure change, and we’d find a similar ‘law’: n = n0·e^(−P.E./kT).

Let’s quickly show it. The key variable is the density n, of course: n = N/V. If we assume volume and temperature remain constant, then we can use our gas law to write the pressure as P = NkT/V = kT·n, which implies that any change in pressure must involve a density change. To be precise, dP = d(kT·n) = kT·dn. Now, we’ve got a force, and moving a molecule from x to x + dx involves work, which is the force times the distance, so the work is F·dx. The force can be anything, but we assume it’s conservative, like the electromagnetic force or gravity. Hence, the force field can be represented by a potential and the work done is equal to the change in potential energy. Hence, we can write: Fdx = –d(P.E.). Why the minus sign? If the force is doing work, we’re moving with the force and, hence, we’ll have a decrease in potential energy. Conversely, if the surroundings are doing work against the force, we’ll increase potential energy.

Now, we said the force must be balanced by the pressure. What does that mean, exactly? It’s the same analysis as the one we did for our ‘exponential’ atmosphere: we’ve got a small slice, given by dx, and the difference in pressure when going from x to x + dx must be sufficient to hold, so to speak, the molecules in that small unit section dx. [Note we assume we’re talking unit areas once again.] So, instead of writing dP = P(h+dh) − P(h) = −m·g·n·dh, we now write dP = F·n·dx. In a gravitational field, the magnitude of the force involved is, obviously, m·g; because it points toward decreasing h, we have F = −m·g there, which gives back the minus sign of the previous formula.

The minus sign business is confusing, as usual: it’s obvious that dP must be negative for positive dh, and vice versa, but here F carries its own sign (it is positive when it points toward increasing x), so no explicit minus sign is needed. If you find that confusing, let me give you another way of getting that dP = F·n·dx expression. A slab of unit area and thickness dx has a volume dV = dx, so it contains n·dV = n·dx molecules. Each of them feels the force F, so the total force on the slab is F·n·dx. Pressure is force per unit area and, because the area is one unit, that total force is exactly the pressure difference that’s needed to hold the slab in place: dP = F·n·dx. [Again, the assumption is that our unit of analysis is the unit area.] OK. I need to move on. Combining (1) dP = d(kT·n) = kT·dn, (2) dP = F·n·dx and (3) F·dx = –d(P.E.), we get:

kT·dn = –d(P.E.)·n ⇔ dn/d(P.E.) = –[1/(kT)]·n

That’s a differential equation that’s easy to solve. We’ve repeated it ad nauseam: a function which has a derivative proportional to itself is an exponential. Hence, we have our grand equation:

n = n0·e^(−P.E./kT)

If the whole thing troubles you, just remember that the key to solving problems like this is to clearly identify and separate the so-called ‘dependent’ and ‘independent’ variables. In this case, we want a formula for n and, hence, it’s potential energy that’s the ‘independent’ variable. That’s all. The graph looks the same, of course: the density is greatest at P.E. = 0. To be precise, the density there will be equal to n = n0·e^0 = n0 (don’t think it’s infinity there!). And for higher (potential) energy values, we get lower density values. It’s a simple but powerful graph, and so you should always remember it.
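
If you’d rather see this than take it on faith, the force-balance equation kT·dn = F·n·dx can be integrated step by step and compared with the exponential. The sketch below does that for a made-up example: a spring-like force F = −κ·x, whose potential energy is κ·x²/2. The force, the constant κ = 2, the units (kT = 1) and the function names are all assumptions for the illustration.

```python
import math


def integrate_density(force, x_end, kT=1.0, n0=1.0, steps=10_000):
    """Integrate kT*dn = F(x)*n*dx from x = 0 to x_end with midpoint steps."""
    dx = x_end / steps
    n, x = n0, 0.0
    for _ in range(steps):
        slope_start = force(x) * n / kT
        n_mid = n + 0.5 * dx * slope_start
        slope_mid = force(x + 0.5 * dx) * n_mid / kT
        n += dx * slope_mid
        x += dx
    return n


SPRING = 2.0  # hypothetical force constant, arbitrary units


def spring_force(x):
    return -SPRING * x  # conservative force pulling back toward x = 0


def spring_potential(x):
    return 0.5 * SPRING * x * x  # P.E. with its zero at x = 0


numeric = integrate_density(spring_force, x_end=1.5)
boltzmann = math.exp(-spring_potential(1.5) / 1.0)  # n0 = 1, kT = 1
print(numeric, boltzmann)
```

The two numbers agree to many decimal places. The integration never uses the potential energy, only the force, so the agreement is a genuine check that a conservative force plus pressure balance gives Boltzmann’s Law.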


Boltzmann’s Law is a very simple law but it can be applied to very complicated situations. Indeed, while the law is simple, the potential energy curve can be very complicated. So our Law can be applied to other situations than gravity or the electric force. The potential can combine a number of forces (as long as they’re all conservative), as shown in the graph below, which shows a situation in which molecules will attract each other at a distance r > r0 (and, hence, their potential energy decreases as they come closer together), but repel each other strongly as r becomes smaller than r0 (so potential energy increases, and very much so as we try to force them on top of each other).

Potential energy

Again, despite the complicated shape of the curve, the density function will follow Boltzmann’s Law: in a given volume, the density will be highest at the distance of minimum energy, and the density will be much less at other distances, thereby respecting the e^(−P.E./kT) distribution, in which the potential energy and the temperature are the only variables. So, yes, Boltzmann’s Law is pretty powerful!
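
A quick sketch makes the point concrete. The exact curve in the graph is not specified, so the code below assumes a Lennard-Jones-style shape (attractive at large r, steeply repulsive at small r, with its minimum at r0) as a stand-in. The well depth, r0, kT and the function names are all made-up values for the illustration.

```python
import math


def pair_potential(r, depth=1.0, r0=1.0):
    """Assumed Lennard-Jones-style curve with its minimum, -depth, at r = r0."""
    s = (r0 / r) ** 6
    return depth * (s * s - 2.0 * s)


def boltzmann_factor(r, kT=0.5):
    """Relative density e^(-P.E./kT) at separation r."""
    return math.exp(-pair_potential(r) / kT)


# Scan separations from 0.8 to 2.5 (in units of r0) and find the density peak.
radii = [0.8 + 0.01 * i for i in range(171)]
factors = [boltzmann_factor(r) for r in radii]
r_peak = radii[factors.index(max(factors))]
print("density peaks at r =", r_peak)
```

The peak sits at r = r0, the distance of minimum potential energy, and the factor falls off on either side of it: very steeply on the repulsive side, more gently on the attractive side.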

First Principles of Statistical Mechanics