Magnetostatics: the vector potential

This post and the next are supposed to wrap up a few loose ends on magnetism. One of these loose ends is the (magnetic) vector potential, which we introduced in our post on gauge transformations, but then we didn’t do much with it. Another topic I neglected so far is that of the magnetic dipole moment (as opposed to the electric dipole moment), which is an extremely important concept in both classical and quantum mechanics. So let’s do the vector potential here, and the magnetic dipole moment in the next. 🙂

Let’s go for it. Let me recall the basics which, as usual, are just Maxwell’s equations. You’ll remember that the electrostatic field was curl-free: ∇×E = 0, everywhere. Therefore, we can apply the following mathematical theorem: if the curl of a vector field is zero (everywhere), then the vector field can be represented as the gradient of some scalar function:

if ∇×C = 0, then there is some Ψ for which C = ∇Ψ

Substituting C for E, and taking into account our conventions on charge and the direction of flow, we wrote:

E = –∇Φ

Φ (phi) is referred to as the electric potential. Combining E = –∇Φ with Gauss’ Law (∇•E = ρ/ε₀), we got Poisson’s equation:

∇²Φ = −ρ/ε₀

So that equation sums up all of electrostatics. Really: that’s it! 🙂

Now, the two equations for magnetostatics are: ∇•B = 0 and c²∇×B = j/ε₀. Let me say something more about them:

The ∇•B = 0 equation is true, always, unlike the ∇×E = 0 expression, which is true for electrostatics only (no moving charges).
The ∇•B = 0 equation says the divergence of B is zero, always. Now, you can verify for yourself that the divergence of the curl of a vector field is always zero, so div (curl A) = ∇•(∇×A) = 0, always. Therefore, there’s another theorem that we can apply. It says the following: if the divergence of a vector field, say D, is zero (so if ∇•D = 0), then D will be the curl of some other vector field C, so we can write: D = ∇×C. Applying this to ∇•B = 0, we can write:

If ∇•B = 0, then there is an A such that B = ∇×A

We can also write this as follows: ∇·B = ∇·(∇×A) = 0 and, hence, B = ∇×A. Now, it’s this vector field A that is referred to as the (magnetic) vector potential, and so that’s what we want to talk about here. As a start, it may be good to write all of the components of our B = ∇×A vector:

B_x = (∇×A)_x = ∂A_z/∂y − ∂A_y/∂z
B_y = (∇×A)_y = ∂A_x/∂z − ∂A_z/∂x
B_z = (∇×A)_z = ∂A_y/∂x − ∂A_x/∂y

Note that we have no ‘time component’ because we assume the fields are static, so they do not change with time. Now, because that’s a relatively simple situation, you may wonder whether we really simplified anything with this vector potential. B is a vector with three components, and so is A. The answer to that question is somewhat subtle, and similar to what we did for electrostatics: it’s mathematically convenient to use A, and then calculate the derivatives above to find B. So the number of components doesn’t matter really: it’s just more convenient to first get A using our data on the currents j, and then we get B from A.
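To make the "first get A, then take derivatives to find B" recipe a bit more tangible, here is a minimal numerical sketch. It approximates the three curl components with central differences. The helper name `curl` and the test field `A_uniform` are my own illustrative choices, not something from Feynman's text.

```python
def curl(A, p, h=1e-5):
    """Central-difference curl of a vector field A(x, y, z) -> (Ax, Ay, Az) at point p."""
    def d(comp, axis):
        # partial derivative of component `comp` of A along coordinate `axis`
        lo, hi = list(p), list(p)
        lo[axis] -= h
        hi[axis] += h
        return (A(*hi)[comp] - A(*lo)[comp]) / (2 * h)
    return (d(2, 1) - d(1, 2),   # B_x = dA_z/dy - dA_y/dz
            d(0, 2) - d(2, 0),   # B_y = dA_x/dz - dA_z/dx
            d(1, 0) - d(0, 1))   # B_z = dA_y/dx - dA_x/dy

def A_uniform(x, y, z, B0=2.0):
    # hypothetical test field: A = (1/2) B0 x r, with B0 pointing along z
    return (-0.5 * B0 * y, 0.5 * B0 * x, 0.0)

B = curl(A_uniform, (0.3, -0.7, 1.1))
print(B)  # approximately (0, 0, B0): a uniform field along z
```

The same `curl` helper works for any smooth A you can write down as a Python function, which is all that the B = ∇×A relation asks for.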

That’s it really. Let me show you how it works. The whole argument is somewhat lengthy, but it’s not difficult, and once it’s done, it’s done. So just carry on and please bear with me 🙂

First, we need to put some constraints on A, because the B = ∇×A equation does not fully define A. It’s like the scalar potential Φ: any Φ’ = Φ + C was as good a choice as Φ (with C any constant), so we needed a reference point Φ = 0, which we usually took at infinity. With the vector potential A, we have even more latitude: we can not only add a constant but any field which is the gradient of some scalar field, so any A’ = A + ∇Ψ will do. Why? Just write it all out: ∇×(A + ∇Ψ) = ∇×A + ∇×(∇Ψ). But the curl of the gradient of a scalar field (or a scalar function) is always zero (you can check my post on vector calculus on this), so ∇×(∇Ψ) = 0 and so ∇×(A + ∇Ψ) = ∇×A + ∇×(∇Ψ) = ∇×A + 0 = ∇×A = B.
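If you prefer to see this gauge freedom in numbers, the sketch below compares the curl of some A with the curl of A’ = A + ∇Ψ at an arbitrary point. The field A and the scalar Ψ = x²·y + sin(z) are arbitrary choices of mine, picked only for illustration, and the curl is a simple finite-difference approximation.

```python
import math

def curl(A, p, h=1e-5):
    """Central-difference curl of a vector field A(x, y, z) at point p."""
    def d(comp, axis):
        lo, hi = list(p), list(p)
        lo[axis] -= h
        hi[axis] += h
        return (A(*hi)[comp] - A(*lo)[comp]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def A(x, y, z):
    # hypothetical vector potential; its curl is (0, 0, 2)
    return (-y, x, 0.0)

def grad_psi(x, y, z):
    # gradient of the hypothetical scalar field psi = x^2 * y + sin(z)
    return (2 * x * y, x * x, math.cos(z))

def A_prime(x, y, z):
    # A' = A + grad(psi)
    return tuple(a + g for a, g in zip(A(x, y, z), grad_psi(x, y, z)))

p = (0.4, -1.2, 0.9)
B = curl(A, p)
B_prime = curl(A_prime, p)
max_diff = max(abs(u - v) for u, v in zip(B, B_prime))
print(B, B_prime, max_diff)  # the two curls agree up to rounding
```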

So what constraints should we put on our choice of A? The choice is, once again, based on mathematical convenience: in magnetostatics, we’ll choose A such that ∇•A = 0. Can we do that? Yes. The A’ = A + ∇Ψ flexibility allows us to make ∇•A’ anything we wish, and so A and A’ will have the same curl, but they don’t need to have the same divergence. So we can choose an A’ so ∇•A’ = 0, and then we denote A’ by A. 🙂 So our ‘definition’ of the vector potential A is now:

B = ∇×A and ∇•A = 0

I have to make two points here:

First, you should note that, in my post on gauges, I mentioned that the choice is different when the time derivatives of E and B are not equal to zero, so when we’re talking changing currents and charge distributions, so that’s dynamics. However, that’s not a concern here.
To be fully complete, I should note that the ‘definition’ above does still not uniquely determine A. For a unique specification, we also need some reference point, or say how the field behaves on some boundary, or at large distances. It is usually convenient to choose a field which goes to zero at large distances, just like our electric potential.

Phew! We’ve said so many things about A now, but nothing that has any relevance to how we’d calculate A. 😦 So where are we heading here?

Fortunately, we can go a bit faster now. The c²∇×B = j/ε₀ equation and our B = ∇×A give us:

c²∇×(∇×A) = j/ε₀

Now, there’s this other vector identity, which you surely won’t remember either—but trust me: I am not lying: ∇×(∇×A) = ∇(∇•A) − ∇²A. So, now you see why we choose A such that ∇•A = 0 ! It allows us to write:

c²∇×(∇×A) = −c²∇²A = j/ε₀ ⇔ ∇²A = –j/ε₀c²

Now, the three components of ∇²A = –j/ε₀c² are, of course:

∇²A_x = –j_x/ε₀c², ∇²A_y = –j_y/ε₀c², ∇²A_z = –j_z/ε₀c²

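As you can see, each of these three equations is mathematically identical to that Poisson equation: ∇²Φ = −ρ/ε₀. So all that we learned about solving for potentials when ρ is known can now be used to solve for each component of A when j is known. Now, to calculate Φ, we used the following integral:

Φ(1) = (1/4πε₀)·∫ ρ(2)·dV₂/r₁₂

Here, 1 is the point where we evaluate the potential, 2 is the location of the volume element dV₂, and r₁₂ is the distance between the two. Simply substituting symbols then gives us the solution for A_x:

A_x(1) = (1/4πε₀c²)·∫ j_x(2)·dV₂/r₁₂

We have a similar integral for A_y and A_z, of course, and we can combine the three equations in vector form:

A(1) = (1/4πε₀c²)·∫ j(2)·dV₂/r₁₂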

Finally, and just in case you wonder what is what, there’s the illustration below (taken from Feynman’s Lecture on this topic here) that, hopefully, will help you to make sense of it all.
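To see the integral for A at work, here is a rough sketch that evaluates it numerically for the simplest case I could think of: a thin, straight wire segment of length L along the z-axis carrying a current I, with the field point at a distance r from its midpoint. For a thin wire, j·dV reduces to I·dz, so only A_z survives. The function names, the midpoint-rule integration and the numbers are my own assumptions; the closed form used for comparison is the standard result of doing the integral by hand.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299792458.0           # speed of light (m/s)

def A_z_numeric(I, L, r, n=20000):
    """Midpoint-rule estimate of A_z = (1/4πε₀c²)·∫ I·dz / r₁₂ on the mid-plane."""
    dz = L / n
    total = 0.0
    for k in range(n):
        z = -L / 2 + (k + 0.5) * dz          # position of the current element
        total += dz / math.sqrt(r * r + z * z)
    return I * total / (4 * math.pi * EPS0 * C ** 2)

def A_z_exact(I, L, r):
    """Closed form of the same integral: ∫ dz/sqrt(r²+z²) = 2·asinh(L/2r)."""
    return I * 2 * math.asinh(L / (2 * r)) / (4 * math.pi * EPS0 * C ** 2)

num = A_z_numeric(1.0, 2.0, 0.1)
exact = A_z_exact(1.0, 2.0, 0.1)
print(num, exact)
```

The point is only that "solving for A" boils down to adding up contributions j·dV/r₁₂, exactly as we did for Φ with ρ·dV/r₁₂.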

At this point, you’re probably tired of these formulas (or asleep) or (if you’re not asleep) wondering what they mean really, so let’s do two examples. Of course, you won’t be surprised that we’ll be talking a straight wire and a solenoid respectively once again. 🙂

The magnetic field of a straight wire

We already calculated the magnetic field of a straight wire, using Ampère’s Law and the symmetry of the situation, in our previous post on magnetostatics. We got the following formula:

B = I/(2πε₀c²·r) = (1/4πε₀c²)·(2I/r)

Do we get the same using those formulas for A and then doing our derivations to get B? We should, and we do, but I’ll be lazy here and just refer you to the relevant section in Feynman’s Lecture on it, because the solenoid stuff is much more interesting. 🙂
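For those who don't want to click through, a quick numerical sanity check is possible. The sketch below takes the closed-form A_z of a long (but finite) straight wire on its mid-plane, differentiates it numerically with respect to r, and compares the result with I/(2πε₀c²·r). The wire length, the step size and the function names are assumptions of mine, and the relation B = −∂A_z/∂r holds on the mid-plane, where A only has a z-component.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299792458.0           # speed of light (m/s)

def A_z_wire(r, I=1.0, L=1000.0):
    """A_z on the mid-plane of a straight wire of length L (closed form)."""
    return I * 2 * math.asinh(L / (2 * r)) / (4 * math.pi * EPS0 * C ** 2)

def B_from_A(r, I=1.0, L=1000.0, h=1e-6):
    """Azimuthal field from the potential: B = -dA_z/dr (central difference)."""
    return -(A_z_wire(r + h, I, L) - A_z_wire(r - h, I, L)) / (2 * h)

def B_ampere(r, I=1.0):
    """Field of an infinitely long wire from Ampère's Law."""
    return I / (2 * math.pi * EPS0 * C ** 2 * r)

r = 0.05
print(B_from_A(r), B_ampere(r))  # the two should agree closely for L >> r
```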

The magnetic field of a solenoid

In the mentioned post on magnetostatics, we also derived a formula for the magnetic field inside a solenoid. We got:

B₀ = n·I/ε₀c², with n the number of turns per unit length of the solenoid, and I the current going through it. However, in the mentioned post, we assumed that the magnetic field outside of the solenoid was zero, for all practical purposes. For a real solenoid of finite length, it is not exactly zero: it is very weak, as shown below, although it can be fairly strong at very short distances from the solenoid. Calculating the vector potential allows us to describe the situation everywhere, inside and outside. So let’s go for it.

The relevant quantities are shown in the illustration below. So we’ve got a very long solenoid here once again, with n turns of wire per unit length and, therefore, a circumferential current on the surface of n·I per unit length (the slight pitch of the winding is being neglected).

Now, just like that surface charge density ρ in electrostatics, we have a ‘surface current density’ J here, which we define as J = n·I. So we’re going from a scalar to a vector quantity, and the components of J are:

J_x = –J·sinϕ, J_y = J·cosϕ, J_z = 0

So how do we do this? As should be clear from the whole development above, the principle is that the x-component of the vector potential arising from a current density j is the same as the electric potential Φ that would be produced by a charge density ρ equal to j_x divided by c², and similarly for the y- and z-components. Huh? Yes. Just read it a couple of times and think about it: we should imagine some cylinder with a surface charge ρ = −(J/c²)·sinϕ to calculate A_x. And then we equate ρ with (J/c²)·cosϕ and zero respectively to find A_y and A_z.

Now, that sounds pretty easy but Feynman’s argument is quite convoluted here, so I’ll just skip it (click the link here if you want to see it) and give you the final result, i.e. the magnitude of A outside of the solenoid, with a the radius of the solenoid:

A = (1/2)·n·I·a²/(ε₀c²·r’)

Of course, you need to interpret the result above with the illustration, which shows that A is always perpendicular to r’. [In case you wonder why we write r’ (so r with a prime) and not r, that’s to make clear we’re talking the distance from the z-axis, so it’s not the distance from the origin.]

Now, you may think that the c² in the denominator explains the very weak field, but it doesn’t: it’s the inverse proportionality to r’ that makes the difference! Indeed, you should compare the formula above with the result we get for the vector potential inside of the solenoid, which is equal to:

A = (1/2)B₀×r’

The illustration below shows the quantities involved. Note that we’re talking a uniform magnetic field here, along the z-axis, which has the same direction as B₀ and, hence, is pointing towards you as you look at the illustration, which is why you don’t see the B₀ field lines and/or the z-axis: they’re perpendicular to your computer screen, so to speak.

As for the direction of A, it’s shown on the illustration, of course, but let me remind you of the right-hand rule for the vector cross product a×b once again, so you can make sense of the direction of A = (1/2)B₀×r’ indeed:

Also note the magnitude this formula implies: a×b = |a|·|b|·sinθ·n, with θ the angle between a and b, and n the normal unit vector in the direction given by that right-hand rule above. Now, unlike a vector dot product, the magnitude of the vector cross product is not zero for perpendicular vectors. In fact, when θ = π/2, which is the case for B₀ and r’, then sinθ = 1, and, hence, we can write:

|A| = A = (1/2)|B₀||r’| = (1/2)·B₀·r’
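A tiny sketch may help to check both the direction and the magnitude. It computes A = (1/2)B₀×r’ for a field along the z-axis and a point in the xy-plane, and then verifies that A is perpendicular to r’ and that its magnitude is (1/2)·B₀·r’. The numbers are arbitrary illustrative values of mine.

```python
import math

def cross(a, b):
    """Vector cross product a × b for 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

B0 = (0.0, 0.0, 1.5)        # uniform field along z (arbitrary value)
r_prime = (0.3, 0.4, 0.0)   # displacement from the z-axis, |r'| = 0.5

A = tuple(0.5 * c for c in cross(B0, r_prime))
print(A, norm(A), dot(A, r_prime))
```

With these numbers, A comes out as (−0.3, 0.225, 0), so it circles the z-axis counter-clockwise, as the right-hand rule says it should.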

Now, just substitute B₀ = n·I/ε₀c², which is the field inside the solenoid, and you get:

A = (1/2)·n·I·r’/ε₀c²

You should compare this formula with the formula for A outside the solenoid, so you can draw the right conclusions. Note that both formulas incorporate the same (1/2)·n·I/ε₀c² factor. The difference, really, is that inside the solenoid, A is proportional to r’ (as shown in the illustration: if r’ doubles, triples etcetera, then A will double, triple etcetera too) while, outside of the solenoid, A is inversely proportional to r’. In addition, outside the solenoid, we have the a² factor, which doesn’t matter inside. Indeed, the radius of the solenoid (i.e. a) changes the flux, which is the product of B and the cross-section area π·a², but not B itself.

Let’s do a quick check to see if the formula makes sense. We do not want A to be larger outside of the solenoid than inside, obviously, so the a²/r’ factor should be smaller than r’ for r’ > a. Now, a²/r’ < r’ if a² < r’², and because a and r’ are both positive real numbers, that’s the case if r’ > a indeed. So we’ve got something that resembles the electric field inside and outside of a uniformly charged sphere, except that A decreases as 1/r’ rather than as 1/r’², as shown below.
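The two formulas for A can also be checked numerically. The sketch below, for an idealized and infinitely long solenoid, glues the inside and outside expressions together, checks that they match at r’ = a, and computes the z-component of the curl of a purely azimuthal field, which is (1/r’)·d(r’·A)/dr’. It also checks that the circulation of A around a circle outside the solenoid equals the flux π·a²·B₀. The function names and the numbers (a = 2 cm, n = 1000 turns per metre, I = 1 A) are my own assumptions.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299792458.0           # speed of light (m/s)

def A_mag(r, a, n, I):
    """Magnitude of A at distance r from the axis of an ideal long solenoid."""
    k = 0.5 * n * I / (EPS0 * C ** 2)
    return k * r if r <= a else k * a * a / r

def B_z(r, a, n, I, h=1e-7):
    """z-component of curl A for an azimuthal A(r): (1/r) d(r·A)/dr."""
    f = lambda s: s * A_mag(s, a, n, I)
    return (f(r + h) - f(r - h)) / (2 * h * r)

a, n, I = 0.02, 1000.0, 1.0
B0 = n * I / (EPS0 * C ** 2)             # field inside the solenoid

B_inside = B_z(0.01, a, n, I)            # should reproduce B0
B_outside = B_z(0.05, a, n, I)           # curl of the 1/r' potential
circulation = 2 * math.pi * 0.05 * A_mag(0.05, a, n, I)
flux = math.pi * a ** 2 * B0
print(B_inside, B0, B_outside, circulation, flux)
```

In this idealized model, the curl of the 1/r’ potential comes out as zero outside, while the circulation of A around the solenoid is still equal to the enclosed flux: A is there even where its curl vanishes.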

Hmm… That’s all stuff to think about… The thing you should take home from all of this is the following:

A (uniform) magnetic field B in the z-direction corresponds to a vector potential A that rotates about the z-axis with magnitude A = B₀·r’/2 (with r’ the displacement from the z-axis, not from the origin, obviously!). So that gives you the A inside of a solenoid. The magnitude is A = (1/2)·n·I·r’/ε₀c², so A is proportional to r’.
Outside of the solenoid, A’s magnitude (i.e. A) is inversely proportional to the distance r’, and it’s given by the formula: A = (1/2)·n·I·a²/(ε₀c²·r’). Note what that implies: a vector potential circling the z-axis with a magnitude proportional to 1/r’ has zero curl, so for the idealized, infinitely long solenoid, B = ∇×A is zero outside, even though A is not. It is only for a real solenoid of finite length that we get a weak return field outside, and that field is near zero if r’ >> a.

Alright. Done. Next post. So that’s on the magnetic dipole moment 🙂

Ferroelectrics and ferromagnetics

Ferroelectricity and ferromagnetism are two different things, but they are analogous. Materials are ferroelectric if they have a spontaneous electric polarization that can be changed or reversed by the application of an external electric field. Ferromagnetism, in contrast, refers to materials which exhibit a permanent magnetic moment.

The materials are very different. In fact, most ferroelectric materials do not contain any iron and, hence, the ferro in the term is somewhat misleading. Ferroelectric materials are a special class of crystals, like barium or lead titanate (BaTiO₃ or PbTiO₃). Lead zirconate titanate (PZT) is another example. These materials are also piezoelectric: when applying some mechanical stress, they will generate some voltage. In fact, the process goes both ways: when applying some voltage to them, it will also create mechanical deformation, as illustrated below (credit for this illustration goes to Wikipedia).

Ferroelectricity has to do with electric dipoles, while ferromagnetism has to do with magnetic dipoles. We’ve only discussed electric dipoles so far (see the section on dielectrics in my post on capacitors) and so we’re only in a position to discuss ferroelectricity right now, which is what I’ll do here. However, before doing so, let me briefly quote from the Wikipedia article on ferromagnetism, because that’s really concise and to the point on this:

“One of the fundamental properties of an electron (besides that it carries charge) is that it has a magnetic dipole moment, i.e. it behaves like a tiny magnet. This dipole moment comes from the more fundamental property of the electron that it has quantum mechanical spin. Due to its quantum nature, the spin of the electron can be in one of only two states; with the magnetic field either pointing “up” or “down” (for any choice of up and down). The spin of the electrons in atoms is the main source of ferromagnetism, although there is also a contribution from the orbital angular momentum of the electron about the nucleus.”

In short, ferromagnetism was discovered and known long before ferroelectricity was discovered and studied, but it’s actually more complicated, because it’s a quantum-mechanical thing really, unlike ferroelectricity, which we’ll discuss now. Before we start, let me note that, in many ways, this post is a continuation of the presentation on dielectrics I referred to above, so you may want to check that post if you have trouble following the arguments below.

Molecular dipoles

Let me first remind you of the basics. The (electric) dipole moment is the product of the magnitude q of two equal but opposite charges q₊ and q₋ and the distance d between them. Usually, it’s written as a vector so as to also keep track of its direction and use it in vector equations, so we write p = qd, with d the vector going from the negative to the positive charge, as shown below.

Now, molecules like water molecules have a permanent dipole moment, as illustrated below. It’s because the centers of ‘gravity’ of the positive and negative charges do not coincide, so that’s what makes the H₂O molecule polar, as opposed to the O₂ molecule, which is non-polar.

Now, if we place polar molecules in some electric field, we’d expect them to line up, to some extent at least, as shown below (the second illustration has more dipoles pointing vaguely north).

However, at ordinary temperatures and electric fields, the collisions of the molecules in their thermal motion keep them from lining up too much. In fact, we can apply the principles of statistical mechanics to calculate how much exactly. You can check out the details in Feynman’s Lecture on it, but the result is that the net dipole moment per unit volume (so that’s the polarization) is equal to:

P = N·p₀²·E/(3kT)

So the polarization is proportional to the number of molecules per unit volume (N), the square of their dipole moment (p₀) and, as we might expect, the electric field E, and inversely proportional to the temperature (T), with k the Boltzmann constant. In fact, the formula above is a sort of first-order approximation, in line with what we wrote on the electric susceptibility χ (chi) in our post on capacitors, where we also assumed the relation between P and E was linear, so we wrote: P = ε₀·χ·E. Now, engineers and physicists often use different symbols and definitions and so you may or may not have heard about another concept saying essentially the same thing: the dielectric constant, which is denoted by κ (kappa) and is, quite simply, equal to κ = 1 + χ. Combining the expression for P above, and the P = ε₀·χ·E = ε₀·(κ−1)·E expression, we get:

κ − 1 = P/(ε₀·E) = N·p₀²/(3ε₀kT)

This doesn’t say anything new: it just states the dependence of χ on the temperature. Now, you can imagine this linear relationship has been verified experimentally. As it turns out, it’s sort of valid, but it is not as straightforward as you might imagine. There’s a nice post on this on the University of Cambridge’s Materials Science site. But this blog is about physics, not about materials science, so let’s move on. The only thing I should add to this section is a remark on the energy of dipoles.
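Before moving on, the temperature dependence κ − 1 = N·p₀²/(3ε₀kT) can be made concrete with a quick back-of-the-envelope sketch for water vapor. The dipole moment of the water molecule (about 1.85 debye, so roughly 6.17×10⁻³⁰ C·m), the use of the ideal gas law for N, and the chosen temperature and pressure are assumptions of mine, and the formula only covers the orientation of permanent dipoles, so take the number as an order of magnitude only.

```python
K_B = 1.380649e-23        # Boltzmann constant (J/K)
EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
P0_WATER = 6.17e-30       # approximate dipole moment of H2O (C·m), about 1.85 debye

def number_density(pressure, T):
    """Molecules per unit volume from the ideal gas law, N = p/(kT)."""
    return pressure / (K_B * T)

def kappa_minus_one(N, p0, T):
    """Orientational contribution: kappa - 1 = N·p0² / (3·eps0·k·T)."""
    return N * p0 ** 2 / (3 * EPS0 * K_B * T)

T = 400.0                              # kelvin (assumed)
N = number_density(101325.0, T)        # at one atmosphere (assumed)
chi = kappa_minus_one(N, P0_WATER, T)
print(chi)  # a few times 1e-3 for steam under these assumptions
```

Doubling T at a fixed N halves the result, which is the 1/T dependence the formula states.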

You know charges in a field have energy, potential energy. You can look up the detail behind the formulas in one of my other posts on electromagnetism, so I’ll just remind you of them: the energy of a charge is, quite simply, the product of the charge (q) and the electric potential (Φ) at the location of the charge. Why? Well… The potential is the amount of work we’d do when bringing the unit charge there from some other (reference) point where Φ = 0. In short, the energy of the positive charge is q·Φ(1) and the energy of the negative charge is −q·Φ(2), with 1 and 2 denoting their respective location, as illustrated below.

So we have U = q·Φ(1) − q·Φ(2) = q·[Φ(1)−Φ(2)]. Now, we’re talking tiny little dipoles here, so we can approximate ΔΦ = Φ(1)−Φ(2) by ΔΦ = d•∇Φ = Δx·(∂Φ/∂x) + Δy·(∂Φ/∂y) + Δz·(∂Φ/∂z). Hence, also noting that E = −∇Φ and qd = p₀, we get:

U = q·Φ(1) − q·Φ(2) = qd•∇Φ = −p₀•E = −p₀·E·cosθ, with θ the angle between p₀ and E

So the energy is lower when the dipoles are lined up with the field, which is what we would expect, of course. However, it’s an interesting thing so I just wanted to show you that. 🙂
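Here is a small sketch that checks the formula against the two-charge picture. It assumes a uniform field E along the x-axis, so Φ(x) = −E·x, and puts the charges at ±d/2 along a direction that makes an angle θ with the field. The function names and the numbers are illustrative assumptions of mine.

```python
import math

def dipole_energy(p0, E, theta):
    """U = -p0·E·cos(theta)."""
    return -p0 * E * math.cos(theta)

def two_charge_energy(q, d, E, theta):
    """U = q·Φ(1) - q·Φ(2) in a uniform field E along x, with Φ(x) = -E·x."""
    x_plus = 0.5 * d * math.cos(theta)    # x-coordinate of the positive charge
    x_minus = -x_plus                     # x-coordinate of the negative charge
    phi = lambda x: -E * x
    return q * phi(x_plus) - q * phi(x_minus)

q, d, E, theta = 1.6e-19, 1e-10, 1e5, 0.6   # arbitrary illustrative values
U_formula = dipole_energy(q * d, E, theta)
U_charges = two_charge_energy(q, d, E, theta)
print(U_formula, U_charges)
```

For a uniform field the two expressions agree exactly (up to rounding), and the energy is lowest for θ = 0, i.e. when the dipole lines up with the field.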

Electrets, piezoelectricity and ferroelectricity

The analysis above was very general, so we actually haven’t started our discussion on ferroelectricity yet! All of the above is just a necessary introduction to the topic. So let’s move on. Ferroelectrics are solids, so let’s look at solids. Let me just copy Feynman’s introduction here, as it’s perfectly phrased:

“The first interesting fact about solids is that there can be a permanent polarization built in—which exists even without applying an electric field. An example occurs with a material like wax, which contains long molecules having a permanent dipole moment. If you melt some wax and put a strong electric field on it when it is a liquid, so that the dipole moments get partly lined up, they will stay that way when the liquid freezes. The solid material will have a permanent polarization which remains when the field is removed. Such a solid is called an electret. An electret has permanent polarization charges on its surface. It is the electrical analog of a magnet. It is not as useful, though, because free charges from the air are attracted to its surfaces, eventually cancelling the polarization charges. The electret is “discharged” and there are no visible external fields.”

Another example (i.e. other than wax) of an electret is the crystal lattice below. As you can see, all the dipoles are pointing in the same direction even with no applied electric field. Many crystals have such polarization but, again, we do not normally notice it because the external fields are discharged, just as for the electrets.

Now, this gives rise to the phenomena of pyroelectricity and piezoelectricity. Indeed, as Feynman explains: “If these internal dipole moments of a crystal are changed, external fields appear because there is not time for stray charges to gather and cancel the polarization charges. If the dielectric is in a condenser, free charges will be induced on the electrodes. The moments can also change when a dielectric is heated, because of thermal expansion. The effect is called pyroelectricity. Similarly, if we change the stresses in a crystal—for instance, if we bend it—again the moment may change a little bit, and a small electrical effect, called piezoelectricity, can be detected.”

But, still, piezoelectricity is not the same as ferroelectricity. In fact, there’s a hierarchy here:

Out of all crystals, some will be piezoelectric.
Among all piezoelectric crystals, some will also be pyroelectric.
Among the pyroelectric crystals, we can find some ferroelectric crystals.

The defining characteristic of ferroelectricity is that the built-in permanent moment can be reversed by the application of an external electric field. Feynman defines them as “nearly cubic crystals, whose moments can be turned in different directions, so we can detect a large change in the moment when an applied electric field is changed: all the moments flip over and we get a large effect.”

Because this is a blog, not a physics handbook, I’ll refer you to Feynman and/or the Wikipedia article on ferroelectricity for an explanation of the mechanism. Indeed, the objective of this post is to explain what it is, and so I don’t want to go off into the weeds. The two diagrams below, which I took from the mentioned Wikipedia article, illustrate the difference between your average dielectric material as opposed to a ferroelectric material. The first diagram shows you the linear relationship between P and E we discussed above: if we reverse the field, so E becomes negative, then the polarization will be reversed as well, but gradually, as shown below.

In contrast, the illustration below shows a hysteresis effect, which can be used as a memory function, and ferroelectric materials are indeed used for ferroelectric RAM (FeRAM) memory chips for computers! I’ll let you google that for yourself − it’s fun: just have a look at the following link, for example − because it’s about time I start wrapping up this post. 🙂

OK. That’s it for today. More tomorrow. 🙂

Magnetostatics

Pre-script (dated 26 June 2020): This post got mutilated by the removal of some material by the dark force. You should be able to follow the main story line, however. If anything, the lack of illustrations might actually help you to think things through for yourself.

Original post:

Not all is exciting when studying physics. In fact, electromagnetism is, most of the time, an extremely boring subject-matter. But so we need to get through it, because we need the math and the formulas. So… Here we go…

When going from electrostatics to electrodynamics, one first needs to have a look at magnetostatics, to get familiar with (steady) electric currents. So let’s have a look at what they are. Of course, you may already know what steady currents are. In that case, you should, perhaps, stop reading. But I’d recommend you go through it anyway. It’s always good to be explicit, so let’s be explicit.

Let me first make a very pedantic note. There are a couple of sections in Feynman’s Lectures in which he assumes that a steady current in a wire is uniformly distributed throughout the cross-section of the current-carrying wire: that assumption amounts to saying that the current density j is uniform. He uses that assumption, for example, when calculating the force per unit length of a current-carrying wire in a magnetic field (see Vol. II, section 13-3). He also uses it when calculating the magnetic field it creates itself (see Vol. II, section 14-3). This raises two questions:

Is the assumption true?
Does it matter?

My impression is that it’s a simplification that doesn’t matter. So the answer to both questions would be negative. But let’s examine them. First note that, in previous posts, we repeatedly said that, if we place a charge Q on any conductor, all charges will spread out in some way on the surface, so we have an equipotential on the surface and no electric field inside of the conductor. The physics behind it is easy to understand: if there were an electric field inside of the conductor, or if the surface were not an equipotential, the charges would keep moving until the field became zero.

Does it matter? Maybe. Maybe not. I discussed the electric field from a conductor in a previous post, so let me just recall some formulas here, first and foremost Gauss’ Law, which says that the electric flux from any closed surface S is equal to Q_inside/ε₀. Now, Q_inside is, obviously, the sum of the charges inside the volume enclosed by the surface, and the most remarkable thing about Gauss’ Law is that the charge distribution inside of the volume doesn’t matter. So if we’re talking a uniformly charged sphere or a thin spherical shell of charge, it’s the same. The illustration below shows the field for a uniformly charged sphere: E is proportional to r (to be precise: E = (ρ·r)/(3ε₀) for r ≤ R) inside the sphere, and outside E is proportional to 1/r² (to be precise: E = Q_inside/(4πε₀r²) for r ≥ R).
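The two branches of that formula are easy to put in a few lines of code, which also shows that they join smoothly at the surface r = R. The function name and the numbers (a charge of 1 nC on a sphere of 10 cm radius) are illustrative assumptions of mine.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)

def E_sphere(r, Q, R):
    """Field magnitude of a uniformly charged sphere with total charge Q and radius R."""
    if r <= R:
        rho = Q / (4.0 / 3.0 * math.pi * R ** 3)   # uniform charge density
        return rho * r / (3 * EPS0)                # inside: proportional to r
    return Q / (4 * math.pi * EPS0 * r ** 2)       # outside: proportional to 1/r²

Q, R = 1e-9, 0.1
E_surface = E_sphere(R, Q, R)
print(E_sphere(R / 2, Q, R), E_surface, E_sphere(2 * R, Q, R))
```

Halfway to the surface, the field is half its surface value, and at twice the radius it has dropped to a quarter.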

However, Gauss’ Law is a law that gives us the electric flux only, so we’re talking E only. We also have the magnetic field, i.e. the field vector B. So what’s the equivalent of Gauss’ Law for B? That’s Ampère’s Law, obviously, so let’s have a look at how Feynman derives that law.

Ampère’s Law

Feynman starts by defining the current through some surface S as the following integral:

I = ∫_S j·n dS

The illustration below explains the logic behind. The vector j is like the heat flow vector h which we used when explaining the basics of vector calculus: it is some amount passing expressed per unit time and per unit area. As for the use of n, that’s the same normal unit vector we used for h as well: we then wrote that h·n = |h|·|n|·cosθ = h·cosθ was the component of the heat flow that’s perpendicular or normal (as mathematicians prefer to say) to the surface. So here we’ve got the same: j·n·dS is the amount of charge flowing across an infinitesimally small area dS in a unit time. So to get the electric current I, which is the total charge passing per unit time through a surface S, we need to integrate the normal component of the flow through all the surface elements, which is what the integral above is doing.

Note that I is not a vector but a scalar. We could, however, include the idea of the direction of flow by making I a vector, so then we write it in boldface: I. It is measured in coulomb per second, aka ampere: 1 A = 1 C/s. Also note we don’t have any wires here: just surfaces and volumes. 🙂 Onwards!
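The surface integral for I can be illustrated with a small numerical sketch, which also speaks to that pedantic question about whether j needs to be uniform. It integrates j·n over a flat cut through a round wire of radius a, for a current density that points along the wire and depends only on the distance r from its axis. The cut may be tilted: the area element grows by 1/cos(tilt) while the normal component of j shrinks by cos(tilt). The function name, the two current profiles and the numbers are assumptions of mine.

```python
import math

def current_through_cut(j_of_r, a, tilt=0.0, n=2000):
    """Midpoint-rule estimate of I = ∫ j·n dS over a flat cut through a round wire."""
    dr = a / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        dS = 2 * math.pi * r * dr / math.cos(tilt)   # ring element on the tilted cut
        j_normal = j_of_r(r) * math.cos(tilt)        # component of j along n
        total += j_normal * dS
    return total

a, j0 = 1e-3, 1e6                                    # 1 mm radius, 1 A/mm² (assumed)
uniform = lambda r: j0
parabolic = lambda r: j0 * (1 - (r / a) ** 2)        # hypothetical non-uniform profile

I_straight = current_through_cut(uniform, a)
I_tilted = current_through_cut(uniform, a, tilt=0.7)
I_parabolic = current_through_cut(parabolic, a)
print(I_straight, I_tilted, I_parabolic)
```

The tilt of the surface does not change the current, and a different profile just gives a different total I, which is all that will matter in Ampère's Law outside the wire.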

The equations of magnetostatics are Maxwell’s third and fourth equation and, as we used Maxwell’s first and second equation to derive Gauss’ Law, we’ll use these two to derive Ampère’s Law: (1) ∇•B = 0 and (2) c²∇×B = j/ε₀.

You know these equations: the first one basically says there’s no flux of B: there’s no such thing as magnetic charges, in other words. The second one says that a current produces some circulation of B. You also know these equations are valid only for static fields: all electric charge densities are constant, and all currents are steady, so the electric and magnetic fields are not changing with time: ∂E/∂t = 0 = ∂B/∂t. Forget about c² for a moment (it’s just a constant) and note that ∇×B is referred to as the curl of B.

Now, as I pointed out in one of my posts on vector analysis, the divergence of the curl of a vector is always equal to zero, so ∇•(∇×B) = 0. However, because ∇×B = j/ε₀c², that means ∇•(j/ε₀c²) must also be equal to zero (we’re just taking the divergence of both sides of the equation here), and so we find that ∇•j must be equal to zero. What does that mean? Well… From the same post, you may or may not remember that the divergence of some vector field C (so that’s ∇•C) is the (net) flux out of an (infinitesimal) volume around the point we’re considering, so ∇•j = 0 implies that as much charge must be coming in as is going out, always and everywhere. So that means that, because of the charge conservation law (no charges are created or lost), we can only look at charges flowing in paths that close back on themselves, so we can only consider closed circuits. It’s a minor point (so don’t worry too much about it) but it does imply that we’re not looking at condensers, for example. Just remember: magnetostatics is about circulation, we have no flux, not of B, and not of j: our field, and our charges, circulate. 🙂

OK. Let’s get back to the lesson. We need to find Ampère’s Law, so we’d better get on with it. 🙂 To find Gauss’ Law, we used Gauss’ Theorem. To find Ampère’s Law, we’ll use… Stokes’ Theorem. [Sorry!] I need to refer you, once again, to that post on vector analysis for it. Here I can only remind you of the Theorem itself. It says that the line integral of the tangential component of a vector (field) around a closed loop is equal to the surface integral of the normal component of the curl of that vector over any surface which is bounded by the loop. […] I know that’s quite a mouthful, so let me jot down the equation:

∮_Γ C·ds = ∫_S (∇×C)·n dS

Applying it to the magnetic field vector B, we get:

∮_Γ B·ds = ∫_S (∇×B)·n dS

This is the illustration which goes with it.

Now, using our ∇×B = j/ε₀c² equation, we get:

∮_Γ B·ds = (1/ε₀c²)·∫_S j·n dS

Finally, we just plug in our I = ∫ j·n dS integral and we’re done. This is Ampère’s Law:

∮_Γ B·ds = I/ε₀c², with I the current through the loop Γ

It basically says that the circulation of B around any closed curve is equal to the current I through the loop, divided by ε₀c². So what can we do with it? Well… We used Gauss’ Law to find the electric field in various circumstances, so let’s now use Ampère’s Law to find the magnetic field in various circumstances. 🙂

Before doing so, however, let me note that Ampère’s Law does not depend on any particular assumption in regard to the distribution of the current density j. So, frankly speaking, don’t worry too much about that assumption about the current being uniformly distributed in a wire: a current is a current in Ampère’s Law. 🙂

Wires

You know the magnetic field around a wire, as you’ll surely remember that right-hand rule for it from your high-school physics classes. Note, however, that it assumes you apply the usual convention: charge flows from positive to negative, because our unit of electric charge is obviously +1, not –1. So the electron flow actually goes the other way. 🙂

But so we’re past our high school days and we need to apply Ampère’s Law. The symmetry of the situation implies that the line integral of B·ds, taken along some closed circle around the wire, is, quite simply, the magnitude of B times the circumference 2π·r of our circle. Indeed, the symmetry of the situation implies that B at some distance r should be of the same magnitude everywhere, so we have:

∮ B·ds = B·2π·r

But from Ampère’s Law we know that integral is equal to I/ε₀c² and, therefore, B·2π·r must equal I/ε₀c², and so we get the grand result we were looking for. The magnetic field outside of a (long) wire carrying the current I is:

B = I/(2πε₀c²·r) = (1/4πε₀c²)·(2I/r)

As Feynman notes, we can write this in vector form to include the directions, remembering that B is at right angles both to I as well as to r, and remembering that the order matters, of course, because of the right-hand rule for a vector cross product. 🙂
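Ampère’s Law says the circulation does not care about the shape of the loop, only about the current going through it. The sketch below checks that numerically for the wire field: it adds up B·ds along a square loop, once with the wire inside and once with the wire outside. The wire runs along the z-axis through the origin; the function names, the loop sizes and the midpoint-rule integration are my own choices.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299792458.0           # speed of light (m/s)

def B_wire(x, y, I):
    """(B_x, B_y) of a long straight wire along z through the origin."""
    k = I / (2 * math.pi * EPS0 * C ** 2)
    r2 = x * x + y * y
    return (-k * y / r2, k * x / r2)

def circulation_square(cx, cy, half, I, n=2000):
    """Midpoint-rule line integral of B·ds, counter-clockwise around a square loop."""
    corners = [(cx - half, cy - half), (cx + half, cy - half),
               (cx + half, cy + half), (cx - half, cy + half)]
    total = 0.0
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        dx, dy = (x1 - x0) / n, (y1 - y0) / n
        for k in range(n):
            bx, by = B_wire(x0 + (k + 0.5) * dx, y0 + (k + 0.5) * dy, I)
            total += bx * dx + by * dy
    return total

I = 3.0
expected = I / (EPS0 * C ** 2)
circ_in = circulation_square(0.01, -0.02, 0.1, I)    # loop encloses the wire
circ_out = circulation_square(0.5, 0.0, 0.1, I)      # loop does not enclose it
print(circ_in, expected, circ_out)
```

The off-centre square gives I/ε₀c², just like a circle would, and the loop that misses the wire gives (nearly) zero.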

Solenoids

Coils of wire, and solenoids, pop up almost everywhere when studying electromagnetism. Indeed, transformers, inductors, electrical motors: it’s all coils. So, yes, we can’t escape them. 😦 So let’s get on with it. As you know, a solenoid is a long coil of wire wound in a tight spiral. The illustrations below show a cross-section and its magnetic field.

Now, this is probably one of Feynman’s most intuitive arguments. Read: he’s cutting an awful lot of corners here. 🙂 I’ll just copy him:

We observe experimentally that when a solenoid is very long compared with its diameter, the field outside is very small compared with the field inside. Using just that fact, together with Ampère’s law, we can find the size of the field inside. Since the field stays inside (and has zero divergence), its lines must go along parallel to the axis, as shown above. That being the case, we can use Ampère’s law with the rectangular ‘curve’ Γ shown in the figure. This loop goes the distance L inside the solenoid, where the field is, say, B₀, then goes at right angles to the field, and returns along the outside, where the field is negligible. The line integral of B for this curve is just B₀·L, and it must be 1/ε₀c² times the total current through Γ, which is N·I if there are N turns of the solenoid in the length L. We have:

B₀·L = N·I/ε₀c²

Or, letting n be the number of turns per unit length of the solenoid (that is, n = N/L), we get:

B₀ = n·I/ε₀c²

Oh… What happens to the lines of B when they get to the end of the solenoid? Well… They just spread out in some way and return to enter the solenoid at the other end. Hmm… He’s really cutting corners here, isn’t he? But the formula is right, and I’d rather keep it short—just like he seems to want to do here. 🙂 I’ll just insert an illustration showing another right-hand rule—the right-hand rule for solenoids: if the direction of the fingers of your right hand is the direction of current, then your thumb gives the direction of the magnetic field inside.
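Feynman's argument boils down to B₀·L = N·I/ε₀c², so B₀ = n·I/ε₀c² with n = N/L. The sketch below just evaluates that formula. The function name `solenoid_field` and the coil dimensions are hypothetical, and the result only holds for the idealized long solenoid (negligible field outside).

```python
EPSILON_0 = 8.8541878128e-12  # electric constant in F/m (SI value, assumed here)
C = 299_792_458.0             # speed of light in m/s

def solenoid_field(turns, length, current):
    """Field B0 (in tesla) inside an idealized long solenoid.

    turns   -- number of turns N in the stretch considered
    length  -- length L of that stretch, in m
    current -- current I through the wire, in A
    """
    n = turns / length                      # turns per unit length
    return n * current / (EPSILON_0 * C**2)

# Hypothetical coil: 500 turns over 25 cm, carrying 2 A
b0 = solenoid_field(turns=500, length=0.25, current=2.0)
print(f"B0 = {b0:.3e} T")
```

Note that only the number of turns per unit length matters: a coil twice as long with twice as many turns gives the same field inside.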

You may wonder: does it matter where the + and − ends of the coil are? Good question because, in practice, we’ll have something that’s very tightly wound, like the coil below, so when making an actual coil (click on this link for a nice video), we’ll have several rows and so we wind from right to left and then back from left to right and so on and so on. So if we’d have two rows of wire, the two ends of the wire would come out on the same side, and that’s OK.

Of course, the wire needs to be insulated. What you see on the picture (and in the video) is the use of so-called magnet wire, which has a polymer film insulation. So when making the electrical connections at both ends, after winding the coil, you need to get rid of the insulation, but then it often melts just by the heat of soldering. And now that we’re talking practical stuff, let me say something about the magnetic core you see in the illustration above.

A magnetic core is a material with high magnetic permeability as compared to the surrounding air, and this high permeability will cause the magnetic field to be concentrated in the core material. Now, there’s a phenomenon that’s called hysteresis, which means that the core material will tend to retain its magnetization when the applied field is removed. This is not very desirable in many applications, such as transformers or electric motors. That’s why so-called ‘soft’ magnetic materials with low hysteresis are often preferred. The so-called soft iron is such a material: it’s literally softer because of a heat treatment increasing its ductility and reducing its hardness. Of course, for permanent magnets, a so-called ‘hard’ magnetic material will be used. But here we’re getting into engineering and that’s not what I want to write about in this blog.

I’ll just end by noting that a magnetic field has a so-called north (N) and south (S) pole. That convention refers to the Earth’s north and south pole, of course. However, since opposite poles (north and south) attract, the North Magnetic Pole is actually the south pole of the Earth’s magnetic field, and the South is the north. 🙂 So it’s better not to think too much of the Earth’s poles when discussing the poles of a magnet. By convention, a magnet’s north pole is where the field lines of a magnet emerge, and the south pole is where they enter, as shown below.

In any case… Folks: that’s it for today. I’ll continue tomorrow. 🙂

A post for Vincent: on the math of waves

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics for my kids (they are 21 and 23 now and no longer need such explanations) have not suffered much the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

I wrote this post to just briefly entertain myself and my teenage kids. To be precise, I am writing this for Vincent, as he started to study more math this year (eight hours a week!), and as he also thinks he might go for engineering studies two years from now. So let’s see if he gets this and − much more importantly − if he likes the topic. If not… Well… Then he should get even better at golf than he already is, so he can make a living out of it. 🙂

To be sure, nothing of what I write below requires an understanding of stuff you haven’t seen yet, like integrals, or complex numbers. There are no derivatives, exponentials or logarithms either: you just need to know what a sine or a cosine is, and then it’s just a bit of addition and multiplication. So it’s just… Well… Geometry and waves as I would teach it to an interested teenager. So let’s go for it. And, yes, I am talking to you now, Vincent! 🙂

The animation below shows a repeating pulse. It is a periodic function: a traveling wave. It obviously travels in the positive x-direction, i.e. from left to right as per our convention. As you can see, the amplitude of our little wave varies as a function of time (t) and space (x), so it’s a function in two variables, like y = F(u, v). You know what that is, and you also know we’d refer to y as the dependent variable and to u and v as the independent variables.

Now, because it’s a wave, and because it travels in the positive x-direction, the argument of the wave function F will be x−ct, so we write:

y = F(x−ct)

Just to make sure: c is the speed of travel of this particular wave, so don’t think it’s the speed of light. This wave can be any wave: a water wave, a sound wave,… Whatever. Our dependent variable y is the amplitude of our wave, so it’s the vertical displacement − up or down − of whatever we’re looking at. As it’s a repeating pulse, y is zero most of the time, except when that pulse is pulsing. 🙂

So what’s the wavelength of this thing?

[…] Come on, Vincent. Think! Don’t just look at this!

[…] I got it, daddy! It’s the distance between two peaks, or between the center of two successive pulses— obviously! 🙂

[…] Good! 🙂 OK. That was easy enough. Now look at the argument of this function once again:

F = F(x−ct)

We are not merely acknowledging here that F is some function of x and t, i.e. some function varying in space and time. Of course, F is that too, so we can write: y = F = F(x, t) = F(x−ct), but it’s more than just some function: we’ve got a very special argument here, x−ct, and so let’s start our little lesson by explaining it.

The x−ct argument is there because we’re talking waves, so that is something moving through space and time indeed. Now, what are we actually doing when we write x−ct? Believe it or not, we’re basically converting something expressed in time units into something expressed in distance units. So we’re converting time into distance, so to speak. To see how this works, suppose we add some time Δt to the argument of our function y = F, so we’re looking at F[x−c(t+Δt)] now, instead of F(x−ct). Now, F[x−c(t+Δt)] = F(x−ct−cΔt), so we’ll get a different value for our function—obviously! But it’s easy to see that we can restore our wave function F to its former value by also adding some distance Δx = cΔt to the argument. Indeed, if we do so, we get F[x+Δx−c(t+Δt)] = F(x+cΔt–ct−cΔt) = F(x–ct). For example, if c = 3 m/s, then 2 seconds of time correspond to (2 s)×(3 m/s) = 6 meters of distance.
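If you want to see that Δx = cΔt trick at work, here is a small Python sketch. The bump-shaped `pulse` function is just something I made up for illustration; any shape F would do, which is the whole point.

```python
import math

def pulse(u):
    """A single bump centred on u = 0 (the shape is an arbitrary choice)."""
    return math.exp(-u**2)

def wave(x, t, c):
    """Wave travelling in the positive x-direction: y = F(x - ct)."""
    return pulse(x - c * t)

c = 3.0        # m/s, as in the example above
x, t = 1.0, 0.5
dt = 2.0       # wait 2 seconds ...
dx = c * dt    # ... and move 6 meters along with the wave

before = wave(x, t, c)
time_shift_only = wave(x, t + dt, c)      # same place, later: different value
riding_along = wave(x + dx, t + dt, c)    # moved along with the wave: same value
print(before, time_shift_only, riding_along)
```

The first and the last number are the same, while the middle one is practically zero: the pulse has long passed the point x = 1 m by then.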

The idea behind adding both some time Δt as well as some distance Δx is that you’re traveling with the waveform itself, or with its phase as they say. So it’s like you’re riding on its crest or in its trough, or somewhere hanging on to it, so to speak. Hence, the speed of a wave is also referred to as its phase velocity, which we denote by v_p = c. Now, let me make some remarks here.

First, there is the direction of travel. The pulses above travel in the positive x-direction, so that’s why we have x minus ct in the argument. For a wave traveling in the negative x-direction, we’ll have a wave function y = F(x+ct). [And, yes, don’t be lazy, Vincent: please go through the Δx = cΔt math once again to double-check that.]

The second thing you should note is that the speed of a regular periodic wave is equal to the product of its wavelength and its frequency, so we write: v_p = c = λ·f, which we can also write as λ = c/f or f = c/λ. Now, you know we express the frequency in oscillations or cycles per second, i.e. in hertz: one hertz is, quite simply, 1 s⁻¹, so the unit of frequency is the reciprocal of the second. So the m/s and the Hz units in the fraction below give us a wavelength λ equal to λ = (20 m/s)/(5/s) = 4 m. You’ll say that’s too simple but I just want to make sure you’ve got the basics right here.

The third thing is that, in physics, and in math, we’ll usually work with nice sinusoidal functions, i.e. sine or cosine functions. A sine and a cosine function are the same function but with a phase difference of 90 degrees, so that’s π/2 radians. That’s illustrated below: cosθ = sin(θ+π/2).

Now, when we converted time to distance by multiplying it with c, what we actually did was to ensure that the argument of our wavefunction F was expressed in one unit only: the meter, so that’s the distance unit in the international SI system of units. So that’s why we had to convert time to distance, so to speak.

The other option is to express all in seconds, so that’s in time units. So then we should measure distance in seconds, rather than meters, so to speak, and the corresponding argument is t–x/c, and our wave function would be written as y = G(t–x/c). Just go through the same Δx = cΔt math once more: G[t+Δt–(x+Δx)/c] = G(t+Δt–x/c−cΔt/c) = G(t–x/c).

In short, we’re talking the same wave function here, so F(x−ct) = G(t−x/c), but the argument of F is expressed in distance units, while the argument of G is expressed in time units. If you’d want to double-check what I am saying here, you can use the same 20 m/s wave example again: suppose the distance traveled is 100 m, so x = 100 m and x/c = (100 m)/(20 m/s) = 5 seconds. It’s always important to check the units, and you can see they come out alright in both cases! 🙂
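Here is a quick numerical check of the F(x−ct) = G(t−x/c) statement, using the 20 m/s wave again. The shape of F is an arbitrary bump I picked for illustration, and the way G is built from F (feed it −c times the time argument, because x − ct = −c·(t − x/c)) is my own way of spelling out the unit conversion.

```python
import math

C = 20.0  # wave speed in m/s (the example value used above)

def F(u):
    """Wave shape as a function of a distance argument u = x - ct (in meters)."""
    return math.exp(-(u / 10.0) ** 2)

def G(tau):
    """The same wave as a function of a time argument tau = t - x/c (in seconds)."""
    return F(-C * tau)   # because x - ct = -c*(t - x/c)

x, t = 100.0, 4.5        # 100 m down the road, 4.5 s after the start
print(F(x - C * t), G(t - x / C))
```

Both numbers are the same: the argument of F is 10 (meters), the argument of G is −0.5 (seconds), and multiplying the latter by −c converts one into the other.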

Now, to go from F or G to our sine or cosine function, we need to do yet another conversion of units, as the argument of a sinusoidal function is some angle θ, not meters or seconds. In physics, we refer to θ as the phase of the wave function. So we need degrees or, more common now, radians, which I’ll explain in a moment. Let me first jot it down:

y = sin(2π(x–ct)/λ)

So what are we doing here? What’s going on? Well… First, we divide x–ct by the wavelength λ, so that’s the (x–ct)/λ in the argument of our sine function. So our ‘distance unit’ is no longer the meter but the wavelength of our wave, so we no longer measure in meters but in wavelengths. For example, if our argument x–ct was 20 m, and the wavelength of our wave is 4 m, we get (x–ct)/λ = 5 between the brackets. It’s just like comparing our length: ten years ago you were about half my size. Now you’re the same: one unit. 🙂 When we’re saying that, we’re using my length as the unit – and so that’s also your length unit now 🙂 – rather than meters or centimeters.

Now I need to explain the 2π factor, which is only slightly more difficult. Think about it: one wavelength corresponds to one full cycle, so that’s the full 360° of the circle below. In fact, we’ll express angles in radians, and the two animations below illustrate what a radian really is: an angle of 1 rad defines an arc whose length, as measured on the circle, is equal to the radius of that circle. […] Oh! Please look at the animations as two separate things: they illustrate the same idea, but they’re not synchronized, unfortunately! 🙂

So… I hope it all makes sense now: if we add one wavelength to the argument of our wave function, we should get the same value, and so it’s equivalent to adding 2π to the argument of our sine function. Adding half a wavelength, or 35% of it, or a quarter, or two wavelengths, or e wavelengths, etc is equivalent to adding π, or 35%·2π ≈ 2.2, or 2π/4 = π/2, or 2·2π = 4π, or e·2π, etc to it. So… Well… Think about it: to go from the argument of our wavefunction expressed as a number of wavelengths − so that’s (x–ct)/λ – to the argument of our sine function, which is expressed in radians, we need to multiply by 2π.
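A few lines of Python confirm the point about the 2π factor. The wavelength (4 m) and speed (20 m/s) are the example values used earlier; the sample point (x, t) is arbitrary.

```python
import math

def y(x, t, wavelength=4.0, c=20.0):
    """Sinusoidal wave y = sin(2*pi*(x - ct)/wavelength)."""
    return math.sin(2 * math.pi * (x - c * t) / wavelength)

lam = 4.0
x, t = 1.3, 0.07    # an arbitrary point in space and time

base = y(x, t)
one_wavelength_on = y(x + lam, t)       # phase shifted by 2*pi
half_wavelength_on = y(x + lam / 2, t)  # phase shifted by pi
print(base, one_wavelength_on, half_wavelength_on)
```

One wavelength further on we find the same value; half a wavelength further on we find the same value with the sign flipped, because sin(θ+π) = −sinθ.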

[…] OK, Vincent. If it’s easier for you, you may want to think of the 1/λ and 2π factors in the argument of the sin(2π(x–ct)/λ) function as scaling factors: you’d use a scaling factor when you go from one measurement scale to another indeed. It’s like using vincents rather than meters. If one vincent corresponds to 1.8 m, then we need to re-scale all lengths by dividing them by 1.8 so as to express them in vincents. Vincent ten years ago was 0.9 m, so that’s half a vincent: 0.9/1.8 = 0.5. 🙂

[…] OK. […] Yes, you’re right: that’s rather stupid and makes nobody smile. Fine. You’re right: it’s time to move on to more complicated stuff. Now, read the following a couple of times. It’s my one and only message to you:

If there’s anything at all that you should remember from all of the nonsense I am writing about in this physics blog, it’s that any periodic phenomenon, any motion really, can be analyzed by assuming that it is the sum of the motions of all the different modes of what we’re looking at, combined with appropriate amplitudes and phases.

It really is a most amazing thing—it’s something very deep and very beautiful connecting all of physics with math.

We often refer to these modes as harmonics and, in one of my posts on the topic, I explained how the wavelengths of the harmonics of a classical guitar string – it’s just an example – depended on the length of the string only. Indeed, if we denote the various harmonics by their harmonic number n = 1, 2, 3,… n,… and the length of the string by L, we have λ₁ = 2L = (1/1)·2L, λ₂ = L = (1/2)·2L, λ₃ = (1/3)·2L,… λ_n = (1/n)·2L. So they look like this:

etcetera (1/8, 1/9,…,1/n,… 1/∞)

The diagram makes it look like it’s very obvious, but it’s an amazing fact: the material of the string, or its tension, doesn’t matter. It’s just the length: simple geometry is all that matters! As I mentioned in my post on music and physics, this realization led to a somewhat misplaced fascination with harmonic ratios, which the Greeks thought could explain everything. For example, the Pythagorean model of the orbits of the planets would also refer to these harmonic ratios, and it took intellectual giants like Galileo and Copernicus to finally convince the Pope that harmonic ratios are great, but that they cannot explain everything. 🙂 [Note: When I say that the material of the string, or its tension, doesn’t matter, I should correct myself: they do come into play when time becomes the variable. Also note that guitar strings are not the same length when strung on a guitar: the so-called bridge saddle is not in an exact right angle to the strings: this is a link to some close-up pictures of a bridge saddle on a guitar, just in case you don’t have a guitar at home to check.]

Now, I already explained the need to express the argument of a wave function in radians – because we’re talking periodic functions and so we want to use sinusoidals − and how it’s just a matter of units really, and so how we can go from meter to wavelengths to radians. I also explained how we could do the same for seconds, i.e. for time. The key to converting distance units to time units, and vice versa, is the speed of the wave, or the phase velocity, which relates wavelength and frequency: c = λ·f. Now, as we have to express everything in radians anyway, we’ll usually substitute the wavelength and frequency by the wavenumber and the angular frequency so as to convert these quantities too to something expressed in radians. Let me quickly explain how it works:

The wavenumber k is equal to k = 2π/λ, so it’s some number expressed in radians per unit distance, i.e. radians per meter. In the example above, where λ was 4 m, we have k = 2π/(4 m) = π/2 radians per meter. To put it differently, if our wave travels one meter, its phase θ will change by π/2.
Likewise, the angular frequency is ω = 2π·f = 2π/T. Using the same example once more, so assuming a frequency of 5 Hz, i.e. a period of one fifth of a second, we have ω = 2π/[(1/5)·s] = 10π per second. So the phase of our wave will change by 10 times π in one second. Now that makes sense because, in one second, we have five cycles, and so that corresponds to 5 times 2π.

Note that our definition implies that λ = 2π/k, and that it’s also easy to figure out that our definition of ω, combined with the f = c/λ relation, implies that ω = 2π·c/λ and, hence, that c = ω·λ/(2π) = (ω·2π/k)/(2π) = ω/k. OK. Let’s move on.
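The definitions of k and ω are easy to check with the numbers of our example (λ = 4 m, f = 5 Hz). This is just the arithmetic above, written out in Python.

```python
import math

wavelength = 4.0   # m
frequency = 5.0    # Hz

k = 2 * math.pi / wavelength   # wavenumber, in radians per meter
omega = 2 * math.pi * frequency  # angular frequency, in radians per second
c = omega / k                  # phase velocity, in m/s

print(f"k     = {k / math.pi:.2f}·π rad/m")
print(f"omega = {omega / math.pi:.2f}·π rad/s")
print(f"c     = {c:.1f} m/s")
```

So k is half of π, ω is ten times π, and their ratio gives us back the 20 m/s we started from, which is the same as λ·f.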

Using the definitions and explanations above, it’s now easy to see that we can re-write our y = sin(2π(x–ct)/λ) as:

y = sin(2π(x–ct)/λ) = sin[2π(x–(ω/k)t)/(2π/k)] = sin[(x–(ω/k)t)·k] = sin(kx–ωt)

Remember, however, that we were talking some wave that was traveling in the positive x-direction. For the negative x-direction, the equation becomes:

y = sin(2π(x+ct)/λ) = sin(kx+ωt)
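If you don't trust the algebra, you can let the computer compare the long and the short form of the wave function on a grid of points, for both directions of travel. The example values (λ = 4 m, c = 20 m/s) and the helper names `long_form` and `short_form` are mine.

```python
import math

lam, c = 4.0, 20.0
k = 2 * math.pi / lam   # wavenumber
omega = c * k           # angular frequency, from c = omega/k

def long_form(x, t, sign):
    """sin(2*pi*(x ± ct)/lambda): sign = -1 travels right, sign = +1 travels left."""
    return math.sin(2 * math.pi * (x + sign * c * t) / lam)

def short_form(x, t, sign):
    """sin(kx ± omega*t), with the same sign convention."""
    return math.sin(k * x + sign * omega * t)

# Sample one wavelength in space and one period in time
samples = [(0.1 * i, 0.01 * j) for i in range(41) for j in range(21)]
max_diff = max(
    abs(long_form(x, t, s) - short_form(x, t, s))
    for x, t in samples
    for s in (-1, +1)
)
print(max_diff)
```

The largest difference is down at the level of floating-point rounding, so the two ways of writing the wave are indeed the same function.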

OK. That should be clear enough. Let’s go back to our guitar string. We can go from λ to k by noting that λ₁ = 2L for the first harmonic and, more generally, λ_n = 2L/n. Hence, we get the following for all of the various modes:

k = k₁ = 2π·1/(2L) = π/L, k₂ = 2π·2/(2L) = 2k, k₃ = 2π·3/(2L) = 3k,… k_n = 2π·n/(2L) = nk,…

That gives us our grand result, and that’s that we can write some very complicated waveform Ψ(x) as the sum of an infinite number of simple sinusoids, so we have:

Ψ(x) = a₁sin(kx) + a₂sin(2kx) + a₃sin(3kx) + … + a_nsin(nkx) + … = ∑ a_nsin(nkx)

The equation above assumes we’re looking at the oscillation at some fixed point in time. If we’d be looking at the oscillation at some fixed point in space, we’d write:

Φ(t) = a₁sin(ωt) + a₂sin(2ωt) + a₃sin(3ωt) + … + a_nsin(nωt) + … = ∑ a_nsin(nωt)
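To make such a sum of harmonics a bit more tangible, the sketch below adds up the first three modes of a string at some fixed point in time. The amplitudes and the string length are made-up numbers, and a real sum would have infinitely many terms rather than three.

```python
import math

def psi(x, amplitudes, L=1.0):
    """Partial sum a1*sin(kx) + a2*sin(2kx) + ... with k = pi/L."""
    k = math.pi / L
    return sum(a * math.sin(n * k * x)
               for n, a in enumerate(amplitudes, start=1))

L = 1.0                   # string length (hypothetical)
amps = [1.0, 0.5, 0.25]   # a1, a2, a3 (hypothetical amplitudes)

for x in (0.0, L / 4, L / 2, 3 * L / 4, L):
    print(f"x = {x:.2f}  psi = {psi(x, amps, L):+.4f}")
```

Whatever amplitudes you choose, the sum is zero at x = 0 and at x = L, because every single sin(nkx) term is zero there: the string stays fixed at both ends.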

Of course, to represent some very complicated oscillation on our guitar string, we can and should combine some Ψ(x) as well as some Φ(t) function, but how do we do that, exactly? Well… We’ll obviously need both the sin(kx–ωt) as well as those sin(kx+ωt) functions, as I’ll explain in a moment. However, let me first make another small digression, so as to complete your knowledge of wave mechanics. 🙂

We look at a wave as something that’s traveling through space and time at the same time. In that regard, I told you that the speed of the wave is its so-called phase velocity, which we denoted as v_p = c and which, as I explained above, is equal to v_p = c = λ·f = (2π/k)·(ω/2π) = ω/k. The animation below (credit for it must go to Wikipedia, and sorry I forgot to acknowledge the same source for the illustrations above) illustrates the principle: the speed of travel of the red dot is the phase velocity. But you can see that what’s going on here is somewhat more complicated: we have a series of wave packets traveling through space and time here, and so that’s where the concept of the so-called group velocity comes in: it’s the speed of travel of the green dot.

Now, look at the animation below. What’s going on here? The wave packet (or the group or the envelope of the wave—whatever you want to call it) moves to the right, but the phase goes to the left, as the peaks and troughs move leftward indeed. Huh? How is that possible? And where is this wave going? Left or right? Can we still associate some direction with the wave here? It looks like it’s traveling in both directions at the same time!

The wave actually does travel in both directions at the same time. Well… Sort of. The point is actually quite subtle. When I started this post by writing that the pulses were ‘obviously’ traveling in the positive x-direction… Well… That’s actually not so obvious. What is it that is traveling really? Think about an oscillating guitar string: nothing travels left or right really. Each point on the string just moves up and down. Likewise, if our repeated pulse is some water wave, then the water just stays where it is: it just moves up and down. Likewise, if we shake up some rope, the rope is not going anywhere: we just started some motion that is traveling down the rope. In other words, the phase velocity is just a mathematical concept. The peaks and troughs that seem to be traveling are just mathematical points that are ‘traveling’ left or right.

What about the group velocity? Is that a mathematical notion too? It is. The wave packet is often referred to as the envelope of the wave curves, for obvious reasons: they’re enveloped indeed. Well… Sort of. 🙂 However, while both the phase and group velocity are velocities of mathematical constructs, it’s obvious that, if we’re looking at wave packets, the group velocity would be of more interest to us than the phase velocity. Think of those repeated pulses as real water waves, for example: while the water stays where it is (as mentioned, the water molecules just go up and down, more or less, at least), we’d surely be interested to know how fast these waves are ‘moving’, and that’s given by the group velocity, not the phase velocity. Still, having said that, the group velocity is as ‘unreal’ as the phase velocity: both are mathematical concepts. The only thing that’s ‘real’ is the up and down movement. Nothing travels in reality. Now, I shouldn’t digress too much here, but that’s why there’s no limit on the phase velocity: it can exceed the speed of light. In fact, in quantum mechanics, some real-life particle − like an electron, for instance – will be represented by a complex-valued wave function, and there’s no reason to put some limit on the phase velocity. In contrast, the group velocity will actually be the speed of the electron itself, and that speed can, obviously, approach the speed of light – in particle accelerators, for example – but it can never exceed it. [If you’re smart, and you are, you’ll wonder: what about photons? Well… The classical and quantum-mechanical view of an electromagnetic wave are surely not the same, but they do have a lot in common: both photons and electromagnetic radiation travel at the speed c. Photons can do so because their rest mass is zero. But I can’t go into any more detail here, otherwise this thing will become way too long.]

OK. Let me get back to the issue at hand. So I’ll now revert to the simpler situation we’re looking at here, and so that’s these harmonic waves, whose form is a simple sinusoidal indeed. The animation below (and, yes, it’s also from Wikipedia) is the one that’s relevant for this situation. You need to study it for a while to understand what’s going on. As you can see, the green wave travels to the right, the blue one travels to the left, and the red wave function is the sum of both.

Of course, after all that I wrote above, I should use quotation marks and write ‘travel’ instead of travel, so as to indicate there’s nothing traveling really, except for those mathematical points, but then no one does that, and so I won’t do it either. Just make sure you always think twice when reading stuff like this! Back to the lesson: what’s going on here?

As I explained, the argument of a wave traveling towards the negative x-direction will be x+ct. Conversely, the argument of a wave traveling in the positive x-direction will be x–ct. Now, our guitar string is going nowhere, obviously: it’s like the red wave function above. It’s a so-called standing wave. The red wave function has nodes, i.e. points where there is no motion—no displacement at all! Between the nodes, every point moves up and down sinusoidally, but the pattern of motion stays fixed in space. So that’s the kind of wave function we want, and the animation shows us how we can get it.

Indeed, there’s a funny thing with fixed strings: when a wave reaches the clamped end of a string, it will be reflected with a change in sign, as illustrated below: we’ve got that F(x+ct) wave coming in, and then it goes back indeed, but with the sign reversed.

The illustration above speaks for itself but, of course, once again I need to warn you about the use of sentences like ‘the wave reaches the end of the string’ and/or ‘the wave gets reflected back’. You know what it really means now: it’s some movement that travels through space. […] In any case, let’s get back to the lesson once more: how do we analyze that?

Easy: the red wave function is the sum of two waves: one traveling to the right, and one traveling to the left. We’ll call these component waves F and G respectively, so we have y = F(x, t) + G(x, t). Let’s go for it.

Let’s first assume the string is not held anywhere, so that we have an infinite string along which waves can travel in either direction. In fact, the most general functional form to capture the fact that a waveform can travel in any direction is to write the displacement y as the sum of two functions: one wave traveling one way (which we’ll denote by F, indeed), and the other wave (which, yes, we’ll denote by G) traveling the other way. From the illustration above, it’s obvious that the F wave is traveling towards the negative x-direction and, hence, its argument will be x+ct. Conversely, the G wave travels in the positive x-direction, so its argument is x–ct. So we write:

y = F(x, t) + G(x, t) = F(x+ct) + G(x–ct)

So… Well… We know that the string is actually not infinite, but that it’s fixed to two points. Hence, y is equal to zero there: y = 0. Now let’s choose the origin of our x-axis at the fixed end so as to simplify the analysis. Hence, where y is zero, x is also zero. Now, at x = 0, our general solution above for the infinite string becomes y = F(ct) + G(−ct) = 0, for all values of t. Of course, that means G(−ct) must be equal to –F(ct). Now, that equality is there for all values of t. So it’s there for all values of ct and −ct. In short, that equality is valid for whatever value of the argument of G and –F. As Feynman puts it: “G of anything must be –F of minus that same thing.” Now, the ‘anything’ in G is its argument: x – ct, so ‘minus that same thing’ is –(x–ct) = −x+ct. Therefore, our equation becomes:

y = F(x+ct) − F(−x+ct)

So that’s what’s depicted in the diagram above: the F(x+ct) wave ‘vanishes’ behind the wall as the − F(−x+ct) wave comes out of it. Now, of course, so as to make sure our guitar string doesn’t stop its vibration after being plucked, we need to ensure F is a periodic function, like a sin(kx+ωt) function. 🙂 Why? Well… If this F and G function would simply disappear and ‘serve’ only once, so to speak, then we only have one oscillation and that’s it! So the waves need to continue and so that’s why it needs to be periodic.
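The nice thing about y = F(x+ct) − F(−x+ct) is that it is zero at x = 0 whatever the shape of F is. The sketch below checks that for one arbitrary, non-periodic bump; the shape and the wave speed are made up for the occasion.

```python
import math

def F(u):
    """An arbitrary pulse shape (a bump around u = 5), chosen for illustration."""
    return math.exp(-(u - 5.0) ** 2)

def y(x, t, c=1.0):
    """Incoming wave F(x + ct) plus its reflection -F(-x + ct)."""
    return F(x + c * t) - F(-x + c * t)

# At the fixed end (x = 0) the displacement is zero at all times
at_the_wall = [y(0.0, 0.5 * j) for j in range(21)]
print(max(abs(v) for v in at_the_wall))

# Away from the wall the string does move
print(y(2.0, 3.0))
```

At x = 0 the two terms are the same number with opposite signs, so they cancel exactly. That is all the sign reversal upon reflection amounts to.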

OK. Can we just take sin(kx+ωt) and −sin(−kx+ωt) and add both? It makes sense, doesn’t it? Indeed, −sinα = sin(−α) and, therefore, −sin(−kx+ωt) = sin(kx−ωt). Hence, y = F(x+ct) − F(−x+ct) would be equal to:

y = sin(kx+ωt) + sin(kx–ωt) = sin(2π(x+ct)/λ) + sin(2π(x−ct)/λ)

Done! Let’s use specific values for k and ω now. For the first harmonic, we know that k = 2π/(2L) = π/L. What about ω? Hmm… That depends on the wave velocity and, therefore, that actually does depend on the material and/or the tension of the string! The only thing we can say is that ω = c·k, so ω = c·2π/λ = c·π/L. So we get:

sin(kx+ωt) = sin(π·x/L + π·c·t/L) = sin[(π/L)·(x+ct)]

But this is our F function only. The whole oscillation is y = F(x+ct) − F(−x+ct), and − F(−x+ct) is equal to:

–sin[(π/L)·(−x+ct)] = –sin(−π·x/L+π·c·t/L) = −sin(−kx+ωt) = sin(kx–ωt) = sin[(π/L)·(x–ct)]

So, yes, we should add both functions to get:

y = sin[π(x+ct)/L] + sin[π(x−ct)/L]

Now, we can, of course, apply our trigonometric formulas for the addition of angles, which say that sin(α+β) = sinαcosβ + sinβcosα and sin(α–β) = sinαcosβ – sinβcosα. Hence, y = sin(kx+ωt) + sin(kx–ωt) is equal to sin(kx)cos(ωt) + sin(ωt)cos(kx) + sin(kx)cos(ωt) – sin(ωt)cos(kx) = 2sin(kx)cos(ωt). Now, that’s a very interesting result, so let’s give it some more prominence by writing it in boldface:

y = sin(kx+ωt) + sin(kx–ωt) = 2sin(kx)cos(ωt) = 2sin(π·x/L)cos(π·c·t/L)

The sin(π·x/L) factor gives us the nodes in space. Indeed, sin(π·x/L) = 0 if x is equal to 0 or L (values of x outside of the [0, L] interval are obviously not relevant here). Now, the other factor cos(π·c·t/L) can be re-written cos(2π·c·t/λ) = cos(2π·f·t) = cos(2π·t/T), with T the period T = 1/f = λ/c, so the amplitude reaches a maximum (+1 or −1 or, including the factor 2, +2 or −2) if 2π·t/T is equal to a multiple of π, so that’s if t = n·T/2 with n = 0, 1, 2, etc. In our example above, for f = 5 Hz, that means the amplitude reaches a maximum (+2 or −2) every tenth of a second.
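Here is a numerical check of the standing-wave result for the first harmonic. The string length (0.65 m) and wave speed (6.5 m/s) are hypothetical values, chosen only because they give f = c/(2L) = 5 Hz, like in our example.

```python
import math

L = 0.65             # string length in m (hypothetical)
c = 6.5              # wave speed in m/s (hypothetical), so f = c/(2L) = 5 Hz
k = math.pi / L      # first harmonic: lambda = 2L
omega = c * k

def travelling_sum(x, t):
    """Sum of the left-going and the right-going wave."""
    return math.sin(k * x + omega * t) + math.sin(k * x - omega * t)

def standing(x, t):
    """The product form: 2*sin(kx)*cos(omega*t)."""
    return 2 * math.sin(k * x) * math.cos(omega * t)

samples = [(L * i / 20, 0.01 * j) for i in range(21) for j in range(41)]
max_diff = max(abs(travelling_sum(x, t) - standing(x, t)) for x, t in samples)
print(max_diff)                                # rounding noise only
print(standing(0.0, 0.03), standing(L, 0.03))  # the nodes at both ends
print(standing(L / 2, 0.0), standing(L / 2, 0.1))  # middle of the string, at t = 0 and t = T/2
```

The middle of the string goes from +2 to −2 in a tenth of a second, i.e. in half a period, while the two ends do not move at all.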

The analysis for the other modes is as easy, and I’ll leave it to you, Vincent, as an exercise, to work it all out and send me the y = 2·sin[something]·cos[something else] formula (with the ‘something’ and ‘something else’ written in terms of L and c, of course) for the higher harmonics. 🙂

[…] You’ll say: what’s the point, daddy? Well… Look at that animation again: isn’t it great we can analyze any standing wave, or any harmonic indeed, as the sum of two component waves with the same wavelength and frequency but ‘traveling’ in opposite directions?

Yes, Vincent. I can hear you sigh: “Daddy, I really do not see why I should be interested in this.”

Well… Your call… What can I say? Maybe one day you will. In fact, if you’re going to go for engineering studies, you’ll have to. 🙂

To conclude this post, I’ll insert one more illustration. Now that you know what modes are, you can start thinking about those more complicated Ψ and Φ functions. The illustration below shows how the first and second mode of our guitar string combine to give us some composite wave traveling up and down the very same string.

Think about it. We have one physical phenomenon here: at every point in time, the string is somewhere, but where exactly, depends on the mathematical shape of its components. If this doesn’t illustrate the beauty of Nature, the fact that, behind every simple physical phenomenon − most of which are some sort of oscillation indeed − we have some marvelous mathematical structure, then… Well… Then I don’t know how to explain why I am absolutely fascinated by this stuff.

Addendum 1: On actual waves

My examples of waves above were all examples of so-called transverse waves, i.e. oscillations at a right angle to the direction of the wave. The other type of wave is longitudinal. I mentioned sound waves above, but those are essentially longitudinal: the displacement of the medium is in the same direction as the wave travels, as illustrated below.

Real-life waves, like water waves, may be neither of the two. The illustration below shows how water molecules actually move as a wave passes. They move in little circles, with a systematic phase shift from circle to circle.

Why is this so? I’ll let Feynman answer, as he also provided the illustration above:

“Although the water at a given place is alternately trough or hill, it cannot simply be moving up and down, by the conservation of water. That is, if it goes down, where is the water going to go? The water is essentially incompressible. The speed of compression of waves—that is, sound in the water—is much, much higher, and we are not considering that now. Since water is incompressible on this scale, as a hill comes down the water must move away from the region. What actually happens is that particles of water near the surface move approximately in circles. When smooth swells are coming, a person floating in a tire can look at a nearby object and see it going in a circle. So it is a mixture of longitudinal and transverse, to add to the confusion. At greater depths in the water the motions are smaller circles until, reasonably far down, there is nothing left of the motion.”

So… There you go… 🙂

Addendum 2: On non-periodic waves, i.e. pulses

A waveform is not necessarily periodic. The pulse we looked at could, perhaps, not repeat itself. It is not possible, then, to describe its wavelength. However, it’s still a wave and, hence, its functional form would still be some y = F(x−ct) or y = F(x+ct) form, depending on its direction of travel.

The example below also comes out of Feynman’s Lectures: electromagnetic radiation is caused by some accelerating electric charge – an electron, usually, because its mass is small and, hence, it’s much easier to move than a proton 🙂 – and then the electric field travels out in space. So the two diagrams below show (i) the acceleration (a) as a function of time (t) and (ii) the electric field strength (E) as a function of the distance (r). [To be fully precise, I should add he ignores the 1/r variation, but that’s a fine point which doesn’t matter much here.]

He basically uses this illustration to explain why we can use a y = G(t–x/c) functional form to describe a wave. The point is: he actually talks about one pulse only here. So the F(x±ct) or G(t±x/c) or sin(kx±ωt) form has nothing to do with whether we’re looking at a periodic or a non-periodic waveform. The gist of the matter is that we’ve got something moving through space, and it doesn’t matter whether it’s periodic or not: the periodicity or non-periodicity of a wave has nothing to do with the x±ct, t±x/c or kx±ωt shape of the argument of our wave function. The functional form of our argument is just the result of what I said about traveling along with our wave.
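A small Python sketch makes the point concrete. The Gaussian bump below is my own choice of pulse shape (an assumption, it is not Feynman’s pulse), and the speed and time are arbitrary: the only thing that matters is that the argument is x − ct.

```python
import math

def pulse(u):
    """Shape of a single, non-periodic pulse (a Gaussian bump, chosen for illustration)."""
    return math.exp(-u * u)

def wave(x, t, c):
    """A pulse traveling in the +x direction: y = F(x - c*t)."""
    return pulse(x - c * t)

c = 2.0   # wave speed (arbitrary units)
t = 1.5   # some later time

# Whatever we see at position x at time 0 shows up at x + c*t at time t.
for x in (-1.0, 0.0, 0.5):
    assert math.isclose(wave(x, 0.0, c), wave(x + c * t, t, c))

# The peak of the pulse, which sat at x = 0 at time 0, now sits at x = c*t.
peak_position = c * t
print(peak_position, wave(peak_position, t, c))
```

Nothing in this code is periodic, and yet the shape travels along just fine.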

So what is it about periodicity then? Well… If periodicity kicks in, you’ll talk sinusoidal functions, and so the circle will be needed once more. 🙂

Now, I mentioned we cannot associate any particular wavelength with such a non-periodic wave. Having said that, it’s still possible to analyze this pulse as a sum of sinusoids through a mathematical procedure which is referred to as the Fourier transform. If you’re going for engineering, you’ll need to learn how to master this technique. As for now, however, you can just have a look at the Wikipedia article on it. 🙂

Magnetism and relativity

Original post:

The magnetic force is a strange animal. The F = q(E+v×B) = qE+qv×B formula implies that both its direction as well as its magnitude depend on the direction and the magnitude of the motion of the charge. The magnetic force is, just like the electric force, still proportional to the amount of charge (q), but then we have not one but two vectors co-determining its direction and magnitude, as expressed by the vector product v×B, whose magnitude is |v×B| = |v|·|B|·sinθ = v·B·sinθ, with θ the angle between v and B.
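To make the formula a bit more tangible, here is a minimal Python sketch of the Lorentz force. The function names and the numbers for q, E, v and B are illustrative assumptions of mine. It checks that the magnitude of v×B is indeed v·B·sinθ, and that the magnetic force is at a right angle to the velocity.

```python
import math

def cross(a, b):
    """Vector (cross) product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar (dot) product."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a vector."""
    return math.sqrt(dot(a, a))

def lorentz_force(q, E, v, B):
    """F = q(E + v x B), component by component."""
    vxB = cross(v, B)
    return tuple(q * (e + m) for e, m in zip(E, vxB))

# Illustrative values (assumptions): a negative unit charge, no electric field.
q = -1.0
E = (0.0, 0.0, 0.0)
v = (1.0, 0.0, 0.0)
B = (1.0, 1.0, 0.0)   # at 45 degrees to v

theta = math.acos(dot(v, B) / (norm(v) * norm(B)))
print(norm(cross(v, B)), norm(v) * norm(B) * math.sin(theta))

F = lorentz_force(q, E, v, B)
print(F, dot(F, v))   # the force has no component along v
```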

The presence of the velocity vector in the F = q(E+v×B) formula implies both the magnetic as well as the electric field are relative, as we wonder: “What velocity? With respect to which reference frame?” Diagrams (a) and (b) below illustrate the same interaction between some current-carrying wire and some negative charge q from two perspectives:

Diagram (a) below represents frame S, in which the wire is at rest, and the charge moves along the wire with velocity v₀, while

Diagram (b) below represents frame S’, which coincides with the reference frame of the charge, so now it’s the wire that’s moving past the particle, instead of the other way around.

Because of relativity, all of our variables transform: we have time dilation, length contraction, and relativistic mass, as I explained in my posts on special relativity. So we cannot take any of the variables for granted and so we prime all of them: in S’, we have I’, v’, etcetera, and so we need to calculate their values using the Lorentz transformation rules.

Now, we know that the absolute speed of light connects both pictures, but that’s not enough to explain what’s going on. We need some other anchoring principle as well. We have such anchor: charges are always the same, moving or not. They are indestructible. They are never lost or created: they move from place to place but never appear from nowhere. In short, charge is conserved. So we also need to look at charge densities and see what happens to them.

The illustration above shows the current I going in the conventional direction, so that’s opposite to the actual direction of travel of the free drifting electrons. It’s a convention that makes sense because of all our other conventions, such as the right-hand rule for our vector cross-product v×B above, so we won’t touch it. Having said that, the illustration shows what’s going on in S: the positive charges in the wire don’t move, so we have some charge density ρ₊ and a velocity v₊ = 0. The electrons, on the other hand, do move, and so we have some charge density ρ₋ and a velocity v₋ = v. Now, we’re looking at an uncharged wire, so ρ₊ must be equal to −ρ₋. So the situation is rather simple: we have a current causing a magnetic field, and the force on our moving charge q(−) is F = q·v₀×B.

However, the same situation looks very different from the S’ perspective: our q(−) charge is not moving and, therefore, there can be no magnetic force. Hence, if there’s any force on the particle, it must come from an electric field. But what electric field? If the wire is neutral, there can be no electric flux from it.

You’ll say: why should there be a force on it? Forces also look different in different reference frames, don’t they? They do: they’re subject to the same Lorentz transformation rules: F’ = γF with γ = (1−v²/c²)^(−1/2). So, yes, forces look different, but they surely do not disappear! Especially not because the typical drift velocity of electrons in a conductor is exceedingly slow. In fact, it’s usually measured in centimeters per hour and, hence, the Lorentz factor γ is extremely close to 1. 🙂 So the forces in the two reference frames should be nearly identical. Hence, the conclusion must be that the electromagnetic force in the S’ reference frame appears as some electric force, which implies that… Well… The bold conclusion is that our wire must be charged in S’ and, therefore, causes an electric field, rather than a magnetic field!
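Just how close to 1 is that Lorentz factor? The Python sketch below gives a rough idea. The drift speed of 10 cm per hour is an assumed, illustrative figure (the text above only says ‘centimeters per hour’), and I use the small-β approximation γ ≈ 1 + β²/2, because the difference is too small to survive a direct calculation in ordinary floating-point arithmetic.

```python
import math

c = 299_792_458.0        # speed of light in m/s
v = 0.10 / 3600.0        # assumed drift speed: 10 cm per hour, converted to m/s

beta = v / c

# For small beta: gamma = (1 - beta^2)^(-1/2) is approximately 1 + beta^2/2.
gamma_minus_one = 0.5 * beta**2

print(beta)
print(gamma_minus_one)

# A direct evaluation in double precision cannot even resolve the difference.
print(1.0 / math.sqrt(1.0 - beta**2) == 1.0)
```

So, under these assumptions, γ differs from 1 only somewhere around the 27th decimal place.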

Huh? How is that possible?

To simplify the calculations involved, Feynman analyzes a special case: he equates v with v₀. So that gives us the variables in diagram (b) above: in reference frame S’, we have some charge density ρ’₊ and a velocity v’₊ = –v₀ = −v, while the electrons don’t seem to move: we have some charge density ρ’₋ but the velocity v’₋ = 0. As mentioned above, we cannot assume that ρ’₊ = ρ₊ or that ρ’₋ = ρ₋ and, therefore, we cannot assume that I = I’.

[…] OK. Now that we’ve explained all the variables involved, we’re ready to actually do the calculation. The crux of the matter is that a charge density is some number expressed per unit volume, and that the volume changes because of the relativistic contraction of distances. That’s what’s shown below.

As I mentioned in my posts on relativity, of all of the effects of relativity, length contraction is probably the most difficult to grasp. How come the same amount of charge is suddenly spread over a smaller volume? Well… It is what it is, and I cannot say more about it than what I already said in the mentioned posts, so let’s get on with it. The (a) and (b) situations above describe the same piece of wire: its length and area, as measured in the stationary reference frame S, are L₀ and A₀ respectively, so its volume is L₀·A₀. If we denote the total charge in this volume as Q, then the charge density ρ₀ will be measured as ρ₀ = Q/(L₀·A₀).

Now what changes if we change the reference frame, so we look at this piece of wire moving past at velocity v? The dimensions that are transverse to the direction of motion don’t change, so the area A₀ remains what it is. What about Q? Well… Q doesn’t change either. As mentioned above, there’s no such thing as relativistic charge, so there’s no equivalent for the m_v = γm₀ (or, multiplied with c², E_v = γE₀) formula when charges are involved. How do we know that? Feynman answers that question by appealing to common sense:

“Suppose that we take a block of material, say a conductor, which is initially uncharged. Now we heat it up. Because the electrons have a different mass than the protons, the velocities of the electrons and of the protons will change by different amounts. If the charge of a particle depended on the speed of the particle carrying it, in the heated block the charge of the electrons and protons would no longer balance. A block would become charged when heated. As we have seen earlier, a very small fractional change in the charge of all the electrons in a block would give rise to enormous electric fields. No such effect has ever been observed. Also, we can point out that the mean speed of the electrons in matter depends on its chemical composition. If the charge on an electron changed with speed, the net charge in a piece of material would be changed in a chemical reaction. Again, a straightforward calculation shows that even a very small dependence of charge on speed would give enormous fields from the simplest chemical reactions. No such effect is observed, and we conclude that the electric charge of a single particle is independent of its state of motion. So the charge $q$ on a particle is an invariant scalar quantity, independent of the frame of reference. That means that in any frame the charge density of a distribution of electrons is just proportional to the number of electrons per unit volume. We need only worry about the fact that the volume can change because of the relativistic contraction of distances.”

OK. That’s clear enough. Let’s get back to the lesson. The upshot here is that we don’t need to worry about the charge but about the charge density. To be specific, the charge density, as measured in the reference frame S’, will be equal to:

ρ = ρ₀/√(1−v²/c²)

Why? If the total charge Q is the same in both S and S’, then Q = ρ₀·L₀·A₀ must be equal to ρ·L·A₀, with L the measured length in the S’ reference frame. Now, because of the relativistic length contraction effect, we know that L = L₀·(1−v²/c²)^1/2 and, therefore, ρ must be equal to ρ = ρ₀·(1−v²/c²)^−1/2. Capito?

We’re almost there. Now we need to apply this more general result to the ρ’₋/ρ₋ and ρ₊/ρ’₊ density ‘pairs’ that we mentioned at the start. Let me copy the illustration once again so you can see what we are talking about:

The analysis is straightforward but a bit tricky. For the positive charges, you should note that they are at rest in (a), so that’s in reference frame S and, therefore, we can just write:

ρ’₊ = ρ₊/√(1−v²/c²)

However, for the negative charges, we see they’re at rest in (b), and so that’s in reference frame S’, so the ρ₀ in our general formula is not ρ₋ but ρ’₋! So you should be careful when applying the same formula. However, if you are careful, you’ll agree we can write:

ρ₋ = ρ’₋/√(1−v²/c²), or ρ’₋ = ρ₋·√(1−v²/c²)

Now, the total charge density ρ’ in reference frame S’ is, of course, the sum of ρ’₋ and ρ’₊. Noting that we were looking at an uncharged wire in reference frame S, so ρ₊ = −ρ₋, we get the following grand result:

ρ’ = ρ’₊ + ρ’₋ = ρ₊·(v²/c²)/√(1−v²/c²)

So our wire appears to be positively charged in the S’ frame, with a charge density that’s equal to the product of the positive charge density ρ₊ and a β²/(1−β²)^1/2 factor. So that’s our Lorentz factor γ multiplied by β² = (v/c)². The graph below shows how that factor increases as β = v/c goes from 0 to 1. We’ve also inserted the graph of the Lorentz factor itself, so you can compare both. Interesting, isn’t it? 🙂
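If you want to play with these numbers yourself, here is a minimal Python sketch. It covers only the special case v = v₀ discussed above, and the function names are mine. It stretches the density of the positive charges by γ, shrinks the density of the electrons by 1/γ, adds the two, and compares the result with the γ·β² factor.

```python
import math

def gamma(beta):
    """Lorentz factor for beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta**2)

def net_density_in_S_prime(rho_plus, beta):
    """Net charge density in S' for a wire that is neutral in S (special case v = v0)."""
    rho_minus = -rho_plus                       # neutral wire in S
    rho_plus_prime = rho_plus * gamma(beta)     # positive charges: at rest in S, moving in S'
    rho_minus_prime = rho_minus / gamma(beta)   # electrons: at rest in S', moving in S
    return rho_plus_prime + rho_minus_prime

# Columns: beta, Lorentz factor, net density (for rho_plus = 1), gamma * beta^2
for beta in (0.1, 0.5, 0.9):
    direct = net_density_in_S_prime(1.0, beta)
    closed_form = gamma(beta) * beta**2
    print(beta, gamma(beta), direct, closed_form)
```

The last two columns agree, and you can see that the γ·β² factor starts at zero for small β, whereas the Lorentz factor itself starts at 1.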

Now, because the wire is electrically charged in reference frame S’, we have an electric field E’ which, using the formula for the field of a uniformly charged cylinder, can be calculated as:

Now, as far as I am concerned, that’s it. But… Well… Of course, we should generalize the analysis for v ≠ v₀. However, I’ll refer you to Feynman for that. He also takes care of the remainder of the calculations you’d probably want to see, like a formula which shows that the force on the charge in S’ is indeed what we would expect it to be. Feynman also shows that all other variables we can possibly calculate in the S’ reference frame, such as the momentum of the charged particle after the force has acted on it for some time, all turn out to be what we’d expect them to be according to special relativity.

However, I have to limit this post and, hence, I’ll just copy Feynman’s grand conclusion:

“We have found that we get the same physical result whether we analyze the motion of a particle moving along a wire in a coordinate system at rest with respect to the wire, or in a system at rest with respect to the particle. In the first instance, the force was purely “magnetic,” in the second, it was purely “electric.” If we had chosen still another coordinate system, we would have found a different mixture of E and B fields. Electric and magnetic forces are part of one physical phenomenon—the electromagnetic interactions of particles. The separation of this interaction into electric and magnetic parts depends very much on the reference frame chosen for the description. But a complete electromagnetic description is invariant; electricity and magnetism taken together are consistent with Einstein’s relativity.”

So… That’s basically it for today’s lesson. 🙂 I should just add one more thing so as to be as complete as I should be in regard to the issue at hand here. You know the Lorentz transformation rules for the space and time coordinates, and you may or may not remember we had similar relativistic four-vectors for energy and momentum. Now, it turns out that we also have similar equations to relate charges and currents in one reference frame to those in another. More in particular, to transform ρ and j to a coordinate system moving with some velocity in the x-direction, you should use the following rules:
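For reference, here is a Python sketch of these rules in their standard form, in which ρ and j transform just like t and x. I am writing the velocity of the new frame as u, and that symbol and the function name are my assumptions. As a sanity check, the sketch applies the rules to our neutral wire (special case v = v₀, with β = 0.6 as an arbitrary example) and compares the result with the β²/(1−β²)^1/2 factor we found above.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def transform_charge_current(rho, j, u, c=C):
    """Transform charge density rho and current density j = (jx, jy, jz)
    to a frame moving with velocity u along the x-axis (standard form)."""
    g = 1.0 / math.sqrt(1.0 - (u / c) ** 2)
    jx, jy, jz = j
    rho_prime = g * (rho - u * jx / c**2)
    j_prime = (g * (jx - u * rho), jy, jz)   # transverse components are unchanged
    return rho_prime, j_prime

# Neutral wire in S: the electrons (density rho_minus) drift with velocity v along x.
rho_plus = 1.0
rho_minus = -rho_plus
v = 0.6 * C                       # arbitrary example value
rho = rho_plus + rho_minus        # zero: the wire is uncharged in S
j = (rho_minus * v, 0.0, 0.0)     # only the electrons carry current

rho_prime, j_prime = transform_charge_current(rho, j, u=v)

beta = v / C
expected = rho_plus * beta**2 / math.sqrt(1.0 - beta**2)
print(rho_prime, expected)
```

So the wire that is neutral in S picks up a positive charge density in the frame that moves along with the electrons, in line with the direct calculation.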

But that’s really it for today. Have fun reflecting upon it all! 🙂

The field from a grid

Pre-script (dated 26 June 2020): This post got mutilated by the removal of some material by the dark force. You should be able to follow the main story-line, however. If anything, the lack of illustrations might actually help you to think things through for yourself.

Original post:

As part of his presentation of indirect methods for finding the field, Feynman presents an interesting argument on the electrostatic field of a grid. It’s just another indirect method to arrive at meaningful conclusions on what a field is supposed to look like, but it’s quite remarkable, and that’s why I am expanding it here. Feynman’s presentation is extremely succinct indeed and, hence, I hope the elaboration below will help you to understand it somewhat quicker than I did. 🙂

The grid is shown below: it’s just a uniformly spaced array of parallel wires in a plane. We are looking at the field above the plane of wires here, and the dotted lines represent equipotential surfaces above the grid.

As you can see, for larger distances above the plane, we see a constant electric field, just as though the charge were uniformly spread over a sheet of charge, rather than over a grid. However, as we approach the grid, the field begins to deviate from the uniform field.

Let’s analyze it by assuming the wires lie in the xy-plane, running parallel to the y-axis. The distance between the wires is measured along the x-axis, and the distance to the grid is measured along the z-axis, as shown in the illustration above. We assume the wires are infinitely long and, hence, the electric field does not depend on y. So the component of E in the y-direction is 0, so E_y = –∂Φ/∂y = 0. Therefore, ∂²Φ/∂y² = 0 and our Poisson equation above the wires (where there are no charges, so its right-hand side is zero) is reduced to ∂²Φ/∂x² + ∂²Φ/∂z² = 0. What’s next?

Let’s look at the field of two positive wires first. The plot below comes from the Wolfram Demonstrations Project. I recommend you click the link and play with it: you can vary the charges and the distance, and the tool will redraw the equipotentials and the field lines accordingly. It will give you a better feel for the (a)symmetries involved. The equipotential lines are the gray contours: they are cross-sections of equipotential surfaces. The red curves are the field lines, which are always orthogonal to the equipotentials.

The point at the center is really interesting: the straight horizontal and vertical red lines through it are limits really. Feynman’s illustration below shows the point represents an unstable equilibrium: the hollow tube prevents the charge from going sideways. So if it weren’t there, the charge would go sideways, of course! So it’s some kind of saddle point. Onward!

Look at the illustration below and try to imagine what the field looks like by thinking about the value of the potential as you move along one of the two blue lines below: the potential goes down as we move to the right, reaches a minimum in the middle, and then goes up again. Also think about the difference between the lighter and darker blue line: going along the light-blue line, we start at a lower potential, and its minimum will also be lower than that of the dark-blue line.

So you can start drawing curves. However, I have to warn you: the graphs are not so simple. Look at the detail below. The potential along the blue line goes slightly up before it decreases, so the graph of the potential may resemble the green curve on the right of the image. I did an actual calculation here. 🙂 If there are only two charges, the formula for the potential is quite simple: Φ = (1/4πε₀)·(q₁/r₁) + (1/4πε₀)·(q₂/r₂). Briefly forgetting about the (1/4πε₀) and equating q₁ and q₂ to +1, we get Φ = 1/r₁ + 1/r₂ = (r₁ + r₂)/(r₁·r₂). That looks like an easy function, and it is. You should think of it as the equivalent of the 1/r formula, but written as 1/r = r/r², and with a factor 2 in front because we have two charges. 🙂

However, we need to express it as a function of x, keeping z (i.e. the ‘vertical’ coordinate) constant. That’s what I did to get the graphs below. It’s easy to see that 1/r₁ = (x² + z²)^(−1/2), while 1/r₂ = [(a−x)² + z²]^(−1/2). Assuming a = 2 and z = 0.8, the contribution from the first charge is given by the blue curve, the contribution of the second charge is represented by the red curve, and the green curve adds both and, hence, represents the potential generated by both charges, i.e. q₁ at x = 0 and q₂ at x = a. OK… Onward!
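In case you want to redo the calculation, here is a minimal Python sketch of it. It assumes, as above, two unit charges at x = 0 and x = a, and it drops the 1/4πε₀ factor. The function name and the sample points along the x-axis are my own choices.

```python
import math

def potential(x, z, a=2.0):
    """Phi = 1/r1 + 1/r2 for unit charges at x = 0 and x = a (1/(4*pi*eps0) dropped)."""
    r1 = math.sqrt(x**2 + z**2)
    r2 = math.sqrt((a - x)**2 + z**2)
    return 1.0 / r1 + 1.0 / r2

a = 2.0
# Potential along a horizontal line at height z, sampled at five values of x.
for z in (0.4, 0.7, 0.8, 1.7):
    row = [round(potential(x, z, a), 3) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
    print(z, row)
```

Two things stand out in the numbers. First, every row is a mirror image of itself around x = a/2: swapping x for a − x just swaps r₁ and r₂. Second, for z = 0.8 the value in the middle is the lowest of the row, while for z = 1.7 it is the highest.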

The point to note is that we have an extremely simple situation here – two charges only, or two wires, I should say – but a potential function that is surely not some simple sinusoidal function. To drive the point home, I plotted a few more curves below, keeping a at a = 2, but equating z with 0.4, 0.7 and 1.7 respectively. The z = 1.7 curve shows that, at larger distances, the potential actually increases slightly as we move from left to right along the z = 1.7 line. Note the remarkable symmetry of the curves and the equipotential lines: there should be some obvious mathematical explanation for that but, unfortunately, not obvious enough for me to find it, so please let me know if you see it! 🙂

OK. Let’s get back to our grid. For your convenience, I copied it once more below.

Feynman’s approach to calculating the variations is quite original. He also duly notes that the potential function is surely not some simple sinusoidal function. However, he also notes that, when everything is said and done, it is some periodic quantity, in one way or another, and, therefore, we should be able to do a Fourier analysis and express it as a sum of sinusoidal waves. To be precise, we should be able to write Φ(x, z) as a sum of harmonics.

[…] I know. […] Now you say: Oh sh**! And you’ll just turn off. That’s OK, but why don’t you give it a try? I promise to be lengthy. 🙂

Before we get too much into the weeds, let’s briefly recall how it works for our classical guitar string. That post explained how the wavelengths of the harmonics of a string depended on its length. If we denote the various harmonics by their harmonic number n = 1, 2, 3 etcetera, and the length of the string by L, we have λ₁ = 2L = (1/1)·2L, λ₂ = L = (1/2)·2L, λ₃ = (1/3)·2L,… λ_n = (1/n)·2L. In short, the harmonics – i.e. the components of our waveform – look like this:

etcetera (1/8, 1/9,…,1/n,… 1/∞)

Beautiful, isn’t it? As I explained in that post, it’s so beautiful it triggered a misplaced fascination with harmonic ratios. It was misplaced because the Pythagorean theory was a bit too simple to be true. However, their intuition was right, and they set the stage for guys like Copernicus, Fourier and Feynman, so that was good! 🙂

Now, as you know, we’ll usually substitute wavelength and frequency by wavenumber and angular frequency so as to convert all to something expressed in radians, which we can then use as the argument in the sine and/or cosine component waves. [Yes, the Pythagoreans once again! :-)] The wavenumber k is equal to k = 2π/λ, and the angular frequency is ω = 2π·f = 2π/T (in case you doubt, you can quickly check that the speed of a wave c is equal to the product of the wavelength and its frequency by substituting: c = λ·f = (2π/k)·(ω/2π) = ω/k, which gives you the phase velocity v_p = c). To make a long story short, we wrote k = k₁ = 2π·1/(2L), k₂ = 2π·2/(2L) = 2k, k₃ = 2π·3/(2L) = 3k,… k_n = 2π·n/(2L) = nk,… to arrive at the grand result, and that’s our wave F(x) expressed as the sum of an infinite number of simple sinusoids:

F(x) = a₁·cos(kx) + a₂·cos(2kx) + a₃·cos(3kx) + … + a_n·cos(nkx) + … = ∑ a_n·cos(nkx)

That’s easy enough. The problem is to find those amplitudes a₁, a₂, a₃,… of course, but the great French mathematician who gave us the Fourier series also gave us the formulas for that, so we should be fine! Can we use them here? Should we use them here? Let’s see…

The a in the analysis, i.e. the spacing of the wires, is the physical quantity that corresponds to the length of our guitar string in our musical sound problem. In fact, a corresponds to 2L, because guitar strings are fixed at two ends and, hence, the two ends have to be nodes and, therefore, the wavelength of our first harmonic is twice the length of the string. Huh? Well… Something like that. As you can see from the illustration of the grid, a, in contrast to L, does correspond to one full wavelength of our periodic function. So we write:

Φ(x) = ∑ a_n·cos(n·k·x) = ∑ a_n·cos(2π·n·x/a) (n = 1, 2, 3,…)

Now, that’s the formula for Φ(x) assuming we’re fixing z, so it’s Φ(x) at some fixed distance from the grid. Let’s think about those amplitudes a_n now. They should not depend on x, because the harmonics themselves (i.e. the cos(2π·n·x/a) components) are all that varies with x. So they have to be some function of n and – most importantly – some function of z also. So we denote them by F_n(z) and re-write the equation above as:

Φ(x, z) = ∑ F_n(z)·cos(2π·n·x/a) (n = 1, 2, 3,…)

Now, the rest of Feynman’s analysis speaks for itself, so I’ll just shamelessly copy it:

What did he find here? What is he saying, really? 🙂 First note that the derivation above has been done for one term in the Fourier sum only, so we’re talking a specific harmonic n here. The amplitude of that harmonic n is a function of z which – let me remind you – is the distance from the grid. To be precise, the function is F_n(z) = A_n·e^(−z/z₀). [In case you wonder how Feynman goes from equation (7.43) to (7.44), he’s just solving a second-order linear differential equation here. :-)]

Now, you’ve seen the graph of that function a zillion times before: it starts at A_n for z = 0 and goes to zero as z goes to infinity, as shown below. 🙂

Now, that’s the case for all F_n(z) coefficients of course. As Feynman writes:

“We have found that if there is a Fourier component of the field of harmonic n, that component will decrease exponentially with a characteristic distance z₀ = a/2πn. For the first harmonic (n = 1), the amplitude falls by the factor e^−2π (i.e. a large decrease) each time we increase z by one grid spacing a. The other harmonics fall off even more rapidly as we move away from the grid. We see that if we are only a few times the distance a away from the grid, the field is very nearly uniform, i.e., the oscillating terms are small. There would, of course, always remain the “zero harmonic” field, i.e. Φ₀ = −E₀·z, to give the uniform field at large z. Of course, for the complete solution, the sum needs to be made, and the coefficients A_n would need to be adjusted so that the total sum, when differentiated, gives an electric field that would fit the charge density of the grid wires.”
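To get a feel for how fast these harmonics die out, here is a small Python sketch. It takes a = 1 and A_n = 1 as illustrative assumptions, computes the characteristic distance z₀ = a/2πn and the factor by which each harmonic drops per grid spacing, and then does a rough finite-difference check (at one arbitrarily chosen point) that a single term F_n(z)·cos(2π·n·x/a) indeed satisfies ∂²Φ/∂x² + ∂²Φ/∂z² = 0.

```python
import math

def z0(n, a):
    """Characteristic decay distance of harmonic n for wire spacing a."""
    return a / (2 * math.pi * n)

def harmonic(x, z, n, a, A=1.0):
    """One Fourier term: A * exp(-z/z0) * cos(2*pi*n*x/a)."""
    return A * math.exp(-z / z0(n, a)) * math.cos(2 * math.pi * n * x / a)

def laplacian_xz(f, x, z, h=1e-3):
    """Central finite-difference estimate of d2f/dx2 + d2f/dz2."""
    return (f(x + h, z) + f(x - h, z) + f(x, z + h) + f(x, z - h) - 4 * f(x, z)) / h**2

a = 1.0   # wire spacing (illustrative)

# Columns: harmonic number, decay distance, drop per grid spacing (= exp(-2*pi*n)).
for n in (1, 2, 3):
    drop = math.exp(-a / z0(n, a))
    print(n, z0(n, a), drop)

# The residual is not exactly zero (finite differences), but it is tiny
# compared with the individual second derivatives, which are of order 1 here.
residual = laplacian_xz(lambda x, z: harmonic(x, z, 1, a), 0.2, 0.3)
print(residual)
```

For the first harmonic, the drop per grid spacing is about 0.002, so that’s what Feynman means by ‘a large decrease’.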

Phew! Quite something, isn’t it? But that’s it really, and it’s actually simpler than the ‘direct’ calculations of the field that I googled. Those calculations involve complicated series and logs and what have you, to arrive at the same result: the field away from a grid of charged wires is very nearly uniform.

Let me conclude this post by noting Feynman’s explanation of shielding by a screen. It’s quite terse:

“The method we have just developed can be used to explain why electrostatic shielding by means of a screen is often just as good as with a solid metal sheet. Except within a distance from the screen a few times the spacing of the screen wires, the fields inside a closed screen are zero. We see why copper screen—lighter and cheaper than copper sheet—is often used to shield sensitive electrical equipment from external disturbing fields.”

Hmm… So how does that work? The logic should be similar to the logic I explained when discussing shielding in one of my previous posts. Have a look—if only because it’s a lot easier to understand than the rather convoluted business I presented above. 🙂 But then I guess it’s all par for the course, isn’t it? 🙂

Capacitors

Original post:

This post briefly explores the properties of capacitors. Why? Well… Just because they’re an element in electric circuits, and so we should try to fully understand how they function so we can understand how electric circuits work. Indeed, we’ll look at some interesting DC and AC circuits in the very near future. 🙂

Feynman introduces condensers − now referred to as capacitors – right from the start, as he explains Maxwell’s fourth equation, which is written as c²∇×B = ∂E/∂t + j/ε₀ in differential form, but easier to read when integrating over a surface S bounded by a curve C:

The ∂E/∂t term implies that changing electric fields produce magnetic effects (i.e. some circulation of B, i.e. the c²∇×B on the left-hand side). We need this term because, without it, there could be no currents in circuits that are not complete loops, like the circuit below, which is just a circuit with a capacitor made of two flat plates. The capacitor is charged by a current that flows toward one plate and away from the other. It looks messy because of the complicated drawing: we have a curve C around one of the wires defining two surfaces: S₁ is a surface that just fills the loop and, hence, crosses the wire, while S₂ is a bowl-shaped surface which passes between the plates of the capacitor (so it does not cross the wire).

If we look at C and S₁ only, then the circulation of B around C is explained by the current through the wire, so that’s the j/ε₀ term in Maxwell’s equation, which is probably how you understood magnetism during your high-school time. However, no current goes through the S₂ surface, so if we look at C and S₂ only, we need the ∂E/∂t to explain the magnetic field. Indeed, as Feynman points out, changing the location of an imaginary surface should not change a real magnetic field! 🙂

Let’s look at those charged sheets. For a single sheet of charge, we found two opposite fields of magnitude E = (1/2)·σ/ε₀. Now, it is easy to see that we can superimpose the solutions for two parallel sheets with equal and opposite charge densities +σ and −σ, so we get:

E(between the sheets) = σ/ε₀ and E(outside) = 0

Now, actual capacitors are not made of some infinitely thin sheet of charge: they are made of some conductor and, hence, we get that shielding effect and we’re talking surface charge densities +σ and −σ, so the actual picture is more like the one below. Having said that, the formula above is still correct: E is σ/ε₀ between the plates, and zero everywhere else (except at the edge, but I’ll talk about that later).

We’re now ready to tackle the first property of a capacitor, and that is its capacity. In fact, the correct term is capacitance, but that sounds rather strange, doesn’t it?

The capacity of a capacitor

We know the two plates are both equipotentials but with different potential, obviously! If we denote these two potentials as Φ₁ and Φ₂ respectively, we can define their difference Φ₁ − Φ₂ as the voltage between the two plates. Its unit is the same as the unit for potential which, as you may or may not remember, is potential energy per unit charge, so that’s newton·meter/coulomb. [In honor of the guy who invented the first battery, 1 N·m/C is usually referred to as one volt, which – quite annoyingly – is also abbreviated as V, even if the voltage and the volt are two very different things: the volt is the unit of voltage.]

Now, it’s easy to see that the voltage, or potential difference, is the amount of work that’s required to carry one unit charge from one plate to the other. To be precise, because the coulomb is a huge unit − it’s equivalent to the combined charge of some 6.241×10¹⁸ protons − we should say that the voltage is the work per unit charge required to carry a small charge from one plate to the other. Hence, if d is the distance between the two plates (as shown in the illustration above), we can write:

V = E·d = (σ/ε₀)·d = (d/ε₀A)·Q

Q is the total charge on each plate (so it’s positive on one, and negative on the other), A is the area of each plate, and d is the separation between the two plates. What the equation says is that the voltage is proportional to the charge, and the constant of proportionality is d over ε₀A. Now, the proportionality between V and Q is there for any two conductors in space (provided we have a plus charge on one, and a minus charge on the other, and so we assume there are no other charges around). Why? It’s just the logic of the superposition of fields: we double the charges, so we double the fields, and so the work done in carrying a unit charge from one point to the other is also doubled! So that’s why the potential difference between any two points is proportional to the charges.

Now, turning the relation around and writing Q = C·V, the constant of proportionality C is called the capacity or capacitance of the system. In fact, it’s defined as C = Q/V. [Again, it’s a bit of a nuisance the symbol (C) is the same as the symbol that is used for the unit of charge, but don’t worry about it.] To put it simply, the capacitance is the ability of a body to store electric charge. For our parallel-plate condenser, it is equal to C = ε₀A/d. Its unit is coulomb/volt, obviously, but – again in honor of some other guy – it’s referred to as the farad: 1 F = 1 C/V.
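To get a feel for the orders of magnitude, here is a minimal Python sketch of the parallel-plate formulas. The plate size, the separation and the charge are illustrative assumptions of mine, and the function names are too.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m

def parallel_plate_capacitance(area, separation):
    """C = eps0 * A / d for an ideal parallel-plate capacitor (edge effects ignored)."""
    return EPS0 * area / separation

def voltage(charge, area, separation):
    """V = E * d = (sigma / eps0) * d, with sigma = Q / A."""
    sigma = charge / area
    return sigma / EPS0 * separation

# Illustrative numbers (assumptions): plates of 1 cm by 1 cm, 1 mm apart.
A = 1e-4   # plate area in square meters
d = 1e-3   # plate separation in meters
Q = 1e-9   # charge on each plate in coulombs

C = parallel_plate_capacitance(A, d)
V = voltage(Q, A, d)

print(C)          # a bit less than a picofarad
print(V, Q / V)   # Q/V gives back the same capacitance
```

So a capacitor you could make at the kitchen table is a tiny fraction of a farad, which tells you the farad is a huge unit too.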

To build a fairly high-capacity condenser, one could put waxed paper between sheets of aluminium and roll it up. Sealed in plastic, that made a typical radio-type condenser. The principle used today is still the same. In order to reduce the risk of breakdown (which occurs when the field strength becomes so large that it pulls electrons from the dielectric between the plates, thus causing conduction), higher capacity is generally better, so the voltage developed across the condenser will be smaller. Condensers used to be fairly big, but modern capacitors are actually as small as other computer card components. It’s all interesting stuff, but I won’t elaborate on it here, because I’d rather focus on the physics and the math behind the engineering in this blog. 🙂

Onward! Let’s move to the next thing. Before we do so, however, let me quickly give you the formula for the capacity of a charged sphere of radius a (for a parallel-plate capacitor, it’s C = ε₀A/d, as noted above): C = 4πε₀a. You’ll wonder: where’s the ‘other’ conductor here? Well… When this formula is used, it assumes some imaginary sphere of infinite radius with opposite charge −Q.
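The sphere formula is easy to play with too. The sketch below evaluates C = 4πε₀a and inverts it. The Earth-sized radius is just an assumed example value, and the helper names are mine.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def sphere_capacitance(radius_m):
    """C = 4*pi*eps0*a for an isolated conducting sphere of radius a."""
    return 4.0 * math.pi * EPSILON_0 * radius_m


def radius_for_capacitance(capacitance_f):
    """Invert the formula: the radius a sphere needs for a given capacity."""
    return capacitance_f / (4.0 * math.pi * EPSILON_0)


# Assumed example values, for illustration only
print(f"1 m sphere:        {sphere_capacitance(1.0):.3e} F")
print(f"Earth-sized sphere: {sphere_capacitance(6.371e6):.3e} F")
print(f"Radius for 1 F:     {radius_for_capacitance(1.0):.3e} m")
```

The last line shows why the farad is such a huge unit: an isolated sphere would need a radius of several million kilometres to have a capacity of 1 F.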

The energy of a capacitor

I talked about the energy of fields in various places, most notably my posts on fields and charges. The idea behind is quite simple: if there’s some distribution of charges in space, then we always have some energy in the system, because a certain amount of work was required to bring the charges together. [For the concept of energy itself, please see my post on energy and potential.] Remember that simple formula, and the equally simple illustration:

Also remember what we wrote above: the voltage is the work per unit charge required to carry a small charge from one plate to the other. Now, when charging a capacitor, what’s happening is that charge gets transferred from one plate to the other indeed, and the work required to transfer a small charge dQ is, obviously, equal to V·dQ. Hence, the change in energy is dU = V·dQ. Now, because V = Q/C, we get dU = (Q/C)·dQ, and integrating this from zero charge to some final charge Q, we get:

U = (1/2)·Q²/C = (1/2)·C·V²
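If you don’t trust the integral, you can mimic the charging process numerically: move the charge across in small steps dQ and add up the work V·dQ = (Q/C)·dQ for each step. The sketch below does just that. The values for Q and C are assumptions chosen for illustration.

```python
def charging_energy(final_charge, capacitance, steps=1000):
    """Add up dU = V*dQ = (Q/C)*dQ while charging from 0 to final_charge."""
    dq = final_charge / steps
    energy = 0.0
    for i in range(steps):
        q_mid = (i + 0.5) * dq  # charge already transferred (midpoint of the step)
        energy += (q_mid / capacitance) * dq
    return energy


# Assumed example values: 2 microcoulomb on a 1 nF capacitor
Q, C = 2e-6, 1e-9
numeric = charging_energy(Q, C)
closed_form = 0.5 * Q ** 2 / C
print(numeric, closed_form)
```

Because the integrand is linear in Q, the midpoint sum reproduces (1/2)·Q²/C up to rounding errors, whatever the number of steps.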

Note how the capacity C, or its inverse 1/C, appears as a constant of proportionality in both equations. It’s the charge, or the voltage, that’s the variable really, and the formulas say the energy is proportional to the square of the charge, or the voltage. Finally, also note that we immediately get the energy of a charged sphere by substituting 4πε₀a for C (see the capacity formula in the previous section):

U = (1/2)·Q²/(4πε₀a) = Q²/(8πε₀a)

Now, Feynman applies this energy formula to an interesting range of practical problems, but I’ll refer you to him for that: just click on the link and check it out. 🙂

OK… Next thing. The next thing is to look at the dielectric material inside capacitors.

Dielectrics

You know the dielectric inside a capacitor increases its capacity. In case you wonder what I am talking about: the dielectric is the waxed paper inside of that old-fashioned radio-type condenser, or the oxide layer on the metal foil used in more recent designs. However, before analyzing dielectric, let’s first look at what happens when putting another conductor in-between the plates of our parallel-plate condenser, as shown below.

As a matter of fact, the neutral conductor will also increase the capacitance of our condenser. Now how does that work? It’s because of the induced charges. As I explained in my post on how shielding works, the induced charges reduce the field inside of the conductor to zero. So there is no field inside the (neutral) conductor. The field in the rest of the space is still what it was: σ/ε₀, so that’s the surface density of charge (σ) divided by ε₀. However, the distance over which we have to integrate to get the potential difference (i.e. the voltage V) is reduced: it’s no longer d but d minus b, with b the thickness of the conductor, as there’s no work involved in moving a charge across a zero field. Hence, instead of writing V = E·d = σ·d/ε₀, we now write V = σ·(d−b)/ε₀. Hence, the capacity, which was C = Q/V = ε₀A/d, is now equal to C = Q/V = ε₀A/(d−b), which we prefer to write as:

C = (ε₀A/d)·(1 − b/d)⁻¹

Now, because 0 < 1 − b/d < 1, we have a factor (1 − b/d)⁻¹ that is greater than 1. So our capacitor will have greater capacity which, remembering our C = Q/V and U = (1/2)·C·V² formulas, implies (a) that it will store more charge at the same potential difference (i.e. voltage) and, hence, (b) that it will also store more energy at the same voltage.
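A small sketch of the argument above: it computes the capacity with a neutral conducting slab of thickness b in the gap, and compares it with the (1 − b/d)⁻¹ factor. The dimensions are assumed values, and the function name is mine.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def capacitance_with_slab(area_m2, d, b):
    """C = eps0*A/(d - b): the field only has to be integrated over d - b."""
    if not 0.0 <= b < d:
        raise ValueError("slab thickness b must satisfy 0 <= b < d")
    return EPSILON_0 * area_m2 / (d - b)


# Assumed example: 100 cm^2 plates, 1 mm gap, slab filling half of the gap
area, d, b = 0.01, 1e-3, 0.5e-3
empty = capacitance_with_slab(area, d, 0.0)
with_slab = capacitance_with_slab(area, d, b)
print(with_slab / empty, 1.0 / (1.0 - b / d))
```

With a slab that fills half of the gap, both numbers come out as 2: the capacity doubles.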

Having said that, it’s easy to see that, if there’s air in-between, the risk of the capacitor breaking down will be much more significant. Hence, the use of conducting material to increase the capacitance of a capacitor is not recommended. [The question of how a breakdown actually occurs in a vacuum is an interesting one: the vacuum is expected to undergo electrical breakdown at or near the so-called Schwinger limit. If you want to know more about it, you can read the Wikipedia article on this.]

So what happens when we put a dielectric in-between? It’s illustrated below. The field is reduced but it is not zero, so the positive charge on the surface of the dielectric (look at the gaussian surface S shown by the broken lines) is less than the negative charge on the conductor: in the illustration below, it’s a 1 to 2 ratio.

But what’s happening really? What’s the reality behind? Good question. The illustration above is just a mathematical explanation. It doesn’t tell us anything − nothing at all, really − on the physics of the situation. As Feynman writes:

“The experimental fact is that if we put a piece of insulating material like lucite or glass between the plates, we find that the capacitance is larger. That means, of course, that the voltage is lower for the same charge. But the voltage difference is the integral of the electric field across the capacitor; so we must conclude that inside the capacitor, the electric field is reduced even though the charges on the plates remain unchanged. Now how can that be? Gauss’ Law tells us that the flux of the electric field is directly related to the enclosed charge. Consider the gaussian surface $S$ shown by broken lines. Since the electric field is reduced with the dielectric present, we conclude that the net charge inside the surface must be lower than it would be without the material. There is only one possible conclusion, and that is that there must be positive charges on the surface of the dielectric. Since the field is reduced but is not zero, we would expect this positive charge to be smaller than the negative charge on the conductor. So the phenomena can be explained if we could understand in some way that when a dielectric material is placed in an electric field there is positive charge induced on one surface and negative charge induced on the other.”

Now that’s a mathematical model indeed, based on the formula for the work involved in transferring charge from one plate to the other:

W = ∫ F·ds = ∫qE·ds = q·∫E·ds = qV

If your physics classes in high school were any good, you’ve probably seen the illustration above. Having said that, the physical model behind is more complicated, and so let’s have a look at that now.

The key to the whole analysis is the assumption that, inside a dielectric, we have lots of little atomic or molecular dipoles. Feynman presents an atomic model (shown below) but we could also think of highly polar molecules, like water, for instance. [Note, however, that, with water, we’d have a high risk of electrical breakdown once again.]

The micro-model doesn’t matter very much. The whole analysis hinges on the concept of a dipole moment per unit volume. We’ve introduced the concept of the dipole moment tout court in a previous post, but let me remind you: the dipole moment is the product of the charge q and the distance between two equal but opposite charges q₊ and q₋.

Now, because we’re using the d symbol for the distance between our plates, we’ll use δ for the distance between the two charges. Also note that we usually write the dipole moment as a vector so we keep track of its direction and we can use it in vector equations. To make a long story short: p = qδ for the magnitude and, as a vector equation, p = qδ, with p and δ now denoting vectors. [Please do note that δ is a vector going from the negative to the positive charge, otherwise you won’t understand a thing of what follows.]

As mentioned above, we can have atomic or molecular or whatever other type of dipoles, but what we’re interested in is the dipole moment per unit volume, which we write as:

P = Nqδ, with N the number of dipoles per unit volume.

For rather obvious reasons, P is also often referred to as the polarization vector. […] OK. We’re all set now. We should distinguish two possibilities:

P is uniform, i.e. constant, across our sheet of material.
P is not uniform, i.e. P varies across the dielectric.

So let’s do the first case first.

1. Uniform P

This assumption gives us the mathematical model of the dielectric almost immediately. Indeed, when everything is said and done, what’s going on here is that the positive/negative charges inside the dielectric have just moved in/out over that distance δ, so at the surface, they have also moved in/out over the very same distance. So the image is effectively the image below, which is equivalent to the mathematical model of a dielectric we presented above.

Of course, no analysis is complete without formulas, so let’s see what we need and what we get.

The first thing we need is the surface density of the polarization charge induced on the surface, which was denoted by σ_pol, as opposed to σ_free, which is the surface density on the plates of our capacitor (the subscript ‘free’ refers to the fact that the electrons are supposed to be able to move freely, which is not the case in our dielectric). Now, if A is the area of our surface slabs, and if, for each of the dipoles, we have that q₋ charge, then the illustration above tells us that the total charge in the tiny negative surface slab will be equal to Q = A·δ·q₋·N. Hence, the surface charge density σ_pol = Q/A = A·δ·q₋·N/A = N·δ·q₋. But N·δ·q is also the definition of P! Hence, σ_pol = P. [Note that σ_pol is positive on one side, and negative on the other, of course!]

Now that we have σ_pol, we can use our E = σ/ε₀ formula and add the fields from the dielectric and the capacitor plates respectively. Just think about that gaussian surface S, for example. The field there, taking into account that σ_pol and σ_free have opposite signs, is equal to:

E = (σ_free − σ_pol)/ε₀

Using our σ_pol = P identity, we can also write this as E = (σ_free−P)/ε₀. But what’s P? Well… It’s a property of the material obviously, but then it’s also related to the electric field, of course! For larger E, we can reasonably assume that δ will be larger too (assuming some grid of atoms or molecules, we should obviously not assume a change in N or q₋) and, hence, dP/dE is supposed to be positive. In fact, it turns out that the relation between E and P is pretty linear, and so we can define some constant of proportionality and write P ≈ kE. Moreover, because the E and P vectors have the same direction, we can actually write this as a vector equation: P ≈ kE, with P and E now denoting vectors. Now, for historic reasons, we’ll write our k as k = ε₀·χ, so we’re singling out our ε₀ constant once more and – as usual – we add some gravitas to the analysis by using one of those Greek letters (χ is chi). So we have P = ε₀·χ·E, and our equation above becomes:

E = (σ_free − ε₀·χ·E)/ε₀ ⇔ E = (σ_free/ε₀)·1/(1 + χ)

Now, remembering that V = E·d and that the total charge on our capacitor is equal to Q = σ_free·A, we get the formula which you may or may not know from your high school physics classes:

C = Q/V = ε₀·A·(1 + χ)/d

So… As Feynman puts it: “We have explained the observed facts. When a parallel-plate capacitor is filled with a dielectric, the capacitance is increased by the factor 1+χ.” The table below gives the values for various materials. As you can see, water’d be a great dielectric… if it weren’t so conductive. 🙂
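The chain of reasoning for the uniform case (σ_pol = P, P = ε₀·χ·E and E = (σ_free − σ_pol)/ε₀) can be checked for self-consistency with a few lines of Python. This is only a sketch: the value χ = 1 and the surface density are assumed, and the function names are mine.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def field_in_dielectric(sigma_free, chi):
    """E = sigma_free / (eps0 * (1 + chi)) inside the dielectric."""
    return sigma_free / (EPSILON_0 * (1.0 + chi))


def polarization_charge_density(sigma_free, chi):
    """sigma_pol = P = eps0 * chi * E."""
    return EPSILON_0 * chi * field_in_dielectric(sigma_free, chi)


def capacitance_with_dielectric(area_m2, d, chi):
    """C = (1 + chi) * eps0 * A / d."""
    return (1.0 + chi) * EPSILON_0 * area_m2 / d


# Assumed example: chi = 1, free surface density of 1 microcoulomb per m^2
sigma_free, chi = 1e-6, 1.0
E = field_in_dielectric(sigma_free, chi)
sigma_pol = polarization_charge_density(sigma_free, chi)
print(sigma_pol / sigma_free)                      # ratio of induced to free charge
print((sigma_free - sigma_pol) / EPSILON_0, E)     # both expressions for E
```

For χ = 1, the polarization charge is half of the free charge, which is the 1 to 2 ratio mentioned earlier, and the capacity doubles.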

As for the assumption of linearity between E and P, there’s stuff on the Web on non-linear relationships too, but you can google that yourself. 🙂 Let’s now analyze the second case.

2. Non-uniform P

The analysis for non-uniform polarization is more general, and includes uniform polarization as a special case. To get going with it, Feynman uses an illustration (reproduced below) which is not so evident to interpret. Take your time to study it. The d connects, once again, two equal but opposite charges. The P vector points in the same direction as the d vector, obviously, but has a different magnitude, because P is equal to P = Nqd. We also have the normal unit vector n here and an angle θ between the normal and P. Finally, the broken lines represent a tiny imaginary surface. To be precise, it represents, once again, an infinitesimal surface, or a surface element, as Feynman terms it.

Just take your time and think about it. If there’s no field across, then θ = π/2 and our surface disappears. If n and P point in the same direction, then θ = 0 and our surface becomes a tiny rectangle of height d. Feynman uses the illustration above to point out that the charge moved across any surface element is proportional to the component of P that is perpendicular to the surface. Hence, remembering what the vector dot product stands for, and remembering that both σ_pol as well as P are expressed per unit area, we can write:

σ_pol = P·n = |P|·|n|·cosθ = P·cosθ

So P·n is the normal component of P, i.e. the component of P that’s perpendicular to our infinitesimal surface, and this component gives us the charge that moves across a surface element. [I know… The analysis is everything but easy here… But just hang in and try to get through it.]
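In code, σ_pol = P·n is just a dot product with the unit normal. The sketch below uses plain tuples for the vectors. The magnitude of P and the 60° angle are assumed values for illustration.

```python
import math


def dot(u, v):
    """Dot product of two 3-vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def surface_polarization_charge(P, n):
    """sigma_pol = P . n, with n normalized to unit length first."""
    norm = math.sqrt(dot(n, n))
    unit_normal = tuple(c / norm for c in n)
    return dot(P, unit_normal)


# Assumed example: |P| = 2e-6 C/m^2 at 60 degrees from the normal (the z-axis)
theta = math.radians(60.0)
P = (2e-6 * math.sin(theta), 0.0, 2e-6 * math.cos(theta))
print(surface_polarization_charge(P, (0.0, 0.0, 1.0)))  # |P|*cos(theta)
print(surface_polarization_charge(P, (1.0, 0.0, 0.0)))  # |P|*sin(theta)
```

When P lies in the surface (θ = π/2), the dot product vanishes and no charge crosses the surface element.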

Now, while the illustration above, and the formula, show us how some charge moves across the infinitesimal surface to create some surface polarization, it is obvious that it should not result in a net surface charge, because there are equal and opposite contributions from the dielectric on the two sides of the surface. However, having said that, the displacements of the charges do result in some tiny volume charge density, as illustrated below.

Now, I must admit Feynman does not make it easy to intuitively understand what’s going on because the various P vectors are chosen rather randomly, but you should be able to get the idea. P is not uniform indeed. Therefore, the electric field across our dielectric causes the P vectors to have different directions and/or magnitudes. Now, as mentioned above, to get the total charge that is being displaced out of any volume bound by some surface S, we should look at the normal component of P over the surface S. To be precise, to get the total charge that is being displaced out of the volume V, we should integrate the outward normal component of P over the surface S. Of course, an equal excess charge of the opposite sign will be left behind. So, denoting the net charge inside V by ΔQ_pol, we write:

ΔQ_pol = −∫_S P·n da

Now, you may or may not remember Gauss’ Theorem, which is related to, but not to be confused with, Gauss’ Law (for more details, check one of my previous posts on vector analysis), according to which we can write:

∫_S P·n da = ∫_V ∇·P dV

[I know… You’re getting tired, but we’re almost there.] We can look at the net charge ΔQ_pol inside V as an infinite sum of the charge in each volume element, i.e. as the integral of a volume charge density ρ_pol over the volume V. So we write:

ΔQ_pol = ∫_V ρ_pol dV

Again, the integral above may not appear to be very intuitive, but it actually is: the surface density σ_pol = P·n tells us how much charge crosses each surface element – so that’s something two-dimensional – and, when P varies from place to place, more charge leaves a tiny volume on one side than enters it on the other, so some net charge per unit volume is left behind. Again, just let it sink in for a while, and you’ll see it all makes sense. In any case, the equalities above imply that:

∫_V ρ_pol dV = −∫_V ∇·P dV

and, therefore, that

ρ_pol = −∇·P

You’ll say: so what? Well… It’s a nice result, really. Feynman summarizes it as follows:

“If there is a nonuniform polarization, its divergence gives the net density of charge appearing in the material. We emphasize that this is a perfectly real charge density; we call it “polarization charge” only to remind ourselves how it got there.”

Well… That says it all, I guess. To make sure you understand what’s written here: please note, once again, that the net charge over the whole of the dielectric is and remains zero, obviously!
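If you want to see the bookkeeping at work, the sketch below takes a made-up non-uniform polarization, P = (x², 0, 0), inside a unit cube, and compares the charge left behind in the volume (the integral of −∇·P) with the charge pushed out through the surface (the integral of the outward normal component of P). The field, the cube and the function names are all assumptions for illustration.

```python
def polarization(x, y, z):
    """Assumed non-uniform polarization: P = (x^2, 0, 0)."""
    return (x * x, 0.0, 0.0)


def divergence(field, x, y, z, h=1e-5):
    """Central-difference estimate of div P at the point (x, y, z)."""
    dpx = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2 * h)
    dpy = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2 * h)
    dpz = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2 * h)
    return dpx + dpy + dpz


def charge_left_in_unit_cube(field, n=20):
    """Integrate -div P over the cube [0, 1]^3 with the midpoint rule."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x, y, z = (i + 0.5) * step, (j + 0.5) * step, (k + 0.5) * step
                total -= divergence(field, x, y, z) * step ** 3
    return total


def outward_flux_unit_cube(field, n=20):
    """Integrate the outward normal component of P over the six faces."""
    step = 1.0 / n
    flux = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * step, (j + 0.5) * step
            flux += (field(1.0, u, v)[0] - field(0.0, u, v)[0]) * step ** 2
            flux += (field(u, 1.0, v)[1] - field(u, 0.0, v)[1]) * step ** 2
            flux += (field(u, v, 1.0)[2] - field(u, v, 0.0)[2]) * step ** 2
    return flux


print(charge_left_in_unit_cube(polarization), outward_flux_unit_cube(polarization))
```

The two numbers are equal and opposite: whatever charge is displaced out through the surface leaves an equal charge of the opposite sign behind in the volume, so the total remains zero.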

The only question you may have is if non-uniform polarization is actually relevant. It is. You can google and you’re likely to get a lot of sites relating to multi-layered transducers and piezoelectric materials. 🙂 But, you’re right, that’s perhaps too advanced to talk about here.

Having said that, what I write above may look like too much nitty-gritty, but it isn’t: the formulas are pretty basic, and you need them if you want to advance in physics. In fact, Feynman uses these simple formulas in two more Lectures (Chapter 10 and 11 in Volume II, to be precise) to do some more analyses of real physics. However, as this blog is not meant to be a substitute for his Lectures, I’ll refer to him for further reading. At the very least, you have the basics here, and I hope it was interesting enough to induce you to look at the mentioned Lectures yourself. 🙂

The method of images

Pre-script (dated 26 June 2020): This post got mutilated by the removal of some illustrations by the dark force. You should be able to follow the main story-line, however. If anything, the lack of illustrations might actually help you to think things through for yourself.

Original post:

In my previous post, I mentioned the so-called method of images, but didn’t elaborate much. Let’s recall the problem. As you know, the whole subject of electrostatics is governed by one equation: the so-called Poisson equation:

∇²Φ = ∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² = −ρ/ε₀

We get this equation by combining Maxwell’s first law (∇·E = ρ/ε₀) and the E = −∇Φ formula. Now, if we know the distribution of charges, then we don’t need that Poisson equation: we can calculate the potential at every point – denoted by (1) below – using the following formulas:

Φ(1) = (1/4πε₀)·Σ qⱼ/r₁ⱼ (for point charges) or Φ(1) = (1/4πε₀)·∫ ρ(2)·dV₂/r₁₂ (for a continuous distribution)

And if we have Φ, we have E, because E = –∇Φ. But, in most actual situations, we don’t know the charge distribution, and then we need to work with that Poisson equation. Of course, you’ll say: if you don’t know the charge distribution, then you don’t know the ρ in the equation, and so what use is it really?

The answer is: most problems will involve conductors, and we do know that their surface is an equipotential surface. We also know that the electric field just outside the surface must be normal to the surface. Let’s take the example of the grounded conducting sheet once again, as depicted below. We know the image charge and the field lines on the left-hand side are not there. In fact, because the sheet is grounded, it stays at zero potential, and the conductor acts as a shield.

We do have a real field on the right-hand side though, and it’s exactly the same as that of a dipole: we only need to cross out the left-hand half of the picture. What charges are responsible for it? It surely cannot be the lone +q charge alone, and it isn’t: we also have induced local charges on the sheet. Indeed, the positive charge will attract negative charges to the surface (the connection to ground supplies them) and, hence, the surface charge density is not zero. We can calculate it. How? It’s quite complicated, but let’s give it a try.

Look at the detail below. Let’s forget about the induced charges for a while, and analyze the field produced by the positive charge in the absence of induced charges, so that’s the E field at point P. The magnitude of its normal component is E_n+ = E·cosθ, with θ the angle between E and the normal to the sheet.

θ is an angle of a right triangle, and it’s easy to see that cosθ is equal to a/(a² + ρ²)^(1/2). Now, Coulomb’s Law tells us that E = (1/4πε₀)·q/[(a² + ρ²)^(1/2)]² = (1/4πε₀)·q/(a² + ρ²). Hence, we can write:

E_n+ = (1/4πε₀)·a·q/(a² + ρ²)^(3/2)

[A quick note on the symbols used here: we use ρ (rho) to denote a distance here. That’s somewhat confusing because it usually denotes a volume density. However, we’re interested in a surface density here, for which the σ (sigma) symbol is used. So don’t worry about it. Just note that ρ is some distance here, instead of a charge density.]

Now we know that the induced charges will arrange themselves in such a way that the addition of their field makes the field at P look like there was a negative charge of the same magnitude as q at the other side of the sheet. If there was such a charge −q, then we could do the same analysis, as shown below. It’s easy to see that the component of the imaginary field along the sheet (i.e. the component that’s perpendicular to the normal) cancels the actual component along the sheet of the field created by +q, while its normal component adds to the normal component of the +q field. To make a long story short, the actual field at P is equal to E(ρ) = (1/4πε₀)·2a·q/(a² + ρ²)^(3/2), and it has two components of strength (1/4πε₀)·a·q/(a² + ρ²)^(3/2) each.
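The cancellation of the components along the sheet, and the doubling of the normal component, can be checked by adding the Coulomb fields of +q and its image −q as vectors. In the sketch below, the sheet is the plane z = 0, the charge sits at (0, 0, a) and its image at (0, 0, −a). The coordinates, the numerical values and the function names are my own choices.

```python
import math

K = 1.0 / (4.0 * math.pi * 8.8541878128e-12)  # 1/(4*pi*eps0) in SI units


def point_charge_field(q, source, point):
    """Coulomb field (as a 3-tuple) at `point` of a charge q at `source`."""
    delta = [p - s for p, s in zip(point, source)]
    r = math.sqrt(sum(c * c for c in delta))
    return tuple(K * q * c / r ** 3 for c in delta)


def field_at_sheet(q, a, rho):
    """Field at distance rho from the foot of the perpendicular,
    from +q at (0, 0, a) and the image charge -q at (0, 0, -a)."""
    point = (rho, 0.0, 0.0)
    e_plus = point_charge_field(q, (0.0, 0.0, a), point)
    e_minus = point_charge_field(-q, (0.0, 0.0, -a), point)
    return tuple(p + m for p, m in zip(e_plus, e_minus))


# Assumed example values: q = 1 nC at 5 cm from the sheet, P at rho = 3 cm
q, a, rho = 1e-9, 0.05, 0.03
ex, ey, ez = field_at_sheet(q, a, rho)
expected = K * 2.0 * a * q / (a * a + rho * rho) ** 1.5
print(ex, ey, ez, expected)
```

The component along the sheet vanishes, and the normal component points towards the sheet (negative z) with the magnitude given by the E(ρ) formula.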

To put it differently, the actual field can be thought of as two parts: (1) the (normal) component of the field caused by +q, and (2) the field caused by the surface charge density σ at P, which we denote as σ(ρ). Let’s see what we can do with this.

The analysis of the field of a sheet of charge on a conductor is quite complicated, and not quite like the analysis of just a sheet of charge. The analysis for just a sheet of charge was based on the theoretical situation depicted below. We imagined some box with two Gaussian surfaces of area A, and we then used Gauss’ Law to deduce that, if σ was the charge per unit area (i.e. the surface density), the total flux out of the box should be equal to EA + EA = σA/ε₀ and, hence, E = (1/2)·σ/ε₀. The illustration below shows we should think of two fields with opposite direction, and with a magnitude of (1/2)·σ/ε₀ each.

That’s simple enough. However, a sheet of charge on a conductor produces a different field, as shown below. Because of the shielding effect, we have flux on one side of the box only, and the field strength of this flux is σ/ε₀, so that’s two times the (1/2)·σ/ε₀ magnitude described above. However, as mentioned, it’s zero on the other side, i.e. the inside of the conductor shown below.

So what happens here? The charges in the neighborhood of a point P on the surface actually do produce a local field (E_local), both inside and outside of the surface, which respects the E_local = (1/2)·σ/ε₀ equality, but all the rest of the charges on the conductor “conspire” to produce an additional field at the point P, which also has a magnitude of (1/2)·σ/ε₀ but which has the same direction on both sides of the surface: it opposes the local field inside, and it adds to it outside. So the net result is that the total field inside goes to zero, and the field outside is equal to E = σ/ε₀, so E = 2·E_local. Note that the example above assumes a positively charged conductor: if the charge on the conductor were negative, the direction of the field would be inwards, but we’d still have a field on and outside of the surface only.

I know you’ve switched off already but − just in case you didn’t − what equality should we use to find σ in this case, i.e. the grounded sheet with some (negative) induced surface charge density on it? Well… We’re talking a surface density, and a conductor, and, therefore, I would think it’s the E = σ/ε₀ formula, i.e. the formula for a sheet of charge on a conductor. So we write:

E = σ(ρ)/ε₀ ⇔ σ(ρ) = ε₀E

But what E do we take to continue our calculation? The whole field or (1/4πε₀)·a·q/(a² + ρ²)^(3/2) only? The analysis above may make you think that we should take (1/4πε₀)·a·q/(a² + ρ²)^(3/2) only, so that’s the component that’s related to the imaginary charge only, but… No! We’re talking one actual field here, which is produced by the positive charge as well as by the induced charges. So we should not cut it for the purpose of calculating σ(ρ)! So the grand result, in magnitude, is:

σ(ρ) = ε₀E = (1/4π)·2a·q/(a² + ρ²)^(3/2)

Because the field lines of the positive charge end on the sheet, the induced charge itself is negative, so we should read this as the magnitude of a negative surface density.

The shape of this function should not surprise us: it’s shown below for some different values of q (1 and 2 respectively) and a (1, 2 and 3 respectively).

How do we know our solution is correct? We can check it: if we integrate σ over the whole surface, we should find that the total induced charge is equal to −q. So… Well… I’ll let you do that. Feynman also notes the induced charges should exert a force on our point charge, which we can find by calculating the force between the surface charges and the charge. It’s again an integral, and it should be equal to:

F = (1/4πε₀)·q²/(2a)²

Lo and behold! The force acting on the positive charge is exactly the same as it would be with the negative image charge instead of the plate. Why? Well… Because the fields are the same!
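Both integrals can be done numerically in a few lines. The sketch below takes the surface density with its negative sign, σ(ρ) = −2aq/[4π(a² + ρ²)^(3/2)], integrates over rings of radius ρ, and uses the substitution ρ = a·tan(t) to map the infinite sheet onto a finite interval. The substitution, the midpoint rule, the numerical values and the function names are my own choices, not Feynman’s.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def sigma(rho, q, a):
    """Induced surface density on the sheet (negative for positive q)."""
    return -2.0 * a * q / (4.0 * math.pi * (a * a + rho * rho) ** 1.5)


def integrate_over_sheet(integrand, a, steps=2000):
    """Integrate integrand(rho)*2*pi*rho d(rho) from 0 to infinity,
    using rho = a*tan(t) with t in (0, pi/2) and the midpoint rule."""
    dt = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        rho = a * math.tan(t)
        drho_dt = a / math.cos(t) ** 2
        total += integrand(rho) * 2.0 * math.pi * rho * drho_dt * dt
    return total


def total_induced_charge(q, a):
    """Total charge induced on the sheet."""
    return integrate_over_sheet(lambda rho: sigma(rho, q, a), a)


def force_on_charge(q, a):
    """Normal component of the force of the induced charges on q
    (negative means: towards the sheet)."""
    def integrand(rho):
        r_squared = a * a + rho * rho
        # Coulomb force per unit area, projected on the normal (cos = a/r)
        return q * sigma(rho, q, a) * a / (4.0 * math.pi * EPSILON_0 * r_squared ** 1.5)
    return integrate_over_sheet(integrand, a)


# Assumed example values: q = 1 nC at a = 5 cm from the sheet
q, a = 1e-9, 0.05
image_force = -q * q / (4.0 * math.pi * EPSILON_0 * (2.0 * a) ** 2)
print(total_induced_charge(q, a), -q)
print(force_on_charge(q, a), image_force)
```

The first line should print −q twice (up to the accuracy of the numerical integration), and the second line should show that the force of the induced charges matches the force of an image charge −q at a distance 2a.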

The results we obtained are quite wonderful! Indeed, we said we did not know the charge distribution, and so we used a very different method to find the field: the method of images, which consists of computing the field due to q and some imaginary point charge –q somewhere else. Feynman summarizes the method of images as follows:

“The point charge we “imagine” existing behind the conducting surface is called an image charge. In books you can find long lists of solutions for hyperbolic-shaped conductors and other complicated looking things, and you wonder how anyone ever solved these terrible shapes. They were solved backwards! Someone solved a simple problem with given charges. He then saw that some equipotential surface showed up in a new shape, and he wrote a paper in which he pointed out that the field outside that particular shape can be described in a certain way.”

However, as you can see, the method is actually quite powerful, because we got a substantial bonus here: we calculated the field indeed, but then we could also calculate the charge distribution afterwards, so we got it all! Let’s see if we master the topic by looking at some other applications of the method of images.

Point charges near conducting spheres

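For a grounded conducting sphere of radius a, we get the result shown below: a point charge q at a distance b from the center of the sphere will induce charges on it whose fields are those of an image charge q’ = −aq/b placed inside the sphere, on the line from the center to q, at a distance a²/b from the center.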

You can check the details in Feynman’s Lecture on it, in which you will also find a more general formula for spheres that are not at zero potential. The more general formula involves a third charge q” at the center of the sphere, with charge q” = −q’ = aq/b.
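For the grounded sphere, the check is that the potential of q and its image adds up to zero everywhere on the surface. The sketch below works in units where 1/4πε₀ = 1, with a the radius of the sphere, b the distance of q from the center, and the image q’ = −aq/b at a distance a²/b from the center. The numerical values and the function names are assumptions for illustration.

```python
import math


def image_charge(q, a, b):
    """Image for a grounded sphere of radius a and a charge q at a distance
    b > a from the center: returns (image charge, distance from center)."""
    return -a * q / b, a * a / b


def potential_on_sphere(q, a, b, theta):
    """Potential of q plus its image (units with 1/(4*pi*eps0) = 1) at the
    surface point at angle theta from the axis through q."""
    q_image, position = image_charge(q, a, b)
    x, y = a * math.cos(theta), a * math.sin(theta)
    r_charge = math.hypot(x - b, y)
    r_image = math.hypot(x - position, y)
    return q / r_charge + q_image / r_image


# Assumed example: unit sphere, unit charge at three radii from the center
for degrees in (0, 45, 90, 135, 180):
    print(degrees, potential_on_sphere(1.0, 1.0, 3.0, math.radians(degrees)))
```

All the printed potentials are zero up to rounding errors, so the surface of the sphere is indeed an equipotential at zero potential.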

Again, we’ll have a force of attraction between the sphere and the point charge, and that’s true even if the sphere is insulated and its net charge is zero. Indeed, in that case, the positive charge q attracts negative charges to the side closer to itself and, hence, leaves positive charges on the surface of the far side. As the attraction by the negative charges exceeds the repulsion from the positive charges, we end up with some net attraction. Feynman leaves us with an interesting challenge here:

“Those who were entertained in childhood by the baking powder box which has on its label a picture of a baking powder box which has on its label a picture of a baking powder box which has … may be interested in the following problem. Two equal spheres, one with a total charge of $+Q$ and the other with a total charge of $-Q$ , are placed at some distance from each other. What is the force between them? The problem can be solved with an infinite number of images. One first approximates each sphere by a charge at its center. These charges will have image charges in the other sphere. The image charges will have images, etc., etc., etc. The solution is like the picture on the box of baking powder—and it converges pretty fast.”

Well… I’ll leave it to you to take up that challenge. 🙂

Direct and indirect methods

Let me end this post by noting that I started out with that Poisson equation, but that I actually didn’t use it. Having said that, this method of images did result in some solutions for it. It is what Feynman calls an indirect method of solving some problems, and he writes the following on it:

“If the problem to be solved does not belong to the class of problems for which we can construct solutions by the indirect method, we are forced to solve the problem by a more direct method. The mathematical problem of the direct method is the solution of Laplace’s equation ∇²Φ = 0 subject to the condition that Φ is a suitable constant on certain boundaries—the surfaces of the conductors. [Note that Laplace’s equation is Poisson’s equation with a zero on the right-hand side.] Problems which involve the solution of a differential field equation subject to certain boundary conditions are called boundary-value problems. They have been the object of considerable mathematical study. In the case of conductors having complicated shapes, there are no general analytical methods. Even such a simple problem as that of a charged cylindrical metal can closed at both ends—a beer can—presents formidable mathematical difficulties. It can be solved only approximately, using numerical methods. The only general methods of solution are numerical.”
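To give you an idea of what such a numerical method may look like, here is a minimal sketch of a relaxation scheme for Laplace’s equation in two dimensions: each interior grid point is repeatedly replaced by the average of its four neighbors until nothing changes anymore. The geometry (a square box with the top edge at potential 1 and the other edges grounded), the grid size and the tolerance are my assumptions, and this is just one simple scheme, not the method Feynman has in mind.

```python
def solve_laplace(n=21, top_value=1.0, tolerance=1e-10, max_sweeps=100000):
    """Gauss-Seidel relaxation for Laplace's equation on an n x n grid.
    Boundary condition: top edge at top_value, the other edges at zero."""
    phi = [[0.0] * n for _ in range(n)]
    for j in range(n):
        phi[0][j] = top_value
    for _ in range(max_sweeps):
        largest_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Discrete Laplace equation: phi equals the neighbor average
                new = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                              + phi[i][j - 1] + phi[i][j + 1])
                largest_change = max(largest_change, abs(new - phi[i][j]))
                phi[i][j] = new
        if largest_change < tolerance:
            break
    return phi


potential = solve_laplace()
print(potential[10][10])  # potential at the center of the box
```

There is a nice check on the result: by symmetry and superposition, four such boxes (each with a different edge at potential 1) add up to a box with all edges at 1, so the potential at the center must be one quarter.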

Well… That says it all, I guess. There are other indirect methods, i.e. other than the method of images, but I won’t present these here. I may write something about it in some other post, perhaps. 🙂

The electric field in various circumstances

Original post:

This post summarizes two of what may well be Feynman’s most tedious Lectures. Their title is the same: the electric field “in various circumstances.” At first, I wanted to skip them, but then I found some unifying principle: the fields involved are all quite simple. In fact, except in chapter seven, it’s only about (a) the field of a single charge and (b) the field of a so-called dipole, i.e. the field of two opposite charges next to each other. Both are depicted below, and the dipole field can actually be derived by adding the fields of the two single charges.

So… In a way, these two Lectures are just a bunch of formulas repeating the same thing over and over again. The thing to remember is that a complicated but neutral mess of charges will also create a dipole field and, if that mess would not be neutral as a whole, then the field of our lump of charge will look like that of a point charge, provided we look at it from a large enough distance (i.e. a distance that is large relative to the separation of the elementary charges involved). So the situation we’re looking at, is the one depicted below, which is really quite general.

Before going into the nitty-gritty, it is probably good to review one of the points I made in my previous post: the field inside of a spherical shell of charge (like the one below) is zero everywhere, i.e. for any point P inside the shell.

This has nothing to do with the phenomenon of shielding, which is a consequence of free electrons re-arranging themselves so as to cancel the field inside. If we’d be able to build the cage below from protons only, so we’d have a fixed distribution of charges, the inside would not be shielded from the external electrical field. [Credit for the animation must go to Wikipedia.]

Note that this exact cancellation is special to the sphere: for a fixed and uniform distribution of charges on, say, a rectangular box, symmetry only guarantees a zero field at the center. Let me quickly go over the math for the example of the spherical shell. The randomly chosen point P defines small cones extending to the surface of the sphere, with their apex at P and cutting out some surface area Δa. In the illustration above, we have two symmetrical cones defining two surfaces Δa₁ and Δa₂ respectively. It is easy to see that:

Δa₂/Δa₁ = r₂²/r₁²

Note that r₂²/r₁² is equal to (r₂/r₁)² but that (r₂/r₁)² is not equal to r₂/r₁. The square matters, and the square of a ratio is different from the ratio itself! In fact, it’s because of the inverse square law that the fields cancel exactly. Indeed, if the surface of the sphere is uniformly charged (which is the key assumption here), then the charge Δq on each of the area elements will be proportional to the area, so Δq₂/Δq₁ = Δa₂/Δa₁. Now, Coulomb’s Law also says that the magnitudes of the fields produced at P by these two surface elements are in the ratio of:

E₂/E₁ = (Δq₂/r₂²)/(Δq₁/r₁²) = 1

Huh? Yes. E₂/E₁ = (Δa₂/Δa₁)·(r₁²/ r₂²) = (Δa₂/Δa₁)·(Δa₁/Δa₂) = 1, according to the above. So… Yes, the fields cancel exactly, and because all parts of the surface can be paired off in the same way, the total field at P is zero, indeed! But what if we’d put a charge with equal sign at the center? Logic dictates the shell would balance it at the center. Hence, Feynman’s statement that a charge in an electrostatic field in free space can only be in equilibrium if there are mechanical constraints − as illustrated below – is false, and – I should add – the whole argument that follows has no relevance whatsoever for the quantum-mechanical model of an atom. But that’s a somewhat separate story which I’ll touch upon at the end of this post. Let me get back to the dipole problem.

Dipole fields

The model of a dipole is illustrated below. We have two opposite charges separated by a distance d. The so-called dipole moment is defined as p = q·d, and we also have an associated vector p, whose magnitude is p (so that’s the product of q and d) and whose direction is that of the dipole axis from −q to +q. We could also define a vector d and write p as p = q·d. Just think about it. I am sure you’ll figure it out. 🙂

Now, Feynman derives the formula for the dipole potential in various ways—first in an easy way, and then in a not-so-easy way. 🙂 The not-so-easy way is the most interesting—in this case, that is! He first notes the general formula for the potential, at some point P = (x, y, z), of a point charge q at the origin. You’ve seen that before: it’s Φ₀ = q/r. [Forget about the constant of proportionality (I mean that 1/4πε₀ factor in Coulomb’s Law) for a while. We can stick it back in at the end of the argument.] What it says, is that, while the field follows an inverse square law, the potential has a 1/r dependence only (so when you double the distance, you halve the potential). Now, if we’d move the charge q along the z-axis, up a distance Δz, then the potential at P will change a little, by, say ΔΦ₊. How much exactly? Well, Feynman notes that “it is just the amount that the potential would change if we were to leave the charge at the origin and move P downward by the same distance Δz.” His illustration below, and the associated formula below, speak for themselves:

Now I’ll refer you to Feynman himself for the detail of the whole argument. The bottom line is that he gets the following formula for the dipole potential:

Φ = −p·∇φ₀

We have a vector dot product here of that dipole vector we defined above (p) and the gradient of φ₀, which is the potential of a unit point of charge: φ₀ = 1/4πε₀r. So what? Well… We can re-write this as:

Φ = −(1/4πε₀)p·∇(1/r)

Isn’t that great? For point charges, we have a field that’s the gradient of a potential that has a 1/r dependence, but so… Well… Here we have the potential of a dipole that’s the gradient of… Well… Just a number that has a 1/r dependence. 🙂

It explains why the dipole field E = −∇Φ varies inversely not as the square but as the cube of the distance from a dipole. I could give you the formula for E but, again, I don’t want to copy all of Feynman here and so I’ll just assume you believe me. Let me just wrap up in this section with the graph of the electric field, and note how the field vector E can be analyzed as the sum of a transverse component (i.e. the component in the x-y plane) and its component along the dipole axis (i.e. the component along the z-axis).
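If you’d rather check than believe, here is a small sketch. It compares the exact potential of two opposite charges with the dipole formula, using ∇(1/r) = −r/r³, so that −p·∇(1/r) = p·z/r³ when p points along the z-axis. The 1/4πε₀ factor is left out again, and the function names and the values q = d = 1 are just my choices for the illustration.

```python
import math

def two_charge_potential(x, z, q=1.0, d=1.0):
    """Exact potential (1/4πε₀ dropped) of +q at z = +d/2 and -q at z = -d/2."""
    r_plus = math.hypot(x, z - d / 2)
    r_minus = math.hypot(x, z + d / 2)
    return q / r_plus - q / r_minus

def dipole_potential(x, z, q=1.0, d=1.0):
    """Dipole formula -p·∇(1/r) = p·z/r³, with p = q·d along the z-axis."""
    r = math.hypot(x, z)
    return q * d * z / r**3

def on_axis_field(z, q=1.0, d=1.0):
    """Exact field strength on the dipole axis, valid for z > d/2."""
    return q / (z - d / 2) ** 2 - q / (z + d / 2) ** 2

exact = two_charge_potential(60.0, 80.0)   # a point at r = 100·d
approx = dipole_potential(60.0, 80.0)
print(exact, approx)
print(on_axis_field(100.0) / on_axis_field(200.0))  # doubling the distance
```

Far from the dipole the two potentials agree closely, and doubling the distance divides the field by about eight, which is the inverse-cube behavior.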

The dipole field of a lump of charges

The only thing that’s left is to define the p vector for a lump (or a mess as Feynman calls it) of charges. Note that the lump should be neutral: if it is not, then it will look like a point charge from a distance. But if it’s neutral, then its field will be a dipole field. So the same formula applies but p is defined as p = ∑qᵢ·dᵢ. I copy the illustration from above below so you can see what is what. 🙂
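The sum is easy to write out. The sketch below assumes that each d_i is the position vector of charge q_i measured from some origin we choose (that reading of d_i is my assumption), and `dipole_moment` is just an illustrative name.

```python
def dipole_moment(charges):
    """charges: list of (q, (x, y, z)) pairs. Returns p = sum of q_i * d_i."""
    px = sum(q * pos[0] for q, pos in charges)
    py = sum(q * pos[1] for q, pos in charges)
    pz = sum(q * pos[2] for q, pos in charges)
    return (px, py, pz)

# The simple two-charge dipole: +q and -q a distance d = 1 apart on the z-axis
q = 2.0
pair = [(+q, (0.0, 0.0, 0.5)), (-q, (0.0, 0.0, -0.5))]
print(dipole_moment(pair))

# The same pair, with the origin moved somewhere else
shifted = [(c, (x + 1.0, y + 2.0, z + 3.0)) for c, (x, y, z) in pair]
print(dipole_moment(shifted))
```

For the two-charge case the sum gives back p = q·d along the axis. When the charges add up to zero, the result does not depend on where we put the origin, which is why the second call gives the same vector.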

So… Is that it? Well… Yes. And… Well… No. All of the above assumes we know the charge distribution from the start. If we do, then my little summary above pretty much covers the whole subject. 🙂 However, we’ll often be talking about some conductor with some total charge Q, without being able to say where the charges are, exactly. All that we know is that they will be spread out on the surface in some way.

Now… Well… That’s not quite exact. We also know they will distribute themselves so that the potential of the surface is constant, and that helps us solve some practical problems at least. What problems? Well… The problem of finding the field of charged conductors, which is the second topic that Feynman deals with in his two Lectures on the field “in various circumstances.”

However, that story risks becoming as tedious as Feynman’s Lectures on it, and so I’d rather not copy him here. Just look at the following illustrations. The first one gives the field lines and equipotentials for two point charges once again. It highlights two equipotentials in particular: A and B. Now look at the second illustration: we have a curved conductor with a given potential near a point charge and – lo and behold! – the field looks the same: we replace A by the surface of our conductor and all the rest vanishes. In fact, the illustration shows that we could just put an imaginary point charge q at a suitable point and get the same field.

Now that’s what’s referred to as the method of images, and it’s illustrated in the third graph, where we have an “image charge” indeed. We see the equipotential halfway between the two charges which, in this case, is a grounded conducting sheet. Why grounded? Because the plane had zero potential in our dipole field, as it was halfway between the two charges indeed.
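The zero-potential plane is easy to verify. In the sketch below I assume a charge q at height h above the plane z = 0 and an image charge −q at height −h; the function name and the numbers are only illustrative, and the 1/4πε₀ factor is dropped once more.

```python
import math

def image_potential(x, y, z, q=1.0, h=1.0):
    """Potential (1/4πε₀ dropped) of a charge q at (0, 0, h) plus its
    image charge -q at (0, 0, -h)."""
    r_real = math.sqrt(x * x + y * y + (z - h) ** 2)
    r_image = math.sqrt(x * x + y * y + (z + h) ** 2)
    return q / r_real - q / r_image

# Anywhere on the plane z = 0 the two contributions cancel
for point in [(0.0, 0.0), (0.3, -1.2), (5.0, 7.0)]:
    print(image_potential(point[0], point[1], 0.0))

# Above the plane, on the side of the real charge, the potential is not zero
print(image_potential(0.3, -1.2, 0.5))
```

On the plane every point is at the same distance from the charge and from its image, so the potential vanishes there, which is exactly what a grounded sheet requires.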

Capito? No?

Well… It doesn’t matter all that much. This is, indeed, the really boring stuff one just has to grind through in order to understand the next thing, which is hopefully somewhat more exciting.

Quadrupole fields

Because you’re interested in physics, you probably know a thing or two about those quadrupole magnets used to focus particle beams in accelerators. They’re also referred to as lenses. The illustration below shows a quadrupole electric field, but a quadrupole magnetic field looks the same.

The point is: these lenses focus in one direction and, hence, in an actual accelerator or cyclotron, the Q-magnets will be arranged so as to alternately focus horizontally and vertically. Why can’t we build magnets so as to focus electrically or magnetically charged particles simultaneously in two directions?

Well… It would require a tube built of protons, or electrons, in a stable configuration. We can’t do that. Technology just isn’t ready for it: we’re not able to build stable tubes of protons, or of electrons. 🙂 So the so-called Theorem of Earnshaw is still valid. Earnshaw’s Theorem says just that: focusing in two directions at once is impossible. It applies to classical inverse-square law forces, such as the electric and gravitational force, but also the magnetic forces created by permanent magnets.

However, the theorem is subject to constraints, and these constraints can be exploited to create very interesting exceptions, like magnetic levitation. I warmly recommend the link. 🙂

The electric field in (and from) a conductor

Original post:

This is just a quick post to answer a question of my 16-year old son, Vincent: why are we safe in a car when lightning strikes? What’s the Faraday cage effect really?

He wants to become an engineer, and so I told him what I knew: the electric charges reside at the surface of a conductor and, therefore, a fully-enclosed, all-metallic vehicle is safe. One should just not touch the interior metallic areas, surely not during the strike, but also not after the strike. Why? Because there may still be some residual charge left on the vehicle, even if the metal frame should direct all lightning currents to the ground.

Through the rubber of the tyres? Yes. In fact, it’s the rubber and other insulators that explain why some residual charge might be left. Indeed, the common assumption that, somehow, it’s the rubber that protects the occupants of a car (or that, somehow, rubber soles would insulate us in an electric storm and, hence, make us less likely to get hit) is ridiculous—completely false, really! The following quote from the US National Weather Service is clear enough on that:

“While rubber is an electric insulator, it’s only effective to a certain point. The average lightning bolt carries about 30,000 amps of charge, has 100 million volts of electric potential, and is about 50,000°F. These amounts are several orders of magnitude higher than what humans use on a daily basis and can burn through any insulator—even the ceramic insulators on power lines! Besides, the lightning bolt may just have traveled many miles through the atmosphere, which is a good insulator. Half an inch (or less) of rubber will make no difference.”

So that’s what I told him—sort of. However, I felt my answer (which I tried to get across as I was driving the car, in fact) was superficial and incomplete. So…

Vincent, here’s the full answer! I promise, no integrals or complex numbers. At the same time, it will be not so easy as the physics you learned in school, because I want to teach you something new. 🙂 Just try it. What I want to explain to you is Gauss’ Law. If you manage to go through it, you’ll know all you need to know about electrostatics, and it will make your first undergrad year a lot easier. [Especially that vector equation, as I always felt my math teacher never told me what a vector really was: it’s something physical. :-)]

Forces and fields

You’ve surely seen Coulomb’s Law:

F = k_e·(q₁q₂)·(1/r²₁₂)

The k_e factor is Coulomb’s constant: it is just a constant of proportionality, so it’s there to make the units come out alright. Indeed, Coulomb’s formula is simple enough: it says that the force is directly proportional to the amount of charge and inversely proportional to the square of the distance. That’s all. However, the units in which we measure stuff are not necessarily compatible: we measure distance in meter, electric charge in coulomb, and force in newton. So, if we’d define the newton as the force between two charges of one coulomb separated by a distance of one meter, then we wouldn’t need to put that k_e factor there. But the newton has another definition: one newton is the force needed to accelerate 1 kg at a rate of 1 m/s per second.

Coulomb’s constant is usually written as k_e = 1/4πε₀ in more serious textbooks. Why? Well… You can read my note at the end of this post, but it doesn’t matter right now. It’s much more important to try to understand the vector form of Coulomb’s Law, which is written as:

I used boldface to denote F₁ and F₂ because they are force vectors. Vectors are physical ‘quantities’ with a magnitude (denoted by F₁ and F₂, so no boldface here) and a direction. That direction is given by the unit vector e₁₂ in the equation: it’s a unit vector (so its length is one) from q₂ to q₁. Read again: from q₂ to q₁, not from q₁ to q₂. It’s important to get this one thing right, otherwise you’ll make a mess of the signs. Indeed, in the example below, q₁ and q₂ have the same sign (+) but their sign may differ (so we have a plus and a minus), and the formula above should still work. Check it yourself by doing the drawing for opposite charges.

In fact, my drawing above has a small mistake: F₂ is the same as F₁ but I forgot to put the minus sign: the force on q₂ is F₂ = –F₁. It’s the action = reaction principle, really.
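If you want to check the signs without doing the drawing, here is a small sketch. It assumes the usual vector form, in which the force on q₁ is k_e·q₁q₂/r₁₂² times the unit vector e₁₂ from q₂ to q₁. The function name and the charges of one microcoulomb are my own choices for the example.

```python
import math

K_E = 8.9875517923e9  # Coulomb's constant in N·m²/C² (rounded CODATA value)

def coulomb_force_on_1(q1, r1, q2, r2):
    """Force vector on charge q1 (at position r1) due to charge q2 (at r2)."""
    separation = [a - b for a, b in zip(r1, r2)]      # points from q2 to q1
    distance = math.sqrt(sum(c * c for c in separation))
    e12 = [c / distance for c in separation]          # unit vector from q2 to q1
    magnitude = K_E * q1 * q2 / distance**2           # signed: negative means attraction
    return [magnitude * c for c in e12]

r1, r2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)
f1_like = coulomb_force_on_1(1e-6, r1, 1e-6, r2)      # same sign: pushed away from q2
f2_like = coulomb_force_on_1(1e-6, r2, 1e-6, r1)      # force on the other charge
f1_unlike = coulomb_force_on_1(1e-6, r1, -1e-6, r2)   # opposite signs: pulled towards q2
print(f1_like, f2_like, f1_unlike)
```

With like charges the force on q₁ points away from q₂, with unlike charges it points towards q₂, and in both cases swapping the roles of the two charges just flips the sign of the vector.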

OK. That’s clear. Now you need to learn about the concept of a field: the field is the force per unit charge. So the field at q₁, or the field at point (1), is the force on q₁ divided by q₁. For example, if q₁ is three coulomb, we divide by three. More generally, we write:

So now you know what the field vector E stands for: it is the force on a unit charge we would place in the field. To be clear, a unit charge is +1 unit. We can measure it in coulomb, or the proton charge, or the charge of a quark, or in whatever unit we want, but we’ve been using coulomb so far so let’s stick to that. Just in case you wonder: one coulomb is the charge of approximately 6.241×10¹⁸ protons, so… Yes. That’s quite a lot. 🙂
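Both statements fit in a few lines. This is just a sketch: the force of 6 N on a charge of 3 C is a made-up example, and the value of the elementary charge is the one fixed by the SI definition.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # charge of one proton in coulomb

def field_from_force(force, charge):
    """E = F/q: divide each component of the force vector by the charge."""
    return tuple(component / charge for component in force)

# A (hypothetical) force of 6 N along x on a charge of 3 C
print(field_from_force((6.0, 0.0, 0.0), 3.0))

# How many proton charges make up one coulomb?
protons_per_coulomb = 1.0 / ELEMENTARY_CHARGE
print(protons_per_coulomb)
```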

OK. Next thing.

Gauss’ Law

The field is real. We don’t have to put any charge there. The field is there, and it has energy. [There’s a formula for the energy, but I won’t bother you with that here, because we don’t need it.] The magnitude of the electric field, i.e. the field strength E = |E|, is measured in newton (N) per coulomb (C), so in N/C. In physics, we’ll multiply the field strength with a surface area so we get the so-called flux of the field, which is measured in (N/C)·m². The illustration below (which I took from Feynman’s Lectures) is just as good as any. In fact, we have several surfaces here: we have a closed surface S with several faces, including surface a and b, which are spherical surfaces. The other surfaces of this box are so-called radial faces. The E field coming out of the charge is like a flow, and so the flow going through face a is the same as the flow going through face b: the b face is larger, but the field strength is less.

It is easy to show that the net flux is zero: Coulomb’s Law tells us that the magnitude of E decreases as 1/r² while, from our geometry classes, we know that the surface area increases as r², so their product is the same. So, if the surface area of a is Δa, and the surface area of b is Δb, then E_a·Δa = E_b·Δb and so the net flux through the box is equal to E_b·Δb − E_a·Δa = 0. So the flux of E into face a is just cancelled by the flux out of face b. Needless to say, there is no flux through the radial surfaces. Why? Because the electric force is a radial force.

OK. Let’s look at a more complicated situation:

When calculating the flux through a surface, we need to take the component of E that is normal to the surface, so that’s E_n = E·n = |E|·|n|·cosθ = |E|·cosθ. I am sure you’ve seen that much in your math classes: n is the so-called normal vector, so its length is one and it’s perpendicular to the surface. In any case, the point is: the net flux through this closed surface will still be zero.

Now it’s time for the Big Move. Look at the volume enclosed by the surface S below: we can think of it as completely made up of infinitesimal truncated cones and, for each of these cones, the flux of E from one end of each conical segment will be equal and opposite to the flux from the other end. So the total net flux from the surface S is still zero!

So we have a very general result here:

The (net) flux out of a volume that has no charge(s) in it is zero, always!

You’ll say: so what? Well… It’s a most remarkable result, really. First, it’s not what you’d expect intuitively, and, second, we can now use a clever trick to calculate the flux out of a volume that has some charge(s) in it. Let’s be clever about it. Look at the surface S below: it’s got a point charge q in it. Now we imagine another surface S’ around it: we imagine a little sphere centered on the charge.

From Coulomb’s Law, we know that, if the radius of our little sphere is equal to r, then the field strength E, everywhere on its surface, is equal to:

From your geometry class, you also know that the surface of a sphere is equal to 4πr², so the flux from the surface of our little sphere is just the product of the field and the surface, so we write:

Now, the nice thing is that we can generalize this result for many charges, or for charge distributions, because we can simply add the fields for each of them: E = E₁+ E₂+ E₃+ … That gives us Gauss’ Law:

The flux from any closed surface S = Q_inside/ε₀

Q_inside is, obviously, the sum of the charges inside the volume enclosed by the surface.
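I promised no integrals, but a computer can do the sum for us. The sketch below chops the six faces of a cube into little squares, takes the normal component of the field of a point charge on each of them, and adds it all up. The cube, the positions of the charge and the name `flux_through_cube` are my own choices, and the result is expressed in units of q/ε₀, so Gauss’ Law says we should get 1 for a charge inside and 0 for a charge outside.

```python
import math

def flux_through_cube(charge_pos, half=1.0, n=200):
    """Flux of a point charge's field out of the cube [-half, half]^3,
    divided by q/ε₀ (so the result is a pure number)."""
    cx, cy, cz = charge_pos
    h = 2.0 * half / n                 # side of one little square
    total = 0.0
    for axis in range(3):              # the face is perpendicular to this axis
        for sign in (-1.0, 1.0):       # the two opposite faces
            for i in range(n):
                a = -half + (i + 0.5) * h
                for j in range(n):
                    b = -half + (j + 0.5) * h
                    point = [a, b]
                    point.insert(axis, sign * half)   # center of the little square
                    d = (point[0] - cx, point[1] - cy, point[2] - cz)
                    r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
                    # normal component of the inverse-square field (outward normal)
                    e_normal = sign * d[axis] / r2**1.5
                    total += e_normal * h * h
    return total / (4.0 * math.pi)

inside = flux_through_cube((0.2, 0.1, -0.3))   # charge somewhere inside the box
outside = flux_through_cube((3.0, 0.5, 0.0))   # charge outside the box
print(inside, outside)
```

Note that the charge inside is not at the center of the cube: the flux does not care where the charge sits, as long as it is inside.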

OK. That’s Gauss’ Law. Let’s go back to our car. 🙂

The field in (and from) a conductor

An electrical conductor is a solid that contains many free electrons. Free electrons can move freely around, but cannot leave the surface. When we charge a conductor, the electrons will move around until they have arranged themselves to produce a zero electric field everywhere inside the conductor. It’s the corollary of Gauss’ Law: the (net) flux out of a volume that has no charge(s) in it is zero, always! And so the electrons will arrange themselves in order to make sure that happens.

Think about the dynamics of the situation: as long as there’s some field inside, the charges will keep moving. Fortunately (especially if you’re in a car or a plane hit by lightning!), the re-arrangement happens in a fraction of a second. Hence, if we have some kind of shell, then the field everywhere inside of the shell will be zero, always. In addition, when we charge a conductor, the electrons will push each other away and try to spread as much as possible, so they will reside at the surface of the conductor. In fact, the excess charge of any conductor is, on the average, within one or two atomic layers of the surface only. The situation is illustrated below:

Let me sum up the main conclusions:

The electric field inside the conductor (E₁) is zero. In other words, if a cavity is completely enclosed by a conductor, no distribution of charges outside can ever produce any field inside. But no field is no force, so that’s how the shielding really works!
The electric field just outside the surface of a conductor (E₂) is normal to the surface. There can be no tangential component. If there were a tangential component, the electrons would move along the surface until it was gone.

To be fully complete, the formula for the field just outside the surface of the conductor is E = σ/ε₀, where σ is the local surface charge density. That local surface charge density can be quite high, of course, especially when lightning is involved—but it works! You’re safe in a car!
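We can check the E = σ/ε₀ formula against Coulomb’s Law for one simple case. The sketch assumes an isolated conducting sphere, for which the charge spreads out evenly over the surface and the field outside is the same as that of a point charge at the center (both follow from the symmetry and Gauss’ Law). The charge of one microcoulomb and the radius of 10 cm are made-up numbers.

```python
import math

EPS0 = 8.8541878128e-12  # electric constant in C²/(N·m²), CODATA value

def surface_field(sigma):
    """Field just outside a conductor with local surface charge density sigma."""
    return sigma / EPS0

def point_charge_field(charge, distance):
    """Coulomb field strength of a point charge at the given distance."""
    return charge / (4.0 * math.pi * EPS0 * distance**2)

Q, R = 1e-6, 0.1                      # hypothetical charge (C) and radius (m)
sigma = Q / (4.0 * math.pi * R**2)    # charge per unit area on the sphere
print(surface_field(sigma), point_charge_field(Q, R))
```

The two numbers are the same, as they should be.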

There’s one more point. You may think that you’ve seen a formula like E = σ/ε₀ before: it resembles the formula for the field from a charged sheet, which is easy to calculate from Gauss’ Law. Indeed, if we look at some imaginary rectangular box that cuts through the sheet, as shown below (it’s referred to as a Gaussian surface), then the flux through each of the two faces parallel to the sheet is, once again, the field E times the area A. Now, if the charge density (so the charge per unit area) is σ, then the total charge enclosed in the box is σA. So the total flux, through both faces, must be equal to 2·E·A = σA/ε₀, from which we get: E = σ/2ε₀. So the sheet gives us a field left and right, and it’s only half of what we have for the conductor. For our conductor, we only have the E = σ/ε₀ field outside. So how does it work really?

We only have a field outside the conductor – and, hence, no field inside – because the charges in the immediate neighborhood of a point P on the surface are not the whole story. Those local charges do behave like a little sheet: they give a field σ/2ε₀ on both sides of the surface. But all the other charges on the conductor arrange themselves in such a way as to produce an additional field at P of the same magnitude, which neutralizes the σ/2ε₀ field we’d expect on the inside, and doubles the field on the outside to σ/ε₀. So we have ‘other charges’ here that come into play. The mechanics behind are similar to the mechanics behind the polarization phenomenon. If we have a negative charge density on the surface, we’ll have a positive charge density in the layer below. However, it’s quite complicated and, to analyze it properly, we’d need to analyze the electric properties of matter in more detail, which we won’t do here.

So… When everything is said and done, the phenomenon of ‘shielding’ is extremely complex indeed: it’s all about charges arranging themselves in patterns, and the result is truly remarkable: the fields on the two sides of a closed conducting shell are completely independent—zero on the inside, and E = σ/ε₀ on the outside, with σ the local surface charge density. And it also works the other way around: if we’d have some distribution of charges inside of a closed grounded conductor, those charges would not produce any field outside. So shielding works both ways!

Some closing remarks

A car is not a sphere. Some surfaces may have points or sharp ends, like the object sketched below. Again, the charges will try to spread out as much as possible on the surface, and the tip of a sharp point is as far away as it is possible from most of the surface. Therefore, we should expect the surface density to be very high there. Now, a high charge density means a high field just outside. In fact, if the electric field is too great, air will break down, so we get a discharge. As Feynman explains it:

“Air will break down if the electric field is too great. What happens is that a loose charge (electron, or ion) somewhere in the air is accelerated by the field, and if the field is very great, the charge can pick up enough speed before it hits another atom to be able to knock an electron off that atom. As a result, more and more ions are produced. Their motion constitutes a discharge, or spark. If you want to charge an object to a high potential and not have it discharge itself by sparks in the air, you must be sure that the surface is smooth, so that there is no place where the field is abnormally large.”

It explains why lightning is attracted to pointy objects, so you should stay away from them.

What about planes and lightning? Well… There’s a nice article on that on the Scientific American website. Let me quote a paragraph that sort of sums up what actually happens:

“Although passengers and crew may see a flash and hear a loud noise if lightning strikes their plane, nothing serious should happen because of the careful lightning protection engineered into the aircraft and its sensitive components. Initially, the lightning will attach to an extremity such as the nose or wing tip. The airplane then flies through the lightning flash, which reattaches itself to the fuselage at other locations while the airplane is in the electric “circuit” between the cloud regions of opposite polarity. The current will travel through the conductive exterior skin and structures of the aircraft and exit off some other extremity, such as the tail. Pilots occasionally report temporary flickering of lights or short-lived interference with instruments.”

One more thing perhaps: isn’t it incredible that, even when lightning goes through a car or a plane, it’s only the surface that’s being affected? I mean… It’s fairly easy to see the equilibrium situation, which has the charges on the surface only. But what about the dynamics indeed? 30,000 amps, 100 million volts, and 25,000 to 30,000 degrees Celsius… As lightning strikes, that must go everywhere, no? Well… Yes and no. If there are pointy objects, lightning will effectively burn through them. For an example of the damage of lightning on the nose of an airplane, click this link. 🙂 But then… Well… Let me copy Feynman as he introduces the electric force:

“Consider a force like gravitation which varies predominantly inversely as the square of the distance, but which is about a billion-billion-billion-billion times stronger. And with another difference. There are two kinds of “matter,” which we can call positive and negative. Like kinds repel and unlike kinds attract—unlike gravity where there is only attraction. What would happen? A bunch of positives would repel with an enormous force and spread out in all directions. A bunch of negatives would do the same.”

So that’s what happens. The charges spread out, in a fraction of a second, all away from each other, and so they stay on the surface only, because that’s as far away as they can get from each other. As mentioned above, we’re talking atomic or molecular layers really, so they don’t penetrate, despite the incredible charges and voltages involved. Let me continue the quote—just to illustrate the strength of the forces involved:

“But an evenly mixed bunch of positives and negatives would do something completely different. The opposite pieces would be pulled together by the enormous attractions. The net result would be that the terrific forces would balance themselves out almost perfectly, by forming tight, fine mixtures of the positive and the negative, and between two separate bunches of such mixtures there would be practically no attraction or repulsion at all. […] There is such a force: the electrical force. And all matter is a mixture of positive protons and negative electrons which are attracting and repelling with this great force. So perfect is the balance, however, that when you stand near someone else you don’t feel any force at all. If there were even a little bit of unbalance you would know it. If you were standing at arm’s length from someone and each of you had one percent more electrons than protons, the repelling force would be incredible. How great? Enough to lift the Empire State Building? No! To lift Mount Everest? No! The repulsion would be enough to lift a “weight” equal to that of the entire earth!”

So… Well… That’s it. I’ll close this post with the promised note on Coulomb’s constant and the electric constant, but it’s just an addendum, so you don’t have to read it if you don’t feel like it, Vincent. 🙂

Addendum: Coulomb’s constant and the electric constant

The k_e = 1/4πε₀ factor in Coulomb’s Law is just a constant of proportionality. Coulomb’s formula is simple enough – it says that the force is directly proportional to the amount of charge and inversely proportional to the square of the distance – but it would be a miracle if the units came out alright, wouldn’t it? Indeed, we measure distance in meter, charge in coulomb, and force in newton. Now, we could re-define one of those units so as to get rid of the 1/4πε₀ factor, but that’s not what we’re going to do. Why not? First, the constant of proportionality depends on the medium. Indeed, ε₀ is the so-called permittivity in a vacuum, so that’s in empty space. The constant of proportionality will be different in a gas, and it will be different for different gases and different temperatures and at different pressure. You can check it online if you want – just click the link here for some examples – but I guess you’ll believe me. So, if we write 1/4πε instead of k_e then we can put in a different ε for each medium and our formula is still OK.

Now, because you’re a smart kid, you’ll say that doesn’t quite answer the question: why do we write it as 1/4πε? Why don’t we simply write μ instead of 1/4πε, or just k or a or something? Well… There is an answer to that, but it’s complicated. First, the μ and μ₀ symbols are already used for something else: it’s something similar to ε and ε₀ but then for magnetic fields. To be precise, μ₀ is referred to as the permeability of the vacuum (and μ is just the permeability of some non-vacuum medium, of course). Now, because electricity and magnetism are part of one and the same phenomenon in Nature (when you’re going for engineer, you’ll get one course on electromagnetism, not two separate ones), ε₀ and μ₀ are related. In fact, they’re related through a marvelous formula—a formula like E = mc² in physics or, in math, e^(iπ) + 1 = 0. Don’t try to understand it. Just look at it:

c²ε₀μ₀ = (cε₀)(cμ₀) = 1

Amazing, isn’t it? The c here is the speed of light in a vacuum, obviously. So it’s a physical constant. In other words, unlike ε₀ or μ₀, it’s got nothing to do with proportionality or units: the speed of light is the speed of light no matter what units we use—meters or light-seconds or whatever. OK. Just swallow this and don’t pay too much attention. It’s just a digression, but let me finish it.

The equivalent of Coulomb’s Law in magnetism is Ampère’s Law, and it involves the circulation of a field, as illustrated below. So that’s why Ampère’s Law involves a 2π factor.

In fact, because we’re talking two wires (or two conductors) with currents going through them (I₁ and I₂ respectively), the proportionality constant in Ampère’s Law is written as 2k_A.

Now, I won’t go too much into the detail but the thing about the circulation and that factor 2 in Ampère’s Law results in μ₀ being written as μ₀ = 4π×10⁻⁷ N/A². As for the units: N is newton and A is ampere obviously. And so that’s why we have the 4π in the proportionality constant for Coulomb’s Law as well. And, of course, the (cε₀)(cμ₀) = 1 equation makes it obvious that cε₀ and cμ₀ are reciprocal numbers, so that’s why we write 1/4πε₀ for the proportionality constant in Coulomb’s Law, rather than k_e or a or whatever other simple thing. […] Well… Sort of. In any case, nothing to worry about. 🙂
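For those who like to see the numbers: the sketch below starts from the speed of light and the μ₀ = 4π×10⁻⁷ N/A² value above, and gets ε₀ and Coulomb’s constant out of the c²ε₀μ₀ = 1 relation. One assumption to flag: that μ₀ value was exact by definition in the older SI, while in the current SI it is a measured value that agrees with it to many decimals.

```python
import math

C = 299_792_458.0            # speed of light in m/s
MU0 = 4.0 * math.pi * 1e-7   # permeability of the vacuum in N/A² (older SI definition)

EPS0 = 1.0 / (MU0 * C**2)        # from c²·ε₀·μ₀ = 1
K_E = 1.0 / (4.0 * math.pi * EPS0)   # Coulomb's constant

print(EPS0)                      # in C²/(N·m²)
print(K_E)                       # in N·m²/C²
print((C * EPS0) * (C * MU0))    # the two reciprocal numbers, multiplied
```

Because of the 4π in μ₀, Coulomb’s constant comes out as just c² times 10⁻⁷.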

The Uncertainty Principle and the stability of atoms

Pre-script (dated 26 June 2020): This post did not suffer too much from the attack on this blog by the dark force. It remains relevant. 🙂

Original post:

The Model of the Atom

In one of my posts, I explained the quantum-mechanical model of an atom. Feynman sums it up as follows:

“The electrostatic forces pull the electron as close to the nucleus as possible, but the electron is compelled to stay spread out in space over a distance given by the Uncertainty Principle. If it were confined in too small a space, it would have a great uncertainty in momentum. But that means it would have a high expected energy—which it would use to escape from the electrical attraction. The net result is an electrical equilibrium not too different from the idea of Thompson—only it is the negative charge that is spread out, because the mass of the electron is so much smaller than the mass of the proton.”

This explanation is a bit sloppy, so we should add the following clarification: “The wave function Ψ(r) for an electron in an atom does not describe a smeared-out electron with a smooth charge density. The electron is either here, or there, or somewhere else, but wherever it is, it is a point charge.” (Feynman’s Lectures, Vol. III, p. 21-6)

The two quotes are not incompatible: it is just a matter of defining what we really mean by ‘spread out’. Feynman’s calculation of the Bohr radius of an atom in his introduction to quantum mechanics clears all confusion in this regard:

It is a nice argument. One may object that he gets the right thing out because he puts the right things in – such as the values of e and m, for example 🙂 − but it’s nice nevertheless!

Mass as a Scale Factor for Uncertainty

Having complimented Feynman, I should add that the calculation above does raise an obvious question: why is it that we cannot confine the electron in “too small a space” but that we can do so for the nucleus (which is just one proton in the example of the hydrogen atom here)? Feynman gives the answer above: because the mass of the electron is so much smaller than the mass of the proton.

Huh? What’s the mass got to do with it? The uncertainty is the same for protons and electrons, isn’t it?

Well… It is, and it isn’t. 🙂 The Uncertainty Principle – usually written in its more accurate σ_xσ_p ≥ ħ/2 expression – applies to both the electron and the proton – of course! – but the momentum p is the product of mass and velocity (p = m·v), and so it’s the proton’s mass that makes the difference here. To be specific, the mass of a proton is about 1836 times that of an electron. Now, as long as the velocities involved are non-relativistic—and they are non-relativistic in this case: the (relative) speed of electrons in atoms is given by the fine-structure constant α = v/c ≈ 0.0073, so the Lorentz factor is very close to 1—we can treat the m in the p = m·v identity as a constant and, hence, we can also write: Δp = Δ(m·v) = m·Δv. So all of the uncertainty of the momentum goes into the uncertainty of the velocity. Hence, the mass acts like a reverse scale factor for the uncertainty. To appreciate what that means, let me write ΔxΔp = ħ as:

ΔxΔv = ħ/m

It is an interesting point, so let me expand the argument somewhat. We actually use a more general mathematical property of the standard deviation here: the standard deviation of a variable scales directly with the scale of the variable. Hence, we can write: σ(k·x) = k·σ(x), with k > 0. So the uncertainty is, indeed, smaller for larger masses. Larger masses are associated with smaller uncertainties in their position x. To be precise, the uncertainty is inversely proportional to the mass and, hence, the mass number effectively acts like a reverse scale factor for the uncertainty.
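To put some numbers on the ΔxΔv = ħ/m relation, the sketch below takes the same Δx for both particles, namely the Bohr radius. That choice of Δx is my assumption for the illustration, and the constants are rounded CODATA values.

```python
HBAR = 1.054571817e-34        # reduced Planck constant in J·s
M_ELECTRON = 9.1093837015e-31 # electron mass in kg
M_PROTON = 1.67262192369e-27  # proton mass in kg
C = 299_792_458.0             # speed of light in m/s
BOHR_RADIUS = 5.29177210903e-11  # used here as the spread in position Δx, in m

def velocity_spread(mass, dx):
    """Δv from Δx·Δv = ħ/m."""
    return HBAR / (mass * dx)

dv_electron = velocity_spread(M_ELECTRON, BOHR_RADIUS)
dv_proton = velocity_spread(M_PROTON, BOHR_RADIUS)
print(dv_electron, dv_electron / C)   # in m/s, and as a fraction of c
print(dv_proton)                      # in m/s
print(dv_electron / dv_proton)        # the mass ratio shows up as a scale factor
```

For the electron, Δv comes out at about 0.0073 times c, so that’s the fine-structure constant again. For the proton, the same Δx gives a Δv that is some 1836 times smaller.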

Of course, you’ll say that the uncertainty still applies to both factors on the left-hand side of the equation, and so you’ll wonder: why can’t we keep Δx the same and multiply Δv with m, so its product yields ħ again? In other words, why can’t we have an uncertainty in velocity for the proton that is 1836 times larger than the uncertainty in velocity for the electron? The answer to that question should be obvious: the uncertainty should not be greater than the expected value. When everything is said and done, we’re talking a distribution of some variable here (the velocity variable, to be precise) and, hence, that distribution is likely to be the Maxwell-Boltzmann distribution we introduced in previous posts. Its formula and graph are given below:

In statistics (and in probability theory), they call this a chi distribution with three degrees of freedom and a scale parameter which is equal to a = (kT/m)^1/2. The formula for the scale parameter shows how the mass of a particle indeed acts as a reverse scale parameter. The graph above shows three graphs for a = 1, 2 and 5 respectively. Note the square root though: quadrupling the mass (keeping kT the same) amounts to going from a = 2 to a = 1, so that’s halving a. Indeed, [kT/(4m)]^1/2= (1/2)(kT/m)^1/2. So we can’t just do what we want with Δv (like multiplying it with 1836, as suggested). In fact, the graph and the formulas show that Feynman’s assumption that we can equate p with Δp (i.e. his assumption that “the momenta must be of the order p = ħ/Δx, with Δx the spread in position”), more or less at least, is quite reasonable.

Of course, you are very smart and so you’ll have yet another objection: why can’t we associate a much higher momentum with the proton, as that would allow us to associate higher velocities with the proton? Good question. My answer to that is the following (and it might be original, as I didn’t find this anywhere else). When everything is said and done, we’re talking two particles in some box here: an electron and a proton. Hence, we should assume that the average kinetic energy of our electron and our proton is the same (if not, they would be exchanging kinetic energy until it’s more or less equal), so we write <m_electron·v²_electron/2> = <m_proton·v²_proton/2>. We can re-write this as m_p/m_e = 1836 = <v²_e>/<v²_p> and, therefore, <v²_e> = 1836·<v²_p>. Now, <v²> ≠ <v>² and, hence, <v> ≠ √<v²>. So the equality, taken on its own, does not tell us that the expected velocity of the electron is √1836 ≈ 43 times the expected velocity of the proton. Indeed, because of the particularities of the distribution, there is a difference between (a) the most probable speed, which is equal to √2·a ≈ 1.414·a, (b) the root mean square speed, which is equal to √<v²> = √3·a ≈ 1.732·a, and, finally, (c) the mean or expected speed, which is equal to <v> = 2·(2/π)^1/2·a ≈ 1.596·a.
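If you want to check the three characteristic speeds for yourself, the sketch below integrates the Maxwell-Boltzmann speed density numerically. I am assuming the standard form of the density, f(v) = (2/π)^1/2·v²·exp(−v²/2a²)/a³, and the helper names (mb_pdf, moment) are mine; the integration is a plain midpoint rule, so take it as a rough check rather than a careful calculation.

```python
import math

def mb_pdf(v, a):
    """Maxwell-Boltzmann speed density with scale parameter a = sqrt(kT/m)."""
    return math.sqrt(2.0 / math.pi) * v**2 * math.exp(-v**2 / (2.0 * a**2)) / a**3

def moment(power, a, steps=20000):
    """Midpoint-rule estimate of <v^power> over 0..12a (the tail beyond is negligible)."""
    dv = 12.0 * a / steps
    return sum(((i + 0.5) * dv)**power * mb_pdf((i + 0.5) * dv, a)
               for i in range(steps)) * dv

a = 1.0
mean_speed = moment(1, a)                 # compare with 2·(2/π)^1/2·a
rms_speed = math.sqrt(moment(2, a))       # compare with √3·a

# Most probable speed: locate the maximum of the density on a fine grid
grid = [i * 0.0001 for i in range(1, 50000)]
most_probable = max(grid, key=lambda v: mb_pdf(v, a))   # compare with √2·a

print(f"most probable = {most_probable:.3f}·a")
print(f"mean          = {mean_speed:.3f}·a")
print(f"rms           = {rms_speed:.3f}·a")
```

All three speeds are proportional to a, so changing a only rescales the horizontal axis of the distribution.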

However, we are not far off. We could use any of these three values to roughly approximate Δv, as well as the scale parameter a itself: our answers would all be of the same order. However, to keep the calculations simple, let’s use the most probable speed. Let’s equate our electron mass with unity, so the mass of our proton is 1836. Now, such mass implies a scale factor (i.e. a) that’s √1836 ≈ 43 times smaller. So the most probable speed of the proton and, therefore, its spread, would be about √2/√1836 = √(2/1836) ≈ 0.033 that of the electron, so we write: Δv_p ≈ 0.033·Δv_e. Now we can insert this in our ΔxΔv = ħ/m = ħ/1836 identity. We get: Δx_pΔv_p = Δx_p·√(2/1836)·Δv_e = ħ/1836. That, in turn, implies that √(2·1836)·Δx_p = ħ/Δv_e, which we can re-write as: Δx_p = Δx_e/√(2·1836) ≈ Δx_e/60. In other words, the expected spread in the position of the proton is about 60 times smaller than the expected spread of the electron. More generally, we can say that the spread in position of a particle, keeping all else equal, is inversely proportional to (2m)^1/2. Indeed, in this case, we multiplied the mass by about 1800, and we found that the uncertainty in position went down by a factor of 60 = √3600. Not bad as a result! Is it precise? Well… It could be like √3·√m or 2·(2/π)^1/2·√m depending on our definition of ‘uncertainty’, but it’s all of the same order. So… Yes. Not bad at all… 🙂
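The arithmetic of this paragraph is easy to re-trace. The sketch below just re-does the numbers in units where ħ = 1 and the electron mass is 1 (my choice, to keep things readable); the variable names are mine too.

```python
import math

mass_ratio = 1836.0                                 # proton mass, with the electron mass set to 1

scale_factor_ratio = 1.0 / math.sqrt(mass_ratio)    # a_proton / a_electron, about 1/43
speed_ratio = math.sqrt(2.0 / mass_ratio)           # the 0.033 factor used in the text
position_ratio = math.sqrt(2.0 * mass_ratio)        # Δx_electron / Δx_proton, about 60

# Consistency check: with Δx_e·Δv_e = ħ = 1, the proton product must equal ħ/1836
proton_product = (1.0 / position_ratio) * speed_ratio

print(f"a_p/a_e       = 1/{1.0 / scale_factor_ratio:.1f}")
print(f"Δv_p/Δv_e     = {speed_ratio:.4f}")
print(f"Δx_e/Δx_p     = {position_ratio:.1f}")
print(f"Δx_p·Δv_p     = 1/{1.0 / proton_product:.0f}")
```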

You’ll raise a third objection now: the radius of a proton is measured using the femtometer scale, so that’s expressed in 10⁻¹⁵m, which is not 60 but a million times smaller than the nanometer (i.e. 10⁻⁹m) scale used to express the Bohr radius as calculated by Feynman above. You’re right, but the 10⁻¹⁵m number is the charge radius, not the uncertainty in position. Indeed, the so-called classical electron radius is also measured in femtometer and, hence, the Bohr radius is also like a million times that number. OK. That should settle the matter. I need to move on.

Before I do move on, let me relate the observation (i.e. the fact that the uncertainty in regard to position decreases as the mass of a particle increases) to another phenomenon. As you know, the interference of light beams is easy to observe. Hence, the interference of photons is easy to observe: Young’s experiment involved a slit of 0.85 mm (so almost 1 mm) only. In contrast, the 2012 double-slit experiment with electrons involved slits that were 62 nanometers wide, i.e. 62 billionths of a meter! That’s because the associated frequencies are so much higher and, hence, the wave zone is much smaller. So much, in fact, that Feynman could not imagine technology would ever be sufficiently advanced so as to actually carry out the double slit experiment with electrons. It’s an aspect of the same: the uncertainty in position is much smaller for electrons than it is for photons. Who knows: perhaps one day, we’ll be able to do the experiment with protons. 🙂 For further detail, I’ll refer you to one of my posts on this.

What’s Explained, and What’s Left Unexplained?

There is another obvious question: if the electron is still some point charge, and going around as it does, why doesn’t it radiate energy? Indeed, the Rutherford-Bohr model had to be discarded because this ‘planetary’ model involved circular (or elliptical) motion and, therefore, some acceleration. According to classical theory, the electron should thus emit electromagnetic radiation, as a result of which it would radiate its kinetic energy away and, therefore, spiral in toward the nucleus. The quantum-mechanical model doesn’t explain this either, does it?

I can’t answer this question as yet, as I still need to go through all of Feynman’s Lectures on quantum mechanics. You’re right. There’s something odd about the quantum-mechanical idea: it still involves an electron moving in some kind of orbital – although I hasten to add that the wavefunction is a complex-valued function, not some real function – but it does not involve any loss of kinetic energy due to circular motion apparently!

There are other unexplained questions as well. For example, the idea of an electrical point charge still needs to be reconciled with the mathematical inconsistencies it implies, as Feynman points out himself in yet another of his Lectures.

Finally, you’ll wonder as to the difference between a proton and a positron: if a positron and an electron annihilate each other in a flash, why do we have a hydrogen atom at all? Well… The proton is not the electron’s anti-particle. For starters, it’s made of quarks, while the positron is made of… Well… A positron is a positron: it’s elementary. But, yes, interesting question, and the ‘mechanics’ behind the mutual destruction are quite interesting and, hence, surely worth looking into—but not here. 🙂

Having mentioned a few things that remain unexplained, the model does have the advantage of solving plenty of other questions. It explains, for example, why the electron and the proton are actually right on top of each other, as they should be according to classical electrostatic theory, and why they are not at the same time: the electron is still a sort of ‘cloud’ indeed, with the proton at its center.

The quantum-mechanical ‘cloud’ model of the electron also explains why “the terrific electrical forces balance themselves out, almost perfectly, by forming tight, fine mixtures of the positive and the negative, so there is almost no attraction or repulsion at all between two separate bunches of such mixtures” (Richard Feynman, Introduction to Electromagnetism, p. 1-1) or, to quote from one of his other writings, why we do not fall through the floor as we walk:

“As we walk, our shoes with their masses of atoms push against the floor with its mass of atoms. In order to squash the atoms closer together, the electrons would be confined to a smaller space and, by the uncertainty principle, their momenta would have to be higher on the average, and that means high energy; the resistance to atomic compression is a quantum-mechanical effect and not a classical effect. Classically, we would expect that if we were to draw all the electrons and protons closer together, the energy would be reduced still further, and the best arrangement of positive and negative charges in classical physics is all on top of each other. This was well known in classical physics and was a puzzle because of the existence of the atom. Of course, the early scientists invented some ways out of the trouble—but never mind, we have the right way out, now!”

So that’s it, then. Except… Well…

The Fine-Structure Constant

When talking about the stability of atoms, one cannot escape a short discussion of the so-called fine-structure constant, denoted by α (alpha). I discussed it in another post of mine, so I’ll refer you there for a more comprehensive overview. I’ll just remind you of the basics:

(1) α is the square of the electron charge expressed in Planck units: α = e_P².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(r_e/r). You’ll see this more often written as r_e = α²r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio of two lengths and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 2.998×10⁸ m/s, i.e. c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) Finally, α is also equal to the product of (a) the electron mass (which I’ll simply write as m_e here) and (b) the classical electron radius r_e (if both are expressed in Planck units): α = m_e·r_e. [I think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics.]

Note that, from (2) and (4), we also find that:

(5) The electron mass (in Planck units) is equal to m_e = α/r_e = α/(α²r) = 1/(αr). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to m_e = α/r_e = e_P²/r_e. Using the Bohr radius, we get m_e = 1/αr = 1/e_P²r.
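These identities are easy to check numerically. The sketch below uses rounded CODATA values (my inputs, not taken from this post) and the standard SI definitions of α, the classical electron radius and the Bohr radius. For relation (4), I use the fact that one Planck mass times one Planck length equals ħ/c, so the product m_e·r_e in Planck units is just m_e·r_e·c/ħ and the gravitational constant drops out. I am also assuming the Planck-unit convention in which 4πε₀ = 1, which is what α = e_P² requires.

```python
import math

# Rounded CODATA values (assumed inputs)
E_CHARGE = 1.602176634e-19    # elementary charge (C)
EPS0 = 8.8541878128e-12       # vacuum permittivity (C²/(N·m²))
HBAR = 1.054571817e-34        # reduced Planck constant (J·s)
C = 299792458.0               # speed of light (m/s)
M_E = 9.1093837015e-31        # electron mass (kg)

coulomb = E_CHARGE**2 / (4.0 * math.pi * EPS0)   # e²/(4πε₀), in J·m

alpha = coulomb / (HBAR * C)                 # fine-structure constant
r_classical = coulomb / (M_E * C**2)         # classical electron radius r_e
r_bohr = HBAR**2 / (M_E * coulomb)           # Bohr radius r
v_bohr = coulomb / HBAR                      # electron speed in the Bohr ground state

radius_ratio = r_classical / r_bohr              # relation (2): should equal α²
speed_ratio = v_bohr / C                         # relation (3): should equal α
mass_times_radius = M_E * r_classical * C / HBAR # relation (4): m_e·r_e in Planck units

print(f"1/α            = {1.0 / alpha:.3f}")
print(f"√(r_e/r)       = {math.sqrt(radius_ratio):.7f}")
print(f"v/c            = {speed_ratio:.7f}")
print(f"m_e·r_e (Planck) = {mass_times_radius:.7f}")
```

Relations (5) and (6) follow from (2) and (4) by substitution, so they need no separate check.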

In addition, in the mentioned post, I also related α to the so-called coupling constant determining the strength of the interaction between electrons and photons. So… What a magical number indeed! It suggests some unity that our little model of the atom above doesn’t quite capture. As far as I am concerned, it’s one of the many other ‘unexplained questions’, and one of my key objectives, as I struggle through Feynman’s Lectures, is to understand it all. 🙂 One of the issues is, of course, how to relate this coupling constant to the concept of a gauge, which I briefly discussed in my previous post. In short, I’ve still got a long way to go… 😦

Post Scriptum: The de Broglie relations and the Uncertainty Principle

My little exposé on mass being nothing but a scale factor in the Uncertainty Principle is a good occasion to reflect on the Uncertainty Principle once more. Indeed, what’s the uncertainty about, if it’s not about the mass? It’s about the position in space and velocity, i.e. it’s movement and time. Velocity or speed (i.e. the magnitude of the velocity vector) is, in turn, defined as the distance traveled divided by the time of travel, so the uncertainty is about time as well, as evidenced from the ΔEΔt = h expression of the Uncertainty Principle. But how does it work exactly?

Hmm… Not sure. Let me try to remember the context. We know that the de Broglie relation, λ = h/p, which associates a wavelength (λ) with the momentum (p) of a particle, is somewhat misleading, because we’re actually associating a (possibly infinite) bunch of component waves with a particle. So we’re talking some range of wavelengths (Δλ) and, hence, assuming all these component waves travel at the same speed, we’re also talking a frequency range (Δf). The bottom line is that we’ve got a wave packet and we need to distinguish the velocity of its phase (v_p) versus the group velocity (v_g), which corresponds to the classical velocity of our particle.

I think I explained that pretty well in one of my previous posts on the Uncertainty Principle, so I’d suggest you have a look there. The mentioned post explains how the Uncertainty Principle relates position (x) and momentum (p) as a Fourier pair, and it also explains that general mathematical property of Fourier pairs: the more ‘concentrated’ one distribution is, the more ‘spread out’ its Fourier transform will be. In other words, it is not possible to arbitrarily ‘concentrate’ both distributions, i.e. both the distribution of x (which I denoted as Ψ(x)) as well as its Fourier transform, i.e. the distribution of p (which I denoted by Φ(p)). So, if we’d ‘squeeze’ Ψ(x), then its Fourier transform Φ(p) will ‘stretch out’.
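The ‘squeeze one, stretch the other’ property can be seen in a small numerical experiment. The sketch below is my own illustration, under a few assumptions: I take a Gaussian wave packet exp(−x²/4σ²), I work with the wavenumber k = p/ħ instead of p, and I compute the Fourier transform with a brute-force sum rather than an FFT. For a Gaussian, the product of the two spreads should come out at 1/2, i.e. σ_xσ_p = ħ/2.

```python
import cmath
import math

def spread(values, weights):
    """Standard deviation of `values` under the (unnormalized) `weights`."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum((v - mean)**2 * w for v, w in zip(values, weights)) / total
    return math.sqrt(variance)

def spreads_for(sigma, n=241, half_width=8.0):
    """Return (σ_x, σ_k) for the Gaussian packet ψ(x) = exp(−x²/(4σ²))."""
    # Position grid covering ±half_width·σ
    xs = [(-half_width + 2.0 * half_width * i / (n - 1)) * sigma for i in range(n)]
    psi = [math.exp(-x**2 / (4.0 * sigma**2)) for x in xs]
    dx = xs[1] - xs[0]
    # Wavenumber grid covering ±half_width·(1/(2σ))
    ks = [(-half_width + 2.0 * half_width * i / (n - 1)) / (2.0 * sigma) for i in range(n)]
    # Brute-force Fourier transform φ(k) = ∫ψ(x)·exp(−i·k·x)dx as a Riemann sum
    phi = [sum(p * cmath.exp(-1j * k * x) for p, x in zip(psi, xs)) * dx for k in ks]
    sigma_x = spread(xs, [p * p for p in psi])
    sigma_k = spread(ks, [abs(f)**2 for f in phi])
    return sigma_x, sigma_k

narrow = spreads_for(0.5)   # a 'squeezed' packet
wide = spreads_for(2.0)     # a 'stretched' packet

for label, (sx, sk) in (("narrow", narrow), ("wide", wide)):
    print(f"{label}: σ_x = {sx:.3f}, σ_k = {sk:.3f}, product = {sx * sk:.3f}")
```

The narrow packet in x comes with the wide distribution in k, and vice versa, while the product stays put.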

That was clear enough—I hope! But how do we go from ΔxΔp = h to ΔEΔt = h? Why are energy and time another Fourier pair? To answer that question, we need to clearly define what energy and what time we are talking about. The argument revolves around the second de Broglie relation: E = h·f. How do we go from the momentum p to the energy E? And how do we go from the wavelength λ to the frequency f?

The answer to the first question is the energy-mass equivalence: E = mc², always. This formula is relativistic, as m is the relativistic mass, so it includes the rest mass m₀ as well as the equivalent mass of its kinetic energy (m₀v²/2 + …). [Note, indeed, that the kinetic energy – defined as the excess energy over its rest energy – is a rapidly converging series of terms, so only the m₀v²/2 term is mentioned.] Likewise, momentum is defined as p = mv, always, with m the relativistic mass, i.e. m = (1−v²/c²)^−1/2·m₀ = γ·m₀, with γ the Lorentz factor. The E = mc² and p = mv relations combined give us the E/c = m·c = p·c/v or E·v/c = p·c relationship, which we can also write as E/p = c²/v. However, we’ll need to write E as a function of p for the purpose of a derivation. You can verify that E² − p²c² = m₀²c⁴ and, hence, that E = (p²c² + m₀²c⁴)^1/2.

Now, to go from a wavelength to a frequency, we need the wave velocity, and we’re obviously talking the phase velocity here, so we write: v_p = λ·f. That’s where the de Broglie hypothesis comes in: de Broglie just assumed the Planck-Einstein relation E = h·ν, in which ν is the frequency of a massless photon, would also be valid for massive particles, so he wrote: E = h·f. It’s just a hypothesis, of course, but it makes everything come out alright. More in particular, the phase velocity v_p = λ·f can now be re-written, using both de Broglie relations (i.e. h/p = λ and E/h = f) as v_p = (h/p)·(E/h) = E/p = c²/v. Now, because v is always smaller than c for massive particles (and usually very much smaller), we’re talking a superluminal phase velocity here! However, because it doesn’t carry any signal, it’s not inconsistent with relativity theory.

Now what about the group velocity? To calculate the group velocity, we need the frequencies and wavelengths of the component waves. The dispersion relation assumes the frequency of each component wave can be expressed as a function of its wavelength, so f = f(λ). Now, it takes a bit of wave mechanics (which I won’t elaborate on here) to show that the group velocity is the derivative of f with respect to 1/λ (which is the same as the derivative of the angular frequency ω = 2πf with respect to the wavenumber k = 2π/λ), so we write v_g = ∂f/∂(1/λ). Using the two de Broglie relations, we get: v_g = ∂f/∂(1/λ) = ∂(E/h)/∂(p/h) = ∂E/∂p = ∂[(p²c² + m₀²c⁴)^1/2]/∂p. Now, when you write it all out, you should find that v_g = pc²/E = c²/v_p = v, so that’s the classical velocity of our particle once again.
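If you don’t feel like writing out the derivative, you can let the computer do it. The sketch below works in units where c = 1 and m₀ = 1 (my choice, to avoid very small numbers) and takes the derivative ∂E/∂p numerically with a central difference. The function names are mine.

```python
import math

def energy(p, m0=1.0):
    """E = (p² + m0²)^1/2 in units where c = 1."""
    return math.sqrt(p * p + m0 * m0)

def phase_and_group_velocity(v, m0=1.0):
    """Return (v_phase, v_group) for a particle with classical velocity v (as a fraction of c)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)      # Lorentz factor
    p = gamma * m0 * v                        # relativistic momentum
    v_phase = energy(p, m0) / p               # E/p
    dp = 1e-3 * p                             # small step for the central difference
    v_group = (energy(p + dp, m0) - energy(p - dp, m0)) / (2.0 * dp)   # ∂E/∂p
    return v_phase, v_group

results = {v: phase_and_group_velocity(v) for v in (0.0073, 0.5, 0.9)}

for v, (v_phase, v_group) in results.items():
    print(f"v = {v}: v_phase = {v_phase:.4f}, v_group = {v_group:.6f}, "
          f"product = {v_phase * v_group:.6f}")
```

The group velocity reproduces the classical velocity v, the phase velocity is larger than 1 (i.e. larger than c), and their product is c² = 1 in these units.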

Phew! Complicated! Yes. But so we still don’t have our ΔEΔt = h expression! All of the above tells us how we can associate a range of momenta (Δp) with a range of wavelengths (Δλ) and, in turn, with a frequency range (Δf) which then gives us some energy range (ΔE), so the logic is like:

Δp ⇒ Δλ ⇒ Δf ⇒ ΔE

Somehow, the same sequence must also ‘transform’ our Δx into Δt. I googled a bit, but I couldn’t find any clear explanation. Feynman doesn’t seem to have one in his Lectures either so, frankly, I gave up. What I did do in one of my previous posts, is to give some interpretation. However, I am not quite sure if it’s really the interpretation: there are probably several ones. It must have something to do with the period of a wave, but I’ll let you break your head over it. 🙂 As far as I am concerned, it’s just one of the other unexplained questions I have as I sort of close my study of ‘classical’ physics. So I’ll just make a mental note of it. [Of course, please don’t hesitate to send me your answer, if you’d have one!] Now it’s time to really dig into quantum mechanics, so I should really stay silent for quite a while now! 🙂

Maxwell, Lorentz, gauges and gauge transformations

Pre-script (dated 26 June 2020): This post got severely mutilated by the removal of material by the dark force. It may, therefore, be difficult to follow the main story-line.

Original post:

I’ve done quite a few posts already on electromagnetism. They were all focused on the math one needs to understand Maxwell’s equations. Maxwell’s equations are a set of (four) differential equations, so they relate some function with its derivatives. To be specific, they relate E and B, i.e. the electric and magnetic field vector respectively, with their derivatives in space and in time. [Let me be explicit here: E and B have three components, but depend on both space as well as time, so we have three dependent and four independent variables for each function: E = (E_x, E_y, E_z) = E(x, y, z, t) and B = (B_x, B_y, B_z) = B(x, y, z, t).] That’s simple enough to understand, but the dynamics involved are quite complicated, as illustrated below.

I now want to do a series on the more interesting stuff, including an exploration of the concept of gauge in field theory, and I also want to show how one can derive the wave equation for electromagnetic radiation from Maxwell’s equations. Before I start, let’s recall the basic concept of a field.

The reality of fields

I said a couple of times already that (electromagnetic) fields are real. They’re more than just a mathematical structure. Let me show you why. Remember the formula for the electrostatic potential caused by some charge q at the origin:

Φ(r) = q/(4πε₀r)

We know that the (negative) gradient of this function, at any point in space, gives us the electric field vector at that point: E = –∇Φ. [The minus sign is there because of convention: we take the reference point Φ = 0 at infinity.] Now, the electric field vector gives us the force on a unit charge (i.e. the charge of a proton) at that point. If q is some positive charge, the force will be repulsive, and the unit charge will accelerate away from our q charge at the origin. Hence, energy will be expended, as force over distance implies work is being done: as the charges separate, potential energy is converted into kinetic energy. Where does the energy come from? The energy conservation law tells us that it must come from somewhere.

It does: the energy comes from the field itself. Bringing in more or bigger charges (from infinity, or just from further away) requires more energy. So the new charges change the field and, therefore, its energy. How exactly? That’s given by Gauss’ Law: the total flux out of a closed surface is equal to:

∫_S E·n da = Σq_i/ε₀ (summing over all charges q_i inside the surface S)

You’ll say: flux and energy are two different things. Well… Yes and no. The energy in the field depends on E. Indeed, the formula for the energy density in space (i.e. the energy per unit volume) is

u = ε₀·E·E/2 = ε₀E²/2

Getting the energy over a larger space is just another integral, with the energy density as the integral kernel:

U = ∫ (ε₀/2)·E·E dV

Feynman’s illustration below is not very sophisticated but, as usual, enlightening. 🙂

Gauss’ Theorem connects both the math as well as the physics of the situation and, as such, underscores the reality of fields: the energy is not in the electric charges. The energy is in the fields they produce. Everything else is just the principle of superposition of fields – i.e. E = E₁+ E₂– coming into play. I’ll explain Gauss’ Theorem in a moment. Let me first make some additional remarks.

First, the formulas are valid for electrostatics only (so E and B only vary in space, not in time), so they’re just a piece of the larger puzzle. 🙂 As for now, however, note that, if a field is real (or, to be precise, if its energy is real), then the flux is equally real.

Second, let me say something about the units. Field strength (E or, in this case, its normal component E_n = E·n) is measured in newton (N) per coulomb (C), so in N/C. The integral above implies that flux is measured in (N/C)·m². It’s a weird unit because one associates flux with flow and, therefore, one would expect flux is some quantity per unit time and per unit area, so we’d have the m² unit (and the second) in the denominator, not in the numerator. But so that’s true for heat transfer, for mass transfer, for fluid dynamics (e.g. the amount of water flowing through some cross-section) and many other physical phenomena. But for electric flux, it’s different. You can do a dimensional analysis of the expression above: the sum of the charges is expressed in coulomb (C), and the electric constant (i.e. the vacuum permittivity) is expressed in C²/(N·m²), so, yes, it works: C/[C²/(N·m²)] = (N/C)·m². To make sense of the units, you should think of the flux as the total flow, and of the field strength as a surface density, so that’s the flux divided by the total area, so (field strength) = (flux)/(area). Conversely, (flux) = (field strength)×(area). Hence, the unit of flux is [flux] = [field strength]×[area] = (N/C)·m².
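To see the (flux) = (field strength)×(area) idea at work, the sketch below adds up E_n times area over the six faces of a cube, for a point charge that sits somewhere inside the cube and then for one outside. Everything in it is my own illustration: the charge, its positions and the cube are arbitrary, and the surface integral is a simple midpoint rule. According to Gauss’ Law, the flux should be the enclosed charge divided by ε₀, wherever the charge sits inside, and zero when it is outside.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (C²/(N·m²)), rounded CODATA value

def e_field(point, charge, source):
    """Coulomb field (N/C) at `point` due to a point charge located at `source`."""
    d = [p - s for p, s in zip(point, source)]
    r = math.sqrt(sum(c * c for c in d))
    k = charge / (4.0 * math.pi * EPS0 * r**3)
    return [k * c for c in d]

def flux_through_cube(charge, source, half=1.0, n=100):
    """Midpoint-rule flux of E out of the cube [-half, half]³, in (N/C)·m²."""
    h = 2.0 * half / n
    total = 0.0
    for axis in range(3):                 # x, y and z faces
        for sign in (-1.0, 1.0):          # the two opposite faces along that axis
            for i in range(n):
                for j in range(n):
                    point = [-half + (i + 0.5) * h, -half + (j + 0.5) * h]
                    point.insert(axis, sign * half)
                    # outward normal component E_n times the little area h²
                    total += sign * e_field(point, charge, source)[axis] * h * h
    return total

Q = 1.0e-9                                           # one nanocoulomb (arbitrary)
expected = Q / EPS0                                  # Gauss' Law, in (N/C)·m²
flux_inside = flux_through_cube(Q, (0.2, -0.1, 0.3)) # charge inside the cube
flux_outside = flux_through_cube(Q, (3.0, 0.0, 0.0)) # charge outside the cube

print(f"q/ε₀           = {expected:.3f} (N/C)·m²")
print(f"flux (inside)  = {flux_inside:.3f} (N/C)·m²")
print(f"flux (outside) = {flux_outside:.3f} (N/C)·m²")
```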

OK. Now we’re ready for Gauss’ Theorem. 🙂 I’ll also say something about its corollary, Stokes’ Theorem. It’s a bit of a mathematical digression but necessary, I think, for a better understanding of all those operators we’re going to use.

Gauss’ Theorem

The concept of flux is related to the divergence of a vector field through Gauss’ Theorem. Gauss’ Theorem has nothing to do with Gauss’ Law, except that both are associated with the same genius. Gauss’ Theorem is:

∫_S C·n da = ∫_V ∇·C dV

The ∇·C in the integral on the right-hand side is the divergence of a vector field. It’s the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.

Huh? What’s a volume density? Good question. Just substitute C for E in the surface and volume integral above (the integral on the left is a surface integral, and the one on the right is a volume integral), and think about the meaning of what’s written. To help you, let me also include the concept of linear density, so we have (1) linear, (2) surface and (3) volume density. Look at that representation of a vector field once again: we said the density of lines represented the magnitude of E. But what density? The representation hereunder is flat, so we can think of a linear density indeed, measured along the blue line: so the flux would be six (that’s the number of lines), and the linear density (i.e. the field strength) is six divided by the length of the blue line.

However, we defined field strength as a surface density above, so that’s the flux (i.e. the number of field lines) divided by the surface area (i.e. the area of a cross-section): think of the square of the blue line, and field lines going through that square. That’s simple enough. But what’s volume density? How do we count the number of lines inside of a box? The answer is: mathematicians actually define it for an infinitesimally small cube, by adding the fluxes out of its six individual faces:

flux out of the cube = (∂C_x/∂x + ∂C_y/∂y + ∂C_z/∂z)·ΔxΔyΔz

So, the truth is: volume density is actually defined as a surface density, but for an infinitesimally small volume element. That, in turn, gives us the meaning of the divergence of a vector field. Indeed, the sum of the derivatives above is just ∇·C (i.e. the divergence of C), and ΔxΔyΔz is the volume of our infinitesimal cube, so the divergence of some field vector C at some point P is the flux – i.e. the outgoing ‘flow’ of C – per unit volume, in the neighborhood of P, as evidenced by writing

flux out of ΔV = ∫ C·n da = (∇·C)·ΔV

Indeed, just bring ΔV to the other side of the equation to check the ‘per unit volume’ aspect of what I wrote above. The whole idea is to determine whether the small volume is like a sink or like a source, and to what extent. Think of the field near a point charge, as illustrated below. Look at the black lines: they are the field lines (the dashed lines are equipotential lines) and note how the positive charge is a source of flux, obviously, while the negative charge is a sink.
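The ‘flux per unit volume’ reading of the divergence can be checked on a small cube. In the sketch below, the vector field is an arbitrary polynomial example of my own choosing, and I approximate the flux through each face by the value of the normal component at the center of the face times the area of the face.

```python
def field(x, y, z):
    """An arbitrary example field C = (x²y, yz, xz²), chosen only for illustration."""
    return (x * x * y, y * z, x * z * z)

def divergence_exact(point):
    """∇·C for the example field, worked out by hand: 2xy + z + 2xz."""
    x, y, z = point
    return 2.0 * x * y + z + 2.0 * x * z

def flux_per_unit_volume(point, h=1e-3):
    """Net outward flux through a cube of side h centered on `point`, divided by h³."""
    x, y, z = point
    half = h / 2.0
    area = h * h
    flux = 0.0
    flux += (field(x + half, y, z)[0] - field(x - half, y, z)[0]) * area   # two x faces
    flux += (field(x, y + half, z)[1] - field(x, y - half, z)[1]) * area   # two y faces
    flux += (field(x, y, z + half)[2] - field(x, y, z - half)[2]) * area   # two z faces
    return flux / h**3

P = (1.0, 2.0, 3.0)
numeric = flux_per_unit_volume(P)
exact = divergence_exact(P)
print(f"flux per unit volume = {numeric:.6f}, divergence = {exact:.6f}")
```

A positive result means the little cube acts like a source at P, a negative one means it acts like a sink.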

Now, the next step is to acknowledge that the total flux from a volume is the sum of the fluxes out of each part. Indeed, the flux through the part of the surfaces common to two parts will cancel each other out. Feynman illustrates that with a rough drawing (below) and I’ll refer you to his Lecture on it for more detail.

So… Combining all of the gymnastics above – and integrating the divergence over an entire volume, indeed – we get Gauss’ Theorem:

∫_S C·n da = ∫_V ∇·C dV

Stokes’ Theorem

There is a similar theorem involving the circulation of a vector, rather than its flux. It’s referred to as Stokes’ Theorem. Let me jot it down:

∮_Γ C·ds = ∫_S (∇×C)_n da

We have a contour integral here (left) and a surface integral (right). The reasoning behind it is quite similar: a surface bounded by some loop Γ is divided into infinitesimally small squares, and the circulation around Γ is the sum of the circulations around the little loops. We should take care though: the surface integral takes the normal component of ∇×C, so that’s (∇×C)_n = (∇×C)·n. The illustrations below should help you to understand what’s going on.
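A numerical check of Stokes’ Theorem may help too. The sketch below is my own example: a field C = (−y³, x³, 0), the unit circle in the xy-plane as the loop Γ, and the unit disk as the surface. Both integrals are done with a midpoint rule, and both should come out at 3π/2.

```python
import math

def field(x, y):
    """In-plane components of the example field C = (−y³, x³, 0)."""
    return (-y**3, x**3)

def curl_z(x, y):
    """Normal (z) component of ∇×C for the example field: 3x² + 3y²."""
    return 3.0 * x**2 + 3.0 * y**2

def circulation(radius=1.0, n=2000):
    """Contour integral of C·ds around a circle, counter-clockwise."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t) * dt, radius * math.cos(t) * dt
        cx, cy = field(x, y)
        total += cx * dx + cy * dy
    return total

def curl_flux(radius=1.0, n_r=400, n_t=200):
    """Surface integral of (∇×C)_n over the disk, in polar coordinates."""
    dr, dt = radius / n_r, 2.0 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dt
            total += curl_z(r * math.cos(t), r * math.sin(t)) * r * dr * dt
    return total

loop_integral = circulation()
surface_integral = curl_flux()
print(f"circulation  = {loop_integral:.5f}")
print(f"flux of curl = {surface_integral:.5f}")
print(f"3π/2         = {1.5 * math.pi:.5f}")
```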

The electric versus the magnetic force

There’s more than just the electric force: we also have the magnetic force. The so-called Lorentz force is the combination of both. The formula for the force on some charge q in an electromagnetic field is:

F = q(E + v×B)

Hence, if the velocity vector v is not equal to zero, we need to look at the magnetic field vector B too! The simplest situation is magnetostatics, so let’s first have a look at that.

Magnetostatics implies that the flux of E doesn’t change, so Maxwell’s third equation reduces to c²∇×B = j/ε₀. So we just have a steady electric current (j): no accelerating charges. Maxwell’s fourth equation, ∇•B = 0, remains what it was: there’s no such thing as a magnetic charge. The Lorentz force also remains what it is, of course: F = q(E+v×B) = qE +qv×B. Also note that the v, j and the lack of a magnetic charge all point to the same: magnetism is just a relativistic effect of electricity.

What about units? Well… While the unit of E, i.e. the electric field strength, is pretty obvious from the F = qE term – hence, E = F/q, and so the unit of E must be [force]/[charge] = N/C – the unit of the magnetic field strength is more complicated. Indeed, the F = qv×B identity tells us it must be (N·s)/(m·C), because 1 N = 1C·(m/s)·(N·s)/(m·C). Phew! That’s as horrendous as it looks, and that’s why it’s usually expressed using its shorthand, i.e. the tesla: 1 T = 1 (N·s)/(m·C). Magnetic flux is the same concept as electric flux, so it’s (field strength)×(area). However, now we’re talking magnetic field strength, so its unit is T·m²= (N·s·m²)/(m·C) = (N·s·m)/C, which is referred to as the weber (Wb). Remembering that 1 volt = 1 N·m/C, it’s easy to see that a weber is also equal to 1 Wb = 1 V·s. In any case, it’s a unit that is not so easy to interpret.
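A quick example of the F = qE + qv×B formula, with the units written out. The field values and the velocity below are arbitrary numbers of my own, and the helper names are mine too. Note how the magnetic part of the force is always perpendicular to the velocity.

```python
def cross(a, b):
    """Cross product a×b of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_force(q, e_field, v, b_field):
    """F = q(E + v×B), in newton if q is in C, E in N/C, v in m/s and B in T."""
    v_cross_b = cross(v, b_field)
    return tuple(q * (e + m) for e, m in zip(e_field, v_cross_b))

Q = 1.602176634e-19        # charge of a proton (C)
E = (0.0, 100.0, 0.0)      # electric field (N/C), arbitrary
V = (1.0e5, 0.0, 0.0)      # velocity (m/s), arbitrary
B = (0.0, 0.0, 0.5)        # magnetic field (T), arbitrary

force = lorentz_force(Q, E, V, B)
magnetic_part = lorentz_force(Q, (0.0, 0.0, 0.0), V, B)
work_rate = sum(f * v for f, v in zip(magnetic_part, V))   # F_magnetic·v

print("F =", force, "N")
print("magnetic part =", magnetic_part, "N")
print("F_magnetic·v =", work_rate)
```

The unit check is the one in the text: 1 C times 1 m/s times 1 T is 1 C·(m/s)·(N·s)/(m·C) = 1 N.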

Magnetostatics is a bit of a weird situation. It assumes steady fields, so the ∂E/∂t and ∂B/∂t terms in Maxwell’s equations can be dropped. In fact, c²∇×B = j/ε₀ implies that ∇·(c²∇×B) = ∇·(j/ε₀) and, therefore, that ∇·j = 0. Now, ∇·j = –∂ρ/∂t and, therefore, magnetostatics is a situation which assumes ∂ρ/∂t = 0. So we have electric currents but no change in charge densities. To put it simply, we’re not looking at a condenser that is charging or discharging, although that condenser may act like the battery or generator that keeps the charges flowing! But let’s go along with the magnetostatics assumption. What can we say about it? Well… First, we have the equivalent of Gauss’ Law, i.e. Ampère’s Law:

∮_Γ B·ds = (current through Γ)/(ε₀c²)

We have a line integral here around a closed curve, instead of a surface integral over a closed surface (Gauss’ Law), but it’s pretty similar: instead of the sum of the charges inside the volume, we have the current through the loop, and then an extra c² factor in the denominator, of course. Combined with the ∇•B = 0 equation, this equation allows us to solve practical problems. But I am not interested in practical problems. What’s the theory behind?
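Here is Ampère’s Law in a numerical example. I take the textbook field of an infinitely long straight wire along the z-axis, B = I/(2πε₀c²r), circling the wire, which is not derived in this post, so consider it an assumption of the sketch. The line integral of B around a rectangle should then be I/(ε₀c²) if the wire goes through the rectangle, and zero if it doesn’t. The rectangles and the current are arbitrary.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (C²/(N·m²)), rounded CODATA value
C = 299792458.0           # speed of light (m/s)

def wire_field(x, y, current):
    """(B_x, B_y) in tesla for an infinitely long straight wire along the z-axis."""
    r2 = x * x + y * y
    k = current / (2.0 * math.pi * EPS0 * C**2)
    return (-k * y / r2, k * x / r2)

def circulation_around_rectangle(x0, x1, y0, y1, current, n=5000):
    """Midpoint-rule line integral of B·ds around the rectangle, counter-clockwise."""
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    total = 0.0
    for (px, py), (qx, qy) in zip(corners, corners[1:]):
        dx, dy = (qx - px) / n, (qy - py) / n
        for i in range(n):
            x = px + (i + 0.5) * dx
            y = py + (i + 0.5) * dy
            b_x, b_y = wire_field(x, y, current)
            total += b_x * dx + b_y * dy
    return total

I = 2.0                                   # current in ampere (arbitrary)
expected = I / (EPS0 * C**2)              # current through the loop divided by ε₀c²
enclosing = circulation_around_rectangle(-1.0, 2.0, -0.5, 1.0, I)   # wire inside
not_enclosing = circulation_around_rectangle(1.0, 3.0, -1.0, 1.0, I)  # wire outside

print(f"I/(ε₀c²)              = {expected:.6e} T·m")
print(f"loop around the wire  = {enclosing:.6e} T·m")
print(f"loop next to the wire = {not_enclosing:.6e} T·m")
```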

The magnetic vector potential

The ∇•B = 0 equation is true, always, unlike the ∇×E = 0 expression, which is true for electrostatics only (no moving charges). It says the divergence of B is zero, always, and, hence, it means we can represent B as the curl of another vector field, always. That vector field is referred to as the magnetic vector potential, and we write:

∇·B = ∇·(∇×A) = 0 and, hence, B = ∇×A
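The ∇·(∇×A) = 0 identity can be verified numerically for any smooth vector field you like. In the sketch below, the field A is an arbitrary example of mine (it has no physical meaning), and all derivatives are central differences, so the result is zero only up to rounding errors.

```python
import math

def vector_potential(x, y, z):
    """An arbitrary smooth vector field, chosen only for illustration."""
    return (math.sin(y * z), x * x * z, math.cos(x) * y)

def partial(f, component, axis, point, h=1e-3):
    """Central-difference estimate of ∂f[component]/∂x[axis] at `point`."""
    up, down = list(point), list(point)
    up[axis] += h
    down[axis] -= h
    return (f(*up)[component] - f(*down)[component]) / (2.0 * h)

def curl(f, h=1e-3):
    """Return a function that evaluates ∇×f numerically."""
    def curl_f(x, y, z):
        p = (x, y, z)
        return (partial(f, 2, 1, p, h) - partial(f, 1, 2, p, h),
                partial(f, 0, 2, p, h) - partial(f, 2, 0, p, h),
                partial(f, 1, 0, p, h) - partial(f, 0, 1, p, h))
    return curl_f

def divergence(f, point, h=1e-3):
    """Numerical ∇·f at `point`."""
    return sum(partial(f, i, i, point, h) for i in range(3))

B = curl(vector_potential)            # B = ∇×A
point = (0.3, -0.7, 1.1)
div_B = divergence(B, point)          # should vanish
div_A = divergence(vector_potential, point)

print("B at the point =", B(*point))
print(f"∇·B = {div_B:.2e}")
print(f"∇·A = {div_A:.4f}")
```

Note that ∇·A itself is not zero for this example: only the divergence of its curl is.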

In electrostatics, we had the other theorem: if the curl of a vector field is zero (everywhere), then the vector field can be represented as the gradient of some scalar function, so if ∇×C = 0, then there is some Ψ for which C = ∇Ψ. Substituting C for E, and taking into account our conventions on charge and the direction of flow, we get E = –∇Φ. Substituting E in Maxwell’s first equation (∇•E = ρ/ε₀) then gave us the so-called Poisson equation: ∇²Φ = −ρ/ε₀, which sums up the whole subject of electrostatics really! It’s all in there!

Except magnetostatics, of course. Using the (magnetic) vector potential A, all of magnetostatics is reduced to another expression:

∇²A = −j/(ε₀c²), with ∇·A = 0

Note the qualifier: ∇·A = 0. Why should the divergence of A be equal to zero? You’re right. It doesn’t have to be that way. We know that ∇·(∇×C) = 0, for any vector field C, and always (it’s a mathematical identity, in fact, so it’s got nothing to do with physics), but choosing A such that ∇·A = 0 is just a choice. In fact, as I’ll explain in a moment, it’s referred to as choosing a gauge. The ∇·A = 0 choice is a very convenient choice, however, as it simplifies our equations. Indeed, c²∇×B = j/ε₀ = c²∇×(∇×A), and – from our vector calculus classes – we know that ∇×(∇×C) = ∇(∇·C) – ∇²C. Combining that with our choice of A (which is such that ∇·A = 0, indeed), we get –c²∇²A = j/ε₀ or, dividing by –c², the ∇²A = −j/(ε₀c²) expression, which sums up the whole subject of magnetostatics!

The point is: if the time derivatives in Maxwell’s equations, i.e. ∂E/∂t and ∂B/∂t, are zero, then Maxwell’s four equations can be nicely separated into two pairs: the electric and magnetic field are not interconnected. Hence, as long as charges and currents are static, electricity and magnetism appear as distinct phenomena, and the interdependence of E and B does not appear. So we re-write Maxwell’s set of four equations as:

Electrostatics: ∇•E = ρ/ε₀ and ∇×E = 0
Magnetostatics: ∇×B = j/c²ε₀ and ∇•B = 0

Note that electrostatics is a neat example of a vector field with zero curl and a given divergence (ρ/ε₀), while magnetostatics is a neat example of a vector field with zero divergence and a given curl (j/c²ε₀).

Electrodynamics

But reality is usually not so simple. With time-varying fields, Maxwell’s equations are what they are, and so there is interdependence, as illustrated in the introduction of this post. Note, however, that the magnetic field remains divergence-free in dynamics too! That’s because there is no such thing as a magnetic charge: we only have electric charges. So ∇·B = 0 and we can define a magnetic vector potential A and re-write B as B = ∇×A, indeed.

I am writing a vector potential field because, as I mentioned a couple of times already, we can choose A. Indeed, as long as ∇×A = B, it’s fine, so we can add curl-free components (i.e. the gradient of any scalar field) to the magnetic potential: it won’t make a difference. This freedom is referred to as gauge invariance. I’ll come back to that, and also show why this is what it is.

While we can easily get B from A because of the B = ∇×A, getting E from some potential is a different matter altogether. It turns out we can get E using the following expression, which involves both Φ (i.e. the electric or electrostatic potential) as well as A (i.e. the magnetic vector potential):

E = –∇Φ – ∂A/∂t

Likewise, one can show that Maxwell’s equations can be re-written in terms of Φ and A, rather than in terms of E and B. The expression looks rather formidable, but don’t panic:

Just look at it. We have two ‘variables’ here (Φ and A) and two equations, so the system is fully defined. [Of course, the second equation is three equations really: one for each component x, y and z.] What’s the point? Why would we want to re-write Maxwell’s equations? The first equation makes it clear that the scalar potential (i.e. the electric potential) is a time-varying quantity, so things are not, somehow, simpler. The answer is twofold. First, re-writing Maxwell’s equations in terms of the scalar and vector potential makes sense because we have (fairly) easy expressions for their value in time and in space as a function of the charges and currents. For statics, these expressions are:

So it is, effectively, easier to first calculate the scalar and vector potential, and then get E and B from them. For dynamics, the expressions are similar:

Indeed, they are like the integrals for statics, but with “a small and physically appealing modification”, as Feynman notes: when doing the integrals, we must use the so-called retarded time t′ = t − r₁₂/c. The illustration below shows how it works: the influences propagate from point (2) to point (1) at the speed c, so we must use the values of ρ and j at the time t′ = t − r₁₂/c indeed!
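
To make the retarded time a bit more tangible, here is a minimal sketch in Python. The function name `retarded_time` and the sample coordinates are just my own illustration (they are not from Feynman): the function takes the observation time t at point (1) and returns the earlier time at which the source at point (2) must be evaluated.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def retarded_time(t, point1, point2, c=C):
    """Return t' = t - r12/c, with r12 the distance between point1 and point2."""
    r12 = math.dist(point1, point2)  # Euclidean distance between the two points
    return t - r12 / c


# Hypothetical example: a source placed one light-second away from the observer.
observer = (0.0, 0.0, 0.0)
source = (C, 0.0, 0.0)
print(retarded_time(10.0, observer, source))  # one second earlier than t = 10
```

So what we see at point (1) at time t is what the charges and currents at point (2) were doing r₁₂/c seconds earlier.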

The second aspect of the answer to the question of why we’d be interested in Φ and A has to do with the topic I wanted to write about here: the concept of a gauge and a gauge transformation.

Gauges and gauge transformations in electromagnetics

Let’s see what we’re doing really. We calculate some A and then solve for B by writing: B = ∇×A. Now, I say some A because any A’ = A + ∇Ψ, with Ψ any scalar field really, will give us the same B. Why? Because the curl of the gradient of Ψ – i.e. curl(grad Ψ) = ∇×(∇Ψ) – is equal to 0. Hence, ∇×(A + ∇Ψ) = ∇×A + ∇×∇Ψ = ∇×A.

So we have B, and now we need E. So the next step is to take Faraday’s Law, which is Maxwell’s second equation: ∇×E = –∂B/∂t. Why this one? It’s a simple one, as it does not involve currents or charges. So we combine this equation and our B = ∇×A expression and write:

∇×E = –∂(∇×A)/∂t

Now, these operators are tricky but you can verify this can be re-written as:

∇×(E + ∂A/∂t) = 0

Looking carefully, we see this expression says that E + ∂A/∂t is some vector whose curl is equal to zero. Hence, this vector must be the gradient of something. When we worked on electrostatics, we only had E, not the ∂A/∂t bit, and we said that E tout court was the gradient of something, so we wrote E = −∇Φ. We now do the same thing for E + ∂A/∂t, so we write:

E + ∂A/∂t = −∇Φ

So we use the same symbol Φ but it’s a bit of a different animal, obviously. However, it’s easy to see that, if the ∂A/∂t would disappear (as it does in electrostatics, where nothing changes with time), we’d get our ‘old’ −∇Φ. Now, E + ∂A/∂t = −∇Φ can be written as:

E = −∇Φ – ∂A/∂t

So, what’s the big deal? We wrote B and E as a function of Φ and A. Well, we said we could replace A by any A’ = A + ∇Ψ but, obviously, such substitution would not yield the same E. To get the same E, we need some substitution rule for Φ as well. Now, you can verify we will get the same E if we replace Φ by Φ’ = Φ – ∂Ψ/∂t. You should check it by writing it all out:

E = −∇Φ’–∂A’/∂t = −∇(Φ–∂Ψ/∂t)–∂(A+∇Ψ)/∂t

= −∇Φ+∇(∂Ψ/∂t)–∂A/∂t–∂(∇Ψ)/∂t = −∇Φ – ∂A/∂t = E
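
If you don’t trust the algebra, you can also let a computer check it. The sketch below is only an illustration: the potentials `phi`, `A` and the gauge function `psi` are arbitrary smooth functions I made up, and all derivatives are approximated with central differences (step `H`), so the differences come out as tiny numbers rather than exact zeros.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary choice)


def partial(f, p, i):
    """Central-difference derivative of f at p = (x, y, z, t) along coordinate i."""
    lo, hi = list(p), list(p)
    lo[i] -= H
    hi[i] += H
    return (f(*hi) - f(*lo)) / (2 * H)


def grad(f, p):
    """Spatial gradient of the scalar function f at p."""
    return [partial(f, p, i) for i in range(3)]


def curl(F, p):
    """Curl of the vector field F (a list of three scalar functions) at p."""
    d = lambda comp, i: partial(F[comp], p, i)
    return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]


def e_field(phi, A, p):
    """E = -grad(phi) - dA/dt at p."""
    g = grad(phi, p)
    return [-g[k] - partial(A[k], p, 3) for k in range(3)]


def gauge_transform(phi, A, psi):
    """Return (phi', A') with A' = A + grad(psi) and phi' = phi - d(psi)/dt."""
    def shifted_component(k):
        return lambda x, y, z, t: A[k](x, y, z, t) + partial(psi, (x, y, z, t), k)
    new_A = [shifted_component(k) for k in range(3)]
    new_phi = lambda x, y, z, t: phi(x, y, z, t) - partial(psi, (x, y, z, t), 3)
    return new_phi, new_A


# Made-up potentials and gauge function, for illustration only.
phi = lambda x, y, z, t: x * y * math.cos(t)
A = [lambda x, y, z, t: y * z * t,
     lambda x, y, z, t: math.sin(x) * t,
     lambda x, y, z, t: x * y]
psi = lambda x, y, z, t: math.sin(x * y) * z * t

p = (0.3, -0.7, 1.1, 0.5)  # some point (x, y, z, t)
phi2, A2 = gauge_transform(phi, A, psi)
max_dB = max(abs(a - b) for a, b in zip(curl(A, p), curl(A2, p)))
max_dE = max(abs(a - b) for a, b in zip(e_field(phi, A, p), e_field(phi2, A2, p)))
print(max_dB, max_dE)  # both should be zero up to rounding noise
```

The potentials change, but B and E don’t: that’s the whole point.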

Again, the operators are a bit tricky, but the +∇(∂Ψ/∂t) and –∂(∇Ψ)/∂t terms do cancel out. Where are we heading to? When everything is said and done, we do need to relate it all to the currents and the charges, because that’s the real stuff out there. So let’s take Maxwell’s ∇•E = ρ/ε₀ equation, which has the charges in it, and let’s substitute −∇Φ – ∂A/∂t for E. We get:

∇·(−∇Φ – ∂A/∂t) = ρ/ε₀

That equation can be re-written as:

−∇²Φ – ∂(∇·A)/∂t = ρ/ε₀

So we have one equation here relating Φ and A to the sources. We need another one, and we also need to separate Φ and A somehow. How do we do that?

Maxwell’s fourth equation, i.e. c²∇×B = j/ε₀ + ∂E/∂t can, obviously, be written as c²∇×B − ∂E/∂t = j/ε₀. Substituting both E and B yields the following monstrosity:

c²∇×(∇×A) − ∂(−∇Φ – ∂A/∂t)/∂t = j/ε₀

We can now apply the general ∇×(∇×C) = ∇(∇·C) – ∇²C identity to the first term to get:

−c²∇²A + c²∇(∇·A) + ∂(∇Φ)/∂t + ∂²A/∂t² = j/ε₀

It’s equally monstrous, obviously, but we can simplify the whole thing by choosing Φ and A in a clever way. For the magnetostatic case, we chose A such that ∇·A = 0. We could have chosen something else. Indeed, it’s not because B is divergence-free, that A has to be divergence-free too! For example, I’ll leave it to you to show that choosing ∇·A such that

∇·A = −c^–2·∂Φ/∂t

also respects the general condition that any A and Φ we choose must respect the A’ = A + ∇Ψ and Φ’ = Φ – ∂Ψ/∂t equalities. Now, if we choose ∇·A such that ∇·A = −c^–2·∂Φ/∂t indeed, then the two middle terms in our monstrosity cancel out, and we’re left with a much simpler equation for A:

∇²A − c^–2·∂²A/∂t² = −j/ε₀c²

In addition, doing the substitution in our other equation relating Φ and A to the sources yields an equation for Φ that has the same form:

∇²Φ − c^–2·∂²Φ/∂t² = −ρ/ε₀

What’s the big deal here? Well… Let’s write it all out. The equation above becomes:

∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² − c^–2·∂²Φ/∂t² = −ρ/ε₀

That’s a wave equation in three dimensions. In case you wonder, just check one of my posts on wave equations. The one-dimensional equivalent for a wave propagating in the x direction at speed c (like a sound wave, for example) is ∂²Φ/∂x² = c^–2·∂²Φ/∂t², indeed. The equation for A above yields similar wave equations for A’s components A_x, A_y, and A_z.
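
Here is a quick numerical sanity check of that one-dimensional wave equation. It’s a sketch only: the Gaussian profile, the wave speed `c = 3.0` and the sample point are arbitrary choices of mine. Any smooth profile that travels as f(x − c·t) should make ∂²Φ/∂x² − c^–2·∂²Φ/∂t² vanish, up to the error of the finite-difference approximation.

```python
import math


def second_derivative(f, u, h=1e-3):
    """Central-difference approximation of the second derivative of f at u."""
    return (f(u + h) - 2 * f(u) + f(u - h)) / h**2


c = 3.0                                  # wave speed, arbitrary units
shape = lambda s: math.exp(-s**2)        # any smooth profile will do
wave = lambda x, t: shape(x - c * t)     # the profile travelling in the +x direction

x0, t0 = 0.4, 0.1                        # some point in space and time
d2_dx2 = second_derivative(lambda x: wave(x, t0), x0)
d2_dt2 = second_derivative(lambda t: wave(x0, t), t0)
residual = d2_dx2 - d2_dt2 / c**2
print(d2_dx2, d2_dt2 / c**2, residual)   # the first two should (almost) coincide
```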

So, yes, it is a big deal. We’ve written Maxwell’s equations in terms of the scalar (Φ) and vector (A) potential and in a form that makes immediately apparent that we’re talking electromagnetic waves moving out at the speed c. Let me copy them again:

∇²Φ − c^–2·∂²Φ/∂t² = −ρ/ε₀
∇²A − c^–2·∂²A/∂t² = −j/ε₀c²

You may, of course, say that you’d rather have a wave equation for E and B, rather than for A and Φ. Well… That can be done. Feynman gives us two derivations that do so. The first derivation is relatively simple and assumes the source of our electromagnetic wave moves in one direction only. The second derivation is much more complicated and gives an equation for E that, if you’ve read the first volume of Feynman’s Lectures, you’ll surely remember:

The links are there, and so I’ll let you have fun with those Lectures yourself. I am finished here, indeed, in terms of what I wanted to do in this post, and that is to say a few words about gauges in field theory. It’s nothing much, really, and so we’ll surely have to discuss the topic again, but at least you now know what a gauge actually is in classical electromagnetic theory. Let’s quickly go over the concepts:

Choosing the ∇·A is choosing a gauge, or a gauge potential (because we’re talking scalar and vector potential here). The particular choice is also referred to as gauge fixing.
Changing A by adding ∇Ψ is called a gauge transformation, and the scalar function Ψ is referred to as a gauge function. The fact that we can add curl-free components to the magnetic potential without them making any difference is referred to as gauge invariance.
Finally, the ∇·A = −c^–2·∂Φ/∂t gauge is referred to as a Lorentz gauge.

Just to make sure you understand: why is that Lorentz gauge so special? Well… Look at the whole argument once more: isn’t it amazing we get such beautiful (wave) equations if we stick it in? Also look at the functional shape of the gauge itself: it looks like a wave equation itself! […] Well… No… It doesn’t. I am a bit too enthusiastic here. We do have the same 1/c² and a time derivative, but it’s not a wave equation. 🙂 In any case, it all confirms, once again, that physics is all about beautiful mathematical structures. But, again, it’s not math only. There’s something real out there. In this case, that ‘something’ is a traveling electromagnetic field. 🙂

But why do we call it a gauge? That should be equally obvious. It’s really like choosing a gauge in another context, such as measuring the pressure of a tyre, as shown below. 🙂

Gauges and group theory

You’ll usually see gauges mentioned with some reference to group theory. For example, you will see or hear phrases like: “The existence of arbitrary numbers of gauge functions ψ(r, t) corresponds to the U(1) gauge freedom of the electromagnetic theory.” The U(1) notation stands for a unitary group of degree n = 1. It is also known as the circle group. Let me copy the introduction to the unitary group from the Wikipedia article on it:

In mathematics, the unitary group of degree n, denoted U(n), is the group of n × n unitary matrices, with the group operation that of matrix multiplication. The unitary group is a subgroup of the general linear group GL(n, C). In the simple case n = 1, the group U(1) corresponds to the circle group, consisting of all complex numbers with absolute value 1 under multiplication. All the unitary groups contain copies of this group.

The unitary group U(n) is a real Lie group of dimension n². The Lie algebra of U(n) consists of n × n skew-Hermitian matrices, with the Lie bracket given by the commutator. The general unitary group (also called the group of unitary similitudes) consists of all matrices A such that A*A is a nonzero multiple of the identity matrix, and is just the product of the unitary group with the group of all positive multiples of the identity matrix.

Phew! Does this make you any wiser? If anything, it makes me realize I’ve still got a long way to go. 🙂 The Wikipedia article on gauge fixing notes something that’s more interesting (if only because I more or less understand what it says):

Although classical electromagnetism is now often spoken of as a gauge theory, it was not originally conceived in these terms. The motion of a classical point charge is affected only by the electric and magnetic field strengths at that point, and the potentials can be treated as a mere mathematical device for simplifying some proofs and calculations. Not until the advent of quantum field theory could it be said that the potentials themselves are part of the physical configuration of a system. The earliest consequence to be accurately predicted and experimentally verified was the Aharonov–Bohm effect, which has no classical counterpart.

This confirms, once again, that the fields are real. In fact, what this says is that the potentials are real: they have a meaningful physical interpretation. I’ll leave it to you to explore that Aharonov–Bohm effect. In the meanwhile, I’ll study what Feynman writes on potentials and all that as used in quantum physics. It will probably take a while before I’ll get into group theory though.

Indeed, it’s probably best to study physics at a somewhat less abstract level first, before getting into the more sophisticated stuff.

The blackbody radiation problem revisited: quantum statistics

Original post:

The equipartition theorem – which states that the energy levels of the modes of any (linear) system, in classical as well as in quantum physics, are always equally spaced – is deep and fundamental in physics. In my previous post, I presented this theorem in a very general and non-technical way: I did not use any exponentials, complex numbers or integrals. Just simple arithmetic. Let’s go a little bit beyond now, and use it to analyze that blackbody radiation problem which bothered 19th century physicists, and which led Planck to ‘discover’ quantum physics. [Note that, once again, I won’t use any complex numbers or integrals in this post, so my kids should actually be able to read through it.]

Before we start, let’s quickly introduce the model again. What are we talking about? What’s the black box? The idea is that we add heat to atoms (or molecules) in a gas. The heat results in the atoms acquiring kinetic energy, and the kinetic theory of gases tells us that the mean value of the kinetic energy for each independent direction of motion will be equal to kT/2. The blackbody radiation model analyzes the atoms (or molecules) in a gas as atomic oscillators. Oscillators have both kinetic as well as potential energy and, on average, the kinetic and potential energy is the same. Hence, the energy in the oscillation is twice the kinetic energy, so its average energy is 〈E〉 = 2·kT/2 = kT. However, oscillating atoms implies oscillating electric charges. Now, electric charges going up and down radiate light and, hence, as light is emitted, energy flows away.

How exactly? It doesn’t matter. It is worth noting that 19th century physicists had no idea about the inner structure of an atom. In fact, at that time, the term electron had not yet been invented: the first atomic model involving electrons was the so-called plum pudding model, which J.J. Thomson advanced in 1904, and he called electrons “negative corpuscles”. And the Rutherford-Bohr model, which is the first model one can actually use to explain how and why excited atoms radiate light, came in 1913 only, so that’s long after Planck’s solution for the blackbody radiation problem, which he presented to the scientific community in December 1900. It’s really true: it doesn’t matter. We don’t need to know about the specifics. The general idea is all that matters. As Feynman puts it: it’s how “A hot stove cools on a cold night, by radiating the light into the sky, because the atoms are jiggling their charge and they continually radiate, and slowly, because of this radiation, the jiggling motion slows down.” 🙂

His subsequent description of the black box is equally simple: “If we enclose the whole thing in a box so that the light does not go away to infinity, then we can eventually get thermal equilibrium. We may either put the gas in a box where we can say that there are other radiators in the box walls sending light back or, to take a nicer example, we may suppose the box has mirror walls. It is easier to think about that case. Thus we assume that all the radiation that goes out from the oscillator keeps running around in the box. Then, of course, it is true that the oscillator starts to radiate, but pretty soon it can maintain its kT of energy in spite of the fact that it is radiating, because it is being illuminated, we may say, by its own light reflected from the walls of the box. That is, after a while there is a great deal of light rushing around in the box, and although the oscillator is radiating some, the light comes back and returns some of the energy that was radiated.”

So… That’s the model. Don’t you just love the simplicity of the narrative here? 🙂 Feynman then derives Rayleigh’s Law, which gives us the frequency spectrum of blackbody radiation as predicted by classical theory, i.e. the intensity (I) of the light as a function of (a) its (angular) frequency (ω) and (b) the average energy of the oscillators, which is nothing but the temperature of the gas (Boltzmann’s constant k is just what it is: a proportionality constant which makes the units come out alright). The other stuff in the formula, given hereunder, are just more constants (and, yes, the c is the speed of light!). The grand result is:

The formula looks formidable but the function is actually very simple: it’s quadratic in ω and linear in 〈E〉 = kT. The rest is just a bunch of constants which ensure all of the units we use to measure stuff come out alright. As you may suspect, the derivation of the formula is not as simple as the narrative of the black box model, and so I won’t copy it here (you can check yourself). Indeed, let’s focus on the results, not on the technicalities. Let’s have a look at the graph.

The I(ω) graphs for T = T₀ and T = 2T₀ are given by the solid black curves. They tell us how much light we should have at different frequencies. They just go up and up and up, so Rayleigh’s Law implies that, when we open our stove – and, yes, I know, some kids don’t know what a stove is – and take a look, we should burn our eyes from x-rays. We know that’s not the case, in reality, so our theory must be wrong. An even bigger problem is that the curve implies that the total energy in the box, i.e. the total of all this intensity summed up over all frequencies, is infinite: we’ve got an infinite curve here indeed, and so an infinite area under it. Therefore, as Feynman puts it: “Rayleigh’s Law is fundamentally, powerfully, and absolutely wrong.” The actual graphs, indeed, are the dashed curves. I’ll come back to them.

The blackbody radiation problem is history, of course. So it’s no longer a problem. Let’s see how the equipartition theorem solved it. We assume our oscillators can only take on equally spaced energy levels, with the space between them equal to h·f = ħ·ω. The frequency f (or ω = 2π·f) is the fundamental frequency of our oscillator, and you know h and ħ = h/2π, of course: Planck’s constant. Hence, the various energy levels are given by the following formula: E_n = n·ħ·ω = n·h·f. The first five are depicted below.

Next to the energy levels, we write the probability of an oscillator occupying that energy level, which is given by Boltzmann’s Law. I wrote about Boltzmann’s Law in another post too, so I won’t repeat myself here, except for noting that Boltzmann’s Law says that the probabilities of different conditions of energy are given by e^−energy/kT = 1/e^energy/kT. Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. Here we have a probability P_n as a function of the energy E_n = n·ħ·ω, so we write: P_n = A·e^−energy/kT = A·e^−n·ħ·ω/kT. [Note that P₀ is equal to A, as a consequence.]

Now, we need to determine how many oscillators we have in each of the various energy states, so that’s N₀, N₁, N₂, etcetera. We’ve done that before: N₁/N₀ = P₁/P₀ = (A·e^−ħω/kT)/(A·e^0) = e^−ħω/kT. Hence, N₁ = N₀·e^−ħω/kT. Likewise, it’s not difficult to see that N₂ = N₀·e^−2ħω/kT or, more in general, that N_n = N₀·e^−nħω/kT = N₀·[e^−ħω/kT]ⁿ. To make the calculations somewhat easier, Feynman temporarily substitutes x for e^−ħω/kT. Hence, we write: N₁ = N₀·x, N₂ = N₀·x²,…, N_n = N₀·xⁿ, and the total number of oscillators is obviously N_tot = N₀+N₁+…+N_n+… = N₀·(1+x+x²+…+xⁿ+…).

What about their energy? The energy of all oscillators in state 0 is, obviously, zero. The energy of all oscillators in state 1 is N₁·ħω = ħω·N₀·x. Adding it all up for state 2 yields N₂·2·ħω = 2·ħω·N₀·x². More generally, the energy of all oscillators in state n is equal to N_n·n·ħω = n·ħω·N₀·xⁿ. So now we can write the total energy of the whole system as E_tot = E₀+E₁+…+E_n+… = 0+ħω·N₀·x+2·ħω·N₀·x²+…+n·ħω·N₀·xⁿ+… = ħω·N₀·(x+2x²+…+nxⁿ+…). The average energy of one oscillator, for the whole system, is therefore:

〈E〉 = E_tot/N_tot = ħω·(x+2x²+…+nxⁿ+…)/(1+x+x²+…+xⁿ+…)

Now, Feynman leaves the exercise of simplifying that expression to the reader and just says it’s equal to:

〈E〉 = ħω/[e^ħω/kT–1]

I should try to figure out how he does that. It’s something like Horner’s rule but that’s not easy with infinite polynomials. Or perhaps it’s just some clever way of factoring both polynomials. I didn’t break my head over it but just checked if the result is correct. [I don’t think Feynman would dare to joke here, but one could never be sure with him it seems. :-)] Note he substituted x for e^−ħω/kT, not for e^+ħω/kT, so there is a minus sign there, which we don’t have in the formula above. Hence, the denominator is e^ħω/kT–1 = (1/x)–1 = (1–x)/x, and 1/(e^ħω/kT–1) = x/(1–x). Now, if (x+2x²+…+nxⁿ+…)/(1+x+x²+…+xⁿ+…) = x/(1–x), then (x+2x²+…+nxⁿ+…)·(1–x) must be equal to x·(1+x+x²+…+xⁿ+…). Just write it out: (x+2x²+…+nxⁿ+…)·(1–x) = x+2x²+…+nxⁿ+…−x²−2x³−…−nxⁿ⁺¹−… = x+x²+…+xⁿ+… Likewise, we get x·(1+x+x²+…+xⁿ+…) = x+x²+…+xⁿ+… So, yes, done.
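
For those who prefer numbers over algebra, here is a small Python sketch that compares the brute-force sum with the closed-form result. The function names and the cut-off of 2000 terms are my own choices, not Feynman’s, and the energies are expressed in arbitrary units.

```python
import math


def average_energy_by_summation(hbar_omega, kT, n_terms=2000):
    """<E> = ħω·(x + 2x² + …)/(1 + x + x² + …), truncated after n_terms terms."""
    x = math.exp(-hbar_omega / kT)
    numerator = sum(n * x**n for n in range(n_terms))
    denominator = sum(x**n for n in range(n_terms))
    return hbar_omega * numerator / denominator


def average_energy_closed_form(hbar_omega, kT):
    """<E> = ħω/(e^(ħω/kT) − 1); expm1 keeps precision when ħω/kT is small."""
    return hbar_omega / math.expm1(hbar_omega / kT)


for ratio in (0.5, 1.0, 3.0):  # sample values of ħω/kT
    by_sum = average_energy_by_summation(ratio, 1.0)
    closed = average_energy_closed_form(ratio, 1.0)
    print(ratio, by_sum, closed)  # the last two columns should agree
```

The two columns agree, so the formula is fine indeed.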

Now comes the Big Trick, the rabbit out of the hat, so to speak. 🙂 We’re going to replace the classical expression for 〈E〉 (i.e. kT) in Rayleigh’s Law by its quantum-mechanical equivalent (i.e. 〈E〉 = ħω/[e^ħω/kT–1]).

What’s the logic behind? Rayleigh’s Law gave the intensity for the various frequencies that are present as a function of (a) the frequency (of course!) and (b) the average energy of the oscillators, which is kT according to classical theory. Now, our assumption that an oscillator cannot take on just any energy value but that the energy levels are equally spaced, combined with Boltzmann’s Law, gives us a very different formula for the average energy: it’s a function of the temperature, but it’s a function of the fundamental frequency too! I copied the graph below from the Wikipedia article on the equipartition theorem. The black line is the classical value for the average energy as a function of the thermal energy. As you can see, it’s one and the same thing, really (look at the scales: they happen to be both logarithmic but that’s just to make them more ‘readable’). Its quantum-mechanical equivalent is the red curve. At higher temperatures, the two agree nearly perfectly, but at low temperatures (with low being defined as the range where kT << ħ·ω, written as h·ν in the graph), the quantum mechanical value decreases much more rapidly. [Note the energy is measured in units equivalent to h·ν: that’s a nice way to sort of ‘normalize’ things so as to compare them.]

So, without further ado, let’s take Rayleigh’s Law again and just replace kT (i.e. the classical formula for the average energy) by the ‘quantum-mechanical’ formula for 〈E〉, i.e. ħω/[e^ħω/kT–1]. Adding the dω factor to emphasize we’re talking some continuous distribution here, we get the even grander result (Feynman calls it the first quantum-mechanical formula ever known or discussed):

So this function is the dashed I(ω) curve (I copied the graph below again): this curve does not ‘blow up’. The math behind the curve is the following: for large ω, the ω³ factor in the numerator does ‘blow up’, but we also have Euler’s number being raised to a tremendous power in the denominator, and the exponential wins. Therefore, the curves come down again, and so we don’t get those incredible amounts of UV light and x-rays.
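
The difference between the two laws is easy to see in numbers. The sketch below assumes the constant prefactor 1/(π²c²) that Feynman uses in his version of Rayleigh’s Law, i.e. I(ω) = ω²·kT/(π²c²), and the same law with kT replaced by ħω/[e^ħω/kT–1]. The temperature of 1000 K and the sample frequencies are arbitrary values I picked for illustration.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J·s
K_B = 1.380649e-23       # Boltzmann constant, J/K
C = 299_792_458.0        # speed of light, m/s


def rayleigh_intensity(omega, T):
    """Classical law: I(ω) = ω²·kT/(π²c²) (prefactor as assumed above)."""
    return omega**2 * K_B * T / (math.pi**2 * C**2)


def planck_intensity(omega, T):
    """Same law with kT replaced by ħω/(e^(ħω/kT) − 1)."""
    x = HBAR * omega / (K_B * T)
    return HBAR * omega**3 / (math.pi**2 * C**2 * math.expm1(x))


T = 1000.0  # kelvin, a hypothetical stove temperature
for omega in (1e12, 1e14, 1e15, 1e16):  # angular frequencies in rad/s
    ratio = planck_intensity(omega, T) / rayleigh_intensity(omega, T)
    print(f"omega = {omega:.0e} rad/s   Planck/Rayleigh = {ratio:.3e}")
```

At the low-frequency end the ratio is close to one, so the two laws agree there. At the high-frequency end the ratio collapses to practically nothing: no x-rays from the stove.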

So… That’s how Max Planck solved the problem and how he became the ‘reluctant father of quantum mechanics.’ The formula is not as simple as Rayleigh’s Law (we have a cubic function in the numerator, and an exponential in the denominator), but its advantage is that it’s correct. Indeed, when everything is said and done, we do want our formulas to describe something real, don’t we? 🙂

Let me conclude by looking at that ‘quantum-mechanical’ formula for the average energy once more:

〈E〉 = ħω/[e^ħω/kT–1]

It’s not a distribution function (the formula for I(ω) is the distribution function), but the –1 term in the denominator does tell us already we’re talking Bose-Einstein statistics. In my post on quantum statistics, I compared the three distribution functions. Let’s quickly look at them again:

Maxwell-Boltzmann (for classical particles): f(E) = 1/[A·e^E/kT]
Fermi-Dirac (for fermions): f(E) = 1/[A·e^E/kT + 1]
Bose-Einstein (for bosons): f(E) = 1/[A·e^E/kT − 1]

So here we simply substitute ħω for E, which makes sense, as the Planck-Einstein relation tells us that the energy of the particles involved is, indeed, equal to E = ħω . Below, you’ll find the graph of these three functions, first as a function of E, so that’s f(E), and then as a function of T, so that’s f(T) (or f(kT) if you want).

The first graph, for which E is the variable, is the more usual one. As for the interpretation, you can see what’s going on: bosonic particles (or bosons, I should say) will crowd the lower energy levels (the associated probabilities are much higher indeed), while for fermions, it’s the opposite: they don’t want to crowd together and, hence, the associated probabilities are much lower. So fermions will spread themselves over the various energy levels. The distribution for ‘classical’ particles is somewhere in the middle.

In that post of mine, I gave an actual example involving nine particles and the various patterns that are possible, so you can have a look there. Here I just want to note that the math behind is easy to understand when dropping the A (that’s just another normalization constant anyway) and re-writing the formulas as follows:

Maxwell-Boltzmann (for classical particles): f(E) = e^−E/kT
Fermi-Dirac (for fermions): f(E) = e^−E/kT/[1+e^−E/kT]
Bose-Einstein (for bosons): f(E) = e^−E/kT/[1−e^−E/kT]

Just use Feynman’s substitution x = e^−ħω/kT: the Bose-Einstein distribution then becomes 1/[1/x–1] = 1/[(1–x)/x] = x/(1–x). Now it’s easy to see that the denominator of the formula of both the Fermi-Dirac as well as the Bose-Einstein distribution will approach 1 (i.e. the ‘denominator’ of the Maxwell-Boltzmann formula) if e^−E/kT approaches zero, so that’s when E becomes larger and larger. Hence, for higher energy levels, the probability densities of the three functions approach each other indeed, as they should.
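
A few lines of Python show that convergence in numbers. As in the formulas above, I dropped the normalization constant A (so A = 1), and the function names and sample values of E/kT are mine.

```python
import math


def maxwell_boltzmann(e_over_kT):
    """f(E) = e^(−E/kT)."""
    return math.exp(-e_over_kT)


def fermi_dirac(e_over_kT):
    """f(E) = 1/(e^(E/kT) + 1)."""
    return 1.0 / (math.exp(e_over_kT) + 1.0)


def bose_einstein(e_over_kT):
    """f(E) = 1/(e^(E/kT) − 1)."""
    return 1.0 / math.expm1(e_over_kT)


for r in (0.5, 2.0, 10.0):  # sample values of E/kT
    print(f"E/kT = {r:4.1f}   MB = {maxwell_boltzmann(r):.6f}   "
          f"FD = {fermi_dirac(r):.6f}   BE = {bose_einstein(r):.6f}")
```

At small E/kT the Bose-Einstein value is the largest and the Fermi-Dirac value the smallest, with Maxwell-Boltzmann in between. At large E/kT the three are practically indistinguishable.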

Now what’s the second graph about? Here we’re looking at one energy level only, but we let the temperature vary from 0 to infinity. The graph says that, at low temperature, the probabilities will also be more or less the same, and the three distributions only differ at higher temperatures. That makes sense too, of course!

Well… That says it all, I guess. I hope you enjoyed this post. As I’ve sort of concluded Volume I of Feynman’s Lectures with this, I’ll be silent for a while… […] Or so I think. 🙂

Strings in classical and quantum physics

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much from the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

This post is not about string theory. The goal of this post is much more limited: it’s to give you a better understanding of why the metaphor of the string is so appealing. Let’s recapitulate the basics by seeing how it’s used in classical as well as in quantum physics.

In my posts on music and math, or music and physics, I described how a simple single string always vibrates in various modes at the same time: every tone is a mixture of an infinite number of elementary waves. These elementary waves, which are referred to as harmonics (or as (normal) modes, indeed) are perfectly sinusoidal, and their amplitude determines their relative contribution to the composite waveform. So we can always write the waveform F(t) as the following sum:

F(t) = a₁sin(ωt) + a₂sin(2ωt) + a₃sin(3ωt) + … + a_nsin(nωt) + …

[If this is your first reading of my post, and the formula scares you away, please try again. I am writing most of my posts with teenage kids in mind, and especially this one. So I will not use anything else than simple arithmetic in this post: no integrals, no complex numbers, no logarithms. Just a bit of geometry. That’s all. So, yes, you should go through the trouble of trying to understand this formula. The only thing that you may have some trouble with is ω, i.e. angular frequency: it’s the frequency expressed in radians per time unit, rather than oscillations per second, so ω = 2π·f = 2π/T, with f the frequency as you know it (i.e. oscillations per second) and T the period of the wave.]

I also noted that the wavelength of these component waves (λ) is determined by the length of the string (L), and by its length only: λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L. So these wavelengths do not depend on the material of the string, or its tension. At any point in time (so keeping t constant, rather than x, as we did in the equation above), the component waves look like this:

etcetera (1/8, 1/9,…,1/n,… 1/∞)

That the wavelengths of the harmonics of any actual string only depend on its length is an amazing result in light of the complexities behind: a simple wound guitar string, for example, is not simple at all (just click the link here for a quick introduction to guitar string construction). Simple piano wire isn’t simple either: it’s made of high-carbon steel, i.e. a very complex metallic alloy. In fact, you should never think any material is simple: even the simplest molecular structures are very complicated things. Hence, it’s quite amazing all these systems are actually linear systems and that, despite the underlying complexity, those wavelength ratios form a simple harmonic series, i.e. a simple reciprocal function y = 1/x, as illustrated below.

A simple harmonic series? Hmm… I can’t resist noting that the harmonic series is, in fact, a mathematical beast. While its terms approach zero as x (or n) increases, the series itself is divergent. So it’s not like 1+1/2+1/4+1/8+…+1/2ⁿ+…, which adds up to 2. Divergent series don’t add up to any specific number. Even Leonhard Euler – the most famous mathematician of all times, perhaps – struggled with this. In fact, as late as in 1826, another famous mathematician, Niels Henrik Abel (in light of the fact he died at age 26 (!), his legacy is truly amazing), exclaimed that a series like this was “an invention of the devil”, and that it should not be used in any mathematical proof. But then God intervened through Abel’s contemporary Augustin-Louis Cauchy 🙂 who finally cracked the nut by rigorously defining the mathematical concept of both convergent as well as divergent series, and equally rigorously determining their possibilities and limits in mathematical proofs. In fact, while medieval mathematicians had already grasped the essentials of modern calculus and, hence, had already given some kind of solution to Zeno’s paradox of motion, Cauchy’s work is the full and final solution to it. But I am getting distracted, so let me get back to the main story.
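
If you want to see the difference between the two series for yourself, a few lines of Python will do. It’s a small sketch with function names of my own: the geometric partial sums settle down near 2 very quickly, while the harmonic partial sums keep creeping up, however slowly.

```python
def harmonic_partial_sum(n):
    """1 + 1/2 + 1/3 + … + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def geometric_partial_sum(n):
    """1 + 1/2 + 1/4 + … + 1/2ⁿ."""
    return sum(0.5**k for k in range(n + 1))


for n in (10, 1000, 100000):
    print(n, harmonic_partial_sum(n), geometric_partial_sum(n))
```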

More remarkable than the wavelength series itself, is its implication for the respective energy levels of all these modes. The material of the string, its diameter, its tension, etc will determine the speed with which the wave travels up and down the string. [Yes, that’s what it does: you may think the string oscillates up and down, and it does, but the waveform itself travels along the string. In fact, as I explained in my previous post, we’ve got two waves traveling simultaneously: one going one way and the other going the other.] For a specific string, that speed (i.e. the wave velocity) is some constant, which we’ll denote by c. Now, c is, obviously, the product of the wavelength (i.e. the distance that the wave travels during one oscillation) and its frequency (i.e. the number of oscillations per time unit), so c = λ·f. Hence, f = c/λ and, therefore, f₁ = (1/2)·c/L, f₂ = (2/2)·c/L, f₃ = (3/2)·c/L, etcetera. More in general, we write f_n = (n/2)·c/L. In short, the frequencies are equally spaced. To be precise, they are all (1/2)·c/L apart.
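
The f_n = (n/2)·c/L formula is easy to play with. In the sketch below, the wave speed (286 m/s) and the string length (0.65 m) are made-up values, chosen only because they give a fundamental of roughly 220 Hz; the function names are mine too.

```python
def harmonic_wavelengths(length, count):
    """λ_n = 2L/n for the first `count` modes of a string of length L."""
    return [2.0 * length / n for n in range(1, count + 1)]


def harmonic_frequencies(wave_speed, length, count):
    """f_n = (n/2)·c/L, i.e. c/λ_n, for the first `count` modes."""
    return [n * wave_speed / (2.0 * length) for n in range(1, count + 1)]


freqs = harmonic_frequencies(wave_speed=286.0, length=0.65, count=5)
spacings = [b - a for a, b in zip(freqs, freqs[1:])]
print(freqs)     # multiples of the fundamental frequency
print(spacings)  # every spacing equals the fundamental, (1/2)·c/L
```

Note that the wavelengths depend on the length of the string only, while the frequencies also depend on the wave speed, and so on the material and the tension.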

Now, the energy of a wave is directly proportional to its frequency, always, in classical as well as in quantum mechanics. For example, for photons, we have the Planck-Einstein relation: E = h·f = ħ·ω. So that relation states that the energy is proportional to the (light) frequency of the photon, with h (i.e. the Planck constant) as the constant of proportionality. [Note that ħ is not some different constant. It’s just the ‘angular equivalent’ of h, so we have to use ħ = h/2π when frequencies are expressed in angular frequency, i.e. radians per second rather than hertz.] Because of that proportionality, the energy levels of our simple string are also equally spaced and, hence, inserting another proportionality constant, which I’ll denote by a instead of h (because it’s some other constant, obviously), we can write:

E_n = a·f_n = (n/2)·a·c/L

Now, if we denote the fundamental frequency f₁ = (1/2)·c/L, quite simply, by f (and, likewise, its angular frequency as ω), then we can re-write this as:

E_n = n·a·f = n·ā·ω (ā = a/2π)

This formula is exactly the same as the formula used in quantum mechanics when describing atoms as atomic oscillators, and why and how they radiate light (think of the blackbody radiation problem, for example), as illustrated below: E_n = n·ħ·ω = n·h·f. The only difference between the formulas is the proportionality constant: instead of a, we have Planck’s constant here: h, or ħ when the frequency is expressed as an angular frequency.

This grand result – that the energy levels associated with the various states or modes of a system are equally spaced – is referred to as the equipartition theorem in physics, and it is what connects classical and quantum physics in a very deep and fundamental way.

In fact, because they’re nothing but proportionality constants, the value of both a and h depends on our units. If we’d use the so-called natural units, i.e. equating ħ to 1, the energy formula becomes E_n = n·ω, and, hence, our unit of energy and our unit of frequency become one and the same. In fact, we can, of course, also re-define our time unit such that the fundamental frequency ω is one, i.e. one oscillation per (re-defined) time unit, so then we have the following remarkable formula:

E_n = n

Just think about it for a moment: what I am writing here is E₀ = 0, E₁ = 1, E₂ = 2, E₃ = 3, E₄ = 4, etcetera. Isn’t that amazing? I am describing the structure of a system here – be it an atom emitting or absorbing photons, or a macro-thing like a guitar string – in terms of its basic components (i.e. its modes), and it’s as simple as counting: 0, 1, 2, 3, 4, etc.

You may think I am not describing anything real here, but I am. We cannot do whatever we wanna do: some stuff is grounded in reality, and in reality only, not in the math. Indeed, the fundamental frequency of our guitar string – which we used as our energy unit – is a property of the string, so that’s real: it’s not just some mathematical shape. It depends on the string’s length (which determines its wavelength), and it also depends on the propagation speed of the wave, which depends on other basic properties of the string, such as its material, its diameter, and its tension. Likewise, the fundamental frequency of our atomic oscillator is a property of the atomic oscillator or, to use a much grander term, a property of the Universe. That’s why h is a fundamental physical constant. So it’s not like π or e. [When reading physics as a freshman, it’s always useful to clearly distinguish physical constants (like Avogadro’s number, for example) from mathematical constants (like Euler’s number).]

The theme that emerges here is what I’ve been saying a couple of times already: it’s all about structure, and the structure is amazingly simple. It’s really that equipartition theorem only: all you need to know is that the energy levels of the modes of a system – any system really: an atom, a molecular system, a string, or the Universe itself – are equally spaced, and that the space between the various energy levels depends on the fundamental frequency of the system. Moreover, if we use natural units, and also re-define our time unit so the fundamental frequency is equal to 1 (so the frequencies of the other modes are 2, 3, 4 etc), then the energy levels are just 0, 1, 2, 3, 4 etc. So, yes, God kept things extremely simple. 🙂

In order to not cause too much confusion, I should add that you should read what I am writing very carefully: I am talking about the modes of a system. The system itself can have any energy level, of course, so there is no discreteness at the level of the system. I am not saying that we don’t have a continuum there. We do. What I am saying is that its energy level can always be written as a (potentially infinite) sum of the energies of its components, i.e. its fundamental modes, and those energy levels are discrete. In quantum-mechanical systems, their spacing is h·f, so that’s the product of Planck’s constant and the fundamental frequency. For our guitar, the spacing is a·f (or, using angular frequency, ā·ω: it’s the same amount). But that’s it really. That’s the structure of the Universe. 🙂

Let me conclude by saying something more about a. What information does it capture? Well… All of the specificities of the string (like its material or its tension) determine the fundamental frequency f and, hence, the energy levels of the basic modes of our string. So a has nothing to do with the particularities of our string, of our system in general. However, we can, of course, pluck our string very softly or, conversely, give it a big jolt. So our a coefficient is not related to the string as such, but to the total energy of our string. In other words, a is related to those amplitudes a₁, a₂, etc in our F(t) = a₁sin(ωt) + a₂sin(2ωt) + a₃sin(3ωt) + … + a_nsin(nωt) + … wave equation.

How exactly? Well… Based on the fact that the total energy of our wave is equal to the sum of the energies of all of its components, I could give you some formula. However, that formula does use an integral. It’s an easy integral: energy is proportional to the square of the amplitude, and so we’re integrating the square of the wave function over the length of the string. But then I said I would not have any integral in this post, and so I’ll stick to that. In any case, even without the formula, you know enough now. For example, one of the things you should be able to reflect on is the relation between a and h. It’s got to do with structure, of course. 🙂 But I’ll let you think about that yourself.

[…] Let me help you. Think of the meaning of Planck’s constant h. Let’s suppose we’d have some elementary ‘wavicle’, like that elementary ‘string’ that string theorists are trying to define: the smallest ‘thing’ possible. It would have some energy, i.e. some frequency. Perhaps it’s just one full oscillation. Just enough to define some wavelength and, hence, some frequency indeed. Then that thing would define the smallest time unit that makes sense: it would be the time corresponding to one oscillation. In turn, because of the E = h·f relation, it would define the smallest energy unit that makes sense. So, yes, h is the quantum (or fundamental unit) of energy. It’s very small indeed (h = 6.626070040(81)×10⁻³⁴ J·s, so the first significant digit appears only after 33 zeroes behind the decimal point) but that’s because we’re living at the macro-scale and, hence, we’re measuring stuff in huge units: the joule (J) for energy, and the second (s) for time. In natural units, h would be one. [To be precise, physicists prefer to equate ħ, rather than h, to one when talking natural units. That’s because angular frequency is more ‘natural’ as well when discussing oscillations.]

What’s the conclusion? Well… Our a will be some integer multiple of h. Some incredibly large multiple, of course, but a multiple nevertheless. 🙂

Post scriptum: I didn’t say anything about strings in this post or, let me qualify, about those elementary ‘strings’ that string theorists try to define. Do they exist? Feynman was quite skeptical about it. He was happy with the so-called Standard Model of physics, and he would have been very happy to know that the existence of the Higgs field has been confirmed experimentally (that discovery is what prompted my blog!), because that confirms the Standard Model. The Standard Model distinguishes two types of wavicles: fermions and bosons. Fermions are matter particles, such as quarks and electrons. Bosons are force carriers, like photons and gluons. I don’t know anything about string theory, but my gut instinct tells me there must be more than just one mathematical description of reality. It’s the principle of duality: concepts, theorems or mathematical structures can be translated into other concepts, theorems or structures. But… Well… We’re not talking equivalent descriptions here: string theory is a different theory, it seems. For a brief but totally incomprehensible overview (for novices at least), click on the following link, provided by the C.N. Yang Institute for Theoretical Physics. If anything, it shows I’ve got a lot more to study as I am inching forward on the difficult Road to Reality. 🙂

Modes in classical and in quantum physics

Original post:

Basics

Waves are peculiar: there is one single waveform, i.e. one motion only, but that motion can always be analyzed as the sum of the motions of all the different wave modes, combined with the appropriate amplitudes and phases. Saying the same thing using different words: we can always analyze the wave function as the sum of a (possibly infinite) number of components, i.e. a so-called Fourier series:

f(t) = a₀ + Σ [a_n·cos(n·ωt) + b_n·sin(n·ωt)], with the sum running over n = 1, 2, 3, etcetera

The f(t) function can be any wave, but the simple examples in physics textbooks usually involve a string or, in two dimensions, some vibrating membrane, and I’ll stick to those examples too in this post. Feynman calls the Fourier components harmonic functions, or harmonics tout court, but the term ‘harmonic’ refers to so many different things in math that it may be better not to use it in this context. The component waves are sinusoidal functions, so sinusoidals might be a better term but it’s not in use, because a more general analysis will use complex exponentials, rather than sines and/or cosines. Complex exponentials (e.g. 10^ix) are periodic functions too, so they are totally unlike real exponential functions (e.g. 10^x). Hence, Feynman also uses the term ‘exponentials’. At some point, he also writes that the pattern of motion (of a mode) varies ‘exponentially’ but, of course, he’s thinking of complex exponentials, and, therefore, we should substitute ‘sinusoidally’ for ‘exponentially’ when talking real-valued wave functions.

[…] I know. I am already getting into the weeds here. As I am a bit off-track anyway now, let me make another remark here. You may think that we have two types of sinusoidals, or two types of functions, in that Fourier decomposition: sines and cosines. You should not think of it that way: the sine and cosine function are essentially the same. I know your old math teacher in high school never told you that, but it’s true. They both come with the same circle (yes, I know that’s a ridiculous statement but I don’t know how to phrase it otherwise): the difference between a sine and a cosine is just a phase shift: cos(ωt) = sin(ωt + π/2) and, conversely, sin(ωt) = cos(ωt − π/2). If the starting phases of all of the component waves would be the same, we’d have a Fourier decomposition involving cosines only, or sines only, whatever you prefer. Indeed, because they’re the same function except for that phase shift (π/2), we can always go from one to the other by shifting our origin of space (x) and/or time (t). However, we cannot assume that all of the component waves have the same starting phase and, therefore, we should write each component as cos(n·ωt + Φ_n), or a sine with a similar argument. Now, you’ll remember – because your math teacher in high school told you that at least 🙂 – that there’s a formula for the cosine (and sine) of the sum of two angles: cos(n·ωt + Φ_n) = cos(Φ_n)·cos(n·ωt) – sin(Φ_n)·sin(n·ωt). Substituting a_n and b_n for cos(Φ_n) and – sin(Φ_n) respectively gives us the a_n·cos(n·ωt) + b_n·sin(n·ωt) expressions above. In addition, the component waves may not only differ in phase, but also in amplitude, and, hence, the a_n and b_n coefficients do more than only capturing the phase differences. But let me get back on the track. 🙂
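
For those who prefer to check such things numerically, here is a minimal sketch of that conversion in Python. The function names are mine, and I have added an explicit amplitude A, so that the coefficients become a = A·cos(Φ) and b = −A·sin(Φ). That is an assumption that goes slightly beyond the unit-amplitude formula above.

```python
import math

def phase_to_coefficients(amplitude, phase):
    """Return (a, b) such that amplitude*cos(x + phase) == a*cos(x) + b*sin(x)."""
    a = amplitude * math.cos(phase)
    b = -amplitude * math.sin(phase)
    return a, b

def coefficients_to_phase(a, b):
    """Inverse conversion: recover (amplitude, phase) from the pair (a, b)."""
    return math.hypot(a, b), math.atan2(-b, a)

# Hypothetical amplitude and phase, for illustration only:
a, b = phase_to_coefficients(1.5, 0.7)
for x in (0.0, 0.3, 2.0):
    lhs = 1.5 * math.cos(x + 0.7)
    rhs = a * math.cos(x) + b * math.sin(x)
    print(x, lhs, rhs)  # the two columns agree, up to rounding

print(coefficients_to_phase(a, b))  # gives back the amplitude and the phase
```

So one (amplitude, phase) pair and one (a, b) pair carry exactly the same information about a component wave.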

Those sinusoidals have a weird existence: they are not there, physically—or so it seems. Indeed, there is one waveform only, i.e. one motion only—and, if it’s any real wave, it’s most likely to be non-sinusoidal. At the same time, I noted, in my previous post, that, if you pluck a string or play a chord on your guitar, some string you did not pluck may still pick up one or more of its harmonics (i.e. one or more of its overtones) and, hence, start to vibrate too! It’s the resonance phenomenon. If you have a grand piano, it’s even more obvious: if you’d press the C4 key on a piano, a small hammer will strike the C4 string and it will vibrate—but the C5 string (one octave higher) will also vibrate, although nothing touched it—except for the air transmitting the sound wave (including the harmonics causing the resonance) from the C4 string, of course! So the component waves are there and, at the same time, they’re not. Whatever they are, they are more than mathematical forms: the so-called superposition principle (on which the Fourier analysis is based) is grounded in reality: it’s because we can add forces. I know that sounds extremely obvious – or ridiculous, you might say 🙂 – but it is actually not so obvious. […] I am tempted to write something about conservative forces here but… Well… I need to move on.

Let me show that diagram of the first seven harmonics of an ideal string once again. All of them, and the higher ones too, would be in our wave function. Hence, assuming there’s no phase difference between the harmonics, we’d write:

f(t) = sin(ωt) + sin(2ωt) + sin(3ωt) + … + sin(nωt) + …

The frequencies of the various modes of our ideal string are all simple multiples of the fundamental frequency ω, as is evident from the argument in our sine functions (ω, 2ω, 3ω, etcetera). Conversely, the respective wavelengths are λ, λ/2, λ/3, etcetera. [Remember: the speed of the wave is fixed, and frequency and wavelength are inversely proportional: c = λ·f = λ/T = λ·(ω/2π).] So, yes, these frequencies and wavelengths can all be related to each other in terms of equally simple harmonic ratios: 1:2, 2:3, 3:4, 4:5 etcetera. I explained in my previous posts why that does not imply that the musical notes themselves are related in such way: the musical scale is logarithmic. So I won’t repeat myself. All of the above is just an introduction to the more serious stuff, which I’ll talk about now.

Modes in two dimensions

An analysis of waves in two dimensions is often done assuming some drum membrane. The Great Teacher played drums, as you can see from his picture in his Lectures, and there are also videos of him performing on YouTube. So that’s why the drum is used in almost all textbooks now. 🙂

The illustration of one of the normal modes of a circular membrane comes from the Wikipedia article on modes. There are many other normal modes – some of them with a simpler shape, but some of them more complicated too – but this is a nice one as it also illustrates the concept of a nodal line, which is closely related to the concept of a mode. Huh? Yes. The modes of a one-dimensional string have nodes, i.e. points where the displacement is always zero. Indeed, as you can see from the illustration above (not below), the first overtone has one node, the second two, etcetera. So the equivalent of a node in two dimensions is a nodal line: for the mode shown below, we have one bisecting the disc and then another one: a circle about halfway between the edge and center. The third nodal line is the edge itself, obviously. [The author of the Wikipedia article notes that the animation isn’t perfect, because the nodal line and the nodal circle halfway between the edge and the center both move a little bit. In any case, it’s pretty good, I think. I should also learn how to make animations like that. :-)]

What’s a mode?

How do we find these modes? And how are they defined really? To explain that, I have to briefly return to the one-dimensional example. The key to solving the problem (i.e. finding the modes, and defining their characteristics) is the following fact: when a wave reaches the clamped end of a string, it will be reflected with a change in sign, as illustrated below: we’ve got that F(x+ct) wave coming in, and then it goes back indeed, but with the sign reversed.

It’s a complicated illustration because it also shows some hypothetical wave coming from the other side, where there is no string to vibrate. That hypothetical wave is the same wave, but travelling in the other direction and with the sign reversed (–F). So what’s that all about? Well… I never gave any general solution for a waveform traveling up and down a string: I just said the waveform was traveling up and down the string (now that is obvious: just look at that diagram with the seven first harmonics once again, and think about how that oscillation goes up and down with time), but so I did not really give any general solution for them (the sine and cosine functions are specific solutions). So what is the general solution?

Let’s first assume the string is not held anywhere, so that we have an infinite string along which waves can travel in either direction. In fact, the most general functional form to capture the fact that a waveform can travel in any direction is to write the displacement y as the sum of two functions: one wave traveling one way (which we’ll denote by F), and the other wave (which we’ll denote by G) traveling the other way. From the illustration above, it’s obvious that the F wave is traveling towards the negative x-direction and, hence, its argument will be x + ct. Conversely, the G wave travels in the positive x-direction, so its argument is x – ct. So we write:

y = F(x + ct) + G(x – ct)

[I’ve explained this thing about directions and why the argument in a wavefunction (x ± ct) is what it is before. You should look it up in case you don’t understand. As for the c in this equation, that’s the wave velocity once more, which is constant and which depends, as always, on the medium, so that’s the material and the diameter and the tension and whatever of the string.]

So… We know that the string is actually not infinite, but that it’s fixed to some ‘infinitely solid wall’ (as Feynman puts it). Hence, y is equal to zero there: y = 0. Now let’s choose the origin of our x-axis at the fixed end so as to simplify the analysis. Hence, where y is zero, x is also zero. Now, at x = 0, our general solution above for the infinite string becomes y = F(ct) + G(−ct) = 0, for all values of t. Of course, that means G(−ct) must be equal to –F(ct). Now, that equality is there for all values of t. So it’s there for all values of ct and −ct. In short, that equality is valid for whatever value of the argument of G and –F. As Feynman puts it: “G of anything must be –F of minus that same thing.” Now, the ‘anything’ in G is its argument: x – ct, so ‘minus that same thing’ is –(x – ct) = −x + ct. Therefore, our equation becomes:

y = F(x + ct) − F(−x + ct)

So that’s what’s depicted in the diagram above: the F(x + ct) wave ‘vanishes’ behind the wall as the − F(−x + ct) wave comes out of it. Conversely, the − F(−x + ct) is hypothetical indeed until it reaches the origin, after which it becomes the real wave. Their sum is only relevant near the origin x = 0, and on the positive side only (on the negative side of the x-axis, the F and G functions are both hypothetical). [I know, it’s not easy to follow, but textbooks are really short on this—which is why I am writing my blog: I want to help you ‘get’ it.]
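
To make this a bit more tangible, here is a small sketch in Python. The Gaussian bump is just my arbitrary choice of a non-periodic waveform F, and the names and numbers are hypothetical. It shows that y = F(x + ct) − F(−x + ct) is zero at the wall at all times, and that a pulse coming in with a positive sign goes back out with the sign reversed.

```python
import math

def pulse(u):
    """An arbitrary non-periodic waveform F(u): a Gaussian bump centred on u = 5."""
    return math.exp(-(u - 5.0) ** 2)

def displacement(x, t, c=1.0, F=pulse):
    """y(x, t) = F(x + ct) - F(-x + ct) for a string clamped at x = 0."""
    return F(x + c * t) - F(-x + c * t)

for t in (0.0, 2.5, 5.0, 7.5, 10.0):
    # First column: time. Second: displacement at the wall (always zero).
    # Third: displacement at x = 5, where the pulse starts out and where
    # its reflection passes by later on.
    print(t, displacement(0.0, t), displacement(5.0, t))
```

At t = 0 the bump sits at x = 5 with a positive sign. At t = 10 it is back at x = 5, but upside down.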

Now, the results above are valid for any wave, periodic or not. Let’s now confine the analysis to periodic waves only. In fact, we’ll limit the analysis to sinusoidal wavefunctions only. So that should be easy. Yes. Too easy. I agree. 🙂

So let’s make things difficult again by introducing the complex exponential notation, so that’s Euler’s formula: e^iθ = cosθ + isinθ, with i the imaginary unit, and isinθ the imaginary component of our wave. So the only thing that is real, is cosθ.

What the heck? Just bear with me. It’s good to make the analysis somewhat more general, especially because we’ll be talking about the relevance of all of this to quantum physics, and in quantum physics the waves are complex-valued indeed! So let’s get on with it. To use Euler’s formula, we need to substitute x + ct for the phase of the wave, so that involves the angular frequency and the wavenumber. Let me just write it down:

F(x + ct) = e^{iω(t+x/c)} and F(−x + ct) = e^{iω(t−x/c)}

Huh? Yeah. Sorry. I’ll resist the temptation to go off-track here, because I really shouldn’t be copying what I wrote in other posts. Most of what I write above is really those simple relations: c = λ·f = ω/k, with k, i.e. the wavenumber, being defined as k = 2π/λ. For details, go to one of my other posts indeed, in which I explain how that works in great detail: just click on the link here, and scroll down to the section on the phase of a wave, in which I explain why the phase of a wave is equal to θ = ωt–kx = ω(t–x/c). And, yes, I know: the thing with the wave directions and the signs is quite tricky. Just remember: for a wave traveling in the positive x-direction, the signs in front of x and t are each other’s opposite but, if the wave’s traveling in the negative x-direction, they are the same. As mentioned, all the rest is usually a matter of shifting the phase, which amounts to shifting the origin of either the x- or the t-axis. I need to move on. Using the exponential notation for our sinusoidal wave, y = F(x + ct) − F(−x + ct) becomes:

y = e^{iω(t+x/c)} − e^{iω(t−x/c)}

I can hear you sigh again: Now what’s that for? What can we do with this? Just continue to bear with me for a while longer. Let’s factor the e^iωt term out. [Why? Patience, please!] So we write:

y = e^{iωt}·[e^{iωx/c} − e^{−iωx/c}]

Now, you can just use Euler’s formula again to double-check that e^{iθ} − e^{−iθ} = 2i·sinθ. [To get that result, you should remember that cos(−θ) = cosθ, but sin(−θ) = −sin(θ).] So we get:

y = e^{iωt}·[e^{iωx/c} − e^{−iωx/c}] = 2i·e^{iωt}·sin(ωx/c)

Now, we’re only interested in the real component of this amplitude of course – but that’s only because we’re in the classical world here, not in the real world, which is quantum-mechanical and, hence, involves the imaginary stuff also 🙂 – so we should write this out using Euler’s formula again to convert the exponential to sinusoidals again. Hence, remembering that i² = −1, we get:

y = 2i·e^{iωt}·sin(ωx/c) = 2i·cos(ωt)·sin(ωx/c) – 2·sin(ωt)·sin(ωx/c)

!?!

OK. You need a break. So let me pause here for a while. What the hell are we doing? Is this legit? I mean… We’re talking about some real wave here, aren’t we? We are. So is this conversion from/to real amplitudes to/from complex amplitudes legit? It is. And, in this case (i.e. in classical physics), it’s true that we’re interested in the real component of y only. But then it’s nice the analysis is valid for complex amplitudes as well, because we’ll be talking complex amplitudes in quantum physics.

[…] OK. I acknowledge it all looks very tricky so let’s see what we’d get using our old-fashioned sine and/or cosine function. So let’s write F(x + ct) as cos(ωt+ωx/c) and F(−x + ct) as cos(ωt−ωx/c). So we write y = cos(ωt+ωx/c) − cos(ωt−ωx/c). Now work on this using the cos(α+β) = cosα·cosβ − sinα·sinβ formula and the cos(−α) = cosα and sin(−α) = −sinα identities. You (should) get: y = −2sin(ωt)·sin(ωx/c). So that’s the real component in our y function above indeed. So, yes, we do get the same results when doing this funny business using complex exponentials as we’d get when sticking to real stuff only! Fortunately! 🙂
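
If you’d rather let a computer do the double-check, here is a minimal sketch in Python. The function names and the values of ω and c are mine, picked for illustration only. It compares the real part of the complex-exponential expression with the −2sin(ωt)·sin(ωx/c) result.

```python
import cmath
import math

# Hypothetical values for the angular frequency and the wave velocity:
OMEGA = 2.0
C = 1.5

def y_complex(x, t):
    """y = e^{i*omega*(t + x/c)} - e^{i*omega*(t - x/c)}, as a complex number."""
    return cmath.exp(1j * OMEGA * (t + x / C)) - cmath.exp(1j * OMEGA * (t - x / C))

def y_real(x, t):
    """The real-valued result: y = -2*sin(omega*t)*sin(omega*x/c)."""
    return -2.0 * math.sin(OMEGA * t) * math.sin(OMEGA * x / C)

for x, t in [(0.0, 0.4), (0.3, 0.0), (0.7, 1.1), (2.0, 3.5)]:
    # The real part of the complex expression matches the real-valued formula.
    print(x, t, y_complex(x, t).real, y_real(x, t))
```

The imaginary part is the 2cos(ωt)·sin(ωx/c) term, which we simply drop in the classical analysis.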

[Why did I get off-track again? Well… It’s true these conversions from real to complex amplitudes should not be done carelessly. It is tricky and non-intuitive, to say the least. The weird thing about it is that, if we multiply two imaginary components, we get a real component, because i² is a real number: it’s −1! So it’s fascinating indeed: we add an imaginary component to our real-valued function, do all kinds of manipulations with it – including stuff that involves the use of i² = −1 – and, when done, we just take out the real component and it’s alright: we know that the result is OK because of the ‘magic’ of complex numbers! In any case, I need to move on so I can’t dwell on this. I also explained much of the ‘magic’ in other posts already, so I shouldn’t repeat myself. If you’re interested, click on this link, for instance.]

Let’s go back to our y = – 2sin(ωt)·sin(ωx/c) function. So that’s the oscillation. Just look at the equation and think about what it tells us. Suppose we fix x, so we’re looking at one point on the string only and only let t vary: then sin(ωx/c) is some constant and it’s our sin(ωt) factor that goes up and down. So our oscillation has frequency ω, at every point x, so that’s everywhere!

Of course, this result shouldn’t surprise us, should it? That’s what we put in when we wrote F as F(x + ct) = e^iω(t+x/c) or as cos(ωt+ωx/c), isn’t it? Well… Yes and no. Yes, because you’re right: we put in that angular frequency. But then, no, because we’re talking a composite wave here: a wave traveling up and down, with the components traveling in opposite directions. Indeed, we’ve also got that G(x) = −F(–x) function here. So, no, it’s not quite the same.

Let’s fix t now, and take a snapshot of the whole wave, so now we look at x as the variable and sin(ωt) is some constant. What we see is a sine wave, and sin(ωt) is its maximum amplitude. Again, you’ll say: of course! Well… Yes. The thing is: the point where the amplitude of our oscillation is equal to zero, is always the same, regardless of t. So we have fixed nodes indeed. Where are they? The nodes are, obviously, the points where sin(ωx/c) = 0, so that’s when ωx/c is equal to 0, obviously, or – more importantly – whenever ωx/c is equal to π, 2π, 3π, 4π, etcetera. More generally, we can say whenever ωx/c = n·π with n = 0, 1, 2,… etc. Now, that’s the same as writing x = n·π·c/ω = n·π/k = n·π·λ/2π = n·λ/2.

Now let’s remind ourselves of what λ really is: for the fundamental frequency it’s twice the length of the string, so λ = 2·L. For the next mode (i.e. the second harmonic), it’s the length itself: λ = L. For the third, it’s λ = (2/3)·L, etcetera. So, in general, it’s λ = (2/m)·L with m = 1, 2, etcetera. [We may or may not want to include a zero mode by allowing m to equal zero as well, so then there’s no oscillation and y = 0 everywhere. 🙂 But that’s a minor point.] In short, our grand result is:

x = n·λ/2 = n·(2/m)·L/2 = (n/m)·L

Of course, we have to exclude the x points lying outside of our string by imposing that n/m ≤ 1, i.e. the condition that n ≤ m. So for m = 1, n is 0 or 1, so the nodes are, effectively, both ends of the string. For m = 2, n can be 0, 1 and 2, so the nodes are the ends of the string and its middle point L/2. And so on and so on.
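
The node formula is simple enough to tabulate. The sketch below (the function name is mine) lists the nodes x = (n/m)·L of the first few modes as exact fractions of the length of the string, taking L = 1 for convenience.

```python
from fractions import Fraction

def node_positions(m, L=1):
    """Nodes x = (n/m)*L of mode m, for n = 0, 1, ..., m (both ends included)."""
    return [Fraction(n, m) * L for n in range(m + 1)]

for m in (1, 2, 3, 4):
    # Mode m has m + 1 nodes: the two ends plus m - 1 points in between.
    print(m, [str(x) for x in node_positions(m)])
```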

I know that, by now, you’ve given up. So no one is reading anymore and so I am basically talking to myself now. What’s the point? Well… I wanted to get here in order to define the concept of a mode: a mode is a pattern of motion, which has the property that, at any point, the object moves perfectly sinusoidally, and that all points move at the same frequency (though some will move more than others). Modes also have nodes, i.e. points that don’t move at all, and above I showed how we can find the nodes of the modes of a one-dimensional string.

Also note how remarkable that result actually is: we didn’t specify anything about that string, so we don’t care about its material or diameter or tension or whatever. Still, we know its fundamental (or normal) modes, and we know their nodes: they’re a function of the length of the string, and the number of the mode only: x = (n/m)·L. While an oscillating string may seem to be the most simple thing on earth, it isn’t: think of all the forces between the molecules, for instance, as that string is vibrating. Still, we’ve got this remarkably simple formula. Don’t you find that amazing?

[…] OK… If you’re still reading, I know you want me to move on, so I’ll just do that.

Back to two dimensions

The modes are all that matters: when linear forces (i.e. linear systems) are involved, any motion can be analyzed as the sum of the motions of all the different modes, combined with appropriate amplitudes and phases. Let me reproduce the Fourier series once more (the more you see, the better you’ll understand it, I should hope!): f(t) = a₀ + Σ [a_n·cos(n·ωt) + b_n·sin(n·ωt)], with the sum running over n = 1, 2, 3, etcetera. Of course, we should generalize this to also include x as a variable which, again, is easier if we’d use complex exponentials instead of the sinusoidal components. The nice illustration on Fourier analysis from Wikipedia shows how it works, in essence, that is. The red function below consists of six of those modes.

OK. Enough of this. Let’s go to the two-dimensional case now. To simplify the analysis, Feynman invented a rectangular drum. A rectangular drum is probably more difficult to play, but it’s easier to analyze—as compared to a circular drum, that is! 🙂

In two dimensions, our sinusoidal one-dimensional e^{i(ωt−kx)} waveform becomes e^{i(ωt−k_xx−k_yy)}. So we have a wavenumber for the x and y directions, and the sign in front is determined by the direction of the wave, so we need to check whether it moves in the positive or negative direction of the x- and y-axis respectively. Now, we can rewrite e^{i(ωt−k_xx+k_yy)} as e^{iωt}·e^{i(−k_xx+k_yy)}, of course, which is what you see in the diagram above: the wave there is moving in the negative y direction and, hence, we’ve got a + sign in front of our k_yy term. All the rest is rather well explained in Feynman, so I’ll refer you to the textbook here.

We basically need to ensure that we have a nodal line at x = 0 and at x = a, and then we do the same for y = 0 and y = b. Then we apply exactly the same logic as for the one-dimensional string: the wave needs to be coherently reflected. The analysis is somewhat more complicated because it involves some angle of incidence now, i.e. the θ in the diagram above, so that’s another page in Feynman’s textbook. And then we have the same gymnastics for finding wavelengths in terms of the dimensions a and b, as well as in terms of n and m, where n is the number of the mode involved when fixing the nodal lines at x = 0 and x = a, and m is the number of the mode involved when fixing the nodal lines at y = 0 and y = b. Sounds difficult? Well… Yes. But I won’t copy Feynman here. Just go and check for yourself.

The grand result is that we do get some formula for a wavelength λ of what satisfies the definition of a mode: a perfectly sinusoidal motion, that has all points on the drum move at the same frequency, though some move more than others. Also, as evidenced from my illustration for the circular disk: we’ve got nodal lines, and then I mean other nodal lines, different from the edges! I’ll just give you that formula here (again, for the detail, go and check Feynman yourself):

Feynman also works out an example for a = 2b. I’ll just copy the results hereunder, which is a formula for the (angular) frequencies ω, and a table of the mode shapes in a qualitative way (I’ll leave it to you to google animations that match the illustration).

Again, we should note the amazing simplicity of the result: we don’t care about the type of membrane or whatever other material the drum is made of. Its proportions are all that matter.

Finally, you should also note the last two columns in the table above: these just serve to illustrate that, unlike our modes in the one-dimensional case, the natural frequencies here are not multiples of the fundamental frequency. As Feynman notes, we should not be led astray by the example of the one-dimensional ideal string. It’s again a departure from the Pythagorean idea, that all in Nature respects harmonic ratios. It’s just not true. Let me quote Feynman, as I have no better summary: “The idea that the natural frequencies are harmonically related is not generally true. It is not true for a system with more than one dimension, nor is it true for one-dimensional systems which are more complicated than a string with uniform density and tension.”
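To make the point about non-harmonic frequencies concrete, here is a minimal sketch. It assumes the standard textbook result for an ideal rectangular membrane with fixed edges, ω = π·c·√((n/a)² + (m/b)²), which I believe is what Feynman's formula amounts to. The function name and the choice b = c = 1 are mine, for illustration only.

```python
import math

def drum_omega(n, m, b=1.0, c=1.0):
    """Angular frequency of mode (n, m) of an ideal rectangular drum with a = 2b.

    Assumes the textbook formula for a membrane with fixed edges;
    b (short side) and c (wave speed) are set to 1 for illustration.
    """
    a = 2.0 * b
    return math.pi * c * math.sqrt((n / a) ** 2 + (m / b) ** 2)

fundamental = drum_omega(1, 1)
for n, m in [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2)]:
    ratio = drum_omega(n, m) / fundamental
    print(f"mode ({n}, {m}): omega / omega_fundamental = {ratio:.3f}")
```

Of the modes listed, only (2, 2) lands on an integer multiple of the fundamental (exactly 2); the others fall in between, which is the departure from the one-dimensional string.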

So… That says it all, I’d guess. Maybe I should just quote his example of a one-dimensional system that does not obey Pythagoras’ prescription: a hanging chain which, because of the weight of the chain, has higher tension at the top than at the bottom. If such a chain is set in oscillation, there are various modes and frequencies, but the frequencies will not be simply multiples of each other, nor of any other number. It is also interesting to note that the mode shapes will also not be sinusoidal. However, here we’re getting into non-linear dynamics, and so I’ll let you read about that elsewhere too: once again, Feynman’s analysis of non-linear systems is very accessible and an interesting read. Hence, I warmly recommend it.

Modes in three dimensions and in quantum mechanics.

Well… Unlike what you might expect, I won’t bury you under formulas this time. Let me refer you, instead, to Wikipedia’s article on the so-called Leidenfrost effect. Just do it. Don’t bother too much about the text, scroll down a bit, and play the video that comes with it. I saw it, sort of by accident, and, at first, I thought it was something very high-tech. But no: it’s just a drop of water skittering around in a hot pan. It takes on all kinds of weird forms and oscillates in the weirdest of ways, but all is nothing but an excitation of the various normal modes of it, with various amplitudes and phases, of course, as a Fourier analysis of the phenomenon dictates.

There’s plenty of other stuff around to satisfy your curiosity, all quite understandable and fun—because you now understand the basics of it for the one- and two-dimensional case.

So… Well… I’ve kept this section extremely short, because now I want to say a few words about quantum-mechanical systems. Well… In fact, I’ll simply quote Feynman on it, because he writes about it in a style that’s unsurpassed. He also nicely sums up the previous conversation. Here we go:

The ideas discussed above are all aspects of what is probably the most general and wonderful principle of mathematical physics. If we have a linear system whose character is independent of the time, then the motion does not have to have any particular simplicity, and in fact may be exceedingly complex, but there are very special motions, usually a series of special motions, in which the whole pattern of motion varies exponentially with the time. For the vibrating systems that we are talking about now, the exponential is imaginary, and instead of saying “exponentially” we might prefer to say “sinusoidally” with time. However, one can be more general and say that the motions will vary exponentially with the time in very special modes, with very special shapes. The most general motion of the system can always be represented as a superposition of motions involving each of the different exponentials.

This is worth stating again for the case of sinusoidal motion: a linear system need not be moving in a purely sinusoidal motion, i.e., at a definite single frequency, but no matter how it does move, this motion can be represented as a superposition of pure sinusoidal motions. The frequency of each of these motions is a characteristic of the system, and the pattern or waveform of each motion is also a characteristic of the system. The general motion in any such system can be characterized by giving the strength and the phase of each of these modes, and adding them all together. Another way of saying this is that any linear vibrating system is equivalent to a set of independent harmonic oscillators, with the natural frequencies corresponding to the modes.

In quantum mechanics the vibrating object, or the thing that varies in space, is the amplitude of a probability function that gives the probability of finding an electron, or system of electrons, in a given configuration. This amplitude function can vary in space and time, and satisfies, in fact, a linear equation. But in quantum mechanics there is a transformation, in that what we call frequency of the probability amplitude is equal, in the classical idea, to energy. Therefore we can translate the principle stated above to this case by taking the word frequency and replacing it with energy. It becomes something like this: a quantum-mechanical system, for example an atom, need not have a definite energy, just as a simple mechanical system does not have to have a definite frequency; but no matter how the system behaves, its behavior can always be represented as a superposition of states of definite energy. The energy of each state is a characteristic of the atom, and so is the pattern of amplitude which determines the probability of finding particles in different places. The general motion can be described by giving the amplitude of each of these different energy states. This is the origin of energy levels in quantum mechanics. Since quantum mechanics is represented by waves, in the circumstance in which the electron does not have enough energy to ultimately escape from the proton, they are confined waves. Like the confined waves of a string, there are definite frequencies for the solution of the wave equation for quantum mechanics. The quantum-mechanical interpretation is that these are definite energies. Therefore a quantum-mechanical system, because it is represented by waves, can have definite states of fixed energy; examples are the energy levels of various atoms.

Isn’t that great? What a summary! It also shows a deeper understanding of classical physics makes it sooooo much better to read something about quantum mechanics. In any case, as for the examples, I should add – because that’s what you’ll often find when you google for quantum-mechanical modes – the vibrational modes of molecules. There’s tons of interesting analysis out there, and so I’ll let you now have fun with it yourself! 🙂

Music and Math

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much the attack by the dark force—which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

I ended my previous post, on Music and Physics, by emphatically making the point that music is all about structure, about mathematical relations. Let me summarize the basics:

1. The octave is the musical unit, defined as the interval between two pitches with the higher frequency being twice the frequency of the lower pitch. Let’s denote the lower and higher pitch by a and b respectively, so we say that b‘s frequency is twice that of a.

2. We then divide the [a, b] interval (whose length is unity) in twelve equal sub-intervals, which define eleven notes in-between a and b. The pitch of the notes in-between is defined by the exponential function connecting a and b. What exponential function? The exponential function with base 2, so that’s the function y = 2^x.

Why base 2? Because of the doubling of the frequencies when going from a to b, and when going from b to b + 1, and from b + 1 to b + 2, etcetera. In music, we give a, b, b + 1, b + 2, etcetera the same name, or symbol: A, for example. Or Do. Or C. Or Re. Whatever. If we have the unit and the number of sub-intervals, all the rest follows. We just add a number to distinguish the various As, or Cs, or Gs, so we write A1, A2, etcetera. Or C1, C2, etcetera. The graph below illustrates the principle for the interval between C4 and C5. Don’t think the function is linear. It’s exponential: note the logarithmic frequency scale. To make the point, I also inserted another illustration (credit for that graph goes to another blogger).

You’ll wonder: why twelve sub-intervals? Well… That’s random. Non-Western cultures use a different number. Eight instead of twelve, for example—which is more logical, at first sight at least: eight intervals amounts to dividing the interval in two equal halves, and the halves in halves again, and then once more: so the length of the sub-interval is then 1/2·1/2·1/2 = (1/2)³ = 1/8. But why wouldn’t we divide by three, so we have 9 = 3·3 sub-intervals? Or by 27 = 3·3·3? Or by 16? Or by 5?

The answer is: we don’t know. The limited sensitivity of our ear demands that the intervals be cut up somehow. [You can do tests of the sensitivity of your ear to relative frequency differences online: it’s fun. Just try them! Some of the sites may recommend a hearing aid, but don’t take that crap.] So… The bottom line is that, somehow, mankind settled on twelve sub-intervals within our musical unit—or our sound unit, I should say. So it is what it is, and the ratio of the frequencies between two successive (semi)tones (e.g. C and C#, or E and F, as E and F are also separated by one half-step only) is 2^(1/12) = 1.059463… Hence, the pitch of each note is about 6% higher than the pitch of the previous note. OK. Next thing.

3. What’s the similarity between C1, C2, C3 etcetera? Or between A1, A2, A3 etcetera? The answer is: harmonics. The frequency of the first overtone of a string tuned at pitch A3 (i.e. 220 Hz) is equal to the fundamental frequency of a string tuned at pitch A4 (i.e. 440 Hz). Likewise, the frequency of the (pitch of the) C4 note above (which is the so-called middle C) is 261.626 Hz, while the frequency of the (pitch of the) next C note (C5) is twice that frequency: 523.251 Hz. [I should quickly clarify the terminology here: a tone consists of several harmonics, with frequencies f, 2·f, 3·f,… n·f,… The first harmonic is referred to as the fundamental, with frequency f. The second, third, etc harmonics are referred to as overtones, with frequency 2·f, 3·f, etc.]

To make a long story short: our ear is able to identify the individual harmonics in a tone, and if the frequency of the first harmonic of one tone (i.e. the fundamental) is the same frequency as the second harmonic of another, then we feel they are separated by one musical unit.
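The arithmetic in points 2 and 3 above can be sketched in a few lines. This is only an illustration: the reference pitch A4 = 440 Hz and the helper name `equal_tempered` are my assumptions, and C4 is taken to lie nine semitones below A4.

```python
SEMITONE = 2 ** (1 / 12)  # frequency ratio between two successive semitones

def equal_tempered(f_ref, steps):
    """Frequency `steps` semitones above (or below, if negative) f_ref."""
    return f_ref * 2 ** (steps / 12)

names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B", "C"]
c4 = equal_tempered(440.0, -9)  # assumption: A4 = 440 Hz, C4 is 9 semitones lower

# Walk up the twelve sub-intervals from C4 to C5.
for k, name in enumerate(names):
    print(f"{name:2s} {equal_tempered(c4, k):8.3f} Hz")
```

Twelve steps of about 6% each multiply up to exactly a factor of 2, so the last line of the printout is the octave above the first.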

Isn’t that most remarkable? Why would it be that way?

My intuition tells me I should look at the energy of the components. The energy theorem tells us that the total energy in a wave is just the sum of the energies in all of the Fourier components. Surely, the fundamental must carry most of the energy, and then the first overtone, and then the second. Really? Is that so?

Well… I checked online to see if there’s anything on that, but my quick check reveals there’s nothing much out there in terms of research: if you’d google ‘energy levels of overtones’, you’ll get hundreds of links to research on the vibrational modes of molecules, but nothing that’s related to music theory. So… Well… Perhaps this is my first truly original post! 🙂 Let’s go for it. 🙂

The energy in a wave is proportional to the square of its amplitude, and we must integrate over one period (T) of the oscillation. The illustration below should help you to understand what’s going on. The fundamental mode of the wave is an oscillation with a wavelength (λ₁) that is twice the length of the string (L). For the second mode, the wavelength (λ₂) is just L. For the third mode, we find that λ₃ = (2/3)·L. More in general, the wavelength of the n-th mode is λ_n = (2/n)·L.

The illustration above shows that we’re talking sine waves here, differing in their frequency (or wavelength) only. [The speed of the wave (c), as it travels back and forth along the string, is constant, so frequency and wavelength are in that simple relationship: c = f·λ.] Simplifying and normalizing (i.e. choosing the ‘right’ units by multiplying scales with some proportionality constant), the energy of the first mode would be (proportional to):

What about the second and third modes? For the second mode, we have two oscillations per cycle, but we still need to integrate over the period of the first mode T = T₁, which is twice the period of the second mode: T₁ = 2·T₂. Hence, T₂ = (1/2)·T₁. Therefore, the argument of the sine wave (i.e. the x variable in the integral above) should go from 0 to 4π. However, we want to compare the energies of the various modes, so let’s substitute cleverly. We write:

The period of the third mode is equal to T₃ = (1/3)·T₁. Conversely, T₁ = 3·T₃. Hence, the argument of the sine wave should go from 0 to 6π. Again, we’ll substitute cleverly so as to make the energies comparable. We write:

Now that is interesting! For a so-called ideal string, whose motion is the sum of a sinusoidal oscillation at the fundamental frequency f, another at the second harmonic frequency 2·f, another at the third harmonic 3·f, etcetera, we find that the energies of the various modes are proportional to the values in the harmonic series 1, 1/2, 1/3, 1/4,… 1/n, etcetera. Again, Pythagoras’ conclusion was wrong (the frequencies of individual notes do not respect simple ratios), but his intuition was right: the harmonic series ∑n⁻¹ (n = 1, 2,…,∞) is very relevant in describing natural phenomena. It gives us the respective energies of the various natural modes of a vibrating string! In the graph below, the values are represented as areas. It is all quite deep and mysterious really!

So now we know why we feel C4 and C5 have so much in common that we call them by the same name: C, or Do. It also helps us to understand why the E and A tones have so much in common: the third harmonic of the 110 Hz A2 string corresponds to the fundamental frequency of the E4 string: both are (about) 330 Hz! Hence, E and A have ‘energy in common’, so to speak, but less ‘energy in common’ than two successive E notes, or two successive A notes, or two successive C notes (like C4 and C5).
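A quick check of the A2 and E4 numbers, as a sketch only. It assumes equal temperament with A4 = 440 Hz, so that E4 lies five semitones below A4; the helper name `harmonics` is mine.

```python
def harmonics(fundamental, count):
    """First `count` harmonics (f, 2f, 3f, ...) of an ideal string."""
    return [n * fundamental for n in range(1, count + 1)]

a2 = 110.0                   # A2, as in the text
e4 = 440.0 * 2 ** (-5 / 12)  # assumption: equal-tempered E4, 5 semitones below A4
third_harmonic = harmonics(a2, 3)[-1]

print(f"third harmonic of A2: {third_harmonic:.2f} Hz")
print(f"equal-tempered E4:    {e4:.2f} Hz")
print(f"difference:           {third_harmonic - e4:.2f} Hz")
```

The match is exact only for a pure 3:2 fifth; in equal temperament the two frequencies differ by roughly 0.4 Hz, which is close enough for the argument here.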

[…] Well… Sort of… In fact, the analysis above is quite appealing but – I hate to say it – it’s wrong, as I explain in my post scriptum to this post. It’s like Pythagoras’ number theory of the Universe: the intuition behind is OK, but the conclusions aren’t quite right. 🙂

Ideality versus reality

We’ve been talking ideal strings. Actual tones coming out of actual strings have a quality, which is determined by the relative amounts of the various harmonics that are present in the tone, which is not some simple sum of sinusoidal functions. Actual tones have a waveform that may resemble something like the wavefunction I presented in my previous post, when discussing Fourier analysis. Let me insert that illustration once again (and let me also acknowledge its source once more: it’s Wikipedia). The red waveform is the sum of six sine functions, with harmonically related frequencies, but with different amplitudes. Hence, the energy levels of the various modes will not be proportional to the values in that harmonic series ∑n⁻¹, with n = 1, 2,…,∞.

Das wohltemperierte Klavier

Nothing in what I wrote above is related to questions of taste like: why do I seldom select a classical music channel on my online radio station? Or why am I not into hip hop, even if my taste for music is quite similar to that of the common crowd (as evidenced from the fact that I like ‘Listeners’ Top’ hit lists)?

Not sure. It’s an unresolved topic, I guess—involving rhythm and other ‘structures’ I did not mention. Indeed, all of the above just tells us a nice story about the structure of the language of music: it’s a story about the tones, and how they are related to each other. That relation is, in essence, an exponential function with base 2. That’s all. Nothing more, nothing less. It’s remarkably simple and, at the same time, endlessly deep. 🙂 But so it is not a story about the structure of a musical piece itself, of a pop song of Ellie Goulding, for instance, or one of Bach’s preludes or fugues.

That brings me back to the original question I raised in my previous post. It’s a question which was triggered, a long time ago, when I tried to read Douglas Hofstadter’s Gödel, Escher, Bach, frustrated because my brother seemed to understand it, and I didn’t. So I put it down, and never ever looked at it again. So what is it really about that famous piece of Bach?

Frankly, I am still not sure. As I mentioned in my previous post, musicians were struggling to find a tuning system that would allow them to easily transpose musical compositions. Transposing music amounts to changing the so-called key of a musical piece, so that’s moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. It’s a piece of cake now. In fact, increasing or decreasing the playback speed of a recording also amounts to transposing a piece: an increase or decrease of the playback speed by 6% will shift the pitch up or down by about one semitone. Why? Well… Go back to what I wrote above about that 12th root of 2. We’ve got the right tuning system now, and so everything is easy. Logarithms are great! 🙂
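The playback-speed remark is just the logarithm of the speed factor, measured in units of the 12th root of 2. A minimal sketch (the function name is mine):

```python
import math

def semitone_shift(speed_factor):
    """Pitch shift, in equal-tempered semitones, for a playback-speed factor."""
    return 12 * math.log2(speed_factor)

print(f"{semitone_shift(1.06):+.3f} semitones")           # 6% faster
print(f"{semitone_shift(2 ** (1 / 12)):+.3f} semitones")  # exactly one semitone
print(f"{semitone_shift(2.0):+.3f} semitones")            # double speed: one octave
```

A 6% change comes out very slightly above one semitone, because the exact semitone ratio is 1.059463 rather than 1.06.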

Back to Bach. Despite their admiration for the Greek ideas around aesthetics – and, most notably, their fascination with harmonic ratios! – (almost) all Renaissance musicians were struggling with the so-called Pythagorean tuning system, which was used until the 18th century and which was based on a correct observation (similar strings, under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etcetera) but a wrong conclusion (the frequencies of musical tones should also obey the same harmonic ratios), and Bach’s so-called ‘good’ temperament tuning system was designed such that the piece could, indeed, be played in most keys without sounding… well… out of tune. 🙂

Having said that, the modern ‘equal temperament’ tuning system, which prescribes that tuning should be done such that the notes are in the above-described simple logarithmic relation to each other, had already been invented. So the true question is: why didn’t Bach embrace it? Why did he stick to ratios? Why did it take so long for the right system to be accepted?

I don’t know. If you google, you’ll find a zillion possible explanations. As far as I can see, most are rather mystic. More importantly, most of them do not mention many facts. My explanation is rather simple: while Bach was, obviously, a musical genius, he may not have understood what an exponential, or a logarithm, is all about. Indeed, a quick read of summary biographies reveals that Bach studied a wide range of topics, like Latin and Greek, and theology – of course! But math is not mentioned. He didn’t write about tuning and all that: all of his time went to writing musical masterpieces!

What the biographies do mention is that he always found other people’s tunings unsatisfactory, and that he tuned his harpsichords and clavichords himself. Now that is quite revealing, I’d say! In my view, Bach couldn’t care less about the ratios. He knew something was wrong with the Pythagorean system (or the variants as were then used, which are referred to as meantone temperament) and, as a musical genius, he probably ended up tuning by ear. [For those who’d wonder what I am talking about, let me quickly insert a Wikipedia graph illustrating the difference between the Pythagorean system (and two of these meantone variants) and the equal temperament tuning system in use today.]

So… What’s the point I am trying to make? Well… Frankly, I’d bet Bach’s own tuning was actually equal temperament, and so he should have named his masterpiece Das gleichtemperierte Klavier. Then we wouldn’t have all that ‘noise’ around it. 🙂

Post scriptum: Did you like the argument on the respective energy levels of the harmonics of an ideal string? Too bad. It’s wrong. I made a common mistake: when substituting variables in the integral, I ‘forgot’ to substitute the lower and upper bound of the interval over which I was integrating the function. The calculation below corrects the mistake, and so it does the required substitutions—for the first three modes at least. What’s going on here? Well… Nothing much… I just integrate over the length L taking a snapshot at t = 0 (as mentioned, we can always shift the origin of our independent variable, so here we do it for time and so it’s OK). Hence, the argument of our wave function sin(kx−ωt) reduces to kx, with k = 2π/λ, and λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L for the first, second and third mode respectively. [As for solving the integral of the sine squared, you can google the formula, and please do check my substitutions. They should be OK, but… Well… We never know, do we? :-)]

[…] No… This doesn’t make all that much sense either. Those integrals yield the same energy for all three modes. Something must be wrong: shorter wavelengths (i.e. higher frequencies) are associated with higher energy levels. Full stop. So the ‘solution’ above can’t be right… […] You’re right. That’s where the time aspect comes into play. We were taking a snapshot, indeed, and the mean value of the sine squared function is 1/2 = 0.5, as should be clear from Pythagoras’ theorem: cos²x + sin²x = 1. So what I was doing is like integrating a constant function over the same-length interval. So… Well… Yes: no wonder I get the same value again and again.
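Written out, the snapshot integral described above looks as follows. This is a sketch, assuming unit amplitude and the wavenumber k_n = 2π/λ_n = nπ/L for the n-th mode:

```latex
\int_0^L \sin^2(k_n x)\,dx
  = \int_0^L \frac{1 - \cos(2 k_n x)}{2}\,dx
  = \frac{L}{2} - \frac{\sin(2 k_n L)}{4 k_n}
  = \frac{L}{2} - \frac{\sin(2 n \pi)}{4 k_n}
  = \frac{L}{2},
  \qquad k_n = \frac{n\pi}{L}.
```

The result is L/2 for every n: the mean value 1/2 of the sine squared, multiplied by the length L, which is why all three modes give the same number.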

[…]

We need to integrate over the same time interval. You could do that, as an exercise, but there’s a more direct approach to it: the energy of a wave is directly proportional to its frequency, so we write: E ∼ f. If the frequency doubles, triples, quadruples etcetera, then its energy doubles, triples, quadruples etcetera too. But – remember – we’re talking one string only here, with a fixed wave speed c = λ·f – so f = c/λ (read: the frequency is inversely proportional to the wavelength) – and, therefore (assuming the same (maximum) amplitude), we get that the energy level of each mode is inversely proportional to the wavelength, so we find that E ∼ 1/λ.

Now, with direct or inverse proportionality relations, we can always invent some new unit that makes the relationship an identity, so let’s do that and turn it into an equation indeed. [And, yes, sorry… I apologize again to your old math teacher: he may not quite agree with the shortcut I am taking here, but he’ll be able to justify the logic behind it.] So… Remembering that λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L, etcetera, we can then write:

E₁ = (1/2)/L, E₂ = (2/2)/L, E₃ = (3/2)/L, E₄ = (4/2)/L, E₅ = (5/2)/L,…, E_n = (n/2)/L,…
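The list of energies above can be reproduced in a few lines. This is a sketch in the normalized units used above (E = 1/λ); the string length L = 1 and the function names are hypothetical, chosen for illustration.

```python
def mode_wavelength(n, length):
    """Wavelength of the n-th mode of a string fixed at both ends: 2L/n."""
    return 2 * length / n

def mode_energy(n, length):
    """Energy of the n-th mode in normalized units, E = 1/wavelength = n/(2L)."""
    return n / (2 * length)

L = 1.0  # hypothetical string length, arbitrary units
energies = [mode_energy(n, L) for n in range(1, 6)]
spacings = [hi - lo for lo, hi in zip(energies, energies[1:])]

print("energies:", energies)
print("spacings:", spacings)
```

Every spacing equals (1/2)/L, so the levels are equally spaced and depend on the length of the string only.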

That’s a really nice result, because… Well… In quantum theory, the permitted energy levels of a harmonic oscillator are equally spaced, with the interval between them equal to h·f or ħ·ω (if you use the angular frequency to describe a wave (so that’s ω = 2π·f), then Planck’s constant (h) becomes ħ = h/2π). So here we’ve got equally spaced energy levels too, with the interval between the various energy levels equal to (1/2)/L.

You’ll say: So what? Frankly, if this doesn’t amaze you, stop reading—but if this doesn’t amaze you, you actually stopped reading a long time ago. 🙂 Look at what we’ve got here. We didn’t specify anything about that string, so we didn’t care about its materials or diameter or tension or how it was made (a wound guitar string is a terribly complicated thing!) or about whatever. Still, we know its fundamental (or normal) modes, and their frequency or nodes or energy or whatever depend on the length of the string only, with the ‘fundamental’ unit of energy being equal to the reciprocal length. Full stop. So all is just a matter of size and proportions. In other words, it’s all about structure. Absolute measurements don’t matter.

You may say: Bull****. What’s the conclusion? You still didn’t tell me anything about how the total energy of the wave is supposed to be distributed over its normal modes!

That’s true. I didn’t. Why? Well… I am not sure, really. I presented a lot of stuff here, but I did not present a clear and unambiguous answer as to how the total energy of a string is distributed over its modes. Not for actual strings, nor for ideal strings. Let me be honest: I don’t know. I really don’t. Having said that, my gut instinct that most of the energy – of, let’s say, a C4 note – should be in the primary mode (i.e. in the fundamental frequency) must be right: otherwise we would not call it a C4 note. So let’s try to make some assumptions. However, before doing so, let’s first briefly touch base with reality.

For actual strings (or actual musical sounds), I suspect the analysis can be quite complicated, as evidenced by the following illustration, which I took from one of the many interesting sites on this topic. Let me quote the author: “A flute is essentially a tube that is open at both ends. Air is blown across one end and sound comes out the other. The harmonics are all whole number multiples of the fundamental frequency (436 Hz, a slightly flat A₄ — a bit lower in frequency than is normally acceptable). Note how the second harmonic is nearly as intense as the fundamental. [My = blog writer’s 🙂 italics] This strong second harmonic is part of what makes a flute sound like a flute.”

Hmmm… What I see in the graph is a second harmonic (i.e. a first overtone) that is actually more intense than the fundamental, so what’s that all about? So can we actually associate a specific frequency to that tone? Not sure. So we’re in trouble already.

If reality doesn’t match our thinking, what about ideality? Hmmm… What to say? As for ideal strings – or ideal flutes 🙂 – I’d venture to say that the most obvious distribution of energy over the various modes (or harmonics, when we’re talking sound) would be the Boltzmann distribution.

Huh? Yes. Have a look at one of my posts on statistical mechanics. It’s a weird thing: the distribution of molecular speeds in a gas, or the density of the air in the atmosphere, or whatever involving many particles and/or a great degree of complexity (so many, or such a degree of complexity, that only some kind of statistical approach to the problem works) – all that involves Boltzmann’s Law, which basically says the distribution function will be a function of the energy levels involved: f = e^(−energy). So… Well… Yes. It’s the logarithmic scale again. It seems to govern the Universe. 🙂

Huh? Yes. That’s why I think: the distribution of the total energy of the oscillation should be some Boltzmann function, so it should depend on the energy of the modes: most of the energy will be in the lower modes, and most of the most in the fundamental. […] Hmmm… It again begs the question: how much exactly?

Well… The Boltzmann distribution strongly resembles the ‘harmonic’ distribution shown above (1, 1/2, 1/3, 1/4 etc), but it’s not quite the same. The graph below shows how they are similar and dissimilar in shape. You can experiment yourself with coefficients and all that, but your conclusion will be the same. As they say in Asia: they are “same-same but different.” 🙂 […] It’s like the ‘good’ and ‘equal’ temperament used when tuning musical instruments: the ‘good’ temperament – which is based on harmonic ratios – is good, but not good enough. Only the ‘equal’ temperament obeys the logarithmic scale and, therefore, is perfect. So, as I mentioned already, while my assumption isn’t quite right (the distribution is not harmonic, in the Pythagorean sense), the intuition behind is OK. So it’s just like Pythagoras’ number theory of the Universe. Having said that, I’ll leave it to you to draw the correct conclusions from it. 🙂
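If you want to experiment with the coefficients yourself, here is a minimal sketch. It assumes mode energies proportional to n and a Boltzmann-like weight e^(−β·E); the value β = 0.5 is purely hypothetical, and the weights are rescaled so that the fundamental has weight 1 in both series.

```python
import math

def harmonic_weights(count):
    """The 'harmonic' weights 1, 1/2, 1/3, ..."""
    return [1 / n for n in range(1, count + 1)]

def boltzmann_weights(count, beta=0.5):
    """Boltzmann-like weights exp(-beta * E_n), with E_n proportional to n.

    beta is a hypothetical coefficient; weights are rescaled so that the
    fundamental (n = 1) has weight 1, to compare with the harmonic series.
    """
    return [math.exp(-beta * (n - 1)) for n in range(1, count + 1)]

pairs = zip(harmonic_weights(8), boltzmann_weights(8))
for n, (h, b) in enumerate(pairs, start=1):
    print(f"n = {n}: harmonic {h:.3f}   Boltzmann {b:.3f}")
```

With this particular β, the exponential sits above the harmonic values for the second and third modes and drops below them from the fourth mode onwards: similar in shape, but not the same.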

Music and Physics

Original post:

My first working title for this post was Music and Modes. Yes. Modes. Not moods. The relation between music and moods is an interesting research topic as well but so it’s not what I am going to write about. 🙂

It started with me thinking I should write something on modes indeed, because the concept of a mode of a wave, or any oscillator really, is quite central to physics, both in classical physics as well as in quantum physics (quantum-mechanical systems are analyzed as oscillators too!). But I wondered how to approach it, as it’s a rather boring topic if you look at the math only. But then I was flying back from Europe, to Asia, where I live and, as I am also playing a bit of guitar, I suddenly wanted to know why we like music. And then I thought that’s a question you may have asked yourself at some point of time too! And so then I thought I should write about modes as part of a more interesting story: a story about music—or, to be precise, a story about the physics behind music. So… Let’s go for it.

Philosophy versus physics

There is, of course, a very simple answer to the question of why we like music: we like music because it is music. If it would not be music, we would not like it. That’s a rather philosophical answer, and it probably satisfies most people. However, for someone studying physics, that answer can surely not be sufficient. What’s the physics behind? On the plane, I reviewed Feynman’s Lecture on sound waves, combined it with some other stuff I googled when I arrived, and then I wrote this post, which gives you a much less philosophical answer. 🙂

The observation at the center of the discussion is deceptively simple: why is it that similar strings (i.e. strings made of the same material, with the same thickness, etc), under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etc (i.e. like whatever other ratio of two small integers)?

You probably wonder: is that the question, really? It is. The question is deceptively simple indeed because, as you will see in a moment, the answer is quite complicated. So complicated, in fact, that the Pythagoreans didn’t have any answer. Nor did anyone else for that matter—until the 18th century or so, when musicians, physicists and mathematicians alike started to realize that a string (of a guitar, or a piano, or whatever instrument Pythagoras was thinking of at the time), or a column of air (in a pipe organ or a trumpet, for example), or whatever other thing that actually creates the musical tone, actually oscillates at numerous frequencies simultaneously.

The Pythagoreans did not suspect that a string, in itself, is a rather complicated thing – something which physicists refer to as a harmonic oscillator – and that its sound, therefore, is actually produced by many frequencies, instead of only one. The concept of a pure note, i.e. a tone that is free of harmonics (i.e. free of all other frequencies, except for the fundamental frequency) also didn’t exist at the time. And if it did, they would not have been able to produce a pure tone anyway: producing pure tones – or notes, as I’ll call them, somewhat inaccurately (I should say: a pure pitch) – is remarkably complicated, and they do not exist in Nature. If the Pythagoreans would have been able to produce pure tones, they would have observed that pure tones do not give any sensation of consonance or dissonance, even if their relative frequencies respect those simple ratios. Indeed, repeated experiments, in which such pure tones are being produced, have shown that human beings can’t really say whether it’s a musical sound or not: it’s just sound, and it’s neither pleasant (or consonant, we should say) nor unpleasant (i.e. dissonant).

The Pythagorean observation is valid, however, for actual (i.e. non-pure) musical tones. In short, we need to distinguish between tones and notes (i.e. pure tones): they are two very different things, and the gist of the whole argument is that musical tones coming out of one (or more) string(s) under tension are full of harmonics and, as I’ll explain in a minute, that’s what explains the observed relation between the lengths of those strings and the phenomenon of consonance (i.e. sounding ‘pleasant’) or dissonance (i.e. sounding ‘unpleasant’).

Of course, it’s easy to say what I say above: it’s 2015 now, and so we have the benefit of hindsight. Back then – so that’s more than 2,500 years ago! – the simple but remarkable fact that the lengths of similar strings should respect some simple ratio if they are to sound ‘nice’ together, triggered a fascination with number theory (in fact, the Pythagoreans actually established the foundations of what is now known as number theory). Indeed, Pythagoras felt that similar relationships should also hold for other natural phenomena! To mention just one example, the Pythagoreans believed that the orbits of the planets would also respect such simple numerical relationships, which is why they talked of the ‘music of the spheres’ (Musica Universalis).

We now know that the Pythagoreans were wrong. The proportions in the movements of the planets around the Sun do not respect simple ratios and, with the benefit of hindsight once again, it is regrettable that it took many courageous and brilliant people, such as Galileo Galilei and Copernicus, to convince the Church of that fact. 😦 Also, while Pythagoras’ observations in regard to the sounds coming out of whatever strings he was looking at were correct, his conclusions were wrong: the observation does not imply that the frequencies of musical notes should all be in some simple ratio one to another.

Let me repeat what I wrote above: the frequencies of musical notes are not in some simple relationship one to another. The frequency scale for all musical tones is logarithmic and, while that implies that we can, effectively, do some tricks with ratios based on the properties of the logarithmic scale (as I’ll explain in a moment), the so-called ‘Pythagorean’ tuning system, which is based on simple ratios, was plain wrong, even if it – or some variant of it (instead of the 3:2 ratio, musicians used the 5:4 ratio from about 1510 onwards) – was generally used until the 18th century! In short, Pythagoras was wrong indeed—in this regard at least: we can’t do much with those simple ratios.

Having said that, Pythagoras’ basic intuition was right, and that intuition is still very much what drives physics today: it’s the idea that Nature can be described, or explained (whatever that means), by quantitative relationships only. Let’s have a look at how it actually works for music.

Tones, noise and notes

Let’s first define and distinguish tones and notes. A musical tone is the opposite of noise, and the difference between the two is that musical tones are periodic waveforms, so they have a period T, as illustrated below. In contrast, noise is a non-periodic waveform. It’s as simple as that.

Now, from previous posts, you know we can write any periodic function as the sum of a potentially infinite number of simple harmonic functions, and that this sum is referred to as the Fourier series. I am just noting it here, so don’t worry about it for now. I’ll come back to it later.

You also know we have seven musical notes: Do-Re-Mi-Fa-Sol-La-Si or, more common in the English-speaking world, A-B-C-D-E-F-G. And then it starts again with A (or Do). So we have two notes, separated by an interval which is referred to as an octave (from the Greek octo, i.e. eight), with six notes in-between, so that’s eight notes in total. However, you also know that there are notes in-between, except between E and F and between B and C. They are referred to as semitones or half-steps. I prefer the term ‘half-step’ over ‘semitone’, because we’re talking notes really, not tones.

We have, for example, F-sharp (denoted by F#), which we can also call G-flat (denoted by Gb). It’s the same thing: a sharp # raises a note by a semitone (aka half-step), and a flat b lowers it by the same amount, so F# is Gb. That’s what is shown below: in an octave, we have eight notes but twelve half-steps.

Let’s now look at the frequencies. The frequency scale above (expressed in oscillations per second, so that’s the hertz unit) is a logarithmic scale: frequencies double as we go from one octave to another: the frequency of the C4 note above (the so-called middle C) is 261.626 Hz, while the frequency of the next C note (C5) is double that: 523.251 Hz. [Just in case you’d want to know: the numbers 4 and 5 refer to the position of the note on a standard 88-key piano keyboard: C4 is the fourth C key on the piano.]

Now, if we equate the interval between C4 and C5 with 1 (so the octave is our musical ‘unit’), then the interval between two adjacent half-steps is, obviously, 1/12. Why? Because we have 12 half-steps in our musical unit. You can also easily verify that, because of the way logarithms work, the ratio of the frequencies of two notes that are separated by one half-step (between D# and E, for example) will be equal to 2^(1/12). Likewise, the ratio of the frequencies of two notes that are separated by n half-steps is equal to 2^(n/12). [In case you’d doubt, just do an example. For instance, if we’d denote the frequency of C4 as f₀, and the frequency of C# as f₁ and so on (so the frequency of D is f₂, the frequency of C5 is f₁₂, and everything else is in-between), then we can write the f₂/f₀ ratio as f₂/f₀ = (f₂/f₁)·(f₁/f₀) = 2^(1/12)·2^(1/12) = 2^(2/12) = 2^(1/6). I must assume you’re smart enough to generalize this result yourself, and that f₁₂/f₀ is, obviously, equal to 2^(12/12) = 2¹ = 2, which is what it should be!]
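The half-step rule is easy to check numerically. What follows is a minimal sketch, assuming A4 = 440 Hz as the reference pitch; the function name `note_frequency` is made up for the illustration.

```python
# Equal temperament: each half-step multiplies the frequency by 2**(1/12).
# Assumption: A4 = 440 Hz is used as the reference pitch.
A4 = 440.0

def note_frequency(half_steps_from_a4):
    """Frequency of the note n half-steps above (or below, if negative) A4."""
    return A4 * 2 ** (half_steps_from_a4 / 12)

c4 = note_frequency(-9)   # C4 is nine half-steps below A4
c5 = note_frequency(3)    # C5 is three half-steps above A4
d4 = note_frequency(-7)   # D4 is two half-steps above C4

print(round(c4, 3), round(c5, 3))   # 261.626 523.251
print(c5 / c4)                      # the octave ratio: 2, up to rounding
print(d4 / c4, 2 ** (1 / 6))        # two half-steps: both about 1.1225
```

Any other note would do as the reference: only the ratios matter.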

Now, because the frequencies of the various C notes are expressed as a number involving some decimal fraction (like 523.251 Hz, and the 0.251 is actually an approximation only), and because they are, therefore, a bit hard to read and/or work with, I’ll illustrate the next idea – i.e. the concept of harmonics – with the A instead of the C. 🙂

Harmonics

The lowest A on a piano is denoted by A0, and its frequency is 27.5 Hz. Lower A notes exist (we have one at 13.75 Hz, for instance) but we don’t use them, because they are near (or actually beyond) the limit of the lowest frequencies we can hear. So let’s stick to our grand piano and start with that 27.5 Hz frequency. The next A note is A1, and its frequency is 55 Hz. We then have A2, which is like the A on my (or your) guitar: its frequency is equal to 2×55 = 110 Hz. The next is A3, for which we double the frequency once again: we’re at 220 Hz now. The next one is the A in the illustration of the C scale above: A4, with a frequency of 440 Hz.

[Let me, just for the record, note that the A4 note is the standard tuning pitch in Western music. Why? Well… There’s no good reason really, except convention. Indeed, we can derive the frequency of any other note from that A4 note using our formula for the ratio of frequencies but, because of the properties of a logarithmic function, we could do the same using whatever other note really. It’s an important point: there’s no such thing as an absolute reference point in music: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and how many steps we want to have in-between (so that’s 12 steps—again, in Western music, that is), we get all the rest. That’s just how logarithms work. So music is all about structure, i.e. mathematical relationships. Again, Pythagoras’ conclusions were wrong, but his intuition was right.]

Now, the notes we are talking about here are all so-called pure tones. In fact, when I say that the A on our guitar is referred to as A2 and that it has a frequency of 110 Hz, then I am actually making a huge simplification. Worse, I am lying when I say that: when you play a string on a guitar, or when you strike a key on a piano, all kinds of other frequencies – so-called harmonics – will resonate as well, and that’s what gives the quality to the sound: it’s what makes it sound beautiful. So the fundamental frequency (aka the first harmonic) is 110 Hz alright but we’ll also have second, third, fourth, etc harmonics with frequencies of 220 Hz, 330 Hz, 440 Hz, etcetera. In music, the basic or fundamental frequency is referred to as the pitch of the tone and, as you can see, I often use the term ‘note’ (or pure tone) as a synonym for pitch, which is more or less OK, but not quite correct actually. [However, don’t worry about it: my sloppiness here does not affect the argument.]

What’s the physics behind it? Look at the illustration below (I borrowed it from the Physics Classroom site). The thick black line is the string, and the wavelength of its fundamental frequency (i.e. the first harmonic) is twice its length, so we write λ₁ = 2·L or, the other way around, L = (1/2)·λ₁. Now that’s the so-called first mode of the string. [One often sees the term fundamental or natural or normal mode, but the adjective is not necessary really. In fact, I find it confusing, although I sometimes find myself using it too.]

We also have a second, third, etc mode, depicted below, and these modes correspond to the second, third, etc harmonic respectively.

For the second, third, etc mode, the relationship between the wavelength and the length of the string is, obviously, the following: L = (2/2)·λ₂ = λ₂, L = (3/2)·λ₃, etc. More in general, for the nth mode, L will be equal to L = (n/2)·λ_n, with n = 1, 2, etcetera. In fact, because L is supposed to be some fixed length, we should write it the other way around: λ_n = (2/n)·L.

What does it imply for the frequencies? We know that the speed of the wave – let’s denote it by c – as it travels up and down the string, is a property of the string, and it’s a property of the string only. In other words, it does not depend on the frequency. Now, the wave velocity is equal to the frequency times the wavelength, always, so we have c = f·λ. To take the example of the (classical) guitar string: its length is 650 mm, i.e. 0.65 m. Hence, the identities λ₁ = (2/1)·L, λ₂ = (2/2)·L, λ₃ = (2/3)·L etc become λ₁ = (2/1)·0.65 = 1.3 m, λ₂ = (2/2)·0.65 = 0.65 m, λ₃ = (2/3)·0.65 = 0.433.. m and so on. Now, combining these wavelengths with the above-mentioned frequencies, we get the wave velocity c = (110 Hz)·(1.3 m) = (220 Hz)·(0.65 m) = (330 Hz)·(0.433.. m) = 143 m/s.
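The arithmetic for the guitar string can be repeated for as many modes as you like. This is a small sketch using the 0.65 m length and the 110 Hz fundamental from the text; the helper name `wavelength` is just for illustration.

```python
# Modes of a string fixed at both ends: wavelength_n = 2L/n, and c = f * wavelength.
L = 0.65      # string length in metres (the classical guitar example)
f1 = 110.0    # fundamental frequency in Hz (A2)

def wavelength(n, length=L):
    """Wavelength of the n-th mode of a string of the given length."""
    return 2 * length / n

for n in (1, 2, 3):
    f_n = n * f1                 # frequency of the n-th harmonic
    lam_n = wavelength(n)
    # The product f_n * lam_n is the wave speed: about 143 m/s for every mode.
    print(n, f_n, round(lam_n, 3), round(f_n * lam_n, 6))
```

The speed comes out the same for every mode because the factor n in the frequency cancels the factor 1/n in the wavelength.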

Let me now get back to Pythagoras’ string. You should note that the frequencies of the harmonics produced by a simple guitar string are related to each other by simple whole number ratios. Indeed, the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1). The second and third harmonics have a 3:2 frequency ratio. The third and fourth harmonics a 4:3 ratio. The fifth and fourth harmonic 5:4, and so on and so on. They have to be. Why? Because the harmonics are simple multiples of the basic frequency. Now that is what’s really behind Pythagoras’ observation: when he was sounding similar strings with the same tension but different lengths, he was making sounds with the same harmonics. Nothing more, nothing less.

Let me be quite explicit here, because the point that I am trying to make here is somewhat subtle. Pythagoras’ string is Pythagoras’ string: he talked about similar strings. So we’re not talking about some actual guitar or a piano or whatever other string instrument. The strings on (modern) string instruments are not similar, and they do not have the same tension. For example, the six strings of a guitar do not differ in length (they’re all 650 mm) but they’re different in tension. The six strings on a classical guitar also have a different diameter, and the first three strings are plain strings, as opposed to the bottom strings, which are wound. So the strings are not similar but very different indeed. To illustrate the point, I copied the values below for just one of the many commercially available guitar string sets. It’s the same for piano strings. While they are somewhat more simple (they’re all made of piano wire, which is very high quality steel wire basically), they also differ, not only in length but in diameter as well, typically ranging from 0.85 mm for the highest treble strings to 8.5 mm (so that’s ten times 0.85 mm) for the lowest bass notes.

In short, Pythagoras was not playing the guitar or the piano (or whatever other more sophisticated string instrument that the Greeks surely must have had too) when he was thinking of these harmonic relationships. The physical explanation behind his famous observation is, therefore, quite simple: musical tones that have the same harmonics sound pleasant, or consonant, we should say—from the Latin con-sonare, which, literally, means ‘to sound together’ (from sonare = to sound and con = with). And otherwise… Well… Then they do not sound pleasant: they are dissonant.

To drive the point home, let me emphasize that, when we’re plucking a string, we produce a sound consisting of many frequencies, all in one go. One can see it in practice: if you strike a lower A string on a piano – let’s say the 110 Hz A2 string – then its second harmonic (220 Hz) will make the A3 string vibrate too, because it’s got the same frequency! And then its fourth harmonic will make the A4 string vibrate too, because they’re both at 440 Hz. Of course, the strength of these other vibrations (or their amplitude we should say) will depend on the strength of the other harmonics and we should, of course, expect that the fundamental frequency (i.e. the first harmonic) will absorb most of the energy. So we pluck one string, and so we’ve got one sound, one tone only, but numerous notes at the same time!

In this regard, you should also note that the third harmonic of our 110 Hz A2 string corresponds to the fundamental frequency of the E4 tone: both are at 330 Hz! [Very nearly so, at least: the equal-tempered E4 sits at 329.63 Hz.] And, of course, the harmonics of E, such as its second harmonic (2·330 Hz = 660 Hz) correspond to higher harmonics of A too! To be specific, the second harmonic of our E string is equal to the sixth harmonic of our A2 string. If your guitar is any good, and if your strings are of reasonable quality too, you’ll actually see it: the (lower) E and A strings co-vibrate if you play the A major chord, but by striking the upper four strings only. So we’ve got energy – motion really – being transferred from the four strings you do strike to the two strings you do not strike! You’ll say: so what? Well… If you’ve got any better proof of the actuality (or reality) of various frequencies being present at the same time, please tell me! 🙂

So that’s why A and E sound very well together (A, E and C#, played together, make up the so-called A major chord): our ear likes matching harmonics. And so that’s why we like musical tones, or why we define those tones as being musical! 🙂 Let me summarize it once more: musical tones are composite sound waves, consisting of a fundamental frequency and so-called harmonics (so we’ve got many notes or pure tones altogether in one musical tone). Now, when other musical tones have harmonics that are shared, and we sound those notes too, we get the sensation of harmony, i.e. the combination sounds consonant.
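To make the idea of shared harmonics concrete, here is a minimal sketch that lists the harmonics of the A (110 Hz) and of an E taken at exactly 330 Hz, and picks out the ones they have in common. The exact 3:1 ratio is an idealization, and the function name `harmonics` is made up for the example.

```python
# Harmonics are integer multiples of the fundamental frequency.
def harmonics(fundamental, count):
    """First `count` harmonics (in Hz) of a tone with the given fundamental."""
    return [n * fundamental for n in range(1, count + 1)]

a2 = harmonics(110, 12)   # the A2 string
e4 = harmonics(330, 4)    # an E at 330 Hz (assumed to be exactly 3 x 110 Hz)

shared = sorted(set(a2) & set(e4))
print(shared)             # [330, 660, 990, 1320]
```

Every harmonic of the E coincides with every third harmonic of the A, which is the matching the ear seems to like.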

Now, it’s not difficult to see that we will always have such shared harmonics if we have similar strings, with the same tension but different lengths, being sounded together. In short, what Pythagoras observed has nothing much to do with notes, but with tones. Let’s go a bit further in the analysis now by introducing some more math. And, yes, I am very sorry: it’s the dreaded Fourier analysis indeed! 🙂

Fourier analysis

You know that we can decompose any periodic function into a sum of a (potentially infinite) series of simple sinusoidal functions, as illustrated below. I took the illustration from Wikipedia: the red function s₆(x) is the sum of six sine functions of different amplitudes and (harmonically related) frequencies. The so-called Fourier transform S(f) (in blue) relates the six frequencies with the respective amplitudes.

In light of the discussion above, it is easy to see what this means for the sound coming from a plucked string. Using the angular frequency notation (so we write everything using ω instead of f), we know that the normal or natural modes of oscillation have frequencies ω = 2π/T = 2πf (so that’s the fundamental frequency or first harmonic), 2ω (second harmonic), 3ω (third harmonic), and so on and so on.

Now, there’s no reason to assume that all of the sinusoidal functions that make up our tone should have the same phase: some phase shift Φ may be there and, hence, we should write our sinusoidal function not as cos(ωt), but as cos(ωt + Φ) in order to ensure our analysis is general enough. [Why not a sine function? It doesn’t matter: the cosine and sine function are the same, except for another phase shift of 90° = π/2.] Now, from our geometry classes, we know that we can re-write cos(ωt + Φ) as

cos(ωt + Φ) = [cos(Φ)cos(ωt) – sin(Φ)sin(ωt)]

We have a lot of these functions of course – one for each harmonic, in fact – and, hence, we should use subscripts, which is what we do in the formula below, which says that any function f(t) that is periodic with the period T can be written mathematically as:

f(t) = a₀ + Σ_n [a_n·cos(nωt) + b_n·sin(nωt)], with n = 1, 2, 3, … and ω = 2π/T

You may wonder: what’s that period T? It’s the period of the fundamental mode, i.e. the first harmonic. The period of the second, third, etc harmonic will only be one half, one third etcetera of the period of the first harmonic. Indeed, T₂ = (2π)/(2ω) = (1/2)·(2π)/ω = (1/2)·T₁, and T₃ = (2π)/(3ω) = (1/3)·(2π)/ω = (1/3)·T₁, and so on. However, it’s easy to see that these functions also repeat themselves after two, three, etc periods respectively. So all is alright, and the general idea behind the Fourier analysis is further illustrated below. [Note that both the formula as well as the illustration below (which I took from Feynman’s Lectures) add a ‘zero-frequency term’ a₀ to the series. That zero-frequency term will usually be zero for a musical tone, because the ‘zero’ level of our tone will be zero indeed. Also note that the a_n and b_n coefficients are, of course, equal to a_n = cos Φ_n and b_n = –sin Φ_n (each multiplied by the amplitude of the nth harmonic), so you can relate the illustration and the formula easily.]
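Both claims, the cosine identity and the fact that the sum of harmonics repeats after one period T of the fundamental, can be checked with a few lines of code. This is only a sketch: the amplitudes and phases are made-up values, and the function names are hypothetical.

```python
import math

# One harmonic with a phase shift, written two ways:
#   cos(w*t + phi) == a*cos(w*t) + b*sin(w*t), with a = cos(phi), b = -sin(phi)
def harmonic_phase_form(w, t, phi):
    return math.cos(w * t + phi)

def harmonic_ab_form(w, t, phi):
    a, b = math.cos(phi), -math.sin(phi)
    return a * math.cos(w * t) + b * math.sin(w * t)

def tone(t, w, amplitudes, phases):
    """Sum of harmonics n = 1, 2, ... with the given amplitudes and phases."""
    return sum(A * math.cos(n * w * t + phi)
               for n, (A, phi) in enumerate(zip(amplitudes, phases), start=1))

w = 2 * math.pi * 110            # angular frequency of a 110 Hz fundamental
T = 2 * math.pi / w              # period of the fundamental
amps, phis = [1.0, 0.5, 0.25], [0.0, 0.7, -1.2]   # made-up illustration values

t = 0.0013
print(harmonic_phase_form(w, t, 0.7) - harmonic_ab_form(w, t, 0.7))  # about 0
print(tone(t, w, amps, phis) - tone(t + T, w, amps, phis))           # about 0
```

The second difference vanishes because the nth harmonic goes through exactly n full cycles in the time T.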

You’ll say: What the heck! Why do we need the mathematical gymnastics here? It’s just to understand that other characteristic of a musical tone: its quality (as opposed to its pitch). A so-called rich tone will have strong harmonics, while a pure tone will only have the first harmonic. All other characteristics – the difference between a tone produced by a violin as opposed to a piano – are then related to the ‘mix’ of all those harmonics.

So we have it all now: pitch, loudness (which is, of course, related to the magnitude of the air pressure changes as our waveform moves through the air) and quality. That’s what makes a musical tone. 🙂

Dissonance

As mentioned above, if the sounds are not consonant, they’re dissonant. But what is dissonance really? What’s going on? The answer is the following: when two frequencies are near to a simple fraction, but not exact, we get so-called beats, which our ear does not like.

Huh? Relax. The illustration below, which I copied from the Wikipedia article on piano tuning, illustrates the phenomenon. The blue wave is the sum of the red and the green wave, which are originally identical. But then the frequency of the green wave is increased, and so the two waves are no longer in phase, and the interference results in a beating pattern. Of course, our musical tone involves different frequencies and, hence, different periods T₁, T₂, T₃, etcetera, but you get the idea: the higher harmonics also oscillate with period T₁, and if the frequencies are not in some exact ratio, then we’ll have a similar problem: beats, and our ear will not like the sound.
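The beating pattern follows from a standard trigonometric identity: the sum of two cosines equals a fast ‘carrier’ oscillation multiplied by a slow ‘envelope’. The sketch below checks that identity for two slightly mistuned pure tones; the 440 Hz and 443 Hz values and the function names are assumptions chosen for the illustration.

```python
import math

def beat_frequency(f1, f2):
    """Beats per second heard when two pure tones f1 and f2 (Hz) sound together."""
    return abs(f1 - f2)

def two_tones(t, f1, f2):
    return math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)

def carrier_times_envelope(t, f1, f2):
    # cos(a) + cos(b) = 2 * cos((a + b) / 2) * cos((a - b) / 2)
    carrier = math.cos(2 * math.pi * (f1 + f2) / 2 * t)
    envelope = 2 * math.cos(2 * math.pi * (f1 - f2) / 2 * t)
    return carrier * envelope

f1, f2 = 440.0, 443.0            # a slightly mistuned pair (made-up values)
print(beat_frequency(f1, f2))    # 3.0
t = 0.0375
print(two_tones(t, f1, f2) - carrier_times_envelope(t, f1, f2))   # about 0
```

The envelope itself oscillates at half the frequency difference, but the loudness peaks twice per envelope cycle, which is why the ear counts |f₁ − f₂| beats per second.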

Of course, you’ll wonder: why don’t we like beats in tones? We can ask that, can’t we? It’s like asking why we like music, isn’t it? […] Well… It is and it isn’t. It’s like asking why our ear (or our brain) likes harmonics. We don’t know. That’s how we are wired. The ‘physical’ explanation of what is musical and what isn’t only goes so far, I guess. 😦

Pythagoras versus Bach

From all of what I wrote above, it is obvious that the frequencies of the harmonics of a musical tone are, indeed, related by simple ratios of small integers: the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1); the second and third harmonics have a 3:2 frequency ratio; the third and fourth harmonics a 4:3 ratio; the fifth and fourth harmonic 5:4, etcetera. That’s it. Nothing more, nothing less.

In other words, Pythagoras was observing musical tones: he could not observe the pure tones behind, i.e. the actual notes. However, aesthetics led Pythagoras, and all musicians after him – until the mid-18th century – to also think that the ratio of the frequencies of the notes within an octave should also be simple ratios. From what I explained above, it’s obvious that it should not work that way: the ratio of the frequencies of two notes separated by n half-steps is 2^(n/12), and, for most values of n, 2^(n/12) is not some simple ratio. [Why? Just take your pocket calculator and calculate the value of 2^(1/12): it’s 2^0.08333… = 1.0594630943… and so on… It’s an irrational number: there are no repeating decimals. Now, 2^(n/12) is equal to 2^(1/12)·2^(1/12)·…·2^(1/12) (n times). Why would you expect that product to be equal to some simple ratio?]
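To see how far the 2^(n/12) values are from simple ratios, one can put them side by side. The simple ratios in this sketch are the ones commonly quoted for just intonation; they are assumptions brought in for the comparison, not values taken from this post.

```python
from fractions import Fraction

# Simple-ratio intervals commonly quoted for just intonation (assumed values),
# keyed by the number of half-steps they are meant to approximate.
just_ratios = {
    3: Fraction(6, 5),    # minor third
    4: Fraction(5, 4),    # major third
    5: Fraction(4, 3),    # fourth
    7: Fraction(3, 2),    # fifth
    12: Fraction(2, 1),   # octave
}

for n, ratio in just_ratios.items():
    tempered = 2 ** (n / 12)
    # Columns: half-steps, simple ratio, 2^(n/12), and the difference between them.
    print(n, ratio, round(tempered, 4), round(tempered - float(ratio), 4))
```

Only the octave matches exactly; the other intervals are close to, but never equal to, the simple ratios.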

So – I said it already – Pythagoras was wrong, not only in this but also in other regards, such as when he espoused his views on the solar system, for example. Again, I am sorry to have to say that, but it is what it is: the Pythagoreans did seem to prefer mathematical ideas over physical experiment. 🙂 Having said that, musicians obviously didn’t know about any alternative to Pythagoras, and they had surely never heard about logarithmic scales at the time. So… Well… They did use the so-called Pythagorean tuning system. To be precise, they tuned their instruments by equating the frequency ratio between the first and the fifth tone in the C scale (i.e. the C and G, as they did not include the C#, D# and F# semitones when counting) with the ratio 3/2, and then they used other so-called harmonic ratios for the notes in-between.

Now, the 3/2 ratio is actually almost correct, because the actual frequency ratio is 2^(7/12) (we have seven half-steps between C and G, counting the semitones, not five!), and so that’s 1.4983, approximately. Now, that’s pretty close to 3/2 = 1.5, I’d say. 🙂 Using that approximation (which, I admit, is fairly accurate indeed), the tuning of the other strings would then also be done assuming certain ratios should be respected, like the ones below.

So it was all quite good. Having said that, good musicians, and some great mathematicians, felt something was wrong—if only because there were several so-called just intonation systems around (for an overview, check out the Wikipedia article on just intonation). More importantly, they felt it was quite difficult to transpose music using the Pythagorean tuning system. Transposing music amounts to changing the so-called key of a musical piece: what one does, basically, is moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. Today, transposing music is a piece of cake—Western music at least. But that’s only because all Western music is played on instruments that are tuned using that logarithmic scale (technically, it’s referred to as the 12-tone equal temperament (12-TET) system). When you’d use one of the Pythagorean systems for tuning, a transposed piece does not sound quite right.
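One standard way to see why pure 3:2 fifths make transposition awkward (it is not spelled out in this post, so take it as an added illustration) is to stack twelve of them: in equal temperament that brings you back to the starting note seven octaves higher, but with exact 3:2 ratios it overshoots slightly.

```python
from fractions import Fraction

fifth = Fraction(3, 2)             # the pure fifth
twelve_fifths = fifth ** 12        # 531441/4096
seven_octaves = Fraction(2) ** 7   # 128
comma = twelve_fifths / seven_octaves
print(comma, float(comma))         # 531441/524288, about 1.0136

tempered_fifth = 2 ** (7 / 12)     # the equal-tempered fifth
print(tempered_fifth ** 12)        # 128, up to rounding
```

That mismatch of about 1.4% has to go somewhere, so with pure fifths some keys inevitably sound worse than others; the 12-TET system spreads it evenly over all twelve fifths.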

The first mathematician who really seemed to know what was wrong (and, hence, who also knew what to do) was Simon Stevin, who wrote a manuscript based on the ‘12th root of 2 principle’ around AD 1600. It shouldn’t surprise us: the thinking of this mathematician from Bruges would inspire John Napier’s work on logarithms. Unfortunately, while that manuscript describes the basic principles behind the 12-TET system, it didn’t get published (Stevin had to run away from Bruges, to Holland, because he was Protestant and the Spanish rulers at the time didn’t like that). Hence, musicians, while not quite understanding the math (or the physics, I should say) behind their own music, kept trying other tuning systems, as they felt it made their music sound better indeed.

One of these ‘other systems’ is the so-called ‘good’ temperament, which you surely heard about, as it’s referred to in Bach’s famous composition, Das Wohltemperierte Klavier, which he finalized in the first half of the 18th century. What is that ‘good’ temperament really? Well… It is what it is: it’s one of those tuning systems which made musicians feel better about their music for a number of reasons, all of which are well described in the Wikipedia article on it. But the main reason is that the tuning system that Bach recommended was a great deal better when it came to playing the same piece in another key. However, it still wasn’t quite right, as it wasn’t the equal temperament system (i.e. the 12-TET system) that’s in place now (in the West at least—the Indian music scale, for instance, is still based on simple ratios).

Why do I mention this piece of Bach? The reason is simple: you probably heard of it because it’s one of the main reference points in a rather famous book: Gödel, Escher, Bach: an Eternal Golden Braid. If not, then just forget about it. I am mentioning it because one of my brothers loves it. It’s on artificial intelligence. I haven’t read it, but I must assume Bach’s masterpiece is analyzed there because of its structure, not because of the tuning system that one’s supposed to use when playing it. So… Well… I’d say: don’t make that composition any more mystic than it already is. 🙂 The ‘magic’ behind it is related to what I said about A4 being the ‘reference point’ in music: since we’re using a universal logarithmic scale now, there’s no such thing as an absolute reference point any more: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and also define how many steps we want to have in-between (so that’s 12, in Western music, that is), we get all the rest. That’s just how logarithms work.

So, in short, music is all about structure, i.e. it’s all about mathematical relations, and about mathematical relations only. Again, Pythagoras’ conclusions were wrong, but his intuition was right. And, of course, it’s his intuition that gave birth to science: the simple ‘models’ he made – of how notes are supposed to be related to each other, or about our solar system – were, obviously, just the start of it all. And what a great start it was! Looking back once again, it’s rather sad that conservative forces (such as the Church) often got in the way of progress. In fact, I suddenly wonder: if scientists had not been bothered by those conservative forces, could mankind have sent people to the Moon around the time that Charles V was born, i.e. around A.D. 1500 already? 🙂

Post scriptum: My example of the (lower) E and A guitar strings co-vibrating when playing the A major chord by striking the upper four strings only, is somewhat tricky. The (lower) E and A strings are associated with lower pitches, and we said overtones (i.e. the second, third, fourth, etc harmonics) are multiples of the fundamental frequency. So why is it that the lower strings co-vibrate? The answer is easy: they oscillate at the higher frequencies only. If you have a guitar: just try it. The two strings you do not pluck do vibrate, and very visibly so, but the low fundamental frequencies that come out of them when you’d strike them, are not audible. In short, they resonate at the higher frequencies only. 🙂

The example that Feynman gives is much more straightforward: his example mentions the lower C (or A, B, etc) notes on a piano causing vibrations in the higher C strings (or the higher A, B, etc string respectively). For example, striking the C2 key (and, hence, the C2 string inside the piano) will make the (higher) C3 string vibrate too. But few of us have a grand piano at home, I guess. That’s why I prefer my guitar example. 🙂

Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics

Pre-scriptum added much later: We have advanced much in our understanding since we wrote this post. If you are reading it because you want to understand more about the boson-fermion distinction, then you shouldn’t be here. The general distinction between bosons and fermions is a useless theoretical generalization which actually prevents you from understanding what is really going on. I am keeping this post online for documentation purposes only. It is interesting from a math point of view but you are not here to learn math, are you?

Jean Louis Van Belle, 20 May 2020

Original post:

I’ve discussed statistics, in the context of quantum mechanics, a couple of times already (see, for example, my post on amplitudes and statistics). However, I never took the time to properly explain those distribution functions which are referred to as the Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac distribution functions respectively. Let me try to do that now—without, hopefully, getting lost in too much math! It should be a nice piece, as it connects quantum mechanics with statistical mechanics, i.e. two topics I had nicely separated so far. 🙂

You know the Boltzmann Law now, which says that the probabilities of different conditions of energy are proportional to e^(−energy/kT) = 1/e^(energy/kT). Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. The point is: we have some probability density function f, and it’s a function of the energy E, so we write:

f(E) = C·e^(−E/kT) = C/e^(E/kT)

C is just a normalization constant (all probabilities have to add up to one, so the integral of this function over its domain must be one), and k and T are also usual suspects: T is the (absolute) temperature, and k is the Boltzmann constant, which relates the temperature to the kinetic energy of the particles involved. We also know the shape of this function. For example, when we applied it to the density of the atmosphere at various heights (which are related to the potential energy, as P.E. = m·g·h), assuming constant temperature, we got the following graph. The shape of this graph is that of an exponential decay function (we’ll encounter it again, so just take a mental note of it).

[Graph: density of the atmosphere as a function of height, an exponential decay curve]
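The exponential decay can be reproduced with a few lines of code. This is a rough sketch only: it assumes an isothermal atmosphere made of nitrogen molecules at 300 K, and those numbers (like the function name `relative_density`) are illustration values that do not come from this post.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
G = 9.81               # gravitational acceleration, m/s^2

def relative_density(h, mass, temperature):
    """n(h)/n(0) = exp(-m*g*h / kT) for an isothermal atmosphere."""
    return math.exp(-mass * G * h / (K_B * temperature))

# Assumed illustration values: a nitrogen molecule (about 4.65e-26 kg) at 300 K.
m_n2, T = 4.65e-26, 300.0
for h in (0, 2000, 4000, 8000):
    print(h, round(relative_density(h, m_n2, T), 3))
```

Doubling the height squares the density ratio, which is the signature of exponential decay.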

A more interesting application is the quantum-mechanical approach to the theory of gases, which I introduced in my previous post. To explain the behavior of gases under various conditions, we assumed that gas molecules are like oscillators but that they can only take on discrete levels of energy. [That’s what quantum theory is about!] We denoted the various energy levels, i.e. the energies of the various molecular states, by E₀, E₁, E₂,…, E_i,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state E_i is proportional to e^(−E_i/kT). We can then calculate the relative probabilities, i.e. the probability of being in state E_i, relative to the probability of being in state E₀:

P_i/P₀ = e^(−E_i/kT)/e^(−E₀/kT) = e^(−(E_i − E₀)/kT) = 1/e^((E_i − E₀)/kT)

Now, P_i obviously equals n_i/N, so it is the ratio of the number of molecules in state E_i (n_i) and the total number of molecules (N). Likewise, P₀ = n₀/N and, therefore, we can write:

n_i/n₀ = e^(−(E_i − E₀)/kT) = 1/e^((E_i − E₀)/kT)

This formulation is just another Boltzmann Law, but it’s nice in that it introduces the idea of a ground state, i.e. the state with the lowest energy level. We may or may not want to equate E₀ with zero. It doesn’t matter really: we can always shift all energies by some arbitrary constant because we get to choose the reference point for the potential energy.
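That the choice of zero point doesn’t matter is easy to verify. In the sketch below, the energy levels are hypothetical (evenly spaced, one kT apart), and the function name `population_ratio` is made up for the example.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K

def population_ratio(e_i, e_0, temperature):
    """n_i/n_0 = exp(-(E_i - E_0) / kT)."""
    return math.exp(-(e_i - e_0) / (K_B * temperature))

T = 300.0
kT = K_B * T
levels = [0.0, 1.0 * kT, 2.0 * kT]           # hypothetical, evenly spaced levels
shifted = [e + 5.0 * kT for e in levels]     # same levels, different zero point

# Both lines give ratios of about 1, 0.368 and 0.135.
print([round(population_ratio(e, levels[0], T), 4) for e in levels])
print([round(population_ratio(e, shifted[0], T), 4) for e in shifted])
```

Only the energy differences E_i − E₀ enter the formula, so adding a constant to every level changes nothing.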

So that’s the so-called Maxwell-Boltzmann distribution. Now, in my post on amplitudes and statistics, I had jotted down the formulas for the other distributions, i.e. the distributions when we’re not talking classical particles but fermions and/or bosons. As you know, fermions are particles governed by the Pauli exclusion principle: two identical fermions cannot be together in the same quantum state. For bosons, it’s the other way around: having one in some quantum state actually increases the chance of finding another one there, and there is no limit on how many of them can share the same state.

We also know that fermions and bosons are the real world: fermions are the matter-particles, bosons are the force-carriers, and our ‘Boltzmann particles’ are nothing but a classical approximation of the real world. Hence, even if we can’t see them in the actual world, the Fermi-Dirac and Bose-Einstein distributions are the real-world distributions. 🙂 Let me jot down the equations once again:

Fermi-Dirac (for fermions): f(E) = 1/[A·e^{(E − E_F)/kT} + 1]

Bose-Einstein (for bosons): f(E) = 1/[A·e^{E/kT} − 1]

We’ve got some other normalization constant here (A), which we shouldn’t be too worried about, for the time being, that is. Now, to see how these distributions are different from the Maxwell-Boltzmann distribution (which we should re-write as f(E) = C·e^{−E/kT} = 1/[A·e^{E/kT}] so as to make all formulas directly comparable), we should just make a graph. Please go online to find a graph tool (I found a new one recently, and it’s really easy to use), and just do it. You’ll see they are all like that exponential decay function. However, in order to make a proper comparison, we would actually need to calculate the normalization coefficients and, for the Fermi-Dirac distribution, we would also need the Fermi energy E_F (note that, for simplicity, we did equate E₀ with zero). Now, we could give it a try, but it’s much easier to google and find an example online.

The HyperPhysics website of Georgia State University gives us one: the example assumes 6 particles and 9 energy levels, and the table and graph below compare the Maxwell-Boltzmann and Bose-Einstein distributions for the model.

Now that is an interesting example, isn’t it? In this example (but all depends on its assumptions, of course), the Maxwell-Boltzmann and Bose-Einstein distributions are almost identical. Having said that, we can clearly see that the lower energy states are, indeed, more probable with Bose-Einstein statistics than with the Maxwell-Boltzmann statistics. While the difference is not dramatic at all in this example, the difference does become very dramatic, in reality, with large numbers (i.e. high matter density) and, more importantly, at very low temperatures, at which bosons can condense into the lowest energy state. This phenomenon is referred to as Bose-Einstein condensation: it causes superfluidity and superconductivity, and it’s real indeed: it has been observed with supercooled He-4, which is not an everyday substance, but real nevertheless!

What about the Fermi-Dirac distribution for this example? The Fermi-Dirac distribution is given below: the lowest energy state is now less probable, the mid-range energies much more, and none of the six particles occupy any of the four highest energy levels. Again, while the difference is not dramatic in this example, it can become very dramatic, in reality, with large numbers (read: high matter density) and very low temperatures: at absolute zero, all of the possible energy states up to the Fermi energy level will be occupied, and all the levels above the Fermi energy will be vacant.

What can we make out of all of this? First, you may wonder why we actually have more than one particle in one state above: doesn’t that contradict the Pauli exclusion principle? No. We need to distinguish micro- and macro-states. In fact, the example assumes we’re talking electrons here, and so we can have two particles in each energy state, with opposite spin, however. At the same time, it’s true we cannot have three, or more, in any state. That results, in the example we’re looking at here, in five possible distributions only, as shown below.

The diagram is an interesting one: if the particles were to be classical particles, or bosons, then 26 combinations are possible, including the five Fermi-Dirac combinations, as shown above. Note the little numbers above the 26 possible combinations (e.g. 6, 20, 30,… 180): they are proportional to the likelihood of occurring under the Maxwell-Boltzmann assumption (so if we assume the particles are ‘classical’ particles). Let me introduce you to the math behind the example by using the diagram below, which shows three possible distributions/combinations (I know the terminology is quite confusing—sorry for that!).

If we could distinguish the particles, then we’d have 2002 micro-states, which is the total of all those little numbers on top of the combinations that are shown (6+60+180+…). However, the assumption is that we cannot distinguish the particles. Therefore, the first combination in the diagram above, with five particles in the zero energy state and one particle in state 9, accounts for only 6 of those 2002 micro-states and, hence, it has a probability of 6/2002 ≈ 0.003 only. In contrast, the second combination (60 micro-states) is 10 times more likely, and the third one (180 micro-states) is 30 times more likely! In any case, the point is, in the classical situation (and in the Bose-Einstein hypothesis as well), we have 26 possible macro-states, as opposed to 5 only for fermions, and so that leads to a very different density function. Capito?

No? Well, this blog is not a textbook on physics and, therefore, I should refer you to the mentioned site once again, which references a 1992 textbook on physics (Frank Blatt, Modern Physics, 1992) as the source of this example. However, I won’t do that: you’ll find the details in the Post Scriptum to this post. 🙂

Let’s first focus on the fundamental stuff, however. The most burning question is: if the real world consists of fermions and bosons, why is it that we only see the Maxwell-Boltzmann distribution in our actual (non-real?) world? 🙂 The answer is that both the Fermi-Dirac and Bose-Einstein distributions approach the Maxwell–Boltzmann distribution at higher temperatures and lower particle densities. In other words, we cannot see the Fermi-Dirac distributions (all matter is fermionic, except for weird stuff like superfluid helium-4 at 1 or 2 kelvin), but they are there!

Let’s approach it mathematically: the most general formula, encompassing both Fermi-Dirac and Bose-Einstein statistics, is:

N_i(E_i) ∝ 1/[e^{(E_i − μ)/kT}± 1]

If you’d google, you’d find a formula involving an additional coefficient, g_i, which is the so-called degeneracy of the energy level E_i. I included it in the formula I used in the above-mentioned post of mine. However, I don’t want to make it any more complicated than it already is and, therefore, I omitted it this time. What you need to look at are the two terms in the denominator: e^{(E_i − μ)/kT} and ± 1.

From a math point of view, it is obvious that the values of e^{(E_i − μ)/kT} + 1 (Fermi-Dirac) and e^{(E_i − μ)/kT} − 1 (Bose-Einstein) will approach each other if the exponential is much larger than the ±1 term, so if e^{(E_i − μ)/kT} >> 1. That’s the case, obviously, if the (E_i − μ)/kT ratio is large, so if (E_i − μ) >> kT. In fact, (E_i − μ) should, obviously, be much larger than kT for the lowest energy levels too! Now, the conditions under which that is the case are associated with the classical situation (such as a cylinder filled with gas, for example). Why?

Well… […] Again, I have to say that this blog can’t substitute for a proper textbook. Hence, I am afraid I have to leave it to you to do the necessary research to see why. 🙂 The non-mathematical approach is to simply note that quantum effects, i.e. the ±1 term, only apply if the concentration of particles is high enough. Indeed, quantum effects appear if the concentration of particles is higher than the so-called quantum concentration. Only when the quantum concentration is reached, particles will start interacting according to what they are, i.e. as bosons or as fermions. At higher temperature, that concentration will not be reached, except in massive objects such as a white dwarf (white dwarfs are stellar remnants with a mass like that of the Sun but a volume like that of the Earth). So, in general, we can say that at higher temperatures and at low concentration we will not have any quantum effects. That should settle the matter, for now at least.
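The mathematical point above, that the ±1 stops mattering once e^{(E_i − μ)/kT} >> 1, can be checked with a few lines of code. This is only a sketch: I write x for the ratio (E_i − μ)/kT, I leave out normalization and degeneracy, and the function names are mine.

```python
import math

def fermi_dirac(x):
    """1/(e^x + 1), with x = (E - mu)/kT."""
    return 1.0 / (math.exp(x) + 1.0)

def bose_einstein(x):
    """1/(e^x - 1), with x = (E - mu)/kT; only meaningful for x > 0."""
    return 1.0 / (math.exp(x) - 1.0)

def maxwell_boltzmann(x):
    """The classical limit: 1/e^x = e^(-x)."""
    return math.exp(-x)

for x in (0.5, 1.0, 3.0, 10.0):
    fd, be, mb = fermi_dirac(x), bose_einstein(x), maxwell_boltzmann(x)
    # Ratios to the classical value: both tend to 1 as x grows.
    print(x, round(fd / mb, 5), round(be / mb, 5))
```

For small x the three functions differ a lot, with the Bose-Einstein value above and the Fermi-Dirac value below the classical one. As x grows, both ratios close in on one.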

You’ll have one last question: we derived Boltzmann’s Law from the kinetic theory of gases, but how do we derive that N_i(E_i) = 1/[Ae^{(E_i − μ)/kT}± 1] expression? Good question but, again, we’d need more than a few pages to explain that! The answer is: quantum mechanics, of course! Go check it out in Feynman’s third Volume of Lectures! 🙂

Post scriptum: combinations, permutations and multiplicity

The mentioned example from HyperPhysics is really interesting, if only because it shows you also need to master a bit of combinatorics to get into quantum mechanics. Let’s go through the basics. If we have n distinct objects, we can order them in n! ways, with n! (read: n factorial) equal to n·(n–1)·(n–2)·…·3·2·1. Note that 0! is equal to 1, by definition. We’ll need that definition.

For example, a red, blue and green ball can be ordered in 3·2·1 = 6 ways. Each way is referred to as a permutation.

Besides permutations, we also have the concept of a k-permutation, which we can denote in a number of ways but let’s choose P(n, k). [The P stands for permutation here, not for probability.] P(n, k) is the number of ways to pick k objects out of a set of n objects. Again, the objects are supposed to be distinguishable. The formula is P(n, k) = n·(n–1)·(n–2)·…·(n–k+1) = n!/(n–k)!. That’s easy to understand intuitively: on your first pick you have n choices; on your second, n–1; on your third, n–2, etcetera. When n = k, we obviously get n! again.

There is a third concept: the k-combination (as opposed to the k-permutation), which we’ll denote by C(n, k). That’s when the order within our subset doesn’t matter: an ace, a queen and a jack taken out of some card deck are a queen, a jack, and an ace: we don’t care about the order. If we have k objects, there are k! ways of ordering them and, hence, we just have to divide P(n, k) by k! to get C(n, k). So we write: C(n, k) = P(n, k)/k! = n!/[(n–k)!k!]. You recognize C(n, k): it’s the binomial coefficient.
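If you want to play with these three concepts, the sketch below writes out the n!, P(n, k) and C(n, k) formulas and checks them against brute-force counting. The helper names are mine; `math.perm` and `math.comb` are the standard-library equivalents (Python 3.8 or later).

```python
import math
from itertools import combinations, permutations

def k_permutations(n, k):
    """P(n, k) = n!/(n-k)!: ordered picks of k objects out of n distinct objects."""
    return math.factorial(n) // math.factorial(n - k)

def k_combinations(n, k):
    """C(n, k) = P(n, k)/k!: picks of k objects out of n when order does not matter."""
    return k_permutations(n, k) // math.factorial(k)

balls = ["red", "blue", "green"]
n_orderings = len(list(permutations(balls)))  # all 3! orderings of the three balls

# Picking three cards out of a 52-card deck, with and without regard to order.
ordered_picks = k_permutations(52, 3)
unordered_picks = k_combinations(52, 3)

print(n_orderings, ordered_picks, unordered_picks)
```

Every unordered hand of three cards corresponds to 3! = 6 ordered picks, which is why the second number is six times the third.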

Now, the HyperPhysics example illustrating the three mentioned distributions (Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac) is a bit more complicated: we need to distribute q units of energy over N particles (in the example, q = 9 and N = 6, so a particle can sit at any energy level from 0 to 9). Every possible configuration is referred to as a micro-state, and the total number of possible micro-states is referred to as the multiplicity of the system, denoted by Ω(N, q). The formula for Ω(N, q) is another binomial coefficient: Ω(N, q) = (q+N–1)!/[q!(N–1)!]. For our example, Ω(N, q) = Ω(6, 9) = (9+6–1)!/[9!(6–1)!] = 2002.

In our example, however, we do not have distinct particles and, therefore, we only have 26 macro-states (as opposed to 2002 micro-states), which are also referred to, confusingly, as distributions or combinations.

Now, the number of micro-states associated with the same macro-state is given by yet another formula: it is equal to N!/[n₀!·n₁!·n₂!·…·n_q!], with n_i the number of particles in level i. [See why we need the 0! = 1 definition? It ensures unoccupied states do not affect the calculation.] So that’s how we get those numbers 6, 60 and 180 for those three macro-states.
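The counting above can be verified by brute force. The sketch below is my own reconstruction of the example (6 particles sharing 9 units of energy), not code from HyperPhysics: it lists every macro-state as a non-increasing tuple of particle energies and applies the N!/[n₀!·n₁!·…] formula to each of them.

```python
import math
from collections import Counter

def macro_states(n_particles, total_energy, max_level=None):
    """Yield non-increasing tuples of particle energies that add up to total_energy."""
    if max_level is None:
        max_level = total_energy
    if n_particles == 0:
        if total_energy == 0:
            yield ()
        return
    for level in range(min(max_level, total_energy), -1, -1):
        for rest in macro_states(n_particles - 1, total_energy - level, level):
            yield (level,) + rest

def multiplicity(state):
    """Number of micro-states for one macro-state: N! divided by each n_i!."""
    result = math.factorial(len(state))
    # Unoccupied levels never show up in the Counter, which amounts to using 0! = 1.
    for occupancy in Counter(state).values():
        result //= math.factorial(occupancy)
    return result

states = list(macro_states(6, 9))
weights = [multiplicity(s) for s in states]

print(len(states), "macro-states,", sum(weights), "micro-states")
print("five particles at 0 and one at 9:", multiplicity((9, 0, 0, 0, 0, 0)))
```

The total of all the weights should match the Ω(6, 9) binomial coefficient, which is a nice consistency check on both formulas.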

But how do we calculate those average numbers of particles for each energy level? In other words, how do we calculate the probability densities under the Maxwell-Boltzmann, Fermi-Dirac and Bose-Einstein hypothesis respectively?

For the Maxwell-Boltzmann distribution, we proceed as follows: for each energy level j (or E_j, I should say), we calculate n_j = ∑n_ij·P_i over all macro-states i. In this summation, we have n_ij, which is the number of particles in energy level j in macro-state i, while P_i is the probability of macro-state i as calculated by the ratio of (i) the number of micro-states associated with macro-state i and (ii) the total number of micro-states. For P_i, we gave the example of 6/2002 ≈ 0.3%. For 60 and 180, we get 60/2002 ≈ 3% and 180/2002 ≈ 9%. Calculating all the n_j’s for j ranging from 0 to 9 should yield the numbers and the graph below indeed.

OK. That’s how it works for Maxwell-Boltzmann. Now, it is obvious that the Fermi-Dirac and the Bose-Einstein distribution should not be calculated in the same way because, if they were, they would not be different from the Maxwell-Boltzmann distribution! The trick is as follows.

For the Bose-Einstein distribution, we give all macro-states equal weight, so that’s a weight of one, as shown below. Hence, the probability P_i is, quite simply, 1/26 ≈ 3.85% for all 26 macro-states. So we use the same n_j = ∑n_ij·P_i formula but with P_i = 1/26.

Finally, I already explained how we get the Fermi-Dirac distribution: we can only have (i) one, (ii) two, or (iii) zero fermions for each energy level—not more than two! Hence, out of the 26 macro-states, only five are actually possible under the Fermi-Dirac hypothesis, as illustrated below once more. So it’s a very different distribution indeed!
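The three recipes can be put side by side in a short script. Again, this is a sketch under my own reading of the example: 6 particles, 9 units of energy, macro-states weighted by their multiplicity for Maxwell-Boltzmann, equal weights for Bose-Einstein, and equal weights over the macro-states with at most two particles per level for Fermi-Dirac. The function and variable names are mine.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement

N, E_TOTAL = 6, 9
LEVELS = range(E_TOTAL + 1)

# Every macro-state: a sorted tuple of six particle energies adding up to nine.
macro = [s for s in combinations_with_replacement(LEVELS, N) if sum(s) == E_TOTAL]

def multiplicity(state):
    """Number of micro-states behind one macro-state: N! divided by each n_i!."""
    result = math.factorial(len(state))
    for occupancy in Counter(state).values():
        result //= math.factorial(occupancy)
    return result

def average_occupation(states, weights):
    """n_j = sum over macro-states i of n_ij * P_i, with P_i = weight_i / total weight."""
    total = sum(weights)
    averages = [0.0] * len(LEVELS)
    for state, weight in zip(states, weights):
        for level, count in Counter(state).items():
            averages[level] += count * weight / total
    return averages

mb = average_occupation(macro, [multiplicity(s) for s in macro])
be = average_occupation(macro, [1] * len(macro))
fd_states = [s for s in macro if max(Counter(s).values()) <= 2]
fd = average_occupation(fd_states, [1] * len(fd_states))

for j in LEVELS:
    print(j, round(mb[j], 3), round(be[j], 3), round(fd[j], 3))
```

Reading down the first row of the output, you should see the pattern described in the main text: the ground state is more heavily populated under Bose-Einstein than under Maxwell-Boltzmann, and less heavily under Fermi-Dirac.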

Now, you’ll probably still have questions. For example, why does the assumption, for the Bose-Einstein analysis, that macro-states have equal probability favor the lower energy states? The answer is that the model also integrates other constraints: first, when associating a particle with an energy level, we do not favor one energy level over another, so all energy levels have equal probability. However, at the same time, the whole system has some fixed energy level, and so we cannot put the particles in the higher energy levels only! At the same time, we know that, if we have N particles, and the probability of a particle having some energy level j is the same for all j, then they are likely not to be all at the same energy level: they’ll be distributed, effectively, as evidenced by the very low chance (0.3% only) of having 5 particles in the ground state and 1 particle at a higher level, as opposed to the 3% and 9% chance of the other two combinations shown in that diagram with three possible Maxwell-Boltzmann (MB) combinations.

So what happens when assigning an equal probability to all 26 possible combinations (with value 1/26) is that the combinations that were previously rather unlikely – because they did have a rather heavy concentration of particles in the ground state only – are now much more likely. So that’s why the Bose-Einstein distribution, in this example at least, is skewed towards the lowest energy level—as compared to the Maxwell-Boltzmann distribution, that is.

So that’s what’s behind, and that should also answer the other question you surely have when looking at those five acceptable Fermi-Dirac configurations: why don’t we have the same five configurations starting from the top down, rather than from the bottom up? Now you know: such configuration would have much higher energy overall, and so that’s not allowed under this particular model.

There’s also this other question: we said the particles were indistinguishable, but so then we suddenly say there can be two at any energy level, because their spin is opposite. It’s obvious this is rather ad hoc as well. However, if we’d allow only one particle at any energy level, we’d have no allowable combinations and, hence, we’d have no Fermi-Dirac distribution at all in this example.

In short, the example is rather intuitive, which is actually why I like it so much: it shows how bosonic and fermionic behavior appear rather gradually, as a consequence of variables that are defined at the system level, such as density, or temperature. So, yes, you’re right if you think the HyperPhysics example lacks rigor. That’s why I think it’s such a wonderful pedagogic device. 🙂

The Quantum-Mechanical Gas Law

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. The text also got mutilated because of the removal of material by the dark force. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

In my previous posts, it was mentioned repeatedly that the kinetic theory of gases is not quite correct: the experimentally measured values of the so-called specific heat ratio (γ) vary with temperature and, more importantly, their values differ, in general, from what classical theory would predict. It works, more or less, for noble gases, which do behave as ideal gases and for which γ is what the kinetic theory of gases would want it to be: γ = 5/3—but we get in trouble immediately, even for simple diatomic gases like oxygen or hydrogen, as illustrated below: the theoretical value is 9/7 (so that’s 1.286, more or less), but the measured value is very different.

Let me quickly remind you how we get the theoretical number. According to classical theory, a diatomic molecule like oxygen can be represented as two atoms connected by a spring. Each of the atoms absorbs kinetic energy, and for each direction of motion (x, y and z), that energy is equal to kT/2, so the kinetic energy of both atoms – added together – is 2·3·kT/2 = 3kT. However, I should immediately add that not all of that energy is to be associated with the center-of-mass motion of the whole molecule, which determines the temperature of the gas: that energy is and remains equal to 3kT/2, always. We also have rotational and vibratory motion. The molecule can rotate in two independent directions (and any combination of these directions, of course) and, hence, rotational motion is to absorb an amount of energy equal to 2·kT/2 = kT. Finally, the vibratory motion is to be analyzed as any other oscillation, so like a spring really. There is only one dimension involved and, hence, the kinetic energy here is just kT/2. However, we know that the total energy in an oscillator is the sum of the kinetic and potential energy, which adds another kT/2 term. Putting it all together, we find that the average energy for each diatomic particle is (or should be) equal to 7·kT/2 = (7/2)kT. Now, as mentioned above, the temperature of the gas (T) is proportional to the mean molecular energy of the center-of-mass motion only (in fact, that’s how temperature is defined), with the constant of proportionality equal to 3k/2. Hence, for monatomic ideal gases, we can write: U = N·(3k/2)T and, therefore, PV = NkT = (2/3)·U. Now, γ appears as follows in the ideal gas law: PV = (γ–1)U. Therefore, γ = 2/3 + 1 = 5/3, but so that’s for monatomic ideal gases only! The total internal energy of our diatomic gas is U = N·(7k/2)T and, therefore, PV = NkT = (2/7)·U. So γ must be γ = 2/7 + 1 = 9/7 ≈ 1.286 for diatomic gases, like oxygen and hydrogen.
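The bookkeeping in that paragraph boils down to one line: if the internal energy is U = N·(f/2)·kT, with f the number of kT/2 terms, then PV = NkT = (γ–1)U gives γ = 1 + 2/f. Here is a minimal sketch of that arithmetic, using exact fractions. The function name and the symbol f are mine.

```python
from fractions import Fraction

def gamma_from_energy_terms(f):
    """gamma = 1 + 2/f, from PV = NkT = (gamma - 1)*U with U = N*(f/2)*kT."""
    return 1 + Fraction(2, f)

# Monatomic gas: three kT/2 terms (center-of-mass motion only).
gamma_monatomic = gamma_from_energy_terms(3)

# Diatomic gas as counted above: 3 (translation) + 2 (rotation)
# + 1 (vibration, kinetic) + 1 (vibration, potential) = 7 terms.
gamma_diatomic = gamma_from_energy_terms(3 + 2 + 1 + 1)

print(gamma_monatomic, gamma_diatomic, float(gamma_diatomic))
```

Dropping or adding kinds of motion just changes f, which is how one gets the other theoretical values people quote for diatomic gases.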

Phew! So that’s the theory. However, as we can see from the diagram, γ approaches that value only when we heat the gas to a few thousand degrees! So what’s wrong? One assumption is that certain kinds of motions “freeze out” as the temperature falls—although it’s kinda weird to think of something ‘freezing out’ at a thousand degrees Kelvin! In any case, at the end of the 19th century, that was the assumption that was advanced, very reluctantly, by scientists such as James Jeans. However, the mystery was about to be solved then, as Max Planck, even more reluctantly, presented his quantum theory of energy at the turn of the century itself.

But the quantum theory was confirmed and so we should now see how we can apply it to the behavior of gas. In my humble view, it’s a really interesting analysis, because we’re applying quantum theory here to a phenomenon that’s usually being analyzed as a classical problem only.

Boltzmann’s Law

We derived Boltzmann’s Law in our post on the First Principles of Statistical Mechanics. To be precise, we gave Boltzmann’s Law for the density of a gas (which we denoted by n = N/V) in a force field, like a gravitational field, or in an electromagnetic field (assuming our gas particles are electrically charged, of course). We noted, however, Boltzmann’s Law was also applicable to much more complicated situations, like the one below, which shows a potential energy function for two molecules that is quite characteristic of the way molecules actually behave: when they come very close together, they repel each other but, at larger distances, there’s a force of attraction. We don’t really know the forces behind but we don’t need to: as long as these forces are conservative, they can combine in whatever way they want to combine, and Boltzmann’s Law will be applicable. [It should be obvious why. If you hesitate, just think of the definition of work and how it affects potential energy and all that. Work is force times distance, but when doing work, we’re also changing potential energy indeed! So if we’ve got a potential energy function, we can get all the rest.]

Boltzmann’s Law itself is illustrated by the graph below, which also gives the formula for it: n = n₀·e^{−P.E./kT}.

It’s a graph starting at n = n₀ for P.E. = 0, and it then decreases exponentially. [Funny expression, isn’t it? So as to respect mathematical terminology, I should say that it decays exponentially.] In any case, if anything, Boltzmann’s Law shows the natural exponential function is quite ‘natural’ indeed, because Boltzmann’s Law pops up in Nature everywhere! Indeed, Boltzmann’s Law is not limited to functions of potential energy only. For example, Feynman derives another Boltzmann Law for the distribution of molecular speeds or, so as to ensure the formula is also valid in relativity, the distribution of molecular momenta. In case you forgot, momentum (p) is the product of mass (m) and velocity (u), and the relevant Boltzmann Law is:

f(p)·dp = C·e^{−K.E./kT}·dp

The argument is not terribly complicated but somewhat lengthy, and so I’ll refer you to the link for more details. As for the f(p) function (and the dp factor on both sides of the equation), that’s because we’re not talking exact values of p but some range equal to dp and some probability of finding particles that have a momentum within that range. The principle is illustrated below for molecular speeds (denoted by u = p/m), so we have a velocity distribution below. The illustration for p would look the same: just substitute p for u.

Boltzmann’s Law can be stated, much more generally, as follows:

The probability of different conditions of energy (E), potential or kinetic, is proportional to e^{−E/kT}.

As Feynman notes, “This is a rather beautiful proposition, and a very easy thing to remember too!” It is, and we’ll need it for the next bit.

The quantum-mechanical theory of gases

According to quantum theory, energy comes in discrete packets, quanta, and any system, like an oscillator, will only have a discrete set of energy levels, i.e. states of different energy. An energy state is, obviously, a condition of energy and, hence, Boltzmann’s Law applies. More specifically, if we denote the various energy levels, i.e. the energies of the various molecular states, by E₀, E₁, E₂,…, E_i,…, and if Boltzmann’s Law applies, then the probability of finding a molecule in the particular state E_i will be proportional to e^{−E_i/kT}.

Now, we know we’ve got some constant there, but we can get rid of that by calculating relative probabilities. For example, the probability of being in state E₁, relative to the probability of being in state E₀, is:

P₁/P₀ = e^{−E₁/kT}/e^{−E₀/kT} = e^{−(E₁–E₀)/kT}

But the probability P₁ should, obviously, also be equal to the ratio n₁/N, i.e. the ratio of the number of molecules in state E₁ and the total number of molecules. Likewise, P₀ = n₀/N. Hence, P₁/P₀ = n₁/n₀ and, therefore, we can write:

n₁ = n₀·e^{−(E₁–E₀)/kT}

What can we do with that? Remember we want to explain the behavior of non-monatomic gas—like diatomic gas, for example. Now we need some other assumption, obviously. As it turns out, the assumption that we can represent a system as some kind of oscillation still makes sense! In fact, the assumption that our diatomic molecule is like a spring is equally crucial to our quantum-theoretical analysis of gases as it is to our classical kinetic theory of gases. To be precise, in both theories, we look at it as a harmonic oscillator.

Don’t panic. A harmonic oscillator is, quite simply, a system that, when displaced from its equilibrium position, experiences some kind of restoring force. Now, for it to be harmonic, the force needs to be linear. (For example, when talking springs, the restoring force F will be proportional to the displacement x.) It basically means we can use a linear differential equation to analyze the system, like m·(d²x/dt²) = –kx. […] I hope you recognize this equation, because you should! It’s Newton’s Law: F = m·a with F = –k·x. If you remember the equation, you’ll also remember that harmonic oscillations were sinusoidal oscillations with a constant amplitude and a constant frequency. That frequency did not depend on the amplitude: because of the sinusoidal function involved, it was easier to write that frequency as an angular frequency, which we denoted by ω₀ and which, in the case of our spring, was equal to ω₀ = (k/m)^{1/2}. So it’s a property of the system. Indeed, ω₀ is the square root of the ratio of (1) k, which characterizes the spring (it’s its stiffness), and (2) m, i.e. the mass on the spring. Solving the differential equation yielded x = A·cos(ω₀t + Δ) as a general solution, with A the (maximum) amplitude, and Δ some phase shift determined by our t = 0 point. Let me quickly jot down two more formulas: the potential energy in the spring is kx²/2, while its kinetic energy is mv²/2, as usual (so the kinetic energy depends on the mass and its velocity, while the potential energy only depends on the displacement and the spring’s stiffness). Of course, kinetic and potential energy add up to the total energy of the system, which is constant and proportional to the square of the (maximum) amplitude: K.E. + P.E. = E ∝ A². To be precise, E = kA²/2.
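A quick numerical check of that last claim may help. The sketch below evaluates x = A·cos(ω₀t + Δ) and its time derivative at a few moments, and adds up kx²/2 and mv²/2. The spring constant, mass, amplitude and phase are hypothetical numbers I picked for the illustration (and note that k is the spring stiffness here, not the Boltzmann constant).

```python
import math

def oscillator_state(t, k, m, amplitude, phase):
    """Position and velocity of x = A*cos(w0*t + phase), with w0 = sqrt(k/m)."""
    omega0 = math.sqrt(k / m)
    x = amplitude * math.cos(omega0 * t + phase)
    v = -amplitude * omega0 * math.sin(omega0 * t + phase)  # v = dx/dt
    return x, v

def total_energy(x, v, k, m):
    """Potential energy k*x^2/2 plus kinetic energy m*v^2/2."""
    return 0.5 * k * x**2 + 0.5 * m * v**2

# Hypothetical values: k in N/m, m in kg, A in m, phase in radians.
K_SPRING, MASS, AMPLITUDE, PHASE = 4.0, 1.0, 0.1, 0.3

for t in (0.0, 0.4, 1.1, 2.5):
    x, v = oscillator_state(t, K_SPRING, MASS, AMPLITUDE, PHASE)
    print(t, total_energy(x, v, K_SPRING, MASS), 0.5 * K_SPRING * AMPLITUDE**2)
```

The energy sloshes back and forth between the kinetic and the potential term, but the two printed columns agree at every instant.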

That’s simple enough. Let’s get back to our molecular oscillator. While the total energy of an oscillator in classical theory can take on any value, Planck challenged that assumption: according to quantum theory, it can only take up energies equal to ħω at a time. [Note that we use the so-called reduced Planck constant here (i.e. h-bar), because we’re dealing with angular frequencies.] Hence, according to quantum theory, we have an oscillator with equally spaced energy levels, and the difference between them is ħω. Now, ħω is terribly tiny—but it’s there. Let me visualize what I just wrote:

So our expression for P₁/P₀ becomes P₁/P₀ = e^{−ħω/kT}/e^{−0/kT} = e^{−ħω/kT}. More generally, we have P_i/P₀ = e^{−i·ħω/kT}. So what? Well… We’ve got a function here which gives the chance of finding a molecule in state E_i relative to that of finding it in state E₀, and it’s a function of temperature. Now, the graph below illustrates the general shape of that function. It’s a bit peculiar, but you can see that the relative probability goes up and down with temperature. The graph makes it clear that, at extremely low temperatures, most particles will be in state E₀ and, of course, the internal energy of our body of gas will be close to nil.

Now, we can look at the oscillators in the bottom state (i.e. particles in the molecular energy state E₀) as being effectively ‘frozen’: they don’t contribute to the specific heat. However, as we increase the temperature, our molecules gradually begin to have an appreciable probability to be in the second state, and then in the next state, and so on, and so the internal energy of the gas increases effectively. Now, when the probability is appreciable for many states, the quantized states become nearly indistinguishable and, hence, the situation is like classical physics: it is nearly indistinguishable from a continuum of energies.
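To see the ‘freezing out’ in numbers, one can weigh the equally spaced levels E_i = i·ħω with their Boltzmann factors e^{−i·ħω/kT} and compute the average energy per oscillator (measured from E₀). The sketch below does that with a truncated sum. Working in units of ħω, the cut-off at 5000 levels and the function name are my own choices. The closed form in the comment is the standard result for this geometric sum, which is not derived in this post.

```python
import math

def mean_energy_over_hw(kt_over_hw, n_levels=5000):
    """Average oscillator energy above E0, in units of h-bar*omega.

    Levels are E_i = i * (h-bar*omega), weighted by exp(-i * h-bar*omega / kT).
    The exact sum is 1/(exp(h-bar*omega/kT) - 1); here we just truncate it.
    """
    x = 1.0 / kt_over_hw  # x = h-bar*omega / kT
    weights = [math.exp(-i * x) for i in range(n_levels)]
    return sum(i * w for i, w in enumerate(weights)) / sum(weights)

for kt in (0.1, 0.5, 1.0, 5.0, 50.0):
    # Last column: ratio to the classical value kT for an oscillator.
    print(kt, round(mean_energy_over_hw(kt), 5), round(mean_energy_over_hw(kt) / kt, 4))
```

At kT much smaller than ħω the average energy is practically zero (the oscillators are ‘frozen’), while at kT much larger than ħω the ratio in the last column approaches one, i.e. the classical kT per oscillator.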

Now, while you can imagine such analysis should explain why the specific heat ratio for oxygen and hydrogen varies as it does in the very first graph of this post, you can also imagine the details of that analysis fill quite a few pages! In fact, even Feynman doesn’t include it in his Lectures. What he does include is the analysis of the blackbody radiation problem, which is remarkably similar. So… Well… For more details on that, I’ll refer you to Feynman indeed. 🙂

I hope you appreciated this little ‘lecture’, as it sort of wraps up my ‘series’ of posts on statistical mechanics, thermodynamics and, central to both, the classical theory of gases. Have fun with it all!

Entropy, energy and enthalpy

Original post:

Phew! I am quite happy I got through Feynman’s chapters on thermodynamics. Now is a good time to review the math behind it. We thoroughly understand the gas equation now:

PV = NkT = (γ–1)U

The gamma (γ) in this equation is the specific heat ratio: it’s 5/3 for monatomic ideal gases (so that’s about 1.667) and, theoretically, 4/3 ≈ 1.333 or 9/7 ≈ 1.286 for diatomic gases, depending on the degrees of freedom we associate with diatomic molecules. More complicated molecules have even more degrees of freedom and, hence, can absorb even more energy, so γ gets closer to one, according to the kinetic gas theory, that is. While we know that the kinetic gas theory is not quite accurate – an approach involving molecular energy states is a better match for reality – that doesn’t matter here. As for the term (specific heat ratio), I’ll explain that later. [I promise. 🙂 You’ll see it’s quite logical.]

The point to note is that this body of gas (or whatever substance) stores an amount of energy U that is directly proportional to the temperature (T), and Nk/(γ–1) is the constant of proportionality. We can also phrase it the other way around: the temperature is directly proportional to the energy, with (γ–1)/Nk the constant of proportionality. It means temperature and energy are in a linear relationship. [Yes, direct proportionality implies linearity.] The graph below shows the T = [(γ–1)/Nk]·U relationship for three different values of γ, ranging from 5/3 (i.e. the maximum value, which characterizes monatomic noble gases such as helium, neon or krypton) to a value close to 1, which is characteristic of more complicated molecular arrangements indeed, such as heptane (γ = 1.06) or methyl butane (γ = 1.08). The illustration shows that, unlike monatomic gas, more complicated molecular arrangements allow the gas to absorb a lot of (heat) energy with a relatively moderate rise in temperature only.
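The T = [(γ–1)/Nk]·U relationship is easy to tabulate. In the sketch below, the amount of gas (one mole) and the amount of energy added (1000 J) are arbitrary values I picked for the illustration, and the function name is hypothetical. The γ values are the ones quoted above.

```python
K_B = 1.380649e-23   # Boltzmann constant in J/K
N_A = 6.02214076e23  # Avogadro constant in 1/mol

def temperature_rise(delta_u_j, n_molecules, gamma):
    """Delta T = (gamma - 1) * Delta U / (N * k), from T = [(gamma - 1)/Nk] * U."""
    return (gamma - 1.0) * delta_u_j / (n_molecules * K_B)

# Assumed illustration: add 1000 J of internal energy to one mole of gas.
gases = {"monatomic (5/3)": 5.0 / 3.0, "methyl butane (1.08)": 1.08, "heptane (1.06)": 1.06}

for name, gamma in gases.items():
    print(name, round(temperature_rise(1000.0, N_A, gamma), 1), "K")
```

The closer γ is to one, the smaller the temperature rise for the same amount of energy, which is the point the graph makes.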

We’ll soon encounter another variable, enthalpy (H), which is also linearly related to energy: H = γU. From a math point of view, these linear relationships don’t mean all that much: they just show these variables – temperature, energy and enthalpy – are all directly related and, hence, can be defined in terms of each other.

We can invent other variables, like the Gibbs energy, or the Helmholtz energy. In contrast, entropy, while often being mentioned as just some other state function, is something different altogether. In fact, the term ‘state function’ causes a lot of confusion: pressure and volume are state variables too. The term is used to distinguish these variables from so-called process functions, notably heat and work. Process functions describe how we go from one equilibrium state to another, as opposed to the state variables, which describe the equilibrium situation itself. Don’t worry too much about the distinction—for now, that is.

Let’s look at non-linear stuff. The PV = NkT = (γ–1)U equation says that, for a given temperature (or a given energy), pressure (P) and volume (V) are inversely proportional to one another, and so that’s a non-linear relationship. [Yes, inverse proportionality is non-linear.] To help you visualize things, I inserted a simple volume-pressure diagram below, which shows how pressure and volume are related for three different values of U (or, what amounts to the same, three different values of T).
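If you want to reproduce such curves yourself, the following Python sketch tabulates P = NkT/V for three temperatures. The temperatures, volumes and the one-mole sample are arbitrary choices of mine, not the values used in the diagram.

```python
K_BOLTZMANN = 1.380649e-23  # J/K
N = 6.02214076e23           # hypothetical sample: one mole of molecules


def pressure(volume_m3, temperature_k, n_molecules=N):
    """Ideal-gas pressure P = N*k*T / V (in pascal)."""
    return n_molecules * K_BOLTZMANN * temperature_k / volume_m3


# One row per isotherm: the product P*V is the same along each row.
for temperature in (200.0, 300.0, 400.0):
    row = [pressure(v, temperature) for v in (0.01, 0.02, 0.04)]
    print(f"T = {temperature:.0f} K:", [round(p) for p in row], "Pa")
```

Doubling the volume along a row halves the pressure: that's the hyperbola.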

The curves are simple hyperbolas which have the x- and y-axis as horizontal and vertical asymptote respectively. If you’ve studied social sciences (like me!) – so if you know a tiny little bit of the ‘dismal science’, i.e. economics (like me!) – you’ll note they look like indifference curves. The x- and y-axis then represent the quantity of some good X and some good Y respectively, and the curves closer to the origin are associated with lower utility. How much X and Y we will buy then, depends on (a) their price and (b) our budget, which we represented by a linear budget line tangent to the curve we can reach with our budget, and then we are a little bit happy, very happy or extremely happy, depending on our budget. Hence, our budget determines our happiness. From a math point of view, however, we can also look at it the other way around: our happiness determines our budget. [Now that‘s a nice one, isn’t it? Think about it! 🙂 And, in the process, think about hyperbolas too: the y = 1/x function holds the key to understanding both infinity and nothingness. :-)]

U is a state function but, as mentioned above, we’ve got quite a few state variables in physics. Entropy, of course, denoted by S—and enthalpy too, denoted by H. Let me remind you of the basics of the entropy concept:

The internal energy U changes because (a) we add or remove some heat from the system (ΔQ), (b) because some work is being done (by the gas on its surroundings or the other way around), or (c) because of both. Using the differential notation, we write: dU = dQ – dW, always. The (differential) work that’s being done is PdV. Hence, we have dU = dQ – PdV.
When transferring heat to a system at a certain temperature, there’s a quantity we refer to as the entropy. Remember that illustration of Feynman’s in my post on entropy: we go from one point to another on the temperature-volume diagram, taking infinitesimally small steps along the curve, and, at each step, an infinitesimal amount of work dW is done, and an infinitesimal amount of entropy dS = dQ/T is being delivered.
The total change in entropy, ΔS, is a line integral: ΔS = ∫_L dQ/T = ∫_L dS.
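To make that line integral less abstract, here is a small Python sketch that adds up dS = dQ/T in many small steps along an isotherm of an ideal gas. On an isotherm, dU = 0, so dQ = PdV. The sample (one mole, doubling its volume) is a hypothetical choice, and the closed-form answer Nk·ln(V₂/V₁) is the standard result for this particular path.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K
N = 6.02214076e23           # hypothetical sample: one mole of molecules


def entropy_change_isothermal(v_start, v_end, temperature_k, steps=10_000):
    """Sum dS = dQ/T along an isotherm, with dQ = P dV (dU = 0 for an ideal gas)."""
    nk = N * K_BOLTZMANN
    dv = (v_end - v_start) / steps
    total = 0.0
    for i in range(steps):
        v_mid = v_start + (i + 0.5) * dv   # midpoint of this small step
        p = nk * temperature_k / v_mid     # ideal-gas pressure at that volume
        dq = p * dv                        # heat absorbed during the step
        total += dq / temperature_k        # entropy delivered during the step
    return total


numeric = entropy_change_isothermal(0.01, 0.02, 300.0)
exact = N * K_BOLTZMANN * math.log(2.0)
print(numeric, exact)
```

Note that the temperature drops out: the result depends on the ratio of the volumes only.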

That’s somewhat tougher to understand than economics, and so that’s why it took me more time to come to terms with it. 🙂 Just go through Feynman’s Lecture on it, or through that post I referenced above. If you don’t want to do that, then just note that, while entropy is a very mysterious concept, it’s deceptively simple from a math point of view: ΔS = ΔQ/T, so the (infinitesimal) change in entropy is, quite simply, the ratio of (1) the (infinitesimal or incremental) amount of heat that is being added or removed as the system goes from one state to another through a reversible process and (2) the temperature at which the heat is being transferred. However, I am not writing this post to discuss entropy once again. I am writing it to give you an idea of the math behind the system.

So dS = dQ/T. Hence, we can re-write dU = dQ – dW as:

dU = TdS – PdV ⇔ dU + d(PV) = TdS – PdV + d(PV)

⇔ d(U + PV) = dH = TdS – PdV + PdV + VdP = TdS + VdP

The U + PV quantity on the left-hand side of the equation is the so-called enthalpy of the system, which I mentioned above. It’s denoted by H indeed, and it’s just another state variable, like energy: same-same but different, as they say in Asia. We encountered it in our previous post also, where we said that chemists prefer to analyze the behavior of substances using temperature and pressure as ‘independent variables’, rather than temperature and volume. Independent variables? What does that mean, exactly?

According to the PV = NkT equation, we only have two independent variables: if we assign some value to two variables, we’ve got a value for the third one. Indeed, remember that other equation we got when we took the total differential of U. We wrote U as U(V, T) and, taking the total differential, we got:

dU = (∂U/∂T)dT + (∂U/∂V)dV

We did not need to add a (∂U/∂P)dP term, because the pressure is determined by the volume and the temperature. We could also have written U = U(P, T) and, therefore, that dU = (∂U/∂T)dT + (∂U/∂P)dP. However, when working with temperature and pressure as the ‘independent’ variables, it’s easier to work with H rather than U. The point to note is that it’s all quite flexible really: we have two independent variables in the system only. The third one (and all of the other variables really, like energy or enthalpy or whatever) depend on the other two. In other words, from a math point of view, we only have two degrees of freedom in the system here: only two variables are actually free to vary. 🙂

Let’s look at that dH = TdS + VdP equation. That’s a differential equation in which not temperature and pressure, but entropy (S) and pressure (P) are ‘independent’ variables, so we write:

dH(S, P) = TdS + VdP

Now, it is not very likely that we will have some problem to solve with data on entropy and pressure. At our level of understanding, any problem that’s likely to come our way will probably come with data on more common variables, such as the heat, the pressure, the temperature, and/or the volume. So we could continue with the expression above but we don’t do that. It makes more sense to re-write the expression substituting dQ for TdS once again, so we get:

dH = dQ + VdP

That resembles our dU = dQ – PdV expression: it just substitutes V for –P. And, yes, you guessed it: it’s because the two expressions resemble each other that we like to work with H now. 🙂 Indeed, we’re talking about the same system and the same infinitesimal changes and, therefore, we can use all the formulas we derived already by just substituting H for U, V for –P, and dP for dV. Huh? Yes. It’s a rather tricky substitution. If we switch V for –P (or vice versa) in a partial derivative involving T, we also need to include the minus sign. However, we do not need to include the minus sign when substituting dV and dP, and we also don’t need to change the sign of the partial derivatives of U and H when going from one expression to another! It’s a subtle and somewhat weird point, but a very important one! I’ll explain it in a moment. Just continue to read for now. Let’s do the substitution using our rules:

dU = (∂Q/∂T)_VdT + [T(∂P/∂T)_V − P]dV becomes:

dH = (∂Q/∂T)_PdT + (∂H/∂P)_TdP = C_PdT + [–T·(∂V/∂T)_P+ V]dP

Note that, just as we referred to (∂Q/∂T)_V as the specific heat capacity of a substance at constant volume, which we denoted by C_V, we now refer to (∂Q/∂T)_P as the specific heat capacity at constant pressure, which we’ll denote, logically, as C_P. Dropping the subscripts of the partial derivatives, we re-write the expression above as:

dH = C_PdT + [–T·(∂V/∂T)+ V]dP

So we’ve got what we wanted: we switched from an expression involving derivatives assuming constant volume to an expression involving derivatives assuming constant pressure. [In case you wondered what we wanted, this is it: we wanted an equation that helps us to solve another type of problem—another formula for a problem involving a different set of data.]

As mentioned above, it’s good to use subscripts with the partial derivatives to emphasize what changes and what is constant when calculating those partial derivatives but, strictly speaking, it’s not necessary, and you will usually not find the subscripts when googling other texts. For example, in the Wikipedia article on enthalpy, you’ll find the expression written as:

dH = C_PdT + V(1–αT)dP with α = (1/V)(∂V/∂T)

Just write it all out and you’ll find it’s the same thing, exactly. It just introduces another coefficient, α, i.e. the coefficient of (cubic) thermal expansion. If you find this formula is easier to remember, then please use this one. It doesn’t matter.
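As a quick sanity check on the α form, here is a Python sketch for the ideal gas, for which α should come out as 1/T, so that the V(1–αT) factor in front of dP vanishes. The derivative is taken numerically (a central difference), and the sample values (one mole at 300 K and atmospheric pressure) are my own choice.

```python
K_BOLTZMANN = 1.380649e-23  # J/K
N = 6.02214076e23           # hypothetical sample: one mole of molecules


def volume(temperature_k, pressure_pa):
    """Ideal-gas volume V = N*k*T / P."""
    return N * K_BOLTZMANN * temperature_k / pressure_pa


def expansion_coefficient(temperature_k, pressure_pa, dt=1e-3):
    """alpha = (1/V) * (dV/dT) at constant P, by central difference."""
    dv = volume(temperature_k + dt, pressure_pa) - volume(temperature_k - dt, pressure_pa)
    return dv / (2.0 * dt) / volume(temperature_k, pressure_pa)


T, P = 300.0, 101325.0
alpha = expansion_coefficient(T, P)
dh_dp = volume(T, P) * (1.0 - alpha * T)  # coefficient of dP in dH
print(alpha * T, dh_dp)
```

So, for an ideal gas, the enthalpy does not depend on pressure at constant temperature, and dH reduces to C_PdT.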

Now, let’s explain that funny business with the minus signs in the substitution. I’ll do so by going back to that infinitesimal analysis of the reversible cycle in my previous post, in which we had that formula involving ΔQ for the work done by the gas during an infinitesimally small reversible cycle: ΔW = ΔVΔP = ΔQ·(ΔT/T). Now, we can either write that as:

ΔQ = T·(ΔP/ΔT)·ΔV = dQ = T·(∂P/∂T)_V·dV – which is what we did for our analysis of (∂U/∂V)_T– or, alternatively, as
ΔQ = T·(ΔV/ΔT)·ΔP = dQ = T·(∂V/∂T)_P·dP, which is what we’ve got to do here, for our analysis of (∂H/∂P)_T.

Hence, dH = dQ + VdP becomes dH = T·(∂V/∂T)_P·dP + V·dP, and dividing all by dP gives us what we want to get: dH/dP = (∂H/∂P)_T= T·(∂V/∂T)_P+ V.

[…] Well… NO! We don’t have the minus sign in front of T·(∂V/∂T)_P, so we must have done something wrong or, else, that formula above is wrong.

The formula is right (it’s in Wikipedia, so it must be right :-)), so we are wrong. Indeed! The thing is: substituting dT, dV and dP for ΔT, ΔV and ΔP is somewhat tricky. The geometric analysis (illustrated below) makes sense but we need to watch the signs.

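We’ve got a volume increase, a temperature drop and, hence, also a pressure drop over the cycle: the volume goes from V to V+ΔV (and then back to V, of course), while the pressure and the temperature go from P to P–ΔP and T to T–ΔT respectively (and then back to P and T, of course). Now, the heat ΔQ is absorbed along the isotherm, while the gas expands, and along that isotherm the pressure falls as the volume rises. So, in our analysis of (∂H/∂P)_T, the ΔP that goes with a positive ΔQ is a pressure drop: ΔP = –dP. The ratio ΔV/ΔT, in contrast, just compares the gap between the two isotherms of the cycle, so it becomes the proper derivative (∂V/∂T)_P without any change of sign. Therefore, ΔQ = T·(ΔV/ΔT)·ΔP becomes dQ = –T·(∂V/∂T)_P·dP. [If you want a check that does not depend on the drawing: at constant temperature, dQ = TdS, and the Maxwell relation (∂S/∂P)_T = –(∂V/∂T)_P gives the same result.] Now that gives us what we want: dH/dP = (∂H/∂P)_T = –T·(∂V/∂T)_P + V, and, therefore, we can, indeed, write what we wrote above: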

dU = (∂Q/∂T)_VdT + [T(∂P/∂T)_V − P]dV becomes:

dH = (∂Q/∂T)_PdT + [–T·(∂V/∂T)_P+ V]dP = C_PdT + [–T·(∂V/∂T)_P+ V]dP
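If you don’t trust the sign, you can test it on a gas for which (∂H/∂P)_T is not zero. The Python sketch below does that for a gas with equation of state P(V – B) = RT per mole, which is an assumption I am bringing in purely as a test case (B is a made-up excluded volume). For this gas, (∂U/∂V)_T = T(∂P/∂T)_V – P = 0, so U depends on T only and we can write H = C_V·T + PV. The sketch compares a direct numerical derivative of H with the formula, once with the minus sign and once with a plus sign.

```python
R = 8.314462618   # J/(mol K), i.e. N*k for one mole
B = 3.0e-5        # m^3/mol, hypothetical excluded volume (an assumption)
C_V = 1.5 * R     # heat capacity at constant volume (monatomic value)


def volume(t, p):
    """Equation of state P*(V - B) = R*T, solved for V."""
    return R * t / p + B


def enthalpy(t, p):
    """H = U + P*V, with U = C_V*T (U depends on T only for this gas)."""
    return C_V * t + p * volume(t, p)


def dh_dp_direct(t, p, dp=100.0):
    """(dH/dP) at constant T, by central difference on H itself."""
    return (enthalpy(t, p + dp) - enthalpy(t, p - dp)) / (2.0 * dp)


def dh_dp_formula(t, p, sign=-1.0, dt=1e-3):
    """V + sign * T * (dV/dT)_P; sign=-1 is the formula in the text."""
    dv_dt = (volume(t + dt, p) - volume(t - dt, p)) / (2.0 * dt)
    return volume(t, p) + sign * t * dv_dt


T, P = 300.0, 101325.0
print("direct:     ", dh_dp_direct(T, P))
print("minus sign: ", dh_dp_formula(T, P, sign=-1.0))
print("plus sign:  ", dh_dp_formula(T, P, sign=+1.0))
```

Only the version with the minus sign agrees with the direct derivative (both equal B here), which is what we needed to confirm.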

Now, in case you still wonder: what’s the use of all these different expressions stating the same? The answer is simple: it depends on the problem and what information we have. Indeed, note that all derivatives we use in our expression for dH assume constant pressure, so if we’ve got that kind of data, we’ll use the chemists’ representation of the system. If we’ve got data describing performance at constant volume, we’ll need the physicists’ formulas, which are given in terms of derivatives assuming constant volume. It all looks complicated but, in the end, it’s the same thing: the PV = NkT equation gives us two ‘independent’ variables and one ‘dependent’ variable. Which one is which will determine our approach.

Now, we left one thing unexplained. Why do we refer to γ as the specific heat ratio? The answer is: it is the ratio of the specific heat capacities indeed, so we can write:

γ = C_P/C_V

However, it is important to note that that’s valid for ideal gases only. In that case, we know that the (∂U/∂V)_T derivative in our dU = (∂U/∂T)_VdT + (∂U/∂V)_TdV expression is zero: we can change the volume, but if the temperature remains the same, the internal energy remains the same. Hence, dU = (∂U/∂T)_VdT = C_VdT, and dU/dT = C_V. Likewise, the (∂H/∂P)_T derivative in our dH = (∂H/∂T)_PdT + (∂H/∂P)_TdP expression is zero – for ideal gases, that is. Hence, dH = (∂H/∂T)_PdT = C_PdT, and dH/dT = C_P. Hence,

C_P/C_V = (dH/dT)/(dU/dT) = dH/dU

Does that make sense? If dH/dU = γ, then H must be some linear function of U. More specifically, H must be some function H = γU + c, with c some constant (it’s the so-called constant of integration). Now, γ is supposed to be constant too, of course. That’s all perfectly fine: indeed, combining the definition of H (H = U + PV), and using the PV = (γ–1)U relation, we have H = U + (γ–1)U = γU (hence, c = 0). So, yes, dH/dU = γ, and γ = C_P/C_V.
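The same check can be done numerically. The Python sketch below takes U = NkT/(γ–1) and H = U + PV = U + NkT, differentiates both with respect to T (a central difference), and looks at the ratio and the difference of the two heat capacities. The one-mole sample and the γ = 9/7 value are just example inputs.

```python
K_BOLTZMANN = 1.380649e-23  # J/K
N = 6.02214076e23           # hypothetical sample: one mole of molecules


def internal_energy(t, gamma):
    """U = N*k*T / (gamma - 1), assuming an ideal gas with constant gamma."""
    return N * K_BOLTZMANN * t / (gamma - 1.0)


def enthalpy(t, gamma):
    """H = U + P*V, with P*V = N*k*T."""
    return internal_energy(t, gamma) + N * K_BOLTZMANN * t


def heat_capacities(gamma, t=300.0, dt=1.0):
    """Return (C_V, C_P) as dU/dT and dH/dT, by central difference."""
    c_v = (internal_energy(t + dt, gamma) - internal_energy(t - dt, gamma)) / (2.0 * dt)
    c_p = (enthalpy(t + dt, gamma) - enthalpy(t - dt, gamma)) / (2.0 * dt)
    return c_v, c_p


gamma = 9.0 / 7.0
c_v, c_p = heat_capacities(gamma)
print("C_P / C_V =", c_p / c_v)
print("C_P - C_V =", c_p - c_v, "vs Nk =", N * K_BOLTZMANN)
```

The ratio gives γ back, and the difference C_P – C_V equals Nk, which is just the PV = NkT term we added to U to get H.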

Note the qualifier, however: we’re assuming γ is constant (which does not imply the gas has to be ideal, so the interpretation is less restrictive than you might think it is). If γ is not a constant, it’s a different ballgame. […] So… Is γ actually constant? The illustration below shows γ is not constant for common diatomic gases like hydrogen or (somewhat less common) oxygen. It’s the same for other gases: when mentioning γ, we need to state the temperature at which we measured it too. 😦 However, the illustration also shows the assumption of γ being constant holds fairly well if temperature varies only slightly (like plus or minus 100° C), so that’s OK. 🙂

I told you so: the kinetic gas theory is not quite accurate. An approach involving molecular energy states works much better (and is actually correct, as it’s consistent with quantum theory). But so we are where we are and I’ll save the quantum-theoretical approach for later. 🙂

So… What’s left? Well… If you google the Wikipedia article on enthalpy in order to check that I am not writing nonsense, you’ll find it gives γ as the ratio of H and U itself: γ = H/U. That’s not wrong, obviously (γ = H/U = γU/U = γ), but that formula doesn’t really explain why γ is referred to as the specific heat ratio, which is what I wanted to do here.

OK. We’ve covered a lot of ground, but let’s reflect some more. We did not say a lot about entropy, and/or the relation between energy and entropy. Too bad… The relationship between entropy and energy is obviously not so simple as between enthalpy and energy. Indeed, because of that easy H = γU relationship, enthalpy emerges as just some auxiliary variable: some temporary variable we need to calculate something. Entropy is, obviously, something different. Unlike enthalpy, entropy involves very complicated thinking, involving (ir)reversibility and all that. So it’s quite deep, I’d say – but I’ll write more about that later. I think this post has gone as far as it should. 🙂