# The field from a grid

As part of his presentation of indirect methods for finding the field, Feynman presents an interesting argument on the electrostatic field of a grid. It’s just another indirect method to arrive at meaningful conclusions about what a field is supposed to look like, but it’s quite remarkable, and that’s why I am expanding on it here. Feynman’s presentation is extremely succinct indeed and, hence, I hope the elaboration below will help you to understand it somewhat quicker than I did. 🙂

The grid is shown below: it’s just a uniformly spaced array of parallel wires in a plane. We are looking at the field above the plane of wires here, and the dotted lines represent equipotential surfaces above the grid.

As you can see, for larger distances above the plane, we see a constant electric field, just as though the charge were uniformly spread over a sheet of charge, rather than over a grid. However, as we approach the grid, the field begins to deviate from the uniform field.

Let’s analyze it by assuming the wires lie in the xy-plane, running parallel to the y-axis. The distance between the wires is measured along the x-axis, and the distance to the grid is measured along the z-axis, as shown in the illustration above. We assume the wires are infinitely long and, hence, the electric field does not depend on y. So the component of E in the y-direction is 0, i.e. Ey = –∂Φ/∂y = 0. Therefore, ∂²Φ/∂y² = 0 and our Poisson equation above the wires (where there are no charges) is reduced to ∂²Φ/∂x² + ∂²Φ/∂z² = 0. What’s next?

Let’s look at the field of two positive wires first. The plot below comes from the Wolfram Demonstrations Project. I recommend you click the link and play with it: you can vary the charges and the distance, and the tool will redraw the equipotentials and the field lines accordingly. It will give you a better feel for the (a)symmetries involved. The equipotential lines are the gray contours: they are cross-sections of equipotential surfaces. The red curves are the field lines, which are always orthogonal to the equipotentials.

The point at the center is really interesting: the straight horizontal and vertical red lines through it are really limits. Feynman’s illustration below shows that the point represents an unstable equilibrium: the hollow tube prevents the charge from going sideways. If the tube weren’t there, the charge would go sideways, of course! So it’s some kind of saddle point. Onward!

Look at the illustration below and try to imagine what the field looks like by thinking about the value of the potential as you move along one of the two blue lines below: the potential goes down as we move to the right, reaches a minimum in the middle, and then goes up again. Also think about the difference between the lighter and darker blue line: going along the light-blue line, we start at a lower potential, and its minimum will also be lower than that of the dark-blue line.

So you can start drawing curves. However, I have to warn you: the graphs are not so simple. Look at the detail below. The potential along the blue line goes slightly up before it decreases, so the graph of the potential may resemble the green curve on the right of the image. I did an actual calculation here. 🙂 If there are only two charges, the formula for the potential is quite simple: Φ = (1/4πε0)·(q1/r1) + (1/4πε0)·(q2/r2). Briefly forgetting about the 1/4πε0 factor and equating q1 and q2 to +1, we get Φ = 1/r1 + 1/r2 = (r1 + r2)/(r1·r2). That looks like an easy function, and it is. You should think of it as the equivalent of the 1/r formula, but written as 1/r = r/r², and with a factor 2 in front because we have two charges. 🙂

However, we need to express it as a function of x, keeping z (i.e. the ‘vertical’ coordinate) constant. That’s what I did to get the graphs below. It’s easy to see that 1/r1 = (x² + z²)^(−1/2), while 1/r2 = [(a−x)² + z²]^(−1/2). Assuming a = 2 and z = 0.8, the contribution from the first charge is given by the blue curve, the contribution of the second charge is represented by the red curve, and the green curve adds both and, hence, represents the potential generated by both charges, i.e. q1 at x = 0 and q2 at x = a. OK… Onward!
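If you want to play with these curves yourself, here’s a quick numerical sketch in Python – with the 1/4πε0 factor dropped and two unit charges, just like above (the sampling points are, of course, just an illustrative choice):

```python
import math

def phi(x, z, a=2.0):
    """Potential of two unit charges at x = 0 and x = a (1/4πε0 factor dropped)."""
    r1 = math.sqrt(x**2 + z**2)         # distance to the charge at x = 0
    r2 = math.sqrt((a - x)**2 + z**2)   # distance to the charge at x = a
    return 1/r1 + 1/r2

# Sample the z = 0.8 curve between the two charges
for x in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"x = {x:.1f}: phi = {phi(x, 0.8):.3f}")
```

You should find that the potential dips from about 1.71 above the charges to about 1.56 at the midpoint x = a/2, and that the values are perfectly symmetric around that midpoint.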

The point to note is that we have an extremely simple situation here – two charges only, or two wires, I should say – but a potential function that is surely not some simple sinusoidal function. To drive the point home, I plotted a few more curves below, keeping a at a = 2, but setting z to 0.4, 0.7 and 1.7 respectively. The z = 1.7 curve shows that, at larger distances, the potential actually increases slightly as we move from left to right along the z = 1.7 line. Note the remarkable symmetry of the curves and the equipotential lines: the mathematical explanation is actually quite simple – substituting a − x for x just swaps the r1 and r2 terms and, hence, leaves Φ unchanged, so everything has to be mirror-symmetric about the x = a/2 line. 🙂

OK. Let’s get back to our grid. For your convenience, I copied it once more below.

Feynman’s approach to calculating the variations is quite original. He also duly notes that the potential function is surely not some simple sinusoidal function. However, he also notes that, when everything is said and done, it is some periodic quantity, in one way or another, and, therefore, we should be able to do a Fourier analysis and express it as a sum of sinusoidal waves. To be precise, we should be able to write Φ(x, z) as a sum of harmonics.

[…] I know. […] Now you say: Oh sh**! And you’ll just turn off. That’s OK, but why don’t you give it a try? I promise to be lengthy. 🙂

Before we get too much into the weeds, let’s briefly recall how it works for our classical guitar string. That post explained how the wavelengths of the harmonics of a string depended on its length. If we denote the various harmonics by their harmonic number n = 1, 2, 3 etcetera, and the length of the string by L, we have λ1 = 2L = (1/1)·2L, λ2 = L = (1/2)·2L, λ3 = (1/3)·2L,… λn = (1/n)·2L. In short, the harmonics – i.e. the components of our waveform – look like this:

etcetera (1/8, 1/9,…,1/n,… 1/∞)

Beautiful, isn’t it? As I explained in that post, it’s so beautiful it triggered a misplaced fascination with harmonic ratios. It was misplaced because the Pythagorean theory was a bit too simple to be true. However, their intuition was right, and they set the stage for guys like Copernicus, Fourier and Feynman, so that was good! 🙂

Now, as you know, we’ll usually substitute wavelength and frequency by wavenumber and angular frequency so as to convert all to something expressed in radians, which we can then use as the argument in the sine and/or cosine component waves. [Yes, the Pythagoreans once again! :-)] The wavenumber k is equal to k = 2π/λ, and the angular frequency is ω = 2π·f = 2π/T (in case you doubt, you can quickly check that the speed of a wave is equal to the product of the wavelength and its frequency by substituting: v = λ·f = (2π/k)·(ω/2π) = ω/k, which gives you the phase velocity vp = c). To make a long story short, we wrote k = k1 = 2π·1/(2L), k2 = 2π·2/(2L) = 2k, k3 = 2π·3/(2L) = 3k,… kn = 2π·n/(2L) = n·k,… to arrive at the grand result, and that’s our wave F(x) expressed as the sum of an infinite number of simple sinusoids:

F(x) = a1cos(kx) + a2cos(2kx) + a3cos(3kx) + … + ancos(nkx) + … = ∑ ancos(nkx)

That’s easy enough. The problem is to find those amplitudes a1, a2, a3,… of course, but the great French mathematician who gave us the Fourier series also gave us the formulas for that, so we should be fine! Can we use them here? Should we use them here? Let’s see…
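To see those formulas at work, here’s a little numerical sketch that recovers the amplitudes of a known wave by brute-force integration (the fourier_cos_coeff function and the test wave are my own illustrative choices, of course):

```python
import math

def fourier_cos_coeff(f, n, wavelength, samples=10000):
    """a_n = (2/λ)·∫0..λ f(x)·cos(n·k·x)·dx, approximated by a Riemann sum."""
    k = 2 * math.pi / wavelength
    dx = wavelength / samples
    return (2 / wavelength) * sum(
        f(i * dx) * math.cos(n * k * i * dx) * dx for i in range(samples)
    )

# A test wave with known amplitudes: a1 = 0.5 and a3 = 0.25
wavelength = 2.0
k = 2 * math.pi / wavelength
f = lambda x: 0.5 * math.cos(k * x) + 0.25 * math.cos(3 * k * x)

print(round(fourier_cos_coeff(f, 1, wavelength), 6))  # 0.5
print(round(fourier_cos_coeff(f, 3, wavelength), 6))  # 0.25
print(round(fourier_cos_coeff(f, 2, wavelength), 6))  # 0.0
```

The integrals pick out exactly the amplitude of each harmonic, because the cosines are orthogonal over one full wavelength: that’s the whole trick behind Fourier’s formulas.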

The a in the analysis, i.e. the spacing of the wires, is the physical quantity that corresponds to the length of our guitar string in our musical sound problem. In fact, a corresponds to 2L, because guitar strings are fixed at both ends and, hence, the two ends have to be nodes and, therefore, the wavelength of our first harmonic is twice the length of the string. Huh? Well… Something like that. As you can see from the illustration of the grid, a, in contrast to L, does correspond to one full wavelength of our periodic function. So we write:

Φ(x) = ∑ ancos(n·k·x) = ∑ ancos(2π·n·x/a) (n = 1, 2, 3,…)

Now, that’s the formula for Φ(x) assuming we’re fixing z, so it’s Φ(x) at some fixed distance from the grid. Let’s think about those amplitudes an now. They should not depend on x, because the harmonics themselves (i.e. the cos(2π·n·x/a) components) are all that varies with x. So they have to be some function of n and – most importantly – some function of z also. So we denote them by Fn(z) and re-write the equation above as:

Φ(x, z) = ∑ Fn(z)·cos(2π·n·x/a) (n = 1, 2, 3,…)

Now, the rest of Feynman’s analysis speaks for itself, so I’ll just shamelessly copy it:

What did he find here? What is he saying, really? 🙂 First note that the derivation above has been done for one term in the Fourier sum only, so we’re talking a specific harmonic here. That harmonic is a function of z which – let me remind you – is the distance from the grid. To be precise, the function is Fn(z) = An·e^(−z/z0), with z0 = a/(2π·n). [In case you wonder how Feynman goes from equation (7.43) to (7.44), he’s just solving a second-order linear differential equation here. :-)]
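If you don’t want to take the differential equation on faith, you can also check numerically that each harmonic Fn(z)·cos(2π·n·x/a), with Fn(z) = e^(−z/z0) and z0 = a/(2π·n), does satisfy our ∂²Φ/∂x² + ∂²Φ/∂z² = 0 equation. A quick finite-difference sketch (with An = 1 and illustrative values for a and n):

```python
import math

a, n = 2.0, 1
z0 = a / (2 * math.pi * n)

def phi(x, z):
    """One harmonic of the grid potential: Fn(z)·cos(2πnx/a), with Fn(z) = e^(−z/z0)."""
    return math.exp(-z / z0) * math.cos(2 * math.pi * n * x / a)

def laplacian(f, x, z, h=1e-4):
    """Central-difference estimate of ∂²f/∂x² + ∂²f/∂z²."""
    d2x = (f(x + h, z) - 2 * f(x, z) + f(x - h, z)) / h**2
    d2z = (f(x, z + h) - 2 * f(x, z) + f(x, z - h)) / h**2
    return d2x + d2z

print(abs(laplacian(phi, 0.3, 0.5)))  # ≈ 0 (within finite-difference error)
```

The two second derivatives bring down factors of −(2πn/a)² and +(1/z0)² respectively, and z0 = a/(2πn) is exactly the value that makes them cancel.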

Now, you’ve seen the graph of that function a zillion times before: it starts at An for z = 0 and goes to zero as z goes to infinity, as shown below. 🙂

Now, that’s the case for all Fn(z) coefficients of course. As Feynman writes:

“We have found that if there is a Fourier component of the field of harmonic n, that component will decrease exponentially with a characteristic distance z0 = a/(2π·n). For the first harmonic (n=1), the amplitude falls by the factor e^(−2π) (i.e. a large decrease) each time we increase z by one grid spacing a. The other harmonics fall off even more rapidly as we move away from the grid. We see that if we are only a few times the distance a away from the grid, the field is very nearly uniform, i.e., the oscillating terms are small. There would, of course, always remain the “zero harmonic” field, i.e. Φ0 = −E0·z, to give the uniform field at large z. Of course, for the complete solution, the sum needs to be made, and the coefficients An would need to be adjusted so that the total sum, when differentiated, gives an electric field that would fit the charge density of the grid wires.”
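To get a feel for how quickly those oscillating terms die out, we can add up the harmonic amplitudes e^(−2πnz/a) at various distances from the grid. The sketch below takes all the An coefficients equal to 1, which is an illustrative assumption only – the actual coefficients depend on the charge density of the wires, as Feynman notes:

```python
import math

a = 1.0  # grid spacing (arbitrary units)

def ripple(z, n_max=50):
    """Size of the oscillating part of Φ at height z: Σ e^(−2πnz/a),
    with all amplitudes An set to 1 for illustration."""
    return sum(math.exp(-2 * math.pi * n * z / a) for n in range(1, n_max + 1))

for z in [0.1, 0.5, 1.0, 2.0]:
    print(f"z = {z}·a: ripple ≈ {ripple(z):.2e}")
```

At z = a, the ripple is already down to about 2 parts in a thousand, and at z = 2a to a few parts per million: the field is very nearly uniform indeed.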

Phew! Quite something, isn’t it? But that’s it really, and it’s actually simpler than the ‘direct’ calculations of the field that I googled. Those calculations involve complicated series and logs and what have you, to arrive at the same result: the field away from a grid of charged wires is very nearly uniform.

Let me conclude this post by noting Feynman’s explanation of shielding by a screen. It’s quite terse:

“The method we have just developed can be used to explain why electrostatic shielding by means of a screen is often just as good as with a solid metal sheet. Except within a distance from the screen a few times the spacing of the screen wires, the fields inside a closed screen are zero. We see why copper screen—lighter and cheaper than copper sheet—is often used to shield sensitive electrical equipment from external disturbing fields.”

Hmm… So how does that work? The logic should be similar to the logic I explained when discussing shielding in one of my previous posts. Have a look—if only because it’s a lot easier to understand than the rather convoluted business I presented above. 🙂 But then I guess it’s all par for the course, isn’t it? 🙂

# Capacitors

This post briefly explores the properties of capacitors. Why? Well… Just because they’re an element in electric circuits, and so we should try to fully understand how they function so we can understand how electric circuits work. Indeed, we’ll look at some interesting DC and AC circuits in the very near future. 🙂

Feynman introduces condensers − now referred to as capacitors – right from the start, as he explains Maxwell’s fourth equation, which is written as c²·∇×B = ∂E/∂t + j/ε0 in differential form, but easier to read when integrating over a surface S bounded by a curve C:

The ∂E/∂t term implies that changing electric fields produce magnetic effects (i.e. some circulation of B, i.e. the c²·∇×B on the left-hand side). We need this term because, without it, there could be no currents in circuits that are not complete loops, like the circuit below, which is just a circuit with a capacitor made of two flat plates. The capacitor is charged by a current that flows toward one plate and away from the other. It looks messy because of the complicated drawing: we have a curve C around one of the wires defining two surfaces: S1 is a surface that just fills the loop and, hence, crosses the wire, while S2 is a bowl-shaped surface which passes between the plates of the capacitor (so it does not cross the wire).

If we look at C and S1 only, then the circulation of B around C is explained by the current through the wire, so that’s the j/ε0 term in Maxwell’s equation, which is probably how you understood magnetism in high school. However, no current goes through the S2 surface, so if we look at C and S2 only, we need the ∂E/∂t term to explain the magnetic field. Indeed, as Feynman points out, changing the location of an imaginary surface should not change a real magnetic field! 🙂

Let’s look at those charged sheets. For a single sheet of charge, we found two opposite fields of magnitude E = (1/2)·σ/ε0. Now, it is easy to see that we can superimpose the solutions for two parallel sheets with equal and opposite charge densities +σ and −σ, so we get:

E between the sheets = σ/ε0, and E outside = 0

Now, actual capacitors are not made of some infinitely thin sheet of charge: they are made of some conductor and, hence, we get that shielding effect and we’re talking surface charge densities +σ and −σ, so the actual picture is more like the one below. Having said that, the formula above is still correct: E is σ/ε0 between the plates, and zero everywhere else (except at the edge, but I’ll talk about that later).

We’re now ready to tackle the first property of a capacitor, and that is its capacity. In fact, the correct term is capacitance, but that sounds rather strange, doesn’t it?

The capacity of a capacitor

We know the two plates are both equipotentials but with different potential, obviously! If we denote these two potentials as Φ1 and Φ2 respectively, we can define their difference Φ1 − Φ2 as the voltage between the two plates. Its unit is the same as the unit for potential which, as you may or may not remember, is potential energy per unit charge, so that’s newton·meter/coulomb. [In honor of the guy who invented the first battery, 1 N·m/C is usually referred to as one volt, which – quite annoyingly – is also abbreviated as V, even if the voltage and the volt are two very different things: the volt is the unit of voltage.]

Now, it’s easy to see that the voltage, or potential difference, is the amount of work that’s required to carry one unit charge from one plate to the other. To be precise, because the coulomb is a huge unit − it’s equivalent to the combined charge of some 6.241×10¹⁸ protons − we should say that the voltage is the work per unit charge required to carry a small charge from one plate to the other. Hence, if d is the distance between the two plates (as shown in the illustration above), we can write:

V = E·d = σ·d/ε0 = [d/(ε0·A)]·Q

Q is the total charge on each plate (so it’s positive on one, and negative on the other), A is the area of each plate, and d is the separation between the two plates. What the equation says is that the voltage is proportional to the charge, and the constant of proportionality is d over ε0A. Now, the proportionality between V and Q is there for any two conductors in space (provided we have a plus charge on one, and a minus charge on the other, and so we assume there are no other charges around). Why? It’s just the logic of the superposition of fields: we double the charges, so we double the fields, and so the work done in carrying a unit charge from one point to the other is also doubled! So that’s why the potential difference between any two points is proportional to the charges.

Now, the constant of proportionality is called the capacity or capacitance of the system. In fact, it’s defined as C = Q/V. [Again, it’s a bit of a nuisance the symbol (C) is the same as the symbol that is used for the unit of charge, but don’t worry about it.] To put it simply, the capacitance is the ability of a body to store electric charge. For our parallel-plate condenser, it is equal to C =  ε0A/d. Its unit is coulomb/volt, obviously, but – again in honor of some other guy – it’s referred to as the farad: 1 F = 1 C/V.
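As a quick numerical illustration of the C = ε0·A/d formula (the plate dimensions below are just an illustrative choice):

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

def parallel_plate_capacitance(area_m2, separation_m):
    """C = ε0·A/d for a parallel-plate capacitor with vacuum between the plates."""
    return EPSILON_0 * area_m2 / separation_m

# Two 10 cm × 10 cm plates, 1 mm apart
c = parallel_plate_capacitance(0.1 * 0.1, 1e-3)
print(f"C = {c * 1e12:.1f} pF")  # C = 88.5 pF
```

Note how huge the farad is as a unit: even fairly large plates give you picofarads only.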

To build a fairly high-capacity condenser, one could put waxed paper between sheets of aluminium and roll it up. Sealed in plastic, that made a typical radio-type condenser. The principle used today is still the same. In order to reduce the risk of breakdown (which occurs when the field strength becomes so large that it pulls electrons from the dielectric between the plates, thus causing conduction), higher capacity is generally better, so the voltage developed across the condenser will be smaller. Condensers used to be fairly big, but modern capacitors are actually as small as other computer card components. It’s all interesting stuff, but I won’t elaborate on it here, because I’d rather focus on the physics and the math behind the engineering in this blog. 🙂

Onward! Let’s move to the next thing. Before we do so, however, let me quickly give you the formula for the capacity of a charged sphere (for a parallel-plate capacitor, it’s C = ε0A/d, as noted above): C = 4πε0a. You’ll wonder: where’s the ‘other’ conductor here? Well… When this formula is used, it assumes some imaginary sphere of infinite radius with opposite charge −Q.
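A quick calculation with that C = 4πε0·a formula shows just how small such capacities are (the radii below are illustrative):

```python
import math

EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

def sphere_capacitance(radius_m):
    """C = 4πε0·a for an isolated sphere (the 'other' conductor sits at infinity)."""
    return 4 * math.pi * EPSILON_0 * radius_m

print(sphere_capacitance(1.0))     # ≈ 1.11e-10 F for a sphere of 1 m radius
print(sphere_capacitance(6.37e6))  # ≈ 7.1e-4 F for an Earth-sized sphere
```

So even a sphere the size of the Earth has a capacity of less than a millifarad, which tells you, once more, that the farad is a huge unit.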

The energy of a capacitor

I talked about the energy of fields in various places, most notably my posts on fields and charges. The idea behind it is quite simple: if there’s some distribution of charges in space, then we always have some energy in the system, because a certain amount of work was required to bring the charges together. [For the concept of energy itself, please see my post on energy and potential.] Remember that simple formula, and the equally simple illustration:

Also remember what we wrote above: the voltage is the work per unit charge required to carry a small charge from one plate to the other. Now, when charging a capacitor, what’s happening is that charge gets transferred from one plate to the other indeed, and the work required to transfer a small charge dQ is, obviously, equal to V·dQ. Hence, the change in energy is dU = V·dQ. Now, because V = Q/C, we get dU = (Q/C)·dQ, and integrating this from zero charge to some final charge Q, we get:

U = (1/2)·Q²/C = (1/2)·C·V²

Note how the capacity C, or its inverse 1/C, appears as a constant of proportionality in both equations. It’s the charge, or the voltage, that’s the variable really, and the formulas say the energy is proportional to the square of the charge, or the voltage. Finally, also note that we immediately get the energy of a charged sphere by substituting 4πε0·a for C (see the capacity formula in the previous section):

Now, Feynman applies this energy formula to an interesting range of practical problems, but I’ll refer you to him for that: just click on the link and check it out. 🙂
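As a quick numerical sanity check that the U = (1/2)·Q²/C and U = (1/2)·C·V² expressions are one and the same thing (the capacitor values below are illustrative only):

```python
def energy(charge=None, voltage=None, capacitance=1e-6):
    """Energy of a charged capacitor, from whichever of Q or V you know."""
    if charge is not None:
        return 0.5 * charge**2 / capacitance   # U = (1/2)·Q²/C
    return 0.5 * capacitance * voltage**2      # U = (1/2)·C·V²

C, V = 1e-6, 12.0   # a 1 μF capacitor charged to 12 V
Q = C * V           # Q = C·V
print(round(energy(voltage=V, capacitance=C), 9))  # 7.2e-05 J
print(round(energy(charge=Q, capacitance=C), 9))   # the same: 7.2e-05 J
```

Of course, the two must agree: substituting Q = C·V into Q²/2C gives C·V²/2 right away.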

OK… Next thing. The next thing is to look at the dielectric material inside capacitors.

Dielectrics

You know the dielectric inside a capacitor increases its capacity. In case you wonder what I am talking about: the dielectric is the waxed paper inside of that old-fashioned radio-type condenser, or the oxide layer on the metal foil used in more recent designs. However, before analyzing dielectrics, let’s first look at what happens when putting another conductor in-between the plates of our parallel-plate condenser, as shown below.

As a matter of fact, the neutral conductor will also increase the capacitance of our condenser. Now how does that work? It’s because of the induced charges. As I explained in my post on how shielding works, the induced charges reduce the field inside of the conductor to zero. So there is no field inside the (neutral) conductor. The field in the rest of the space is still what it was: σ/ε0, so that’s the surface density of charge (σ) divided by ε0. However, the distance over which we have to integrate to get the potential difference (i.e. the voltage V) is reduced: it’s no longer d but d minus b, as there’s no work involved in moving a charge across a zero field. Hence, instead of writing V = E·d = σ·d/ε0, we now write V = σ·(d−b)/ε0. Hence, the capacity C = Q/V = ε0A/d is now equal to C = Q/V = ε0A/(d−b), which we prefer to write as:

C = (ε0·A/d)·(1 − b/d)⁻¹

Now, because 0 < 1 − b/d < 1, we have a factor (1 − b/d)⁻¹ that is greater than 1. So our capacitor will have greater capacity which, remembering our C = Q/V and U = (1/2)·C·V² formulas, implies (a) that it will store more charge at the same potential difference (i.e. voltage) and, hence, (b) that it will also store more energy at the same voltage.
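A quick numerical check of that (1 − b/d)⁻¹ factor, with illustrative plate dimensions:

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

def capacitance_with_slab(area_m2, d_m, b_m):
    """C = ε0·A/(d−b): a neutral conducting slab of thickness b between the plates."""
    return EPSILON_0 * area_m2 / (d_m - b_m)

A, d = 0.01, 1e-3                         # 10 cm × 10 cm plates, 1 mm apart
c0 = capacitance_with_slab(A, d, 0.0)     # no slab
c1 = capacitance_with_slab(A, d, 0.5e-3)  # slab filling half the gap
print(c1 / c0)  # 2.0: the (1 − b/d)⁻¹ factor with b/d = 1/2
```

So a slab filling half the gap doubles the capacity, just as the formula says.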

Having said that, it’s easy to see that, if there’s air in-between, the risk of the capacitor breaking down will be much more significant. Hence, the use of conducting material to increase the capacitance of a capacitor is not recommended. [The question of how a breakdown actually occurs in a vacuum is an interesting one: the vacuum is expected to undergo electrical breakdown at or near the so-called Schwinger limit. If you want to know more about it, you can read the Wikipedia article on this.]

So what happens when we put a dielectric in-between? It’s illustrated below. The field is reduced, but it is not zero, so the positive charge on the surface of the dielectric (look at the gaussian surface S shown by the broken lines) is less than the negative charge on the conductor: in the illustration below, it’s a 1 to 2 ratio.

But what’s happening really? What’s the reality behind it? Good question. The illustration above is just a mathematical explanation. It doesn’t tell us anything − nothing at all, really − about the physics of the situation. As Feynman writes:

“The experimental fact is that if we put a piece of insulating material like lucite or glass between the plates, we find that the capacitance is larger. That means, of course, that the voltage is lower for the same charge. But the voltage difference is the integral of the electric field across the capacitor; so we must conclude that inside the capacitor, the electric field is reduced even though the charges on the plates remain unchanged. Now how can that be? Gauss’ Law tells us that the flux of the electric field is directly related to the enclosed charge. Consider the gaussian surface S shown by broken lines. Since the electric field is reduced with the dielectric present, we conclude that the net charge inside the surface must be lower than it would be without the material. There is only one possible conclusion, and that is that there must be positive charges on the surface of the dielectric. Since the field is reduced but is not zero, we would expect this positive charge to be smaller than the negative charge on the conductor. So the phenomena can be explained if we could understand in some way that when a dielectric material is placed in an electric field there is positive charge induced on one surface and negative charge induced on the other.”

Now that’s a mathematical model indeed, based on the formula for the work involved in transferring charge from one plate to the other:

W = ∫F·ds = ∫q·E·ds = q·∫E·ds = q·V

If your physics classes in high school were any good, you’ve probably seen the illustration above. Having said that, the physical model behind it is more complicated, so let’s have a look at that now.

The key to the whole analysis is the assumption that, inside a dielectric, we have lots of little atomic or molecular dipoles. Feynman presents an atomic model (shown below) but we could also think of highly polar molecules, like water, for instance. [Note, however, that, with water, we’d have a high risk of electrical breakdown once again.]

The micro-model doesn’t matter very much. The whole analysis hinges on the concept of a dipole moment per unit volume. We’ve introduced the concept of the dipole moment tout court in a previous post, but let me remind you: the dipole moment is the product of the charge and the distance between two equal but opposite charges q⁺ and q⁻.

Now, because we’re using the d symbol for the distance between our plates, we’ll use δ for the distance between the two charges. Also note that we usually write the dipole moment as a vector so we keep track of its direction and can use it in vector equations. To make a long story short: p = q·δ and, using boldface for vectors, p = q·δ. [Please do note that δ is a vector going from the negative to the positive charge, otherwise you won’t understand a thing of what follows.]

As mentioned above, we can have atomic or molecular or whatever other type of dipoles, but what we’re interested in is the dipole moment per unit volume, which we write as:

P = Nqδ, with N the number of dipoles per unit volume.

For rather obvious reasons, P is also often referred to as the polarization vector. […] OK. We’re all set now. We should distinguish two possibilities:

1. P is uniform, i.e. constant, across our sheet of material.
2. P is not uniform, i.e. P varies across the dielectric.

So let’s do the first case first.

1. Uniform P

This assumption gives us the mathematical model of the dielectric almost immediately. Indeed, when everything is said and done, what’s going on here is that the positive/negative charges inside the dielectric have just moved in/out over that distance δ, so at the surface, they have also moved in/out over the very same distance. So the image is effectively the image below, which is equivalent to that mathematical model of a dielectric we presented above.

Of course, no analysis is complete without formulas, so let’s see what we need and what we get.

The first thing we need is the surface density of the polarization charge induced on the surface, which we denote by σpol, as opposed to σfree, which is the surface density on the plates of our capacitor (the subscript ‘free’ refers to the fact that the electrons are supposed to be able to move freely, which is not the case in our dielectric). Now, if A is the area of our surface slabs, and each of the dipoles carries that charge q, then the illustration above tells us that the total charge in the tiny negative surface slab will be equal to Q = A·δ·q·N. Hence, the surface charge density σpol = Q/A = A·δ·q·N/A = N·δ·q. But N·δ·q is also the definition of P! Hence, σpol = P. [Note that σpol is positive on one side, and negative on the other, of course!]
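A quick numerical check of that σpol = P identity, with purely illustrative values for N, q and δ:

```python
import math

N = 3e28        # dipoles per unit volume, 1/m³ (illustrative)
q = 1.6e-19     # charge of each dipole, C
delta = 1e-11   # displacement between the two charges, m (illustrative)
A = 1e-4        # area of the surface slab, m²

Q = A * delta * q * N   # total charge in the thin surface slab
sigma_pol = Q / A       # surface density of polarization charge
P = N * q * delta       # dipole moment per unit volume

print(math.isclose(sigma_pol, P))  # True: σpol = P, as derived above
```

The area A just cancels out, of course: that’s the whole point of the derivation.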

Now that we have σpol, we can use our E = σ/ε0 formula and add the fields from the dielectric and the capacitor plates respectively. Just think about that gaussian surface S, for example. The field there, taking into account that σpol and σfree have opposite signs, is equal to:

E = (σfree − σpol)/ε0

Using our σpol = P identity, we can also write this as E = (σfree − P)/ε0. But what’s P? Well… It’s a property of the material obviously, but then it’s also related to the electric field, of course! For larger E, we can reasonably assume that δ will be larger too (assuming some grid of atoms or molecules, we should obviously not assume a change in N or q) and, hence, dP/dE is supposed to be positive. In fact, it turns out that the relation between E and P is pretty linear, and so we can define some constant of proportionality and write P ≈ k·E which, because the E and P vectors have the same direction, also holds as a vector equation. Now, for historic reasons, we’ll write our k as k = ε0·χ, so we’re singling out our ε0 constant once more and – as usual – we add some gravitas to the analysis by using one of those Greek letters (χ is chi). So we have P = ε0·χ·E, and our equation above becomes:

Now, remembering that V = E·d and that the total charge on our capacitor is equal to Q = σfree·A, we get the formula which you may or may not know from your high school physics classes:

C = Q/V = (1 + χ)·ε0·A/d

So… As Feynman puts it: “We have explained the observed facts. When a parallel-plate capacitor is filled with a dielectric, the capacitance is increased by the factor 1+χ.” The table below gives the values for various materials. As you can see, water’d be a great dielectric… if it weren’t so conductive. 🙂
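To see that factor 1 + χ at work, here’s a small numerical sketch that combines E = (σfree − P)/ε0 with P = ε0·χ·E (the values of σfree and χ below are illustrative only):

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

sigma_free = 1e-6  # free surface charge on the plates, C/m² (illustrative)
chi = 4.0          # electric susceptibility of the dielectric (illustrative)

# Combining E = (σfree − P)/ε0 with P = ε0·χ·E gives E = σfree/(ε0·(1 + χ))
E = sigma_free / (EPSILON_0 * (1 + chi))
P = EPSILON_0 * chi * E

# The field is reduced by the factor 1 + χ compared to the vacuum value σfree/ε0
print(round(sigma_free / (EPSILON_0 * E), 6))  # 5.0 = 1 + χ
```

And a smaller field, for the same charge, means a smaller voltage and, hence, a larger C = Q/V: that’s the whole story in three lines of algebra.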

As for the assumption of linearity between E and P, there’s stuff on the Web on non-linear relationships too, but you can google that yourself. 🙂 Let’s now analyze the second case.

2. Non-uniform P

The analysis for non-uniform polarization is more general, and includes uniform polarization as a special case. To get going with it, Feynman uses an illustration (reproduced below) which is not so evident to interpret. Take your time to study it. The d vector connects, once again, two equal but opposite charges. The P vector points in the same direction as the d vector, obviously, but has a different magnitude, because P is equal to P = N·q·d. We also have the normal unit vector n here and an angle θ between the normal and P. Finally, the broken lines represent a tiny imaginary surface. To be precise, it represents, once again, an infinitesimal surface, or a surface element, as Feynman terms it.

Just take your time and think about it. If P is parallel to the surface, then θ = π/2 and no charge moves across our surface element. If n and P point in the same direction, then θ = 0 and the charge swept across the surface fills a tiny slab of height d. Feynman uses the illustration above to point out that the charge moved across any surface element is proportional to the component of P that is perpendicular to the surface. Hence, remembering what the vector dot product stands for, and remembering that both σpol as well as P are expressed per unit area, we can write:

σpol = P·n = |P|·|n|·cosθ = P·cosθ

So P·n is the normal component of P, i.e. the component of P that’s perpendicular to our infinitesimal surface, and this component gives us the charge that moves across a surface element. [I know… The analysis is everything but easy here… But just hang in and try to get through it.]

Now, while the illustration above, and the formula, show us how some charge moves across the infinitesimal surface to create some surface polarization, it is obvious that it should not result in a net surface charge, because there are equal and opposite contributions from the dielectric on the two sides of the surface. However, having said that, the displacements of the charges do result in some tiny volume charge density, as illustrated below.

Now, I must admit Feynman does not make it easy to intuitively understand what’s going on because the various P vectors are chosen rather randomly, but you should be able to get the idea. P is not uniform indeed. Therefore, the electric field across our dielectric causes the P vectors to have different magnitudes and/or directions. Now, as mentioned above, to get the total charge that is being displaced out of any volume bound by some surface S, we should look at the normal component of P over the surface S. To be precise, to get the total charge that is being displaced out of the volume V, we should integrate the outward normal component of P over the surface S. Of course, an equal excess charge of the opposite sign will be left behind. So, denoting the net charge inside V by ΔQpol, we write:

ΔQpol = −∫S P·n·da

Now, you may or may not remember Gauss’ Theorem, which is related but not to be confused with Gauss’ Law (for more details, check one of my previous posts on vector analysis), according to which we can write:

[I know… You’re getting tired, but we’re almost there.] We can also look at the net charge ΔQpol as the infinite sum of tiny volume charge densities ρpol, added up over the volume V. So we write:
ΔQpol = ∫V ρpol dV
Again, the integral above may not appear to be very intuitive, but it actually is: the displaced charges leave behind a net charge density in the volume – so the third spatial dimension comes in – and adding that density up over the volume V gives us back the total displaced charge. Just let it sink in for a while, and you’ll see it all makes sense. In any case, the equalities above imply that:
∫V ρpol dV = −∫V (∇·P) dV
and, therefore, that

ρpol = −∇·P

You’ll say: so what? Well… It’s a nice result, really. Feynman summarizes it as follows:

“If there is a nonuniform polarization, its divergence gives the net density of charge appearing in the material. We emphasize that this is a perfectly real charge density; we call it “polarization charge” only to remind ourselves how it got there.”

Well… That says it all, I guess. To make sure you understand what’s written here: please note, once again, that the net charge over the whole of the dielectric is and remains zero, obviously!
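If you want to see the ρpol = −∇·P result at work, here’s a minimal numerical sketch (Python with NumPy; the one-dimensional polarization P(x) = k·x is made up for the purpose of the illustration):

```python
import numpy as np

# A made-up non-uniform polarization along x: P(x) = k*x (C/m^2)
k = 2.0
x = np.linspace(0.0, 1.0, 1001)
P = k * x

# rho_pol = -dP/dx: the net (volume) charge density the displacement leaves behind
rho_pol = -np.gradient(P, x)

# For P = k*x the divergence is the constant k, so rho_pol = -k everywhere
assert np.allclose(rho_pol, -k)
```

The non-uniform P pushes more charge out of one side of each little volume than comes in from the other, and the difference is the constant density −k.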

The only question you may have is whether non-uniform polarization is actually relevant. It is. Google it and you’re likely to get a lot of sites relating to multi-layered transducers and piezoelectric materials. 🙂 But, you’re right, that’s perhaps too advanced to talk about here.

Having said that, what I write above may look like too much nitty-gritty, but it isn’t: the formulas are pretty basic, and you need them if you want to advance in physics. In fact, Feynman uses these simple formulas in two more Lectures (Chapter 10 and 11 in Volume II, to be precise) to do some more analyses of real physics. However, as this blog is not meant to be a substitute for his Lectures, I’ll refer to him for further reading. At the very least, you have the basics here, and I hope it was interesting enough to induce you to look at the mentioned Lectures yourself. 🙂

# The method of images

In my previous post, I mentioned the so-called method of images, but didn’t elaborate much. Let’s recall the problem. As you know, the whole subject of electrostatics is governed by one equation: the so-called Poisson equation:

∇²Φ = ∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² = −ρ/ε0

We get this equation by combining Maxwell’s first law (∇·E = ρ/ε0) and the E = −∇Φ formula. Now, if we know the distribution of charges, then we don’t need that Poisson equation: we can calculate the potential at every point – denoted by (1) below – using the following formulas:
Φ(1) = (1/4πε0)·Σi qi/r1i and, for a continuous distribution, Φ(1) = (1/4πε0)·∫ ρ(2)/r12 dV2
And if we have Φ, we have E, because E = −∇Φ. But, in most actual situations, we don’t know the charge distribution, and then we need to work with that Poisson equation. Of course, you’ll say: if you don’t know the charge distribution, then you don’t know the ρ in the equation, and so what use is it really?
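Before answering that, note that the superposition recipe itself is easy to put to work when we do know the charges. The sketch below (Python with NumPy; the charges and positions are made up) computes Φ by superposition and checks that −∇Φ gives back the summed Coulomb fields:

```python
import numpy as np

EPS0 = 8.854e-12  # permittivity of free space, F/m

def phi(point, charges):
    """Potential at `point` by direct superposition of q/(4*pi*eps0*r) terms."""
    point = np.asarray(point, dtype=float)
    return sum(q / (4 * np.pi * EPS0 * np.linalg.norm(point - np.asarray(pos, dtype=float)))
               for q, pos in charges)

# Two made-up charges (a small dipole-like arrangement)
charges = [(1e-9, (0.0, 0.0, 0.01)), (-1e-9, (0.0, 0.0, -0.01))]

# E = -grad(phi): estimate the z-component with a central difference
p = np.array([0.0, 0.0, 0.5])
h = 1e-6
Ez = -(phi(p + [0, 0, h], charges) - phi(p - [0, 0, h], charges)) / (2 * h)

# Compare with the z-component of the summed Coulomb fields
Ez_exact = sum(q * (p[2] - pos[2]) / (4 * np.pi * EPS0 * np.linalg.norm(p - np.asarray(pos)) ** 3)
               for q, pos in charges)
assert abs(Ez - Ez_exact) / abs(Ez_exact) < 1e-4
```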

The answer is: most problems will involve conductors, and we do know that their surface is an equipotential surface. We also know that the electric field just outside the surface must be normal to the surface. Let’s take the example of the grounded conducting sheet once again, as depicted below. We know the image charge and the field lines on the left-hand side are not there. In fact, because the sheet is grounded, there is no net charge on it, and the conductor acts as a shield.

We do have a real field on the right-hand side though, and it’s exactly the same as that of a dipole: we only need to cross out the left-hand half of the picture. What charges are responsible for it? It surely cannot be the lone +q charge alone, and it isn’t: we also have induced local charges on the sheet. Indeed, the positive charge will attract negative charges to the surface and, hence, while the sheet as a whole is neutral (so it has no net charge), the surface charge density is not zero. We can calculate it. How? It’s quite complicated, but let’s give it a try.

Look at the detail below. Let’s forget about the induced charges for a while, and analyze the field produced by the positive charge in the absence of induced charges, so that’s the E field at point P. The magnitude of its normal component is En+ = E·cosθ, with θ the angle between the two vectors.

θ is an angle of a right triangle, and it’s easy to see that cosθ is equal to a/(a² + ρ²)^1/2. Now, Coulomb’s Law tells us that E = (1/4πε0)·q/[(a² + ρ²)^1/2]² = (1/4πε0)·q/(a² + ρ²). Hence, we can write:

En+ = (1/4πε0)·a·q/(a² + ρ²)^3/2

[A quick note on the symbols used here: we use ρ (rho) to denote a distance here. That’s somewhat confusing because it usually denotes a volume density. However, we’re interested in a surface density here, for which the σ (sigma) symbol is used. So don’t worry about it. Just note that ρ is some distance here, instead of a charge density.]

Now we know that the induced charges will arrange themselves in such a way that the addition of their field makes the field at P look like there was a negative charge of the same magnitude as q at the other side of the sheet. If there were such a charge −q, then we could do the same analysis, as shown below. It’s easy to see that the component of the imaginary field along the sheet (i.e. the component that’s perpendicular to the normal) cancels the actual component along the sheet of the field created by +q, while its normal component adds to the normal component of the +q field. To make a long story short, the actual field at P is equal to E(ρ) = (1/4πε0)·2a·q/(a² + ρ²)^3/2: it is the sum of two normal components of strength (1/4πε0)·a·q/(a² + ρ²)^3/2 each.

To put it differently, the actual field can be thought of as two parts: (1) the (normal) component of the field caused by +q, and (2) the field caused by the surface charge density at P, which we denote as σ(ρ). Let’s see what we can do with this.

The analysis of the field of a sheet of charge on a conductor is quite complicated, and not quite like the analysis of just a sheet of charge. The analysis for just a sheet of charge was based on the theoretical situation depicted below. We imagined some box with two Gaussian surfaces of area A, and we then used Gauss’ Law to deduce that, if σ was the charge per unit area (i.e. the surface density), the total flux out of the box should be equal to EA + EA = σA/ε0 and, hence, E = (1/2)·σ/ε0. The illustration below shows we should think of two fields with opposite direction, and with a magnitude of (1/2)·σ/ε0 each.

That’s simple enough. However, a sheet of charge on a conductor produces a different field, as shown below. Because of the shielding effect, we have flux on one side of the box only, and the field strength of this flux is σ/ε0, so that’s two times the (1/2)·σ/ε0 magnitude described above. However, as mentioned, it’s zero on the other side, i.e. the inside of the conductor shown below.

So what happens here? The charges in the neighborhood of a point P on the surface actually do produce a local field (Elocal), both inside and outside of the surface, which respects the Elocal = σ/2ε0 equality. However, all the rest of the charges on the conductor “conspire” to produce an additional field at the point P of the same magnitude σ/2ε0, but pointing in the same (outward) direction on both sides of the surface. Inside the conductor, that additional field cancels the local field; outside, it adds to it. So the net result is that the total field inside goes to zero, and the field outside is equal to E = σ/ε0, so E = 2·Elocal. Note that the example above assumes a positively charged conductor: if the charge on the conductor were negative, the direction of the field would be inwards, but we’d still have a field on and outside of the surface only.
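The bookkeeping in that “conspiracy” argument is worth writing out explicitly. A trivial sketch (Python; the value of σ is made up):

```python
import math

EPS0 = 8.854e-12          # permittivity of free space, F/m
sigma = 1e-6              # a made-up surface charge density, C/m^2

E_local = sigma / (2 * EPS0)   # the local patch alone: sigma/(2*eps0) on either side
E_rest = sigma / (2 * EPS0)    # the rest of the conductor adds the same magnitude,
                               # pointing outward on BOTH sides of the surface

E_outside = E_rest + E_local   # outside the conductor: the two add up
E_inside = E_rest - E_local    # inside the conductor: they cancel

assert E_inside == 0.0
assert math.isclose(E_outside, sigma / EPS0)
```

The local patch contributes σ/2ε0 pointing away from itself on either side; the rest of the conductor contributes another σ/2ε0 in the outward direction on both sides, so the inside cancels and the outside doubles.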

I know you’ve switched off already but − just in case you didn’t − what equality should we use to find σ in this case, i.e. for the grounded sheet with no net charge on it but with some (negative) surface charge density? Well… We’re talking about a surface density on a conductor and, therefore, I would think it’s E = σ/ε0, i.e. the formula for a sheet of charge on a conductor. So we write:

E = σ(ρ)/ε0 ⇔ σ(ρ) = ε0E

But what E do we take to continue our calculation? The whole field, or the (1/4πε0)·a·q/(a² + ρ²)^3/2 part only? The analysis above may make you think that we should take (1/4πε0)·a·q/(a² + ρ²)^3/2 only, so that’s the component that’s related to the imaginary charge only, but… No! We’re talking one actual field here, which is produced by the positive charge as well as by the induced charges. So we should not cut it in half for the purpose of calculating σ(ρ)! So the grand result is:

σ(ρ) = ε0E = (1/4π)·2a·q/(a² + ρ²)^3/2

This gives the magnitude: as the induced charge is negative, Feynman writes it with a minus sign.

The shape of this function should not surprise us: it’s shown below for some different values of q (1 and 2 respectively) and a (1, 2 and 3 respectively).

How do we know our solution is correct? We can check it: if we integrate σ over the whole surface, we should find that the total induced charge is equal to −q. So… Well… I’ll let you do that. Feynman also notes that the induced charges should exert a force on our point charge, which we can find by calculating the force between the surface charges and the point charge. It’s again an integral, and it should be equal to
F = (1/4πε0)·q²/(2a)²
Lo and behold! The force acting on the positive charge is exactly the same as it would be with the negative image charge instead of the plate. Why? Well… Because the fields are the same!
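If you don’t feel like doing the σ integral on paper, you can do it numerically. A sketch (Python with NumPy; the values of a and q are made up, and the substitution ρ = a·tanθ handles the infinite range):

```python
import numpy as np

a, q = 0.5, 1.0   # distance of the point charge to the sheet, and its charge (made-up units)

def sigma(rho):
    """Magnitude of the induced surface charge density, i.e. eps0*E at the surface."""
    return 2 * a * q / (4 * np.pi * (a**2 + rho**2) ** 1.5)

# Integrate sigma over the whole plane in rings of area 2*pi*rho*drho.
# Substituting rho = a*tan(theta) maps the plane to theta in [0, pi/2).
n = 100_000
h = (np.pi / 2) / n
theta = np.linspace(h / 2, np.pi / 2 - h / 2, n)    # midpoint rule
rho = a * np.tan(theta)
drho_dtheta = a / np.cos(theta) ** 2
total = np.sum(sigma(rho) * 2 * np.pi * rho * drho_dtheta * h)

# The induced charge adds up to (minus) the point charge itself
assert np.isclose(total, q)
```

The rings add up to exactly q in magnitude, i.e. the total induced charge is −q, as promised.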

The results we obtained are quite wonderful! Indeed, we said we did not know the charge distribution, and so we used a very different method to find the field: the method of images, which consists of computing the field due to q and some imaginary point charge –q somewhere else. Feynman summarizes the method of images as follows:

“The point charge we “imagine” existing behind the conducting surface is called an image charge. In books you can find long lists of solutions for hyperbolic-shaped conductors and other complicated looking things, and you wonder how anyone ever solved these terrible shapes. They were solved backwards! Someone solved a simple problem with given charges. He then saw that some equipotential surface showed up in a new shape, and he wrote a paper in which he pointed out that the field outside that particular shape can be described in a certain way.”

However, as you can see, the method is actually quite powerful, because we got a substantial bonus here: we calculated the field indeed, but then we could also calculate the charge distribution afterwards, so we got it all! Let’s see if we master the topic by looking at some other applications of the method of images.

Point charges near conducting spheres

For a grounded conducting sphere, we get the result shown below: the point charge q will induce charges on it whose fields are those of an image charge q’ = −aq/b placed at the point below.

You can check the details in Feynman’s Lecture on it, in which you will also find a more general formula for spheres that are not at zero potential. The more general formula involves a third charge q” at the center of the sphere, with charge q” = −q’ = aq/b.
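It’s a nice exercise to check that image prescription numerically: with q′ = −aq/b placed at distance a²/b from the center, the combined potential should vanish everywhere on the sphere. A sketch (Python with NumPy; the values of a, b and q are made up, and the 1/4πε0 factor is dropped since it doesn’t affect where the potential is zero):

```python
import numpy as np

a, b, q = 1.0, 3.0, 1.0       # sphere radius, distance of the charge, charge
q_img = -a * q / b            # image charge
z_img = a**2 / b              # image position on the axis, inside the sphere

def potential(point):
    """Potential (up to the 1/4*pi*eps0 factor) of q and its image."""
    p = np.asarray(point, dtype=float)
    r1 = np.linalg.norm(p - np.array([0.0, 0.0, b]))       # distance to q
    r2 = np.linalg.norm(p - np.array([0.0, 0.0, z_img]))   # distance to q'
    return q / r1 + q_img / r2

# The combined potential should vanish everywhere on the (grounded) sphere
for theta in np.linspace(0, np.pi, 50):
    surface_point = [a * np.sin(theta), 0.0, a * np.cos(theta)]
    assert abs(potential(surface_point)) < 1e-12
```

The geometric reason it works: every point on the sphere sits at distances r1 and r2 = (a/b)·r1 from the two charges, so the two 1/r terms cancel exactly.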

Again, we’ll have a force of attraction between the sphere and the point charge, even if the net charge on the sphere is zero, because it’s grounded. Indeed, the positive charge q attracts negative charges to the side closer to itself and, hence, leaves positive charges on the surface of the far side. As the attraction by the negative charges exceeds the repulsion from the positive charges, we end up with some net attraction. Feynman leaves us with an interesting challenge here:

“Those who were entertained in childhood by the baking powder box which has on its label a picture of a baking powder box which has on its label a picture of a baking powder box which has … may be interested in the following problem. Two equal spheres, one with a total charge of +Q and the other with a total charge of −Q, are placed at some distance from each other. What is the force between them? The problem can be solved with an infinite number of images. One first approximates each sphere by a charge at its center. These charges will have image charges in the other sphere. The image charges will have images, etc., etc., etc. The solution is like the picture on the box of baking powder—and it converges pretty fast.”

Well… I’ll leave it to you to take up that challenge. 🙂

Direct and indirect methods

Let me end this post by noting that I started out with that Poisson equation, but that I actually didn’t use it. Having said that, this method of images did result in some solutions for it. It is what Feynman calls an indirect method of solving some problems, and he writes the following on it:

“If the problem to be solved does not belong to the class of problems for which we can construct solutions by the indirect method, we are forced to solve the problem by a more direct method. The mathematical problem of the direct method is the solution of Laplace’s equation ∇2Φ = 0 subject to the condition that Φ is a suitable constant on certain boundaries—the surfaces of the conductors. [Note that Laplace’s equation is Poisson’s equation with a zero on the right-hand side.] Problems which involve the solution of a differential field equation subject to certain boundary conditions are called boundary-value problems. They have been the object of considerable mathematical study. In the case of conductors having complicated shapes, there are no general analytical methods. Even such a simple problem as that of a charged cylindrical metal can closed at both ends—a beer can—presents formidable mathematical difficulties. It can be solved only approximately, using numerical methods. The only general methods of solution are numerical.”

Well… That says it all, I guess. There are other indirect methods, i.e. other than the method of images, but I won’t present these here. I may write something about it in some other post, perhaps. 🙂

# The electric field in various circumstances

This post summarizes two of what may well be Feynman’s most tedious Lectures. Their title is the same: the electric field “in various circumstances.” At first, I wanted to skip them, but then I found some unifying principle: the fields involved are all quite simple. In fact, except in chapter seven, it’s only about (a) the field of a single charge and (b) the field of a so-called dipole, i.e. the field of two opposite charges next to each other. Both are depicted below, and the dipole field can actually be derived by adding the fields of the two single charges.

So… In a way, these two Lectures are just a bunch of formulas repeating the same thing over and over again. The thing to remember is that a complicated but neutral mess of charges will create a dipole field and, if the mess is not neutral as a whole, then the field of our lump of charge will look like that of a point charge, provided we look at it from a large enough distance (i.e. a distance that is large relative to the separation of the elementary charges involved). So the situation we’re looking at, is the one depicted below, which is really quite general.

Before going into the nitty-gritty, it is probably good to review one of the points I made in my previous post: the field inside of a spherical shell of charge (like the one below) is zero everywhere, i.e. for any point P inside the shell.

This has nothing to do with the phenomenon of shielding, which is a consequence of free electrons re-arranging themselves so as to cancel the field inside. If we’d be able to build the cage below from protons only, so we’d have a fixed distribution of charges, the inside would not be shielded from the external electrical field. [Credit for the animation must go to Wikipedia.]

Because of the symmetry, the field at the center of a rectangular, fixed and uniform distribution of charges would also be zero, but the exact cancellation at every interior point is special to the spherical shell: it hinges on the inverse-square law, as we’ll see in a moment. Let me quickly go over the math for the example of the spherical shell. The randomly chosen point P defines small cones extending to the surface of the sphere, with their apex at P and cutting out some surface area Δa. In the illustration above, we have two symmetrical cones defining two surfaces Δa1 and Δa2 respectively. It is easy to see that:

Δa2/Δa1 = r2²/r1²

Note that r2²/r1² is equal to (r2/r1)², and that (r2/r1)² is not equal to r2/r1: the square matters, and the square of a ratio is different from the ratio itself! In fact, it’s because of the inverse square law that the fields cancel exactly. Indeed, if the surface of the sphere is uniformly charged (which is the key assumption here), then the charge Δq on each of the area elements will be proportional to the area, so Δq2/Δq1 = Δa2/Δa1. Now, Coulomb’s Law also says that the magnitudes of the fields produced at P by these two surface elements are in the ratio of:

Huh? Yes. E2/E1 = (Δa2/Δa1)·(r1²/r2²) = (Δa2/Δa1)·(Δa1/Δa2) = 1, according to the above. So… Yes, the fields cancel exactly and, because all parts of the surface can be paired off in the same way, the total field at P is zero, indeed! But what if we’d put a charge with equal sign at the center? The shell exerts no force on it whatsoever − the field inside is zero everywhere − so the charge is in equilibrium, but it’s a neutral equilibrium only: if we displace it a little, nothing pushes it back to the center. Hence, Feynman’s statement that a charge in an electrostatic field in free space can only be in stable equilibrium if there are mechanical constraints − as illustrated below − stands. But that’s a somewhat separate story which I’ll touch upon at the end of this post. Let me get back to the dipole problem.
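The pairing-off argument can also be verified numerically: sprinkle a lot of equal charges quasi-uniformly over a sphere and sum their Coulomb fields. A sketch (Python with NumPy, in units where 1/4πε0 = 1; the Fibonacci lattice is just a convenient way to get near-uniform points, and the test points are made up):

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Roughly uniform points on a sphere (Fibonacci lattice)."""
    i = np.arange(n)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    return radius * np.column_stack([r * np.cos(golden_angle * i),
                                     r * np.sin(golden_angle * i), z])

def field_at(point, charge_positions, q_each):
    """Sum of Coulomb fields, in units where 1/(4*pi*eps0) = 1."""
    d = np.asarray(point) - charge_positions
    r = np.linalg.norm(d, axis=1)
    return np.sum(q_each * d / r[:, None] ** 3, axis=0)

shell = fibonacci_sphere(20_000)        # unit sphere carrying total charge 1
q_each = 1.0 / len(shell)

E_in_mag = np.linalg.norm(field_at([0.3, 0.2, -0.1], shell, q_each))
E_out_mag = np.linalg.norm(field_at([0.0, 0.0, 2.0], shell, q_each))

# Outside, the shell looks like a point charge: E = Q/r^2 = 1/4 at r = 2
assert abs(E_out_mag - 0.25) < 0.01
# Inside, the contributions (very nearly) cancel
ratio = E_in_mag / E_out_mag
assert ratio < 1e-2
```

Inside, the summed field is much smaller than the Q/r² field we find outside; in the limit of a truly continuous shell it vanishes exactly.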

Dipole fields

The model of a dipole is illustrated below. We have two opposite charges separated by a distance d. The so-called dipole moment is defined as p = q·d, and we also have an associated vector p, whose magnitude is p (so that’s the product of q and d) and whose direction is that of the dipole axis from −q to +q. We could also define a vector d and write p as p = q·d. Just think about it. I am sure you’ll figure it out. 🙂

Now, Feynman derives the formula for the dipole potential in various ways—first in an easy way, and then in a not-so-easy way. 🙂 The not-so-easy way is the most interesting—in this case, that is! He first notes the general formula for the potential of some point charge q at the origin at some point P = (x, y, z). You’ve seen that before: it’s Φ= q/r. [Forget about the constant of proportionality (I mean that 1/4πε0 factor in Coulomb’s Law) for a while. We can stick it back in at the end of the argument.] What it says, is that, while the field follows an inverse square law, the potential has a 1/r dependence only (so when you double the distance, you halve the potential). Now, if we’d move the charge q along the z-axis, up a distance Δz, then the potential at P will change a little, by, say ΔΦ+. How much exactly? Well, Feynman notes that “it is just the amount that the potential would change if we were to leave the charge at the origin and move P downward by the same distance Δz.” His illustration below, and the associated formula below, speak for themselves:

Now I’ll refer you to Feynman himself for the detail of the whole argument. The bottom line is that he gets the following formula for the dipole potential:

Φ = −p·∇φ0

We have a vector dot product here of that dipole vector we defined above (p) and the gradient of φ0, which is the potential of a unit point charge: φ0 = 1/(4πε0r). So what? Well… We can re-write this as:

Φ = −(1/4πε0)·p·∇(1/r)

Isn’t that great? For point charges, we have a field that’s the gradient of a potential that has a 1/r dependence, but so… Well… Here we have the potential of a dipole that’s the gradient of… Well… Just a number that has a 1/r dependence. 🙂

It explains why the dipole field E = −∇Φ varies inversely not as the square but as the cube of the distance from a dipole. I could give you the formula for E but, again, I don’t want to copy all of Feynman here and so I’ll just assume you believe me. Let me just wrap up this section with the graph of the electric field, and note how the field vector E can be analyzed as the sum of a transverse component (i.e. the component in the x-y plane) and its component along the dipole axis (i.e. the component along the z-axis).
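Since I’m asking you to believe me, here’s at least a numerical check of the 1/r² potential and 1/r³ field (Python with NumPy; the dipole on the z-axis is made up, and the units are chosen so that 1/4πε0 = 1):

```python
import numpy as np

d, q = 1e-3, 1.0          # separation and charge (made-up, units with 1/4*pi*eps0 = 1)
p = q * d                 # dipole moment, pointing along z

def phi_exact(z):
    """Potential on the axis of charges +q at z = +d/2 and -q at z = -d/2."""
    return q / abs(z - d / 2) - q / abs(z + d / 2)

def phi_dipole(z):
    """Dipole approximation on the axis: p / z^2 (cos(theta) = 1 there)."""
    return p / z**2

# Far away, the exact two-charge potential and the dipole formula agree...
assert np.isclose(phi_exact(1.0), phi_dipole(1.0), rtol=1e-5)

# ...and the axial field E = -d(phi)/dz falls off as 1/r^3: doubling the
# distance divides the field by (about) eight
h = 1e-7
E = lambda z: -(phi_dipole(z + h) - phi_dipole(z - h)) / (2 * h)
assert np.isclose(E(1.0) / E(2.0), 8.0, rtol=1e-3)
```

Doubling the distance divides the potential by four and the field by eight: that’s the 1/r² and 1/r³ behaviour.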

The dipole field of a lump of charges

The only thing that’s left is to define the p vector for a lump (or a mess, as Feynman calls it) of charges. Note that the lump should be neutral: if it is not, then it will look like a point charge from a distance. But if it is neutral, its far field will be a dipole field, and the same formula applies with p defined as p = ∑qidi. I copy the illustration above below so you can see what is what. 🙂

So… Is that it? Well… Yes. And… Well… No. All of the above assumes we know the charge distribution from the start. If we do, then my little summary above pretty much covers the whole subject. 🙂 However, we’ll often be talking about some conductor with some total charge Q, without being able to say where the charges are, exactly. All that we know is that they will be spread out on the surface in some way.

Now… Well… That’s not quite exact. We also know they will distribute themselves so that the potential of the surface is constant, and that helps us solve at least some practical problems. What problems? Well… The problem of finding the field of charged conductors, which is the second topic that Feynman deals with in his two Lectures on the field “in various circumstances.”

However, that story risks becoming as tedious as Feynman’s Lectures on it, and so I’d rather not copy him here. Just look at the following illustrations. The first one gives the field lines and equipotentials for two point charges once again. It highlights two equipotentials in particular: A and B. Now look at the second illustration: we have a curved conductor with a given potential near a point charge and – lo and behold! – the field looks the same: we replace A by the surface of our conductor and all the rest vanishes. In fact, the illustration suggests we could just put an imaginary point charge q at a suitable point and get the same field.

Now that’s what’s referred to as the method of images, and it’s illustrated in the third graph, where we have an “image charge” indeed. We see the equipotential halfway between the two charges which, in this case, is a grounded conducting sheet. Why grounded? Because the plane had zero potential in our dipole field, as it was halfway between the two charges indeed.

Got it? No?

Well… It doesn’t matter all that much. This is, indeed, the really boring stuff one just has to grind through in order to understand the next thing, which is hopefully somewhat more exciting.

Because you’re interested in physics, you probably know a thing or two about those quadrupole magnets used to focus particle beams in accelerators. They’re also referred to as lenses. The illustration below shows a quadrupole electric field, but a quadrupole magnetic field looks the same.

The point is: these lenses focus in one direction only and, hence, in an actual accelerator or cyclotron, the Q-magnets will be arranged so as to alternately focus horizontally and vertically. Why can’t we build lenses that focus charged particles in two directions simultaneously?

Well… It would require a tube built of protons, or electrons, in a stable configuration. We can’t do that. Technology just isn’t ready for it: we’re not able to build stable tubes of protons, or of electrons. 🙂 So the so-called Theorem of Earnshaw is still valid. Earnshaw’s Theorem says just that: simultaneous focusing in two directions is impossible. It applies to classical inverse-square law forces, such as the electric and gravitational force, but also to the magnetic forces created by permanent magnets.

However, the theorem is subject to constraints, and these constraints can be exploited to create very interesting exceptions, like magnetic levitation. I warmly recommend the link. 🙂

# The electric field in (and from) a conductor

This is just a quick post to answer a question of my 16-year-old son, Vincent: why are we safe in a car when lightning strikes? How does a Faraday cage really work?

He wants to become an engineer, and so I told him what I knew: the electric charges reside at the surface of a conductor and, therefore, a fully-enclosed, all-metallic vehicle is safe. One should just not touch the interior metallic areas, surely not during the strike, but also not after the strike. Why? Because there may still be some residual charge left on the vehicle, even if the metal frame should direct all lightning currents to the ground.

Through the rubber of the tyres? Yes. In fact, it’s the rubber and other insulators that explain why some residual charge might be left. Indeed, the common assumption that, somehow, it’s the rubber that protects the occupants of a car (or that, somehow, rubber soles would insulate us in an electric storm and, hence, make us less likely to get hit) is ridiculous—completely false, really! The following quote from the US National Weather Service is clear enough on that:

“While rubber is an electric insulator, it’s only effective to a certain point. The average lightning bolt carries about 30,000 amps of charge, has 100 million volts of electric potential, and is about 50,000°F. These amounts are several orders of magnitude higher than what humans use on a daily basis and can burn through any insulator—even the ceramic insulators on power lines! Besides, the lightning bolt may just have traveled many miles through the atmosphere, which is a good insulator. Half an inch (or less) of rubber will make no difference.”

So that’s what I told him—sort of. However, I felt my answer (which I tried to get across as I was driving the car, in fact) was superficial and incomplete. So…

Vincent, here’s the full answer! I promise, no integrals or complex numbers. At the same time, it will not be as easy as the physics you learned in school, because I want to teach you something new. 🙂 Just try it. What I want to explain to you is Gauss’ Law. If you manage to go through it, you’ll know all you need to know about electrostatics, and it will make your first undergrad year a lot easier. [Especially that vector equation, as I always felt my math teacher never told me what a vector really was: it’s something physical. :-)]

Forces and fields

You’ve surely seen Coulomb’s Law:

F = ke·(q1·q2)/r12²

The ke factor is Coulomb’s constant: it is just a constant of proportionality, so it’s there to make the units come out alright. Indeed, Coulomb’s formula is simple enough: it says that the force is directly proportional to the amount of charge and inversely proportional to the square of the distance. That’s all. However, the units in which we measure stuff are not necessarily compatible: we measure distance in meter, electric charge in coulomb, and force in newton. So, if we’d define the newton as the force between two charges of one coulomb separated by a distance of one meter, then we wouldn’t need that ke factor there. But the newton has another definition: one newton is the force needed to accelerate 1 kg at a rate of 1 m/s per second.

Coulomb’s constant is usually written as ke = 1/4πε0 in more serious textbooks. Why? Well… You can read my note at the end of this post, but it doesn’t matter right now. It’s much more important to try to understand the vector form of Coulomb’s Law, which is written as:
F1 = (1/4πε0)·(q1·q2/r12²)·e12
I used boldface to denote F1 and F2 because they are force vectors. Vectors are physical ‘quantities’ with a magnitude (denoted by F1 and F2, so no boldface here) and a direction. That direction is given by the unit vector e12 in the equation: it’s a unit vector (so its length is one) pointing from q2 to q1. Read again: from q2 to q1, not from q1 to q2. It’s important to get this one thing right, otherwise you’ll make a mess of the signs. Indeed, in the example below, q1 and q2 have the same sign (+), but their signs may also differ (so we’d have a plus and a minus), and the formula above should still work. Check it yourself by doing the drawing for opposite charges.

In fact, my drawing above has a small mistake: F2 has the same magnitude as F1, but I forgot to put the minus sign: the force on q2 is F2 = −F1. It’s the action = reaction principle, really.
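Here’s a minimal sketch of the vector form (Python with NumPy; the charges and positions are made up), which also checks the action = reaction point:

```python
import numpy as np

KE = 8.9875e9  # Coulomb's constant, N*m^2/C^2

def coulomb_force(q1, r1, q2, r2):
    """Force ON q1 due to q2: F1 = ke*q1*q2/r12^2 * e12, with e12 pointing
    from q2 to q1 (that sign convention is what makes like charges repel)."""
    r12 = np.asarray(r1, dtype=float) - np.asarray(r2, dtype=float)
    dist = np.linalg.norm(r12)
    e12 = r12 / dist
    return KE * q1 * q2 / dist**2 * e12

q1, r1 = 1e-6, [1.0, 0.0, 0.0]
q2, r2 = 1e-6, [0.0, 0.0, 0.0]

F1 = coulomb_force(q1, r1, q2, r2)
F2 = coulomb_force(q2, r2, q1, r1)

assert F1[0] > 0              # like charges: q1 is pushed away from q2
assert np.allclose(F1, -F2)   # action = reaction
```

Swap the arguments and you get the force on q2: same magnitude, opposite direction. Try it with opposite charges: e12 doesn’t change, but the product q1·q2 flips the sign, so the force becomes attractive.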

OK. That’s clear. Now you need to learn about the concept of a field: the field is the force per unit charge. So the field at q1, or the field at point (1), is the force on q1 divided by q1. For example, if q1 is three coulomb, we divide by three. More in general, we write:
E(1) = F1/q1
So now you know what the field vector E stands for: it is the force on a unit charge we would place in the field. To be clear, a unit charge is +1 unit. We can measure it in coulomb, or the proton charge, or the charge of a quark, or in whatever unit we want, but we’ve been using coulomb so far, so let’s stick to that. Just in case you wonder: one coulomb is the charge of approximately 6.241×10¹⁸ protons, so… Yes. That’s quite a lot. 🙂

OK. Next thing.

Gauss’ Law

The field is real. We don’t have to put any charge there. The field is there, and it has energy. [There’s a formula for the energy, but I won’t bother you with that here, because we don’t need it.] The magnitude of the electric field, i.e. the field strength E = |E|, is measured in newton (N) per coulomb (C), so in N/C. In physics, we’ll multiply the field strength with a surface area so we get the so-called flux of the field, which is measured in (N/C)·m². The illustration below (which I took from Feynman’s Lectures) is just as good as any. In fact, we have several surfaces here: we have a closed surface S with several faces, including surface a and b, which are spherical surfaces. The other surfaces of this box are so-called radial faces. The E field coming out of the charge is like a flow, and so the flow going through face a is the same as the flow going through face b: the face is larger, but the field strength is less.

It is easy to show that the net flux is zero: Coulomb’s Law tells us that the magnitude of E decreases as 1/r² while, from our geometry classes, we know that the surface area increases as r², so their product is the same. So, if the surface area of a is Δa, and the surface area of b is Δb, then Ea·Δa = Eb·Δb and so the net flux through the box is equal to Eb·Δb − Ea·Δa = 0. So the flux of E into face a is just cancelled by the flux out of face b. Needless to say, there is no flux through the radial surfaces. Why? Because the electric force is a radial force.

OK. Let’s look at a more complicated situation:

When calculating the flux through a surface, we need to take the component of E that is normal to the surface, so that’s En = E·n = |E|·|n|·cosθ = |E|·cosθ. I am sure you’ve seen that much in your math classes: n is the so-called normal vector, so its length is one and it’s perpendicular to the surface. In any case, the point is: the net flux through this closed surface will still be zero.

Now it’s time for the Big Move. Look at the volume enclosed by the surface S below: we can think of it as completely made up of infinitesimal truncated cones and, for each of these cones, the flux of E from one end of each conical segment will be equal and opposite to the flux from the other end. So the total net flux from the surface S is still zero!

So we have a very general result here:

The (net) flux out of a volume that has no charge(s) in it is zero, always!

You’ll say: so what? Well… It’s a most remarkable result, really. First, it’s not what you’d expect intuitively, and, second, we can now use a clever trick to calculate the flux out of a volume that has some charge(s) in it. Let’s be clever about it. Look at the surface S below: it’s got a point charge q in it. Now we imagine another surface S’ around it: we imagine a little sphere centered on the charge.

From Coulomb’s Law, we know that, if the radius of our little sphere is equal to r, then the field strength E, everywhere on its surface, is equal to:
E = (1/4πε0)·q/r²
From your geometry class, you also know that the surface of a sphere is equal to 4πr2, so the flux from the surface of our little sphere is just the product of the field and the surface, so we write:
Flux = E·4πr² = [(1/4πε0)·q/r²]·4πr² = q/ε0
Now, the nice thing is that we can generalize this result for many charges, or for charge distributions, because we can simply add the fields for each of them: E = E1 + E2 + … That gives us Gauss’ Law:

The flux from any closed surface S = Qinside/ε0

Qinside is, obviously, the sum of the charges inside the volume enclosed by the surface.
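As a quick numerical sanity check (a sketch, with an arbitrary 1 nC charge), the flux E·4πr² out of a sphere around a point charge comes out as q/ε0 whatever radius we pick:

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)
q = 1e-9          # enclosed point charge (1 nC, arbitrary)

fluxes = []
for r in (0.1, 1.0, 10.0):
    E = q / (4 * math.pi * EPS0 * r**2)    # Coulomb field at radius r
    fluxes.append(E * 4 * math.pi * r**2)  # field times sphere area

print(fluxes)  # all three values equal q/EPS0, independent of r
```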

OK. That’s Gauss’ Law. Let’s go back to our car. 🙂

The field in (and from) a conductor

An electrical conductor is a solid that contains many free electrons. The free electrons can move around inside the conductor, but they cannot leave the surface. When we charge a conductor, the electrons will move around until they have arranged themselves to produce a zero electric field everywhere inside the conductor. It’s the corollary of Gauss’ Law: the (net) flux out of a volume that has no charge(s) in it is zero, always! And so the electrons will arrange themselves in order to make sure that happens.

Think about the dynamics of the situation: as long as there’s some field inside, the charges will keep moving. Fortunately (especially if you’re in a car or a plane hit by lightning!), the re-arrangement happens in a fraction of a second. Hence, if we have some kind of shell, then the field everywhere inside of the shell will be zero, always. In addition, when we charge a conductor, the electrons will push each other away and try to spread as much as possible, so they will reside at the surface of the conductor. In fact, the excess charge of any conductor is, on the average, within one or two atomic layers of the surface only. The situation is illustrated below:

Let me sum up the main conclusions:

1. The electric field inside the conductor (E1) is zero. In other words, if a cavity is completely enclosed by a conductor, no distribution of charges outside can ever produce any field inside. And no field means no force, so that’s how the shielding really works!
2. The electric field just outside the surface of a conductor (E2) is normal to the surface. There can be no tangential component. If there were a tangential component, the electrons would move along the surface until it was gone.

To be fully complete, the formula for the field just outside the surface of the conductor is E = σ/ε0, where σ is the local surface charge density. That local surface charge density can be quite high, of course, especially when lightning is involved—but it works! You’re safe in a car!
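To get a feel for the magnitudes, here is a little back-of-the-envelope calculation (the surface charge density is an arbitrary illustrative value, not a measured one):

```python
EPS0 = 8.854e-12  # vacuum permittivity (F/m)
sigma = 1e-6      # local surface charge density (C/m²), illustrative only

E_outside = sigma / EPS0  # field just outside the conductor: E = σ/ε0
print(E_outside)          # ≈ 1.13e5 V/m for this sigma
```

Even this modest charge density gives a field of roughly a hundred thousand volts per meter just outside the surface.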

There’s one more point. You may think that you’ve seen that E = σ/ε0 formula before, because the field from a charged sheet is also easy to calculate from Gauss’ Law. Indeed, if we look at some imaginary rectangular box that cuts through the sheet, as shown below (it’s referred to as a Gaussian surface), then the total flux is, once again, the field times the area—but note that the flux now leaves through both faces of the box. Now, if the surface charge density (so the charge per unit area) is σ, then the total charge enclosed in the box is σA. So the total flux must be equal to 2E·A = σA/ε0, from which we get: E = σ/2ε0, on each side of the sheet. So we have a field left and right, but each is only half of the E = σ/ε0 field we have just outside of our conductor. So how does it work really?

We only have a field outside the conductor – and, hence, no field inside – because ‘other charges’ come into play. The charges in the immediate neighborhood of a point P on the surface do produce a local field of σ/2ε0, both inside and outside of the surface. However, all the other charges on the conductor arrange themselves in such a way that they produce an additional field of σ/2ε0 near P, which adds to the local field outside but cancels it inside. The mechanics behind this are similar to the mechanics behind the polarization phenomenon. However, it’s quite complicated and, to analyze it properly, we’d need to analyze the electric properties of matter in more detail, which we won’t do here.

So… When everything is said and done, the phenomenon of ‘shielding’ is extremely complex indeed: it’s all about charges arranging themselves in patterns, and the result is truly remarkable: the fields on the two sides of a closed conducting shell are completely independent—zero on the inside, and E = σ/ε0 on the outside, with σ the local surface charge density. And it also works the other way around: if we’d have some distribution of charges inside of a closed conductor, those charges would not produce any field outside. So shielding works both ways!

Some closing remarks

A car is not a sphere. Some surfaces may have points or sharp ends, like the object sketched below. Again, the charges will try to spread out as much as possible on the surface, and the tip of a sharp point is as far away as it is possible from most of the surface. Therefore, we should expect the surface density to be very high there. Now, a high charge density means a high field just outside. In fact, if the electric field is too great, air will break down, so we get a discharge. As Feynman explains it:

“Air will break down if the electric field is too great. What happens is that a loose charge (electron, or ion) somewhere in the air is accelerated by the field, and if the field is very great, the charge can pick up enough speed before it hits another atom to be able to knock an electron off that atom. As a result, more and more ions are produced. Their motion constitutes a discharge, or spark. If you want to charge an object to a high potential and not have it discharge itself by sparks in the air, you must be sure that the surface is smooth, so that there is no place where the field is abnormally large.”

It explains why lightning is attracted to pointy objects, so you should stay away from them.

What about planes and lightning? Well… There’s a nice article on that on the Scientific American website. Let me quote a paragraph that sort of sums up what actually happens:

“Although passengers and crew may see a flash and hear a loud noise if lightning strikes their plane, nothing serious should happen because of the careful lightning protection engineered into the aircraft and its sensitive components. Initially, the lightning will attach to an extremity such as the nose or wing tip. The airplane then flies through the lightning flash, which reattaches itself to the fuselage at other locations while the airplane is in the electric “circuit” between the cloud regions of opposite polarity. The current will travel through the conductive exterior skin and structures of the aircraft and exit off some other extremity, such as the tail. Pilots occasionally report temporary flickering of lights or short-lived interference with instruments.”

One more thing perhaps: isn’t it incredible that, even when lightning goes through a car or a plane, it’s only the surface that’s being affected? I mean… It’s fairly easy to see the equilibrium situation, which has the charges on the surface only. But what about the dynamics indeed? 30,000 amps, 100 million volts, and 25,000 to 30,000 degrees Celsius… As lightning strikes, that must go everywhere, no? Well… Yes and no. If there are pointy objects, lightning will effectively burn through them. For an example of the damage lightning can do to the nose of an airplane, click this link. 🙂 But then… Well… Let me copy Feynman as he introduces the electric force:

“Consider a force like gravitation which varies predominantly inversely as the square of the distance, but which is about a billion-billion-billion-billion times stronger. And with another difference. There are two kinds of “matter,” which we can call positive and negative. Like kinds repel and unlike kinds attract—unlike gravity where there is only attraction. What would happen? A bunch of positives would repel with an enormous force and spread out in all directions. A bunch of negatives would do the same.”

So that’s what happens. The charges spread out, in a fraction of a second, away from each other, and so they stay on the surface only, because that’s as far away from each other as they can get. As mentioned above, we’re talking atomic or molecular layers really, so they don’t penetrate, despite the incredible charges and voltages involved. Let me continue the quote—just to illustrate the strength of the forces involved:

“But an evenly mixed bunch of positives and negatives would do something completely different. The opposite pieces would be pulled together by the enormous attractions. The net result would be that the terrific forces would balance themselves out almost perfectly, by forming tight, fine mixtures of the positive and the negative, and between two separate bunches of such mixtures there would be practically no attraction or repulsion at all. […] There is such a force: the electrical force. And all matter is a mixture of positive protons and negative electrons which are attracting and repelling with this great force. So perfect is the balance, however, that when you stand near someone else you don’t feel any force at all. If there were even a little bit of unbalance you would know it. If you were standing at arm’s length from someone and each of you had one percent more electrons than protons, the repelling force would be incredible. How great? Enough to lift the Empire State Building? No! To lift Mount Everest? No! The repulsion would be enough to lift a “weight” equal to that of the entire earth!”

So… Well… That’s it. I’ll close this post with the promised note on Coulomb’s constant and the electric constant, but it’s just an addendum, so you don’t have to read it if you don’t feel like it, Vincent. 🙂

Addendum: Coulomb’s constant and the electric constant

The ke = 1/4πε0 factor in Coulomb’s Law is just a constant of proportionality. Coulomb’s formula is simple enough – it says that the force is directly proportional to the amount of charge and inversely proportional to the square of the distance – but it would be a miracle if the units came out alright, wouldn’t it? Indeed, we measure distance in meters, charge in coulombs, and force in newtons. Now, we could re-define one of those units so as to get rid of the 1/4πε0 factor, but so that’s not what we’re going to do. Why not? First, the constant of proportionality depends on the medium. Indeed, ε0 is the so-called permittivity of the vacuum, so that’s in empty space. The constant of proportionality will be different in a gas, and it will be different for different gases, at different temperatures, and at different pressures. You can check it online if you want – just click the link here for some examples – but I guess you’ll believe me. So, if we write 1/4πε instead of ke, then we can put in a different ε for each medium and our formula is still OK.

Now, because you’re a smart kid, you’ll say that doesn’t quite answer the question: why do we write it as 1/4πε? Why don’t we simply write μ instead of 1/4πε, or just k or a or something? Well… There is an answer to that, but it’s complicated. First, the μ and μ0 symbols are already used for something else: it’s something similar to ε and ε0, but for magnetic fields. To be precise, μ0 is referred to as the permeability of the vacuum (and μ is just the permeability of some non-vacuum medium, of course). Now, because electricity and magnetism are part of one and the same phenomenon in Nature (when you’re going for engineer, you’ll get one course on electromagnetism, not two separate ones), ε0 and μ0 are related. In fact, they’re related through a marvelous formula—one like E = mc² in physics or, in math, e^(iπ) + 1 = 0. Don’t try to understand it. Just look at it:

c²ε0μ0 = (cε0)(cμ0) = 1

Amazing, isn’t it? The c here is the speed of light in a vacuum, obviously. So it’s a physical constant. In other words, unlike ε0 or μ0, it’s got nothing to do with proportionality or units: the speed of light is the speed of light no matter what units we use—meters or light-seconds or whatever. OK. Just swallow this and don’t pay too much attention. It’s just a digression, but let me finish it.

The equivalent of Coulomb’s Law in magnetism is Ampère’s Law, and it involves the circulation of a field, as illustrated below. So that’s why Ampère’s Law involves a 2π factor.

In fact, because we’re talking two wires (or two conductors) with currents going through them (I1 and I2 respectively), the proportionality constant in Ampère’s Law is written as 2kA.

Now, I won’t go too much into the detail but the thing about the circulation and that factor 2 in Ampère’s Law results in μ0 being written as μ0 = 4π×10⁻⁷ N/A². As for the units: N is newton and A is ampere obviously. And so that’s why we have the 4π in the proportionality constant for Coulomb’s Law as well. And, of course, the (cε0)(cμ0) = 1 equation makes it obvious that cε0 and cμ0 are reciprocal numbers, so that’s why we write 1/4πε0 for the proportionality constant in Coulomb’s Law, rather than k or a or whatever other simple thing. […] Well… Sort of. In any case, nothing to worry about. 🙂
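The c²ε0μ0 = 1 relation is easy to verify with the actual values of the constants. A quick sketch (μ0 is taken at its classical defined value of 4π×10⁻⁷ N/A², which is still accurate to about one part in a billion in the post-2019 SI):

```python
import math

c = 299792458.0          # speed of light in vacuum (m/s, exact)
mu0 = 4e-7 * math.pi     # permeability of the vacuum (N/A²)
eps0 = 8.854187817e-12   # permittivity of the vacuum (F/m)

product = c**2 * eps0 * mu0
print(product)  # ≈ 1, confirming c²·ε0·μ0 = 1

ke = 1 / (4 * math.pi * eps0)
print(ke)       # ≈ 8.99e9 N·m²/C², Coulomb's constant
```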

# The Uncertainty Principle and the stability of atoms

The Model of the Atom

In one of my posts, I explained the quantum-mechanical model of an atom. Feynman sums it up as follows:

“The electrostatic forces pull the electron as close to the nucleus as possible, but the electron is compelled to stay spread out in space over a distance given by the Uncertainty Principle. If it were confined in too small a space, it would have a great uncertainty in momentum. But that means it would have a high expected energy—which it would use to escape from the electrical attraction. The net result is an electrical equilibrium not too different from the idea of Thomson—only it is the negative charge that is spread out, because the mass of the electron is so much smaller than the mass of the proton.”

This explanation is a bit sloppy, so we should add the following clarification: “The wave function Ψ(r) for an electron in an atom does not describe a smeared-out electron with a smooth charge density. The electron is either here, or there, or somewhere else, but wherever it is, it is a point charge.” (Feynman’s Lectures, Vol. III, p. 21-6)

The two quotes are not incompatible: it is just a matter of defining what we really mean by ‘spread out’. Feynman’s calculation of the Bohr radius of an atom in his introduction to quantum mechanics clears all confusion in this regard:

It is a nice argument. One may criticize that he gets the right result because he puts the right things in – such as the values of e and m, for example 🙂 − but it’s nice nevertheless!

Mass as a Scale Factor for Uncertainty

Having complimented Feynman, I should add that the calculation above does raise an obvious question: why is it that we cannot confine the electron in “too small a space”, while we can do so for the nucleus (which is just one proton in the example of the hydrogen atom here)? Feynman gives the answer above: because the mass of the electron is so much smaller than the mass of the proton.

Huh? What’s the mass got to do with it? The uncertainty is the same for protons and electrons, isn’t it?

Well… It is, and it isn’t. 🙂 The Uncertainty Principle – usually written in its more accurate σxσp ≥ ħ/2 expression – applies to both the electron and the proton – of course! – but the momentum is the product of mass and velocity (p = m·v), and so it’s the proton’s mass that makes the difference here. To be specific, the mass of a proton is about 1836 times that of an electron. Now, as long as the velocities involved are non-relativistic—and they are non-relativistic in this case: the (relative) speed of electrons in atoms is given by the fine-structure constant α = v/c ≈ 0.0073, so the Lorentz factor is very close to 1—we can treat the m in the p = m·v identity as a constant and, hence, we can also write: Δp = Δ(m·v) = m·Δv. So all of the uncertainty of the momentum goes into the uncertainty of the velocity. Hence, the mass acts like a reverse scale factor for the uncertainty. To appreciate what that means, let me write ΔxΔp = ħ as:

ΔxΔv = ħ/m

It is an interesting point, so let me expand the argument somewhat. We actually use a more general mathematical property of the standard deviation here: the standard deviation of a variable scales directly with the scale of the variable. Hence, we can write: σ(k·x) = k·σ(x), with k > 0. So the uncertainty is, indeed, smaller for larger masses. Larger masses are associated with smaller uncertainties in their position x. To be precise, the uncertainty is inversely proportional to the mass and, hence, the mass number effectively acts like a reverse scale factor for the uncertainty.
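A quick numerical illustration of that reverse scale factor (a sketch, using CODATA-style values for the constants): the Δx·Δv bound is about 1836 times tighter for a proton than for an electron.

```python
hbar = 1.054571817e-34  # reduced Planck constant (J·s)
m_e = 9.1093837e-31     # electron mass (kg)
m_p = 1.67262192e-27    # proton mass (kg)

bound_e = hbar / m_e    # Δx·Δv = ħ/m for the electron (m²/s)
bound_p = hbar / m_p    # ... and for the proton

print(bound_e)            # ≈ 1.16e-4 m²/s
print(bound_p)            # ≈ 6.3e-8 m²/s
print(bound_e / bound_p)  # ≈ 1836: the proton-to-electron mass ratio
```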

Of course, you’ll say that the uncertainty still applies to both factors on the left-hand side of the equation, and so you’ll wonder: why can’t we keep Δx the same and multiply Δv with m, so that their product yields ħ again? In other words, why can’t we have an uncertainty in velocity for the proton that is 1836 times larger than the uncertainty in velocity for the electron? The answer to that question should be obvious: the uncertainty should not be greater than the expected value. When everything is said and done, we’re talking a distribution of some variable here (the velocity variable, to be precise) and, hence, that distribution is likely to be the Maxwell-Boltzmann distribution we introduced in previous posts. Its formula and graph are given below:

In statistics (and in probability theory), they call this a chi distribution with three degrees of freedom and a scale parameter which is equal to a = (kT/m)^(1/2). The formula for the scale parameter shows how the mass of a particle indeed acts as a reverse scale parameter. The graph above shows three curves, for a = 1, 2 and 5 respectively. Note the square root though: quadrupling the mass (keeping kT the same) amounts to going from a = 2 to a = 1, so that’s halving a. Indeed, [kT/(4m)]^(1/2) = (1/2)·(kT/m)^(1/2). So we can’t just do what we want with Δv (like multiplying it with 1836, as suggested). In fact, the graph and the formulas show that Feynman’s assumption that we can equate p with Δp (i.e. his assumption that “the momenta must be of the order p = ħ/Δx, with Δx the spread in position”), more or less at least, is quite reasonable.
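The three characteristic speeds of this distribution, and how they all scale with the parameter a = (kT/m)^(1/2), can be checked in a few lines (a sketch; a is set to 1 in arbitrary units):

```python
import math

a = 1.0  # scale parameter (kT/m)**0.5, arbitrary units

v_most_probable = math.sqrt(2) * a        # ≈ 1.414·a
v_rms = math.sqrt(3) * a                  # root mean square speed ≈ 1.732·a
v_mean = 2 * math.sqrt(2 / math.pi) * a   # mean (expected) speed ≈ 1.596·a

# Quadrupling the mass (same kT) halves a, and hence halves all three speeds:
a_heavy = math.sqrt(1 / 4) * a
print(v_most_probable, v_rms, v_mean, a_heavy)
```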

Of course, you are very smart and so you’ll have yet another objection: why can’t we associate a much higher momentum with the proton, as that would allow us to associate higher velocities with the proton? Good question. My answer to that is the following (and it might be original, as I didn’t find this anywhere else). When everything is said and done, we’re talking two particles in some box here: an electron and a proton. Hence, we should assume that the average kinetic energy of our electron and our proton is the same (if not, they would be exchanging kinetic energy until it’s more or less equal), so we write <me·ve²/2> = <mp·vp²/2>. We can re-write this as me/mp = 1/1836 = <vp²>/<ve²> and, therefore, <ve²> = 1836·<vp²>. Now, <v²> ≠ <v>² and, hence, <v> ≠ √<v²>. So the equality does not imply that the expected velocity of the electron is √1836 ≈ 43 times the expected velocity of the proton. Indeed, because of the particularities of the distribution, there is a difference between (a) the most probable speed, which is equal to √2·a ≈ 1.414·a, (b) the root mean square speed, which is equal to √<v²> = √3·a ≈ 1.732·a, and, finally, (c) the mean or expected speed, which is equal to <v> = 2·(2/π)^(1/2)·a ≈ 1.596·a.

However, we are not far off. We could use any of these three values to roughly approximate Δv, as well as the scale parameter a itself: our answers would all be of the same order. However, to keep the calculations simple, let’s use the most probable speed. Let’s equate our electron mass with unity, so the mass of our proton is 1836. Now, such mass implies a scale factor (i.e. a) that’s √1836 ≈ 43 times smaller. So the most probable speed of the proton and, therefore, its spread, would be about √2/√1836 = √(2/1836) ≈ 0.033 that of the electron, so we write: Δvp ≈ 0.033·Δve. Now we can insert this in our ΔxΔv = ħ/m = ħ/1836 identity. We get: ΔxpΔvp = Δxp·√(2/1836)·Δve = ħ/1836. That, in turn, implies that √(2·1836)·Δxp = ħ/Δve, which we can re-write as: Δxp = Δxe/√(2·1836) ≈ Δxe/60. In other words, the expected spread in the position of the proton is about 60 times smaller than the expected spread of the electron. More in general, we can say that the spread in position of a particle, keeping all else equal, is inversely proportional to (2m)^(1/2). Indeed, in this case, we multiplied the mass with about 1800, and we found that the uncertainty in position went down with a factor 1/60 = 1/√3600. Not bad as a result! Is it precise? Well… It could be like √3·√m or 2·(2/π)^(1/2)·√m, depending on our definition of ‘uncertainty’, but it’s all of the same order. So… Yes. Not bad at all… 🙂
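The little calculation above can be replayed numerically (a sketch; the mass ratio is rounded to 1836, and all uncertainties are expressed relative to the electron’s):

```python
import math

mass_ratio = 1836                     # m_p / m_e (rounded)

dv_ratio = math.sqrt(2 / mass_ratio)  # Δv_p / Δv_e ≈ 0.033
# From Δx_p·Δv_p = ħ/1836 and Δx_e·Δv_e = ħ (electron mass set to 1):
dx_ratio = 1 / (mass_ratio * dv_ratio)  # Δx_p / Δx_e = 1/√(2·1836)

print(dv_ratio)  # ≈ 0.033
print(dx_ratio)  # ≈ 0.0165, i.e. about 1/60
```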

You’ll raise a third objection now: the radius of a proton is measured on the femtometer scale, so that’s expressed in 10⁻¹⁵ m, which is not 60 but a million times smaller than the nanometer (i.e. 10⁻⁹ m) scale used to express the Bohr radius as calculated by Feynman above. You’re right, but the 10⁻¹⁵ m number is the charge radius, not the uncertainty in position. Indeed, the so-called classical electron radius is also measured in femtometer and, hence, the Bohr radius – which is 1/α² ≈ 18,800 times the classical electron radius – is also tens of thousands of times that number. OK. That should settle the matter. I need to move on.

Before I do move on, let me relate the observation (i.e. the fact that the uncertainty in regard to position decreases as the mass of a particle increases) to another phenomenon. As you know, the interference of light beams is easy to observe. Hence, the interference of photons is easy to observe: Young’s experiment involved a slit of 0.85 mm (so almost 1 mm) only. In contrast, the 2012 double-slit experiment with electrons involved slits that were 62 nanometer wide, i.e. 62 billionths of a meter! That’s because the associated wavelengths are so much smaller. So much, in fact, that Feynman could not imagine technology would ever be sufficiently advanced so as to actually carry out the double-slit experiment with electrons. It’s an aspect of the same: the uncertainty in position is much smaller for electrons than it is for photons. Who knows: perhaps one day, we’ll be able to do the experiment with protons. 🙂 For further detail, I’ll refer you to one of my posts on this.

What’s Explained, and What’s Left Unexplained?

There is another obvious question: if the electron is still some point charge, and going around as it does, why doesn’t it radiate energy? Indeed, the Rutherford-Bohr model had to be discarded because this ‘planetary’ model involved circular (or elliptical) motion and, therefore, some acceleration. According to classical theory, the electron should thus emit electromagnetic radiation, as a result of which it would radiate its kinetic energy away and, therefore, spiral in toward the nucleus. The quantum-mechanical model doesn’t explain this either, does it?

I can’t answer this question as yet, as I still need to go through all of Feynman’s Lectures on quantum mechanics. You’re right. There’s something odd about the quantum-mechanical idea: it still involves an electron moving in some kind of orbital − although I hasten to add that the wavefunction is a complex-valued function, not some real function − but it does not involve any loss of kinetic energy due to circular motion, apparently!

There are other unexplained questions as well. For example, the idea of an electrical point charge still needs to be reconciled with the mathematical inconsistencies it implies, as Feynman points out himself in yet another of his Lectures.

Finally, you’ll wonder as to the difference between a proton and a positron: if a positron and an electron annihilate each other in a flash, why do we have a hydrogen atom at all? Well… The proton is not the electron’s anti-particle. For starters, it’s made of quarks, while the positron is made of… Well… A positron is a positron: it’s elementary. But, yes, interesting question, and the ‘mechanics’ behind the mutual destruction are quite interesting and, hence, surely worth looking into—but not here. 🙂

Having mentioned a few things that remain unexplained, the model does have the advantage of solving plenty of other questions. It explains, for example, in what sense the electron and the proton are actually right on top of each other, as they should be according to classical electrostatic theory, and in what sense they are not: the electron is still a sort of ‘cloud’ indeed, with the proton at its center.

The quantum-mechanical ‘cloud’ model of the electron also explains why “the terrific electrical forces balance themselves out, almost perfectly, by forming tight, fine mixtures of the positive and the negative, so there is almost no attraction or repulsion at all between two separate bunches of such mixtures” (Richard Feynman, Introduction to Electromagnetism, p. 1-1) or, to quote from one of his other writings, why we do not fall through the floor as we walk:

“As we walk, our shoes with their masses of atoms push against the floor with its mass of atoms. In order to squash the atoms closer together, the electrons would be confined to a smaller space and, by the uncertainty principle, their momenta would have to be higher on the average, and that means high energy; the resistance to atomic compression is a quantum-mechanical effect and not a classical effect. Classically, we would expect that if we were to draw all the electrons and protons closer together, the energy would be reduced still further, and the best arrangement of positive and negative charges in classical physics is all on top of each other. This was well known in classical physics and was a puzzle because of the existence of the atom. Of course, the early scientists invented some ways out of the trouble—but never mind, we have the right way out, now!”

So that’s it, then. Except… Well…

The Fine-Structure Constant

When talking about the stability of atoms, one cannot escape a short discussion of the so-called fine-structure constant, denoted by α (alpha). I discussed it another post of mine, so I’ll refer you there for a more comprehensive overview. I’ll just remind you of the basics:

(1) α is the square of the electron charge expressed in Planck units: α = eP².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re/r). You’ll see this more often written as re = α²r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 3×10⁸ m/s, so that’s the speed of light c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) Finally, α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. [I think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d really suggest you stop trying to study physics.]

Note that, from (2) and (4), we also find that:

(5) The electron mass (in Planck units) is equal to me = α/re = α/(α²r) = 1/(αr). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to me = α/re = eP²/re. Using the Bohr radius, we get me = 1/(αr) = 1/(eP²·r).
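The radius identity (2) is easy to check with the measured values of the constants. A quick sketch (CODATA-style values; small rounding differences are expected):

```python
import math

alpha = 7.2973525693e-3     # fine-structure constant (dimensionless)
r_bohr = 5.29177210903e-11  # Bohr radius (m)
r_e = 2.8179403262e-15      # classical electron radius (m)

# (2) alpha is the square root of r_e / r_bohr — a unit-independent ratio:
alpha_from_radii = math.sqrt(r_e / r_bohr)
print(alpha_from_radii)   # ≈ 0.0072973..., matching alpha

# ...equivalently, r_e = alpha² · r_bohr:
print(alpha**2 * r_bohr)  # ≈ 2.818e-15 m, matching r_e
```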

In addition, in the mentioned post, I also related α to the so-called coupling constant determining the strength of the interaction between electrons and photons. So… What a magical number indeed ! It suggests some unity that our little model of the atom above doesn’t quite capture. As far as I am concerned, it’s one of the many other ‘unexplained questions’, and one of my key objectives, as I struggle through Feynman’s Lectures, is to understand it all. 🙂 One of the issues is, of course, how to relate this coupling constant to the concept of a gauge, which I briefly discussed in my previous post. In short, I’ve still got a long way to go… 😦

Post Scriptum: The de Broglie relations and the Uncertainty Principle

My little exposé on mass being nothing but a scale factor in the Uncertainty Principle is a good occasion to reflect on the Uncertainty Principle once more. Indeed, what’s the uncertainty about, if it’s not about the mass? It’s about the position in space and the velocity, i.e. about movement and time. Velocity or speed (i.e. the magnitude of the velocity vector) is, in turn, defined as the distance traveled divided by the time of travel, so the uncertainty is about time as well, as evidenced from the ΔEΔt = h expression of the Uncertainty Principle. But how does it work exactly?

Hmm… Not sure. Let me try to remember the context. We know that the de Broglie relation, λ = h/p, which associates a wavelength (λ) with the momentum (p) of a particle, is somewhat misleading, because we’re actually associating a (possibly infinite) bunch of component waves with a particle. So we’re talking some range of wavelengths (Δλ) and, hence, assuming all these component waves travel at the same speed, we’re also talking a frequency range (Δf). The bottom line is that we’ve got a wave packet and we need to distinguish the velocity of its phase (vp) versus the group velocity (vg), which corresponds to the classical velocity of our particle.

I think I explained that pretty well in one of my previous posts on the Uncertainty Principle, so I’d suggest you have a look there. The mentioned post explains how the Uncertainty Principle relates position (x) and momentum (p) as a Fourier pair, and it also explains that general mathematical property of Fourier pairs: the more ‘concentrated’ one distribution is, the more ‘spread out’ its Fourier transform will be. In other words, it is not possible to arbitrarily ‘concentrate’ both distributions, i.e. both the distribution of x (which I denoted as Ψ(x)) as well as its Fourier transform, i.e. the distribution of p (which I denoted by Φ(p)). So, if we’d ‘squeeze’ Ψ(x), then its Fourier transform Φ(p) will ‘stretch out’.

That was clear enough—I hope! But how do we go from ΔxΔp = h to ΔEΔt = h? Why are energy and time another Fourier pair? To answer that question, we need to clearly define what energy and what time we are talking about. The argument revolves around the second de Broglie relation: E = h·f. How do we go from the momentum p to the energy E? And how do we go from the wavelength λ to the frequency f?

The answer to the first question is the energy-mass equivalence: E = mc², always. This formula is relativistic, as m is the relativistic mass, so it includes the rest mass m0 as well as the equivalent mass of its kinetic energy m0v²/2 + … [Note, indeed, that the kinetic energy – defined as the excess energy over its rest energy – is a rapidly converging series of terms, so only the m0v²/2 term is mentioned.] Likewise, momentum is defined as p = mv, always, with m the relativistic mass, i.e. m = (1−v²/c²)^(−1/2)·m0 = γ·m0, with γ the Lorentz factor. The E = mc² and p = mv relations combined give us the E/c = m·c = p·c/v or E·v/c = p·c relationship, which we can also write as E/p = c²/v. However, we’ll need to write E as a function of p for the purpose of a derivation. You can verify that E² − p²c² = m0²c⁴ and, hence, that E = (p²c² + m0²c⁴)^(1/2).

Now, to go from a wavelength to a frequency, we need the wave velocity, and we’re obviously talking the phase velocity here, so we write: vp = λ·f. That’s where the de Broglie hypothesis comes in: de Broglie just assumed the Planck-Einstein relation E = h·ν, in which ν is the frequency of a massless photon, would also be valid for massive particles, so he wrote: E = h·f. It’s just a hypothesis, of course, but it makes everything come out alright. More in particular, the phase velocity vp = λ·f can now be re-written, using both de Broglie relations (i.e. λ = h/p and f = E/h), as vp = (h/p)·(E/h) = E/p = c²/v. Now, because v is always smaller than c for massive particles (and usually very much smaller), we’re talking a superluminal phase velocity here! However, because it doesn’t carry any signal, it’s not inconsistent with relativity theory.

Now what about the group velocity? To calculate the group velocity, we need the frequencies and wavelengths of the component waves. The dispersion relation assumes the frequency of each component wave can be expressed as a function of its wavelength, so f = f(λ). Now, it takes a bit of wave mechanics (which I won’t elaborate on here) to show that the group velocity is the derivative of the (angular) frequency with respect to the wavenumber, so we write vg = ∂ω/∂k = ∂f/∂(1/λ). Using the two de Broglie relations (f = E/h and 1/λ = p/h), we get: vg = ∂(E/h)/∂(p/h) = ∂E/∂p = ∂[(p²c² + m0²c⁴)^(1/2)]/∂p. Now, when you write it all out, you should find that vg = ∂E/∂p = pc²/E = c²/vp = v, so that’s the classical velocity of our particle once again.
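The vg = ∂E/∂p = v result can also be checked numerically, by taking a finite-difference derivative of E(p) = (p²c² + m0²c⁴)^(1/2). The step size and particle values below are arbitrary choices of mine:

```python
import math

c = 299792458.0
m0 = 9.109e-31           # rest mass (kg), illustrative
v = 0.3 * c              # classical velocity of the particle

gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
p0 = gamma * m0 * v      # the particle's momentum

def energy(p):
    # E(p) = (p²c² + m0²c⁴)^(1/2)
    return math.sqrt(p ** 2 * c ** 2 + m0 ** 2 * c ** 4)

# vg = ∂E/∂p, approximated by a central finite difference
dp = p0 * 1e-6
v_group = (energy(p0 + dp) - energy(p0 - dp)) / (2 * dp)

# analytically, vg = p·c²/E, which should be the classical velocity v
v_analytic = p0 * c ** 2 / energy(p0)
```

Both the numerical and the analytic group velocity reproduce the classical velocity v.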

Phew! Complicated! Yes. But so we still don’t have our ΔEΔt = h expression! All of the above tells us how we can associate a range of momenta (Δp) with a range of wavelengths (Δλ) and, in turn, with a frequency range (Δf) which then gives us some energy range (ΔE), so the logic is like:

Δp ⇒ Δλ ⇒ Δf ⇒ ΔE

Somehow, the same sequence must also ‘transform’ our Δx into Δt. I googled a bit, but I couldn’t find any clear explanation. Feynman doesn’t seem to have one in his Lectures either so, frankly, I gave up. What I did do in one of my previous posts is to give some interpretation. However, I am not quite sure it’s really the interpretation: there are probably several. It must have something to do with the period of a wave, but I’ll let you puzzle over it yourself. 🙂 As far as I am concerned, it’s just one of the other unexplained questions I have as I sort of close my study of ‘classical’ physics. So I’ll just make a mental note of it. [Of course, please don’t hesitate to send me your answer, if you have one!] Now it’s time to really dig into quantum mechanics, so I should really stay silent for quite a while now! 🙂

# Maxwell, Lorentz, gauges and gauge transformations

I’ve done quite a few posts already on electromagnetism. They were all focused on the math one needs to understand Maxwell’s equations. Maxwell’s equations are a set of (four) differential equations, so they relate some functions with their derivatives. To be specific, they relate E and B, i.e. the electric and magnetic field vector respectively, with their derivatives in space and in time. [Let me be explicit here: E and B have three components, but depend on space as well as time, so we have three dependent and four independent variables for each function: E = (Ex, Ey, Ez) = E(x, y, z, t) and B = (Bx, By, Bz) = B(x, y, z, t).] That’s simple enough to understand, but the dynamics involved are quite complicated, as illustrated below.

I now want to do a series on the more interesting stuff, including an exploration of the concept of gauge in field theory, and I also want to show how one can derive the wave equation for electromagnetic radiation from Maxwell’s equations. Before I start, let’s recall the basic concept of a field.

The reality of fields

I said a couple of times already that (electromagnetic) fields are real. They’re more than just a mathematical structure. Let me show you why. Remember the formula for the electrostatic potential caused by some charge q at the origin:

We know that the (negative) gradient of this function, at any point in space, gives us the electric field vector at that point: E = –∇Φ. [The minus sign is a matter of convention: the field points from higher to lower potential, and we take the reference point Φ = 0 at infinity.] Now, the electric field vector gives us the force on a unit charge (i.e. a positive test charge of one coulomb) at that point. If q is some positive charge, the force will be repulsive, and the unit charge will accelerate away from our q charge at the origin. Hence, energy will be expended, as force over distance implies work is being done: as the charges separate, potential energy is converted into kinetic energy. Where does the energy come from? The energy conservation law tells us that it must come from somewhere.
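If you want to see the E = –∇Φ relation at work numerically, here's a small sketch: it takes the gradient of the point-charge potential Φ = q/(4πε0·r) by finite differences and compares it with the Coulomb field q/(4πε0·r²). The charge and the point P are illustrative values of mine:

```python
import math

eps0 = 8.8541878128e-12   # vacuum permittivity, in C²/(N·m²)
q = 1.0e-9                # a 1 nC charge at the origin (illustrative)

def phi(x, y, z):
    # electrostatic potential of a point charge: Φ = q/(4πε0·r)
    r = math.sqrt(x * x + y * y + z * z)
    return q / (4 * math.pi * eps0 * r)

# E = -∇Φ by central finite differences at some point P
P = (0.3, 0.4, 0.0)       # an arbitrary point, 0.5 m from the charge
h = 1e-6
Ex = -(phi(P[0] + h, P[1], P[2]) - phi(P[0] - h, P[1], P[2])) / (2 * h)
Ey = -(phi(P[0], P[1] + h, P[2]) - phi(P[0], P[1] - h, P[2])) / (2 * h)
Ez = -(phi(P[0], P[1], P[2] + h) - phi(P[0], P[1], P[2] - h)) / (2 * h)
E_mag = math.sqrt(Ex ** 2 + Ey ** 2 + Ez ** 2)

# analytic Coulomb field magnitude at distance r: E = q/(4πε0·r²)
r = math.sqrt(P[0] ** 2 + P[1] ** 2 + P[2] ** 2)
E_coulomb = q / (4 * math.pi * eps0 * r ** 2)
```

The numerical gradient reproduces the Coulomb field, and it points radially away from the positive charge, as it should.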

It does: the energy comes from the field itself. Bringing in more or bigger charges (from infinity, or just from further away) requires more energy. So the new charges change the field and, therefore, its energy. How exactly? That’s given by Gauss’ Law: the total flux out of a closed surface is equal to:

You’ll say: flux and energy are two different things. Well… Yes and no. The energy in the field depends on E. Indeed, the formula for the energy density in space (i.e. the energy per unit volume) is

Getting the energy over a larger space is just another integral, with the energy density as the integral kernel:

Feynman’s illustration below is not very sophisticated but, as usual, enlightening. 🙂

Gauss’ Theorem connects both the math and the physics of the situation and, as such, underscores the reality of fields: the energy is not in the electric charges. The energy is in the fields they produce. Everything else is just the principle of superposition of fields – i.e. E = E1 + E2 – coming into play. I’ll explain Gauss’ Theorem in a moment. Let me first make some additional remarks.

First, the formulas are valid for electrostatics only (so E and B only vary in space, not in time), so they’re just a piece of the larger puzzle. 🙂 As for now, however, note that, if a field is real (or, to be precise, if its energy is real), then the flux is equally real.

Second, let me say something about the units. Field strength (E or, in this case, its normal component En = E·n) is measured in newton (N) per coulomb (C), so in N/C. The integral above implies that flux is measured in (N/C)·m². It’s a weird unit because one associates flux with flow and, therefore, one would expect flux to be some quantity per unit time and per unit area, so we’d have the m² unit (and the second) in the denominator, not in the numerator. That’s indeed true for heat transfer, for mass transfer, for fluid dynamics (e.g. the amount of water flowing through some cross-section) and many other physical phenomena. But for electric flux, it’s different. You can do a dimensional analysis of the expression above: the sum of the charges is expressed in coulomb (C), and the electric constant (i.e. the vacuum permittivity) is expressed in C²/(N·m²), so, yes, it works out: C/[C²/(N·m²)] = (N/C)·m². To make sense of the units, you should think of the flux as the total flow, and of the field strength as a surface density: the flux divided by the total area, so (field strength) = (flux)/(area). Conversely, (flux) = (field strength)×(area). Hence, the unit of flux is [flux] = [field strength]×[area] = (N/C)·m².
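The dimensional analysis above can be automated. The sketch below represents each unit as a tuple of exponents over the SI base units (kg, m, s, A) – a bookkeeping device I'm choosing for this illustration – and checks that C/[C²/(N·m²)] indeed comes out as (N/C)·m²:

```python
# Represent each unit as a tuple of exponents over the SI base units
# (kg, m, s, A): multiplying units adds exponents, dividing subtracts them.
def mul(u, v): return tuple(a + b for a, b in zip(u, v))
def div(u, v): return tuple(a - b for a, b in zip(u, v))
def power(u, n): return tuple(a * n for a in u)

KG, M, S, A = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
N = mul(KG, div(M, power(S, 2)))               # newton = kg·m/s²
C = mul(A, S)                                  # coulomb = A·s
EPS0 = div(power(C, 2), mul(N, power(M, 2)))   # ε0 is in C²/(N·m²)

flux_unit = mul(div(N, C), power(M, 2))        # (N/C)·m²
charge_over_eps0 = div(C, EPS0)                # C/[C²/(N·m²)]
```

Both reduce to kg·m³·s⁻³·A⁻¹, so the dimensional analysis checks out.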

OK. Now we’re ready for Gauss’ Theorem. 🙂 I’ll also say something about its corollary, Stokes’ Theorem. It’s a bit of a mathematical digression but necessary, I think, for a better understanding of all those operators we’re going to use.

Gauss’ Theorem

The concept of flux is related to the divergence of a vector field through Gauss’ Theorem. Gauss’ Theorem has nothing to do with Gauss’ Law, except that both are associated with the same genius. Gauss’ Theorem is:

The ∇·C in the integral on the right-hand side is the divergence of a vector field. It’s the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.

Huh? What’s a volume density? Good question. Just substitute E for C in the surface and volume integral above (the integral on the left is a surface integral, and the one on the right is a volume integral), and think about the meaning of what’s written. To help you, let me also include the concept of linear density, so we have (1) linear, (2) surface and (3) volume density. Look at that representation of a vector field once again: we said the density of lines represented the magnitude of E. But what density? The representation hereunder is flat, so we can think of a linear density indeed, measured along the blue line: the flux would be six (that’s the number of lines), and the linear density (i.e. the field strength) is six divided by the length of the blue line.

However, we defined field strength as a surface density above, so that’s the flux (i.e. the number of field lines) divided by the surface area (i.e. the area of a cross-section): think of a square with the blue line as its side, and field lines going through that square. That’s simple enough. But what’s volume density? How do we count the number of lines inside a box? The answer is: mathematicians actually define it for an infinitesimally small cube by adding the fluxes out of the six individual faces of that cube:

So, the truth is: volume density is actually defined as a surface density, but for an infinitesimally small volume element. That, in turn, gives us the meaning of the divergence of a vector field. Indeed, the sum of the derivatives above is just ∇·C (i.e. the divergence of C), and ΔxΔyΔz is the volume of our infinitesimal cube, so the divergence of some field vector C at some point P is the flux – i.e. the outgoing ‘flow’ of C – per unit volume, in the neighborhood of P, as evidenced by writing

Indeed, just bring ΔV to the other side of the equation to check the ‘per unit volume’ aspect of what I wrote above. The whole idea is to determine whether the small volume is like a sink or like a source, and to what extent. Think of the field near a point charge, as illustrated below. Look at the black lines: they are the field lines (the dashed lines are equipotential lines) and note how the positive charge is a source of flux, obviously, while the negative charge is a sink.
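The ‘flux per unit volume’ definition is easy to check numerically: sum the outward fluxes through the six faces of a small cube and divide by its volume. The sample field below is my own choice; its divergence is 3x + z:

```python
def C(x, y, z):
    # a sample vector field (my choice); its divergence is 3x + z
    return (x * x, y * z, x * z)

def flux_out_of_cube(center, d):
    """Outward flux of C through the six faces of a cube of side d,
    using the field value at the center of each face."""
    x, y, z = center
    a = d * d  # area of one face
    fx = (C(x + d / 2, y, z)[0] - C(x - d / 2, y, z)[0]) * a
    fy = (C(x, y + d / 2, z)[1] - C(x, y - d / 2, z)[1]) * a
    fz = (C(x, y, z + d / 2)[2] - C(x, y, z - d / 2)[2]) * a
    return fx + fy + fz

P = (1.0, 2.0, 3.0)
d = 1e-4
div_numeric = flux_out_of_cube(P, d) / d ** 3   # flux per unit volume
div_analytic = 3 * P[0] + P[2]                  # ∇·C = 3x + z = 6 at P
```

The flux per unit volume of the little cube converges to the analytic divergence, as the definition says it should.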

Now, the next step is to acknowledge that the total flux from a volume is the sum of the fluxes out of each part of it. Indeed, the fluxes through the surfaces common to two adjacent parts cancel each other out. Feynman illustrates that with a rough drawing (below) and I’ll refer you to his Lecture on it for more detail.

So… Combining all of the gymnastics above – and integrating the divergence over an entire volume, indeed –  we get Gauss’ Theorem:

Stokes’ Theorem

There is a similar theorem involving the circulation of a vector, rather than its flux. It’s referred to as Stokes’ Theorem. Let me jot it down:

We have a contour integral here (left) and a surface integral (right). The reasoning behind it is quite similar: a surface bounded by some loop Γ is divided into infinitesimally small squares, and the circulation around Γ is the sum of the circulations around the little loops. We should take care though: the surface integral takes the normal component of ∇×C, so that’s (∇×C)n = (∇×C)·n. The illustrations below should help you to understand what’s going on.
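Here's a numerical sanity check of Stokes' Theorem for a simple sample field of my choosing, C = (−y, x, 0), whose curl is the constant vector (0, 0, 2): the circulation around a circle of radius R in the xy-plane should equal 2·πR²:

```python
import math

# sample field (my choice): C = (-y, x, 0), with ∇×C = (0, 0, 2)
R = 1.5        # radius of the loop Γ
n = 100000     # integration steps

circulation = 0.0
for i in range(n):
    theta = 2 * math.pi * i / n
    x, y = R * math.cos(theta), R * math.sin(theta)
    cx, cy = -y, x
    # tangent vector times dθ: dl = (-R·sinθ, R·cosθ)·dθ
    circulation += (cx * -R * math.sin(theta) + cy * R * math.cos(theta)) \
                   * (2 * math.pi / n)

# surface integral of the normal component of the curl over the disk
surface_integral = 2.0 * math.pi * R ** 2
```

The two integrals agree, which is exactly what Stokes' Theorem asserts.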

The electric versus the magnetic force

There’s more than just the electric force: we also have the magnetic force. The so-called Lorentz force is the combination of both. The formula, for some charge q in an electromagnetic field, is equal to:

Hence, if the velocity vector v is not equal to zero, we need to look at the magnetic field vector B too! The simplest situation is magnetostatics, so let’s first have a look at that.

Magnetostatics implies that the flux of E doesn’t change, so Maxwell’s fourth equation reduces to c²∇×B = j/ε0. So we just have a steady electric current (j): no accelerating charges. Maxwell’s third equation, ∇·B = 0, remains what it was: there’s no such thing as a magnetic charge. The Lorentz force also remains what it is, of course: F = q(E + v×B) = qE + qv×B. Also note that the v, the j and the lack of a magnetic charge all point to the same thing: magnetism is just a relativistic effect of electricity.

What about units? Well… While the unit of E, i.e. the electric field strength, is pretty obvious from the F = qE term – hence, E = F/q, and so the unit of E must be [force]/[charge] = N/C – the unit of the magnetic field strength is more complicated. Indeed, the F = qv×B identity tells us it must be (N·s)/(m·C), because 1 N = 1 C·(m/s)·(N·s)/(m·C). Phew! That’s as horrendous as it looks, and that’s why it’s usually expressed using its shorthand, i.e. the tesla: 1 T = 1 (N·s)/(m·C). Magnetic flux is the same concept as electric flux, so it’s (field strength)×(area). However, now we’re talking magnetic field strength, so its unit is T·m² = (N·s·m²)/(m·C) = (N·s·m)/C, which is referred to as the weber (Wb). Remembering that 1 volt = 1 N·m/C, it’s easy to see that a weber is also equal to 1 Wb = 1 V·s. In any case, it’s a unit that is not so easy to interpret.

Magnetostatics is a bit of a weird situation. It assumes steady fields, so the ∂E/∂t and ∂B/∂t terms in Maxwell’s equations can be dropped. In fact, c²∇×B = j/ε0 implies that ∇·(c²∇×B) = ∇·(j/ε0) and, therefore, that ∇·j = 0, because the divergence of a curl is always zero. Now, ∇·j = –∂ρ/∂t and, therefore, magnetostatics is a situation which assumes ∂ρ/∂t = 0. So we have electric currents but no change in charge densities. To put it simply, we’re not looking at a condenser that is charging or discharging, although that condenser may act like the battery or generator that keeps the charges flowing! But let’s go along with the magnetostatics assumption. What can we say about it? Well… First, we have the equivalent of Gauss’ Law, i.e. Ampère’s Law:

We have a line integral here around a closed curve, instead of a surface integral over a closed surface (Gauss’ Law), but it’s pretty similar: instead of the sum of the charges inside the volume, we have the current through the loop, and then an extra c2 factor in the denominator, of course. Combined with the B = 0 equation, this equation allows us to solve practical problems. But I am not interested in practical problems. What’s the theory behind?
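Still, Ampère's Law is easy to verify numerically for the textbook case of an infinite straight wire, whose field has magnitude B = I/(2πε0c²r) and circles the wire. The sketch below integrates B·dl around a circle and compares the result with I/(ε0c²); the current and radius are illustrative choices of mine:

```python
import math

eps0 = 8.8541878128e-12   # vacuum permittivity
c = 299792458.0           # speed of light
I = 3.0                   # a steady 3 A current along the z-axis (illustrative)

def B(x, y):
    # field of an infinite straight wire: magnitude I/(2πε0c²r),
    # circling the wire (right-hand rule)
    r2 = x * x + y * y
    k = I / (2 * math.pi * eps0 * c ** 2 * r2)
    return (-k * y, k * x)

R = 0.05                  # radius of the loop around the wire (m)
n = 100000
line_integral = 0.0
for i in range(n):
    theta = 2 * math.pi * i / n
    x, y = R * math.cos(theta), R * math.sin(theta)
    bx, by = B(x, y)
    line_integral += (bx * -R * math.sin(theta) + by * R * math.cos(theta)) \
                     * (2 * math.pi / n)

ampere_rhs = I / (eps0 * c ** 2)   # the current through the loop, divided by ε0c²
```

The line integral of B around the loop matches I/(ε0c²), whatever radius you pick.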

The magnetic vector potential

The ∇·B = 0 equation is true, always, unlike the ∇×E = 0 expression, which is true for electrostatics only (no moving charges). It says the divergence of B is zero, always, and, hence, it means we can represent B as the curl of another vector field, always. That vector field is referred to as the magnetic vector potential, and we write:

∇·B = ∇·(∇×A) = 0 and, hence, B = ∇×A

In electrostatics, we had the other theorem: if the curl of a vector field is zero (everywhere), then the vector field can be represented as the gradient of some scalar function, so if ∇×C = 0, then there is some Ψ for which C = ∇Ψ. Substituting E for C, and taking into account our conventions on charge and the direction of flow, we get E = –∇Φ. Substituting E in Maxwell’s first equation (∇·E = ρ/ε0) then gave us the so-called Poisson equation: ∇²Φ = –ρ/ε0, which sums up the whole subject of electrostatics really! It’s all in there!

Except magnetostatics, of course. Using the (magnetic) vector potential A, all of magnetostatics is reduced to another expression:

∇²A = −j/(ε0c²), with ∇·A = 0

Note the qualifier: ∇·A = 0. Why should the divergence of A be equal to zero? You’re right. It doesn’t have to be that way. We know that ∇·(∇×C) = 0, for any vector field C, and always (it’s a mathematical identity, in fact, so it’s got nothing to do with physics), but choosing A such that ∇·A = 0 is just a choice. In fact, as I’ll explain in a moment, it’s referred to as choosing a gauge. The ∇·A = 0 choice is a very convenient choice, however, as it simplifies our equations. Indeed, c²∇×B = j/ε0 = c²∇×(∇×A), and – from our vector calculus classes – we know that ∇×(∇×C) = ∇(∇·C) – ∇²C. Combining that with our choice of A (which is such that ∇·A = 0, indeed), we get the ∇²A = −j/(ε0c²) expression, which sums up the whole subject of magnetostatics!
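The ∇×(∇×C) = ∇(∇·C) – ∇²C identity can be verified symbolically. The sketch below does it with sympy for a sample field of my own choosing:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
# an arbitrary sample field (my choice)
C = [x**2 * y, y * z**2, sp.sin(x) * z]

def grad(f):
    return [sp.diff(f, v) for v in (x, y, z)]

def div(F):
    return sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)

def curl(F):
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

def laplacian(F):
    return [sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2) for f in F]

lhs = curl(curl(C))                                        # ∇×(∇×C)
rhs = [g - l for g, l in zip(grad(div(C)), laplacian(C))]  # ∇(∇·C) − ∇²C
diffs = [sp.simplify(a - b) for a, b in zip(lhs, rhs)]
```

All three components of the difference simplify to zero, confirming the identity for this field.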

The point is: if the time derivatives in Maxwell’s equations, i.e. ∂E/∂t and ∂B/∂t, are zero, then Maxwell’s four equations can be nicely separated into two pairs: the electric and magnetic field are not interconnected. Hence, as long as charges and currents are static, electricity and magnetism appear as distinct phenomena, and the interdependence of E and B does not appear. So we re-write Maxwell’s set of four equations as:

1. Electrostatics: ∇·E = ρ/ε0 and ∇×E = 0
2. Magnetostatics: ∇×B = j/(c²ε0) and ∇·B = 0

Note that electrostatics is a neat example of a vector field with zero curl and a given divergence (ρ/ε0), while magnetostatics is a neat example of a vector field with zero divergence and a given curl (j/(c²ε0)).

Electrodynamics

But reality is usually not so simple. With time-varying fields, Maxwell’s equations are what they are, and so there is interdependence, as illustrated in the introduction of this post. Note, however, that the magnetic field remains divergence-free in dynamics too! That’s because there is no such thing as a magnetic charge: we only have electric charges. So ∇·B = 0 and we can define a magnetic vector potential A and re-write B as B = ∇×A, indeed.

I am writing a vector potential field because, as I mentioned a couple of times already, we can choose A. Indeed, we can add any curl-free component to A: it won’t make a difference, because B = ∇×A only fixes the curl of A. This freedom is referred to as gauge invariance. I’ll come back to that, and also show why this is what it is.

While we can easily get B from A because of the B = ∇×A equation, getting E from some potential is a different matter altogether. It turns out we can get E using the following expression, which involves both Φ (i.e. the electric or electrostatic potential) as well as A (i.e. the magnetic vector potential):

E = –∇Φ – ∂A/∂t

Likewise, one can show that Maxwell’s equations can be re-written in terms of Φ and A, rather than in terms of E and B. The expression looks rather formidable, but don’t panic:

Just look at it. We have two ‘variables’ here (Φ and A) and two equations, so the system is fully defined. [Of course, the second equation is three equations really: one for each component x, y and z.] What’s the point? Why would we want to re-write Maxwell’s equations? The first equation makes it clear that the scalar potential (i.e. the electric potential) is a time-varying quantity, so things are not, somehow, simpler. The answer is twofold. First, re-writing Maxwell’s equations in terms of the scalar and vector potential makes sense because we have (fairly) easy expressions for their value in time and in space as a function of the charges and currents. For statics, these expressions are:

So it is, effectively, easier to first calculate the scalar and vector potential, and then get E and B from them. For dynamics, the expressions are similar:

Indeed, they are like the integrals for statics, but with “a small and physically appealing modification”, as Feynman notes: when doing the integrals, we must use the so-called retarded time t′ = t − r12/c. The illustration below shows how it works: the influences propagate from point (2) to point (1) at the speed c, so we must use the values of ρ and j at the time t′ = t − r12/c indeed!

The second aspect of the answer to the question of why we’d be interested in Φ and A has to do with the topic I wanted to write about here: the concept of a gauge and a gauge transformation.

Gauges and gauge transformations in electromagnetics

Let’s see what we’re doing really. We calculate some A and then solve for B by writing: B = ∇×A. Now, I say some A because any A′ = A + ∇Ψ, with Ψ any scalar field really, will do. Why? Because the curl of the gradient of Ψ – i.e. curl(grad Ψ) = ∇×(∇Ψ) – is equal to 0. Hence, ∇×(A + ∇Ψ) = ∇×A + ∇×(∇Ψ) = ∇×A.

So we have B, and now we need E. So the next step is to take Faraday’s Law, which is Maxwell’s second equation: ∇×E = –∂B/∂t. Why this one? It’s a simple one, as it does not involve currents or charges. So we combine this equation and our B = ∇×A expression and write:

∇×E = –∂(∇×A)/∂t

Now, these operators are tricky but you can verify this can be re-written as:

∇×(E + ∂A/∂t) = 0

Looking carefully, we see this expression says that E + ∂A/∂t is some vector whose curl is equal to zero. Hence, this vector must be the gradient of something. When we worked on electrostatics, we only had E, not the ∂A/∂t bit, and we said that E tout court was the gradient of something, so we wrote E = −∇Φ. We now do the same thing for E + ∂A/∂t, so we write:

E + ∂A/∂t = −∇Φ

So we use the same symbol Φ but it’s a bit of a different animal, obviously. However, it’s easy to see that, if the ∂A/∂t would disappear (as it does in electrostatics, where nothing changes with time), we’d get our ‘old’ E = −∇Φ. Now, E + ∂A/∂t = −∇Φ can be written as:

E = −∇Φ − ∂A/∂t

So, what’s the big deal? We wrote B and E as a function of Φ and A. Well, we said we could replace A by any A′ = A + ∇Ψ but, obviously, such a substitution would not yield the same E. To get the same E, we need some substitution rule for Φ as well. Now, you can verify we will get the same E if we replace Φ by Φ′ = Φ − ∂Ψ/∂t. You should check it by writing it all out:

E′ = −∇Φ′ − ∂A′/∂t = −∇(Φ − ∂Ψ/∂t) − ∂(A + ∇Ψ)/∂t

= −∇Φ + ∇(∂Ψ/∂t) − ∂A/∂t − ∂(∇Ψ)/∂t = −∇Φ − ∂A/∂t = E

Again, the operators are a bit tricky, but the ∇(∂Ψ/∂t) and −∂(∇Ψ)/∂t terms do cancel out, because the gradient and the time derivative commute. Where are we heading to? When everything is said and done, we do need to relate it all to the currents and the charges, because that’s the real stuff out there. So let’s take Maxwell’s ∇·E = ρ/ε0 equation, which has the charges in it, and let’s substitute E = −∇Φ − ∂A/∂t into it. We get:

That equation can be re-written as:

So we have one equation here relating Φ and A to the sources. We need another one, and we also need to separate Φ and A somehow. How do we do that?

Maxwell’s fourth equation, i.e. c²∇×B = j/ε0 + ∂E/∂t, can, obviously, be written as c²∇×B − ∂E/∂t = j/ε0. Substituting both E and B yields the following monstrosity:

We can now apply the general ∇×(∇×C) = ∇(∇·C) − ∇²C identity to the first term to get:

It’s equally monstrous, obviously, but we can simplify the whole thing by choosing Φ and A in a clever way. For the magnetostatic case, we chose A such that ∇·A = 0. We could have chosen something else. Indeed, it’s not because B is divergence-free that A has to be divergence-free too! For example, I’ll leave it to you to show that choosing ∇·A such that

also respects the general condition that any new A and Φ we choose must be related to the old ones by the A′ = A + ∇Ψ and Φ′ = Φ − ∂Ψ/∂t equalities. Now, if we choose ∇·A such that ∇·A = −c⁻²·∂Φ/∂t indeed, then the two middle terms in our monstrosity cancel out, and we’re left with a much simpler equation for A:

In addition, doing the substitution in our other equation relating Φ and A to the sources yields an equation for Φ that has the same form:

What’s the big deal here? Well… Let’s write it all out. The equation above becomes:

That’s a wave equation in three dimensions. In case you wonder, just check one of my posts on wave equations. The one-dimensional equivalent for a wave propagating in the x direction at speed c (like a sound wave, for example) is ∂²Φ/∂x² = c⁻²·∂²Φ/∂t², indeed. The equation for A above yields similar wave equations for A’s components Ax, Ay, and Az.
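As a quick symbolic check of that claim, sympy confirms that any profile g(x − c·t) moving in the +x direction at speed c satisfies ∂²Φ/∂x² = c⁻²·∂²Φ/∂t²:

```python
import sympy as sp

x, t, c = sp.symbols('x t c', positive=True)
g = sp.Function('g')          # an arbitrary wave profile

Phi = g(x - c * t)            # a wave moving in the +x direction at speed c

# ∂²Φ/∂x² − (1/c²)·∂²Φ/∂t² should vanish identically
residual = sp.simplify(sp.diff(Phi, x, 2) - sp.diff(Phi, t, 2) / c ** 2)
```

The residual is identically zero, for any profile g.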

So, yes, it is a big deal. We’ve written Maxwell’s equations in terms of the scalar (Φ) and vector (A) potential and in a form that makes immediately apparent that we’re talking electromagnetic waves moving out at the speed c. Let me copy them again:

You may, of course, say that you’d rather have a wave equation for E and B, rather than for A and Φ. Well… That can be done. Feynman gives us two derivations that do so. The first derivation is relatively simple and assumes the source of our electromagnetic wave moves in one direction only. The second derivation is much more complicated and gives an equation for E that, if you’ve read the first volume of Feynman’s Lectures, you’ll surely remember:

The links are there, and so I’ll let you have fun with those Lectures yourself. I am finished here, indeed, in terms of what I wanted to do in this post, and that is to say a few words about gauges in field theory. It’s nothing much, really, and so we’ll surely have to discuss the topic again, but at least you now know what a gauge actually is in classical electromagnetic theory. Let’s quickly go over the concepts:

1. Choosing ∇·A is choosing a gauge, or a gauge potential (because we’re talking scalar and vector potential here). The particular choice is also referred to as gauge fixing.
2. Changing A by adding ∇Ψ is called a gauge transformation, and the scalar function Ψ is referred to as a gauge function. The fact that we can add curl-free components to the magnetic potential without them making any difference is referred to as gauge invariance.
3. Finally, the ∇·A = −c⁻²·∂Φ/∂t gauge is referred to as the Lorentz gauge.
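The gauge transformation can also be verified symbolically: the sketch below applies A′ = A + ∇Ψ and Φ′ = Φ − ∂Ψ/∂t to arbitrary (undetermined) functions with sympy and checks that B = ∇×A and E = −∇Φ − ∂A/∂t come out unchanged:

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
Psi = sp.Function('Psi')(x, y, z, t)    # an arbitrary gauge function Ψ
Phi = sp.Function('Phi')(x, y, z, t)    # scalar potential
A = [sp.Function(name)(x, y, z, t) for name in ('Ax', 'Ay', 'Az')]

def grad(f):
    return [sp.diff(f, v) for v in (x, y, z)]

def curl(F):
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

# the gauge transformation: A' = A + ∇Ψ, Φ' = Φ − ∂Ψ/∂t
A_new = [a + g for a, g in zip(A, grad(Psi))]
Phi_new = Phi - sp.diff(Psi, t)

# B = ∇×A and E = −∇Φ − ∂A/∂t, before and after the transformation
B_old, B_new = curl(A), curl(A_new)
E_old = [-g - sp.diff(a, t) for g, a in zip(grad(Phi), A)]
E_new = [-g - sp.diff(a, t) for g, a in zip(grad(Phi_new), A_new)]

B_diff = [sp.simplify(bn - bo) for bo, bn in zip(B_old, B_new)]
E_diff = [sp.simplify(en - eo) for eo, en in zip(E_old, E_new)]
```

All components of B_diff and E_diff come out as zero: the fields are gauge-invariant indeed.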

Just to make sure you understand: why is that Lorentz gauge so special? Well… Look at the whole argument once more: isn’t it amazing we get such beautiful (wave) equations if we stick it in? Also look at the functional shape of the gauge itself: it looks like a wave equation itself! […] Well… No… It doesn’t. I am a bit too enthusiastic here. We do have the same 1/c2 and a time derivative, but it’s not a wave equation. 🙂 In any case, it all confirms, once again, that physics is all about beautiful mathematical structures. But, again, it’s not math only. There’s something real out there. In this case, that ‘something’ is a traveling electromagnetic field. 🙂

But why do we call it a gauge? That should be equally obvious. It’s really like choosing a gauge in another context, such as measuring the pressure of a tyre, as shown below. 🙂

Gauges and group theory

You’ll usually see gauges mentioned with some reference to group theory. For example, you will see or hear phrases like: “The existence of arbitrary numbers of gauge functions ψ(r, t) corresponds to the U(1) gauge freedom of the electromagnetic theory.” The U(1) notation stands for a unitary group of degree n = 1. It is also known as the circle group. Let me copy the introduction to the unitary group from the Wikipedia article on it:

In mathematics, the unitary group of degree n, denoted U(n), is the group of n × n unitary matrices, with the group operation that of matrix multiplication. The unitary group is a subgroup of the general linear group GL(n, C). In the simple case n = 1, the group U(1) corresponds to the circle group, consisting of all complex numbers with absolute value 1 under multiplication. All the unitary groups contain copies of this group.

The unitary group U(n) is a real Lie group of dimension n². The Lie algebra of U(n) consists of n × n skew-Hermitian matrices, with the Lie bracket given by the commutator. The general unitary group (also called the group of unitary similitudes) consists of all matrices A such that A*A is a nonzero multiple of the identity matrix, and is just the product of the unitary group with the group of all positive multiples of the identity matrix.

Phew! Does this make you any wiser? If anything, it makes me realize I’ve still got a long way to go. 🙂 The Wikipedia article on gauge fixing notes something that’s more interesting (if only because I more or less understand what it says):

Although classical electromagnetism is now often spoken of as a gauge theory, it was not originally conceived in these terms. The motion of a classical point charge is affected only by the electric and magnetic field strengths at that point, and the potentials can be treated as a mere mathematical device for simplifying some proofs and calculations. Not until the advent of quantum field theory could it be said that the potentials themselves are part of the physical configuration of a system. The earliest consequence to be accurately predicted and experimentally verified was the Aharonov–Bohm effect, which has no classical counterpart.

This confirms, once again, that the fields are real. In fact, what this says is that the potentials are real: they have a meaningful physical interpretation. I’ll leave it to you to explore that Aharonov–Bohm effect. In the meanwhile, I’ll study what Feynman writes on potentials and all that as used in quantum physics. It will probably take a while before I’ll get into group theory though.

Indeed, it’s probably best to study physics at a somewhat less abstract level first, before getting into the more sophisticated stuff.

# The blackbody radiation problem revisited: quantum statistics

The equipartition theorem – which states that the energy levels of the modes of any (linear) system, in classical as well as in quantum physics, are always equally spaced – is deep and fundamental in physics. In my previous post, I presented this theorem in a very general and non-technical way: I did not use any exponentials, complex numbers or integrals. Just simple arithmetic. Let’s go a little bit beyond now, and use it to analyze that blackbody radiation problem which bothered 19th century physicists, and which led Planck to ‘discover’ quantum physics. [Note that, once again, I won’t use any complex numbers or integrals in this post, so my kids should actually be able to read through it.]

Before we start, let’s quickly introduce the model again. What are we talking about? What’s the black box? The idea is that we add heat to atoms (or molecules) in a gas. The heat results in the atoms acquiring kinetic energy, and the kinetic theory of gases tells us that the mean value of the kinetic energy for each independent direction of motion will be equal to kT/2. The blackbody radiation model analyzes the atoms (or molecules) in the gas as atomic oscillators. Oscillators have both kinetic and potential energy and, on average, the kinetic and the potential energy are the same. Hence, the energy in the oscillation is twice the kinetic energy, so its average energy is 〈E〉 = 2·kT/2 = kT. However, oscillating atoms implies oscillating electric charges. Now, electric charges going up and down radiate light and, hence, as light is emitted, energy flows away.

How exactly? It doesn’t matter. It is worth noting that 19th century physicists had no idea about the inner structure of an atom. In fact, at that time, the term electron had not yet been invented: the first atomic model involving electrons was the so-called plum pudding model, which J.J. Thomson advanced in 1904, and he called electrons “negative corpuscles”. And the Rutherford-Bohr model, which is the first model one can actually use to explain how and why excited atoms radiate light, came in 1913 only, so that’s long after Planck’s solution for the blackbody radiation problem, which he presented to the scientific community in December 1900. It’s really true: it doesn’t matter. We don’t need to know about the specifics. The general idea is all that matters. As Feynman puts it: “A hot stove cools on a cold night, by radiating the light into the sky, because the atoms are jiggling their charge and they continually radiate, and slowly, because of this radiation, the jiggling motion slows down.” 🙂

His subsequent description of the black box is equally simple: “If we enclose the whole thing in a box so that the light does not go away to infinity, then we can eventually get thermal equilibrium. We may either put the gas in a box where we can say that there are other radiators in the box walls sending light back or, to take a nicer example, we may suppose the box has mirror walls. It is easier to think about that case. Thus we assume that all the radiation that goes out from the oscillator keeps running around in the box. Then, of course, it is true that the oscillator starts to radiate, but pretty soon it can maintain its kT of energy in spite of the fact that it is radiating, because it is being illuminated, we may say, by its own light reflected from the walls of the box. That is, after a while there is a great deal of light rushing around in the box, and although the oscillator is radiating some, the light comes back and returns some of the energy that was radiated.”

So… That’s the model. Don’t you just love the simplicity of the narrative here? 🙂 Feynman then derives Rayleigh’s Law, which gives us the frequency spectrum of blackbody radiation as predicted by classical theory, i.e. the intensity (I) of the light as a function of (a) its (angular) frequency (ω) and (b) the average energy of the oscillators, which is nothing but the temperature of the gas (Boltzmann’s constant k is just what it is: a proportionality constant which makes the units come out alright). The other stuff in the formula, given hereunder, are just more constants (and, yes, the c is the speed of light!). The grand result is:

The formula looks formidable but the function is actually very simple: it’s quadratic in ω and linear in 〈E〉 = kT. The rest is just a bunch of constants which ensure all of the units we use to measure stuff come out alright. As you may suspect, the derivation of the formula is not as simple as the narrative of the black box model, so I won’t copy it here (you can check it yourself). Indeed, let’s focus on the results, not on the technicalities. Let’s have a look at the graph.

The I(ω) graphs for T = T0 and T = 2T0 are given by the solid black curves. They tell us how much light we should have at different frequencies. They just go up and up and up, so Rayleigh’s Law implies that, when we open our stove – and, yes, I know, some kids don’t know what a stove is – and take a look, we should burn our eyes from x-rays. We know that’s not the case, in reality, so our theory must be wrong. An even bigger problem is that the curve implies that the total energy in the box, i.e. the total of all this intensity summed up over all frequencies, is infinite: the curve increases without bound, and so the area under it is infinite too. Therefore, as Feynman puts it: “Rayleigh’s Law is fundamentally, powerfully, and absolutely wrong.” The actual graphs, indeed, are the dashed curves. I’ll come back to them.

The blackbody radiation problem is history, of course. So it’s no longer a problem. Let’s see how Planck’s quantization hypothesis solved it. We assume our oscillators can only take on equally spaced energy levels, with the space between them equal to h·f = ħ·ω. The frequency f (or ω = 2π·f) is the fundamental frequency of our oscillator, and you know h and ħ = h/2π, of course: Planck’s constant. Hence, the various energy levels are given by the following formula: En = n·ħ·ω = n·h·f. The first five are depicted below.

Next to the energy levels, we write the probability of an oscillator occupying that energy level, which is given by Boltzmann’s Law. I wrote about Boltzmann’s Law in another post too, so I won’t repeat myself here, except for noting that Boltzmann’s Law says that the probabilities of different conditions of energy are given by e^(−energy/kT) = 1/e^(energy/kT). Different ‘conditions of energy’ can be anything: density, molecular speeds, momenta, whatever. Here we have a probability Pn as a function of the energy En = n·ħ·ω, so we write: Pn = A·e^(−En/kT) = A·e^(−n·ħ·ω/kT). [Note that P0 is equal to A, as a consequence.]

Now, we need to determine how many oscillators we have in each of the various energy states, so that’s N0, N1, N2, etcetera. We’ve done that before: N1/N0 = P1/P0 = (A·e^(−ħω/kT))/A = e^(−ħω/kT). Hence, N1 = N0·e^(−ħω/kT). Likewise, it’s not difficult to see that N2 = N0·e^(−2ħω/kT) or, more in general, that Nn = N0·e^(−nħω/kT) = N0·[e^(−ħω/kT)]^n. To make the calculations somewhat easier, Feynman temporarily writes x for e^(−ħω/kT). Hence, we write: N1 = N0·x, N2 = N0·x^2,…, Nn = N0·x^n, and the total number of oscillators is obviously Ntot = N0+N1+…+Nn+… = N0·(1+x+x^2+…+x^n+…).

What about their energy? The energy of all oscillators in state 0 is, obviously, zero. The energy of all oscillators in state 1 is N1·ħω = ħω·N0·x. Likewise, the energy of all oscillators in state 2 is N2·2·ħω = 2·ħω·N0·x^2. More generally, the energy of all oscillators in state n is equal to Nn·n·ħω = n·ħω·N0·x^n. So now we can write the total energy of the whole system as Etot = E0+E1+…+En+… = 0+ħω·N0·x+2·ħω·N0·x^2+…+n·ħω·N0·x^n+… = ħω·N0·(x+2x^2+…+nx^n+…). The average energy of one oscillator, for the whole system, is therefore:
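Writing it out (it’s just Etot divided by Ntot, with the common factor N0 canceling):

```latex
\langle E \rangle = \frac{E_{tot}}{N_{tot}} = \hbar\omega\,\frac{x + 2x^2 + \dots + nx^n + \dots}{1 + x + x^2 + \dots + x^n + \dots}
```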

Now, Feynman leaves the exercise of simplifying that expression to the reader and just says it’s equal to:
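That is:

```latex
\langle E \rangle = \frac{\hbar\omega}{e^{\hbar\omega/kT} - 1}
```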

How does he do that? The two sums are, in fact, well-known series: the geometric series 1+x+x^2+…+x^n+… = 1/(1−x), and its close cousin x+2x^2+…+nx^n+… = x/(1−x)^2 (both valid for |x| < 1, which is the case here). Hence, the ratio is [x/(1−x)^2]/[1/(1−x)] = x/(1−x). I also just checked that the result is correct. Note he wrote x for e^(−ħω/kT), not for e^(+ħω/kT), so there is a minus sign there, which we don’t have in the formula above. Hence, the denominator e^(ħω/kT)–1 = (1/x)–1 = (1–x)/x, and 1/(e^(ħω/kT)–1) = x/(1–x). Now, if (x+2x^2+…+nx^n+…)/(1+x+x^2+…+x^n+…) = x/(1–x), then (x+2x^2+…+nx^n+…)·(1–x) must be equal to x·(1+x+x^2+…+x^n+…). Just write it out: (x+2x^2+…+nx^n+…)·(1–x) = x+2x^2+…+nx^n+…−x^2−2x^3−…−nx^(n+1)−… = x+x^2+…+x^n+… Likewise, we get x·(1+x+x^2+…+x^n+…) = x+x^2+…+x^n+… So, yes, done.
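If you don’t trust the algebra, it’s easy to check numerically. A quick sketch (the function name and the truncation at 200 terms are mine):

```python
import math

def avg_energy_ratio(x, terms=200):
    """Truncated ratio (x + 2x^2 + ... + n·x^n + ...)/(1 + x + x^2 + ...)."""
    numerator = sum(n * x**n for n in range(1, terms))
    denominator = sum(x**n for n in range(terms))
    return numerator / denominator

x = math.exp(-1.5)           # x = e^(-hw/kT) for hw/kT = 1.5
print(avg_energy_ratio(x))   # the series ratio...
print(x / (1 - x))           # ...agrees with the closed form 1/(e^(hw/kT) - 1)
```

Both lines print the same number, as they should.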

Now comes the Big Trick, the rabbit out of the hat, so to speak. 🙂 We’re going to replace the classical expression for 〈E〉 (i.e. kT) in Rayleigh’s Law with its quantum-mechanical equivalent, i.e. 〈E〉 = ħω/[e^(ħω/kT)–1].

What’s the logic behind this? Rayleigh’s Law gave the intensity for the various frequencies that are present as a function of (a) the frequency (of course!) and (b) the average energy of the oscillators, which is kT according to classical theory. Now, our assumption that an oscillator cannot take on just any energy value but that the energy levels are equally spaced, combined with Boltzmann’s Law, gives us a very different formula for the average energy: it’s a function of the temperature, but it’s a function of the fundamental frequency too! I copied the graph below from the Wikipedia article on the equipartition theorem. The black line is the classical value for the average energy as a function of the thermal energy: as you can see, it’s one and the same thing, really (both scales happen to be logarithmic, but that’s just to make the graph more ‘readable’). Its quantum-mechanical equivalent is the red curve. At higher temperatures, the two agree nearly perfectly, but at low temperatures (with low defined as the range where kT << ħ·ω, written as h·ν in the graph), the quantum-mechanical value decreases much more rapidly. [Note the energy is measured in units of h·ν: that’s a nice way to sort of ‘normalize’ things so as to compare them.]

So, without further ado, let’s take Rayleigh’s Law again and just replace kT (i.e. the classical formula for the average energy) with the ‘quantum-mechanical’ formula for 〈E〉, i.e. ħω/[e^(ħω/kT)–1]. Adding the dω factor to emphasize we’re talking a continuous distribution here, we get the even grander result (Feynman calls it the first quantum-mechanical formula ever known or discussed):
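Again with the caveat that I am copying the constants from Feynman’s Lectures (Vol. I, Chapter 41) from memory, the result is Planck’s blackbody radiation law:

```latex
I(\omega)\,d\omega = \frac{\omega^2}{\pi^2 c^2}\cdot\frac{\hbar\omega}{e^{\hbar\omega/kT} - 1}\,d\omega = \frac{\hbar\omega^3}{\pi^2 c^2\,\big(e^{\hbar\omega/kT} - 1\big)}\,d\omega
```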

So this function is the dashed I(ω) curve (I copied the graph below again): this curve does not ‘blow up’. The math behind it is the following: for large ω, the ω^3 factor in the numerator ‘blows up’, but the exponential in the denominator grows much faster still. Therefore, the curves come down again, and so we don’t get those incredible amounts of UV light and x-rays.
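You can see the cut-off at work by comparing the quantum-mechanical average energy with the classical kT. A little sketch (the frequencies and the room temperature are just illustrative choices of mine):

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J·s
K = 1.380649e-23         # Boltzmann constant, J/K

def avg_energy_classical(T):
    """Classical value: <E> = kT, whatever the frequency."""
    return K * T

def avg_energy_planck(w, T):
    """Quantum-mechanical value: <E> = ħω/(e^(ħω/kT) − 1)."""
    return HBAR * w / math.expm1(HBAR * w / (K * T))

# At room temperature, the quantum average collapses for high frequencies,
# which is what tames the ω^3 factor in the numerator of Planck's law.
T = 300.0
for w in (1e13, 1e14, 1e15, 1e16):
    print(w, avg_energy_planck(w, T) / avg_energy_classical(T))
```

The printed ratio is close to 1 for low frequencies and essentially zero for high ones.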

So… That’s how Max Planck solved the problem and how he became the ‘reluctant father of quantum mechanics.’ The formula is not as simple as Rayleigh’s Law (we have a cubic function in the numerator, and an exponential in the denominator), but its advantage is that it’s correct. When everything is said and done, we do want our formulas to describe something real, don’t we? 🙂

Let me conclude by looking at that ‘quantum-mechanical’ formula for the average energy once more:

〈E〉 = ħω/[e^(ħω/kT)–1]

It’s not a distribution function (the formula for I(ω) is the distribution function), but the –1 term in the denominator does tell us already we’re talking Bose-Einstein statistics. In my post on quantum statistics, I compared the three distribution functions. Let’s quickly look at them again:

• Maxwell-Boltzmann (for classical particles): f(E) = 1/[A·e^(E/kT)]
• Fermi-Dirac (for fermions): f(E) = 1/[A·e^(E/kT) + 1]
• Bose-Einstein (for bosons): f(E) = 1/[A·e^(E/kT) − 1]

So here we simply substitute ħω for E, which makes sense, as the Planck-Einstein relation tells us that the energy of the particles involved is, indeed, equal to E = ħω. Below, you’ll find the graph of these three functions, first as a function of E, so that’s f(E), and then as a function of T, so that’s f(T) (or f(kT) if you want).

The first graph, for which E is the variable, is the more usual one. As for the interpretation, you can see what’s going on: bosonic particles (or bosons, I should say) will crowd the lower energy levels (the associated probabilities are much higher indeed), while for fermions, it’s the opposite: they don’t want to crowd together and, hence, the associated probabilities are much lower. So fermions will spread themselves over the various energy levels. The distribution for ‘classical’ particles is somewhere in the middle.

In that post of mine, I gave an actual example involving nine particles and the various patterns that are possible, so you can have a look there. Here I just want to note that the math behind these formulas is easy to understand when dropping the A (that’s just another normalization constant anyway) and re-writing them as follows:

• Maxwell-Boltzmann (for classical particles): f(E) = e^(−E/kT)
• Fermi-Dirac (for fermions): f(E) = e^(−E/kT)/[1+e^(−E/kT)]
• Bose-Einstein (for bosons): f(E) = e^(−E/kT)/[1−e^(−E/kT)]

Just use Feynman’s substitution x = e^(−E/kT): the Bose-Einstein distribution then becomes 1/[1/x–1] = 1/[(1–x)/x] = x/(1–x). Now it’s easy to see that the denominator of both the Fermi-Dirac and the Bose-Einstein distribution will approach 1 (i.e. the ‘denominator’ of the Maxwell-Boltzmann formula) as e^(−E/kT) approaches zero, so that’s when E becomes larger and larger. Hence, for higher energy levels, the probability densities of the three functions approach each other indeed, as they should.
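A quick numerical sketch makes the ordering, and the high-energy convergence, explicit (I set A = 1, as above; the function names are mine):

```python
import math

def maxwell_boltzmann(E_over_kT):
    """Classical particles: f = e^(-E/kT), with A = 1."""
    return math.exp(-E_over_kT)

def fermi_dirac(E_over_kT):
    """Fermions: f = 1/(e^(E/kT) + 1)."""
    return 1.0 / (math.exp(E_over_kT) + 1.0)

def bose_einstein(E_over_kT):
    """Bosons: f = 1/(e^(E/kT) - 1)."""
    return 1.0 / (math.exp(E_over_kT) - 1.0)

# Bosons crowd the low-energy states, fermions avoid them,
# and all three curves agree for E >> kT.
for r in (0.5, 1.0, 5.0, 10.0):
    print(r, bose_einstein(r), maxwell_boltzmann(r), fermi_dirac(r))
```

At each energy, the printed values come out in the order Bose-Einstein > Maxwell-Boltzmann > Fermi-Dirac, with the gap closing as E/kT grows.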

Now what’s the second graph about? Here we’re looking at one energy level only, but we let the temperature vary from 0 to infinity. The graph says that, at low temperature, the probabilities will also be more or less the same, and the three distributions only differ at higher temperatures. That makes sense too, of course!

Well… That says it all, I guess. I hope you enjoyed this post. As I’ve sort of concluded Volume I of Feynman’s Lectures with this, I’ll be silent for a while… […] Or so I think. 🙂

# Strings in classical and quantum physics

This post is not about string theory. The goal of this post is much more limited: it’s to give you a better understanding of why the metaphor of the string is so appealing. Let’s recapitulate the basics by seeing how it’s used in classical as well as in quantum physics.

In my posts on music and math, or music and physics, I described how a simple single string always vibrates in various modes at the same time: every tone is a mixture of an infinite number of elementary waves. These elementary waves, which are referred to as harmonics (or as (normal) modes, indeed) are perfectly sinusoidal, and their amplitude determines their relative contribution to the composite waveform. So we can always write the waveform F(t) as the following sum:

F(t) = a1sin(ωt) + a2sin(2ωt) + a3sin(3ωt) + … + ansin(nωt) + …

[If this is your first reading of my post, and the formula scares you away, please try again. I am writing most of my posts with teenage kids in mind, and especially this one. So I will not use anything else than simple arithmetic in this post: no integrals, no complex numbers, no logarithms. Just a bit of geometry. That’s all. So, yes, you should go through the trouble of trying to understand this formula. The only thing that you may have some trouble with is ω, i.e. angular frequency: it’s the frequency expressed in radians per time unit, rather than oscillations per second, so ω = 2π·f = 2π/T, with f the frequency as you know it (i.e. oscillations per second) and T the period of the wave.]
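If you want to play with the sum, here’s a little sketch (the 1/n amplitudes are just an illustrative choice of mine, roughly the spectrum of a sawtooth wave; nothing in the formula above fixes them):

```python
import math

def waveform(t, amps, omega=2 * math.pi):
    """F(t) = a1·sin(wt) + a2·sin(2wt) + ... for the given amplitudes."""
    return sum(a * math.sin(n * omega * t) for n, a in enumerate(amps, start=1))

# Give the n-th harmonic an amplitude of 1/n and sample one period.
amps = [1.0 / n for n in range(1, 50)]
samples = [waveform(t / 100.0, amps) for t in range(100)]
print(min(samples), max(samples))
```

Vary the amplitudes and you change the timbre of the tone, but not its pitch: the fundamental frequency stays the same.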

I also noted that the wavelength of these component waves (λ) is determined by the length of the string (L), and by its length only: λ1 = 2L, λ2 = L, λ3 = (2/3)·L and, more in general, λn = (2/n)·L. So these wavelengths do not depend on the material of the string, or its tension. At any point in time (so keeping t constant, rather than x, as we did in the equation above), the component waves look like this:

etcetera (1/8, 1/9, …, 1/n, …)

That the wavelengths of the harmonics of any actual string only depend on its length is an amazing result in light of the complexities behind it: a simple wound guitar string, for example, is not simple at all (just click the link here for a quick introduction to guitar string construction). Simple piano wire isn’t simple either: it’s made of high-carbon steel, i.e. a very complex metallic alloy. In fact, you should never think any material is simple: even the simplest molecular structures are very complicated things. Hence, it’s quite amazing that all these systems are actually linear systems and that, despite the underlying complexity, those wavelength ratios form a simple harmonic series, i.e. a simple reciprocal function y = 1/x, as illustrated below.

A simple harmonic series? Hmm… I can’t resist noting that the harmonic series is, in fact, a mathematical beast. While its terms approach zero as x (or n) increases, the series itself is divergent. So it’s not like 1+1/2+1/4+1/8+…+1/2^n+…, which adds up to 2. Divergent series don’t add up to any specific number. Even Leonhard Euler – the most famous mathematician of all time, perhaps – struggled with this. In fact, as late as 1826, another famous mathematician, Niels Henrik Abel (in light of the fact he died at age 26 (!), his legacy is truly amazing), exclaimed that a series like this was “an invention of the devil”, and that it should not be used in any mathematical proof. But then God intervened through Abel’s contemporary Augustin-Louis Cauchy 🙂 who finally cracked the nut by rigorously defining the mathematical concept of both convergent and divergent series, and equally rigorously determining their possibilities and limits in mathematical proofs. In fact, while medieval mathematicians had already grasped the essentials of modern calculus and, hence, had already given some kind of solution to Zeno’s paradox of motion, Cauchy’s work is the full and final solution to it. But I am getting distracted, so let me get back to the main story.

More remarkable than the wavelength series itself, is its implication for the respective energy levels of all these modes. The material of the string, its diameter, its tension, etc will determine the speed with which the wave travels up and down the string. [Yes, that’s what it does: you may think the string oscillates up and down, and it does, but the waveform itself travels along the string. In fact, as I explained in my previous post, we’ve got two waves traveling simultaneously: one going one way and the other going the other.] For a specific string, that speed (i.e. the wave velocity) is some constant, which we’ll denote by c. Now, c is, obviously, the product of the wavelength (i.e. the distance that the wave travels during one oscillation) and its frequency (i.e. the number of oscillations per time unit), so c = λ·f. Hence, f = c/λ and, therefore, f1 = (1/2)·c/L, f2 = (2/2)·c/L, f3 = (3/2)·c/L, etcetera. More in general, we write fn = (n/2)·c/L. In short, the frequencies are equally spaced. To be precise, they are all (1/2)·c/L apart.
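In code, the equal spacing is a one-liner (the wave speed and string length below are made-up numbers, of course):

```python
def harmonic_frequencies(c, L, n_max):
    """f_n = (n/2)·c/L for the first n_max modes of a string of length L."""
    return [(n / 2.0) * c / L for n in range(1, n_max + 1)]

# A hypothetical string: wave speed 400 m/s, length 0.65 m.
freqs = harmonic_frequencies(400.0, 0.65, 8)
spacings = [f2 - f1 for f1, f2 in zip(freqs, freqs[1:])]
print(freqs[0], spacings)   # every spacing equals the fundamental, (1/2)·c/L
```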

Now, the energy of a wave is directly proportional to its frequency, always, in classical as well as in quantum mechanics. For example, for photons, we have the Planck-Einstein relation: E = h·f = ħ·ω. So that relation states that the energy is proportional to the (light) frequency of the photon, with h (i.e. the Planck constant) as the constant of proportionality. [Note that ħ is not some different constant. It’s just the ‘angular equivalent’ of h, so we have to use ħ = h/2π when frequencies are expressed in angular frequency, i.e. radians per second rather than hertz.] Because of that proportionality, the energy levels of our simple string are also equally spaced and, hence, inserting another proportionality constant, which I’ll denote by a instead of h (because it’s some other constant, obviously), we can write:

En = a·fn = (n/2)·a·c/L

Now, if we denote the fundamental frequency f1 = (1/2)·c/L, quite simply, by f (and, likewise, its angular frequency as ω), then we can re-write this as:

En = n·a·f = n·ā·ω (ā = a/2π)

This formula is exactly the same as the formula used in quantum mechanics when describing atoms as atomic oscillators, and why and how they radiate light (think of the blackbody radiation problem, for example), as illustrated below: En = n·ħ·ω = n·h·f. The only difference between the formulas is the proportionality constant: instead of a, we have Planck’s constant here: h, or ħ when the frequency is expressed as an angular frequency.

This grand result – that the energy levels associated with the various states or modes of a system are equally spaced – is what connects classical and quantum physics in a very deep and fundamental way. [A note on terminology: this equal spacing should not be confused with the (classical) equipartition theorem, which is the statement that each degree of freedom of a system in thermal equilibrium gets the same average energy, kT/2.]

In fact, because they’re nothing but proportionality constants, the value of both a and h depends on our units. If we’d use the so-called natural units, i.e. equating ħ to 1, the energy formula becomes En = n·ω, and, hence, our unit of energy and our unit of frequency become one and the same. In fact, we can, of course, also re-define our time unit such that the fundamental frequency ω is one, i.e. one oscillation per (re-defined) time unit, so then we have the following remarkable formula:

En = n

Just think about it for a moment: what I am writing here is E0 = 0, E1 = 1, E2 = 2, E3 = 3, E4 = 4, etcetera. Isn’t that amazing? I am describing the structure of a system here – be it an atom emitting or absorbing photons, or a macro-thing like a guitar string – in terms of its basic components (i.e. its modes), and it’s as simple as counting: 0, 1, 2, 3, 4, etc.

You may think I am not describing anything real here, but I am. We cannot do whatever we wanna do: some stuff is grounded in reality, and in reality only—not in the math. Indeed, the fundamental frequency of our guitar string – which we used as our energy unit – is a property of the string, so that’s real: it’s not just some mathematical shape: it depends on the string’s length (which determines its wavelength), and it also depends on the propagation speed of the wave, which depends on other basic properties of the string, such as its material, its diameter, and its tension. Likewise, the fundamental frequency of our atomic oscillator is a property of the atomic oscillator or, to use a much grander term, a property of the Universe. That’s why h is a fundamental physical constant. So it’s not like π or e. [When reading physics as a freshman, it’s always useful to clearly distinguish physical constants (like Avogadro’s number, for example) from mathematical constants (like Euler’s number).]

The theme that emerges here is what I’ve been saying a couple of times already: it’s all about structure, and the structure is amazingly simple. It’s really that equal spacing only: all you need to know is that the energy levels of the modes of a system – any system really: an atom, a molecular system, a string, or the Universe itself – are equally spaced, and that the space between the various energy levels depends on the fundamental frequency of the system. Moreover, if we use natural units, and also re-define our time unit so the fundamental frequency is equal to 1 (so the frequencies of the other modes are 2, 3, 4 etc), then the energy levels are just 0, 1, 2, 3, 4 etc. So, yes, God kept things extremely simple. 🙂

In order to not cause too much confusion, I should add that you should read what I am writing very carefully: I am talking the modes of a system. The system itself can have any energy level, of course, so there is no discreteness at the level of the system. I am not saying that we don’t have a continuum there. We do. What I am saying is that its energy level can always be written as a (potentially infinite) sum of the energies of its components, i.e. its fundamental modes, and those energy levels are discrete. In quantum-mechanical systems, their spacing is h·f, so that’s the product of Planck’s constant and the fundamental frequency. For our guitar, the spacing is a·f (or, using angular frequency, ā·ω: it’s the same amount). But that’s it really. That’s the structure of the Universe. 🙂

Let me conclude by saying something more about a. What information does it capture? Well… All of the specificities of the string (like its material or its tension) determine the fundamental frequency f and, hence, the energy levels of the basic modes of our string. So a has nothing to do with the particularities of our string, or of our system in general. However, we can, of course, pluck our string very softly or, conversely, give it a big jolt. So our a coefficient is not related to the string as such, but to the total energy of our string. In other words, a is related to those amplitudes a1, a2, etc in our F(t) = a1sin(ωt) + a2sin(2ωt) + a3sin(3ωt) + … + ansin(nωt) + … wave equation.

How exactly? Well… Based on the fact that the total energy of our wave is equal to the sum of the energies of all of its components, I could give you some formula. However, that formula does use an integral. It’s an easy integral: energy is proportional to the square of the amplitude, and so we’re integrating the square of the wave function over the length of the string. But then I said I would not have any integral in this post, and so I’ll stick to that. In any case, even without the formula, you know enough now. For example, one of the things you should be able to reflect on is the relation between a and h. It’s got to do with structure, of course. 🙂 But I’ll let you think about that yourself.

[…] Let me help you. Think of the meaning of Planck’s constant h. Let’s suppose we’d have some elementary ‘wavicle’, like that elementary ‘string’ that string theorists are trying to define: the smallest ‘thing’ possible. It would have some energy, i.e. some frequency. Perhaps it’s just one full oscillation. Just enough to define some wavelength and, hence, some frequency indeed. Then that thing would define the smallest time unit that makes sense: the time corresponding to one oscillation. In turn, because of the E = h·f relation, it would define the smallest energy unit that makes sense. So, yes, h is the quantum (or fundamental unit) of action, i.e. of energy times time. It’s very small indeed (h = 6.626070040(81)×10−34 J·s, so the first significant digit appears only after 33 zeroes behind the decimal point) but that’s because we’re living at the macro-scale and, hence, we’re measuring stuff in huge units: the joule (J) for energy, and the second (s) for time. In natural units, h would be one. [To be precise, physicists prefer to equate ħ, rather than h, to one when talking natural units. That’s because angular frequency is more ‘natural’ as well when discussing oscillations.]

What’s the conclusion? Well… Our a will be some integer multiple of h. Some incredibly large multiple, of course, but a multiple nevertheless. 🙂

Post scriptum: I didn’t say anything about strings in this post or, let me qualify, about those elementary ‘strings’ that string theorists try to define. Do they exist? Feynman was quite skeptical about it. He was happy with the so-called Standard Model of physics, and he would have been very happy to know that the existence of the Higgs field has been confirmed experimentally (that discovery is what prompted my blog!), because that confirms the Standard Model. The Standard Model distinguishes two types of wavicles: fermions and bosons. Fermions are matter particles, such as quarks and electrons. Bosons are force carriers, like photons and gluons. I don’t know anything about string theory, but my gut instinct tells me there must be more than just one mathematical description of reality. It’s the principle of duality: concepts, theorems or mathematical structures can be translated into other concepts, theorems or structures. But… Well… We’re not talking equivalent descriptions here: string theory is a different theory, it seems. For a brief but totally incomprehensible overview (for novices at least), click on the following link, provided by the C.N. Yang Institute for Theoretical Physics. If anything, it shows I’ve got a lot more to study as I am inching forward on the difficult Road to Reality. 🙂

# Modes in classical and in quantum physics

Basics

Waves are peculiar: there is one single waveform, i.e. one motion only, but that motion can always be analyzed as the sum of the motions of all the different wave modes, combined with the appropriate amplitudes and phases. Saying the same thing using different words: we can always analyze the wave function as the sum of a (possibly infinite) number of components, i.e. a so-called Fourier series:
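Written out, the series is (I include a constant a0 term for generality):

```latex
f(t) = a_0 + \sum_{n=1}^{\infty}\big[\,a_n\cos(n\omega t) + b_n\sin(n\omega t)\,\big]
```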

The f(t) function can be any wave, but the simple examples in physics textbooks usually involve a string or, in two dimensions, some vibrating membrane, and I’ll stick to those examples too in this post. Feynman calls the Fourier components harmonic functions, or harmonics tout court, but the term ‘harmonic’ refers to so many different things in math that it may be better not to use it in this context. The component waves are sinusoidal functions, so sinusoidals might be a better term, but it’s not in use, because a more general analysis will use complex exponentials, rather than sines and/or cosines. Complex exponentials (e.g. 10^(ix)) are periodic functions too, so they are totally unlike real exponential functions (e.g. 10^x). Hence, Feynman also uses the term ‘exponentials’. At some point, he also writes that the pattern of motion (of a mode) varies ‘exponentially’ but, of course, he’s thinking of complex exponentials and, therefore, we should read ‘sinusoidally’ for ‘exponentially’ when talking real-valued wave functions.

[…] I know. I am already getting into the weeds here. As I am a bit off-track anyway now, let me make another remark here. You may think that we have two types of sinusoidals, or two types of functions, in that Fourier decomposition: sines and cosines. You should not think of it that way: the sine and cosine function are essentially the same. I know your old math teacher in high school never told you that, but it’s true. They both come with the same circle (yes, I know that’s a ridiculous statement but I don’t know how to phrase it otherwise): the difference between a sine and a cosine is just a phase shift: cos(ωt) = sin(ωt + π/2) and, conversely, sin(ωt) = cos(ωt − π/2). If the starting phases of all of the component waves would be the same, we’d have a Fourier decomposition involving cosines only, or sines only—whatever you prefer. Indeed, because they’re the same function except for that phase shift (π/2), we can always go from one to the other by shifting our origin of space (x) and/or time (t). However, we cannot assume that all of the component waves have the same starting phase and, therefore, we should write each component as cos(n·ωt + Φn), or a sine with a similar argument. Now, you’ll remember – because your math teacher in high school told you that at least 🙂 – that there’s a formula for the cosine (and sine) of the sum of two angles: we can write cos(n·ωt + Φn) = cos(Φn)·cos(n·ωt) – sin(Φn)·sin(n·ωt). Writing an for cos(Φn) and bn for –sin(Φn) gives us the an·cos(n·ωt) + bn·sin(n·ωt) expressions above. In addition, the component waves may not only differ in phase, but also in amplitude, and, hence, the an and bn coefficients do more than only capturing the phase differences. But let me get back on the track. 🙂
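The phase-shift bookkeeping is easy to verify numerically. A quick sketch (the function names and sample values are mine):

```python
import math

def cos_with_phase(n, omega, t, phi):
    """The component written as a single phase-shifted cosine."""
    return math.cos(n * omega * t + phi)

def as_cos_sin_sum(n, omega, t, phi):
    """Same component, rewritten as a_n·cos(nwt) + b_n·sin(nwt),
    with a_n = cos(phi) and b_n = -sin(phi)."""
    a_n = math.cos(phi)
    b_n = -math.sin(phi)
    return a_n * math.cos(n * omega * t) + b_n * math.sin(n * omega * t)

print(cos_with_phase(3, 2.0, 0.7, 0.5))
print(as_cos_sin_sum(3, 2.0, 0.7, 0.5))   # the same number
```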

Those sinusoidals have a weird existence: they are not there, physically—or so it seems. Indeed, there is one waveform only, i.e. one motion only—and, if it’s any real wave, it’s most likely to be non-sinusoidal. At the same time, I noted, in my previous post, that, if you pluck a string or play a chord on your guitar, some string you did not pluck may still pick up one or more of its harmonics (i.e. one or more of its overtones) and, hence, start to vibrate too! It’s the resonance phenomenon. If you have a grand piano, it’s even more obvious: if you’d press the C4 key on a piano, a small hammer will strike the C4 string and it will vibrate—but the C5 string (one octave higher) will also vibrate, although nothing touched it—except for the air transmitting the sound wave (including the harmonics causing the resonance) from the C4 string, of course! So the component waves are there and, at the same time, they’re not. Whatever they are, they are more than mathematical forms: the so-called superposition principle (on which the Fourier analysis is based) is grounded in reality: it’s because we can add forces. I know that sounds extremely obvious – or ridiculous, you might say 🙂 – but it is actually not so obvious. […] I am tempted to write something about conservative forces here but… Well… I need to move on.

Let me show that diagram of the first seven harmonics of an ideal string once again. All of them, and the higher ones too, would be in our wave function. Hence, assuming there’s no phase difference between the harmonics, we’d write:

f(t) = sin(ωt) + sin(2ωt) + sin(3ωt) + … + sin(nωt) + …

The frequencies of the various modes of our ideal string are all simple multiples of the fundamental frequency ω, as evidenced from the argument in our sine functions (ω, 2ω, 3ω, etcetera). Conversely, the respective wavelengths are λ, λ/2, λ/3, etcetera. [Remember: the speed of the wave is fixed, and frequency and wavelength are inversely proportional: c = λ·f = λ/T = λ·(ω/2π).] So, yes, these frequencies and wavelengths can all be related to each other in terms of equally simple harmonic ratios: 1:2, 2:3, 3:4, 4:5, etcetera. I explained in my previous posts why that does not imply that the musical notes themselves are related in such way: the musical scale is logarithmic. So I won’t repeat myself. All of the above is just an introduction to the more serious stuff, which I’ll talk about now.

Modes in two dimensions

An analysis of waves in two dimensions is often done assuming some drum membrane. The Great Teacher played drums, as you can see from his picture in his Lectures, and there are also videos of him performing on YouTube. So that’s why the drum is used in almost all textbooks now. 🙂

The illustration of one of the normal modes of a circular membrane comes from the Wikipedia article on modes. There are many other normal modes – some of them with a simpler shape, but some of them more complicated too – but this is a nice one as it also illustrates the concept of a nodal line, which is closely related to the concept of a mode. Huh? Yes. The modes of a one-dimensional string have nodes, i.e. points where the displacement is always zero. Indeed, as you can see from the illustration above (not below), the first overtone has one node, the second two, etcetera. So the equivalent of a node in two dimensions is a nodal line: for the mode shown below, we have one bisecting the disc and then another one—a circle about halfway between the edge and center. The third nodal line is the edge itself, obviously. [The author of the Wikipedia article notes that the animation isn’t perfect, because the nodal line and the nodal circle halfway between the edge and the center both move a little bit. In any case, it’s pretty good, I think. I should also learn how to make animations like that. :-)]

What’s a mode?

How do we find these modes? And how are they defined really? To explain that, I have to briefly return to the one-dimensional example. The key to solving the problem (i.e. finding the modes, and defining their characteristics) is the following fact: when a wave reaches the clamped end of a string, it will be reflected with a change in sign, as illustrated below: we’ve got that F(x+ct) wave coming in, and then it goes back indeed, but with the sign reversed.

It’s a complicated illustration because it also shows some hypothetical wave coming from the other side, where there is no string to vibrate. That hypothetical wave is the same wave, but travelling in the other direction and with the sign reversed (–F). So what’s that all about? Well… I never gave any general solution for a waveform traveling up and down a string: I just said the waveform was traveling up and down the string (now that is obvious: just look at that diagram with the seven first harmonics once again, and think about how that oscillation goes up and down with time), but I did not really give a general solution (the sine and cosine functions are specific solutions). So what is the general solution?

Let’s first assume the string is not held anywhere, so that we have an infinite string along which waves can travel in either direction. In fact, the most general functional form to capture the fact that a waveform can travel in any direction is to write the displacement y as the sum of two functions: one wave traveling one way (which we’ll denote by F), and the other wave (which we’ll denote by G) traveling the other way. From the illustration above, it’s obvious that the F wave is traveling towards the negative x-direction and, hence, its argument will be x + ct. Conversely, the G wave travels in the positive x-direction, so its argument is x – ct. So we write:

y = F(x + ct) + G(x – ct)

[I’ve explained this thing about directions and why the argument in a wavefunction (x ± ct) is what it is before. You should look it up in case you don’t understand. As for the c in this equation, that’s the wave velocity once more, which is constant and which depends, as always, on the medium, so that’s the material and the diameter and the tension and whatever else of the string.]

So… We know that the string is actually not infinite, but that it’s fixed to some ‘infinitely solid wall’ (as Feynman puts it). Hence, y is equal to zero there: y = 0. Now let’s choose the origin of our x-axis at the fixed end so as to simplify the analysis. Hence, where y is zero, x is also zero. Now, at x = 0, our general solution above for the infinite string becomes y = F(ct) + G(−ct) = 0, for all values of t. Of course, that means G(−ct) must be equal to –F(ct). Now, that equality is there for all values of t. So it’s there for all values of ct and −ct. In short, that equality is valid for whatever value of the argument of G and –F. As Feynman puts it: “G of anything must be –F of minus that same thing.” Now, the ‘anything’ in G is its argument: x – ct, so ‘minus that same thing’ is –(x – ct) = −x + ct. Therefore, our equation becomes:

y = F(x + ct) − F(−x + ct)

So that’s what’s depicted in the diagram above: the F(x + ct) wave ‘vanishes’ behind the wall as the − F(−x + ct) wave comes out of it. Conversely, the − F(−x + ct) is hypothetical indeed until it reaches the origin, after which it becomes the real wave. Their sum is only relevant near the origin x = 0, and on the positive side only (on the negative side of the x-axis, the F and G functions are both hypothetical). [I know, it’s not easy to follow, but textbooks are really short on this—which is why I am writing my blog: I want to help you ‘get’ it.]
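If you want to convince yourself that this really works for any waveform F – not just for sines and cosines – a few lines of Python will do. This is just a numerical sanity check (the pulse shape F below is an arbitrary choice of mine, not anything from Feynman):

```python
# Numerical check: for any waveform F, the superposition
# y(x, t) = F(x + c*t) - F(-x + c*t) vanishes at the clamped end x = 0.
import math

def F(u):
    # An arbitrary, non-sinusoidal pulse shape, just for illustration.
    return math.exp(-u**2) * (1 + 0.5 * u)

def y(x, t, c=1.0):
    return F(x + c * t) - F(-x + c * t)

# At x = 0, the incoming and reflected waves cancel exactly, at all times.
assert all(abs(y(0.0, t)) < 1e-12 for t in [-2.0, -0.5, 0.0, 1.3, 7.0])
```

The cancellation at x = 0 holds regardless of the shape of F, which is the whole point of the argument above.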

Now, the results above are valid for any wave, periodic or not. Let’s now confine the analysis to periodic waves only. In fact, we’ll limit the analysis to sinusoidal wavefunctions only. So that should be easy. Yes. Too easy. I agree. 🙂

So let’s make things difficult again by introducing the complex exponential notation, so that’s Euler’s formula: e^(iθ) = cosθ + i·sinθ, with i the imaginary unit, and i·sinθ the imaginary component of our wave. So the only thing that is real, is cosθ.

What the heck? Just bear with me. It’s good to make the analysis somewhat more general, especially because we’ll be talking about the relevance of all of this to quantum physics, and in quantum physics the waves are complex-valued indeed! So let’s get on with it. To use Euler’s formula, we need to substitute x + ct for the phase of the wave, so that involves the angular frequency and the wavenumber. Let me just write it down:

F(x + ct) = e^(iω(t+x/c)) and F(−x + ct) = e^(iω(t−x/c))

Huh? Yeah. Sorry. I’ll resist the temptation to go off-track here, because I really shouldn’t be copying what I wrote in other posts. Most of what I write above is really those simple relations: c = λ·f = ω/k, with k, i.e. the wavenumber, being defined as k = 2π/λ. For details, go to one of my other posts indeed, in which I explain how that works in very much detail: just click on the link here, and scroll down to the section on the phase of a wave, in which I explain why the phase of a wave is equal to θ = ωt–kx = ω(t–x/c). And, yes, I know: the thing with the wave directions and the signs is quite tricky. Just remember: for a wave traveling in the positive x-direction, the signs in front of x and t are each other’s opposite but, if the wave’s traveling in the negative x-direction, they are the same. As mentioned, all the rest is usually a matter of shifting the phase, which amounts to shifting the origin of either the x- or the t-axis. I need to move on. Using the exponential notation for our sinusoidal wave, y = F(x + ct) − F(−x + ct) becomes:

y = e^(iω(t+x/c)) − e^(iω(t−x/c))

I can hear you sigh again: Now what’s that for? What can we do with this? Just continue to bear with me for a while longer. Let’s factor the e^(iωt) term out. [Why? Patience, please!] So we write:

y = e^(iωt)·[e^(iωx/c) − e^(−iωx/c)]

Now, you can just use Euler’s formula again to double-check that e^(iθ) − e^(−iθ) = 2i·sinθ. [To get that result, you should remember that cos(−θ) = cosθ, but sin(−θ) = −sin(θ).] So we get:

y = e^(iωt)·[e^(iωx/c) − e^(−iωx/c)] = 2i·e^(iωt)·sin(ωx/c)

Now, we’re only interested in the real component of this amplitude of course – but that’s only because we’re in the classical world here, not in the real world, which is quantum-mechanical and, hence, involves the imaginary stuff also 🙂 – so we should write this out using Euler’s formula again to convert the exponential to sinusoidals again. Hence, remembering that i² = −1, we get:

y = 2i·e^(iωt)·sin(ωx/c) = 2i·cos(ωt)·sin(ωx/c) – 2·sin(ωt)·sin(ωx/c)

!?!

OK. You need a break. So let me pause here for a while. What the hell are we doing? Is this legit? I mean… We’re talking some real wave here, aren’t we? We are. So is this conversion from/to real amplitudes to/from complex amplitudes legit? It is. And, in this case (i.e. in classical physics), it’s true that we’re interested in the real component of y only. But then it’s nice that the analysis is valid for complex amplitudes as well, because we’ll be talking complex amplitudes in quantum physics.

[…] OK. I acknowledge it all looks very tricky so let’s see what we’d get using our old-fashioned sine and/or cosine function. So let’s write F(x + ct) as cos(ωt+ωx/c) and F(−x + ct) as cos(ωt−ωx/c). So we write y = cos(ωt+ωx/c) − cos(ωt−ωx/c). Now work on this using the cos(α+β) = cosα·cosβ − sinα·sinβ formula and the cos(−α) = cosα and sin(−α) = −sinα identities. You (should) get: y = −2sin(ωt)·sin(ωx/c). So that’s the real component in our y function above indeed. So, yes, we do get the same results when doing this funny business using complex exponentials as we’d get when sticking to real stuff only! Fortunately! 🙂
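In case you don’t quite trust the complex-exponential gymnastics, here’s a quick numerical check – just a sketch, using Python’s standard cmath module – that the real component of e^(iω(t+x/c)) − e^(iω(t−x/c)) is, indeed, −2·sin(ωt)·sin(ωx/c):

```python
# Numerical sanity check: the complex-exponential route and the
# old-fashioned real trigonometry give the same standing wave.
import cmath, math

omega, c = 3.0, 1.5
for x, t in [(0.2, 0.7), (1.1, 2.3), (0.0, 5.0), (2.5, 0.1)]:
    # Re[e^{iω(t+x/c)} − e^{iω(t−x/c)}] ...
    via_complex = (cmath.exp(1j * omega * (t + x / c))
                   - cmath.exp(1j * omega * (t - x / c))).real
    # ... should equal −2·sin(ωt)·sin(ωx/c):
    via_real = -2 * math.sin(omega * t) * math.sin(omega * x / c)
    assert abs(via_complex - via_real) < 1e-12
```

So the ‘magic’ checks out numerically, at whatever points x and t we care to try.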

[Why did I get off-track again? Well… It’s true these conversions from real to complex amplitudes should not be done carelessly. It is tricky and non-intuitive, to say the least. The weird thing about it is that, if we multiply two imaginary components, we get a real component, because i² is a real number: it’s −1! So it’s fascinating indeed: we add an imaginary component to our real-valued function, do all kinds of manipulations with it – including stuff that involves the use of i² = −1 – and, when done, we just take out the real component and it’s alright: we know that the result is OK because of the ‘magic’ of complex numbers! In any case, I need to move on so I can’t dwell on this. I also explained much of the ‘magic’ in other posts already, so I shouldn’t repeat myself. If you’re interested, click on this link, for instance.]

Let’s go back to our y = – 2sin(ωt)·sin(ωx/c) function. So that’s the oscillation. Just look at the equation and think about what it tells us. Suppose we fix x, so we’re looking at one point on the string only and only let t vary: then sin(ωx/c) is some constant and it’s our sin(ωt) factor that goes up and down. So our oscillation has frequency ω, at every point x, so that’s everywhere!

Of course, this result shouldn’t surprise us, should it? That’s what we put in when we wrote F as F(x + ct) = eiω(t+x/c) or as cos(ωt+ωx/c), isn’t it? Well… Yes and no. Yes, because you’re right: we put in that angular frequency. But then, no, because we’re talking a composite wave here: a wave traveling up and down, with the components traveling in opposite directions. Indeed, we’ve also got that G(x) = −F(–x) function here. So, no, it’s not quite the same.

Let’s fix t now, and take a snapshot of the whole wave, so now we look at x as the variable and sin(ωt) is some constant. What we see is a sine wave, and sin(ωt) is its maximum amplitude. Again, you’ll say: of course! Well… Yes. The thing is: the points where the amplitude of our oscillation is equal to zero are always the same, regardless of t. So we have fixed nodes indeed. Where are they? The nodes are, obviously, the points where sin(ωx/c) = 0, so that’s when ωx/c is equal to 0, obviously, or – more importantly – whenever ωx/c is equal to π, 2π, 3π, 4π, etcetera. More generally, we can say: whenever ωx/c = n·π with n = 0, 1, 2,… etc. Now, that’s the same as writing x = n·π·c/ω = n·π/k = n·π·λ/2π = n·λ/2.

Now let’s remind ourselves of what λ really is: for the fundamental frequency it’s twice the length of the string, so λ = 2·L. For the next mode (i.e. the second harmonic), it’s the length itself: λ = L. For the third, it’s λ = (2/3)·L, etcetera. So, in general, it’s λ = (2/m)·L with m = 1, 2, etcetera. [We may or may not want to include a zero mode by allowing m to equal zero as well, so then there’s no oscillation and y = 0 everywhere. 🙂 But that’s a minor point.] In short, our grand result is:

x = n·λ/2 = n·(2/m)·L/2 = (n/m)·L

Of course, we have to exclude the x points lying outside of our string by imposing that n/m ≤ 1, i.e. the condition that n ≤ m. So for m = 1, n is 0 or 1, so the nodes are, effectively, both ends of the string. For m = 2, n can be 0, 1 and 2, so the nodes are the ends of the string and its middle point L/2. And so on and so on.
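So the nodes of mode m on a string of length L sit at x = (n/m)·L, with n = 0, 1, …, m. That’s simple enough to put in a few lines of code (a minimal sketch, with L normalized to 1 by default):

```python
# Node positions of mode m on a string of length L: x = (n/m)·L, n = 0..m.
def nodes(m, L=1.0):
    return [n * L / m for n in range(m + 1)]

assert nodes(1) == [0.0, 1.0]        # fundamental: only the clamped ends
assert nodes(2) == [0.0, 0.5, 1.0]   # second harmonic adds the midpoint
assert nodes(4, L=2.0) == [0.0, 0.5, 1.0, 1.5, 2.0]
```

Note that the material, diameter and tension of the string appear nowhere: the length L and the mode number m are all that matter.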

I know that, by now, you’ve given up. So no one is reading anymore and so I am basically talking to myself now. What’s the point? Well… I wanted to get here in order to define the concept of a mode: a mode is a pattern of motion, which has the property that, at any point, the object moves perfectly sinusoidally, and that all points move at the same frequency (though some will move more than others). Modes also have nodes, i.e. points that don’t move at all, and above I showed how we can find the nodes of the modes of a one-dimensional string.

Also note how remarkable that result actually is: we didn’t specify anything about that string, so we don’t care about its material or diameter or tension or whatever. Still, we know its normal modes, and we know their nodes: they’re a function of the length of the string, and the number of the mode only: x = (n/m)·L. While an oscillating string may seem to be the most simple thing on earth, it isn’t: think of all the forces between the molecules, for instance, as that string is vibrating. Still, we’ve got this remarkably simple formula. Don’t you find that amazing?

[…] OK… If you’re still reading, I know you want me to move on, so I’ll just do that.

Back to two dimensions

The modes are all that matters: when linear forces (i.e. linear systems) are involved, any motion can be analyzed as the sum of the motions of all the different modes, combined with appropriate amplitudes and phases. Let me reproduce the Fourier series once more (the more you see it, the better you’ll understand it—I should hope!): Of course, we should generalize this to also include x as a variable which, again, is easier if we use complex exponentials instead of the sinusoidal components. The nice illustration on Fourier analysis from Wikipedia shows how it works, in essence, that is. The red function below consists of six of those modes.
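Just to make the idea of superposition concrete: whatever amplitudes we pick for the various modes sin(n·π·x/L), the sum still respects the boundary condition y = 0 at both clamped ends. Here’s a small sketch of that (the amplitude values below are arbitrary choices of mine, just for illustration):

```python
# A minimal sketch of the Fourier idea: superpose string modes
# sin(n·π·x/L) with amplitudes of our choosing to build up a motion.
import math

def superpose(x, amplitudes, L=1.0):
    # amplitudes[n-1] is the strength of mode n.
    return sum(a * math.sin(n * math.pi * x / L)
               for n, a in enumerate(amplitudes, start=1))

# Whatever amplitudes we pick, the clamped ends stay fixed:
for amps in ([1.0], [1.0, 0.5, 0.25], [0.3, 0.0, 0.7, 0.1, 0.9, 0.2]):
    assert abs(superpose(0.0, amps)) < 1e-12
    assert abs(superpose(1.0, amps)) < 1e-12
```

That last list of six amplitudes mimics the six-mode red function in the Wikipedia illustration: six harmonically related sines, different strengths, one composite waveform.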

OK. Enough of this. Let’s go to the two-dimensional case now. To simplify the analysis, Feynman invented a rectangular drum. A rectangular drum is probably more difficult to play, but it’s easier to analyze—as compared to a circular drum, that is! 🙂

In two dimensions, our sinusoidal one-dimensional e^(i(ωt−kx)) waveform becomes e^(i(ωt−kx·x−ky·y)). So we have a wavenumber for the x- and y-directions, and the sign in front is determined by the direction of the wave, so we need to check whether it moves in the positive or negative direction of the x- and y-axis respectively. Now, we can rewrite e^(i(ωt+kx·x+ky·y)) as e^(iωt)·e^(i(kx·x+ky·y)), of course, which is what you see in the diagram above, except that the wave is moving in the negative y-direction and, hence, we’ve got a + sign in front of our ky·y term. All the rest is rather well explained in Feynman, so I’ll refer you to the textbook here.

We basically need to ensure that we have a nodal line at x = 0 and at x = a, and then we do the same for y = 0 and y = b. Then we apply exactly the same logic as for the one-dimensional string: the wave needs to be coherently reflected. The analysis is somewhat more complicated because it involves some angle of incidence now, i.e. the θ in the diagram above, so that’s another page in Feynman’s textbook. And then we have the same gymnastics for finding wavelengths in terms of the dimensions a and b, as well as in terms of n and m, where n is the number of the mode involved when fixing the nodal lines at x = 0 and x = a, and m is the number of the mode involved when fixing the nodal lines at y = 0 and y = b. Sounds difficult? Well… Yes. But I won’t copy Feynman here. Just go and check for yourself.

The grand result is that we do get some formula for the wavelength λ of whatever satisfies the definition of a mode: a perfectly sinusoidal motion, that has all points on the drum move at the same frequency, though some move more than others. Also, as evidenced from my illustration for the circular disk: we’ve got nodal lines, and then I mean other nodal lines, different from the edges! I’ll just give you that formula here (again, for the detail, go and check Feynman yourself):

Feynman also works out an example for a = 2b. I’ll just copy the results hereunder, which is a formula for the (angular) frequencies ω, and a table of the mode shapes in a qualitative way (I’ll leave it to you to google animations that match the illustration).
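For what it’s worth, here’s a small numerical sketch of that result. I’m using the dispersion relation ω proportional to √((n/a)² + (m/b)²) – which is what Feynman’s formula boils down to for a membrane clamped at x = 0, a and y = 0, b – to compute the frequency ratios for his a = 2b example. Note how the ratios are not whole-number multiples of the fundamental (with one coincidental exception):

```python
# Mode frequencies of a rectangular membrane clamped at its edges:
# ω proportional to sqrt((n/a)² + (m/b)²), with n, m = 1, 2, ...
# counting the modes in the x- and y-directions respectively.
import math

def omega(n, m, a, b):
    return math.pi * math.sqrt((n / a) ** 2 + (m / b) ** 2)

# Feynman's example: a = 2b. Ratios to the lowest mode (1, 1):
a, b = 2.0, 1.0
ratios = {(n, m): omega(n, m, a, b) / omega(1, 1, a, b)
          for n in (1, 2) for m in (1, 2)}
# Unlike the string's harmonics, these are not whole-number multiples:
assert abs(ratios[(2, 1)] - math.sqrt(2 / 1.25)) < 1e-12  # ≈ 1.265
assert abs(ratios[(2, 2)] - 2.0) < 1e-12  # this one happens to be 2
```

So the (2, 1) mode sits at about 1.265 times the fundamental: no harmonic ratio in sight.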

Again, we should note the amazing simplicity of the result: we don’t care about the type of membrane or whatever other material the drum is made of. Its proportions are all that matter.

Finally, you should also note the last two columns in the table above: these just illustrate that, unlike our modes in the one-dimensional case, the natural frequencies here are not multiples of the fundamental frequency. As Feynman notes, we should not be led astray by the example of the one-dimensional ideal string. It’s again a departure from the Pythagorean idea that all in Nature respects harmonic ratios. It’s just not true. Let me quote Feynman, as I have no better summary: “The idea that the natural frequencies are harmonically related is not generally true. It is not true for a system with more than one dimension, nor is it true for one-dimensional systems which are more complicated than a string with uniform density and tension.”

So… That says it all, I’d guess. Maybe I should just quote his example of a one-dimensional system that does not obey Pythagoras’ prescription: a hanging chain which, because of the weight of the chain, has higher tension at the top than at the bottom. If such a chain is set in oscillation, there are various modes and frequencies, but the frequencies will not be simple multiples of each other, nor of any other number. It is also interesting to note that the mode shapes will not be sinusoidal either. However, here we’re getting into non-linear dynamics, and so I’ll let you read about that elsewhere too: once again, Feynman’s analysis of non-linear systems is very accessible and an interesting read. Hence, I warmly recommend it.

Modes in three dimensions and in quantum mechanics

Well… Unlike what you might expect, I won’t bury you under formulas this time. Let me refer you, instead, to Wikipedia’s article on the so-called Leidenfrost effect. Just do it. Don’t bother too much about the text, scroll down a bit, and play the video that comes with it. I saw it, sort of by accident, and, at first, I thought it was something very high-tech. But no: it’s just a drop of water skittering around in a hot pan. It takes on all kinds of weird forms and oscillates in the weirdest of ways, but all of it is nothing but an excitation of the drop’s various normal modes, with various amplitudes and phases, of course, as a Fourier analysis of the phenomenon dictates.

There’s plenty of other stuff around to satisfy your curiosity, all quite understandable and fun—because you now understand the basics of it for the one- and two-dimensional case.

So… Well… I’ve kept this section extremely short, because now I want to say a few words about quantum-mechanical systems. Well… In fact, I’ll simply quote Feynman on it, because he writes about it in a style that’s unsurpassed. He also nicely sums up the previous conversation. Here we go:

The ideas discussed above are all aspects of what is probably the most general and wonderful principle of mathematical physics. If we have a linear system whose character is independent of the time, then the motion does not have to have any particular simplicity, and in fact may be exceedingly complex, but there are very special motions, usually a series of special motions, in which the whole pattern of motion varies exponentially with the time. For the vibrating systems that we are talking about now, the exponential is imaginary, and instead of saying “exponentially” we might prefer to say “sinusoidally” with time. However, one can be more general and say that the motions will vary exponentially with the time in very special modes, with very special shapes. The most general motion of the system can always be represented as a superposition of motions involving each of the different exponentials.

This is worth stating again for the case of sinusoidal motion: a linear system need not be moving in a purely sinusoidal motion, i.e., at a definite single frequency, but no matter how it does move, this motion can be represented as a superposition of pure sinusoidal motions. The frequency of each of these motions is a characteristic of the system, and the pattern or waveform of each motion is also a characteristic of the system. The general motion in any such system can be characterized by giving the strength and the phase of each of these modes, and adding them all together. Another way of saying this is that any linear vibrating system is equivalent to a set of independent harmonic oscillators, with the natural frequencies corresponding to the modes.

In quantum mechanics the vibrating object, or the thing that varies in space, is the amplitude of a probability function that gives the probability of finding an electron, or system of electrons, in a given configuration. This amplitude function can vary in space and time, and satisfies, in fact, a linear equation. But in quantum mechanics there is a transformation, in that what we call frequency of the probability amplitude is equal, in the classical idea, to energy. Therefore we can translate the principle stated above to this case by taking the word frequency and replacing it with energy. It becomes something like this: a quantum-mechanical system, for example an atom, need not have a definite energy, just as a simple mechanical system does not have to have a definite frequency; but no matter how the system behaves, its behavior can always be represented as a superposition of states of definite energy. The energy of each state is a characteristic of the atom, and so is the pattern of amplitude which determines the probability of finding particles in different places. The general motion can be described by giving the amplitude of each of these different energy states. This is the origin of energy levels in quantum mechanics. Since quantum mechanics is represented by waves, in the circumstance in which the electron does not have enough energy to ultimately escape from the proton, they are confined waves. Like the confined waves of a string, there are definite frequencies for the solution of the wave equation for quantum mechanics. The quantum-mechanical interpretation is that these are definite energies. Therefore a quantum-mechanical system, because it is represented by waves, can have definite states of fixed energy; examples are the energy levels of various atoms.

Isn’t that great? What a summary! It also shows that a deeper understanding of classical physics makes it so much easier to read something about quantum mechanics. In any case, as for the examples, I should add – because that’s what you’ll often find when you google for quantum-mechanical modes – the vibrational modes of molecules. There’s tons of interesting analysis out there, and so I’ll let you have fun with it yourself now! 🙂

# Music and Math

I ended my previous post, on Music and Physics, by emphatically making the point that music is all about structure, about mathematical relations. Let me summarize the basics:

1. The octave is the musical unit, defined as the interval between two pitches with the higher frequency being twice the frequency of the lower pitch. Let’s denote the lower and higher pitch by a and b respectively, so we say that b‘s frequency is twice that of a.

2. We then divide the [a, b] interval (whose length is unity) in twelve equal sub-intervals, which define eleven notes in-between a and b. The pitch of the notes in-between is defined by the exponential function connecting a and b. What exponential function? The exponential function with base 2, so that’s the function y = 2x.

Why base 2? Because of the doubling of the frequencies when going from a to b, and when going from b to b + 1, and from b + 1 to b + 2, etcetera. In music, we give a, b, b + 1, b + 2, etcetera the same name, or symbol: A, for example. Or Do. Or C. Or Re. Whatever. If we have the unit and the number of sub-intervals, all the rest follows. We just add a number to distinguish the various As, or Cs, or Gs, so we write A1, A2, etcetera. Or C1, C2, etcetera. The graph below illustrates the principle for the interval between C4 and C5. Don’t think the function is linear. It’s exponential: note the logarithmic frequency scale. To make the point, I also inserted another illustration (credit for that graph goes to another blogger).

You’ll wonder: why twelve sub-intervals? Well… That’s rather arbitrary. Non-Western cultures use a different number. Eight instead of twelve, for example—which is more logical, at first sight at least: eight intervals amounts to dividing the interval in two equal halves, and the halves in halves again, and then once more: so the length of the sub-interval is then 1/2·1/2·1/2 = (1/2)³ = 1/8. But why wouldn’t we divide by three, so we have 9 = 3·3 sub-intervals? Or by 27 = 3·3·3? Or by 16? Or by 5?

The answer is: we don’t know. The limited sensitivity of our ear demands that the intervals be cut up somehow. [You can do tests of the sensitivity of your ear to relative frequency differences online: it’s fun. Just try them! Some of the sites may recommend a hearing aid, but don’t take that crap.] So… The bottom line is that, somehow, mankind settled on twelve sub-intervals within our musical unit—or our sound unit, I should say. So it is what it is, and the ratio of the frequencies between two successive (semi)tones (e.g. C and C#, or E and F, as E and F are also separated by one half-step only) is 2^(1/12) = 1.059463… Hence, the pitch of each note is about 6% higher than the pitch of the previous note. OK. Next thing.
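A few lines of Python make points 1 and 2 concrete: twelve equal steps of 2^(1/12) give exactly one octave, i.e. a doubling of the frequency (the A4 = 440 Hz reference below is the standard concert pitch):

```python
# Equal temperament: each of the 12 semitones multiplies the frequency
# by 2**(1/12), so twelve steps give exactly one octave (a doubling).
semitone = 2 ** (1 / 12)

assert abs(semitone - 1.059463) < 1e-6    # ≈ 6% per half-step
assert abs(semitone ** 12 - 2.0) < 1e-12  # 12 semitones = 1 octave

# Example: A4 = 440 Hz, so A5 (twelve semitones up) is 880 Hz.
A4 = 440.0
assert abs(A4 * semitone ** 12 - 880.0) < 1e-9
```

So the whole tuning system really is just the exponential function with base 2, sampled at twelve equally spaced exponents per unit.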

3. What’s the similarity between C1, C2, C3 etcetera? Or between A1, A2, A3 etcetera? The answer is: harmonics. The frequency of the first overtone of a string tuned at pitch A3 (i.e. 220 Hz) is equal to the fundamental frequency of a string tuned at pitch A4 (i.e. 440 Hz). Likewise, the frequency of the (pitch of the) C4 note above (which is the so-called middle C) is 261.626 Hz, while the frequency of the (pitch of the) next C note (C5) is twice that frequency: 523.251 Hz. [I should quickly clarify the terminology here: a tone consists of several harmonics, with frequencies f, 2·f, 3·f,… n·f,… The first harmonic is referred to as the fundamental, with frequency f. The second, third, etc harmonics are referred to as overtones, with frequency 2·f, 3·f, etc.]

To make a long story short: our ear is able to identify the individual harmonics in a tone, and if the frequency of the first harmonic of one tone (i.e. the fundamental) is the same frequency as the second harmonic of another, then we feel they are separated by one musical unit.

Isn’t that most remarkable? Why would it be that way?

My intuition tells me I should look at the energy of the components. The energy theorem tells us that the total energy in a wave is just the sum of the energies in all of the Fourier components. Surely, the fundamental must carry most of the energy, and then the first overtone, and then the second. Really? Is that so?

Well… I checked online to see if there’s anything on that, but my quick check reveals there’s nothing much out there in terms of research: if you’d google ‘energy levels of overtones’, you’ll get hundreds of links to research on the vibrational modes of molecules, but nothing that’s related to music theory. So… Well… Perhaps this is my first truly original post! 🙂 Let’s go for it. 🙂

The energy in a wave is proportional to the square of its amplitude, and we must integrate over one period (T) of the oscillation. The illustration below should help you to understand what’s going on. The fundamental mode of the wave is an oscillation with a wavelength (λ1) that is twice the length of the string (L). For the second mode, the wavelength (λ2) is just L. For the third mode, we find that λ3 = (2/3)·L. More in general, the wavelength of the nth mode is λn = (2/n)·L.

The illustration above shows that we’re talking sine waves here, differing in their frequency (or wavelength) only. [The speed of the wave (c), as it travels back and forth along the string, is constant, so frequency and wavelength are in that simple relationship: c = f·λ.] Simplifying and normalizing (i.e. choosing the ‘right’ units by multiplying scales with some proportionality constant), the energy of the first mode would be (proportional to):

What about the second and third modes? For the second mode, we have two oscillations per cycle, but we still need to integrate over the period of the first mode T = T1, which is twice the period of the second mode: T1 = 2·T2. Hence, T2 = (1/2)·T1. Therefore, the argument of the sine wave (i.e. the x variable in the integral above) should go from 0 to 4π. However, we want to compare the energies of the various modes, so let’s substitute cleverly. We write:

The period of the third mode is equal to T3 = (1/3)·T1. Conversely, T1 = 3·T3. Hence, the argument of the sine wave should go from 0 to 6π. Again, we’ll substitute cleverly so as to make the energies comparable. We write:

Now that is interesting! For a so-called ideal string, whose motion is the sum of a sinusoidal oscillation at the fundamental frequency f, another at the second harmonic frequency 2·f, another at the third harmonic 3·f, etcetera, we find that the energies of the various modes are proportional to the values in the harmonic series 1, 1/2, 1/3, 1/4,… 1/n, etcetera. Again, Pythagoras’ conclusion was wrong (the frequencies of individual notes do not respect simple ratios), but his intuition was right: the harmonic series ∑1/n (n = 1, 2,…,∞) is very relevant in describing natural phenomena. It gives us the respective energies of the various natural modes of a vibrating string! In the graph below, the values are represented as areas. It is all quite deep and mysterious really!

So now we know why we feel C4 and C5 have so much in common that we call them by the same name: C, or Do. It also helps us to understand why the E and A tones have so much in common: the third harmonic of the 110 Hz A2 string corresponds to the fundamental frequency of the E4 string: both are 330 Hz! Hence, E and A have ‘energy in common’, so to speak, but less ‘energy in common’ than two successive E notes, or two successive A notes, or two successive C notes (like C4 and C5).

[…] Well… Sort of… In fact, the analysis above is quite appealing but – I hate to say it – it’s wrong, as I explain in my post scriptum to this post. It’s like Pythagoras’ number theory of the Universe: the intuition behind is OK, but the conclusions aren’t quite right. 🙂

Ideality versus reality

We’ve been talking ideal strings. Actual tones coming out of actual strings have a quality, which is determined by the relative amounts of the various harmonics that are present in the tone, which is not some simple sum of sinusoidal functions. Actual tones have a waveform that may resemble something like the wavefunction I presented in my previous post, when discussing Fourier analysis. Let me insert that illustration once again (and let me also acknowledge its source once more: it’s Wikipedia). The red waveform is the sum of six sine functions, with harmonically related frequencies, but with different amplitudes. Hence, the energy levels of the various modes will not be proportional to the values in that harmonic series ∑1/n, with n = 1, 2,…,∞.

Das wohltemperierte Klavier

Nothing in what I wrote above is related to questions of taste like: why do I seldom select a classical music channel on my online radio station? Or why am I not into hip hop, even if my taste for music is quite similar to that of the common crowd (as evidenced from the fact that I like ‘Listeners’ Top’ hit lists)?

Not sure. It’s an unresolved topic, I guess—involving rhythm and other ‘structures’ I did not mention. Indeed, all of the above just tells us a nice story about the structure of the language of music: it’s a story about the tones, and how they are related to each other. That relation is, in essence, an exponential function with base 2. That’s all. Nothing more, nothing less. It’s remarkably simple and, at the same time, endlessly deep. 🙂 But so it is not a story about the structure of a musical piece itself, of a pop song of Ellie Goulding, for instance, or one of Bach’s preludes or fugues.

That brings me back to the original question I raised in my previous post. It’s a question which was triggered, long time ago, when I tried to read Douglas Hofstadter‘s Gödel, Escher and Bach, frustrated because my brother seemed to understand it, and I didn’t. So I put it down, and never ever looked at it again. So what is it really about that famous piece of Bach?

Frankly, I’m still not sure. As I mentioned in my previous post, musicians were struggling to find a tuning system that would allow them to easily transpose musical compositions. Transposing music amounts to changing the so-called key of a musical piece, so that’s moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. It’s a piece of cake now. In fact, increasing or decreasing the playback speed of a recording also amounts to transposing a piece: an increase or decrease of the playback speed by 6% will shift the pitch up or down by about one semitone. Why? Well… Go back to what I wrote above about that 12th root of 2. We’ve got the right tuning system now, and so everything is easy. Logarithms are great! 🙂
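You can check that 6% claim yourself by converting the frequency ratio to cents – the standard logarithmic unit for pitch, with 100 cents to the semitone. A minimal sketch:

```python
# How close is a 6% playback-speed change to one semitone? Measure it
# in cents (1 semitone = 100 cents on the logarithmic, base-2 scale).
import math

def cents(freq_ratio):
    return 1200 * math.log2(freq_ratio)

shift = cents(1.06)      # a 6% speed-up
assert 95 < shift < 105  # within a few cents of one exact semitone
assert abs(cents(2 ** (1 / 12)) - 100) < 1e-9  # the exact semitone
```

A 6% speed-up comes out at roughly 101 cents, which is why the pitch shift sounds like one semitone: our ear works on that logarithmic scale.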

Back to Bach. Despite their admiration for the Greek ideas around aesthetics – and, most notably, their fascination with harmonic ratios! – (almost) all Renaissance musicians were struggling with the so-called Pythagorean tuning system, which was used until the 18th century. It was based on a correct observation (similar strings, under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etcetera) but a wrong conclusion (the frequencies of musical tones should also obey the same harmonic ratios). Bach’s so-called ‘good’ temperament tuning system was designed such that a piece could, indeed, be played in most keys without sounding… well… out of tune. 🙂

Having said that, the modern ‘equal temperament’ tuning system, which prescribes that tuning should be done such that the notes are in the above-described simple logarithmic relation to each other, had already been invented. So the true question is: why didn’t Bach embrace it? Why did he stick to ratios? Why did it take so long for the right system to be accepted?

I don’t know. If you google, you’ll find a zillion possible explanations. As far as I can see, most of them are rather mystical. More importantly, most of them do not mention many facts. My explanation is rather simple: while Bach was, obviously, a musical genius, he may not have understood what an exponential, or a logarithm, is all about. Indeed, a quick read of summary biographies reveals that Bach studied a wide range of topics, like Latin and Greek, and theology—of course! But math is not mentioned. He didn’t write about tuning and all that: all of his time went to writing musical masterpieces!

What the biographies do mention is that he always found other people’s tunings unsatisfactory, and that he tuned his harpsichords and clavichords himself. Now that is quite revealing, I’d say! In my view, Bach couldn’t care less about the ratios. He knew something was wrong with the Pythagorean system (or the variants as were then used, which are referred to as meantone temperament) and, as a musical genius, he probably ended up tuning by ear. [For those who’d wonder what I am talking about, let me quickly insert a Wikipedia graph illustrating the difference between the Pythagorean system (and two of these meantone variants) and the equal temperament tuning system in use today.]

So… What’s the point I am trying to make? Well… Frankly, I’d bet Bach’s own tuning was actually equal temperament, and so he should have named his masterpiece Das gleichtemperierte Klavier. Then we wouldn’t have all that ‘noise’ around it. 🙂

Post scriptum: Did you like the argument on the respective energy levels of the harmonics of an ideal string? Too bad. It’s wrong. I made a common mistake: when substituting variables in the integral, I ‘forgot’ to substitute the lower and upper bound of the interval over which I was integrating the function. The calculation below corrects the mistake, doing the required substitutions this time—for the first three modes at least. What’s going on here? Well… Nothing much… I just integrate over the length L, taking a snapshot at t = 0 (as mentioned, we can always shift the origin of our independent variable, so here we do it for time, and so it’s OK). Hence, the argument of our wave function sin(kx−ωt) reduces to kx, with k = 2π/λ, and λ = 2L, λ = L, λ = (2/3)·L for the first, second and third mode respectively. [As for solving the integral of the sine squared, you can google the formula, and please do check my substitutions. They should be OK, but… Well… We never know, do we? :-)]

[…] No… This doesn’t make all that much sense either. Those integrals yield the same energy for all three modes. Something must be wrong: shorter wavelengths (i.e. higher frequencies) are associated with higher energy levels. Full stop. So the ‘solution’ above can’t be right… […] You’re right. That’s where the time aspect comes into play. We were taking a snapshot, indeed, and the mean value of the sine squared function is 1/2 = 0.5, as should be clear from Pythagoras’ theorem: cos²x + sin²x = 1. So what I was doing is like integrating a constant function over the same-length interval. So… Well… Yes: no wonder I get the same value again and again.
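You can see it in numbers too: a quick numerical integration (with the string length L set to 1, an arbitrary choice) confirms that the snapshot integral of sin²(kx) over the string comes out as L/2 for every mode:

```python
import math

L = 1.0  # string length (arbitrary unit)

def snapshot_integral(n, steps=100_000):
    """Integrate sin^2(k*x) over [0, L] for the n-th mode,
    with lambda_n = (2/n)*L and, hence, k = 2*pi/lambda_n = n*pi/L."""
    k = n * math.pi / L
    dx = L / steps
    return sum(math.sin(k * (i + 0.5) * dx) ** 2 for i in range(steps)) * dx

for n in (1, 2, 3):
    print(n, round(snapshot_integral(n), 6))  # 0.5 for each of the three modes
```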

[…]

We need to integrate over the same time interval. You could do that, as an exercise, but there’s a more direct approach to it: the energy of a wave is directly proportional to its frequency, so we write: E ∼ f. If the frequency doubles, triples, quadruples etcetera, then its energy doubles, triples, quadruples etcetera too. But – remember – we’re talking one string only here, with a fixed wave speed c = λ·f – so f = c/λ (read: the frequency is inversely proportional to the wavelength) – and, therefore (assuming the same (maximum) amplitude), we get that the energy level of each mode is inversely proportional to the wavelength, so we find that E ∼ 1/λ.

Now, with direct or inverse proportionality relations, we can always invent some new unit that makes the relationship an identity, so let’s do that and turn it into an equation indeed. [And, yes, sorry… I apologize again to your old math teacher: he may not quite agree with the shortcut I am taking here, but he’ll see the logic behind it.] So… Remembering that λ1 = 2L, λ2 = L, λ3 = (2/3)·L, etcetera, we can then write:

E1 = (1/2)/L, E2 = (2/2)/L, E3 = (3/2)/L, E4 = (4/2)/L, E5 = (5/2)/L,…, En = (n/2)/L,…

That’s a really nice result, because… Well… In quantum theory, the permitted energy levels of a harmonic oscillator are equally spaced, with the interval between them equal to h·f or, if you use the angular frequency to describe a wave (so that’s ω = 2π·f), equal to ħ·ω, with ħ = h/2π. So here we’ve got equally spaced energy levels too, with the interval between the various energy levels equal to (1/2)/L.

You’ll say: So what? Frankly, if this doesn’t amaze you, stop reading—but then you probably stopped reading a long time ago anyway. 🙂 Look at what we’ve got here. We didn’t specify anything about that string, so we didn’t care about its materials or diameter or tension or how it was made (a wound guitar string is a terribly complicated thing!) or about whatever. Still, we know its fundamental (or normal) modes, and their frequency or nodes or energy or whatever depend on the length of the string only, with the ‘fundamental’ unit of energy being equal to the reciprocal length. Full stop. So all is just a matter of size and proportions. In other words, it’s all about structure. Absolute measurements don’t matter.

You may say: Bull****. What’s the conclusion? You still didn’t tell me anything about how the total energy of the wave is supposed to be distributed over its normal modes!

That’s true. I didn’t. Why? Well… I am not sure, really. I presented a lot of stuff here, but I did not present a clear and unambiguous answer as to how the total energy of a string is distributed over its modes. Not for actual strings, nor for ideal strings. Let me be honest: I don’t know. I really don’t. Having said that, my gut instinct that most of the energy – of, let’s say, a C4 note – should be in the primary mode (i.e. in the fundamental frequency) must be right: otherwise we would not call it a C4 note. So let’s try to make some assumptions. However, before doing so, let’s first briefly touch base with reality.

For actual strings (or actual musical sounds), I suspect the analysis can be quite complicated, as evidenced by the following illustration, which I took from one of the many interesting sites on this topic. Let me quote the author: “A flute is essentially a tube that is open at both ends. Air is blown across one end and sound comes out the other. The harmonics are all whole number multiples of the fundamental frequency (436 Hz, a slightly flat A4 — a bit lower in frequency than is normally acceptable). Note how the second harmonic is nearly as intense as the fundamental. [My = blog writer’s 🙂 italics] This strong second harmonic is part of what makes a flute sound like a flute.”

Hmmm… What I see in the graph is a second harmonic that is actually more intense than the fundamental, so what’s that all about? Can we actually associate a specific frequency with that tone? Not sure. So we’re in trouble already.

If reality doesn’t match our thinking, what about ideality? Hmmm… What to say? As for ideal strings – or ideal flutes 🙂 – I’d venture to say that the most obvious distribution of energy over the various modes (or harmonics, when we’re talking sound) would be the Boltzmann distribution.

Huh? Yes. Have a look at one of my posts on statistical mechanics. It’s a weird thing: the distribution of molecular speeds in a gas, or the density of the air in the atmosphere, or whatever involving many particles and/or a great degree of complexity (so many, or such a degree of complexity, that only some kind of statistical approach to the problem works) – all that involves Boltzmann’s Law, which basically says the distribution function will be a function of the energy levels involved: f ∝ e^(–energy). So… Well… Yes. It’s the logarithmic scale again. It seems to govern the Universe. 🙂

Huh? Yes. That’s what I think: the distribution of the total energy of the oscillation should be some Boltzmann function, so it should depend on the energy of the modes: most of the energy will be in the lower modes, and most of it in the fundamental. […] Hmmm… It again begs the question: how much exactly?

Well… The Boltzmann distribution strongly resembles the ‘harmonic’ distribution shown above (1, 1/2, 1/3, 1/4 etc), but it’s not quite the same. The graph below shows how they are similar and dissimilar in shape. You can experiment yourself with coefficients and all that, but your conclusion will be the same. As they say in Asia: they are “same-same but different.” 🙂 […] It’s like the ‘good’ and ‘equal’ temperament used when tuning musical instruments: the ‘good’ temperament – which is based on harmonic ratios – is good, but not good enough. Only the ‘equal’ temperament obeys the logarithmic scale and, therefore, is perfect. So, as I mentioned already, while my assumption isn’t quite right (the distribution is not harmonic, in the Pythagorean sense), the intuition behind it is OK. So it’s just like Pythagoras’ number theory of the Universe. Having said that, I’ll leave it to you to draw the correct conclusions from it. 🙂
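For those who want to play with the comparison themselves, here is a minimal sketch in Python. Note that taking the energy of the n-th mode proportional to n, and dropping all physical constants, are assumptions made purely for illustration:

```python
import math

modes = list(range(1, 9))

# 'Harmonic' weights: 1, 1/2, 1/3, 1/4, ... (the 1/lambda pattern from above)
harmonic = [1 / n for n in modes]

# Boltzmann-style weights: proportional to e^(-E_n), with E_n ~ n assumed
boltzmann = [math.exp(-n) for n in modes]

# Normalize both so they sum to 1: we compare shapes, not absolute scales
h_total, b_total = sum(harmonic), sum(boltzmann)
harmonic = [w / h_total for w in harmonic]
boltzmann = [w / b_total for w in boltzmann]

for n, h, b in zip(modes, harmonic, boltzmann):
    print(f"mode {n}: harmonic {h:.3f}   boltzmann {b:.3f}")
```

Both curves decay, but the Boltzmann weights fall off much faster than the harmonic ones: same-same but different indeed.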

# Music and physics

My first working title for this post was Music and Modes. Yes. Modes. Not moods. The relation between music and moods is an interesting research topic as well but so it’s not what I am going to write about. 🙂

It started with me thinking I should write something on modes indeed, because the concept of a mode of a wave, or of any oscillator really, is quite central to physics, both in classical physics and in quantum physics (quantum-mechanical systems are analyzed as oscillators too!). But I wondered how to approach it, as it’s a rather boring topic if you look at the math only. But then I was flying back from Europe to Asia, where I live, and, as I also play a bit of guitar, I suddenly wanted to know why we like music. And then I thought that’s a question you may have asked yourself at some point in time too! And so then I thought I should write about modes as part of a more interesting story: a story about music—or, to be precise, a story about the physics behind music. So… Let’s go for it.

Philosophy versus physics

There is, of course, a very simple answer to the question of why we like music: we like music because it is music. If it weren’t music, we wouldn’t like it. That’s a rather philosophical answer, and it probably satisfies most people. However, for someone studying physics, that answer can surely not be sufficient. What’s the physics behind? I reviewed Feynman’s Lecture on sound waves in the plane, combined it with some other stuff I googled when I arrived, and then I wrote this post, which gives you a much less philosophical answer. 🙂

The observation at the center of the discussion is deceptively simple: why is it that similar strings (i.e. strings made of the same material, with the same thickness, etc), under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etc (i.e. some ratio of two small integers)?

You probably wonder: is that the question, really? It is. The question is deceptively simple indeed because, as you will see in a moment, the answer is quite complicated. So complicated, in fact, that the Pythagoreans didn’t have any answer. Nor did anyone else for that matter—until the 18th century or so, when musicians, physicists and mathematicians alike started to realize that a string (of a guitar, or a piano, or whatever instrument Pythagoras was thinking of at the time), or a column of air (in a pipe organ or a trumpet, for example), or whatever other thing that actually creates the musical tone, actually oscillates at numerous frequencies simultaneously.

The Pythagoreans did not suspect that a string, in itself, is a rather complicated thing – something which physicists refer to as a harmonic oscillator – and that its sound, therefore, is actually produced by many frequencies, instead of only one. The concept of a pure note, i.e. a tone that is free of harmonics (i.e. free of all other frequencies, except for the fundamental frequency), also didn’t exist at the time. And if it did, they would not have been able to produce a pure tone anyway: producing pure tones – or notes, as I’ll call them, somewhat inaccurately (I should say: a pure pitch) – is remarkably complicated, and they do not exist in Nature. If the Pythagoreans had been able to produce pure tones, they would have observed that pure tones give no sensation of consonance or dissonance, regardless of whether or not their relative frequencies respect those simple ratios. Indeed, repeated experiments, in which such pure tones are being produced, have shown that human beings can’t really say whether it’s a musical sound or not: it’s just sound, and it’s neither pleasant (or consonant, we should say) nor unpleasant (i.e. dissonant).

The Pythagorean observation is valid, however, for actual (i.e. non-pure) musical tones. In short, we need to distinguish between tones and notes (i.e. pure tones): they are two very different things, and the gist of the whole argument is that musical tones coming out of one (or more) string(s) under tension are full of harmonics and, as I’ll explain in a minute, that’s what explains the observed relation between the lengths of those strings and the phenomenon of consonance (i.e. sounding ‘pleasant’) or dissonance (i.e. sounding ‘unpleasant’).

Of course, it’s easy to say what I say above: it’s 2015 now, and so we have the benefit of hindsight. Back then – so that’s more than 2,500 years ago! – the simple but remarkable fact that the lengths of similar strings should respect some simple ratio if they are to sound ‘nice’ together triggered a fascination with number theory (in fact, the Pythagoreans actually established the foundations of what is now known as number theory). Indeed, Pythagoras felt that similar relationships should also hold for other natural phenomena! To mention just one example, the Pythagoreans believed that the orbits of the planets would also respect such simple numerical relationships, which is why they talked of the ‘music of the spheres’ (Musica Universalis).

We now know that the Pythagoreans were wrong. The proportions in the movements of the planets around the Sun do not respect simple ratios and, with the benefit of hindsight once again, it is regrettable that it took many courageous and brilliant people, such as Galileo Galilei and Copernicus, to convince the Church of that fact. 😦 Also, while Pythagoras’ observations in regard to the sounds coming out of whatever strings he was looking at were correct, his conclusions were wrong: the observation does not imply that the frequencies of musical notes should all be in some simple ratio one to another.

Let me repeat what I wrote above: the frequencies of musical notes are not in some simple relationship one to another. The frequency scale for all musical tones is logarithmic and, while that implies that we can, effectively, do some tricks with ratios based on the properties of the logarithmic scale (as I’ll explain in a moment), the so-called ‘Pythagorean’ tuning system, which is based on simple ratios, was plain wrong, even if it – or some variant of it (instead of the 3:2 ratio, musicians used the 5:4 ratio from about 1510 onwards) – was generally used until the 18th century! In short, Pythagoras was wrong indeed—in this regard at least: we can’t do much with those simple ratios.

Having said that, Pythagoras’ basic intuition was right, and that intuition is still very much what drives physics today: it’s the idea that Nature can be described, or explained (whatever that means), by quantitative relationships only. Let’s have a look at how it actually works for music.

Tones, noise and notes

Let’s first define and distinguish tones and notes. A musical tone is the opposite of noise, and the difference between the two is that musical tones are periodic waveforms, so they have a period T, as illustrated below. In contrast, noise is a non-periodic waveform. It’s as simple as that.

Now, from previous posts, you know we can write any periodic function as the sum of a potentially infinite number of simple harmonic functions, and that this sum is referred to as a Fourier series. I am just noting it here, so don’t worry about it for now. I’ll come back to it later.

You also know we have seven musical notes: Do-Re-Mi-Fa-Sol-La-Si or, more common in the English-speaking world, A-B-C-D-E-F-G. And then it starts again with A (or Do). So we have two notes, separated by an interval which is referred to as an octave (from the Latin octavus, i.e. eighth), with six notes in-between, so that’s eight notes in total. However, you also know that there are notes in-between, except between E and F and between B and C. They are referred to as semitones or half-steps. I prefer the term ‘half-step’ over ‘semitone’, because we’re talking notes really, not tones.

We have, for example, F-sharp (denoted by F#), which we can also call G-flat (denoted by Gb). It’s the same thing: a sharp (#) raises a note by a semitone (aka half-step), and a flat (b) lowers it by the same amount, so F# is Gb. That’s what’s shown below: in an octave, we have eight notes but twelve half-steps.

Let’s now look at the frequencies. The frequency scale above (expressed in oscillations per second, so that’s the hertz unit) is a logarithmic scale: frequencies double as we go from one octave to another: the frequency of the C4 note above (the so-called middle C) is 261.626 Hz, while the frequency of the next C note (C5) is double that: 523.251 Hz. [Just in case you’d want to know: the 4 and 5 refer to the note’s position on a standard 88-key piano keyboard: C4 is the fourth C key on the piano.]

Now, if we equate the interval between C4 and C5 with 1 (so the octave is our musical ‘unit’), then the interval between the twelve half-steps is, obviously, 1/12. Why? Because we have 12 half-steps in our musical unit. You can also easily verify that, because of the way logarithms work, the ratio of the frequencies of two notes that are separated by one half-step (between D# and E, for example) will be equal to 2^(1/12). Likewise, the ratio of the frequencies of two notes that are separated by n half-steps is equal to 2^(n/12). [In case you’d doubt, just do an example. For instance, if we’d denote the frequency of C4 as f0, and the frequency of C# as f1 and so on (so the frequency of D is f2, the frequency of C5 is f12, and everything else is in-between), then we can write the f2/f0 ratio as f2/f0 = (f2/f1)·(f1/f0) = 2^(1/12)·2^(1/12) = 2^(2/12) = 2^(1/6). I must assume you’re smart enough to generalize this result yourself, and to see that f12/f0 is, obviously, equal to 2^(12/12) = 2^1 = 2, which is what it should be!]
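All of these ratios are easy to verify in Python, starting from the middle-C frequency quoted above:

```python
C4 = 261.626  # Hz, the middle C mentioned above

def half_steps_up(f, n):
    """Frequency n half-steps above f in equal temperament."""
    return f * 2 ** (n / 12)

# Twelve half-steps up is exactly one octave: the frequency doubles
C5 = half_steps_up(C4, 12)
print(C5)  # 523.252

# Two half-steps up (C4 to D) gives the ratio 2^(2/12) = 2^(1/6)
D4 = half_steps_up(C4, 2)
print(round(D4 / C4, 6), round(2 ** (1 / 6), 6))
```

Because only the number of half-steps matters, you could start from any note (the A4 at 440 Hz, for instance) and get the same ratios: that is the point about there being no absolute reference in the system.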

Now, because the frequencies of the various C notes are expressed as a number involving some decimal fraction (like 523.251 Hz, and the 0.251 is actually an approximation only), and because they are, therefore, a bit hard to read and/or work with, I’ll illustrate the next idea – i.e. the concept of harmonics – with the A instead of the C. 🙂

Harmonics

The lowest A on a piano is denoted by A0, and its frequency is 27.5 Hz. Lower A notes exist (we have one at 13.75 Hz, for instance) but we don’t use them, because they are near (or actually beyond) the limit of the lowest frequencies we can hear. So let’s stick to our grand piano and start with that 27.5 Hz frequency. The next A note is A1, and its frequency is 55 Hz. We then have A2, which is like the A on my (or your) guitar: its frequency is equal to 2×55 = 110 Hz. The next is A3, for which we double the frequency once again: we’re at 220 Hz now. The next one is the A in the illustration of the C scale above: A4, with a frequency of 440 Hz.

[Let me, just for the record, note that the A4 note is the standard tuning pitch in Western music. Why? Well… There’s no good reason really, except convention. Indeed, we can derive the frequency of any other note from that A4 note using our formula for the ratio of frequencies but, because of the properties of a logarithmic function, we could do the same using whatever other note really. It’s an important point: there’s no such thing as an absolute reference point in music: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and how many steps we want to have in-between (so that’s 12 steps—again, in Western music, that is), we get all the rest. That’s just how logarithms work. So music is all about structure, i.e. mathematical relationships. Again, Pythagoras’ conclusions were wrong, but his intuition was right.]

Now, the notes we are talking about here are all so-called pure tones. In fact, when I say that the A on our guitar is referred to as A2 and that it has a frequency of 110 Hz, then I am actually making a huge simplification. Worse, I am lying when I say that: when you play a string on a guitar, or when you strike a key on a piano, all kinds of other frequencies – so-called harmonics – will resonate as well, and that’s what gives the quality to the sound: it’s what makes it sound beautiful. So the fundamental frequency (aka the first harmonic) is 110 Hz alright, but we’ll also have second, third, fourth, etc harmonics with frequency 220 Hz, 330 Hz, 440 Hz, etcetera. In music, the basic or fundamental frequency is referred to as the pitch of the tone and, as you can see, I often use the term ‘note’ (or pure tone) as a synonym for pitch—which is more or less OK, but not quite correct actually. [However, don’t worry about it: my sloppiness here does not affect the argument.]

What’s the physics behind? Look at the illustration below (I borrowed it from the Physics Classroom site). The thick black line is the string, and the wavelength of its fundamental frequency (i.e. the first harmonic) is twice its length, so we write λ1 = 2·L or, the other way around, L = (1/2)·λ1. Now that’s the so-called first mode of the string. [One often sees the term fundamental or natural or normal mode, but the adjective is not necessary really. In fact, I find it confusing, although I sometimes find myself using it too.]

We also have a second, third, etc mode, depicted below, and these modes correspond to the second, third, etc harmonic respectively.

For the second, third, etc mode, the relationship between the wavelength and the length of the string is, obviously, the following: L = (2/2)·λ2 = λ2, L = (3/2)·λ3, etc. More in general, for the nth mode, L will be equal to L = (n/2)·λn, with n = 1, 2, etcetera. In fact, because L is supposed to be some fixed length, we should write it the other way around: λn = (2/n)·L.

What does it imply for the frequencies? We know that the speed of the wave – let’s denote it by c – as it travels up and down the string, is a property of the string, and it’s a property of the string only. In other words, it does not depend on the frequency. Now, the wave velocity is equal to the frequency times the wavelength, always, so we have c = f·λ. To take the example of the (classical) guitar string: its length is 650 mm, i.e. 0.65 m. Hence, the identities λ1 = (2/1)·L, λ2 = (2/2)·L, λ3 = (2/3)·L etc become λ1 = (2/1)·0.65 = 1.3 m, λ2 = (2/2)·0.65 = 0.65 m, λ3 = (2/3)·0.65 = 0.433.. m and so on. Now, combining these wavelengths with the above-mentioned frequencies, we get the wave velocity c = (110 Hz)·(1.3 m) = (220 Hz)·(0.65 m) = (330 Hz)·(0.433.. m) = 143 m/s.
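You can let Python do that arithmetic for you (the 0.65 m length and 110 Hz fundamental are the values used above):

```python
L = 0.65  # classical guitar scale length, in meters
f1 = 110  # fundamental frequency of the A2 string, in Hz

# c = f*lambda, with f_n = n*f1 and lambda_n = (2/n)*L: the n's cancel out
speeds = [(n * f1) * ((2 / n) * L) for n in (1, 2, 3)]
print(speeds)  # every entry comes out as ~143 m/s
```

The factor n in the frequency cancels against the 1/n in the wavelength, which is just another way of saying the wave speed is a property of the string, not of the mode.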

Let me now get back to Pythagoras’ string. You should note that the frequencies of the harmonics produced by a simple guitar string are related to each other by simple whole number ratios. Indeed, the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1). The second and third harmonics have a 3:2 frequency ratio. The third and fourth harmonics a 4:3 ratio. The fifth and fourth harmonic 5:4, and so on and so on. They have to be. Why? Because the harmonics are simple multiples of the basic frequency. Now that is what’s really behind Pythagoras’ observation: when he was sounding similar strings with the same tension but different lengths, he was making sounds with the same harmonics. Nothing more, nothing less.

Let me be quite explicit here, because the point that I am trying to make here is somewhat subtle. Pythagoras’ string is Pythagoras’ string: he talked similar strings. So we’re not talking some actual guitar or a piano or whatever other string instrument. The strings on (modern) string instruments are not similar, and they do not have the same tension. For example, the six strings of a guitar do not differ in length (they’re all 650 mm) but they differ in tension. The six strings on a classical guitar also have a different diameter, and the first three strings are plain strings, as opposed to the bottom strings, which are wound. So the strings are not similar but very different indeed. To illustrate the point, I copied the values below for just one of the many commercially available guitar string sets. It’s the same for piano strings. While they are somewhat simpler (they’re all made of piano wire, which is very high quality steel wire basically), they also differ—not only in length but in diameter as well, typically ranging from 0.85 mm for the highest treble strings to 8.5 mm (so that’s ten times 0.85 mm) for the lowest bass notes.

In short, Pythagoras was not playing the guitar or the piano (or whatever other more sophisticated string instrument that the Greeks surely must have had too) when he was thinking of these harmonic relationships. The physical explanation behind his famous observation is, therefore, quite simple: musical tones that have the same harmonics sound pleasant, or consonant, we should say—from the Latin con-sonare, which, literally, means ‘to sound together’ (from sonare = to sound and con = with). And otherwise… Well… Then they do not sound pleasant: they are dissonant.

To drive the point home, let me emphasize that, when we’re plucking a string, we produce a sound consisting of many frequencies, all in one go. One can see it in practice: if you strike a lower A string on a piano – let’s say the 110 Hz A2 string – then its second harmonic (220 Hz) will make the A3 string vibrate too, because it’s got the same frequency! And then its fourth harmonic will make the A4 string vibrate too, because they’re both at 440 Hz. Of course, the strength of these other vibrations (or their amplitude we should say) will depend on the strength of the other harmonics and we should, of course, expect that the fundamental frequency (i.e. the first harmonic) will absorb most of the energy. So we pluck one string, and so we’ve got one sound, one tone only, but numerous notes at the same time!

In this regard, you should also note that the third harmonic of our 110 Hz A2 string corresponds to the fundamental frequency of the E4 tone: both are 330 Hz! And, of course, the harmonics of E, such as its second harmonic (2·330 Hz = 660 Hz), correspond to higher harmonics of A too! To be specific, the second harmonic of our E string is equal to the sixth harmonic of our A2 string. If your guitar is any good, and if your strings are of reasonable quality too, you’ll actually see it: the (lower) E and A strings co-vibrate if you play the A major chord striking the upper four strings only. So we’ve got energy – motion really – being transferred from the four strings you do strike to the two strings you do not strike! You’ll say: so what? Well… If you’ve got any better proof of the actuality (or reality) of various frequencies being present at the same time, please tell me! 🙂
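The bookkeeping of those shared harmonics is easy to spell out. Note that I use the ‘ideal’ 330 Hz value for E4 here, i.e. the exact 3:2 ratio with A3’s 220 Hz; the equal-temperament E4 actually sits at 329.63 Hz, which is close enough for the co-vibration to happen anyway:

```python
A2 = 110.0  # Hz

def harmonics(f, n_max=6):
    """First n_max harmonics of a tone with fundamental f."""
    return [n * f for n in range(1, n_max + 1)]

print(harmonics(A2))  # [110.0, 220.0, 330.0, 440.0, 550.0, 660.0]

# An ideal E4 at 330 Hz shares its 1st and 2nd harmonics with
# the 3rd and 6th harmonics of the A2 string
E4 = 330.0
shared = sorted(set(harmonics(A2)) & set(harmonics(E4)))
print(shared)  # [330.0, 660.0]
```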

So that’s why A and E sound very well together (A, E and C#, played together, make up the so-called A major chord): our ear likes matching harmonics. And so that’s why we like musical tones—or why we define those tones as being musical! 🙂 Let me summarize it once more: musical tones are composite sound waves, consisting of a fundamental frequency and so-called harmonics (so we’ve got many notes or pure tones altogether in one musical tone). Now, when other musical tones have harmonics that are shared, and we sound those notes too, we get the sensation of harmony, i.e. the combination sounds consonant.

Now, it’s not difficult to see that we will always have such shared harmonics if we have similar strings, with the same tension but different lengths, being sounded together. In short, what Pythagoras observed has nothing much to do with notes, but with tones. Let’s go a bit further in the analysis now by introducing some more math. And, yes, I am very sorry: it’s the dreaded Fourier analysis indeed! 🙂

Fourier analysis

You know that we can decompose any periodic function into a sum of a (potentially infinite) series of simple sinusoidal functions, as illustrated below. I took the illustration from Wikipedia: the red function s6(x) is the sum of six sine functions of different amplitudes and (harmonically related) frequencies. The so-called Fourier transform S(f) (in blue) relates the six frequencies with the respective amplitudes.

In light of the discussion above, it is easy to see what this means for the sound coming from a plucked string. Using the angular frequency notation (so we write everything using ω instead of f), we know that the normal or natural modes of oscillation have frequencies ω = 2π/T = 2πf  (so that’s the fundamental frequency or first harmonic), 2ω (second harmonic), 3ω (third harmonic), and so on and so on.

Now, there’s no reason to assume that all of the sinusoidal functions that make up our tone should have the same phase: some phase shift Φ may be there and, hence, we should write our sinusoidal function  not as cos(ωt), but as cos(ωt + Φ) in order to ensure our analysis is general enough. [Why not a sine function? It doesn’t matter: the cosine and sine function are the same, except for another phase shift of 90° = π/2.] Now, from our geometry classes, we know that we can re-write cos(ωt + Φ) as

cos(ωt + Φ) = [cos(Φ)cos(ωt) – sin(Φ)sin(ωt)]

We have a lot of these functions of course – one for each harmonic, in fact – and, hence, we should use subscripts, which is what we do in the formula below, which says that any function f(t) that is periodic with the period T can be written mathematically as:

f(t) = a0 + a1·cos(ωt) + b1·sin(ωt) + a2·cos(2ωt) + b2·sin(2ωt) + … = a0 + Σn [an·cos(nωt) + bn·sin(nωt)]

You may wonder: what’s that period T? It’s the period of the fundamental mode, i.e. the first harmonic. Indeed, the period of the second, third, etc harmonic will only be one half, one third etcetera of the period of the first harmonic. Indeed, T2 = (2π)/(2ω) = (1/2)·(2π)/ω = (1/2)·T1, and T3 = (2π)/(3ω) = (1/3)·(2π)/ω = (1/3)·T1, and so on. However, it’s easy to see that these functions also repeat themselves after two, three, etc periods respectively. So all is alright, and the general idea behind the Fourier analysis is further illustrated below. [Note that both the formula and the illustration below (which I took from Feynman’s Lectures) add a ‘zero-frequency term’ a0 to the series. That zero-frequency term will usually be zero for a musical tone, because the ‘zero’ level of our tone will be zero indeed. Also note that the an and bn coefficients are, of course, equal to an = cos(Φn) and bn = –sin(Φn), so you can relate the illustration and the formula easily.]
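To make this less abstract, here’s a numerical sketch: we build a toy ‘tone’ out of three harmonics (the amplitudes 1, 0.5 and 0.25 are arbitrary choices of mine) and then recover the coefficients using the standard Fourier integrals:

```python
import math

omega = 2 * math.pi  # fundamental angular frequency, taking T = 1

def tone(t):
    """A toy periodic 'tone': three harmonics with made-up amplitudes."""
    return (1.00 * math.cos(omega * t)
            + 0.50 * math.cos(2 * omega * t)
            + 0.25 * math.sin(3 * omega * t))

def a_n(n, steps=10_000):
    """a_n = (2/T) * integral over one period of f(t)*cos(n*omega*t)."""
    dt = 1.0 / steps
    return 2 * sum(tone((i + 0.5) * dt) * math.cos(n * omega * (i + 0.5) * dt)
                   for i in range(steps)) * dt

def b_n(n, steps=10_000):
    """b_n = (2/T) * integral over one period of f(t)*sin(n*omega*t)."""
    dt = 1.0 / steps
    return 2 * sum(tone((i + 0.5) * dt) * math.sin(n * omega * (i + 0.5) * dt)
                   for i in range(steps)) * dt

print(round(a_n(1), 6), round(a_n(2), 6), round(b_n(3), 6))  # 1.0 0.5 0.25
```

So the integrals pick each harmonic’s amplitude back out of the sum, which is really all the ‘mathematical gymnastics’ amounts to.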

You’ll say: What the heck! Why do we need the mathematical gymnastics here? It’s just to understand that other characteristic of a musical tone: its quality (as opposed to its pitch). A so-called rich tone will have strong harmonics, while a pure tone will only have the first harmonic. All other characteristics – the difference between a tone produced by a violin as opposed to a piano – are then related to the ‘mix’ of all those harmonics.

So now we have it all. Loudness is, of course, related to the magnitude of the air pressure changes as our waveform moves through the air. Pitch, loudness and quality: that’s what makes a musical tone. 🙂

Dissonance

As mentioned above, if the sounds are not consonant, they’re dissonant. But what is dissonance really? What’s going on? The answer is the following: when the ratio of two frequencies is near to a simple fraction, but not exact, we get so-called beats, which our ear does not like.

Huh? Relax. The illustration below, which I copied from the Wikipedia article on piano tuning, illustrates the phenomenon. The blue wave is the sum of the red and the green wave, which are originally identical. But then the frequency of the green wave is increased, so the two waves are no longer in phase, and the interference results in a beating pattern. Of course, our musical tone involves different frequencies and, hence, different periods T1, T2, T3, etcetera, but you get the idea: the higher harmonics also repeat with period T1, and if the frequencies are not in some exact ratio, then we’ll have a similar problem: beats, and our ear will not like the sound.
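The beating pattern is easy to reproduce numerically. The sketch below uses two made-up frequencies, 440 Hz and 444 Hz, and checks the sum-to-product identity behind the phenomenon: the sum of the two tones equals a tone at the average frequency modulated by a slow envelope, which swells |f1 – f2| = 4 times per second.

```python
import math

# Two tones close in frequency (made-up values: 440 Hz and 444 Hz).
f1, f2 = 440.0, 444.0

def two_tones(t):
    # The sum of the two pure tones.
    return math.cos(2 * math.pi * f1 * t) + math.cos(2 * math.pi * f2 * t)

def modulated_tone(t):
    # Same thing, via the sum-to-product identity
    # cos(A) + cos(B) = 2·cos((A-B)/2)·cos((A+B)/2): a 'carrier' at the
    # average frequency, modulated by a slow beat envelope.
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)
    carrier = math.cos(math.pi * (f1 + f2) * t)
    return envelope * carrier

# The two forms agree at every instant:
for k in range(2_000):
    t = k / 10_000
    assert abs(two_tones(t) - modulated_tone(t)) < 1e-9

beat_frequency = abs(f1 - f2)   # the loudness swells 4 times per second
print(beat_frequency)
```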

Of course, you’ll wonder: why don’t we like beats in tones? We can ask that, can’t we? It’s like asking why we like music, isn’t it? […] Well… It is and it isn’t. It’s like asking why our ear (or our brain) likes harmonics. We don’t know. That’s how we are wired. The ‘physical’ explanation of what is musical and what isn’t only goes so far, I guess. 😦

Pythagoras versus Bach

From all of what I wrote above, it is obvious that the frequencies of the harmonics of a musical tone are, indeed, related by simple ratios of small integers: the frequencies of the second and first harmonics are in a simple 2 to 1 ratio (2:1); the third and second harmonics have a 3:2 frequency ratio; the fourth and third harmonics a 4:3 ratio; the fifth and fourth harmonics a 5:4 ratio, etcetera. That’s it. Nothing more, nothing less.

In other words, Pythagoras was observing musical tones: he could not observe the pure tones behind them, i.e. the actual notes. However, aesthetics led Pythagoras, and all musicians after him – until the mid-18th century – to think that the ratios of the frequencies of the notes within an octave should also be simple ratios. From what I explained above, it’s obvious that it should not work that way: the ratio of the frequencies of two notes separated by n half-steps is 2^(n/12), and, for most values of n, 2^(n/12) is not some simple ratio. [Why? Just take your pocket calculator and calculate the value of 2^(1/12): it’s 2^0.08333… = 1.0594630943… and so on. It’s an irrational number: there are no repeating decimals. Now, 2^(n/12) is equal to 2^(1/12)·2^(1/12)·…·2^(1/12) (n times). Why would you expect that product to be equal to some simple ratio?]
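You can also let a computer do the arithmetic. The sketch below prints the half-step ratio 2^(1/12) and then checks, for each of the twelve steps in the octave, how far 2^(n/12) is from the best nearby fraction with a small denominator. Only the octave itself (n = 12) comes out as an exact simple ratio.

```python
from fractions import Fraction

# The half-step ratio 2^(1/12) is irrational:
half_step = 2 ** (1 / 12)
print(half_step)  # 1.0594630943592953...

for n in range(1, 13):
    ratio = 2 ** (n / 12)
    # Best rational approximation with denominator at most 16:
    approx = Fraction(ratio).limit_denominator(16)
    error = abs(ratio - float(approx))
    print(f"2^({n}/12) = {ratio:.4f} ≈ {approx} (off by {error:.4f})")
```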

So – I said it already – Pythagoras was wrong—not only in this but also in other regards, such as when he espoused his views on the solar system. Again, I am sorry to have to say that, but it is what it is: the Pythagoreans did seem to prefer mathematical ideas over physical experiment. 🙂 Having said that, musicians obviously didn’t know about any alternative to Pythagoras, and they had surely never heard about logarithmic scales at the time. So… Well… They did use the so-called Pythagorean tuning system. To be precise, they tuned their instruments by equating the frequency ratio between the first and the fifth tone in the C scale (i.e. the C and G, as they did not include the C#, D# and F# semitones when counting) with the ratio 3/2, and then they used other so-called harmonic ratios for the notes in-between.

Now, the 3/2 ratio is actually almost correct, because the actual frequency ratio is 2^(7/12) (the C and G are seven half-steps apart, counting the semitones—not five!), and so that’s 1.4983, approximately. Now, that’s pretty close to 3/2 = 1.5, I’d say. 🙂 Using that approximation (which, I admit, is fairly accurate indeed), the tuning of the other strings would then also be done assuming certain ratios should be respected, like the ones below.
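The comparison is easy to make for the whole scale. The sketch below uses the textbook just-intonation ratios for the C major scale (the table in the original illustration may differ in detail) and sets them against the equal-temperament values 2^(n/12):

```python
# Textbook just-intonation ratios for the C major scale (the original
# illustration may have used a slightly different set), compared with
# the equal-temperament values 2^(n/12).
just_ratios = {
    "C": (1, 1), "D": (9, 8), "E": (5, 4), "F": (4, 3),
    "G": (3, 2), "A": (5, 3), "B": (15, 8), "C'": (2, 1),
}
half_steps = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11, "C'": 12}

for note, (p, q) in just_ratios.items():
    just = p / q
    tet = 2 ** (half_steps[note] / 12)
    print(f"{note:2}  just = {just:.4f}  12-TET = {tet:.4f}  diff = {just - tet:+.4f}")
```

The G row shows the 3/2 = 1.5 versus 2^(7/12) ≈ 1.4983 comparison from the text; the other notes are off by similarly small amounts.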

So it was all quite good. Having said that, good musicians, and some great mathematicians, felt something was wrong—if only because there were several so-called just intonation systems around (for an overview, check out the Wikipedia article on just intonation). More importantly, they felt it was quite difficult to transpose music using the Pythagorean tuning system. Transposing music amounts to changing the so-called key of a musical piece: what one does, basically, is moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. Today, transposing music is a piece of cake—Western music at least. But that’s only because all Western music is played on instruments that are tuned using that logarithmic scale (technically, it’s referred to as the 12-tone equal temperament (12-TET) system). When you’d use one of the Pythagorean systems for tuning, a transposed piece does not sound quite right.

The first mathematician who really seemed to know what was wrong (and, hence, who also knew what to do) was Simon Stevin, who wrote a manuscript based on the ’12th root of 2 principle’ around AD 1600. It shouldn’t surprise us: the thinking of this mathematician from Bruges would inspire John Napier’s work on logarithms. Unfortunately, while that manuscript describes the basic principles behind the 12-TET system, it didn’t get published (Stevin had to flee from Bruges to Holland, because he was Protestant and the Spanish rulers at the time didn’t like that). Hence, musicians, while not quite understanding the math (or the physics, I should say) behind their own music, kept trying other tuning systems, as they felt it made their music sound better indeed.

One of these ‘other systems’ is the so-called ‘well’ temperament, which you surely heard about, as it’s referred to in Bach’s famous composition, Das Wohltemperierte Klavier, which he finalized in the first half of the 18th century. What is that ‘well’ temperament really? Well… It is what it is: it’s one of those tuning systems which made musicians feel better about their music for a number of reasons, all of which are well described in the Wikipedia article on it. But the main reason is that the tuning system that Bach recommended was a great deal better when it came to playing the same piece in another key. However, it still wasn’t quite right, as it wasn’t the equal temperament system (i.e. the 12-TET system) that’s in place now (in the West at least—the Indian music scale, for instance, is still based on simple ratios).

Why do I mention this piece of Bach? The reason is simple: you probably heard of it because it’s one of the main reference points in a rather famous book: Gödel, Escher, Bach: An Eternal Golden Braid. If not, then just forget about it. I am mentioning it because one of my brothers loves it. It’s on artificial intelligence. I haven’t read it, but I must assume Bach’s masterpiece is analyzed there because of its structure, not because of the tuning system that one’s supposed to use when playing it. So… Well… I’d say: don’t make that composition any more mystical than it already is. 🙂 The ‘magic’ behind it is related to what I said about A4 being the ‘reference point’ in music: since we’re using a universal logarithmic scale now, there’s no such thing as an absolute reference point any more: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and also define how many steps we want to have in-between (so that’s 12—in Western music, that is), we get all the rest. That’s just how logarithms work.
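That’s really all there is to it: fix one reference pitch (conventionally A4 = 440 Hz), and the logarithmic scale produces every other note. A minimal sketch:

```python
# Fix one reference pitch (conventionally A4 = 440 Hz); every other note
# then follows from the logarithmic scale: n half-steps away means a
# factor of 2^(n/12).
A4 = 440.0

def pitch(n):
    # Frequency of the note n half-steps above (or below, for n < 0) A4.
    return A4 * 2 ** (n / 12)

print(pitch(12))           # one octave up (A5): 880.0 Hz
print(pitch(-12))          # one octave down (A3): 220.0 Hz
print(round(pitch(3), 2))  # three half-steps up (C5): 523.25 Hz
```

Pick a different reference pitch (say, the older A4 = 435 Hz) and all frequencies shift, but every interval, and hence the structure of the music, stays the same.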

So, in short, music is all about structure, i.e. it’s all about mathematical relations, and about mathematical relations only. Again, Pythagoras’ conclusions were wrong, but his intuition was right. And, of course, it’s his intuition that gave birth to science: the simple ‘models’ he made – of how notes are supposed to be related to each other, or of our solar system – were, obviously, just the start of it all. And what a great start it was! Looking back once again, it’s rather sad that conservative forces (such as the Church) often got in the way of progress. In fact, I suddenly wonder: if scientists had not been bothered by those conservative forces, could mankind have sent people to the Moon around the time that Charles V was born, i.e. around A.D. 1500 already? 🙂

Post scriptum: My example of the (lower) E and A guitar strings co-vibrating when playing the A major chord by striking the upper four strings only, is somewhat tricky. The (lower) E and A strings are associated with lower pitches, and we said overtones (i.e. the second, third, fourth, etc. harmonics) are multiples of the fundamental frequency. So why is it that the lower strings co-vibrate? The answer is easy: they oscillate at the higher frequencies only. If you have a guitar, just try it. The two strings you do not pluck do vibrate—and very visibly so, but you won’t hear the low fundamental frequencies that would come out of them if you were to strike them. In short, they resonate at the higher, shared frequencies only. 🙂

The example that Feynman gives is much more straightforward: his example mentions the lower C (or A, B, etc.) notes on a piano causing vibrations in the higher C strings (or the higher A, B, etc. strings respectively). For example, striking the C2 key (and, hence, the C2 string inside the piano) will make the (higher) C3 string vibrate too. But few of us have a grand piano at home, I guess. That’s why I prefer my guitar example. 🙂