The angular momentum and the magnetic moment in quantum mechanics

Feynman starts his Volume of Lectures on quantum mechanics (so that’s Volume III of the whole series) with the rules we already know, so that’s the ‘special’ math involving probability amplitudes, rather than probabilities. However, the particles in the introductory chapters are all so-called zero-spin particles, which means they are not supposed to have any angular momentum.

That’s OK, because it really makes it much easier to understand the basics of quantum math. However, real-life elementary particles do have angular momentum, and so we have to come to terms with it. Therefore, Feynman makes it very clear he expects all prospective readers of his third volume to first work their way through chapters 34 and 35 of the second volume, which discuss the angular momentum of elementary particles from both a classical as well as a quantum-mechanical perspective, and so that’s what we will do here.

Now, while the mentioned chapters are more generous with text than other textbooks on quantum mechanics I’ve looked at, the matter remains obscure. By way of introduction, Feynman writes the following:

“The behavior of matter on a small scale—as we have remarked many times—is different from anything that you are used to and is very strange indeed. Understanding of these matters comes very slowly, if at all. One never gets a comfortable feeling that these quantum-mechanical rules are ‘natural’. Of course they are, but they are not natural to our own experience at an ordinary level. The attitude that we are going to take with regard to this rule about angular momentum is quite different from many of the other things we have talked about. We are not going to try to ‘explain’ it but tell you what happens.”

So… Well… Let’s go for it. :-) When discussing electromagnetic radiation, we introduced the concept of atomic oscillators. It was a very useful model to help us understand. Now we’re going to introduce atomic magnets. The model is based on the classical idea of an electron orbiting around a proton. Of course, we know this classical idea is wrong: we don’t have nice circular electron orbitals, and our discussion on the radius of the electron in our previous post makes it clear that the idea of the electron itself is rather fuzzy. Nevertheless, the classical concepts used to analyze rotation are also used, mutatis mutandis – i.e. with necessary alterations – in quantum mechanics. In this post, we want to focus on these alterations, or modifications of the classical theory. In line with Feynman’s introductory remarks, we’ll focus on the how mainly—i.e. not on the why. So… Well… Let’s go for it. :-)

The basic idea is the following: an electron in a circular orbit is a circular current and, hence, it causes a magnetic field, i.e. a magnetic flux through the area of the loop—as illustrated below.


As such, we’ll have a magnetic (dipole) moment, and you may want to review my post(s) on that topic so as to ensure you understand what follows. The magnetic moment (μ) is the product of the current (I) and the area of the loop (i.e. π·r²), and its conventional direction is given by the μ vector in the illustration below, which also shows the other relevant scalar and/or vector quantities, such as the velocity v and the orbital angular momentum J. The orbital angular momentum is to be distinguished from the spin angular momentum, which results from the electron’s spin around its own axis, but the spin angular momentum – which is often referred to as the spin tout court – is not depicted below, and will only be discussed in a few minutes. Hence, the focus is first on the orbital angular momentum J and the related magnetic moment μ.

atomic magnet

The magnetic moment is the current times the area of the loop. As the velocity is constant, the current is just the electron charge q times the frequency of rotation. The frequency of rotation is, of course, the velocity (i.e. the distance traveled per second) divided by the circumference of the orbit (i.e. 2π·r). Hence, we write: I = (q·v)/(2π·r) and, therefore: μ = I·π·r² = [(q·v)/(2π·r)]·π·r² = q·v·r/2.
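
Just to get a feel for the magnitude, here is a quick numerical sketch (my own, not from the Lectures): if we plug Bohr-model values into μ = q·v·r/2 – the orbital speed v = α·c and the Bohr radius as the orbit radius, both of which are illustrative assumptions here – we get a moment of about 9.3×10−24 J/T, i.e. one Bohr magneton, a unit we’ll meet below.

```python
import math

# Sketch: magnetic moment of an electron in a circular orbit, mu = I*(pi*r^2) = q*v*r/2.
# The orbit radius (Bohr radius) and the speed (v = alpha*c) are illustrative assumptions.
q = 1.602176634e-19       # electron charge (C)
c = 2.99792458e8          # speed of light (m/s)
alpha = 0.0072973525693   # fine-structure constant
r = 0.529177e-10          # assumed orbit radius: the Bohr radius (m)
v = alpha * c             # assumed orbital speed (m/s)

I = q * v / (2 * math.pi * r)   # the 'loop current' (A)
mu = I * math.pi * r**2         # magnetic moment (J/T)
print(f"I = {I:.3e} A, mu = {mu:.3e} J/T, q*v*r/2 = {q * v * r / 2:.3e} J/T")
```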

The orbital angular momentum is the angular momentum we discussed in our post on gyroscopes. We denoted the angular momentum as L, and noted L could be calculated as the vector cross product of the position vector r and the momentum vector p, as shown in the animation below, which also shows the torque vector τ.

Torque_animation (1)

Unlike in the animation above, the angular momentum of the electron in circular orbit will remain constant, and its magnitude is equal to |J| = J = |r×p| = |r|·|p|·sinθ = r·p = r·m·v. One should note this is a non-relativistic formula, but as the relative velocity of an electron is v/c = α ≈ 0.0073 (see my post on the fine-structure constant if you wonder where this formula comes from), it’s OK to not include the Lorentz factor in our formulas for now.

Now, since μ and J are in the same direction in this case (both are perpendicular to the plane of the orbit), we can combine the J = r·m·v and μ = q·v·r/2 formulas to write:

μ = (qe/2m)·J or μ/J = (qe/2m) (electron orbit)

In other words, the ratio of the magnetic moment and the angular momentum depends on (1) the charge (which we’ll denote by qe as we’re talking about an electron here, so we can reserve the q symbol to cover the general case) and (2) the mass of the electron only—not on the velocity v nor on the radius r. It can be noted that the q/2m factor is often referred to as the gyromagnetic factor (not to be confused with the g-factor, which we’ll introduce shortly). It’s good to do a quick dimensional check of this relation: the magnetic moment is expressed in ampère (i.e. coulomb per second) times the loop area, so that’s (C/s)·m². On the right-hand side, we have the dimension of the gyromagnetic factor, which is C/kg, times the dimension of the angular momentum, which is m·kg·m/s, so we have the same units on both sides: C·m²/s, which is often written as joule per tesla (J/T): the joule is the energy unit (1 J = 1 N·m), and the tesla measures the strength of the magnetic field (1 T = 1 (N·s)/(C·m)). OK. So that works out.
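
If you want to check the numbers (we’ll use them again below), here is a minimal sketch of the qe/2m calculation; the charge and mass values are just the usual tabulated ones, nothing more.

```python
# Minimal check of the gyromagnetic factor q/2m for an electron.
q = 1.602176634e-19    # electron charge (C)
m = 9.1093837015e-31   # electron mass (kg)

gyromagnetic = q / (2 * m)   # in C/kg: multiply by an angular momentum (J·s) to get mu in J/T
print(f"q/2m = {gyromagnetic:.4e} C/kg")   # ~8.79e10 C/kg, i.e. 0.0879e12 C/kg
```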

So far, so good. The story is a little bit different for the spin angular momentum and the spin magnetic moment. The formula is the following:

μ = (qe/m)·J (electron spin)

This formula says that the μ/J ratio is twice what it is for the orbital motion of the electron. Why is that? Frankly, I don’t know. I’ll try to find out. There must be some similar derivation as the one we did for the orbital angular momentum but I haven’t looked for it. Feynman avoids the question altogether in his Lectures so I must assume the derivation is not so easy. Let’s just go with it for now.

Now, the total magnetic moment and angular momentum is obviously the sum of both, and we will also want to replace the proton by a full-blown nucleus, so we can calculate the magnetic moment and the angular momentum for a whole atomic system involving more than just one electron. In short, we’ll want to have a more general formula relating μ and J. The more general formula is written as:

μ = –g·(qe/2m)·J

Why the minus sign? Well… We need some convention, because we’ll have positive and negative charges and all that and… Well… It’s just convention. And why qe/2m instead of qe/m in the middle? Well… If we’d take qe/m, then g would be −1/2 for the orbital angular momentum, and the initial idea with g was that it would be some integer (we’ll quickly see that’s an idea only). So… Well… It’s just one more convention. For our example involving the spin angular momentum and the spin magnetic moment, g will obviously have to be –2 so as to yield that μ = (qe/m)·J formula.

OK. That’s clear enough. For electrons, the g-factor is referred to as the Landé g-factor. There is a similar g-factor for protons, which is referred to as the nuclear g-factor. In fact, there is a g-factor for neutrons too, despite the fact that they do not carry a net charge: the explanation for it must have something to do with the quarks that make up the neutron but that’s a complicated matter which we will not get into here. Of course, there is a g-factor for a whole atom or an atomic system, and it’s one of the numbers that is characteristic of the state of the atom.

Of course, we’re talking quantum mechanics and, therefore, J can only take on a finite number of values. That should not surprise us at all, because our discussion on the fine-structure constant (α) made it clear that the various radii of the electron, its velocity and its mass and/or energy are all related one to another and, hence, they can only take on certain values. Indeed, of all the relations we discussed, there are two you should always remember. The first relationship is U = e²/r = α/r (with everything expressed in Planck units). So that links the energy (which we can express in equivalent mass units), the electron charge and its radius. The second thing you should remember is that the Bohr radius (r) and the classical electron radius (re) are also related through α: re/r = α². These relationships suggest that the different values for J are associated with different orbitals, and they are: remember the fine-structure constant first popped up in Arnold Sommerfeld’s 1916 explanation of the atomic spectral lines!

So it all makes sense. In fact, as you’ll see in a moment, the whole thing is not unlike the quantum-mechanical explanation of the blackbody radiation problem, which it solves by assuming that the permitted energy levels (or states) are equally spaced and h·f apart, with f the frequency of the light that’s being absorbed and/or emitted. So the atom takes up energies only h·f at a time. Here we’ve got something similar. If we have an object with a given total angular momentum J in classical mechanics, then any of its components x, y or z, could take on any value from +J to −J. That’s not the case here. The rule is that the ‘system’ – the atom, the nucleus, or anything really – will have a characteristic number, which is referred to as the ‘spin’ of the system. It’s denoted by j, and any component of J (think of the z-direction, for example) can then take on only the following values:

Jz = j·ħ, (j−1)·ħ, (j−2)·ħ, …, −(j−2)·ħ, −(j−1)·ħ, −j·ħ

OK. So far so good. We sort of ‘get’ this – I hope! – but let’s pause for a moment and analyze this—just to make sure we ‘get’ this indeed. What’s being written here? What are those numbers? We know ħ: it’s the Planck constant h divided by 2π, so that’s something expressed in joule·second per radian. That makes sense, because angular momentum is also the product of the moment of inertia (I = r²·m, so that’s in m²·kg = N·m·s² units) and the angular velocity (Ω = dθ/dt, which is expressed in radians per second, and related to the tangential component of the velocity by the v = r·Ω formula), and so we can also express the angular momentum in N·m·s = J·s. In short, we’ve got the unit of action once more here. Of course, ħ = h/2π is even smaller than h: it’s about 1×10−34 J·s ≈ 6.6×10−16 eV·s. In order to get an idea of the order of magnitude, you may want to compare this with the energy of a photon, which is 1.6 to 3.2 eV in the visible light spectrum, but you should note that energy does not have the time dimension, and a second is an eternity in quantum physics, so the comparison is a bit tricky. So… Well… Let’s move on. What about those coefficients? What constraints are there?

Well… The constraint is that the difference between +j and −j must be some integer, so twice j must be an integer. That implies that the spin j is always an integer or a half-integer, depending on whether 2j is even or odd. Let’s do a few examples (with a small code sketch after them):

  1. A lithium (Li-7) nucleus has spin j = 3/2 and, therefore, the permitted values for the angular momentum around any axis (the z-axis, for example) are: 3/2, 3/2−1=1/2, 3/2−2=−1/2, and −3/2. [Note that the difference between +j and –j is 3, and each ‘step’ between those two levels is one, as we’d like it to be.]
  2. The nucleus of the much rarer Lithium-6 isotope is one of the few stable nuclei that has spin j = 1, so the permitted values are 1, 0 and −1. [So each step is ‘one’ again, and the total difference (between +j and –j) is 2.]
  3. An electron is a spin-1/2 particle, and so there are only two permitted values: +1/2 and −1/2. So there is just one ‘step’ and it’s equal to the whole difference between +j and –j. In fact, this is the most common situation, because we’ll be talking elementary fermions most of the time. In short, don’t take the ‘general’ formulas too seriously: the actual situations we’ll be analyzing are often quite simple.
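
To make the counting rule concrete, here is the small sketch announced above, just an illustration in Python, which generates the permitted Jz values, in units of ħ, for any given spin j:

```python
from fractions import Fraction

def permitted_jz(j):
    """Permitted values of Jz (in units of h-bar) for a system with spin j."""
    j = Fraction(j)
    return [j - k for k in range(int(2 * j) + 1)]

print([str(v) for v in permitted_jz(Fraction(3, 2))])  # Li-7:     ['3/2', '1/2', '-1/2', '-3/2']
print([str(v) for v in permitted_jz(1)])                # Li-6:     ['1', '0', '-1']
print([str(v) for v in permitted_jz(Fraction(1, 2))])   # electron: ['1/2', '-1/2']
```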

Indeed, we already know about the fundamental dichotomy between fermions and bosons. Fermions have half-integer spin, and all elementary fermions, such as protons, neutrons, electrons, neutrinos and quarks, are spin-1/2 particles. [Note that a proton and a neutron are, strictly speaking, not elementary, as their constituent parts are quarks.] Bosons have integer spin: the photon is a spin-one particle, and the helium nucleus (He-4) has zero spin, which gives rise to superfluidity when it’s cooled near the absolute zero point. So, in practice, we’ll be dealing almost exclusively with spin-0 and spin-1/2 particles and, occasionally, with spin-1 and spin-3/2 particles.

We now need to learn how to do a bit of math with all of this.

The magnetic energy of atoms

Before we start, we should, perhaps, relate the angular momentum to the magnetic moment once again. We can do that using the μ = (q/2m)·J and/or μ = (q/m)·J formulas (so those are the simple formulas for the orbital and spin angular momentum respectively) or, else, by using the more general μ = –g·(q/2m)·J formula.

Let’s use the simpler μ = (qe/2m)·J formula, which is the one for the orbital angular momentum. What’s qe/2m? It should be equal to 1.6×10−19 C divided by 2·9.1×10−31 kg, so that’s about 0.0879×1012  C/kg, or 0.0879×1012 (C·m)/(N·s2). Now we multiply by ħ/2 ≈ 0.527×10−34 J·s. We get something like 0.0463×10−22 m2·C/s or J/T. These numbers are ridiculously small, so they’re usually measured in terms of a so-called natural unit: the Bohr magneton, which I’ll explain in a moment but so here we’re interested in its value only, which is μB = 9.274×10−24 J/T. Hence, μ/μB = 0.5 = 1/2. What a nice number!

Hmm… This cannot be a coincidence… […] You’re right. It isn’t. To get the full picture, we need to include the spin angular momentum, so we also need to see what the μ = (q/m)·J formula will yield. That’s easy, of course, as it’s twice the value of (q/2m)·J, so μ/μB = 1, and so the total is equal to 3/2. So the magnetic moment of an electron has the same value (when expressed in terms of the Bohr magneton) as the spin (when expressed in terms of ħ). Now that’s just sweet!
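
If you want to redo that arithmetic, here is the sketch (my own check, using the same numbers as above and Jz = ħ/2 for the electron):

```python
# Check of the mu/mu_B ratios above, using J_z = hbar/2 for the electron.
q = 1.602176634e-19      # electron charge (C)
m = 9.1093837015e-31     # electron mass (kg)
hbar = 1.054571817e-34   # reduced Planck constant (J·s)
mu_B = 9.274e-24         # Bohr magneton (J/T)

mu_orbital = (q / (2 * m)) * (hbar / 2)   # mu = (q/2m)·J
mu_spin = (q / m) * (hbar / 2)            # mu = (q/m)·J
print(mu_orbital / mu_B, mu_spin / mu_B)  # ~0.5 and ~1.0
```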

Yes, it is. All our definitions and formulas were formulated so as to make it sweet. Having said that, we do have a tiny little problem. If we use the general μ = −g·(q/2m)·J to write the result we found for the spin of the electron only (so we’re not looking at the orbital momentum here), then we’d write: μ = 2·(q/2m)·J = (q/m)·J and, hence, the g-factor here is −2. Yes. We know that. You told me so already. What’s the issue? Well… The problem is: experiments reveal the actual value of g is not exactly −2: it’s −2.00231930436182(52) instead, with the last two digits (in brackets) the uncertainty in the current measurements. Just check it for yourself on the NIST website. :-) [Please do check it: it brings some realness to this discussion.]

Hmm…. The accuracy of the measurement suggests we should take it seriously, even if we’re talking a difference of 0.1% only. We should. It can be explained, of course: it’s something quantum-mechanical. However, we’ll talk about this later. As for now, just try to understand the basics here. It’s complicated enough already, and so we’ll stay away from the nitty-gritty as long as we can.

Let’s now get back to the magnetic energy of our atoms. From our discussion on the torque on a magnetic dipole in an external magnetic field, we know that our magnetic atoms will have some extra magnetic energy when placed in an external field. So now we have an external magnetic field B, and the formula we derived for the energy is:

Umag = −μ·B·cosθ = −μ·B

I won’t explain the whole thing once again, but it might help to visualize the situation, which we do below. The loop here is not circular but square, and it’s a current-carrying wire instead of an electron in orbit, but I hope you get the point.

Geometry 2

We need to choose some coordinate system to calculate stuff and so we’ll just choose our z-axis along the direction of the external magnetic field B so as to simplify those calculations. If we do that, we can just take the z-component of μ and then combine the interim result with our general μ = –g·(q/2m)·J formula, so we write:

Umag = −μz·B = g·(q/2m)·Jz·B

Now, we know that the maximum value of Jz is equal to j·ħ, and so the maximum value of Umag will be equal to g·(q/2m)·j·ħ·B. Let’s now simplify this expression by choosing some natural unit, and that’s the unit we introduced already above: the Bohr magneton. It’s equal to (qe·ħ)/(2me) and its value is μB ≈ 9.274×10−24 J/T. So we get the result we wanted, and that is:

Umag = g·μB·B·(Jz/ħ)

Let me make a few remarks here. First on that magneton: you should note there’s also something which is known as the nuclear magneton which, you guessed it, is calculated using the proton charge and the proton mass: μN = (qpħ)/(2mp) ≈ 5.05×10−27 J/T. My second remark is a question: what does that formula mean, really? Well… Let me quote Feynman on that. The formula basically says the following:

“The energy of an atomic system is changed when it is put in a magnetic field by an amount that is proportional to the field, and proportional to Jz. We say that the energy of an atomic system is ‘split’ into 2j + 1 ‘levels’ by a magnetic field. For instance, an atom whose energy is U0 outside a magnetic field and whose j is 3/2, will have four possible energies when placed in a field. We can show these energies by an energy-level diagram like that drawn below. Any particular atom can have only one of the four possible energies in any given field B. That is what quantum mechanics says about the behavior of an atomic system in a magnetic field.”

diagram 1

Of course, the simplest ‘atomic’ system is a single electron, which has spin 1/2 only (like most fermions really: the example in the diagram above, with spin 3/2, would be that Li-7 system or something similar). If the spin is 1/2, then there are only two energy levels, with Jz = ±ħ/2 and, as we mentioned already, the g-factor for an electron is −2 (again, the use of minus signs (or not) is quite confusing: I am sorry for that), and so our formula above becomes very simple:

Umag = ± μB·B

The graph above becomes the graph below, and we can now speak more loosely and say that the electron either has its spin ‘up’ (so that’s along the field), or ‘down’ (so that’s opposite the field).

diagram 2
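
Just to put a number on the splitting: for an electron in a field of, say, 1 tesla (an assumed value, purely for illustration), the gap between the two levels is 2·μB·B, i.e. of the order of a tenth of a milli-electronvolt, which is tiny compared to the eV-scale energies of atomic transitions. Here’s the one-line check:

```python
# Size of the U_mag = ±mu_B·B splitting for an assumed field of B = 1 T.
mu_B = 9.274e-24        # Bohr magneton (J/T)
eV = 1.602176634e-19    # joule per electronvolt
B = 1.0                 # assumed field strength (T)

split = 2 * mu_B * B
print(f"2*mu_B*B = {split:.3e} J = {split / eV * 1e6:.0f} micro-eV")   # ~116 micro-eV
```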

So… Well… That’s it for today, I think. This was (or should be) a big step forward. We’ve got all of the basics on that ‘magical’ spin number here, and so I hope it’s somewhat less ‘magical’ now. :-) Let me just copy the values of the g-factor for some elementary particles. It also shows how hard physicists have been trying to narrow down the uncertainty in the measurement. Quite impressive! The table comes from the Wikipedia article on it. I hope the explanations above will now enable you to read and understand that. :-)


Taking the magic out of God’s number: some additional reflections

In my previous post, I explained why the fine-structure constant α is not a ‘magical’ number, even if it relates all fundamental properties of the electron: its mass, its energy, its charge, its radius, its photon scattering cross-section (i.e. the Bohr radius, or the size of the atom really) and, finally, the coupling constant for photon-electron interactions. The key to such understanding of α was the model of an electron as a tiny ball of charge. As such, we have two energy formulas for it. One is the energy that’s needed to assemble the charge from infinitely dispersed infinitesimal charges, which we denoted as Uelec. The other formula is the energy of the field of the tiny ball of charge, which we denoted as Eelec.

The formula for Eelec is calculated using the formula for the field momentum of a moving charge and, using the m = E/c² mass-energy equivalence relationship, is equivalent to the electromagnetic mass. We went through the derivation in our previous post, so let me just jot down the result:

melec = (2/3)·e²/(a·c²), i.e. Eelec = melec·c² = (2/3)·(e²/a)

The second formula depends on what ball of charge we’re thinking of, because the formulas for a charged sphere and a spherical shell of charge are different: both have the same structure as the relationship above (so the energy is also proportional to the square of the electron charge and inversely proportional to the radius a), but the constant of proportionality is different. For a sphere of charge, we write:

Uelec = (3/5)·(e²/a)

For a spherical shell of charge we write:

Uelec = (1/2)·(e²/a)

To compare the formulas, you need to note that the square of the electron charge in the formula for the field energy is equal to e² = qe²/(4πε0) = ke·qe². So we multiply the square of the actual electron charge by the Coulomb constant ke = 1/(4πε0). As you can see, the three formulas have exactly the same form then. It’s just the proportionality constant that’s different: it’s 2/3, 3/5 and 1/2 respectively. It’s interesting to quickly reflect on the dimensions here: ke ≈ 9×10⁹ N·m²/C², so e² is expressed in N·m². That makes the units come out alright, as we divide by a (so that’s in meter) and so we get the energy in joule (which is newton·meter). In fact, now that we’re here, let’s quickly calculate the value of e²: it’s that ke·qe² product, so it’s equal to 2.3×10−28 N·m². We can quickly check this value because we know that the classical electron radius is equal to:

r0 = e²/(me·c²)

So we divide 2.3×10−28 N·m² by me·c² ≈ 8.2×10−14 J, so we get r0 ≈ 2.82×10−15 m. So we’re spot on! Why did I do this check? Not really to check what I wrote. It’s more to show what’s going on. We’ve got yet another formula relating the energy and the radius of an electron here, so now we have three. In fact we have more because the formula for Uelec depends on the finer details of our model for the electron (sphere versus shell, uniform versus non-uniform distribution):

  1. Eelec = (2/3)·(e²/a): This is the formula for the energy of the field, so we may call it the external energy.
  2. Uelec = (3/5)·(e²/a), or Uelec = (1/2)·(e²/a): This is the energy needed to assemble our electron, so we might, perhaps, call it its internal energy. The first formula assumes our electron is a uniformly charged sphere. The second assumes all charges sit on the surface of the sphere. If we drop the assumption of the charge having to be uniformly distributed, we’ll find yet another formula.
  3. me·c² = e²/r0: This is the energy associated with the so-called classical electron radius (r0) and the electron’s rest mass (me).
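
As a quick numerical sanity check on that third relation (the one we said is experimentally verified), here is a small sketch, just my own arithmetic, re-doing the e² = ke·qe² and r0 = e²/(me·c²) calculation from above:

```python
# Check: e^2 = k_e*q_e^2 and r0 = e^2/(m_e*c^2), the classical electron radius.
k_e = 8.9875517923e9     # Coulomb constant (N·m^2/C^2)
q_e = 1.602176634e-19    # electron charge (C)
m_e = 9.1093837015e-31   # electron mass (kg)
c = 2.99792458e8         # speed of light (m/s)

e2 = k_e * q_e**2           # ~2.31e-28 N·m^2
r0 = e2 / (m_e * c**2)      # ~2.82e-15 m
print(f"e^2 = {e2:.3e} N·m^2, r0 = {r0:.3e} m")
```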

In our previous posts, we assumed the last equation was the right one. Why? Because it’s the one that’s been verified experimentally. The discrepancies between the various proportionality coefficients – i.e. the difference between 2/3 and 1, basically – are to be explained because of the binding forces within the electron, without which the electron would just ‘explode’, as the French physicist and polymath Henri Poincaré famously put it. Indeed, if the electron is a little ball of negative charge, the repulsive forces between its parts should rip it apart. So we will not say anything more about this. You can have fun yourself by googling all the various theories that try to model these binding forces. [I may do the same some day, but now I’ve got other priorities: I want to move to Feynman’s third volume of Lectures, which is devoted to quantum physics only, so I look very much forward to that.]

In this post, I just wanted to reflect once more on what constants are really fundamental and what constants are somewhat less fundamental. From all that I wrote in my previous post, I concluded there were three truly fundamental ones:

  1. The fine-structure constant α, which is a dimensionless number.
  2. Planck’s constant h, whose dimension is joule·second, so that’s the dimension of action.
  3. The speed of light c, whose dimension is that of a velocity.

The three are related through the following expression:

α = e²/(ħ·c)

This is an interesting expression. Let’s first check its dimension. We already explained that e² is expressed in N·m². That’s rather strange, because it means the dimension of e itself is √N·m: what’s the square root of a force of one newton? In fact, to interpret the formula above, it’s probably better to re-write e² as e² = qe²/(4πε0) = ke·qe². That shows you how the electron charge and Coulomb’s constant are related. Of course, they are part and parcel of one and the same force law: Coulomb’s law. We don’t need anything else, except for relativity theory, because we need to explain the magnetic force as well—and that we can do because magnetism is just a relativistic effect. Think of the field momentum indeed: the magnetic field comes into play only when we start to move our electron. The relativity effect is captured by c in that formula for α above. As for ħ, ħ = h/2π comes with the E = h·f equation, which links us to the electron’s Compton wavelength λ through the de Broglie relation λ = h/p.

The point is: we should probably not look at α as a ‘fundamental physical constant’. It’s e2 that’s the third fundamental constant, besides h and c. Indeed, it’s from e2 that all the rest follows: the electron’s internal energy, its external energy, and its radius, and then all the rest by combining stuff with other stuff.

Now, we took the magic out of α by doing what we did in the previous posts, and that’s to combine stuff with other stuff, and so now you may think I am putting the magic back in with that formula for α, which seems to define α in terms of the three mentioned ‘fundamental’ constants. That’s not the case: this relation comes out of all of the other relationships we found, and so it’s nothing new really. It’s actually not a definition of α: it just does what it does, and that’s to relate α to the ‘fundamental’ physical constants behind.

So… No new magic. In fact, I want to close this post by taking away even more of the magic. If you read my previous post, I said that α was ‘God’s cut-off factor’ :-) ensuring our energy functions do not blow up, but I also said it was impossible to say why he chose 0.00729735256 as the cut-off factor. The question is actually easily answered by thinking about those two formulas we had for the internal and external energy respectively. Let’s re-write them in natural units and use, temporarily, two different subscripts for α, so we write:

  1. Eelec = αe/r0: This is the formula for the energy of the field.
  2. Uelec = αu/r0: This is the energy needed to assemble our electron.

Both energies are determined by the above-mentioned laws, i.e. Coulomb’s Law and the theory of relativity, so α has got nothing to do with that. However, both energies have to be the same, and so αe has to be equal to αu. In that sense, α is, quite simply, a proportionality constant that achieves that equality. Now that explains why we can derive α from the three other constants which, as mentioned above, are probably more fundamental. In fact, we’ve got only three degrees of freedom here, so if we choose c, h and e² as ‘fundamental’, then α isn’t fundamental any more.

The underlying deep question behind it all is why those two energies should be equal. Why would our electron have some internal energy if it’s elementary? The answer to that question is: because it has some non-zero radius, and it has some non-zero radius because we don’t want our formula for the field energy (or the field momentum) to blow up. Now, if it has some radius, then it has to have some internal energy.

You’ll say: that makes sense, but it doesn’t answer the question. Why would it have internal energy, with or without a zero radius? If an electron is an elementary particle, then it’s really elementary, isn’t it? And so then we shouldn’t try to ‘assemble’ it from an infinite number of infinitesimally small charges. You’re right, and here we can also note that the fact that the electron doesn’t blow up is firm evidence it’s very elementary indeed.

I should also note that Feynman actually doesn’t talk about the energy that’s needed to assemble a charge: he gets his Uelec = (1/2)·(e2/a) by calculating the external field energy for a spherical shell of charge, and he sticks to it—presumably because it’s the same field for a uniform or non-uniform sphere of charge. He only notes there has to be some radius because, if not, the formula he uses blows up, indeed. So – who knows? – perhaps he doesn’t quite believe that formula for the internal energy is relevant either.

So perhaps there is no internal energy indeed. Perhaps there’s just the energy of the field. So… Well… I can’t say much about this… Except… Well… Perhaps just one more thing. Let me note something that, I hope, you noticed as well: the ke·qe² product is the numerator in Coulomb’s Law itself. You also know that energy equals force times distance. So if we divide both sides by r0, we get Coulomb’s Law itself: Felec = ke·qe²/r0². The only thing is: what’s the distance? It’s one charge only, and there is no distance between one charge, is there? Well… Yes and no. I have been thinking that the requirement of the internal and external energies being equal resembles the statement that the forces between two charges are equal and opposite. That ties in with the idea of the internal energy itself: remember we were basically talking forces between infinitesimally small elements of charge within the electron itself? So r0 is, perhaps, some average distance or so. There must be some way of thinking of it like that. But… Well… Which one exactly?

This kind of reflection may not make sense. Who knows? I obviously need to think all of this through and so this post is, indeed, just a bunch of reflections for which I will have more time later—hopefully. :-) Perhaps we’re all just pushing the matter too far. Perhaps we should just accept that the external energy has that 2/3 factor but that the actual energy of the electron should also include the equivalent energy of some binding force that holds the electron together. Well… In any case. That’s all I am going to do on this extremely complicated matter. It’s time to move indeed! So the point to take home here is probably just this:

  1. When calculating the radius of an electron using classical theory, we get in trouble: not only do we find different radii, but the radii that we find do not respect the E = me·c² law. It’s only the me·c² = e²/r0 relation that’s relativistically correct.
  2. That suggests the electron also involves some non-electromagnetic mass or energy, related to what are referred to as ‘binding forces’ or ‘Poincaré stresses’, but which remain to be explained convincingly.
  3. All of this shouldn’t surprise us: for all we know, the electron is something fuzzy. :-)

So my next posts will focus on the ‘essentials’ preparing for Feynman’s Volume on quantum mechanics. Those ‘essentials’ will still involve some classical stuff but, as you will see, even more contradictions, that – hopefully! – will then be solved in the quantum-mechanical picture of it all. :-)

Taking the magic out of God’s number

I think the post scriptum to my previous post is interesting enough to separate it out as a piece of its own, so let me do that here. You’ll remember that we were trying to find some kind of a model for the electron, picturing it like a tiny little ball of charge, and then we just applied the classical energy formulas to it to see what comes out of it. The key formula is the integral that gives us the energy that goes into assembling a charge. It was the following thing:

U = (1/2)·∫∫ [ρ(1)·ρ(2)/(4πε0·r12)]·dV1·dV2

This is a double integral which we simplified in two stages, so we’re looking at an integral within an integral really, but we can substitute the integral over the ρ(2)·dV2 product by the formula we got for the potential, so we write that as Φ(1), and so the integral above becomes:

U = (1/2)·∫ ρ(1)·Φ(1)·dV1

Now, this integral integrates the ρ(1)·Φ(1)·dV1 product over all of space, so that’s over all points in space, and so we just dropped the index and wrote the whole thing as the integral of ρ·Φ·dV over all of space:

U = (1/2)·∫ ρ·Φ·dV

We then established that this integral was mathematically equivalent to the following equation:

U = (ε0/2)·∫ E·E·dV

So this integral is actually quite simple: it just integrates E·E = E² over all of space. The illustration below shows E as a function of the distance r for a sphere of radius R filled uniformly with charge.

uniform density

So the field (E) goes as r for r ≤ R and as 1/r² for r ≥ R. So, for r ≥ R, the integrand will have a (1/r²)² = 1/r⁴ factor in it. Now, you know that the integral of some function is the surface under the graph of that function. Look at the 1/r⁴ function below: it blows up as r goes from 1 to 0. That’s where the problem is: there needs to be some kind of cut-off, because that integral will effectively blow up when the radius of our little sphere of charge gets ‘too small’. So that makes it clear why it doesn’t make sense to use this formula to try to calculate the energy of a point charge. It just doesn’t make sense to do that.


In fact, the need for a ‘cut-off factor’ so as to ensure our energy function doesn’t ‘blow up’ is not specific to the exponent in the 1/r⁴ expression: the need is also there for any 1/rⁿ relation, as illustrated below. All 1/rⁿ functions have the same pivot point, as you can see from the simple illustration below. So, yes, we cannot go all the way to zero from there when integrating: we have to stop somewhere.
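
If you want to see the blow-up numerically rather than graphically, here is a small sketch (my own illustration, in arbitrary units): integrating the 1/r⁴ energy density over space, so with the r² factor from the volume element, from some lower limit a outwards gives a result that goes as 1/a, and that is exactly what explodes as a goes to zero.

```python
import numpy as np
from scipy.integrate import quad

# Field-energy integral in arbitrary units: integrand ~ (1/r^4) * r^2 = 1/r^2,
# integrated from a lower cut-off 'a' to infinity. The result goes as 1/a.
for a in (1.0, 0.1, 0.01, 0.001):
    value, _ = quad(lambda r: (1 / r**4) * r**2, a, np.inf)
    print(f"a = {a:<6}  integral ≈ {value:10.1f}   (1/a = {1 / a:.1f})")
```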

So what’s the ‘cut-off point’? What’s ‘too small’ a radius? Let’s look at the formula we got for our electron as a shell of charge (so the assumption here is that the charge is uniformly distributed on the surface of a sphere with radius a):

Uelec = (1/2)·(e²/a)

So we’ve got an even simpler formula here: it’s just a 1/r relation (with r = a in this formula), not 1/r⁴. Why is that? Well… It’s just the way the math turns out: we’re integrating the 1/r⁴ energy density over volumes, and the volume element brings in an r² factor, so we’re left integrating 1/r², and that gives us this simple inversely proportional relationship between U and r, i.e. a, in this case. :-) I copied the detail of Feynman’s calculation in my previous post, so you can double-check it. It’s quite wonderful, really. Look at it again: we have a very simple inversely proportional relationship between the radius of our electron and its energy as a sphere of charge. We could write it as:

Uelec = α/a, with α = e²/2

Still… We need the ‘cut-off point’. Also note that, as I pointed out, we don’t necessarily need to assume that the charge in our little ball of charge (i.e. our electron) sits on the surface only: if we’d assume it’s a uniformly charged sphere, we’d just get another constant of proportionality: our 1/2 factor would become a 3/5 factor, so we’d write: Uelec = (3/5)·e²/a. But we’re not interested in finding the right model here. We know the Uelec = (3/5)·e²/a formula gives us a value for a that differs from the classical electron radius by a 2/5 factor. That’s not so bad and so let’s go along with it. :-)

We’re going to look at the simple structure of this relation, and all of its implications. The simple equation above says that the energy of our electron is (a) proportional to the square of its charge and (b) inversely proportional to its radius. Now, that is a very remarkable result. In fact, we’ve seen something like this before, and we were astonished. We saw it when we were discussing the wonderful properties of that magical number, the fine-structure constant, which we also denoted by α. However, because we used α already, I’ll denote the fine-structure constant as αe here, so you don’t get confused. You’ll remember that the fine-structure constant is a God-like number indeed: it links all of the fundamental properties of the electron, i.e. its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy), its de Broglie wavelength. Whatever: all these physical constants are all related through the fine-structure constant. 

In my various posts on this topic, I’ve repeatedly said that, but I never showed why it’s true, and so it was a very magical number indeed. I am going to take some of the magic out now. Not too much but… Well… You can judge for yourself how much of the magic remains after I am done here. :-)

So, at this stage of the argument, α can be anything, and αe cannot, of course. It’s just that magical number out there, which relates everything to everything: it’s the God-given number we don’t understand, or didn’t understand, I should say. Past tense. Indeed, we’re going to get some understanding here because we know that one of the many expressions involving αe was the following one:

me = αe/re

This says that the mass of the electron is equal to the ratio of the fine-structure constant and the electron radius. [Note that we express everything in natural units here, so that’s Planck units. For the detail of the conversion, please see the relevant section in one of my posts on this and other stuff.] In fact, the U = (3/5)·e²/a and me = αe/re relations look exactly the same, because one of the other equations involving the fine-structure constant was: αe = eP². So we’ve got the square of the charge here as well! Indeed, as I’ll explain in a moment, the difference between the two formulas is just a matter of units.

Now, mass is equivalent to energy, of course: it’s just a matter of units, so we can equate me with Ee (this amounts to expressing the energy of the electron in a kg unit—bit weird, but OK) and so we get:

Ee = αe/re

So there we have: the fine-structure constant αe is Nature’s ‘cut-off’ factor, so to speak. Why? Only God knows. :-) But it’s now (fairly) easy to see why all the relations involving αe are what they are. As I mentioned already, we also know that αe is the square of the electron charge expressed in Planck units, so we have:

αe = eP² and, therefore, Ee = eP²/re

Now, you can check for yourself: it’s just a matter of re-expressing everything in standard SI units, and relating eP² to e², and it should all work: you should get the Eelec = (2/3)·e²/a expression. So… Well… At least this takes some of the magic out of the fine-structure constant. It’s still a wonderful thing, but so you see that the fundamental relationship between (a) the energy (and, hence, the mass), (b) the radius and (c) the charge of an electron is not something God-given. What’s God-given are Maxwell’s equations, and so the Ee = αe/re = eP²/re relation is just one of the many wonderful things that you can get out of them.

So we found God’s ‘cut-off factor’. :-) It’s equal to αe ≈ 0.0073 = 7.3×10−3. So 7.3 thousandths of… What? Well… Nothing. It’s just a pure ratio between the energy and the radius of an electron (if both are expressed in Planck units, of course). And so it determines the electron charge (again, expressed in Planck units). Indeed, we write:

eP = √αe

Really? Yes. Just do all these formulas:

eP = √αe ≈ √0.0073, so the electron charge in SI units is this number times the Planck charge: qe ≈ √0.0073 × 1.9×10−18 coulomb ≈ 1.6×10−19 C

Just re-check it with all the known decimals: you’ll see it’s bang on. Let’s look at the Ee = me = αe/re ratio once again. What’s the meaning of it? Let’s first calculate the value of re and me, i.e. the electron radius and electron mass expressed in Planck units. The former is the classical electron radius divided by the Planck length, and the latter is the electron mass divided by the Planck mass, so we get the following:

re ≈ (2.81794×10−15 m)/(1.6162×10−35 m) = 1.7435×1020 

me ≈ (9.1×10−31 kg)/(2.17651×10−8 kg) = 4.18×10−23

αe = (4.18×10−23)·(1.7435×1020) ≈ 0.0073

It works like a charm, but what does it mean? Well… It’s just a relation between two physical quantities, and the scale you use to measure those quantities matters very much. We’ve explained that the Planck mass is a rather large unit at the atomic scale and, therefore, it’s perhaps not quite appropriate to use it here. In fact, out of the many interesting expressions for αe, I should highlight the following one:

αe = e2/(ħ·c) ≈ (1.60217662×10−19 C)2/(4πε0·[(1.054572×10−34 N·m·s)·(2.998×108 m/s)]) ≈ 0.0073 once more :-)

Note that e² is actually equal to qe²/(4πε0), which is what I am using in the formula. I know that’s confusing, but it is what it is. As for the units, it’s a bit tedious to write it all out, but you’ll get there. Note that ε0 ≈ 8.8542×10−12 C²/(N·m²) so… Well… All the units do cancel out, and we get a dimensionless number indeed, which is what αe is.
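
Here is that calculation as a quick sketch, in case you want to check the units-cancelling claim numerically (the constants are just the standard tabulated values):

```python
import math

# alpha = q_e^2 / (4*pi*eps0*hbar*c), a dimensionless number.
q_e = 1.602176634e-19     # C
eps0 = 8.8541878128e-12   # C^2/(N·m^2)
hbar = 1.054571817e-34    # J·s
c = 2.99792458e8          # m/s

alpha = q_e**2 / (4 * math.pi * eps0 * hbar * c)
print(f"alpha = {alpha:.10f}, 1/alpha = {1 / alpha:.3f}")   # ~0.0072973526 and ~137.036
```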

The point is: this expression links αe to the de Broglie relation (p = h/λ), with λ the wavelength that’s associated with the electron. Of course, because of the Uncertainty Principle, we know we’re talking some wavelength range really, so we should write the de Broglie relation as Δp = h·Δ(1/λ). Now, that, in turn, allows us to try to work out the Bohr radius, which is the other ‘dimension’ we associate with an electron. Of course, now you’ll say: why would you do that? Why would you bring in the de Broglie relation here?

Well… We’re talking energy, and so we have the Planck-Einstein relation first: the energy of some particle can always be written as the product of h and some frequency f: E = h·f. The only thing the de Broglie relation adds is the Uncertainty Principle indeed: the frequency will be some frequency range, associated with some momentum range, and so that’s what the Uncertainty Principle really says. I can’t dwell too much on that here, because otherwise this post would become a book. :-) For more detail, you can check out one of my many posts on the Uncertainty Principle. In fact, the one I am referring to here has Feynman’s calculation of the Bohr radius, so I warmly recommend you check it out. The thrust of the argument is as follows:

  1. If we assume that (a) an electron takes some space – which I’ll denote by r :-) – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, let’s just boldly write r ≈ ħ/p and see what we can do with that.
  2. We know that the kinetic energy of our electron equals m·v²/2, which we can write as p²/2m so we get rid of the velocity factor. Well… Substituting our p ≈ ħ/r conjecture, we get K.E. = ħ²/(2m·r²). So that’s a formula for the kinetic energy. Next is potential.
  3. The formula for the potential energy is U = q1q2/4πε0r12. Now, we’re actually talking about the size of an atom here, so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. = –e2/4πε0r, with r the ‘distance’ between the proton and the electron—so that’s the Bohr radius we’re looking for!
  4. We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as E = K.E. + P.E. = ħ²/(2m·r²) – e²/(4πε0·r). Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t or couldn’t exist. :-) Hence, it’s in some kind of energy ‘well’ indeed, at the bottom. Such equilibrium point ‘at the bottom’ is characterized by its derivative (with respect to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields: dE/dr = –ħ²/(m·r³) + e²/(4πε0·r²) = 0 ⇔ r = 4πε0·ħ²/(m·e²)
  5. We can now put the values in: r = 4πε0·ħ²/(m·e²) = [(1/(9×10⁹)) C²/(N·m²)·(1.055×10–34 J·s)²]/[(9.1×10–31 kg)·(1.6×10–19 C)²] ≈ 53×10–12 m = 53 pico-meter (pm)

Done. We’re right on the spot. The Bohr radius is, effectively, about 53 trillionths of a meter indeed!
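
Here is the same arithmetic as a short sketch, with a bonus line checking the re/r = α² relation we will use again in a moment:

```python
import math

# Bohr radius r = 4*pi*eps0*hbar^2/(m_e*q_e^2), plus a check that r_e/r ~ alpha^2.
eps0 = 8.8541878128e-12   # C^2/(N·m^2)
hbar = 1.054571817e-34    # J·s
m_e = 9.1093837015e-31    # kg
q_e = 1.602176634e-19     # C
c = 2.99792458e8          # m/s
r_e = 2.8179403262e-15    # classical electron radius (m)

r_bohr = 4 * math.pi * eps0 * hbar**2 / (m_e * q_e**2)
alpha = q_e**2 / (4 * math.pi * eps0 * hbar * c)
print(f"Bohr radius ≈ {r_bohr * 1e12:.1f} pm")                       # ~52.9 pm
print(f"r_e/r_bohr = {r_e / r_bohr:.4e}, alpha^2 = {alpha**2:.4e}")  # both ~5.3e-5
```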


Yes… I know… Relax. We’re almost done. You should now be able to figure out why the classical electron radius and the Bohr radius can also be related to each other through the fine-structure constant. We write:

me = α/re = α/(α²·r) = 1/(α·r)

So we get that α/re = 1/(α·r) and, therefore, we get re/r = α², which explains why α is also equal to the so-called junction number, or the coupling constant, for an electron-photon coupling (see my post on the quantum-mechanical aspects of the photon-electron interaction). It gives a physical meaning to the probability (which, as you know, is the absolute square of the probability amplitude) in terms of the chance of a photon actually ‘hitting’ the electron as it goes through the atom. Indeed, the ratio of the Thomson scattering cross-section and the Bohr size of the atom should be of the same order as re/r, and so that’s α².

[Note: To be fully correct and complete, I should add that the coupling constant itself is not α² but √α = eP. Why do we have this square root? You’re right: the fact that the probability is the absolute square of the amplitude explains one square root (√(α²) = α), but not two. The thing is: the photon-electron interaction consists of two things. First, the electron sort of ‘absorbs’ the photon, and then it emits another one, that has the same or a different frequency depending on whether the ‘collision’ was elastic or not. So if we denote the coupling constant as j, then the whole interaction will have a probability amplitude equal to j². In fact, the value which Feynman uses in his wonderful popular presentation of quantum mechanics (The Strange Theory of Light and Matter), is −α ≈ −0.0073. I am not quite sure why the minus sign is there. It must be something with the angles involved (the emitted photon will not be following the trajectory of the incoming photon) or, else, with the special arithmetic involved in boson-fermion interactions (we add amplitudes when bosons are involved, but subtract amplitudes when it’s fermions interacting). I’ll probably find out once I am through Feynman’s third volume of Lectures, which focuses on quantum mechanics only.]

Finally, the last bit of unexplained ‘magic’ in the fine-structure constant is that the fine-structure constant (which I’ve started to write as α again, instead of αe) also gives us the (classical) relative speed of an electron, so that’s its speed as it orbits around the nucleus (according to the classical theory, that is), so we write

α = v/c = β

I should go through the motions here – I’ll probably do so in the coming days – but you can see we must be able to get it out somehow from all that we wrote above. See how powerful our Uelec ∼ e²/a relation really is? It links the electron’s charge, its radius and its energy, and it’s all we need to get all the rest out of it: its mass, its momentum, its speed and – through the Uncertainty Principle – the Bohr radius, which is the size of the atom.

We’ve come a long way. This is truly a milestone. We’ve taken the magic out of God’s number—to some extent at least. :-)

You’ll have one last question, of course: if proportionality constants are all about the scale in which we measure the physical quantities on either side of an equation, is there some way the fine-structure constant would come out differently? That’s the same as asking: what if we’d measure energy in units that are equivalent to the energy of an electron, and the radius of our electron just as… Well… What if we’d equate our unit of distance with the radius of the electron, so we’d write re = 1? What would happen to α? Well… I’ll let you figure that one out yourself. I am tired and so I should go to bed now. :-)

[…] OK. OK. Let me tell you. It’s not that simple here. All those relationships involving α, in one form or the other, are very deep. They relate a lot of stuff to a lot of stuff, and we can appreciate that only when doing a dimensional analysis. A dimensional analysis of the Ee = αe/re = eP²/re relation yields [eP²/re] = C²/m on the right-hand side and [Ee] = J = N·m on the left-hand side. How can we reconcile both? The coulomb is an SI base unit, so we can’t ‘translate’ it into something with N and m. [To be fully correct, for some reason, the ampère (i.e. coulomb per second) was chosen as an SI base unit, but they’re interchangeable in regard to their place in the international system of units: they can’t be reduced.] So we’ve got a problem. Yes. That’s where we sort of ‘smuggled’ the 4πε0 factor in when doing our calculations above. That ε0 constant is, obviously, not ‘as fundamental’ as c or α (just think of the c−2 = ε0μ0 relationship to understand what I mean here) but, still, it was necessary to make the dimensions come out alright: we need the reciprocal dimension of ε0, i.e. (N·m²)/C², to make the dimensional analysis work. We get: (C²/m)·(N·m²)/C² = N·m = J, i.e. joule, so that’s the unit in which we measure energy or – using the E = mc² equivalence – mass, which is the aspect of energy emphasizing its inertia.

So the answer is: no. Changing units won’t change alpha. So all that’s left is to play with it now. Let’s try to do that. Let me first plot that Ee = me = αe/re = 0.00729735256/re relation:

Unsurprisingly, we find the pivot point of this curve is at the intersection of the diagonal and the curve itself, so that’s at the (0.00729735256, 0.00729735256) point, where slopes are ±1, i.e. plus or minus unity. What does this show? Nothing much. What? I can hear you: I should be excited because… Well… Yes! Think of it. If you would have to choose a cut-off point, you’d choose this one, wouldn’t you? :-) Sure, you’re right. How exciting! Let me show you. Look at it! It proves that God thinks in terms of logarithms. He has chosen α such that ln(E) = ln(α/r) = lnα – lnr = 0, so lnα = lnr and, therefore, α = r. :-)

Huh? Excuse me?

I am sorry. […] Well… I am not, of course… :-) I just wanted to illustrate the kind of exercise some people are tempted to do. It’s no use. The fine-structure constant is what it is: it sort of summarizes an awful lot of formulas. It basically shows what Maxwell’s equations imply in terms of the structure of an atom defined as a negative charge orbiting around some positive charge. It shows we can calculate everything as a function of something else, and that’s what the fine-structure constant tells us: it relates everything to everything. However, when everything is said and done, the fine-structure constant shows us two things:

  1. Maxwell’s equations are complete: we can construct a complete model of the electron and the atom, which includes: the electron’s energy and mass, its velocity, its own radius, and the radius of the atom. [I might have forgotten one of the dimensions here, but you’ll add it. :-)]
  2. God doesn’t want our equations to blow up. Our equations are all correct but, in reality, there’s a cut-off factor that ensures we don’t go to the limit with them.

So the fine-structure constant anchors our world, so to speak. In other words: of all the worlds that are possible, we live in this one.

[…] It’s pretty good as far as I am concerned. Isn’t it amazing that our mind is able to just grasp things like that? I know my approach here is pretty intuitive, and with ‘intuitive’, I mean ‘not scientific’ here. :-) Frankly, I don’t like the talk about physicists “looking into God’s mind.” I don’t think that’s what they’re trying to do. I think they’re just trying to understand the fundamental unity behind it all. And that’s religion enough for me. :-)

So… What’s the conclusion? Nothing much. We’ve sort of concluded our description of the classical world… Well… Of its ‘electromagnetic sector’ at least. :-) That sector can be summarized in Maxwell’s equations, which describe an infinite world of possible worlds. However, God fixed three constants: h, c and α. So we live in a world that’s defined by this Trinity of fundamental physical constants. Why is it not two, or four?

My gut instinct tells me it’s because we live in three dimensions, and so there are three degrees of freedom really. But what about time? Time is the fourth dimension, isn’t it? Yes. But time is symmetric in the ‘electromagnetic’ sector: we can reverse the arrow of time in our equations and everything still works. The arrow of time involves other theories: statistics (physicists refer to it as ‘statistical mechanics’) and the ‘weak force’ sector, which I discussed when talking about symmetries in physics. So… Well… We’re not done. God gave us plenty of other stuff to try to understand. :-)

The classical explanation for the electron’s mass and radius

Feynman’s 28th Lecture in his series on electromagnetism is one of the more interesting ones but, at the same time, it’s one of the few Lectures that is clearly (out)dated. In essence, it talks about the difficulties involved in applying Maxwell’s equations to the elementary charges themselves, i.e. the electron and the proton. We already signaled some of these problems in previous posts. For example, in our post on the energy in electrostatic fields, we showed how our formulas for the field energy and/or the potential of a charge blow up when we use them to calculate the energy we’d need to assemble a point charge. What comes out is infinity: ∞. So our formulas tell us we’d need an infinite amount of energy to assemble a point charge.

Well… That’s no surprise, is it? The idea itself is impossible: how can one have a finite amount of charge in something that’s infinitely small? Something that has no size whatsoever? It’s pretty obvious we get some division by zero there. :-) The mathematical approach is often inconsistent. Indeed, a lot of blah-blah in physics is obviously just about applying formulas to situations that are clearly not within the relevant area of application of the formula. So that’s why I went through the trouble (in my previous post, that is) of explaining you how we get these energy and potential formulas, and that’s by bringing charges (note the plural) together. Now, we may assume these charges are point charges, but that assumption is not so essential. What I tried to say when being so explicit was the following: yes, a charge causes a field, but the idea of a potential makes sense only when we’re thinking of placing some other charge in that field. So point charges with ‘infinite energy’ should not be a problem. Feynman admits as much when he writes:

“If the energy can’t get out, but must stay there forever, is there any real difficulty with an infinite energy? Of course, a quantity that comes out infinite may be annoying, but what really matters is only whether there are any observable physical effects.”

So… Well… Let’s see. There’s another, more interesting, way to look at an electron: let’s have a look at the field it creates. An electron – stationary or moving – will create a field in Maxwell’s world, which we know inside out now. So let’s just calculate it. In fact, Feynman calculates it for the unit charge (+1), so that’s a positron. It eases the analysis because we don’t have to drag any minus sign along. So how does it work? Well…

We’ll have an energy flux density vector – i.e. the Poynting vector S – as well as a momentum density vector g all over space. Both are related through the g = S/c2 equation which, as I explained in my previous post, is probably best written as cg = S/c, because we’ve got units then, on both sides, that we can readily understand, like N/m2 (so that’s force per unit area) or J/m3 (so that’s energy per unit volume). On the other hand, we’ll need something that’s written as a function of the velocity of our positron, so that’s v, and so it’s probably best to just calculate g, the momentum, which is measured in N·s or kg·(m/s2)·s (both are equivalent units for the momentum p = mv, indeed) per unit volume (so we need to add a 1/ m3 to the unit). So we’ll have some integral all over space, but I won’t bother you with it. Why not? Well… Feynman uses a rather particular volume element to solve the integral, and so I want you to focus on the solution. The geometry of the situation, and the solution for g, i.e. the momentum of the field per unit volume, is what matters here.

So let’s look at that geometry. It’s depicted below. We’ve got a radial electric field—a Coulomb field really, because our charge is moving at a non-relativistic speed, so v << c and we can approximate with a Coulomb field indeed. Maxwell’s equations imply that B = v×E/c², so g = ε0·E×B is what it is in the illustration below. Note that we’d have to reverse the direction of both E and B for an electron (because it’s negative), but g would be the same. It is directed obliquely toward the line of motion and its magnitude is g = (ε0·v/c²)·E²·sinθ. Don’t worry about it: Feynman integrates this thing for you. :-) It’s not that difficult, but still… To solve it, he uses the fact that the fields are symmetric about the line of motion, which is indicated by the little arrow around the v-axis, with the Φ symbol next to it (it symbolizes the potential). [The ‘rather particular volume element’ is a ring around the v-axis, and it’s because of this symmetry that Feynman picks the ring. Feynman’s Lectures are not only great to learn physics: they’re a treasure trove of mathematical tricks too. :-)]

[Illustration: the geometry of E, B and the momentum density g for a charge moving along the v-axis]

As said, I don’t want to bother you with the technicalities of the integral here. This is the result:


What does this say? It says that the momentum of the field – i.e. the electromagnetic momentum, integrated over all of space – is proportional to the velocity v of our charge. That makes sense: when v = 0, we'll have an electrostatic field all over space and, hence, some inertia, but it's only when we try to move our charge that Newton's Law comes into play: then we'll need some force to overcome that inertia. It all works through the Poynting formula: S = E×B/μ0. If nothing's moving, then B = 0, and so we'll have some E and, therefore, we'll have field energy alright, but the energy flow will be zero. But when we move the charge, we're moving the field, and so then B ≠ 0 and so it's through B that the E in our S equation starts kicking in. Does that make sense? Think about it: it's good to try to visualize things in your mind. :-)
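In case you do want to see the integral being checked, here's a little sympy sketch (my own reconstruction of the calculation, so the symbols are mine): we integrate the component of g along the line of motion, i.e. (ε0v/c2)·E2·sin2θ, over Feynman's ring-shaped volume elements, with a cut-off at r = a, the radius of the charge.

```python
from sympy import symbols, sin, pi, integrate, oo, simplify

# all symbols positive, so sympy can evaluate the improper integral
q, eps0, v, c, a, r, theta = symbols('q epsilon_0 v c a r theta', positive=True)

E = q / (4 * pi * eps0 * r**2)                         # the Coulomb field of the charge
g_along_v = (eps0 * v / c**2) * E**2 * sin(theta)**2   # component of g along the line of motion
ring = 2 * pi * r**2 * sin(theta)                      # the ring-shaped volume element (per dr·dθ)

p = integrate(g_along_v * ring, (theta, 0, pi), (r, a, oo))
print(simplify(p))   # should print q**2*v/(6*pi*a*c**2*epsilon_0), i.e. (2/3)·(e2/(a·c2))·v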

The constants in the proportionality constant (2e2)/(3ac2) of our pv formula above are:

  • e2 = qe2/(4πε0), with qe the electron charge (without the minus sign) and ε0 our ubiquitous electric constant. [Note that, unlike Feynman, I prefer to not write e in italics, so as to not confuse it with Euler's number e ≈ 2.71828 etc. However, I know I am not always consistent in my notation. :-/ We don't need Euler's number in this post, so e or e2 always refers to the electron charge, not Euler's number. Stupid remark, perhaps, but I don't want you to be confused.]
  • a is the radius of our charge—see we got away from the idea of a point charge? :-)
  • c2 is just c2, i.e. our weird constant (the square of the speed of light) which seems to connect everything to everything. Indeed, think about stuff like this: S/g = c2 = 1/(ε0μ0).

Now, p = mv, so that formula for p basically says that our elementary charge (as mentioned, g is the same for a positron or an electron: E and B will be reversed, but g is not) has an electromagnetic mass melec equal to:

melec = (2/3)·e2/(a·c2)

That’s an amazing result. We don’t need to give our electron any rest mass: just its charge and its movement will do! Super! So we don’t need any Higgs fields here! :-) The electromagnetic field will do!

Well… Maybe. Let’s explore what we’ve got here.

First, let's compare that radius a in our formula to what's found in experiments. Huh? Did someone ever try to measure the electron radius? Of course. There are all these scattering experiments in which electrons get fired at atoms. They can fly through or, else, hit something. Therefore, one can do some statistical analysis and determine what is referred to as a cross-section. A cross-section is denoted by the same symbol as the standard deviation: σ (sigma). In any case… So there's something that's referred to as the classical electron radius, and it's equal to the so-called Thomson scattering length. Thomson scattering, as opposed to Compton scattering, is elastic scattering, so it preserves kinetic energy (unlike Compton scattering, where energy gets absorbed and the frequency changes). So… Well… I won't go into too much detail but, yes, this is the electron radius we need. [I am saying this rather explicitly because there are two other numbers around: the so-called Bohr radius and, as you might imagine, the Compton scattering cross-section.]

The Thomson scattering length is 2.82 femtometer (so that's 2.82×10−15 m), more or less that is :-), and it's usually related to the observed electron mass me through the fine-structure constant α. In fact, using Planck units, we can write: re·me = α, which is an amazing formula but, unfortunately, I can't dwell on it here. Using ordinary m, s, C and what have you units, we can write re as:

re = e2/(me·c2) = qe2/(4πε0·me·c2) ≈ 2.82×10−15 m

That's good, because if we equate me and melec and solve our formula for melec for the radius a, we get:

a = (2/3)·e2/(me·c2) = (2/3)·re
So, frankly, we’re spot on! Well… Almost. The two numbers differ by 1/3. But who cares about a 1/3 factor indeed? We’re talking rather fuzzy stuff here – scattering cross-sections and standard deviations and all that – so… Yes. Well done! Our theory works!
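Here's the quick numerical check, for what it's worth: just scipy.constants and the formulas above (and note that scipy's e is the elementary charge, not Euler's number):

```python
from scipy.constants import e, epsilon_0, m_e, c, pi

e2 = e**2 / (4 * pi * epsilon_0)   # e2 = qe2/(4πε0), as defined above
r_e = e2 / (m_e * c**2)            # the classical electron (Thomson scattering) radius
a = (2 / 3) * e2 / (m_e * c**2)    # the radius we got from equating me and melec

print(r_e)      # ≈ 2.82e-15 m
print(a)        # ≈ 1.88e-15 m
print(a / r_e)  # ≈ 0.667, i.e. the 2/3 factor
```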

Well… Maybe. Physicists don’t think so. They think the 1/3 factor is an issue. It’s sad because it really makes a lot of sense. In fact, the Dutch physicist Hendrik Lorentz – whom we know so well by now :-) – had also worked out that, because of the length contraction effect, our spherical charge would contract into an ellipsoid and… Well… He worked it all out, and it was not a problem: he found that the momentum was altered by the factor (1−v2/c2)−1/2, so that’s the ubiquitous Lorentz factor γ! He got this formula in the 1890s already, so that’s long before the theory of relativity had been developed. So, many years before Planck and Einstein would come up with their stuff, Hendrik Antoon Lorentz had the correct formulas already: the mass, or everything really, all should vary with that γ-factor. :-)

Why bother about the 1/3 factor? [I should note it’s actually referred to as the 4/3 problem in physics.] Well… The critics do have a point: if we assume that (a) an electron is not a point charge – so if we allow it to have some radius a – and (b) that Maxwell’s Laws apply, then we should go all the way. The energy that’s needed to assemble an electron should then, effectively, be the same as the value we’d get out of those field energy formulas. So what do we get when we apply those formulas? Well… Let me quickly copy Feynman as he does the calculation for an electron, not looking at it as a point particle, but as a tiny shell of charge, i.e. a sphere with all charge sitting on the surface:

[Feynman's calculation of the energy that's needed to assemble a spherical shell of charge]

 Let me enlarge the formula:

Uelec = (1/2)·e2/a

Now, if we combine that with our formula for melec above, then we get:

Uelec = (3/4)·melec·c2

So that formula does not respect Einstein’s universal mass-energy equivalence formula E = mc2. Now, you will agree that we really want Einstein’s mass-energy equivalence relation to be respected by all, so our electron should respect it too. :-) So, yes, we’ve got a problem here, and it’s referred to as the 4/3 problem (yes, the ratio got turned around).

Now, you may think it got solved in the meanwhile. Well… No. It's still a bit of a puzzle today, and the current-day explanation is not really different from what the French scientist Henri Poincaré proposed as a 'solution' to the problem in the early 1900s. He basically told Lorentz the following: "If the electron is some little ball of charge, then it should explode because of the repulsive forces inside. So there should be some binding forces there, and so that energy explains the 'missing mass' of the electron." So these forces are effectively being referred to as Poincaré stresses, and the non-electromagnetic energy that's associated with them – which, of course, has to be equal to 1/3 of the electromagnetic energy (I am sure you see why) :-) – adds to the total energy and all is alright now. We get:

U = mc2 = (melec + mPoincaré)c2

So… Yes… Pretty ad hoc. Worse, according to the Wikipedia article on electromagnetic mass, that’s still where we are. And, no, don’t read Feynman’s overview of all of the theories that were around then (so that’s in the 1960s, or earlier). As I said, it’s the one Lecture you don’t want to waste time on. So I won’t do that either.

In fact, let me try to do something else here, and that's to de-construct the whole argument really. :-) Before I do so, let me highlight the essence of what was written above. It's quite amazing really. Think of it: we say that the mass of an electron – i.e. its inertia, or the proportionality factor in Newton's F = m·a law of motion – is the energy in the electric and magnetic field it causes. So the electron itself is just a hook for the force law, so to say. There's nothing there, except for the charge causing the field. But so its mass is everywhere and, hence, nowhere really. Well… I should correct that: the field strength falls off as 1/r2 and, hence, the energy and momentum density that's associated with it falls off as 1/r4, so it falls off very rapidly and so the bulk of the energy is pretty near the charge. :-)

[Note: You'll remember that the field that's associated with electromagnetic radiation falls off as 1/r, not as 1/r2, which is why there is an energy flux there that is never lost and can travel independently through space. It's not the same here, so don't get confused.]

So that’s something to note: the melec = (2c−2/3)·(e2/a) has the radius in it, but that radius is only the hook, so to say. That’s fine, because it is not inconsistent with the idea of the Thomson scattering cross-section, which is the area that one can hit. Now, you’ll wonder how one can hit an electron: you can readily imagine an electron beam aimed at nuclei, but how would one hit electrons? Well… You can shoot photons at them, and see if they bounce back elastically or non-elastically. The cross-section area that bounces them off elastically must be pretty ‘hard’, and the cross-section that deflects them non-elastically somewhat less so. :-)

OK… But… Yes? Hey! How did we get that electron radius in that formula? 

Good question! Brilliant, in fact! You're right: it's here that the whole argument falls apart really. We did a substitution. That radius a is the radius of a spherical shell of charge with an energy that's equal to Uelec = (1/2)·(e2/a), so there's another way of stating the inconsistency: the equivalent energy of melec = (2c−2/3)·(e2/a) is equal to E = melec·c2 = (2/3)·(e2/a) and that's not the same as Uelec = (1/2)·(e2/a). If we take the ratio of Uelec and melec·c2, we get the same factor: (1/2)/(2/3) = 3/4. But… Your question is superb! Look at it: putting it the way we put it reveals the inconsistency in the whole argument. We're mixing two things here:

  1. We first calculate the momentum density, and the momentum, that’s caused by the unit charge, so we get some energy which I’ll denote as Eelec = melec·c2
  2. Now, we then assume this energy must be equal to the energy that's needed to assemble the unit charge from an infinite number of infinitesimally small charges, thereby also assuming the unit charge is a spherical shell of charge with radius a.
  3. We then use this radius a to simplify our formula for Eelec = melec·c2

Now that is not kosher, really! First, it's (a) a lot of assumptions, both implicit as well as explicit, and then (b) it's, quite simply, not a legit mathematical procedure: calculating the energy in the field and calculating the energy we need to assemble a shell or sphere of charge of radius a are two very different things.

Well… Let me put it differently. We're using the same laws – it's all Maxwell's equations, really – but we should be clear about what we're doing with them, and those two things are very different. The legitimate conclusion must be that our a is wrong. In other words, we should not assume that our electron is a spherical shell of charge. So then what? Well… We could easily imagine something else, like a uniformly or even non-uniformly charged sphere. Indeed, if we're just filling empty space with infinitesimally small charge 'elements', then we may want to think the density at the 'center' will be much higher, like what's going on when planets form: the density of the inner core of our own planet Earth is more than four times the density of its surface material. [OK. Perhaps not very relevant here, but you get the idea.] Or, conversely, taking into account Poincaré's objection, we may want to think all of the charge will be on the surface, just like on a perfect conductor, where all charge is surface charge!

Note that the field outside of a uniformly charged sphere and the field of a spherical shell of charge is exactly the same, so we would not find a different number for Eelec = melec·c2, but we surely would find a different number for Uelec. You may want to look up some formulas here: you'll find that the energy of a uniformly distributed sphere of charge (so we do not assume that all of the charge sits on the surface here) is equal to (3/5)·(e2/a). So we'd already have much less of a problem, because the 3/4 factor in the Uelec = (3/4)·melec·c2 relation becomes a (3/5)/(2/3) = 9/10 factor. So now we have a discrepancy of some 10% only. :-)
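Just to check that bit of arithmetic (the numbers below are the shell energy, the sphere energy and melec·c2, all in units of e2/a):

```python
from fractions import Fraction

m_elec_c2 = Fraction(2, 3)   # melec·c2, in units of e2/a
U_shell = Fraction(1, 2)     # energy of a spherical shell of charge
U_sphere = Fraction(3, 5)    # energy of a uniformly charged sphere

print(U_shell / m_elec_c2)   # 3/4: the (in)famous 4/3 problem, turned around
print(U_sphere / m_elec_c2)  # 9/10: a discrepancy of some 10% only
```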

You’ll say: 10% is 10%. It’s huge in physics, as it’s supposed to be an exact science. Well… It is and it isn’t. Do you realize we haven’t even started to talk about stuff like spin? Indeed, in modern physics, we think of electrons as something that also spins around one or the other axis, so there’s energy there too, and we didn’t include that in our analysis.

In short, Feynman's approach here is disappointing. Naive even, but then… Well… Who knows? Perhaps he didn't do this Lecture himself. Perhaps it's just an assistant or so. In fact, I should wonder why there are still physicists wasting time on this! I should also note that naively comparing that radius a with the classical electron radius also makes little or no sense. The classical electron radius re and the Thomson scattering cross-section σT are not related like you might think they are, i.e. like σT = π·re2 or σT = π·(re/2)2 or σT = re2 or σT = π·(2·re)2 or whatever circular surface calculation rule that might make sense here. No. The Thomson scattering cross-section is equal to:

σT = (8π/3)·re2 = (2π/3)·(2·re)2 ≈ 66.5×10−30 m2 = 66.5 (fm)2

Why? I am not sure. I must assume it’s got to do with the standard deviation and all that. The point is, we’ve got a 2/3 factor here too, so do we have a problem really? I mean… The a we got was equal to a = (2/3)·re, wasn’t it? It was. But, unfortunately, it doesn’t mean anything. It’s just a coincidence. In fact, looking at the Thomson scattering cross-section, instead of the Thomson scattering radius, makes the ‘problem’ a little bit worse. Indeed, applying the π·r2 rule for a circular surface, we get that the radius would be equal to (8/3)1/2·re ≈ 1.633·re, so we get something that’s much larger rather than something that’s smaller here.
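Here's the number-crunching for the cross-section, in case you want to check it (the 'equivalent radius' at the end is just my π·r2 back-calculation):

```python
from math import pi, sqrt
from scipy.constants import physical_constants

r_e = physical_constants['classical electron radius'][0]   # ≈ 2.818e-15 m
sigma_T = (8 * pi / 3) * r_e**2                             # Thomson scattering cross-section

print(sigma_T)                    # ≈ 6.65e-29 m2, i.e. ≈ 66.5 fm2
print(sqrt(sigma_T / pi) / r_e)   # ≈ 1.633: the radius of a circle with area σT, in units of re
```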

In any case, it doesn't matter. The point is: this kind of comparison should not be taken too seriously. Indeed, when everything is said and done, we're comparing three very different things here:

  1. The radius that's associated with the energy that's needed to assemble our electron from infinitesimally small charges, and so that's based on Coulomb's law and the model we use for our electron: is it a shell or a sphere of charge? If it's a sphere, do we want to think of it as something of uniform or non-uniform density?
  2. The second radius is associated with the field of an electron, which we calculate using Poynting’s formula for the energy flow and/or the momentum density. So that’s not about the internal structure of the electron but, of course, it would be nice if we could find some model of an electron that matches this radius.
  3. Finally, there’s the radius that’s associated with elastic scattering, which is also referred to as hard scattering because it’s like the collision of two hard spheres indeed. But so that’s some value that has to be established experimentally and so it involves judicious choices because there’s probabilities and standard deviations involved.

So should we worry about the gaps between these three different concepts? In my humble opinion: no. Why? Because they're all damn close and so we're actually talking about the same thing. I mean: isn't it terrific that we've got a model that brings the first and the second radius together with a difference of 10% only? As far as I am concerned, that shows the theory works. So what Feynman's doing in that (in)famous chapter is some kind of 'dimensional analysis' which confirms rather than invalidates classical electromagnetic theory. So it shows classical theory's strength, rather than its weakness. It actually shows our formulas do work where we wouldn't expect them to work. :-)

The thing is: when looking at the behavior of electrons themselves, we'll need a different conceptual framework altogether. I am talking quantum mechanics here. Indeed, we'll encounter other anomalies than the ones we presented above. There's the issue of the anomalous magnetic moment of electrons, for example. Indeed, as I mentioned above, we'll also want to think of electrons as spinning around their own axis, and so that implies some circulation of charge that will generate a permanent magnetic dipole moment… […] OK, just think of some magnetic field if you don't have a clue what I am saying here (but then you should check out my post on it). […] The point is: here too, the so-called 'classical result', so that's its theoretical value, will differ from the experimentally measured value. Now, the difference here will be 0.0011614, so that's about 0.1%, i.e. 100 times smaller than my 10%. :-)

Personally, I think that’s not so bad. :-) But then physicists need to stay in business, of course. So, yes, it is a problem. :-)

Post scriptum on the math versus the physics

The key to the calculation of the energy that goes into assembling a charge was the following integral:

U = (1/2)·∫∫ ρ(1)·ρ(2)/(4πε0·r12)·dV1·dV2

This is a double integral which we simplified in two stages, so we're looking at an integral within an integral really, but we can substitute the integral over the ρ(2)·dV2 product by the formula we got for the potential, so we write that as Φ(1), and so the integral above becomes:

U = (1/2)·∫ ρ(1)·Φ(1)·dV1

Now, this integral integrates the ρ(1)·Φ(1)·dV1 product over all of space, so that's over all points in space, and so we just dropped the index and wrote the whole thing as the integral of ρ·Φ·dV over all of space:

U = (1/2)·∫ ρ·Φ·dV

We then established that this integral was mathematically equivalent to the following equation:

U = (ε0/2)·∫ E•E·dV

So this integral is actually quite simple: it just integrates EE = E2 over all of space. The illustration below shows E as a function of the distance for a sphere of radius R filled uniformly with charge.

[Illustration: E as a function of the distance r for a sphere of radius R filled uniformly with charge]

So the field (E) goes as r for r ≤ R and as 1/r2 for r ≥ R. So, for r ≥ R, the integral will have (1/r2)2 = 1/r4 in it. Now, you know that the integral of some function is the surface under the graph of that function. Just think of the 1/r4 function: it blows up as r goes to zero. That's where the problem is: there needs to be some kind of cut-off, because that integral will effectively blow up when the radius of our little sphere of charge gets 'too small'. So that makes it clear why it doesn't make sense to use this formula to try to calculate the energy of a point charge. It just doesn't make sense to do that.
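A quick sympy check of both points (the 1/a relation and the blow-up), using the field outside a radius a, with my own symbols:

```python
from sympy import symbols, integrate, pi, oo, limit

q, eps0, r, a = symbols('q epsilon_0 r a', positive=True)

E_outside = q / (4 * pi * eps0 * r**2)   # the Coulomb field for r >= a
U_outside = integrate((eps0 / 2) * E_outside**2 * 4 * pi * r**2, (r, a, oo))

print(U_outside)               # q**2/(8*pi*a*epsilon_0): the (1/2)·e2/a formula, with e2 = q2/(4πε0)
print(limit(U_outside, a, 0))  # oo: the energy of a point charge blows up
```

Note that this is exactly the Uelec = (1/2)·e2/a we got for the spherical shell of charge: a shell has no field inside, so all of its energy sits in the field outside of r = a.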


What’s ‘too small’? Let’s look at the formula we got for our electron as a spherical shell of charge:

Uelec = (1/2)·e2/a

So we've got an even simpler formula here: it's just a 1/a relation. Why is that? Well… It's just the way the math turns it out. I copied the detail of Feynman's calculation above, so you can double-check it. It's quite wonderful, really. We have a very simple inversely proportional relationship between the radius of our electron and its energy as a sphere of charge. We could write it as:

Uelec = α/a, with α = e2/2

But – Hey! Wait a minute! We've seen something like this before, haven't we? We did. We did when we were discussing the wonderful properties of that magical number, the fine-structure constant, which we also denoted by α. :-) However, because we used α already, I'll denote the fine-structure constant as αe here, so you don't get confused. As you can see, the fine-structure constant links all of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, and its mass (and, hence, its energy). So, at this stage of the argument, α can be anything, and αe cannot, of course. It's just that magical number out there, which relates everything to everything: it's the God-given number we don't understand. :-) Having said that, it seems like we're going to get some understanding here because we know that one of the many expressions involving αe is the following one:

me = αe/re

This says that the mass of the electron is equal to the ratio of the fine-structure constant and the electron radius. [Note that we express everything in natural units here, so that's Planck units. For the detail of the conversion, please see the relevant section on that in one of my posts on this and other stuff.] Now, mass is equivalent to energy, of course: it's just a matter of units, so we can equate me with Ee (this amounts to expressing the energy of the electron in a kg unit—a bit weird, but OK) and so we get:

Ee = αe/re

So there we have: the fine-structure constant αe is Nature’s ‘cut-off’ factor, so to speak. Why? Only God knows. :-) But it’s now (fairly) easy to see why all the relations involving αe are what they are. For example, we also know that αe is the square of the electron charge expressed in Planck units, so we have:

αe = eP2 and, therefore, Ee = eP2/re

Now, you can check for yourself: it's just a matter of re-expressing everything in standard SI units, and relating eP2 to e2, and it should all work: you should get the Uelec = (1/2)·e2/a expression. So… Well… At least this takes some of the magic out of the fine-structure constant. It's still a wonderful thing, but so you see that the fundamental relationship between (a) the energy (and, hence, the mass), (b) the radius and (c) the charge of an electron is not something God-given. What's God-given are Maxwell's equations, and so the Ee = αe/re = eP2/re relation is just one of the many wonderful things that you can get out of them :-)
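If you want the check in standard SI units, here it is (alpha is scipy's fine-structure constant, and the ħ·c is what the natural units hide):

```python
from scipy.constants import alpha, hbar, c, m_e, physical_constants

r_e = physical_constants['classical electron radius'][0]   # ≈ 2.818e-15 m

print(alpha * hbar * c / r_e)   # ≈ 8.19e-14 J: 'Ee = αe/re', with the ħ·c restored
print(m_e * c**2)               # ≈ 8.19e-14 J: the electron's rest energy, so the same thing
```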

Field energy and field momentum

This post goes to the heart of the E = mc2 equation. It's kinda funny, because Feynman just compresses all of it in a sub-section of his Lectures. However, as far as I am concerned, I feel it's a very crucial section. Pivotal, I'd say, which would fit with its place in all of the 115 Lectures that make up the three volumes, which is sort of mid-way, which is where we are here. So let's go for it. :-)

Let’s first recall what we wrote about the Poynting vector S, which we calculate from the magnetic and electric field vectors E and B by taking their cross-product:

S = ε0c2·E×B

This vector represents the energy flow, per unit area and per unit time, in electrodynamical situations. If E and/or B are zero (which is the case in electrostatics, for example, because we don't have magnetic fields in electrostatics), then S is zero too, so there is no energy flow then. That makes sense, because we have no moving charges, so where would the energy go to?

I also made it clear we should think of S as something physical, by comparing it to the heat flow vector h, which we presented when discussing vector analysis and vector operators. The heat flow out of a surface element da is the area times the component of h perpendicular to da, so that's (h•n)·da = hn·da. Likewise, we can write (S•n)·da = Sn·da. The units of S and h are also the same: joule per second and per square meter or, using the definition of the watt (1 W = 1 J/s), in watt per square meter. In fact, if you google a bit, you'll find that both h and S are referred to as a flux density:

  1. The heat flow vector h is the heat flux density vector, from which we get the heat flux through an area through the (hn)·da = hn·da product.
  2. The energy flow is the energy flux density vector, from which we get the energy flux through the (Sn)·da = Sn·da product.

So that should be enough as an introduction to what I want to talk about here. Let’s first look at the energy conservation principle once again.

Local energy conservation

In a way, you can look at my previous post as being all about the equation below, which we referred to as the ‘local’ energy conservation law:

∂u/∂t = −∇•S

Of course, it is not the complete energy conservation law. The local energy is not only in the field. We’ve got matter as well, and so that’s what I want to discuss here: we want to look at the energy in the field as well as the energy that’s in the matter. Indeed, field energy is conserved, and then it isn’t: if the field is doing work on matter, or matter is doing work on the field, then… Well… Energy goes from one to the other, i.e. from the field to the matter or from the matter to the field. So we need to include matter in our analysis, which we didn’t do in our last post. Feynman gives the following simple example: we’re in a dark room, and suddenly someone turns on the light switch. So now the room is full of field energy—and, yes, I just mean it’s not dark anymore. :-). So that means some matter out there must have radiated its energy out and, in the process, it must have lost the equivalent mass of that energy. So, yes, we had matter losing energy and, hence, losing mass.

Now, we know that energy and momentum are related. Respecting and incorporating relativity theory, we’ve got two equivalent formulas for it:

  1. E2 − p2c2 = m02c4
  2. pc = E·(v/c) ⇔ p = v·E/c2 = m·v
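For the record, here's the two-line check that both formulas say the same thing, using E = γ·m0·c2 and p = γ·m0·v:

```latex
E^2 - p^2c^2 = \gamma^2 m_0^2 c^4 - \gamma^2 m_0^2 v^2 c^2
             = \gamma^2 m_0^2 c^4\left(1 - \tfrac{v^2}{c^2}\right) = m_0^2 c^4,
\qquad
p = \gamma m_0 v = \frac{\gamma m_0 c^2}{c^2}\, v = \frac{E}{c^2}\, v = m\,v .
```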

The E = mc2 and m = m0·(1−v2/c2)−1/2 = γ·m0 formulas connect both expressions. So we can look at it in either of two ways. We could use the energy conservation law, but Feynman prefers the conservation of momentum approach, so let's see where he takes us. If the field has some energy (and, hence, some equivalent mass) per unit volume, and if there's some flow, so if there's some velocity (which there is: that's what our previous post was all about), then it will have a certain momentum per unit volume. [Remember: momentum is mass times velocity.] That momentum will have a direction, so it's a vector, just like p = mv. We'll write it as g, so we define g as:

g is the momentum of the field per unit volume.

What units would we express it in? We've got a bit of choice here. For example, because we're relating everything to energy here, we may want to convert our kilogram into eV/c2 or J/c2 units, using the mass-energy equivalence relation E = mc2. Hmm… Let's first keep the kg as a measure of inertia though. So we write: [g] = [m]·[v]/m3 = (kg·m/s)/m3. Hmm… That doesn't show it's energy, so let's replace the kg with a unit that's got newton and meter in it, cf. the F = ma law. So we write: [g] = (kg·m/s)/m3 = (kg/s)/m2 = [(N·s2/m)/s]/m2 = N·s/m3. Well… OK. The newton·second is the unit of momentum indeed, and we can re-write it including the joule (1 J = 1 N·m), so then we get [g] = (J·s/m4), so what's that? Well… Nothing much. However, I do note it happens to be the dimension of S/c2, so that's [S/c2] = [J/(s·m2)]·(s2/m2) = (J·s/m4). :-) Let's continue the discussion.

Now, momentum is conserved, and each component of it is conserved. So let’s look at the x-direction. We should have something like:


If you look at this carefully, you'll probably say: "OK. I understood the thing with the dark room and light switch. Mass got converted into field energy, but what's that second term on the left?"

Good. Smart. Right remark. Perfect. […] Let me try to answer the question. While all of the quantities above are expressed per unit volume, we're actually looking at the same infinitesimal volume element here, so the example of the light switch is really an example of 'momentum outflow', i.e. an example of that second term on the left-hand side of the equation above kicking in! :-)

Indeed, the first term just sort of reiterates the mass-energy equivalence: the energy that’s in the matter can become field energy, so to speak, in our infinitesimal volume element itself, and vice versa. But if it doesn’t, then it should get out and, hence, become ‘momentum outflow’. Does that make sense? No?

Hmm… What to say? You’ll need to look at that equation a couple of times more, I guess. :-/ But I need to move on, unfortunately. [Don’t get put off when I say things like this: I am basically talking to myself, so it means I’ll need to re-visit this myself. :-/]

Let’s look at all of the three terms:

  1. The left-hand side (i.e. the time rate-of-change of the momentum of matter) is easy. It's just the force on it, which we know is equal to F = q(E + v×B). Do we know that? OK… I'll admit it. Sometimes it's easy to forget where we are in an analysis like this, but so we're looking at the electromagnetic force here. :-) As we're talking infinitesimals here and, therefore, charge density rather than discrete charges, we should re-write this as the force per unit volume, which is ρE + j×B. [This is an interesting formula which I didn't use before, so you should double-check it. :-)]
  2. The first term on the right-hand side should be equally obvious, or… Well… Perhaps somewhat less so. But with all my rambling on the Uncertainty Principle and/or the wave-particle duality, it should make sense. If we scrap the second term on the right-hand side, we basically have an equation that is equivalent to the E = mc2 equation. No? Sorry. Just look at it, again and again. You’ll end up understanding it. :-)
  3. So it’s that second term on the right-hand side. What the hell does that say? Well… I could say: it’s the local energy or momentum conservation law. If the energy or momentum doesn’t stay in, it has to go out. :-) But that’s not very satisfactory as an answer, of course. However, please just go along with this ‘temporary’ answer for a while.

So what is that second term on the right-hand side? As we wrote it, it's an x-component – or, let's put it differently, it is or was part of the x-component of the momentum density – but, frankly, we should probably allow it to go out in any direction really, as the only constraint on the left-hand side is a per second rate of change of something. Hence, Feynman suggests equating it to something like this:


What are a, b and c? The components of some vector? Not sure. We're stuck. This piece really requires very advanced math. In fact, as far as I know, this is the only time Feynman says: "Sorry. This is too advanced. I'll just give you the equation. Sorry." So that's what he does. He explains the philosophy of the argument, which is the following:

  1. On the left-hand side, we've got the time rate-of-change of momentum, so that obeys the F = dp/dt = d(mv)/dt law, with the force F per unit volume being equal to F(per unit volume) = ρE + j×B.
  2. On the right-hand side, we’ve got something that can be written as:

[Formula: the right-hand side of the momentum conservation equation, written in terms of E and B only]

So we'd need to find a way to write ρE + j×B in terms of E and B only – eliminating ρ and j by using Maxwell's equations or whatever other trick – and then juggle terms and make substitutions to get it into a form that looks like the formula above, i.e. the right-hand side of that equation. But so Feynman doesn't show us how it's being done. He just mentions some theorem in physics, which says that the energy that's flowing through a unit area per unit time, divided by c2 – so that's E/c2 per unit area and per unit time – must be equal to the momentum per unit volume in the space, so we write:

g = S/c2

He illustrates the general theorem that’s used to get the equation above by giving two examples:

[Feynman's two examples illustrating the general theorem]

OK. Two good examples. However, it's still frustrating to not see how we get the g = S/c2 in the specific context of the electromagnetic force, so let's do a dimensional analysis at least. In my previous post, I showed that the dimension of S must be J/(m2·s), so [S/c2] = [J/(m2·s)]/(m2/s2) = [N·m/(m2·s)]·(s2/m2) = [N·s/m3]. Now, we know that the unit of mass is 1 kg = N/(m/s2). That's just the force law: a force of 1 newton will give a mass of 1 kg an acceleration of 1 m/s per second, so 1 N = 1 kg·(m/s2). So the [N·s/m3] dimension is equal to [kg·(m/s2)·s/m3] = [kg·(m/s)/m3] = [kg·(m/s)]/m3, which is the dimension of momentum (p = mv) per unit volume, indeed. So, yes, the dimensional analysis works out, and it's also in line with the p = v·E/c2 = m·v equation, but… Oh… We did a dimensional analysis already, where we also showed that [g] = [S/c2] = (J·s/m4). Well… In any case… It's a bit frustrating to not see the detail here, but let us note the Grand Result once again:

The Poynting vector S gives us the energy flow as well as the momentum density g = S/c2.
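To get a feel for the magnitudes, take sunlight: its energy flow at the top of the atmosphere is about 1.36 kW per square meter (the so-called solar constant, a number I am just plugging in here):

```python
from scipy.constants import c

S = 1.36e3        # energy flow of sunlight, in W/m2 (roughly the solar constant)

print(S / c**2)   # ≈ 1.5e-14 N·s/m3: the momentum density g of the sunlight
print(S / c)      # ≈ 4.5e-6 N/m2: the radiation pressure on a perfectly absorbing surface
```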

But what does it all mean, really? Let's go through Einstein's illustration of the principle. That will help us a lot. Before we do, however, I'd like to note something. I've always wondered a bit about that dichotomy between energy and momentum. Energy is force times distance: 1 joule is 1 newton × 1 meter indeed (1 J = 1 N·m). Momentum is force times time, as we can express it in N·s. Planck's constant combines all three in the dimension of action, which is force times distance times time: h ≈ 6.6×10−34 N·m·s, indeed. I like that unity. In this regard, you should, perhaps, quickly review that post in which I explain that h is the energy per cycle, i.e. per wavelength or per period, of a photon, regardless of its wavelength. So it's really something very fundamental.

We've got something similar here: energy and momentum coming together, and being shown as one aspect of the same thing: some oscillation. Indeed, just see what happens with the dimensions when we 'distribute' the 1/c2 factor on the right-hand side over the two sides, so we write: c·g = S/c and work out the dimensions:

  1. [c·g] = (m/s)·(N·s)/m3 = N/m2 = J/m3.
  2. [S/c] = (s/m)·(N·m)/(s·m2) = N/m2 = J/m3.

Isn't that nice? Both sides of the equation now have a dimension like 'the force per unit area', or 'the energy per unit volume'. To get that, we just re-scaled g and S, by c and 1/c respectively. As far as I am concerned, this shows an underlying unity we probably tend to mask with our 'related but different' energy and momentum concepts. It's like E and B: I just love it that we can write them together in our Poynting formula S = ε0c2·E×B. In fact, let me show something else here, which you should think about. You know that c2 = 1/(ε0μ0), so we can also write S as S = E×B/μ0. That's nice, but what's nice too is the following:

  1. S/c = c·g = ε0c·E×B = E×B/(μ0·c)
  2. S/g = c2 = 1/(ε0μ0)

So, once again, Feynman may feel the Poynting vector is sort of counter-intuitive when analyzing specific situations but, as far as I am concerned, I feel the Poynting vector actually makes things easier to understand. Instead of two E and B vectors, and two concepts to deal with 'energy' (i.e. energy and momentum), we're sort of unifying things here. In that regard – i.e. in regard to the feeling that we're talking about the same thing really – I'd really highlight the S/g = c2 = 1/(ε0μ0) equation. Indeed, the universal constant c2 acts just like the fine-structure constant here: it links everything to everything. :-)

And, yes, it's also about time we introduce the so-called principle of least action to explain things, because action, as a concept, combines force, distance and time indeed, so it's a bit more promising than just energy, or just momentum. Having said that, you'll see in the next section that it's sometimes quite useful to have the choice between one formula or the other. But… Well… Enough talk. Let's look at Einstein's car.

Einstein’s car

Einstein’s car is a wonderful device: it rolls without any friction and it moves with a little flashlight. That’s all it needs. It’s pictured below. :-) So the situation is the following: the flashlight shoots some light out from one side, which is then stopped at the opposite end of the car. When the light is emitted, there must be some recoil. In fact, we know it’s going to be equal to 1/c times the energy because all we need to do is apply the pc = E·(v/c) formula for v = c, so we know that p = E/c. Of course, this momentum now needs to move Einstein’s car. It’s frictionless, so it should work, but still… The car has some mass M, and so that will determine its recoil velocity: v = p/M. We just apply the general p = mv formula here, and v is not equal to c here, of course! Of course, then the light hits the opposite end of the car and delivers the same momentum, so that stops the car again. However, it did move over some distance x = vt. So we could flash our light again and get to wherever we want to get. [Never mind the infinite accelerations involved!] So… Well… Great! Yes, but Einstein didn’t like this car when he first saw it. In fact, he still doesn’t like it, because he knows it won’t take you very far. :-)

[Illustration: Einstein's car, with the flashlight at one end]

The problem is that we seem to be moving the center of gravity of this car by fooling around on the inside only. Einstein doesn’t like that. He thinks it’s impossible. And he’s right of course. The thing is: the center of gravity did not change. What happened here is that we’ve got some blob of energy, and so that blob has some equivalent mass (which we’ll denote by U/c2), and so that equivalent mass moved all the way from one side to the other, i.e. over the length of the car, which we denote by L. In fact, it’s stuff like this that inspired the whole theory of the field energy and field momentum, and how it interacts with matter.

What happens here is like switching the light on in the dark room: we've got matter doing work on the field, and so matter loses mass, and the field gains it, through its momentum and/or energy. To calculate how much, we could integrate S/c or c·g over the volume of our blob, and we'd get something in joule indeed, but there's a simpler way here. The momentum conservation says that the momentum of our car and the momentum of our blob must be equal, so if T is the time that was needed for our blob to go to the other side – and so that's, of course, also the time during which our car was rolling – then M·v = M·x/T must be equal to (U/c2)·(L/T). The 1/T factor on both sides cancels, so we write: M·x = (U/c2)·L. Now, what is x? Yes. In case you were wondering, that's what we're looking for here. :-) Here it is:

x = v·T = v·L/c = (p/M)·(L/c) = [(U/c)/M]·(L/c) = (U/c2)·(L/M)
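To see how (in)significant that is, just plug in some numbers (mine are completely made up): a 1000 kg car, 10 m long, and one joule of light:

```python
from scipy.constants import c

M = 1000.0   # mass of the car, in kg (made-up)
L = 10.0     # length of the car, in m (made-up)
U = 1.0      # energy of the blob of light, in J (made-up)

x = (U / c**2) * (L / M)
print(x)     # ≈ 1.1e-19 m: the car moves, but not by much :-)
```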

So what’s next? Well… Now we need to show that the center-of-mass actually did not move with this ‘transfer’ of the blob. I’ll leave the math to you here: it should all work out. And you can also think through the obvious questions:

  1. Where is the energy and, hence, the mass of our blob after it stops the car? Hint: think about excited atoms and imagine they might radiate some light back. :-)
  2. As the car did move a little bit, we should be able to move it further and further away from its center of gravity, until the center of gravity is no longer in the car. Hint: think about batteries and energy levels going down while shooting light out. It just won’t happen. :-)

Now, what about a blob of light going from the top to the bottom of the car? Well… That involves the conservation of angular momentum: we’ll have more mass on the bottom, but on a shorter lever-arm, so angular momentum is being conserved. It’s a very good question though, and it led Einstein to combine the center-of-gravity theorem with the angular momentum conservation theorem to explain stuff like this.

It's all fascinating, and one can think of a great many paradoxes that, at first, seem to contradict the Grand Principles we used here, which means that they would contradict all that we have learned so far. However, a careful analysis of those paradoxes reveals that they are paradoxes indeed: propositions which sound true but are, in the end, self-contradictory. In fact, when explaining electromagnetism over his various Lectures, Feynman tasks his readers with a rather formidable paradox when discussing the laws of induction, and he solves it here, ten chapters later, after describing what we described above. You can busy yourself with it but… Well… I guess you've got something better to do. If so, just take away the key lesson: there's momentum in the field, and it's also possible to build up angular momentum in a magnetic field and, if you switch it off, the angular momentum will be given back, somehow, because it's stored energy.

That’s also why the seemingly irrelevant circulation of S we discussed in my previous post, where we had a charge next to an ordinary magnet, and where we found that there was energy circulating around, is not so queer. The energy is there, in the circulating field, and it’s real. As real as can be. :-)


The energy of fields and the Poynting vector

For some reason, I always thought that Poynting was a Russian physicist, like Minkowski. He wasn't. I just looked it up. Poynting was an Englishman, born near Manchester, and he taught in Birmingham. I should have known. Poynting is a very English name, isn't it? My confusion probably stems from the fact that it was some Russian physicist, Nikolay Umov, who first proposed the basic concepts we are going to discuss here, i.e. the speed and direction of energy itself, or its movement. And as I am double-checking, I just learned that Hermann Minkowski is generally considered to be German-Jewish, not Russian. Makes sense. With Einstein and all that. His personal life story is actually quite interesting. You should check it out. :-)

Let's go for it. We've done a few posts on the energy in the fields already, but all in the context of electrostatics. Let me first walk you through the ideas we presented there.

The basic concepts: force, work, energy and potential

1. A charge q causes an electric field E, and E‘s magnitude E is a simple function of the charge (q) and its distance (r) from the point that we’re looking at, which we usually write as P = (x, y, z). Of course, the origin of our reference frame here is q. The formula is the simple inverse-square law that you (should) know: E ∼ q/r2, and the proportionality constant is just Coulomb’s constant, which I think you wrote as ke in your high-school days and which, as you know, is there so as to make sure the units come out alright. So we could just write E = ke·q/r2. However, just to make sure it does not look like a piece of cake :-) physicists write the proportionality constant as 1/4πε0, so we get:

E = (1/4πε0)·q/r2

Now, the field is the force on any unit charge (+1) we’d bring to P. This led us to think of energy, potential energy, because… Well… You know: energy is measured by work, so that’s some force acting over some distance. The potential energy of a charge increases if we move it against the field, so we wrote:

W = −∫ F•ds (with the integral taken from a to b)

Well… We actually gave the formula below in that post, so that’s the work done per unit charge. To interpret it, you just need to remember that F = qE, which is equivalent to saying that E is the force per unit charge.

W(unit) = −∫ E•ds (again, from a to b)

As for the F•ds or E•ds product in the integrals, that’s a vector dot product, which we need because it’s only the tangential component of the force that’s doing work, as evidenced by the formula F•ds = |F|·|ds|·cosθ = Ft·ds, and as depicted below.

[Illustration: only the tangential component Ft of the force does work along the path from a to b]

Now, this allowed us to describe the field in terms of the (electric) potential Φ and the potential differences between two points, like the points a and b in the integral above. We have to choose some reference point, of course, some P0 defining zero potential, which is usually infinitely far away. So we wrote our formula for the work that's being done on a unit charge, i.e. W(unit) as:


2. The world is full of charges, of course, and so we need to add all of their fields. But so now you need a bit of imagination. Let’s reconstruct the world by moving all charges out, and then we bring them back one by one. So we take q1 now, and we bring it back into the now-empty world. Now that does not require any energy, because there’s no field to start with. However, when we take our second charge q2, we will be doing work as we move it against the field or, if it’s an opposite charge, we’ll be taking energy out of the field. Huh? Yes. Think about it. All is symmetric. Just to make sure you’re comfortable with every step we take, let me jot down the formula for the force that’s involved. It’s just the Coulomb force of course:

F1 = −F2 = (1/4πε0)·(q1·q2/r122)·e12

F1 is the force on charge q1, and F2 is the force on charge q2. Now, q1 and q2 may attract or repel each other but the forces will always be equal and opposite. The e12 vector makes sure the directions and signs come out alright, as it's the unit vector from q2 to q1 (not from q1 to q2, as you might expect when looking at the order of the indices). So we would need to integrate this for r going from infinity to… Well… The distance between q1 and q2 – wherever they end up as we put them back into the world – so that's what's denoted by r12. Now I hate integrals too, but this is an easy one. Just note that ∫ r−2·dr = −1/r (plus a constant) and you'll be able to figure out that what I'll write now makes sense (if not, I'll do a similar integral in a moment): the work done in bringing two charges together from a large distance (infinity) is equal to:

U = q1·q2/(4πε0·r12)

So now we should bring in q3 and then q4, of course. That's easy enough. Bringing the first two charges into that world we had emptied took a lot of time, but now we can automate processes. Trust me: we'll be done in no time. :-) We just need to sum over all of the pairs of charges qi and qj. So we write the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U = Σ qi·qj/(4πε0·rij), with the sum taken over all pairs (i, j)

Huh? Can we do that? I mean… Every new charge that we're bringing in here changes the field, doesn't it? It does. But it's the magic of the superposition principle at work here. Our third charge q3 is associated with two pairs in this formula. Think of it: we've got the q1q3 and the q2q3 combinations, indeed. Likewise, our fourth charge q4 is to be paired up with three charges now: q1, q2 and q3. This formula takes care of it, and the 'all pairs' mention under the summation sign (Σ) reminds us we should watch we don't double-count pairs: the q1q3 and q3q1 combination, for example, counts for one pair only, obviously. So, yes, we write 'all pairs' instead of the usual i, j subscripts. But then, yes, this formula takes care of it. We're done!
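If you want to see that 'all pairs' bookkeeping in action, here's a little sketch: the charges and positions are made up, and itertools.combinations makes sure every pair is counted exactly once.

```python
from itertools import combinations
from math import dist, pi
from scipy.constants import epsilon_0

# made-up configuration: (charge in coulomb, position in meter)
charges = [(1e-9, (0.0, 0.0, 0.0)),
           (-1e-9, (0.1, 0.0, 0.0)),
           (1e-9, (0.0, 0.1, 0.0))]

U = sum(qi * qj / (4 * pi * epsilon_0 * dist(ri, rj))
        for (qi, ri), (qj, rj) in combinations(charges, 2))

print(U)   # the total electrostatic energy of the system, in joule
```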

Well… Not really, of course. We’ve still got some way to go before I can introduce the Poynting vector. :-) However, to make sure you ‘get’ the energy formula above, let me insert an extremely simple diagram so you’ve got a bit of a visual of what we’re talking about.

[Illustration: a simple system of charges and the distances rij between the pairs]

3. Now, let's take a step back. We just calculated the (potential) energy of the world (U), which is great. But perhaps we should also be interested in the world's potential Φ, rather than its potential energy U. Why? Well, we'll want to know what happens when we bring yet another charge in—from outer space or so. :-) And so then it's easier to know the world's potential, rather than its energy, because we can calculate the field from it using the E = −∇Φ formula. So let's de- and re-construct the world once again :-) but now we'll look at what happens with the field and the potential.

We know our first charge created a field with a field strength we calculated as:

E = (1/4πε0)·q/r2

So, when bringing in our second charge, we can use our Φ(P) integral to calculate the potential:


[Let me make a note here, just for the record. You probably think I am being pretty childish when talking about my re-construction of the world in terms of bringing all charges out and then back in again but, believe me, there will be a lot of confusion when we’ll start talking about the energy of one charge, and that confusion can be avoided, to a large extent, when you realize that the idea (I mean the concept itself, really—not its formula) of a potential involves two charges really. Just remember: it’s the first charge that causes the field (and, of course, any charge causes a field), but calculating a potential only makes sense when we’re talking some other charge. Just make a mental note of it. You’ll be grateful to me later.]

Let’s now combine the integral and the formula for E above. Because you hate integrals as much as I do, I’ll spell it out: the antiderivative of the Φ(P) integral is ∫ q/(4πε0r2)·dr. Now, let’s bring q/4πε0 out for a while so we can focus on solving ∫(1/r2)dr. Now, ∫(1/r2)dr is equal to –1/r + k, and so the whole antiderivative is –q/4πε0r + k. Now, we integrate from r = ∞ to r, and so the definite integral is [–q/(4πε0)]·[1/∞ − 1/r] = [–q/(4πε0)]·[0 − 1/r] = q/(4πε0r). Let me present this somewhat nicer:

Φ(P) = q/(4πε0·r)
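You can check that result numerically too. Here's a quick scipy sketch for a (made-up) 1 nC charge, evaluating the integral of E from r out to infinity:

```python
import numpy as np
from scipy.integrate import quad
from scipy.constants import epsilon_0, pi

q = 1e-9   # a 1 nC charge (made-up)
r = 0.5    # the point where we want the potential, in m (made-up)

E = lambda rp: q / (4 * pi * epsilon_0 * rp**2)   # the Coulomb field

phi_numerical, _ = quad(E, r, np.inf)             # the work done per unit charge, from infinity to r
phi_formula = q / (4 * pi * epsilon_0 * r)

print(phi_numerical, phi_formula)                 # both ≈ 17.98 V
```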

You’ll say: so what? Well… We’re done! The only thing we need to do now is add up the potentials of all of the charges in the world. So the formula for the potential Φ at a point which we’ll simply refer to as point 1, is:

Φ(1) = Σ qj/(4πε0·r1j), with the sum running over j = 2, 3, etc.

Note that our index j starts at 2, otherwise it doesn’t make sense: we’d have a division by zero for the q1/r11 term. Again, it’s an obvious remark, but not thinking about it can cause a lot of confusion down the line.

4. Now, I am very sorry but I have to inform you that we’ll be talking charge densities and all that shortly, rather than discrete charges, so I have to give you the continuum version of this formula, i.e. the formula we’ll use when we’ve got charge densities rather than individual charges. That sum above then becomes an infinite sum (i.e. an integral), and qj becomes a variable which we write as ρ(2). [That’s totally in line with our index j starts at 2, rather than from 1.] We get:

Φ(1) = (1/4πε0)·∫ ρ(2)·dV2/r12

Just look at this integral, and try to understand it: we're integrating over all of space – so we're integrating the whole world, really :-) – and the ρ(2)·dV2 product in the integral is just the charge of an infinitesimally small volume of our world. So the whole integral is just the (infinite) sum of the contributions to the potential (at point 1) of all (infinitesimally small) charges that are around indeed. Now, there's something funny here. It's just a mathematical thing: we don't need to worry about double-counting here. Why? We don't have products of volume elements here. Just make a mental note of it because it will be different in a moment.

Now we're going to look at the continuum version for our energy formula indeed. Which energy formula? That electrostatic energy formula, which gave us the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U = Σ qi·qj/(4πε0·rij), with the sum taken over all pairs (i, j)

Its continuum version is the following monster:

U = (1/2)·∫∫ ρ(1)·ρ(2)/(4πε0·r12)·dV1·dV2

Hmm… What kind of integral is that? We've got two variables here: dV2 and dV1. Yes. And we've also got a 1/2 factor now, because we do not want to double-count and, unfortunately, there is no convenient way of writing an integral like this that keeps track of the pairs. It's a so-called double integral, but I'll let you look up the math yourself. In any case, we can simplify this integral so you don't need to worry about it too much. How do we simplify it? Well… Just look at that integral we got for Φ(1): we calculated the potential at point 1 by integrating the ρ(2)·dV2 product over all of space, so the integral above can be written as:

U = (1/2)·∫ ρ(1)·Φ(1)·dV1

But so this integral integrates the ρ(1)·Φ(1)·dV1 product over all of space, so that's over all points in space. So we can just drop the index and write the whole thing as the integral of ρ·Φ·dV over all of space:

U = (1/2)·∫ ρ·Φ·dV

5. It’s time for the hat-trick now. The equation above is mathematically equivalent to the following equation:

U = (ε0/2)·∫ E•E·dV

Huh? Yes. Let me make two remarks here. First on the math, the E = −∇Φ formula allows you to write the integrand of the integral above as E•E = (−∇Φ)•(−∇Φ) = (∇Φ)•(∇Φ). And then you may or may not remember that, when substituting E = −∇Φ in Maxwell's first equation (∇•E = ρ/ε0), we got the following equality: ρ = −ε0·∇•(∇Φ) = −ε0·∇2Φ, so we can write ρ·Φ as −ε0·Φ·∇2Φ. However, that still doesn't show the two integrals are the same thing. The proof is actually rather involved, and so I'll refer to that post I referred to, so you can check the proof there.
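Rather than copying the proof, let me just check the equivalence numerically for a sphere of radius R filled uniformly with charge (the numbers are made up, and the formula for Φ inside such a sphere is the standard textbook one):

```python
import numpy as np
from scipy.integrate import quad
from scipy.constants import epsilon_0, pi

q, R = 1e-9, 0.1               # total charge (C) and radius (m): made-up numbers
rho = 3 * q / (4 * pi * R**3)  # the uniform charge density

def phi(r):
    # potential inside a uniformly charged sphere
    return q * (3 - r**2 / R**2) / (8 * pi * epsilon_0 * R)

def E(r):
    # field inside goes as r, outside it's the Coulomb field
    if r < R:
        return q * r / (4 * pi * epsilon_0 * R**3)
    return q / (4 * pi * epsilon_0 * r**2)

U_rho_phi = 0.5 * quad(lambda r: rho * phi(r) * 4 * pi * r**2, 0, R)[0]
U_field = 0.5 * epsilon_0 * (quad(lambda r: E(r)**2 * 4 * pi * r**2, 0, R)[0]
                             + quad(lambda r: E(r)**2 * 4 * pi * r**2, R, np.inf)[0])

print(U_rho_phi, U_field)      # both ≈ 5.4e-8 J
```

Both give the (3/5)·e2/R you'd find in the textbooks for a uniformly charged sphere, so the two formulas are, effectively, the same thing.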

The second remark is much more fundamental. The two integrals are mathematically equivalent, but are they also physically? What do I mean with that? Well… Look at it. The second integral implies that we can look at (ε0/2)·EE = ε0E2/2 as an energy density, which we’ll denote by u, so we write:

u = (ε0/2)·E•E = ε0·E2/2

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).


Now the question: what is the reality of that formula? Indeed, what we did when calculating U amounted to characterizing the Universe with some number U – and that's kinda nice, of course! – but then what? Is u = ε0E2/2 anything real? Well… That's what this post is about. So we're finished with the introduction now. :-)

Energy density and energy flow in electrodynamics

Before giving you any more formulas, let me answer the question: there is no doubt, in the classical theory of electromagnetism at least, that the energy density u is something very real. It has to be real: just think of how the charge conservation law works. Charges cannot just disappear in space, to then re-appear somewhere else. The charge conservation law is written as ∇•j = −∂ρ/∂t, and that makes it clear it's a local conservation law. Therefore, charges can only disappear and re-appear through some current. We write dQ1/dt = ∫ (j•n)·da = −dQ2/dt, and here's the simple illustration that comes with it:

[Illustration: charge flowing out of one region and into the neighboring one through the surface between them]

So we do not allow for any ‘non-local’ interactions here! Therefore, we say that, if energy goes away from a region, it’s because it flows away through the boundaries of that region. So that’s what the Poynting formulas are all about, and so I want to be clear on that from the outset.

Now, to get going with the discussion, I need to give you the formula for the energy density in electrodynamics. Its shape won’t surprise you:

u = ε0·E2/2 + ε0·c2·B2/2

However, it’s just like the electrostatic formula: it takes quite a bit of juggling to get this from our electrodynamic equations, so, if you want to see how it’s done, I’ll refer you to Feynman. Indeed, I feel the derivation doesn’t matter all that much, because the formula itself is very intuitive: it’s really the thing everyone knows about a wave, electromagnetic or not: the energy in it is proportional to the square of its amplitude, and so that’s E•E = E2 and B•B = B2. Now, you also know that the magnitude of B is 1/c of that of E, so cB = E, and so that explains the extra c2 factor in the second term.

The second formula is also very intuitive. Let me write it down:

∂u/∂t = −∇•S

Just look at it: u is the energy density, so that's the amount of energy per unit volume at a given point, and so whatever flows out of that point must represent its time rate of change. As for the –∇•S expression… Well… Sorry, I can't keep re-explaining things: the ∇• operator is the divergence, and so it gives us the magnitude of a (vector) field's source or sink at a given point. ∇•S is a scalar, and if it's positive in a region, then that region is a source. Conversely, if it's negative, then it's a sink. To be precise, the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. So, in this case, it gives us the volume density of the flux of S. As you can see, the formula has exactly the same shape as ∇•j = −∂ρ/∂t.

So what is S? Well… Think about the more general formula for the flux out of some closed surface, which we get from integrating over the volume enclosed. It’s just Gauss’ Theorem:

∫ (C•n)·da = ∫ ∇•C·dV (the integral on the left is over a closed surface, the one on the right over the volume inside it)

Just replace C by E, and think about what it meant: the flux of E was the field strength multiplied by the surface area, so it was the total flow of E. Likewise, S represents the flow of (field) energy. Let me repeat this, because it’s an important result:

S represents the flow of field energy.

Huh? What flow? Per unit area? Per second? How do you define such ‘flow’? Good question. Let’s do a dimensional analysis:

  1. E is measured in newton per coulomb, so [E•E] = [E2] = N2/C2.
  2. B is measured in (N/C)/(m/s). [Huh? Well… Yes. I explained that a couple of times already. Just check it in my introduction to electric circuits.] So we get [B•B] = [B2] = (N2/C2)·(s2/m2) but the dimension of our c2 factor is (m2/s2) so we’re left with N2/C2. That’s nice, because we need to add in the same units.
  3. Now we need to look at ε0. That constant usually 'fixes' our units, but can we trust it to do the same now? Let's see… One of the many ways in which we can express its dimension is [ε0] = C2/(N·m2), so if we multiply that with N2/C2, we find that u is expressed in N/m2. Wow! That's kinda neat. Why? Well… Just multiply with m/m and its dimension becomes N·m/m3 = J/m3, so that's joule per cubic meter, so… Yes: u has got the right unit for something that's supposed to measure energy density!
  4. OK. Now, we take the time rate of change of u, and so both the right and left of our ∂u/∂t = −∇•S formula are expressed in (J/m3)/s, which means that the dimension of S itself must be J/(m2·s). Just check it by writing it all out: ∇•S = ∂Sx/∂x + ∂Sy/∂y + ∂Sz/∂z, and so that's something per meter so, to get the dimension of S itself, we need to go from cubic meter to square meter. Done! Let me highlight the grand result:

S is the energy flow per unit area and per second.

Now we’ve got its magnitude and its dimension, but what is its direction? Indeed, we’ve been writing S as a vector, but… Well… What’s its direction indeed?

Well… Hmm… I referred you to Feynman for the derivation of that u = ε0E²/2 + ε0c²B²/2 formula for u, and so the direction of S – I should actually say, its complete definition – comes out of that derivation as well. So… Well… I think you should just believe what I'll be writing here for S:

S = ε0c²·E×B

So it's the vector cross product of E and B with ε0c² thrown in. It's a simple formula really, and because I didn't drag you through the whole argument, you should just quickly do a dimensional analysis again—just to make sure I am not talking too much nonsense. :-) So what's the direction? Well… You just need to apply the usual right-hand rule:

right hand rule

OK. We’re done! This S vector, which – let me repeat it – represents the energy flow per unit area and per second, is what is referred to as Poynting’s vector, and it’s a most remarkable thing, as I’ll show now. Let’s think about the implications of this thing.
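Before we do, here's a quick numerical sanity check – a minimal sketch in Python, with an assumed field strength of 100 N/C for a plane wave (and B = E/c, as it should be): the Poynting vector comes out along the direction of propagation, and its magnitude is just the energy density times c, which is what you'd expect for energy moving along at the speed of light.

```python
import numpy as np

eps0 = 8.854e-12   # permittivity of free space, in C²/(N·m²)
c = 2.998e8        # speed of light, in m/s

# A plane wave at one instant: E along y, B along z, with B = E/c (assumed E0 = 100 N/C)
E = np.array([0.0, 100.0, 0.0])          # in N/C
B = np.array([0.0, 0.0, 100.0 / c])      # in (N/C)/(m/s)

S = eps0 * c**2 * np.cross(E, B)         # Poynting vector, in J/(m²·s) = W/m²
u = (eps0 / 2) * E.dot(E) + (eps0 * c**2 / 2) * B.dot(B)   # energy density, in J/m³

print(S)                  # points along x, i.e. the direction of propagation
print(np.linalg.norm(S))  # ≈ 26.5 W/m²
print(u * c)              # the same number: energy density times c gives the flow
```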

Poynting’s vector in electrodynamics

The S vector is actually quite similar to the heat flow vector h, which we presented when discussing vector analysis and vector operators. The heat flow out of a surface element da is the area times the component of h perpendicular to da, so that's (h•n)·da = hn·da. Likewise, we can write (S•n)·da = Sn·da. The units of S and h are also the same: joule per second and per square meter or, using the definition of the watt (1 W = 1 J/s), in watt per square meter. In fact, if you google a bit, you'll find that both h and S are referred to as a flux density:

  1. The heat flow vector h is the heat flux density vector, from which we get the heat flux through an area via the (h•n)·da = hn·da product.
  2. The energy flow vector S is the energy flux density vector, from which we get the energy flux via the (S•n)·da = Sn·da product.

The big difference, of course, is that we get h from a simpler vector equation:

h = −κ∇T ⇔ (hx, hy, hz) = −κ·(∂T/∂x, ∂T/∂y, ∂T/∂z)

The vector equation for S is more complicated:

S = ε0c²·E×B

So it's a vector cross product. Note that S will be zero if E = 0 and/or if B = 0. So S = 0 in electrostatics, i.e. when we have static charges only and, hence, no magnetic field. Let's examine Feynman's examples.

The illustration below shows the geometry of the E, B and S vectors for a light wave. It’s neat, and totally in line with what we wrote on the radiation pressure, or the momentum of light. So I’ll refer you to that post for an explanation, and to Feynman himself, of course.

light wave

OK. The situation here is rather simple. Feynman gives a few others examples that are not so simple, like that of a charging capacitor, which is depicted below.


The Poynting vector points inwards here, toward the axis. What does it mean? It means the energy isn’t actually coming down the wires, but from the space surrounding the capacitor. 

What? I know. It's completely counter-intuitive, at first that is. You'd think the energy comes in with the charges, down the wires. But it actually makes sense. The illustration below shows how we should think of it. The charges outside of the capacitor are associated with a weak, enormously spread-out field that surrounds the capacitor. So if we bring them to the capacitor, that field gets weaker, and the field between the plates gets stronger. So the field energy which is way out moves into the space between the capacitor plates indeed, and so that's what Poynting's vector tells us here.

capacitor 2

Hmm… Yes. You can be skeptical. You should be. But that's how it works. The next illustration looks at a current-carrying wire itself. Let's first look at the B and E vectors. You're familiar with the magnetic field around a wire, so the B vector makes sense, but what about the electric field? Aren't wires supposed to be electrically neutral? It's a tricky question, and we handled it in our post on the relativity of fields. The positive and negative charges in a wire should cancel out, indeed, but then it's the negative charges that move and, because of their movement, we have the relativistic effect of length contraction, so the volumes are different, and the positive and negative charge densities do not cancel out: the wire appears to be charged, so we do have a mix of E and B! Let me quickly give you the formula: E = λ/(2πε0·r), with λ the (apparent) charge per unit length, so it's the same formula as for a long line of charge, or for a long uniformly charged cylinder.

So we have a non-zero E and B and, hence, a non-zero Poynting vector S, whose direction is radially inward, so there is a flow of energy into the wire, all around. What the hell? Where does it go? Well… There are a few possibilities here: the charges need kinetic energy to move, or they increase their potential energy as they move towards the terminals of our capacitor to increase the charge on the plates or, much more mundane, the energy may just come out again in the form of heat. It looks crazy, but that's how it is really. In fact, the more you think about it, the more logical it all starts to sound. Energy must be conserved locally, and so it's just field energy going in and re-appearing in some other form. So it does make sense. But, yes, it's weird, because no one bothered to teach us this in school. :-)
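That local bookkeeping is easy to check for the heat case. Here's a minimal numerical sketch (the wire radius, resistance per meter and current are assumed values, just for illustration): take the tangential E at the wire's surface that drives the current, the B field circling the wire, and compute the inward Poynting flux over one meter of the wire's surface. It comes out equal to the familiar I²R heating, which is the standard textbook result.

```python
import math

mu0 = 4 * math.pi * 1e-7   # magnetic constant, in N/A² (note that eps0·c² = 1/mu0)

# Assumed values for a resistive wire
I = 2.0            # current, in A
R_per_m = 0.1      # resistance per unit length, in ohm/m
a = 1e-3           # wire radius, in m

E_tangential = I * R_per_m                 # E along the wire at its surface, in V/m (= N/C)
B_surface = mu0 * I / (2 * math.pi * a)    # B circling the wire at its surface, in T

# |S| = eps0·c²·|E×B| = E·B/mu0, pointing radially inward (E along the wire, B azimuthal)
S = E_tangential * B_surface / mu0         # in W/m²

power_in_per_m = S * 2 * math.pi * a       # flux through the cylindrical surface, per meter
print(power_in_per_m)                      # 0.4 W per meter of wire...
print(I**2 * R_per_m)                      # ...which is exactly the I²R heat per meter
```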


The ‘craziest’ example is the one below: we’ve got a charge and a magnet here. All is at rest. Nothing is moving… Well… I’ll correct that in a moment. :-) The charge (q) causes a (static) Coulomb field, while our magnet produces the usual magnetic field, whose shape we (should) recognize: it’s the usual dipole field. So E and B are not changing. But so when we calculate our Poynting vector, we see there is a circulation of S. The E×B product is not zero. So what’s going on here?


Well… There is no net change in energy with time: the energy just circulates around and around. Everything which flows into one volume flows out again. As Feynman puts it: “It is like incompressible water flowing around.” What’s the explanation? Well… Let me copy Feynman’s explanation of this ‘craziness’:

“Perhaps it isn’t so terribly puzzling, though, when you remember that what we called a “static” magnet is really a circulating permanent current. In a permanent magnet the electrons are spinning permanently inside. So maybe a circulation of the energy outside isn’t so queer after all.”

So… Well… It looks like we do need to revise some of our ‘intuitions’ here. I’ll conclude this post by quoting Feynman on it once more:

“You no doubt get the impression that the Poynting theory at least partially violates your intuition as to where energy is located in an electromagnetic field. You might believe that you must revamp all your intuitions, and, therefore have a lot of things to study here. But it seems really not necessary. You don’t need to feel that you will be in great trouble if you forget once in a while that the energy in a wire is flowing into the wire from the outside, rather than along the wire. It seems to be only rarely of value, when using the idea of energy conservation, to notice in detail what path the energy is taking. The circulation of energy around a magnet and a charge seems, in most circumstances, to be quite unimportant. It is not a vital detail, but it is clear that our ordinary intuitions are quite wrong.”

Well… That says it all, I guess. As far as I am concerned, I feel the Poynting vector makes things actually easier to understand. Indeed, the E and B vectors were quite confusing, because we had two of them, and the magnetic field is, frankly, a weird thing. Just think about the units in which we're measuring B: (N/C)/(m/s). I can't imagine what a unit like that could possibly represent, so I must assume you can't either. But so now we've got this Poynting vector that combines both E and B, and which represents the flow of the field energy. Frankly, I think that makes a lot of sense, and it's surely much easier to visualize than E and/or B. [Having said that, of course, you should note that E and B do have their value, obviously, if only because they represent the lines of force, and so that's something very physical too, of course. I guess it's a matter of taste, to some extent, but so I'd tend to soften Feynman's comments on the supposed 'craziness' of S.]

In any case… The next thing I should discuss is field momentum. Indeed, if we've got flow, we've got momentum. But I'll leave that for my next post. This topic can't be exhausted in one post only, indeed. :-) So let me conclude this post. I'll do so with a very nice illustration I got from the Wikipedia article on the Poynting vector. It shows the Poynting vector around a voltage source and a resistor, as well as what's going on in-between. [Note that the magnetic field is given by the field vector H, which is related to B as follows: B = μ0(H + M), with M the magnetization of the medium. B and H are obviously just proportional in empty space, with μ0 as the proportionality constant.]


Re-visiting relativity and four-vectors: the proper time, the tensor and the four-force

My previous post explained how four-vectors transform from one reference frame to the other. Indeed, a four-vector is not just some one-dimensional array of four numbers: it represents something—a physical vector that… Well… Transforms like a vector. :-) So what vectors are we talking about? Let's see what we have:

  1. We knew the position four-vector already, which we’ll write as xμ = (ct, x, y, z) = (ct, x).
  2. We also proved that Aμ = (Φ, Ax, Ay, Az) = (Φ, A) is a four-vector: it’s referred to as the four-potential.
  3. We also know the momentum four-vector from the Lectures on special relativity. We write it as pμ = (E, px, py, pz) = (E, p), with E = γm0c², p = γm0v, and γ = 1/√(1−v²/c²) or, for c = 1, E = γm0 and γ = 1/√(1−v²).

To show that it's not just a matter of adding some fourth t-component to a three-vector, Feynman gives the example of the four-velocity vector. We have vx = dx/dt, vy = dy/dt and vz = dz/dt, but a vμ = (d(ct)/dt, dx/dt, dy/dt, dz/dt) = (c, dx/dt, dy/dt, dz/dt) 'vector' is, obviously, not a four-vector. [Why obviously? The inner product vμvμ is not invariant.] In fact, Feynman 'fixes' the problem by noting that ct, x, y and z have the 'right behavior', but the d/dt operator doesn't. The d/dt operator is not an invariant operator. So how does he fix it then? He tries the (1/√(1−v²/c²))·d/dt operator and, yes, it turns out we do get a four-vector then. In fact, we get that four-velocity vector uμ that we were looking for:


Now how do we know this is a four-vector? How can we prove it? It's simple. We can get it from our pμ = (E, p) by dividing it by the rest mass m0, which is an invariant scalar in four dimensions too. Now, it is easy to see that a division by an invariant scalar does not change the transformation properties. So just write it all out, and you'll see that pμ/m0 = uμ and, hence, that uμ is a four-vector too. :-)
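Here's a small numerical illustration of that last step – a minimal sketch with c = 1 and an assumed speed of 0.6: dividing pμ by m0 gives uμ = (γ, γv), and its +−−− 'length' uμuμ comes out as 1, a number that is the same in every reference frame.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v**2)        # 1/√(1−v²), with c = 1

m0 = 2.0       # rest mass (assumed value, arbitrary units)
v = 0.6        # speed along x (assumed), as a fraction of c

p = [gamma(v) * m0, gamma(v) * m0 * v, 0.0, 0.0]   # pμ = (E, p) = (γm0, γm0v, 0, 0)
u = [component / m0 for component in p]            # uμ = pμ/m0 = (γ, γv, 0, 0)

norm = u[0]**2 - u[1]**2 - u[2]**2 - u[3]**2       # the +−−− 'square' of uμ
print(u)       # [1.25, 0.75, 0.0, 0.0]
print(norm)    # 1.0 — invariant: every observer gets this same number
```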

We've got an interesting thing here actually: division by an invariant scalar, or applying that (1/√(1−v²/c²))·d/dt operator, which is referred to as an invariant operator, to a four-vector will give us another four-vector. Why is that? Let's switch to compatible time and distance units, so that c = 1, to simplify the analysis that follows.

The invariant (1/√(1−v²))·d/dt operator and the proper time s

Why is the (1/√(1−v²))·d/dt operator invariant? Why does it 'fix' things? Well… Think about the invariant spacetime interval (Δs)² = Δt² − Δx² − Δy² − Δz² going to the limit (ds)² = dt² − dx² − dy² − dz². Of course, we can and should relate this to an invariant quantity s = ∫ ds. Just like Δs, this quantity also 'mixes' time and distance. Now, we could try to associate some derivative d/ds with it because, as Feynman puts it, "it should be a nice four-dimensional operation because it is invariant with respect to a Lorentz transformation." Yes. It should be. So let's relate ds to dt and see what we get. That's easy enough: dx = vx·dt, dy = vy·dt, dz = vz·dt, so we write:

(ds)² = dt² − vx²·dt² − vy²·dt² − vz²·dt² ⇔ (ds)² = dt²·(1 − vx² − vy² − vz²) = dt²·(1 − v²)

and, therefore, ds = dt·√(1−v²). So our operator d/ds is equal to (1/√(1−v²))·d/dt, and we can apply it to any four-vector, as we are sure that, as an invariant operator, it's going to give us another four-vector. I'll highlight the result, because it's important:

The d/ds = (1/√(1−v²))·d/dt operator is an invariant operator for four-vectors.

For example, if we apply it to xμ = (t, x, y, z), we get the very same four-velocity vector uμ:

dxμ/ds = uμ = pμ/m0

Now, if you're somewhat awake, you should ask yourself: what is this s, really, and what is this operator all about? Our new function s = ∫ ds is not the distance function, as it's got both time and distance in it. Likewise, the invariant operator d/ds = (1/√(1−v²))·d/dt has both time and distance in it (the distance is implicit in the v² factor). Still, it is referred to as the proper time along the path of a particle. Now why is that? If it's got distance and time in it, why don't we call it the 'proper distance-time' or something?

Well… The invariant quantity s actually is the time that would be measured by a clock that's moving along, in spacetime, with the particle. Just think of it: in the reference frame of the moving particle itself, Δx, Δy and Δz must be zero, because it's not moving in its own reference frame. So the (Δs)² = Δt² − Δx² − Δy² − Δz² expression reduces to (Δs)² = Δt², and so we're only adding time to s. Of course, this view of things implies that the proper time itself is fixed only up to some arbitrary additive constant, namely the setting of the clock at some event along the 'world line' of our particle, which is its path in four-dimensional spacetime. But… Well… In a way, s is the 'genuine' or 'proper' time coming with the particle's reference frame, and so that's why it's called the proper time. You'll see (later) that it plays a very important role in general relativity theory (which is a topic we haven't discussed yet: we've only touched special relativity, so no gravity effects).
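To make that a bit more tangible, here's a small numerical sketch (c = 1, and an assumed constant speed of 0.8): we chop the path into little dt steps, add up ds = √(1−v²)·dt along the world line, and compare the total with the lab-frame time.

```python
import math

v = 0.8          # constant speed of the particle (assumed), with c = 1
t_total = 10.0   # elapsed time in the lab frame, in our time units
steps = 100000
dt = t_total / steps

s = 0.0
for _ in range(steps):
    ds = math.sqrt(1.0 - v**2) * dt   # ds = dt·√(1 − v²) along the world line
    s += ds

print(t_total)   # 10.0 — the clock on the ground
print(s)         # ≈ 6.0 — the clock traveling with the particle: the proper time
```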

OK. I know this is simple and complicated at the same time: the math is (fairly) easy but, yes, it may be difficult to ‘understand’ this in some kind of intuitive way. But let’s move on.

The four-force vector fμ

We know the relativistically correct equation for the motion of some charge q. It's just Newton's Law F = dp/dt = d(mv)/dt, but using the relativistic mass m = γm0, so p = γm0v, and we write:


How can we get a four-vector for the force? It turns out that we get it when applying our new invariant operator to the momentum four-vector pμ = (E, p), so we write: fμ = dpμ/ds. But pμ = m0uμ = m0·dxμ/ds, so we can re-write this as fμ = d(m0·dxμ/ds)/ds, which gives us a formula which is reminiscent of the Newtonian F = ma equation:

fμ = d(m0·dxμ/ds)/ds = m0·d²xμ/ds²

What is this thing? Well… It's not so difficult to verify that the x-, y- and z-components are just our old-fashioned Fx, Fy and Fz multiplied by 1/√(1−v²), and that the t-component is (1/√(1−v²))·dE/dt. Now, dE/dt is the time rate of change of energy and, hence, it's equal to the rate of doing work on our charge, which is equal to F•v. So we can write fμ as:


The force and the tensor

We will now derive that formula which we ended the previous post with. We start with calculating the spacelike components of fμ from the Lorentz formula F = q(E + v×B). [The terminology is nice, isn't it? The spacelike components of the four-force vector! Now that sounds impressive, doesn't it? But so… Well… It's really just the old stuff we know already.] So we start with fx = (1/√(1−v²))·Fx, and write it all out:


What a monster! But, hey! We can 'simplify' this by substituting stuff by (1) the t-, x-, y- and z-components of the four-velocity vector uμ and (2) the components of our tensor Fμν = [Fij] = [∇iAj − ∇jAi] with i, j = t, x, y, z. We'll also pop in the diagonal Fxx = 0 element, just to make sure it's all there. We get:

fx 2

Looks better, doesn’t it? :-) Of course, it’s just the same, really. This is just an exercise in symbolism. Let me insert the electromagnetic tensor we defined in our previous post, just as a reminder of what that Fμν matrix actually is:

electromagnetic tensor final

If you read my previous post, this matrix – or the concept of a tensor – has no secrets for you. Let me briefly summarize it, because it’s an important result as well. The tensor is (a generalization of) the cross-product in four-dimensional space. We take two vectors: aμ = (at, ax, ay, az) and bμ = (bt, bx, by, bz) and then we take cross-products of their components just like we did in three-dimensional space, so we write Tij = aibj − ajbi. Now, it’s easy to see that this combination implies that Tij = − Tji and that Tii = 0, which is why we only have six independent numbers out of the 16 possible combinations, and which is why we’ll get a so-called anti-symmetric matrix when we organize them in a matrix. In three dimensions, the very same definition of the cross-product Tij gives us 9 combinations, and only 3 independent numbers, which is why we represented our ‘tensor’ as a vector too! In four-dimensional space we can’t do that: six things cannot be represented by a four-vector, so we need to use this matrix, which is referred to as a tensor of the second rank in four dimensions. [When you start using words like that, you’ve come a long way, really. :-)]
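To see the counting at work, here's a tiny sketch (the component values are made-up, just for illustration): build Tij = aibj − ajbi from two four-vectors and check that the resulting matrix is antisymmetric, with a zero diagonal and six independent numbers.

```python
import numpy as np

# Two arbitrary four-vectors (t, x, y, z components) — assumed numbers, just for illustration
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.5, -1.0, 2.5, 0.0])

T = np.outer(a, b) - np.outer(b, a)   # Tij = ai·bj − aj·bi

print(T)
print(np.allclose(T, -T.T))           # True: antisymmetric, so Tij = −Tji
print(np.allclose(np.diag(T), 0.0))   # True: the diagonal elements are zero
# 16 entries, but only the 6 above the diagonal are independent:
print(T[np.triu_indices(4, k=1)])
```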

[…] OK. Back to our four-force. It’s easy to get a similar one-liner for fy and fz too, of course, as well as for ft. But… Yes, ft… Is it the same thing really? Let me quickly copy Feynman’s calculation for ft:


It is, indeed: remember that v×B and v are orthogonal, and so their dot product is zero. So, to make a long story short, the four equations – one for each component of the four-force vector fμ – can be summarized in the following elegant equation:

m0·duμ/ds = q·uνFμν

Writing this all requires a few conventions, however. For example, Fμν is a 4×4 matrix and so uν has to be written as a 1×4 vector. And the formulas for the fx and ft components also make it clear that we want to use the +−−− signature here, so the convention for the signs in the uνFμν product is the same as that for the scalar product aμbμ. So, in short, you really need to interpret what's being written here.

A more important question, perhaps, is: what can we do with it? Well… Feynman’s evaluation of the usefulness of this formula is rather succinct: “Although it is nice to see that the equations can be written that way, this form is not particularly useful. It’s usually more convenient to solve for particle motions by using the F = q(E + v×B) = (1−v2)−1/2·d(m0v)/dt equations, and that’s what we will usually do.”

Having said that, this formula really makes good on the promise I started my previous post with: we wanted a formula, some mathematical construct, that effectively presents the electromagnetic force as one force, as one physical reality. So… Well… Here it is! :-)

Well… That’s it for today. Tomorrow we’ll talk about energy and about a very mysterious concept—the electromagnetic mass. That should be fun! So I’ll c u tomorrow! :-)

Relativistic transformations of fields and the electromagnetic tensor

We’re going to do a very interesting piece of math here. It’s going to bring a lot of things together. The key idea is to present a mathematical construct that effectively presents the electromagnetic force as one force, as one physical reality. Indeed, we’ve been saying repeatedly that electromagnetism is one phenomenon only but we’ve been writing it always as something involving two vectors—the electric field vector E and the magnetic field vector B—and, while Lorentz’ force law F = q(E + v×B) makes it clear we’re talking one force only, there’s a way of writing it all up that is much more elegant.

I have to warn you though: this post doesn’t add anything to the physics we’ve seen so far: it’s all math, really and, to a large extent, math only. So if you read this blog because you’re interested in the physics only, then you may just as well skip this post. However, the mathematical concept we’re going to present is that of the tensor and… Well… You’ll have to get to know that animal sooner or later anyway, so you may just as well give it a try right now, and see whatever you can get out of this post.

The concept of a tensor further builds on the concept of the vector, which we liked so much because it allows us to write the laws of physics as vector equations, which do not change when going from one reference frame to another. In fact, we’ll see that a tensor can be described as a ‘special’ vector cross product (to be precise, we’ll show that a tensor is a ‘more general’ cross product, really). So the tensor and vector concepts are very closely related, but then… Well… If you think about it, the concept of a vector and the concept of a scalar are closely related, too! So we’re just moving up the value chain, so to speak: from scalar fields to vector fields to… Well… Tensor fields! And in quantum mechanics, we’ll introduce spinors, and so we also have spinor fields! Having said that, don’t worry about tensor fields. Let’s first try to understand tensors tout court.  :-)

So… Well… Here we go. Let me start with it all by reminding you of the concept of a vector, and why we like to use vectors and vector equations.

The invariance of physics and the use of vector equations

What's a vector? You may think, naively, that any one-dimensional array of numbers is a vector. But no! In math, we may call any one-dimensional array of numbers a 'vector', perhaps, but in physics, a vector does represent something real, something physical, and so a vector is only a vector if it transforms like a vector under the transformation rules that apply when going from one frame of reference, i.e. one coordinate system, to another. Examples of vectors in three dimensions are: the velocity vector v, or the momentum vector p = m·v, or the position vector r.

[Needless to say, the same can be said of scalars: mathematicians may define a scalar as just any real number, but that's not how it works in physics. A scalar in physics refers to something real, i.e. a scalar field, like the temperature (T) inside of a block of material. In fact, think about your first vector equation: it may have been the one determining the heat flow (h), i.e. h = −κ·∇T = (−κ·∂T/∂x, −κ·∂T/∂y, −κ·∂T/∂z). It immediately shows how scalar and vector fields are intimately related.]

Now, when discussing the relativistic framework of physics, we introduced vectors in four dimensions, i.e. four-vectors. The most basic four-vector is the spacetime four-vector R = (ct, x, y, z), which is often referred to as an event, but it’s just a point in spacetime, really. So it’s a ‘point’ with a time as well as a spatial dimension, so it also has t in it, besides x, y and z. It is also known as the position four-vector but, again, you should think of a ‘position’ that includes time! Of course, we can re-write R as R = (ct, r), with r = (x, y, z), so here we sort of ‘break up’ the four-vector in a scalar and a three-dimensional vector, which is something we’ll do from time to time, indeed. :-)

We also have a displacement four-vector, which we can write as ΔR = (c·Δt, Δr). There are other four-vectors as well, including the four-velocity, the four-momentum and the four-force four-vectors, which we’ll discuss later (in the last section of this post).

So it's just like using three-dimensional vectors in three-dimensional physics, or 'Newtonian' physics, I should say: the use of four-vectors is going to allow us to write the laws of physics as vector equations, but in four dimensions, rather than three, so we get the 'Einsteinian' physics, the real physics, so to speak—or the relativistically correct physics, I should say. And so these four-dimensional vector equations will also not change when going from one reference frame to another, and so our four-vectors will be vectors indeed, i.e. they will transform like vectors under the transformation rules that apply when going from one frame of reference, i.e. one coordinate system, to another.

What transformation? Well… In Newtonian or Galilean physics, we had translations and rotations and what have you, but what we are interested in here now, is ‘Einsteinian’ transformations of coordinate systems, so these have to ensure that all of the laws of physics that we know of, including the principle of relativity, still look the same. You’ve seen these transformation rules. We don’t call them the ‘Einsteinian’ transformation rules, but the Lorentz transformation rules, because it was a Dutch physicist (Hendrik Lorentz) who first wrote them down. So these rules are very different from the Newtonian or Galilean transformation rules which everyone assumed to be valid until the Michelson-Morley experiment unequivocally established that the speed of light did not respect the Galilean transformation rules. Very different? Well… Yes. In their mathematical structure, that is. Of course, when velocities are low, i.e. non-relativistic, then they yield the same result, approximately, that is. However, I explained that in my post on special relativity, and so I won’t dwell on that here.

Let me just jot down both sets of rules assuming that the two reference frames move with respect to each other along the x- axis only, so the y- and z-component of u is zero.


The Galilean or Newtonian rules are the simple rules on the right. Going from one reference frame to another (let’s call them S and S’ respectively) is just a matter of adding or subtracting speeds: if my car goes 100 km/h, and yours goes 120 km/h, then you will see my car falling behind at a speed of (minus) 20 km/h. That’s it. We could also rotate our reference frame, and our Newtonian vector equations would still look the same. As Feynman notes, smilingly, it’s what a lot of armchair philosophers think relativity theory is all about, but so it’s got nothing to do with it. It’s plain wrong!

In any case, back to vectors and transformations. The key to the so-called invariance of the laws of physics is the use of vectors and vector operators that transform like vectors. For example, if we defined A and B as (Ax, Ay, Az) and (Bx, By, Bz), then we knew that the so-called inner product A•B would look the same in all rotated coordinate systems, so we can write: A•B = A'•B'. So we know that if we have a product like that on both sides of an equation, we're fine: the equation will have the same form in all rotated coordinate systems. Also, the gradient, i.e. our vector operator ∇ = (∂/∂x, ∂/∂y, ∂/∂z), when applied to a scalar function, gave three quantities that also transform like a vector under rotation. We also defined a vector cross product, which yielded a vector (as opposed to the inner product, i.e. the vector dot product, which yields a scalar):

a×b = (aybz − azby, azbx − axbz, axby − aybx)

So how does this thing behave under a Galilean transformation? Well… You may or may not remember that we used this cross-product to define the angular momentum L, which was a cross product of the radius vector r and the momentum vector p = mv, as illustrated below. The animation also gives the torque τ, which is, loosely speaking, a measure of the turning force: it’s the cross product of r and F, i.e. the force on the lever-arm.


The components of L are:

Lx = ypz − zpy, Ly = zpx − xpz, Lz = xpy − ypx

Now, we find that these three numbers, or objects if you want, transform in exactly the same way as the components of a vector. However, as Feynman points out, that's a matter of 'luck' really. It's something 'special'. Indeed, you may or may not remember that we distinguished axial vectors from polar vectors. L is an axial vector, while r and p are polar vectors, and so we find that, in three dimensions, the cross product of two polar vectors will always yield an axial vector. Axial vectors are sometimes referred to as pseudovectors, which suggests that they are 'not so real' as… Well… Polar vectors, which are sometimes referred to as 'true' vectors. However, it doesn't matter when doing these Newtonian or Galilean transformations: pseudo or true, both vectors transform like vectors. :-)

But so… Well… We're actually getting a bit of a heads-up here: if we'd be mixing (or 'crossing') polar and axial vectors, or mixing axial vectors only, so if we'd define something involving L and p (rather than r and p), or something involving L and τ, then we may not be so lucky, and then we'd have to carefully examine our cross-product, or whatever other product we'd want to define, because its components may not behave like a vector.

Huh? Whatever other product we’d want to define? Why are you saying that? Well… We actually can think of other products. For example, if we have two vectors a = (ax, ay, az) and b = (bx, by, bz), then we’ll have nine possible combinations of their components, which we can write as Tij = aibj. So that’s like Lxy, Lyz and Lzx really. Now, you’ll say: “No. It isn’t. We don’t have nine combinations here. Just three numbers.” Well… Think about it: we actually do have nine Lij combinations too here, as we can write: Lij = ri·pj – rj·pi. It just happens that, with this definition, only three of these combinations Lij are independent. That’s because the other six numbers are either zero or the opposite. Indeed, it’s easy to verify that Lij = –Lji , and Lii  = 0. So… Well… It turns out that the three components of our L = r×p ‘vector’ are actually a subset of a set of nine Lij numbers. So… Well… Think about it. We cannot just do whatever we want with our ‘vectors’. We need to watch out.

In fact, I do not want to get too much ahead of myself, but I can already tell you that the matrix with these nine Tij = aibj combinations is what is referred to as a tensor. To be precise, it's referred to as a tensor of the second rank in three dimensions. The 'second rank', aka 'degree' or 'order', refers to the fact that we've got two indices, and the 'three dimensions' is because we're using three-dimensional vectors. We'll soon see that the electromagnetic tensor is also of the second rank, but it's a tensor in four dimensions. In any case, I should not get ahead of myself. Just note what I am saying here: the tensor is like a 'new' product of two vectors, a new type of 'cross' product really (because we're mixing the components, so to say), but it doesn't yield a vector: it yields a matrix. For three-dimensional vectors, we get a 3×3 matrix. For four-vectors, we'll get a 4×4 matrix. And so the full truth about our angular momentum vector L is the following:

  1. There is a thing which we call the angular momentum tensor. It’s a 3×3 matrix, so it has nine elements which are defined as: Lij = ri·pj – rj·pi. Because of this definition, it’s an antisymmetric tensor of the second order in three dimensions, so it’s got only three independent components.
  2. The three independent elements are the components of our 'vector' L, and picking them out and calling these three components a 'vector' is actually a 'trick' that only works in three dimensions. They really just happen to transform like a vector under rotation or under whatever Galilean transformation! [By the way, do you now understand why I was saying that we can look at a tensor as a 'more general' cross product?]
  3. In fact, in four dimensions, we’ll use a similar definition and define 16 elements Fij as Fij = ∇iAj − ∇jAi, using the two four-vectors ∇μ and Aμ (so we have 4×4 = 16 combinations indeed), out of which only six will be independent for the very same reason: we have an antisymmetric vector combination here, Fij = −Fji and Fii = 0. :-) However, because we cannot represent six independent things by four things, we do not get some other four-vector, and so that’s why we cannot apply the same ‘trick’ in four dimensions.

However, here I am getting way ahead of myself and so… Well… Yes. Back to the main story line. :-) So let’s try to move to the next level of understanding, which is… Well…

Because of guys like Maxwell and Einstein, we now know that rotations are part of the Newtonian world, in which time and space are neatly separated, and that things are not so simple in Einstein’s world, which is the real world, as far as we know, at least! Under a Lorentz transformation, the new ‘primed’ space and time coordinates are a mixture of the ‘unprimed’ ones. Indeed, the new x’ is a mixture of x and t, and the new t’ is a mixture of x and t as well. [Yes, please scroll all the way up and have a look at the transformation on the left-hand side!]

So you don't have that under a Galilean transformation: in the Newtonian world, space and time are neatly separated, and time is absolute, i.e. it is the same regardless of the reference frame. In Einstein's world – our world – that's not the case: time is relative, or local as Hendrik Lorentz termed it quite appropriately, and so it's space-time – i.e. 'some kind of union of space and time', as Minkowski termed it – that transforms.

So that's why physicists use four-vectors to keep track of things. These four-vectors always have three space-like components, but they also include one so-called time-like component. It's the only way to ensure that the laws of physics are unchanged when moving with uniform velocity. Indeed, any true law of physics we write down must be arranged so that the invariance of physics (as a "fact of Nature", as Feynman puts it) is built in, and so that's why we use Lorentz transformations and four-vectors.

In the mentioned post, I gave a few examples illustrating how the Lorentz rules work. Suppose we're looking at some spaceship that is moving at half the speed of light (i.e. 0.5c) and that, inside the spaceship, some object is also moving at half the speed of light, as measured in the reference frame of the spaceship. Then we get the rather remarkable result that, from our point of view (i.e. our reference frame as observer on the ground), that object is not going as fast as light, as Newton or Galileo – and most present-day armchair philosophers :-) – would predict (0.5c + 0.5c = c). We'd see it move at a speed equal to 0.8c. Huh? How do we know that? Well… We can derive a velocity formula from the Lorentz rules:


So you can just put in the numbers now: vx = (0.5c + 0.5c)/(1 + 0.5·0.5) = 0.8c. See?

Let’s do another example. Suppose we’re looking at a light beam inside the spaceship, so something that’s traveling at speed c itself in the spaceship. How does that look to us? The Galilean transformation rules say its speed should be 1.5c, but that can’t be true of course, and the Lorentz rules save us once more: vx = (0.5c + c)/(1 + 0.5·1) = c, so it turns out that the speed of light does not depend on the reference frame: it looks the same – both to the man in the ship as well as to the man on the ground. As Feynman puts it: “This is good, for it is, in fact, what the Einstein theory of relativity was designed to do in the first place—so it had better work!” :-)
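Both examples are easy to reproduce. Here's a one-function sketch of the (relativistic) addition rule, with speeds expressed as fractions of c:

```python
def add_velocities(u, v):
    """Relativistic addition of two collinear speeds, both given as fractions of c."""
    return (u + v) / (1.0 + u * v)

print(add_velocities(0.5, 0.5))   # 0.8 — not 1.0, as Galileo would have it
print(add_velocities(0.5, 1.0))   # 1.0 — light still moves at c for the man on the ground
```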

So let's now apply relativity to electromagnetism. Indeed, that's what this post is all about! However, before I do so, let me re-write the Lorentz transformation rules for c = 1. We can equate the speed of light to one, indeed, when we measure time and distance in equivalent units. It's just a matter of ditching our seconds for meters (so our time unit becomes the time that light needs to travel a distance of one meter), or ditching our meters for seconds (so our distance unit becomes the distance that light travels in one second). You should be familiar with this procedure. If not, well… Check out my posts on relativity. So here's the same set of rules for c = 1:

t' = (t − u·x)/√(1 − u²), x' = (x − u·t)/√(1 − u²), y' = y, z' = z

They’re much easier to remember and work with, and so that’s good, because now we need to look at how these rules work with four-vectors and the various operations and operators we’ll be defining on them. Let’s look at that step by step.

Electrodynamics in relativistic notation

Let me copy the Universal Set of Equations and Their Solution once more:


The solution for Maxwell’s equations is given in terms of the (electric) potential Φ and the (magnetic) vector potential A. I explained that in my post on this, so I won’t repeat myself too much here either. The only point you should note is that this solution is the result of a special choice of Φ and A, which we referred to as the Lorentz gauge. We’ll touch upon this condition once more, so just make a mental note of it.

Now, E and B do not correspond to four-vectors: they depend on x, y, z and t, but they have three components only: Ex, Ey, Ez, and Bx, By, and Bz respectively. So we have six independent terms here, rather than four things that, somehow, we could combine into some four-vector. [Does this ring a bell? It should. :-)] Having said that, it turns out that we can combine Φ and A into a four-vector, which we'll refer to as the four-potential and which we'll write as:

Aμ = (Φ, A) = (Φ, Ax, Ay, Az) = (At, Ax, Ay, Az) with At = Φ.

So that’s a four-vector just like R = (ct, x, y, z).

How do we know that Aμ is a four-vector? Well… Here I need to say a few things about those Lorentz transformation rules and, more importantly, about the required condition of invariance under a Lorentz transformation. So, yes, here we need to dive into the math.

Four-vectors and invariance under Lorentz transformations

When you were in high-school, you learned how to rotate your coordinate frame. You also learned that the distance of a point from the origin does not change under a rotation, so you'd write r'² = x'² + y'² + z'² = r² = x² + y² + z², and you'd say that r² is an invariant quantity under a rotation. Indeed, transformations leave certain things unchanged. From the Lorentz transformation rules itself, it is easy to see that

c²·t'² − x'² − y'² − z'² = c²·t² − x² − y² − z², or,

if c = 1, that t'² − x'² − y'² − z'² = t² − x² − y² − z²,

is an invariant under a Lorentz transformation. We found the same for the so-called spacetime interval (Δs)² = c²Δt² − Δr², which we write as (Δs)² = Δt² − Δr² as we chose our time or distance units such that c = 1. [Note that, from now on, we'll assume that's the case, so c = 1 everywhere. We can always change back to our old units when we're done with the analysis.] Indeed, such invariance allowed us to define spacelike, timelike and lightlike intervals using the so-called light cone emanating from a single event and traveling in all directions.

You should note that, for four-vectors, we do not have a simple sum of three terms. Indeed, we don't write x² + y² + z² but t² − x² − y² − z². So we've got a +−−− thing here or – it's just another convention – we could also work with a −+++ sum of terms. The convention is referred to as the signature, and we will use the so-called +−−− metric signature here. Let's continue the story. Now, all four-vectors aμ = (at, ax, ay, az) have this property that:

at'² − ax'² − ay'² − az'² = at² − ax² − ay² − az².
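By the way, it's easy to check this numerically: just boost an arbitrary four-vector with the Lorentz rules above (a minimal sketch, with c = 1, an assumed relative speed u = 0.6 and made-up components) and see that the +−−− sum of squares doesn't change.

```python
import math

def boost(a, u):
    """Lorentz boost along x (c = 1) of a four-vector a = (at, ax, ay, az)."""
    g = 1.0 / math.sqrt(1.0 - u**2)
    at, ax, ay, az = a
    return (g * (at - u * ax), g * (ax - u * at), ay, az)

def interval(a):
    at, ax, ay, az = a
    return at**2 - ax**2 - ay**2 - az**2   # the +−−− 'length' squared

a = (3.0, 1.0, -2.0, 0.5)    # an arbitrary four-vector (assumed numbers)
a_prime = boost(a, 0.6)

print(a_prime)               # (3.0, -1.0, -2.0, 0.5) for this particular choice
print(interval(a))           # 3.75
print(interval(a_prime))     # 3.75 — the same number in the primed frame
```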

So. Yes. :-) But… Well… We can say that our four-potential vector is a four-vector, but so we still have to prove that. So we need to prove that Φ'² − Ax'² − Ay'² − Az'² = Φ² − Ax² − Ay² − Az² for our four-potential vector Aμ = (Φ, A). So… Yes… How can we do that? The proof is not so easy, but you need to go through it as it will introduce some more concepts and ideas you need to understand.

In my post on the Lorentz gauge, I mentioned that Maxwell’s equations can be re-written in terms of Φ and A, rather than in terms of E and B. The equations are:

Equations 2

The expressions look rather formidable, but don't panic: just look at them. Of course, you need to be familiar with the operators that are being used here, so that's the Laplacian ∇² and the divergence operator ∇• that's being applied to the scalar Φ and the vector A. I can't re-explain this. I am sorry. Just check my posts on vector analysis. You should also look at the third equation: that's just the Lorentz gauge condition, which we introduced when deriving these equations from Maxwell's equations. Having said that, it's the first and second equation which describe Φ and A as a function of the charges and currents in space, and so that's what matters here. So let's unfold the first equation. It says the following:

∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² − (1/c²)·∂²Φ/∂t² = −ρ/ε0

In fact, if we’d be talking free or empty space, i.e. regions where there are no charges and currents, then the right-hand side would be zero and this equation would then represent a wave equation, so some potential Φ that is changing in time and moving out at the speed c. Here again, I am sorry I can’t write about this here: you’ll need to check one of my posts on wave equations. If you don’t want to do that, you should believe me when I say that, if you see an equation like this:

∂²Ψ/∂x² = (1/c²)·∂²Ψ/∂t², then the function Ψ(x, t) must be some function of the form

Ψ(x, t) = f(x − ct) + g(x + ct)
Now, that’s a function representing a wave traveling at speed c, i.e. the phase velocity. Always? Yes. Always! It’s got to do with the x − ct and/or x + ct  argument in the function. But, sorry, I need to move on here.

The unfolding of the equation with Φ makes it clear that we have four equations really. Indeed, the second equation is three equations: one for Ax, one for Ay, and one for Az respectively. The four quantities on the right-hand side of these equations are ρ, jx, jy and jz respectively, divided by ε0, which is a universal constant which does not change when going from one coordinate system to another. Now, the quantities ρ, jx, jy and jz transform like a four-vector. How do we know that? It’s just the charge conservation law. We used it when solving the problem of the fields around a moving wire, when we demonstrated the relativity of the electric and magnetic field. Indeed, the relevant equations were:

Lorentz j and rho

You can check that against the Lorentz transformation rules for c = 1. They're exactly the same, but so we chose t = 0, so the rules are even simpler. Hence, the (ρ, jx, jy, jz) vector is, effectively, a four-vector, and we'll denote it by jμ = (ρ, j). I now need to explain something else. [And, yes, I know this is becoming a very long story but… Well… That's how it is.]

It's about our operators ∇, ∇•, ∇× and ∇², so that's the gradient, the divergence, the curl and the Laplacian operator respectively: they all have a four-dimensional equivalent. Of course, that won't surprise you. :-( Let me just jot all of them down, so we're done with that, and then I'll focus on the four-dimensional equivalent of the Laplacian ∇•∇ = ∇², which is referred to as the D'Alembertian, and which is denoted by □², because that's the one we need to prove that our four-potential vector is a real four-vector. [I know: □² is a tiny symbol for a pretty monstrous thing, but I can't help it: my editor tool is pretty limited.]


Now, we’re almost there. Just hang in for a little longer. It should be obvious that we can re-write those two equations with Φ, A, ρ and j, as:

□²Aμ = jμ/ε0

Just to make sure, let me remind you that Aμ = (Φ, A) and that jμ = (ρ, j). Now, our new D’Alembertian operator is just an operator—a pretty formidable operator but, still, it’s an operator, and so it doesn’t change when the coordinate system changes, so the conclusion is that, IF jμ = (ρ, j) is a four-vector – which it is – and, therefore, transforms like a four-vector, THEN the quantities Φ, Ax, Ay, and Az must also transform like a four-vector, which means they are (the components of) a four-vector.

So… Well… Think about it, but not too long, because it’s just an intermediate result we had to prove. So that’s done. But we’re not done here. It’s just the beginning, actually. :-/ Let me repeat our intermediate result:

Aμ = (Φ, A) is a four-vector. We call it the four-potential vector.

OK. Let’s continue. Let me first draw your attention to that expression with the D’Alembertian above. Which expression? This one:

□²Aμ = jμ/ε0

What about it? Well… You should note that the physics of that equation is just the same as Maxwell’s equations. So it’s one equation only, but it’s got it all.

It’s quite a pleasure to re-write it in such elegant form. Why? Think about it: it’s a four-vector equation: we’ve got a four-vector on the left-hand side, and a four-vector on the right-hand side. Therefore, this equation is invariant under a transformation. So, therefore, it directly shows the invariance of electrodynamics under the Lorentz transformation.

Huh? Yes. You may think about this a little longer. :-)

To wrap this up, I should also note that we can also express the gauge condition using our new four-vector notation. Indeed, we can write it as:

∇μAμ = 0

It's referred to as the Lorentz condition and it is, effectively, a condition for invariance, i.e. it ensures that the four-vector equation above does stay in the form it is in for all reference frames. Note that we're re-writing it using the four-dimensional equivalent of the divergence operator ∇•, but so we don't have a dot between ∇μ and Aμ. In fact, the notation is pretty confusing, and it's easy to think we're talking some gradient, rather than the divergence. So let me therefore highlight the meaning of both once again. It looks the same, but it's two very different things: the gradient operates on a scalar, while the divergence operates on a (four-)vector. Also note the +−−− signature is only there for the gradient, not for the divergence!


You'll wonder why they didn't use some • or ∗ symbol, and the answer is: I don't know. I know it's hard to keep inventing symbols for all these different 'products' – the ⊗ symbol, for example, is reserved for tensor products, which we won't get into – but… Well… I think they could have done something here. :-(

In any case… Let’s move on. Before we do, please note that we can also re-write our conservation law for electric charge using our new four-vector notation. Indeed, you’ll remember that we wrote that conservation law as:

∇•j = −∂ρ/∂t

Using our new four-vector operator ∇μ, we can re-write that as ∇μjμ = 0. So all of electrodynamics can be summarized in two equations only—Maxwell's law and the charge conservation law:


OK. We’re now ready to discuss the electromagnetic tensor. [I know… This is becoming an incredibly long and incredibly complicated piece but, if you get through it, you’ll admit it’s really worth it.]

The electromagnetic tensor

The whole analysis above was done in terms of the Φ and A potentials. It’s time to get back to our field vectors E and B. We know we can easily get them from Φ and A, using the rules we mentioned as solutions:

E = −∇Φ − ∂A/∂t and B = ∇×A

These two equations should not look like just more formulas. They are essential, and you should be able to jot them down anytime, anywhere. They should be on your kitchen door, in your toilet and above your bed. :-) For example, the second equation gives us the components of the magnetic field vector B:

Bx = ∂Az/∂y − ∂Ay/∂z, By = ∂Ax/∂z − ∂Az/∂x, Bz = ∂Ay/∂x − ∂Ax/∂y

Now, look at these equations. The x-component is equal to a couple of terms that involve only y- and z-components. The y-component is equal to something involving only x and z. Finally, the z-component only involves x and y. Interesting. Let's define a 'thing' which we'll denote by Fzy:

Fzy = ∂Az/∂y − ∂Ay/∂z

So now we can write: Bx = Fzy, By = Fxz, and Bz = Fyx. Now look at our equation for E. It turns out the components of E are equal to things like Fxt, Fyt and Fzt! Indeed, Fxt = ∂Ax/∂t − ∂At/∂x = Ex!

But… Well… No. :-( The sign is wrong! Ex = −∂Ax/∂t−∂At/∂x, so we need to modify our definition of Fxt. When the t-component is involved, we’ll define our ‘F-things’ as:

Fxt = −∂Ax/∂t − ∂At/∂x (and the same pattern for Fyt and Fzt)

So we’ve got a plus instead of a minus. It looks quite arbitrary but, frankly, you’ll have to admit it’s sort of consistent with our +−−− signature for our four-vectors and, in just a minute, you’ll see it’s fully consistent with our definition of the four-dimensional vector operator ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z). So… Well… Let’s go along with it.

What about the Fxx, Fyy, Fzz and Ftt terms? Well… Fxx = ∂Ax/∂x − ∂Ax/∂x = 0, and it’s easy to see that Fyy and Fzz are zero too. But Ftt? Well… It’s a bit tricky but, applying our definitions carefully, we see that Ftt must be zero too. In any case, the Ftt = 0 will become obvious as we will be arranging these ‘F-things’ in a matrix, which is what we’ll do now. [Again: does this ring a bell? If not, it should. :-)]

Indeed, we’ve got sixteen possible combinations here, which Feynman denotes as Fμν, which is somewhat confusing, because Fμν usually denotes the 4×4 matrix representing all of these combinations. So let me use the subscripts i and j instead, and define Fij as:

Fij = ∇iAj − ∇jAi

with ∇i being the t-, x-, y- or z-component of ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and, likewise, Ai being the t-, x-, y- or z-component of Aμ = (Φ, Ax, Ay, Az). Just check it: Fzy = −∂Ay/∂z + ∂Az/∂y = ∂Az/∂y − ∂Ay/∂z = Bx, for example, and Fxt = −∂Φ/∂x − ∂Ax/∂t = Ex. So the +−−− convention works. [Also note that it’s easier now to see that Ftt = ∂Φ/∂t − ∂Φ/∂t = 0.]
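If you don't feel like doing the sign bookkeeping by hand, here's a small symbolic sketch (it assumes you have the sympy library installed; the potentials are left as arbitrary functions of t, x, y and z): it builds Fij = ∇iAj − ∇jAi with ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and confirms that Fzy reproduces Bx = (∇×A)x, that Fxt reproduces Ex = −∂Φ/∂x − ∂Ax/∂t, and that Fxy comes out as −Bz (i.e. Bz = Fyx).

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
Phi, Ax, Ay, Az = [sp.Function(name)(t, x, y, z) for name in ('Phi', 'Ax', 'Ay', 'Az')]

A = {'t': Phi, 'x': Ax, 'y': Ay, 'z': Az}          # Aμ = (Φ, Ax, Ay, Az)
nabla = {'t': lambda f: sp.diff(f, t),             # ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z)
         'x': lambda f: -sp.diff(f, x),
         'y': lambda f: -sp.diff(f, y),
         'z': lambda f: -sp.diff(f, z)}

def F(i, j):
    return nabla[i](A[j]) - nabla[j](A[i])         # Fij = ∇iAj − ∇jAi

Bx = sp.diff(Az, y) - sp.diff(Ay, z)               # x-component of ∇×A
Bz = sp.diff(Ay, x) - sp.diff(Ax, y)               # z-component of ∇×A
Ex = -sp.diff(Phi, x) - sp.diff(Ax, t)             # x-component of −∇Φ − ∂A/∂t

print(sp.simplify(F('z', 'y') - Bx))   # 0: Fzy = Bx
print(sp.simplify(F('x', 't') - Ex))   # 0: Fxt = Ex
print(sp.simplify(F('x', 'y') + Bz))   # 0: Fxy = −Bz, so Bz = Fyx
```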

We can now arrange the Fij in a matrix. This matrix is antisymmetric, because Fij = −Fji, and its diagonal elements are zero. [For those of you who love math: note that the diagonal elements of an antisymmetric matrix are always zero because of the Fij = −Fji constraint: just set i = j in the constraint.]

Now that matrix is referred to as the electromagnetic tensor and it's depicted below (we plugged c back in: remember that B's magnitude is 1/c times E's magnitude).

electromagnetic tensor final

So… Well… Great ! We’re done! Well… Not quite. :-)

We can get this matrix in a number of ways. The least complicated way is, of course, just to calculate all Fij components and then put them in a [Fij] matrix, using i as the row number and j as the column number. You need to watch out with the conventions though, and so i and j start on t and end on z. :-)

The other way to do it is to write the ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) operator as a 4×1 column vector, which you then multiply with the four-vector Aμ written as a 1×4 row vector. So ∇μAμ is then a 4×4 matrix, which we combine with its transpose, i.e. (∇μAμ)T, as shown below. So what's written below is (∇μAμ) − (∇μAμ)T.


If you google, you'll see there's more than one way to go about it, so I'd recommend you just go through the motions and double-check the whole thing yourself—and please do let me know if you find any mistake! In fact, the Wikipedia article on the electromagnetic tensor writes the tensor both with upper (contravariant) indices and with lower (covariant) indices — the same tensor, just different conventions — but so I'll refer you to that article as I don't want to make things even more complicated here! As said, there are different conventions around here, and so you need to double-check what is what really. :-)

Where are we heading with all of this? The next thing is to look at the Lorentz transformation of these Fij = ∇iAj − ∇jAi components, because then we know how our E and B fields transform. Before we do so, however, we should note the more general results and definitions which we obtained here:

1. The Fμν matrix (a matrix is just a rectangular array of numbers, of course) is a so-called tensor. It's a tensor of the second rank, because it has two indices in it. We think of it as a very special 'product' of two vectors, not unlike the vector cross product a×b, whose components were also defined by a similar combination of the components of a and b. Indeed, we wrote:

a×b = (aybz − azby, azbx − axbz, axby − aybx)

So one should think of a tensor as “another kind of cross product” or, preferably, and as Feynman puts it, as a “generalization of the cross product”.

2. In this case, the four-vectors are ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and Aμ = (Φ, Ax, Ay, Az). Now, you will probably say that ∇μ is an operator, not a vector, and you are right. However, we know that ∇μ behaves like a vector, and so this is just a special case. The point is: because the tensor is based on four-vectors, the Fμν tensor is referred to as a tensor of the second rank in four dimensions. In addition, because of the Fij = −Fji result, Fμν is an antisymmetric tensor of the second rank in four dimensions.

3. Now, the whole point is to examine how tensors transform. We know that the vector dot product, aka the inner product, remains invariant under a Lorentz transformation, both in three as well as in four dimensions, but what about the vector cross product, and what about the tensor? That’s what we’ll be looking at now.

The Lorentz transformation of the electric and magnetic fields

Cross products are complicated, and tensors will be complicated too. Let’s recall our example in three dimensions, i.e. the angular momentum vector L, which was a cross product of the radius vector r and the momentum vector p = mv, as illustrated below (the animation also gives the torque τ, which is, loosely speaking, a measure of the turning force).


The components of L are:

Lx = ypz − zpy, Ly = zpx − xpz, Lz = xpy − ypx

Now, this particular definition ensures that Lij turns out to be an antisymmetric object:


So it’s a similar situation here. We have nine possible combinations, but only three independent numbers. So it’s a bit like our tensor in four dimensions: 16 combinations, but only 6 independent numbers.

Now, it so happens that these three numbers, or objects if you want, transform in exactly the same way as the components of a vector. However, as Feynman points out, that's a matter of 'luck' really. In fact, Feynman points out that, when we have two vectors a = (ax, ay, az) and b = (bx, by, bz), we'll have nine products Tij = aibj which will also form a tensor of the second rank (cf. the two indices) but which, in general, will not obey the transformation rules we got for the angular momentum tensor, which happened to be an antisymmetric tensor of the second rank in three dimensions.

To make a long story short, it’s not simple in general, and surely not here: with E and B, we’ve got six independent terms, and so we cannot represent six things by four things, so the transformation rules for E and B will differ from those for a four-vector. So what are they then?

Well… Feynman first works out the rules for the general antisymmetric vector combination Gij = aibj − ajbi, with ai and bj the t-, x-, y- or z-components of the four-vectors aμ = (at, ax, ay, az) and bμ = (bt, bx, by, bz) respectively. The idea is to first get some general rules, and then replace Gij = aibj − ajbi by Fij = ∇iAj − ∇jAi, of course! So let's apply the Lorentz rules, which – let me remind you – are the following ones:

t' = (t − u·x)/√(1 − u²), x' = (x − u·t)/√(1 − u²), y' = y, z' = z

So we get:

at' = (at − u·ax)/√(1 − u²), ax' = (ax − u·at)/√(1 − u²), ay' = ay, az' = az (and the same for the components of bμ)

The rest is all very tedious: you just need to plug these things into the various Gij = aibj − ajbi formulas. For example, for G'tx, we get:


Hey! That's just Gtx, so we find that G'tx = Gtx! What about the rest? Well… That yields something different. Let me shorten the story by simply copying Feynman here:


So… Done!

So what?

Well… Now we just substitute. In fact, there are two alternative formulations of the Lorentz transformations of E and B. They are given below (note the units are such that c = 1):

result 1 result 2

In addition, there is a third equivalent formulation which is more practical, and also simpler, even if it puts the c‘s back in. It re-defines the field components, distinguishing only two:

  1. The 'parallel' components E|| and B|| along the x-direction (because they are parallel to the relative velocity of the S and S' reference frames), and
  2. The 'perpendicular' or 'total transverse' components E⊥ and B⊥, which are the vector sums of the y- and z-components.

So that gives us four equations only:

result 3

And, yes, we are done now. This is the Lorentz transformation of the fields. I am sure it has left you totally exhausted. Well… If not… […] It sure left me totally exhausted. :-)
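As a quick sanity check on those transformation rules, here's a numerical sketch (c = 1, made-up field components and an assumed boost of v = 0.6 along x), using the standard component form of the transformation: whatever E and B look like in the new frame, the combinations E•B and E² − B² come out the same — they are invariants of the transformation.

```python
import numpy as np

def transform_fields(E, B, v):
    """Boost along x with speed v (c = 1): returns (E', B') using the standard rules."""
    g = 1.0 / np.sqrt(1.0 - v**2)
    Ep = np.array([E[0],                       # parallel components are unchanged
                   g * (E[1] - v * B[2]),
                   g * (E[2] + v * B[1])])
    Bp = np.array([B[0],
                   g * (B[1] + v * E[2]),
                   g * (B[2] - v * E[1])])
    return Ep, Bp

E = np.array([1.0, 2.0, -0.5])   # assumed field components (units with c = 1)
B = np.array([0.3, -1.0, 0.7])
Ep, Bp = transform_fields(E, B, 0.6)

print(E.dot(B), Ep.dot(Bp))                            # E•B is the same in both frames
print(E.dot(E) - B.dot(B), Ep.dot(Ep) - Bp.dot(Bp))    # and so is E² − B²
```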

To lighten things up, let me insert an image of what the transformed field E actually looks like. The first image shows the field in the reference frame of the charge itself: we have a simple Coulomb field. The second image shows the charge flying by. Its electric field is 'squashed up': the scale along the direction of motion is, so to speak, compressed by a factor √(1−v²/c²). Let me refer you to Feynman for the detail of the calculations here.


OK. So that’s it. You may wonder: what about that promise I made? Indeed, when I started this post, I said I’d present a mathematical construct that presents the electromagnetic force as one force only, as one physical reality, but so we’re back writing all of it in terms of two vectors—the electric field vector E and the magnetic field vector B. Well… What can I say? I did present the mathematical construct: it’s the electromagnetic tensor. So it’s that antisymmetric matrix really, which one can combine with a transformation matrix embodying the Lorentz transformation rules. So, I did what I promised to do. But you’re right: I am re-presenting stuff in the old style once again.

The second objection that you may have—in fact, that you should have, is that all of this has been rather tedious. And you’re right. The whole thing just re-emphasizes the value of using the four-potential vector. It’s obviously much easier to take that vector from one reference frame to another – so we just apply the Lorentz transformation rules to Aμ = (Φ, A) and get Aμ‘ = (Φ’, A’) from it – and then calculate E’ and B’ from it, rather than trying to remember those equations above. However, that’s not the point, or…

Well… It is and it isn’t. We wanted to get away from those two vectors E and B, and show that electromagnetism is really one phenomenon only, and so that’s where the concept of the electromagnetic tensor came in. There were two objectives here: the first objective was to introduce you to the concept of tensors, which we’ll need in the future. The second objective was to show you that, while Lorentz’ force law – F = q(E + v×B) – makes it clear we’re talking one force only, there is a way of writing it all up that is much more elegant.

I’ve introduced the concept of tensors here, so the first objective should have been achieved. As for the second objective, I’ll discuss that in my next post, in which I’ll introduce the four-velocity vector uμ as well as the four-force vector fμ. It will explain the following beautiful equation of motion:

motion equation

Now that looks very elegant and unified, doesn’t it? :-)

[…] Hmm… No reaction. I know… You’re tired now, and you’re thinking: yet another way of representing the same thing? Well… Yes! So…

OK… Enough for today. Let’s follow up tomorrow.

Electric circuits (2): Kirchhoff’s rules, and the energy in a circuit

My previous post was long and tedious, and all it did was present the three (passive) circuit elements as well as the concept of impedance. It showed that the inner workings of these little devices are actually quite complicated. Fortunately, the conclusions were very neat and short: for all circuit elements, we have a very simple relationship between (a) the voltage across the terminals of the element (V) and (b) the current that’s going through the circuit element (I). We found they are always in some ratio, which is referred to as the impedance, which we denoted by Z:

Z = V/I ⇔ V = I∗Z

So it’s a ‘simple’ ratio, indeed. But… Well… Simple and not simple. It’s a ratio of two complex numbers and, therefore, it’s a complex number itself. That’s why I use the ∗ symbol when re-writing the Z = V/I formula as V = I∗Z, so it’s clear we’re talking a product of two complex numbers. This ‘complexity’ is best understood by thinking of the voltage and the current as phase vectors (or phasors as engineers call them). Indeed, instead of using the sinusoidal functions we are used to, so that’s

  • V = V0·cos(ωt + θV),
  • I = I0·cos(ωt + θI), and
  • Z = V/I, with magnitude Z0 = V0/I0 and phase θ = θV − θI,

we preferred the complex or vector notation, writing:

  • V = |V|·ei(ωt + θV) = V0·ei(ωt + θV)
  • I = |I|·ei(ωt + θI) = I0·ei(ωt + θI)
  • Z = |Z|·eiθ = Z0·eiθ = (V0/I0)·ei(θV − θI)

For the three circuit elements, we found the following solution for Z in terms of the previously defined properties of the respective circuit elements, i.e. their resistance (R), capacitance (C), and inductance (L) respectively:

  1. For a resistor, we have Z(resistor) = ZR = R
  2. For a capacitor, we have Z(capacitor) = ZC = 1/(iωC) = –i/(ωC)
  3. For an inductor, we have Z(inductor) = ZL = iωL

We also explained what these formulas meant, using graphs like the ones below:

  1. The graph on the left-hand side gives you the ratio of the peak voltage and peak current for the three devices as a function of C, L, R and ω respectively.
  2. The graph on the right-hand side shows you the relationship between the phase of the voltage and the current for a capacitor and for an inductor. [For a resistor, the phases are the same, so no need for a graph. Also note that the lag of the phase of the current vis-à-vis the voltage phase is 90 degrees for an inductor, while it’s 270 degrees for a capacitor (which amounts to the current leading the voltage with a 90° phase difference).]


The inner workings of our circuit elements are all wonderful and mysterious, and so we spent a lot of time writing about them. That’s finished now. The summary above describes all of them in very simple terms, relating the voltage and current phasors through the concept of impedance, which is just a ratio—albeit a complex ratio.

As the graphs above suggest, we can build all kinds of crazy circuits now, and the idea of resonance as we’ve learned it when studying the behavior of waves will be particularly relevant when discussing circuits that are designed to filter certain frequencies or, the opposite, to amplify some. We won’t go that far in this post, however, as I just want to explain the basic rules one needs to know when looking at a circuit, i.e. Kirchhoff’s circuit laws. There are two of them:

1. Kirchhoff’s Voltage Law (KVL): The sum of the voltage drops around any closed path is zero.

The principle is illustrated below. It doesn’t matter whether or not we have other circuits feeding into this one: Kirchhoff’s Voltage Law (KVL) remains valid.


We can write this law using the concept of circulation once again or, what you’ll probably like more, just using plain summation: integral


2. Kirchhoff’s Current Law (KCL): The sum of the currents into any node is zero.

This law is written and illustrated as follows:


KCL illus


This Law requires some definition of a node, of course. Feynman defines a node as any set of terminals such as a, b, c, d in the illustration above which are connected. So it’s a set of connected terminals.

Now, I’ll refer you to Feynman for some practical examples. The circuit below is one of them. It looks complicated, but it all boils down to solving a set of linear equations. So… Well… That’s it, really. We’re done! We should do the exercises, of course, but then we’re too lazy for that, I guess. :-) So we’re done!
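Just to make that ‘set of linear equations’ point concrete, here’s a toy version of my own (so the component values are made up, and it’s not Feynman’s circuit): a generator in series with Z1, which then feeds Z2 and Z3 in parallel. The two mesh equations are set up and solved with numpy.

```python
# Toy circuit (hypothetical values): generator EMF in series with Z1,
# which then feeds Z2 and Z3 in parallel. Unknowns: the two loop currents I1, I2.
#   loop 1:  EMF = I1*Z1 + (I1 - I2)*Z2
#   loop 2:    0 = (I2 - I1)*Z2 + I2*Z3
import numpy as np

omega = 2 * np.pi * 50.0          # 50 Hz, just as an example
Z1 = 10.0                          # a resistor
Z2 = 1 / (1j * omega * 100e-6)     # a capacitor, C = 100 microfarad
Z3 = 1j * omega * 0.2              # an inductor, L = 0.2 henry
EMF = 230.0                        # volt (amplitude, taken as the phase reference)

A = np.array([[Z1 + Z2, -Z2],
              [-Z2,      Z2 + Z3]])
b = np.array([EMF, 0.0])
I1, I2 = np.linalg.solve(A, b)

# cross-check against the equivalent impedance Z1 + Z2*Z3/(Z2 + Z3)
Z_eq = Z1 + Z2 * Z3 / (Z2 + Z3)
print(I1, EMF / Z_eq)              # the two complex numbers agree
print(abs(I1), np.angle(I1))       # magnitude and phase of the source current
```

That’s really all there is to it: Kirchhoff’s two rules give you the equations, and the rest is linear algebra with complex numbers.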


Well… Almost. I also need to mention how one can reduce complicated circuits by combining parallel impedances, using the following formula:

parallel Z
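In code, that rule is a one-liner. A small helper of my own, just to illustrate:

```python
# Combine any number of parallel impedances: 1/Z_par = 1/Z_1 + 1/Z_2 + ...
def parallel(*impedances):
    return 1.0 / sum(1.0 / Z for Z in impedances)

# e.g. a 10 ohm resistor in parallel with a purely reactive -5j ohm capacitor
print(parallel(10.0, -5j))   # (2-4j) ohm
```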

And then another powerful idea is the idea of equivalent circuits. The rules for this are as follows:

  1. Any two-terminal network of passive elements is equivalent to (and, hence, can be replaced by) an effective impedance (Zeff).
  2. Any two-terminal network of passive elements and generators is equivalent to (and, hence, can be replaced by) a generator in series with an impedance.

These two principles are illustrated below: (a) is equivalent to (b) in each diagram.


The related formulas are:

  1. I = Ɛ/Zeff
  2. V = Ɛeff − I∗Zeff

Last but not least, I need to say something about the energy in circuits. As we noted in our previous post, the impedance will consist of a real and an imaginary part. We write:

Z = R + i·X

This gives rise to the following powerful equivalence: any impedance is equivalent to a series combination of a pure resistance and a pure reactance, as illustrated below (the ≡ sign stands for equivalence):


Of course, because this post risks becoming too short :-) I need to insert some more formulas now. If Z = R + i·X is the impedance of the whole circuit, then the whole circuit can be summarized in the following equation:

Ɛ = IZ = I∗(R + i·X)

Now, if we bring the analysis back to the real parts of this equation, then we may write our current as I = I0·cos(ωt). This implies we choose our t = 0 point such that θI = 0. [Note that this is somewhat different from what we usually do: we usually choose our t = 0 point such that θV = 0, but it doesn’t matter.] The real emf is then going to be the real part of Ɛ = IZ = I∗(R + i·X), so we’ll write it as Ɛ (no bold-face), and it’s going to be the real part of that expression above, which we can also write as:

Ɛ = IZ = I0·ei(ωt) ∗(R + i·X)

So Ɛ is the real part of this Ɛ and, you should check, it’s going to be equal to:

Ɛ = I0·R·cos(ωt) − I0·X·sin(ωt)

The two terms in this equation represent the voltage drops across the resistance R and the reactance X in that illustration above. […] Now that I think of it, in line with the -or and -ance convention for circuit elements and their properties, should we, perhaps, say resistor and reactor in this case? :-) […] OK. That’s a bad joke. [I don’t seem to have good ones, do I?] :-)

Jokes aside, we see that the voltage drop across the resistance is in phase with the current (because it’s a simple cosine function of ωt as well), while the voltage drop across the purely reactive part is out of phase with the current (as you know, the sine and cosine are the same function, but with a phase difference of π/2 indeed).

You’ll wonder where we are going with this, so let me wrap it all up. You know the power is the emf times the current, and so let’s integrate this thing over one cycle to get the average rate (i.e. the time rate of change) at which energy gets lost in the circuit. So we need to solve the following integral:

integral 2

This may look like a monster, but if you look back at your notes from your math classes, you should be able to figure it out:

  1. The first integral is (1/2)·I02·R.
  2. The second integral is zero.

So what? Well… Look at it! It means that the (average) energy loss in a circuit with impedance Z = R + i·X only depends on the real part of Z: it is equal to (1/2)·I02·R. That’s, of course, how we want it to be: ideal inductors and capacitors store energy when being ‘charged’, and give whatever energy they stored back to the circuit when the current reverses direction.

So it’s a nice result, because it’s consistent with everything. Hmm… Let’s double-check though… Is it also consistent with the power equation for a resistor which, remember, is written as: P = V·I = I·R·I = I2·R. […] Well… What about the 1/2 factor?

Well… Think about it. I is a sine or a cosine here, and so we want the average value of its square, so that’s 〈cos2(ωt)〉 = 1/2.
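If you don’t quite trust the integrals, here’s a quick numerical check (a sketch with made-up numbers): average Ɛ(t)·I(t) over one cycle and compare it with I02·R/2.

```python
# Average P(t) = EMF(t)*I(t) over one full cycle and compare with I0^2*R/2.
import numpy as np

I0, R, X, omega = 2.0, 5.0, 3.0, 2 * np.pi * 50.0   # made-up values
T = 2 * np.pi / omega
t = np.linspace(0.0, T, 100_000, endpoint=False)

I_t   = I0 * np.cos(omega * t)
emf_t = I0 * R * np.cos(omega * t) - I0 * X * np.sin(omega * t)

print(np.mean(emf_t * I_t), I0**2 * R / 2)   # both ~10.0 watt: the reactance X drops out
```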

Done!  :-)

Electric circuits (1): the circuit elements

OK. No escape. It’s part of physics. I am not going to go into the nitty-gritty of it all (because this is a blog about physics, not about engineering) but it’s good to review the basics, which are, essentially, Kirchhoff’s rules. Just for the record, Gustav Kirchhoff was a German genius who formulated these circuit laws while he was still a student, when he was like 20 years old or so. He did it as a seminar exercise 170 years ago, and then turned it into his doctoral dissertation. Makes me think of that Dire Straits song—That’s the way you do it—Them guys ain’t dumb. :-)

So this post is, in essence, just an ‘explanation’ of Feynman’s presentation of Kirchhoff’s rules, so I am writing this post basically for myself, so as to ensure I am not missing anything. To be frank, Feynman’s use of notation when working with complex numbers is confusing at times and so, yes, I’ll do some ‘re-writing’ here. The nice thing about Feynman’s presentation of electrical circuits is that he sticks to Maxwell’s Laws when describing all ideal circuit elements, so he keeps using line integrals of the electric field E around closed paths (that’s what a circuit is, indeed) to describe the so-called passive circuit elements, and he also recapitulates the idea of the electromotive force when discussing the so-called active circuit element, so that’s the generator. That’s nice, because it links it all with what we’ve learned so far, i.e. the fundamentals as expressed in Maxwell’s set of equations. Having said that, I won’t make that link here in this post, because I feel it makes the whole approach rather heavy.

OK. Let’s go for it. Let’s first recall the concept of impedance.

The impedance concept

There are three ideal (passive) circuit elements: the resistor, the capacitor and the inductor. Real circuit elements usually combine characteristics of all of them, even if they are designed to work like ideal circuit elements. Collectively, these ideal (passive) circuit elements are referred to as impedances, because… Well… Because they have some impedance. In fact, you should note that, if we reserve the terms ending with -ance for the property of the circuit elements, and those ending on -or for the objects themselves, then we should call them impedors. However, that term does not seem to have caught on.

You already know what impedance is. I explained it before, notably in my post on the intricacies related to self- and mutual inductance. Impedance basically extends the concept of resistance, as we know it from direct current (DC) circuits, to alternating current (AC) circuits. To put it simply, when AC currents are involved – so when the flow of charge periodically reverses direction – then it’s likely that, because of the properties of the circuit, the current signal will lag the voltage signal, and so we’ll have some phase difference telling us by how much. So, resistance is just a simple real number R – it’s the ratio between (1) the voltage that is being applied across the resistor and (2) the current through it, so we write R = V/I – and it’s got a magnitude only, but impedance is a ‘number’ that has both a magnitude as well as a phase, so it’s a complex number, or a vector.

In engineering, such ‘numbers’ with a magnitude as well as a phase are referred to as phasors. A phasor represents voltages, currents and impedances as a phase vector (note the bold italics: they explain how we got the pha-sor term). It’s just a rotating vector really. So a phasor has a varying magnitude (A) and phase (φ) , which is determined by (1) some maximum magnitude A0, (2) some angular frequency ω and (3) some initial phase (θ). So we can write the amplitude A as:

A = A(φ) = A0·cos(φ) = A0·cos(ωt + θ)

As usual, Wikipedia has a nice animation for it:


In case you wonder why I am using a cosine rather than a sine function, the answer is that it doesn’t matter: the sine and the cosine are the same function except for a π/2 phase difference: just rotate the animation above by 90 degrees, or think about the formula: sinφ = cos(φ−π/2). :-)

So A = A0·cos(ωt + θ) is the amplitude. It could be the voltage, or the current, or whatever real variable. The phase vector itself is represented by a complex number, i.e. a two-dimensional number, so to speak, which we can write as all of the following:

A = A0·eiφ = A0·cosφ + i·A0·sinφ = A0·cos(ωt+θ) + i·A0·sin(ωt+θ)

= A0·ei(ωt+θ) = A0·eiθ·eiωt = A0·eiωt with A0 = A0·eiθ

That’s just Euler’s formula, and I am afraid I have to refer you to my page on the essentials if you don’t get this. I know what you are thinking: why do we need the vector notation? Why can’t we just be happy with the A = A0·cos(ωt+θ) formula? The truthful answer is: it’s just to simplify calculations: it’s easier to work with exponentials than with cosines or sines. For example, writing ei(ωt + θ) = eiθ·eiωt is easier than writing cos(ωt + θ) = … […] Well? […] Hmm… :-)

See! You’re stuck already. You’d have to use the cos(α+β) = cosα·cosβ − sinα·sinβ formula: you’d get the same results (just do it for the simple calculation of the impedance below) but it takes a lot more time, and it’s easier to make mistakes. Having said why complex-number notation is great, I also need to warn you. There are a few things you have to watch out for. One of these things is notation. The other is the kind of mathematical operations we can do: it’s usually alright but we need to watch out with the i2 = –1 thing when multiplying complex numbers. However, I won’t talk about that here because it would only confuse you even more. :-)
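If you want to convince yourself that the bookkeeping actually works, here’s a two-line numerical check (nothing deep, just arithmetic with arbitrary numbers):

```python
# Check that Re[A0*exp(i*theta)*exp(i*omega*t)] reproduces A0*cos(omega*t + theta).
import numpy as np

A0, omega, theta, t = 1.5, 2 * np.pi, 0.3, 0.47            # arbitrary numbers
phasor = A0 * np.exp(1j * theta) * np.exp(1j * omega * t)  # multiplying exponentials adds phases
print(phasor.real, A0 * np.cos(omega * t + theta))         # the same number twice
```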

Just for the notation, let me note that Feynman would write A0 as A0 with the little hat or caret symbol (∧) on top of it, so as to indicate the complex coefficient is not a variable. So he writes A0 as Â0 = A0·eiθ. However, I find that confusing and, hence, I prefer using bold-type for any complex number, variable or not. The disadvantage is that we need to remember that the coefficient in front of the exponential is not a variable: it’s a complex number alright, but not a variable. Indeed, do look at that A0 = A0·eiθ equality carefully: A0 is a specific complex number that captures the initial phase θ. So it’s not the magnitude of the phasor itself, i.e. |A| = A0. In fact, magnitude, amplitude, phase… We’re using a lot of confusing terminology here, and so that’s why you need to ‘get’ the math.

The impedance is not a variable either. It’s some constant. Having said that, this constant will depend on the angular frequency ω. So… Well… Just think about this as you continue to read. :-) So the impedance is some number, just like resistance, but it’s a complex number. We’ll denote it by Z and, using Euler’s formula once again, we’ll write it as:

Z = |Z|eiθ = V/I = |V|ei(ωt + θV)/|I|ei(ωt + θI) = [|V|/|I|]·ei(θV − θI)

So, as you can see, it is, literally, some complex ratio, just like R = V/I was some real ratio: it is a complex ratio because it has a magnitude and a direction, obviously. Also please do note that, as I mentioned already, the impedance is, in general, some function of the frequency ω – even if the ei·ωt factors cancel out in the ratio above – but we’re not looking at ω as a variable here: V and I are variables and, as such, they depend on ω, but you should look at ω as some parameter. I know I should, perhaps, not be so explicit on what’s going on, but I want to make sure you understand.

So what’s going on? The illustration below (credit goes to Wikipedia, once again) explains. It’s a pretty generic view of a very simple AC circuit. So we don’t care what the impedance is: it might be an inductor or a capacitor, or a combination of both, but we don’t care: we just call it an impedance, or an impedor if you want. :-) The point is: if we apply an alternating current, then the current and the voltage will both go up and down, but the current signal will lag the voltage signal, and some phase factor θ tells us by how much, so θ will be the phase difference.


Now, we’re dividing one complex number by another in that Z = V/I formula above, and dividing one complex number by another is not all that straightforward, so let me re-write that formula for Z above as:

V = IZ = I∗|Z|eiθ

Now, while that V = IZ formula resembles the V = I·R formula, you should note the bold-face type for V and I, and the ∗ symbol I am using here for multiplication. The bold-face for V and I implies they’re vectors, or complex numbers. As for the ∗ symbol, that’s to make it clear we’re not talking a vector cross product A×B here, but a product of two complex numbers. [It’s obviously not a vector dot product either, because a vector dot product yields a real number, not some other vector.]

Now we write V and I as you’d expect us to write them:

  • V = |V|·ei(ωt + θV) = V0·ei(ωt + θV)
  • I = |I|·ei(ωt + θI) = I0·ei(ωt + θI)

θV and θI are, obviously, the so-called initial phases of the voltage and the current respectively. These ‘initial’ phases are not independent: we’re talking a phase difference really, between the voltage and the current signal, and it’s determined by the properties of the circuit. In fact, that’s the whole point here: the impedance is a property of the circuit and determines how the current signal varies as a function of the voltage signal. In fact, we’ll often choose the t = 0 point such that θV = 0, and so then we need to find θI. […] OK. Let’s get on with it. Writing out all of the factors in the V = IZ = I∗|Z|eiθ equation yields:

V = |V|·ei(ωt + θV) = I∗Z = |I|·ei(ωt + θI)∗|Z|·eiθ = |I|·|Z|·ei(ωt + θI + θ)

Now, this equation must hold for all values of t, so we can equate the magnitudes and phases and, hence, the following equalities must hold:

  1. |V| = |I||Z| ⇔ |Z| = |V|/|I|
  2. ωt + θV = ωt + θI + θ ⇔ θ = θV − θI
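To see how you’d actually use these two equalities, here’s a minimal sketch (with hypothetical numbers, of course): given the voltage amplitude and the impedance, we get the current amplitude and phase, and the real part of the current phasor is the current you’d actually measure.

```python
# Given V(t) = V0*cos(omega*t + theta_V) and an impedance Z, find I(t).
import numpy as np

V0, theta_V = 10.0, 0.0           # volt, radian (t = 0 chosen such that theta_V = 0)
omega = 2 * np.pi * 50.0          # rad/s
Z = 3.0 + 4.0j                    # ohm: some resistance plus some (inductive) reactance

I0      = V0 / abs(Z)             # |I| = |V|/|Z|  ->  2 A
theta_I = theta_V - np.angle(Z)   # theta = theta_V - theta_I  ->  theta_I = theta_V - theta

t = 0.01                          # any instant (seconds)
I_phasor = I0 * np.exp(1j * (omega * t + theta_I))
print(I0, theta_I)                # 2.0 A and about -0.927 rad: the current lags the voltage
print(I_phasor.real)              # the 'real' current at t = 0.01 s
```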


Of course, you’ll complain once again about those complex numbers: voltage and current are something real, aren’t they? And so what is real about these complex numbers? Well… I can just say what I said already. You’re right. I’ve used the complex notation only to simplify the calculus, so it’s only the real part of those complex-valued functions that counts.

OK. We’re done with impedance. We can now discuss the impedors, including resistors (for which we won’t have such lag or phase difference, but the concept of impedance applies nevertheless).

Before I start, however, you should think about what I’ve done above: I explained the concept of impedance, but I didn’t do much with it. The real-life problem will usually be that you get the voltage as a function of time, and then you’ll have to calculate the impedance of a circuit and, then, the current as a function of time. So I just showed the fundamental relations but, in real life, you won’t know what θ and θI could possibly be. Well… Let me correct that statement: we’ll give you formulas for θ as we discuss the various circuit elements and their impedance below, and so then you can use these formulas to calculate θI. :-)


Let’s start with what seems to be the easiest thing: a resistor. A real resistor is actually not easy to understand, because it requires us to understand the properties of real materials. Indeed, it may or may not surprise you, but the linear relation between the voltage and the current for real materials is only approximate. Also, the way resistors dissipate energy is not easy to understand. Indeed, unlike inductors and capacitors, i.e. the other two passive components of an electrical circuit, a resistor does not store but dissipates energy, as shown below.


It’s a nice animation (credit for it has to go to Wikipedia once more), as it shows how energy is being used in an electric circuit. Note that the little moving pluses are in line with the convention that a current is defined as the movement of positive charges, so we write I = dQ/dt instead of I = −dQ/dt. That also explains the direction of the field line E, which has been added to show that the charges move with the field that is being generated by the power source (which is not shown here). So, what we have here is that, on one side of the circuit, some generator or voltage source will create an emf pushing the charges, and so the animation shows how some load – i.e. the resistor in this case – will consume their energy, so they lose their push (as shown by the change in color from yellow to black). So power, per unit time, is supplied, and is then consumed.

To increase the current in the circuit above, you need to increase the voltage, but increasing both amounts to increasing the power that’s being consumed in the circuit. Electric power is voltage times current, so P = V·I (or v·i, if I use the small letters that are used in the two animations below). Now, Ohm’s Law (I = V/R) says that, if we’d want to double the current, we’d need to double the voltage, and so we’re quadrupling the power then: P2 = V2·I2 = (2·V1)·(2·I1) = 4·V1·I1 = 22·P1. So we have a square law for the power, which we get by substituting R·I for V, or V/R for I, so we can write the power P as P = V2/R = I2·R. This square law says exactly the same: if you want to double the voltage or the current, you’ll actually have to double both and, hence, you’ll quadruple the power.

But back to the impedance: Ohm’s Law is the Z = V/I law for resistors, but we can simplify it because we know the voltage across the resistor and the current that’s going through it are in phase. Hence, θV and θI are identical and, therefore, the θ = θV − θI in Z = |Z|eiθ is equal to zero and, hence, Z = |Z|. Now, |Z| = |V|/|I| = V0/I0. So the impedance is just some real number R = V0/I0, which we can also write as:

R = V0/I0 = (V0·ei(ωt + α))/(I0·ei(ωt + α)) = V(t)/I(t), with α = θV = θI

The equation above goes from R = V0/I0 to R = V(t)/I(t) = V/I. It’s not quite the same thing: the second equation says that, at any point in time, the voltage and the current will be proportional to each other, with R or its reciprocal as the proportionality constant. In any case, we have our formula for Z here:

Z = R = V/I = V0/I0

So that’s simple. Before we move to the next, let me note that the resistance of a real resistor may depend on its temperature, so in real-life applications one will want to keep its temperature as stable as possible. That’s why real-life resistors have power ratings and recommended operating temperatures. The image below illustrates how so-called heat-sink resistors can be mounted on a heat sink with a simple spring clip so as to ensure the dissipated heat is transported away. These heat-sink resistors are rather small (10 by 15 mm only) but are rated for 35 watt – so that’s quite a lot for such a small thing – if correctly mounted.


As mentioned, the linear relation between the voltage and the current is only approximate, and the observed relation is also there only for frequencies that are not ‘too high’ because, if the frequency becomes very high, the free electrons will start radiating energy away, as they produce electromagnetic radiation. So one always needs to look at the tolerances of real-life resistors, which may be ± 5%, ± 10%, or whatever. In any case… On to the next.

Capacitors (condensers)

We talked at length about capacitors (aka condensers) in our post explaining capacitance or, the more widely used term, capacity: the capacity of a capacitor is the observed proportionality between (1) the voltage (V) across and (2) the charge (Q) on the capacitor, so we wrote it as:

C = Q/V

Now, it’s easy to confuse the C here with the C for coulomb, which I’ll also use in a moment, and so… Well… Just don’t! :-) The meaning of the symbol is usually obvious from the context.

As for the explanation of this relation, it’s quite simple: a capacitor consists of two separate conductors in space, with positive charge on one, and an equal and opposite (i.e. negative) charge on the other. Now, the logic of the superposition of fields implies that, if we double the charges, we will also double the fields, and so the work one needs to do to carry a unit charge from one conductor to the other is also doubled! So that’s why the potential difference between the conductors is proportional to the charge.

The C = Q/V formula actually measures the ability of the capacitor to store electric charge and, therefore, to store energy, so that’s why the term capacity is really quite appropriate. I’ll let you google a few illustrations like the one below, that shows how a capacitor is actually being charged in a circuit. Usually, some resistance will be there in the circuit, so as to limit the current when it’s connected to the voltage source and, therefore, as you can see, the R times C factor (R·C) determines how fast or how slow the capacitor charges and/or discharges. Also note that the current is equal to the time rate of change of the charge: I = dQ/dt.


In the above-mentioned post, we also gave a few formulas for the capacity of specific types of condensers. For example, for a parallel-plate condenser, the formula was C = ε0·A/d. We also mentioned its unit, which is coulomb/volt, obviously, but – in honor of Michael Faraday, who gave us Faraday’s Law, and many other interesting formulas – it’s referred to as the farad: 1 F = 1 C/V. The C here is coulomb, of course. Sorry we have to use C to denote two different things but, as I mentioned, the meaning of the symbol is usually clear from the context.

We also talked about how dielectrics actually work in that post, but we did not talk about the impedance of a capacitor, so let’s do that now. The calculation is pretty straightforward. Its interpretation somewhat less so. But… Well… Let’s go for it.

It’s the current that’s charging the condenser (sorry I keep using both terms interchangeably), and we know that the current is the time rate of change of the charge (I = dQ/dt). Now, you’ll remember that, in general, we’d write a phasor A as A = A0·eiωt with A0 = A0·eiθ, so A0 is a complex coefficient incorporating the initial phase, which we wrote as θV and θI for the voltage and for the current respectively. So we’ll represent the voltage and the current now using that notation, so we write: V = V0·eiωt and I = I0·eiωt. So let’s now use that C = Q/V by re-writing it as Q = C·V and, because C is some constant, we can write:

I = dQ/dt = d(C·V)/dt = C·dV/dt

Now, what’s dV/dt? Oh… You’ll say: V is the magnitude of V, so it’s equal to |V| = |V0·eiωt| = |V0|·|eiωt| = |V0| = |V0·eiθ| = |V0|·|eiθ| = |V0| = V0. So… Well… What? V0 is some constant here! It’s the maximum amplitude of V, so… Well… Its time derivative is zero: dV0/dt = 0.

Yes. Indeed. We did something very wrong here! You really need to watch out with this complex-number notation, and you need to think about what you’re doing. V is not the magnitude of V but its (varying) amplitude. So it’s the real voltage V that varies with time: it’s equal to V0·cos(ωt + θV), which is the real part of our phasor V. Huh? Yes. Just hang in for a while. I know it’s difficult and, frankly, Feynman doesn’t help us very much here. Let’s take one step back and so – you will see why I am doing this in a moment – let’s calculate the time derivative of our phasor V, instead of the time derivative of our real voltage V. So we calculate dV/dt, which is equal to:

dV/dt = d(V0·eiωt)/dt = V0·d(eiωt)/dt = V0·(iω)·eiωt = iω·V0·eiωt = iω·V

Remarkable result, isn’t it? We take the time derivative of our phasor, and the result is the phasor itself multiplied with iω. Well… Yes. It’s a general property of exponentials, but still… Remarkable indeed! We’d get the same with I, but we don’t need that for the moment. What we do need to do is go from our I = C·dV/dt relation, which connects the real parts of I and V one to another, to the corresponding relation between the (complex) phasors I and V. So we write:

 I = C·dV/dt ⇔ I = C·dV/dt

Can we do that? Just like that? We just replace I and V by I and V? Yes, we can. Why? Well… We know that I is the real part of I and so we can write I = Re(I) + Im(I)·i = I + Im(I)·i, and then we can write the right-hand side of the equation as C·dV/dt = Re(C·dV/dt) + Im(C·dV/dt)·i. Now, two complex numbers are equal if, and only if, their real and imaginary parts are the same, so… Well… Write it all out, if you want, using Euler’s formula, and you’ll see it all makes sense indeed.

So what do we get? The I = C·dV/dt gives us:

I = C·dV/dt = C·(iω)·V

That implies that I/V = C·(iω) and, hence, we get – finally! – what we need to get:

Z = V/I = 1/(iωC)

This is a grand result and, while I am sorry I made you suffer for it, I think I did a good job here because, if you’d check Feynman on it, you’ll see he – or, more probably, his assistants – just skate over this without bothering too much about mathematical rigor. OK. All that’s left now is to interpret this ‘number’ Z = 1/(iωC). It is a purely imaginary number, and it’s a constant indeed, albeit a complex constant. It can be re-written as:

Z = 1/(iωC) = i−1/(ωC) = –i/(ωC) = (1/(ωC))·e−i·π/2

[Sorry. I can’t be more explicit here. It’s just one of the wonders of complex numbers: i−1 = –i. Just check one of my posts on complex numbers for more detail.] Now, a –i factor corresponds to a rotation of minus 90 degrees, and so that gives you the true meaning of what’s usually said about a circuit with a capacitor: the voltage across the capacitor will lag the current with a phase difference equal to π/2, as shown below. Of course, as it’s the voltage driving the current, we should say it’s the current that is lagging with a phase difference of 3π/2, rather than stating it the other way around! Indeed, i−1 = –i = –1·i = i2·i = i3, so that amounts to three ‘turns’ of the phase in the counter-clockwise direction, which is the direction in which our ωt angle is ‘turning’.
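Here’s a quick numerical sanity check on that result (a sketch with made-up values): take a sinusoidal voltage, compute I = C·dV/dt numerically, and verify both the amplitude ratio 1/(ωC) and the 90-degree lead of the current.

```python
# For a capacitor, I = C*dV/dt: check the amplitude ratio and the phase lead numerically.
import numpy as np

C, omega, V0 = 100e-6, 2 * np.pi * 50.0, 10.0            # made-up values (SI units)
T = 2 * np.pi / omega
t = np.linspace(0.0, T, 100_000, endpoint=False)

V = V0 * np.cos(omega * t)
I = C * np.gradient(V, t)                                # numerical time derivative

print(V0 / I.max(), 1 / (omega * C))                     # approximately equal: |Z| = 1/(omega*C)
# V peaks at t = 0 (and again at t = T); I peaks at t = 3T/4, i.e. a quarter
# period before the next voltage peak: the current leads the voltage by 90 degrees.
print(t[np.argmax(I)], t[np.argmax(V)])
```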


It is a remarkable result, though. The illustration above assumes the maximum amplitude of the voltage and the current are the same, so |Z| = |V|/|I| = 1, but what if they are not the same? What are the real bits then? I can hear you, indeed: “To hell with the bold-face letters: what’s V and I? What’s the real thing?”

Well… V and I are the real parts of V = |V|·ei(ωt+θV) = V0·ei(ωt+θV) and of I = |I|·ei(ωt+θI) = I0·ei(ωt+θV−θ) = I0·ei(ωt−θ) = I0·ei(ωt+π/2) respectively so, assuming θV = 0 (as mentioned above, that’s just a matter of choosing a convenient t = 0 point), we get:

  • V = V0·cos(ωt)
  • I = I0·cos(ωt + π/2)

So the π/2 phase difference is there (you need to watch out with the signs, of course: θ = −π/2, but so it’s the current that seems to lead here) but the V0/I0 ratio doesn’t have to be one, so the real voltage and current could look like something below, where the maximum amplitude of the current is only half of the maximum amplitude of the voltage.


So let’s analyze this quickly: the V0/I0 ratio is equal to |Z| = |V|/|I| = V0/I0 = 1/(ωC) = (1/ω)·(1/C) (note that it’s not equal to V/I = V(t)/I(t), which is a ratio that doesn’t make sense because I(t) goes through zero as the current switches direction). So what? Well… It means the ratio is inversely proportional to both the frequency ω as well as the capacity C, as shown below. Think about this: if ω goes to zero, V0/I0 goes to ∞, which means that, for a given voltage, the current must go to zero. That makes sense, because we’re talking DC current when ω → 0, and the capacitor charges itself and then that’s it: no more currents. Now, if C goes to zero, so we’re talking capacitors with hardly any capacity, we’ll also get tiny currents. Conversely, for large C, we’ll get huge currents, as the capacitor can take pretty much any charge you throw at it, so that makes for small V0/I0 ratios. The most interesting thing to consider is ω going to infinity, as the V0/I0 ratio is also quite small then. What happens? The capacitor doesn’t get the time to charge, and so it’s always in this state where it has large currents flowing in and out of it, as it can’t build up the voltage that would counter the electromotive force that’s being supplied by the voltage source.

graph 6

OK. That’s it. Let’s discuss the last (passive) element.


We’ve spoiled the party a bit with that illustration above, as it gives the phase difference for an inductor already:

Z = iωL = ωL·ei·π/2, with L the inductance of the coil

So, again assuming that θV = 0, we can calculate I as:

I = |I|·ei(ωt+θI) = I0·ei(ωt+θV−θ) = I0·ei(ωt−θ) = I0·ei(ωt−π/2)

Of course, you’ll want to relate this, once again, to the real voltage and the real current, so let’s write the real parts of our phasors:

  • V = V0·cos(ωt)
  • I = I0·cos(ωt − π/2)

Just to make sure you’re not falling asleep as you’re reading, I’ve made another graph of what things could look like. So now it’s the current signal that’s lagging the voltage signal, with a phase difference equal to θ = π/2.


Also, to be fully complete, I should show you how the V0/I0 ratio now varies with L and ω. Indeed, here also we can write that |Z| = |V|/|I| = V0/I0, but so here we find that V0/I0 = ωL, so we have a simple linear proportionality here! For example, for a given voltage V0, we’ll have smaller currents as ω increases, so that’s the opposite of what happens with our ideal capacitors. I’ll let you think about that… :-)


Now how do we get that Z = iωL formula? In my post on inductance, I explained what an inductor is: a coil of wire, basically. Its defining characteristic is that a changing current will cause a changing magnetic field in it and, hence, some change in the flux of the magnetic field. Now, Faraday’s Law tells us that that will cause some circulation of the electric field in the coil, which amounts to an induced potential difference which is referred to as the electromotive force (emf). Now, it turns out that the induced emf is proportional to the rate of change of the current. So we’ve got another constant of proportionality here, so it’s like how we defined resistance, or capacitance. So, in many ways, the inductance is just another proportionality coefficient. If we denote it by L – the symbol is said to honor the Russian physicist Heinrich Lenz, whom you know from Lenz’ Law – then we define it as:

L = −Ɛ/(dI/dt)

The dI/dt factor is, obviously, the time rate of change of the current, and the negative sign indicates that the emf opposes the change in current, so it will tend to cause an opposing current. However, the power of our voltage source will ensure the current does effectively change, so it will counter the ‘back emf’ that’s being generated by the inductor. To be precise, the voltage across the terminals of our inductor, which we denote by V, will be equal and opposite to Ɛ, so we write:

V = −Ɛ = L·(dI/dt)

Now, this very much resembles the I = C·dV/dt relation we had for capacitors, and it’s completely analogous indeed: we just need to switch the I and V, and C and L symbols. So we write:

 V = L·dI/dt⇔ V = L·dI/dt

Now, dI/dt is a similar time derivative as dV/dt. We calculate it as:

dI/dt = d(I0·eiωt)/dt = I0·d(eiωt)/dt = I0·(iω)·eiωt = iω·I0·eiωt = iω·I

So we get what we want and have to get:

V = L·dI/dt = iωL·I

Now, Z = V/I, so Z = iωL indeed!

Summary of conclusions

Let’s summarize what we found:

  1. For a resistor, we have Z(resistor) = ZR = R = V/I = V0/I0
  2. For a capacitor, we have Z(capacitor) = ZC = 1/(iωC) = –i/(ωC)
  3. For an inductor, we have Z(inductor) = ZL = iωL
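To put some numbers on this summary, here’s a small sketch (with hypothetical component values): the three impedances at two different frequencies, so you can see the opposite frequency behavior of capacitors and inductors immediately.

```python
# Impedances of a resistor, a capacitor and an inductor at two frequencies.
import numpy as np

R, C, L = 10.0, 100e-6, 0.2            # ohm, farad, henry (made-up values)

for f in (50.0, 5000.0):               # hertz
    omega = 2 * np.pi * f
    Z_R = R                            # frequency-independent
    Z_C = 1 / (1j * omega * C)         # = -i/(omega*C): shrinks as omega grows
    Z_L = 1j * omega * L               # grows as omega grows
    print(f, abs(Z_R), abs(Z_C), abs(Z_L))
```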

Note that the impedance of capacitors decreases as frequency increases, while for inductors, it’s the other way around. We explained that by making you think of the currents: for a given voltage, we’ll have large currents for high frequencies, and, hence, a small V0/I0 ratio. Can you think of what happens with an inductor? It’s not so easy, so I’ll refer you to the addendum below for some more explanation.

Let me also note that, as you can see, the impedance of (ideal) inductors and capacitors is a pure imaginary number, so that’s a complex number which has no real part. In engineering, the imaginary part of the impedance is referred to as the reactance, so engineers will say that ideal capacitors and inductors have a purely imaginary reactive impedance

However, in real life, the impedance will usually have both a real as well as an imaginary part, so it will be some kind of mix, so to speak. The real part is referred to as the ‘resistance’ R, and the ‘imaginary’ part is referred to as the ‘reactance’ X. The formula for both is given below:

formula resistance and reactance

But here I have to end my post on circuit elements. It’s become quite long, so I’ll discuss Kirchhoff’s rules in my next post.

Addendum: Why is V = − Ɛ?

Inductors are not easy to understand—intuitively, that is. That’s why I spent so much time writing about them in my other post, to which I should be referring you here. But let me recapitulate the key points. The key idea is that we’re pumping energy into an inductor when applying a current and, as you know, the time rate of change of energy is power: P = dW/dt, so we’re talking power here too, which is voltage times current: P = dW/dt = V·I. The illustration below shows what happens when an alternating current is applied to the circuit with the inductor. So the assumption is that the current goes in one and then in the other direction, so I > 0, and then I < 0, etcetera. We’re also assuming some nice sinusoidal curve for the current here (i.e. the blue curve), and so we get what we get for U (i.e. the red curve), which is the energy that’s stored in the inductor really, as it tries to resist the changing current: the energy goes up and down between zero and some maximum amplitude that’s determined by the maximum current.

power 2

So, yes, building up current requires energy from some external source, which is used to overcome the ‘back emf’ in the inductor, and that energy is stored in the inductor itself. [If you still wonder why it’s stored in the inductor, think about the other question: where else would it be stored?] How is it stored? Look at the graph and think: it’s stored as kinetic energy of the charges, obviously. That explains why the energy is zero when the current is zero, and why the energy maxes out when the current maxes out. So, yes, it all makes sense! :-)

Let me give another example. The graph below assumes the current builds up to some maximum. As it reaches its maximum, the stored energy will also max out. This example assumes direct current, so it’s a DC circuit: the current builds up, but then stabilizes at some maximum that we can find by applying Ohm’s Law to the resistance of the circuit: I = V/R. Resistance? But we were talking an ideal inductor? We are. If there’s no other resistance in the circuit, we’ll have a short-circuit, so the assumption is that we do have some resistance in the circuit and, therefore, we should also think of some energy loss to heat from the current in the resistance. If not, well… Your power source will obviously soon reach its limits. :-)


So what’s going on then? We have some changing current in the coil but, obviously, some kind of inertia also: the coil itself opposes the change in current through the ‘back emf’. Now, it requires energy, or power, to overcome the inertia, so that’s the power that comes from our voltage source: it will offset the ‘back emf’, so we may effectively think of a little circuit with an inductor and a voltage source, as shown below.

circuit with coil

But why do we write V = − Ɛ? Our voltage source can have any voltage, can’t it? Yes. Sure. But so the coil will always provide an emf that’s exactly the opposite of this voltage. Think of it: we have some voltage that’s being applied across the terminals of the inductor, and so we’ll have some current. A current that’s changing. And it’s that current that will generate an emf that’s equal to Ɛ = –L·(dI/dt). So don’t think of Ɛ as some constant: it’s the self-inductance coefficient L that’s constant, but I (and, hence, dI/dt) and V are variable.

The point is: we cannot have any potential difference in a perfect conductor, which is what the terminals are: any potential difference, i.e. any electric field really, would cause huge currents. In other words, the voltage V and the emf Ɛ have to cancel each other out, all of the time. If not, we’d have huge currents in the wires re-establishing the V = −Ɛ equality.

Let me use Feynman’s argument here. Perhaps that will work better. :-) Our ideal inductor is shown below: it’s shielded by some metal box so as to ensure it does not interact with the rest of the circuit. So we have some current I, which we assume to be an AC current, and we know some voltage is needed to cause that current, so that’s the potential difference V between the terminals.


The total circulation of E – around the whole circuit – can be written as the sum of two parts:

Formula circulaton

Now, we know circulation of E can only be caused by some changing magnetic field, which is what’s going on in the inductor:


So this change in the magnetic flux is what is causing the ‘back emf’, and so the integral on the left is, effectively, equal to Ɛ, not minus Ɛ but +Ɛ. Now, the second integral is equal to V, because that’s the voltage V between the two terminals a and b. So the whole integral is equal to 0 = Ɛ + V and, therefore, we have that:

V = − Ɛ = L·dI/dt

Re-visiting the speed of light, Planck’s constant, and the fine-structure constant

A brother of mine sent me a link to an article he liked. Now, because we share some interest in physics and math and other stuff, I looked at it and…

Well… I was disappointed. Despite the impressive credentials of its author – a retired physics professor – it was very poorly written. It made me realize how much badly written stuff is around, and I am glad I am no longer wasting my time on it. However, I do owe my brother some explanation of (a) why I think it was bad, and of (b) what, in my humble opinion, he should be wasting his time on. :-) So what is it all about?

The article talks about physicists deriving the speed of light from “the electromagnetic properties of the quantum vacuum.” Now, it’s the term ‘quantum‘, in ‘quantum vacuum’, that made me read the article.

Indeed, deriving the theoretical speed of light in empty space from the properties of the classical vacuum – aka empty space – is a piece of cake: it was done by Maxwell himself as he was figuring out his equations back in the 1850s (see my post on Maxwell’s equations and the speed of light). And then he compared it to the measured value, and he saw it was right on the mark. Therefore, saying that the speed of light is a property of the vacuum, or of empty space, is like a tautology: we may just as well put it the other way around, and say that it’s the speed of light that defines the (properties of the) vacuum!

Indeed, as I’ll explain in a moment: the speed of light determines both the magnetic as well as the electric constant, μ0 and ε0, which are the (magnetic) permeability and the (electric) permittivity of the vacuum respectively. Both constants depend on the units we are working with (i.e. the units for electric charge, for distance, for time and for force – or for inertia, if you want, because force is defined in terms of overcoming inertia), but so they are just proportionality coefficients in Maxwell’s equations. So once we decide what units to use in Maxwell’s equations, then μ0 and ε0 are just proportionality coefficients which we get from c. So they are not separate constants really – I mean, they are not separate from c – and all of the ‘properties’ of the vacuum, including these constants, are in Maxwell’s equations.

In fact, when Maxwell compared the theoretical value of c with its presumed actual value, he didn’t compare c‘s theoretical value with the speed of light as measured by astronomers (like the 17th-century astronomer Ole Rømer, to whom our professor refers, and who had a first go at it, suggesting some specific value for it based on his observations of the timing of the eclipses of one of Jupiter’s moons), but with c‘s value as calculated from the experimental values of μ0 and ε0! So he knew very well what he was looking at. In fact, to drive home the point, it may also be useful to note that the Michelson-Morley experiment – which accurately measured the speed of light – was done some thirty years later. So Maxwell had already left this world by then—very much in peace, because he had solved the mystery all 19th century physicists wanted to solve through his great unification: his set of equations covers it all, indeed: electricity, magnetism, light, and even relativity!

I think the article my brother liked so much does a very lousy job in pointing all of that out, but that’s not why I wouldn’t recommend it. It got my attention because I wondered why one would try to derive the speed of light from the properties of the quantum vacuum. In fact, to be precise, I hoped the article would tell me what the quantum vacuum actually is. Indeed, as far as I know, there’s only one vacuum—one ’empty space’: empty is empty, isn’t it? :-) So I wondered: do we have a ‘quantum’ vacuum? And, if so, what is it, really?

Now, that is where the article is really disappointing, I think. The professor drops a few names (like the Max Planck Institute, the University of Paris-Sud, etcetera), and then, promisingly, mentions ‘fleeting excitations of the quantum vacuum’ and ‘virtual pairs of particles’, but then he basically stops talking about quantum physics. Instead, he wanders off to share some philosophical thoughts on the fundamental physical constants. What makes it all worse is that even those thoughts on the ‘essential’ constants are quite off the mark.

So… This post is just a ‘quick and dirty’ thing for my brother which, I hope, will be somewhat more thought-provoking than that article. More importantly, I hope that my thoughts will encourage him to try to grind through better stuff.

On Maxwell’s equations and the properties of empty space

Let me first say something about the speed of light indeed. Maxwell’s four equations may look fairly simple, but that’s only until one starts unpacking all those differential vector equations, and it’s only when going through all of their consequences that one starts appreciating their deep mathematical structure. Let me quickly copy how another blogger jotted them down: :-)


As I showed in my above-mentioned post, the speed of light (i.e. the speed with which an electromagnetic pulse or wave travels through space) is just one of the many consequences of the mathematical structure of Maxwell’s set of equations. As such, the speed of light is a direct consequence of the ‘condition’, or the properties, of the vacuum indeed, as Maxwell suggested when he wrote that “we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena”.

Of course, while Maxwell still suggests light needs some ‘medium’ here – so that’s a reference to the infamous aether theory – we now know that’s because he was a 19th century scientist, and so we’ve done away with the aether concept (because it’s a redundant hypothesis), and so now we also know there’s absolutely no reason whatsoever to try to “avoid the inference.” :-) It’s all OK, indeed: light is some kind of “transverse undulation” of… Well… Of what?

We analyze light as traveling fields, represented by two vectors, E and B, whose direction and magnitude varies both in space as well as in time. E and B are field vectors, and represent the electric and magnetic field respectively. An equivalent formulation – more or less, that is (see my post on the Liénard-Wiechert potentials) – for Maxwell’s equations when only one (moving) charge is involved is:



This re-formulation, which is Feynman’s preferred formula for electromagnetic radiation, is interesting in a number of ways. It clearly shows that, while we analyze the electric and magnetic field as separate mathematical entities, they’re one and the same phenomenon really, as evidenced by the B = –er×E/c equation, which tells us the magnetic field from a single moving charge is always normal (i.e. perpendicular) to the electric field vector, and also that B‘s magnitude is 1/c times the magnitude of E, so |B| = B = |E|/c = E/c. In short, B is fully determined by E, or vice versa: if we have one of the two fields, we have the other, so they’re ‘one and the same thing’ really—not in a mathematical sense, but in a real sense.

Also note that E and B‘s magnitude is just the same if we’re using natural units, so if we equate c with 1. Finally, as I pointed out in my post on the relativity of electromagnetic fields, if we would switch from one reference frame to another, we’ll have a different mix of E and B, but that different mix obviously describes the same physical reality. More in particular, if we’d be moving with the charges, the magnetic field sort of disappears to re-appear as an electric field. So the Lorentz force F = Felectric + Fmagnetic = qE + qv×B is one force really, and its ‘electric’ and ‘magnetic’ component appear the way they appear in our reference frame only. In some other reference frame, we’d have the same force, but its components would look different, even if they, obviously, would and should add up to the same. [Well… Yes and no… You know there’s relativistic corrections to be made to the forces to, but that’s a minor point, really. The force surely doesn’t disappear!]

All of this reinforces what you know already: electricity and magnetism are part and parcel of one and the same phenomenon, the electromagnetic force field, and Maxwell’s equations are the most elegant way of ‘cutting it up’. Why elegant? Well… Click the Occam tab. :-)

Now, after having praised Maxwell once more, I must say that Feynman’s equations above have another advantage. In Maxwell’s equations, we see two constants, the electric and the magnetic constant (denoted by ε0 and μ0 respectively), and Maxwell’s equations imply that the product of the electric and magnetic constant is the reciprocal of c2: μ0·ε0 = 1/c2. So here we see ε0 and c only, so no μ0, so that makes it even more obvious that the magnetic and electric constants are related one to another through c.

[…] Let me digress briefly: why do we have c2 in μ0·ε0 = 1/c2, instead of just c? That’s related to the relativistic nature of the magnetic force: think about that B = E/c relation. Or, better still, think about the Lorentz equation F = Felectric + Fmagnetic = qE + qv×B = q[E + (v/c)×(E×er)]: the 1/c factor is there because the magnetic force involves some velocity, and any velocity is always relative—and here I don’t mean relative to the frame of reference but relative to the (absolute) speed of light! Indeed, it’s the v/c ratio (usually denoted by β = v/c) that enters all relativistic formulas. So the right-hand side of the μ0·ε0 = 1/c2 equation is best written as (1/c)·(1/c), with one of the two 1/c factors accounting for the fact that the ‘magnetic’ force is a relativistic effect of the ‘electric’ force, really, and the other 1/c factor giving us the proper relationship between the magnetic and the electric constant. To drive home the point, I invite you to think about the following:

  • μ0 is expressed in (V·s)/(A·m), while ε0 is expressed in (A·s)/(V·m), so the dimension in which the μ0·ε0 product is expressed is [(V·s)/(A·m)]·[(A·s)/(V·m)] = s2/m2, so that’s the dimension of 1/c2.
  • Now, this dimensional analysis makes it clear that we can sort of distribute 1/c2 over the two constants. All it takes is re-defining the fundamental units we use to calculate stuff, i.e. the units for electric charge, for distance, for time and for force – or for inertia, as explained above. But so we could, if we wanted, equate both μ0 as well as ε0 with 1/c.
  • Now, if we would then equate c with 1, we’d have μ0 = ε0 = c = 1. We’d have to define our units for electric charge, for distance, for time and for force accordingly, but it could be done, and then we could re-write Maxwell’s set of equations using these ‘natural’ units.
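To make the μ0·ε0 = 1/c2 point tangible, here’s a two-line check using the textbook (SI) values of the two constants – just a sketch, in the units mentioned in the first bullet above:

```python
# Recover c from the textbook values of the magnetic and electric constants.
import math

mu_0  = 4 * math.pi * 1e-7      # V*s/(A*m)
eps_0 = 8.854187817e-12         # A*s/(V*m)

print(1 / math.sqrt(mu_0 * eps_0))   # ~2.99792458e8 m/s
```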

In any case, the nitty-gritty here is less important: the point is that μ0 and ε0 are also related through the speed of light and, hence, they are ‘properties’ of the vacuum as well. [I may add that this is quite obvious if you look at their definition, but we’re approaching the matter from another angle here.]

In any case, we’re done with this. On to the next!

On quantum oscillations, Planck’s constant, and Planck units 

The second thought I want to develop is about the mentioned quantum oscillation. What is it? Or what could it be? An electromagnetic wave is caused by a moving electric charge. What kind of movement? Whatever: the charge could move up or down, or it could just spin around some axis—whatever, really. For example, if it spins around some axis, it will have a magnetic moment and, hence, the field is essentially magnetic, but then, again, E and B are related and so it doesn’t really matter if the first cause is magnetic or electric: that’s just our way of looking at the world: in another reference frame, one that’s moving with the charges, the field would essentially be electric. So the motion can be anything: linear, rotational, or non-linear in some irregular way. It doesn’t matter: any motion can always be analyzed as the sum of a number of ‘ideal’ motions. So let’s assume we have some elementary charge in space, and it moves and so it emits some electromagnetic radiation.

So now we need to think about that oscillation. The key question is: how small can it be? Indeed, in one of my previous posts, I tried to explain some of the thinking behind the idea of the ‘Great Desert’, as physicists call it. The whole idea is based on our thinking about the limit: what is the smallest wavelength that still makes sense? So let’s pick up that conversation once again.

The Great Desert lies between the 1032 and 1043 Hz scale. 1032 Hz corresponds to a photon energy of Eγ = h·f = (4×10−15 eV·s)·(1032 Hz) = 4×1017 eV = 400,000 tera-electronvolt (1 TeV = 1012 eV). I use the γ (gamma) subscript in my Eγ symbol for two reasons: (1) to make it clear that I am not talking the electric field E here but energy, and (2) to make it clear we are talking ultra-high-energy gamma-rays here.
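Just to check the arithmetic, and to see what the other end of the Desert corresponds to, here’s the Planck-Einstein relation plugged in for both frequencies (a trivial sketch):

```python
# Photon energy E = h*f at the two edges of the 'Great Desert' frequency range:
# ~4e17 eV (i.e. ~400,000 TeV) at 1e32 Hz, and ~4e28 eV at 1e43 Hz.
h_eV = 4.135667696e-15        # Planck's constant in eV*s

for f in (1e32, 1e43):        # hertz
    print(f, h_eV * f, h_eV * f / 1e12, "TeV")
```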

In fact, γ-rays of this frequency and energy are theoretical only. Ultra-high-energy gamma-rays are defined as rays with photon energies higher than 100 TeV, which is the upper limit for very-high-energy gamma-rays, which have been observed as part of the radiation emitted by so-called gamma-ray bursts (GRBs): flashes associated with extremely energetic explosions in distant galaxies. Wikipedia refers to them as the ‘brightest’ electromagnetic events know to occur in the Universe. These rays are not to be confused with cosmic rays, which consist of high-energy protons and atomic nuclei stripped of their electron shells. Cosmic rays aren’t rays really and, because they consist of particles with a considerable rest mass, their energy is even higher. The so-called Oh-My-God particle, for example, which is the most energetic particle ever detected, had an energy of 3×1020 eV, i.e. 300 million TeV. But it’s not a photon: its energy is largely kinetic energy, with the rest mass m0 counting for a lot in the m in the E = m·c2 formula. To be precise: the mentioned particle was thought to be an iron nucleus, and it packed the equivalent energy of a baseball traveling at 100 km/h! 

But let me refer you to another source for a good discussion on these high-energy particles, so I can get back to the energy of electromagnetic radiation. When I talked about the Great Desert in that post, I did so using the Planck-Einstein relation (E = h·f), which embodies the idea of the photon, and I assumed it to be valid always and everywhere and, importantly, at every scale. I also discussed the Great Desert using real-life light being emitted by real-life atomic oscillators. Hence, I may have given the (wrong) impression that the idea of a photon as a ‘wave train’ is inextricably linked with these real-life atomic oscillators, i.e. to electrons going from one energy level to the next in some atom. Let’s explore these assumptions somewhat more.

Let’s start with the second point. Electromagnetic radiation is emitted by any accelerating electric charge, so the atomic oscillator model is an assumption that should not be essential. And it isn’t. For example, whatever is left of the nucleus after alpha or beta decay (i.e. a nuclear decay process resulting in the emission of an α- or β-particle) is likely to be in an excited state, and likely to emit a gamma-ray within about 10⁻¹² seconds, so that’s a burst that’s about 10,000 times shorter than the 10⁻⁸ seconds it takes for the energy of a radiating atom to die out. [As for the calculation of that 10⁻⁸ sec decay time – so that’s like 10 nanoseconds – I’ve talked about this before but it’s probably better to refer you to the source, i.e. one of Feynman’s Lectures.]

However, what we’re interested in is not the energy of the photon, but the energy of one cycle. In other words, we’re not thinking of the photon as some wave train here, but what we’re thinking about is the energy that’s packed into a space corresponding to one wavelength. What can we say about that?

As you know, that energy will depend both on the amplitude of the electromagnetic wave as well as its frequency. To be precise, the energy is (1) proportional to the square of the amplitude, and (2) proportional to the frequency. Let’s look at the first proportionality relation. It can be written in a number of ways, but one way of doing it is stating the following: if we know the electric field, then the amount of energy that passes per square meter per second through a surface that is normal to the direction in which the radiation is going (which we’ll denote by S – the s from surface – in the formula below), must be proportional to the average of the square of the field. So we write S ∝ 〈E²〉, and so we should think about the constant of proportionality now. Now, let’s not get into the nitty-gritty, and so I’ll just refer to Feynman for the derivation of the formula below:

S = ε0c·〈E²〉

So the constant of proportionality is ε0c. [Note that, in light of what we wrote above, we can also write this as S = (1/(μ0·c))·〈(c·B)²〉 = (c/μ0)·〈B²〉, so that underlines once again that we’re talking one electromagnetic phenomenon only really.] So that’s a nice and rather intuitive result in light of all of the other formulas we’ve been jotting down. However, it is a ‘wave’ perspective. The ‘photon’ perspective assumes that, somehow, the amplitude is given and, therefore, the Planck-Einstein relation only captures the frequency variable: Eγ = h·f.
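
If you want to play with that S = ε0c·〈E²〉 formula, here’s a minimal numerical sketch. It assumes a simple sinusoidal field with an arbitrary, purely illustrative amplitude of 1 V/m, and just checks that averaging E² over one cycle gives E0²/2:

```python
import numpy as np

eps0 = 8.854187817e-12      # permittivity of free space (F/m)
c = 299792458.0             # speed of light (m/s)

E0 = 1.0                    # V/m -- an arbitrary, illustrative amplitude
t = np.linspace(0.0, 1.0, 100_000)          # one full cycle (normalized time)
E = E0 * np.sin(2 * np.pi * t)              # sinusoidal field
S = eps0 * c * np.mean(E**2)                # S = eps0·c·<E²>

print(S)                                    # numerical average...
print(eps0 * c * E0**2 / 2)                 # ...versus the analytical <E²> = E0²/2
```

Both numbers should come out at about 1.3×10⁻³ W/m² for that 1 V/m amplitude.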

Indeed, ‘more energy’ in the ‘wave’ perspective basically means ‘more photons’, but photons are photons: they have a definite frequency and a definite energy, and both are given by that Planck-Einstein relation. So let’s look at that relation by doing a bit of dimensional analysis:

  • Energy is measured in electronvolt or, using SI units, joule: 1 eV ≈ 1.6×10⁻¹⁹ J. Energy is force times distance: 1 joule = 1 newton·meter, which means that a larger force over a shorter distance yields the same energy as a smaller force over a longer distance. The oscillations we’re talking about here involve very tiny distances obviously. But the principle is the same: we’re talking some moving charge q, and the power – which is the time rate of change of the energy – that goes in or out at any point of time is equal to dW/dt = F·v, with W the work that’s being done by the charge as it emits radiation.
  • I would also like to add that, as you know, forces are related to the inertia of things. Newton’s Law basically defines a force as that what causes a mass to accelerate: F = m·a = m·(dv/dt) = d(m·v)/dt = dp/dt, with p the momentum of the object that’s involved. When charges are involved, we’ve got the same thing: a potential difference will cause some current to change, and one of the equivalents of Newton’s Law F = m·a = m·(dv/dt) in electromagnetism is V = L·(dI/dt). [I am just saying this so you get a better ‘feel’ for what’s going on.]
  • Planck’s constant is measured in electronvolt·seconds (eV·s) or, using SI units, in joule·seconds (J·s), so its dimension is that of (physical) action, which is energy times time: [energy]·[time]. Again, a lot of energy during a short time yields the same action as less energy over a longer time. [Again, I am just saying this so you get a better ‘feel’ for these dimensions.]
  • The frequency f is the number of cycles per time unit, so that’s expressed per second, i.e. in hertz (Hz) = 1/second = s⁻¹.

So… Well… It all makes sense: [x joule] = [6.626×10⁻³⁴ joule]·[1 second]×[f cycles]/[1 second]. But let’s try to deepen our understanding even more: what’s the Planck-Einstein relation really about?

To answer that question, let’s think some more about the wave function. As you know, it’s customary to express the frequency as an angular frequency ω, as used in the wave function A(x, t) = A0·sin(kx − ωt). The angular frequency is the frequency expressed in radians per second. That’s because we need an angle in our wave function, and so we need to relate x and t to some angle. The way to think about this is as follows: one cycle takes a time T (i.e. the period of the wave) which is equal to T = 1/f. Yes: one second divided by the number of cycles per second gives you the time that’s needed for one cycle. One cycle is also equivalent to our argument ωt going around the full circle (i.e. 2π), so we write:  ω·T = 2π and, therefore:

ω = 2π/T = 2π·f

Now we’re ready to play with the Planck-Einstein relation. We know it gives us the energy of one photon really, but what if we re-write our equation Eγ = h·f as Eγ/f = h? The dimensions in this equation are:

[x joule]·[1 second]/[cycles] = [6.626×10⁻³⁴ joule]·[1 second]

⇔ x = 6.626×10⁻³⁴ joule per cycle

So that means that the energy per cycle is equal to 6.626×10⁻³⁴ joule, i.e. the value of Planck’s constant.

Let me rephrase this truly amazing result, so you appreciate it—perhaps: regardless of the frequency of the light (or of our electromagnetic wave, in general), the energy per cycle, i.e. per wavelength or per period, is always equal to 6.626×10⁻³⁴ joule or, using the electronvolt as the unit, 4.135667662×10⁻¹⁵ eV. So, in case you wondered, that is the true meaning of Planck’s constant!
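
It’s almost too simple to check, but here is the check anyway—a small sketch that just divides E = h·f by f for a handful of (purely illustrative) frequencies:

```python
h = 6.62607015e-34                      # Planck's constant in J·s

for f in (5e14, 1e20, 1e32, 1e43):      # some illustrative frequencies (Hz)
    E_photon = h * f                    # photon energy (J)
    E_per_cycle = E_photon / f          # energy divided by the number of cycles per second
    print(f"f = {f:.0e} Hz: E_photon = {E_photon:.3e} J, energy per cycle = {E_per_cycle:.3e} J")
```

The last column is, of course, always 6.626×10⁻³⁴: that’s the whole point.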

Now, if we have the frequency f, we also have the wavelength λ, because the velocity of the wave is the frequency times the wavelength: c = λ·f and, therefore, λ = c/f. So if we increase the frequency, the wavelength becomes smaller and smaller, and so we’re packing the same amount of energy – admittedly, 4.135667662×10⁻¹⁵ eV is a very tiny amount of energy – into a space that becomes smaller and smaller. Well… What’s tiny, and what’s small? All is relative, of course. :-) So that’s where the Planck scale comes in. If we pack that amount of energy into some tiny little space of the Planck dimension, i.e. a ‘length’ of 1.6162×10⁻³⁵ m, then it becomes a tiny black hole, and it’s hard to think about how that would work.

[…] Let me make a small digression here. I said it’s hard to think about black holes but, of course, it’s not because it’s ‘hard’ that we shouldn’t try it. So let me just mention a few basic facts. For starters, black holes do emit radiation! So they swallow stuff, but they also spit stuff out. More in particular, there is the so-called Hawking radiation, which Stephen Hawking predicted.

Let me quickly make a few remarks on that: Hawking radiation is basically a form of blackbody radiation, so all frequencies are there, as shown below: the distribution of the various frequencies depends on the temperature of the black body, i.e. the black hole in this case. [The black curve is the curve that Lord Rayleigh and Sir James Jeans derived around 1900, using classical theory only, so that’s the one that does not correspond to experimental fact, and which led Max Planck to become the ‘reluctant’ father of quantum mechanics. In any case, that’s history and so I shouldn’t dwell on this.]


The interesting thing about blackbody radiation, including Hawking radiation, is that it reduces the energy and, hence, the equivalent mass of our blackbody. So Hawking radiation reduces the mass and energy of black holes and is therefore also known as black hole evaporation. So black holes that lose more mass than they gain through other means are expected to shrink and ultimately vanish. Therefore, there are all kinds of theories that explain why micro black holes, like that Planck scale black hole we’re thinking of right now, should be much larger net emitters of radiation than large black holes and, hence, why they should shrink and dissipate faster.

Hmm… Interesting… What do we do with all of this information? Well… Let’s think about it as we continue our trek on this long journey to reality over the next year or, more probably, years (plural). :-)

The key lesson here is that space and time are intimately related because of the idea of movement, i.e. the idea of something having some velocity, and that it’s not so easy to separate the dimensions of time and distance in any hard and fast way. As energy scales become larger and, therefore, our natural time and distance units become smaller and smaller, it’s the energy concept that comes to the fore. It sort of ‘swallows’ all other dimensions, and it does lead to limiting situations which are hard to imagine. Of course, that just underscores the underlying unity of Nature, and the mysteries involved.

So… To relate all of this back to the story that our professor is trying to tell, it’s a simple story really. He’s talking about two fundamental constants basically, c and h, pointing out that c is a property of empty space, and h is related to something doing something. Well… OK. That’s really nothing new, and surely not ground-breaking research. :-)

Now, let me finish my thoughts on all of the above by making one more remark. If you’ve read a thing or two about this – which you surely have – you’ll probably say: this is not how people usually explain it. That’s true, they don’t. Anything I’ve seen about this just associates the 10⁴³ Hz scale with the 10²⁸ eV energy scale, using the same Planck-Einstein relation. For example, the Wikipedia article on micro black holes writes that “the minimum energy of a microscopic black hole is 10¹⁹ GeV [i.e. 10²⁸ eV], which would have to be condensed into a region on the order of the Planck length.” So that’s wrong. I want to emphasize this point because I’ve been led astray by it for years. It’s not the total photon energy, but the energy per cycle that counts. Having said that, it is correct, and easy to verify, that the 10⁴³ Hz scale corresponds to a wavelength of the Planck scale: λ = c/f = (3×10⁸ m/s)/(10⁴³ s⁻¹) = 3×10⁻³⁵ m. The confusion between the photon energy and the energy per wavelength arises because of the idea of a photon: it travels at the speed of light and, hence, because of the relativistic length contraction effect, it is said to be point-like, to have no dimension whatsoever. So that’s why we think of packing all of its energy in some infinitesimally small place. But you shouldn’t think like that. The photon is dimensionless in our reference frame: in its own ‘world’, it is spread out, so it is a wave train. And it’s in its ‘own world’ that the contradictions start… :-)
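
The ‘easy to verify’ bit is, indeed, easy to verify—a two-line sketch (the CODATA value for the Planck length is plugged in just for the comparison):

```python
c = 299792458.0          # m/s
l_P = 1.616255e-35       # m, Planck length (CODATA value, for comparison only)

f = 1e43                 # Hz, the frequency scale discussed above
wavelength = c / f
print(f"λ = {wavelength:.3e} m, i.e. about {wavelength / l_P:.1f} Planck lengths")
```

So, yes, a 10⁴³ Hz wave has a wavelength of the order of the Planck length.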

OK. Done!

My third and final point is about what our professor writes on the fundamental physical constants, and more in particular on what he writes on the fine-structure constant. In fact, I could just refer you to my own post on it, but that’s probably a bit too easy for me and a bit difficult for you :-) so let me summarize that post and tell you what you need to know about it.

The fine-structure constant

The fine-structure constant α is a dimensionless constant which also illustrates the underlying unity of Nature, but in a way that’s much more fascinating than the two or three things the professor mentions. Indeed, it’s quite incredible how this number (α = 0.00729735…, but you’ll usually see it written as its reciprocal, which is a number that’s close to 137.036…) links the charge, the relative speed, the radii, and the mass of fundamental particles and, therefore, how this number also links these concepts with each other. And, yes, the fact that it is, effectively, dimensionless, unlike h or c, makes it even more special. Let me quickly sum up what the very same number α all stands for:

(1) α is the square of the electron charge expressed in Planck units: α = eP².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re/r). You’ll see this more often written as re = α²·r. Also note that this is an equation that does not depend on the units, in contrast to equation 1 (above), and 4 and 5 (below), which require you to switch to Planck units. It’s the square root of a ratio and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron (i.e. the electron in the first Bohr orbit): α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 3×10⁸ m/s, i.e. c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. Now, I think that’s, perhaps, the most amazing of all of the expressions for α. [If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. :-)]

Also note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal to me = α/re = α/(α²·r) = 1/(α·r). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to me = α/re = eP²/re. Using the Bohr radius, we get me = 1/(α·r) = 1/(eP²·r).

So… As you can see, this fine-structure constant really links all of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),…
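
If you want to convince yourself that these relations are not numerology, here’s a small numerical sanity check, using approximate CODATA values for the constants. It verifies relations (1), (2) and (4) above; the Planck mass and length are computed from ħ, c and G, and the Planck charge is taken as √(4πε0ħc):

```python
import math

# approximate CODATA values
e = 1.602176634e-19        # elementary charge (C)
eps0 = 8.8541878128e-12    # permittivity of free space (F/m)
hbar = 1.054571817e-34     # reduced Planck constant (J·s)
c = 299792458.0            # speed of light (m/s)
me = 9.1093837015e-31      # electron mass (kg)
G = 6.67430e-11            # gravitational constant (m³/(kg·s²))

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)            # fine-structure constant
r_e = e**2 / (4 * math.pi * eps0 * me * c**2)             # classical electron radius
a0 = hbar / (me * c * alpha)                              # Bohr radius

q_P = math.sqrt(4 * math.pi * eps0 * hbar * c)            # Planck charge
m_P = math.sqrt(hbar * c / G)                             # Planck mass
l_P = math.sqrt(hbar * G / c**3)                          # Planck length

print("alpha            =", alpha)
print("(e/q_P)²         =", (e / q_P)**2)                 # relation (1)
print("sqrt(r_e / a0)   =", math.sqrt(r_e / a0))          # relation (2)
print("m_e·r_e (Planck) =", (me / m_P) * (r_e / l_P))     # relation (4)
```

All four numbers should come out at about 0.00729735, i.e. 1/137.036.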

So… Why is it what it is?

Well… We all marvel at this, but what can we say about it, really? I struggle with how to interpret this, just as much – or probably much more :-) – as the professor who wrote the article I don’t like (because it’s so imprecise, and that’s what made me write all of what I am writing here).

Having said that, it’s obvious that it points to a unity beyond these numbers and constants that I am only beginning to appreciate for what it is: deep, mysterious, and very beautiful. But so I don’t think that professor does a good job at showing how deep, mysterious and beautiful it all is. But then that’s up to you, my brother and you, my imaginary reader, to judge, of course. :-)

[…] I forgot to mention what I mean with ‘Planck units’. Well… Once again, I should refer you to one of my other posts. But, yes, that’s too easy for me and a bit difficult for you. :-) So let me just note we get those Planck units by equating no less than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (c = ħ = kB = ke = G = 1), and so we can solve that to get the five base Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit—with derived units, such as the Planck energy unit, following from these. Of course, you should note that all mass and energy units are directly related because of the mass-energy equivalence relation E = m·c², which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]
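
In case you’d want to calculate the Planck units yourself, here’s a minimal sketch doing just that, from the five constants mentioned (the values used are approximate CODATA values):

```python
import math

c = 299792458.0            # speed of light (m/s)
hbar = 1.054571817e-34     # reduced Planck constant (J·s)
G = 6.67430e-11            # Newton's gravitational constant (m³/(kg·s²))
kB = 1.380649e-23          # Boltzmann's constant (J/K)
ke = 8.9875517923e9        # Coulomb's constant (N·m²/C²)

l_P = math.sqrt(hbar * G / c**3)          # Planck length
t_P = math.sqrt(hbar * G / c**5)          # Planck time
m_P = math.sqrt(hbar * c / G)             # Planck mass
q_P = math.sqrt(hbar * c / ke)            # Planck charge
T_P = math.sqrt(hbar * c**5 / G) / kB     # Planck temperature

print(f"Planck length      = {l_P:.4e} m")
print(f"Planck time        = {t_P:.4e} s")
print(f"Planck mass        = {m_P:.4e} kg")
print(f"Planck charge      = {q_P:.4e} C")
print(f"Planck temperature = {T_P:.4e} K")
```

You should recognize the 1.616×10⁻³⁵ m and 5.391×10⁻⁴⁴ s we encountered above.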

OK. Done! :-)

Addendum: How to think about space and time?

If you read the argument on the Planck scale and constant carefully, then you’ll note that it does not depend on the idea of an indivisible photon. However, it does depend on that Planck-Einstein relation being valid always and everywhere. Now, the Planck-Einstein relation is, in its essence, a fairly basic result that predates the full quantum-mechanical formalism: it incorporates the quantum of action – remember: it’s the equation that allowed Planck to solve the black-body radiation problem, and so it’s why they call Planck the (reluctant) ‘Father of Quantum Theory’ – but it’s not quantum mechanics as such.

So the obvious question is: can we make this reflection somewhat more general, so we can think of the electromagnetic force as an example only? In other words: can we apply the thoughts above to any force and any movement really?

The truth is: I haven’t advanced enough in my little study to give the equations for the other forces. Of course, we could think of gravity, and I developed some thoughts on what gravitational waves might look like, but nothing specific really. And then we have the shorter-range nuclear forces, of course: the strong force, and the weak force. The laws involved are very different. The strong force involves color charges, and the way distances work is entirely different. So it would surely be some different analysis. However, the results should be the same. Let me offer some thoughts though:

  • We know that the relative strength of the nuclear force is much larger, because it pulls like charges (protons) together, despite the strong electromagnetic force that wants to push them apart! So the mentioned problem of trying to ‘pack’ some oscillation in some tiny little space should be worse with the strong force. And the strong force is there, obviously, at tiny little distances!
  • Even gravity should become important, because if we’ve got a lot of energy packed into some tiny space, its equivalent mass will ensure the gravitational forces also become important. In fact, that’s what the whole argument was all about!
  • There’s also all this talk about the fundamental forces becoming one at the Planck scale. I must, again, admit my knowledge is not advanced enough to explain how that would be possible, but I must assume that, if physicists are making such statements, the argument must be fairly robust.

So… Whatever charge or whatever force we are talking about, we’ll be thinking of waves or oscillations—or simply movement, but it’s always a movement in a force field, and so there’s power and energy involved (energy is force times distance, and power is the time rate of change of energy). So, yes, we should expect the same issues in regard to scale. And so that’s what’s captured by h.

As we’re talking the smallest things possible, I should also mention that there are also other inconsistencies in the electromagnetic theory, which should (also) have their parallel for other forces. For example, the idea of a point charge is mathematically inconsistent, as I show in my post on fields and charges. Charge, any charge really, must occupy some space. It cannot all be squeezed into one dimensionless point. So the reasoning behind the Planck time and distance scale is surely valid.

In short, the whole argument about the Planck scale and those limits is very valid. However, does it imply our thinking about the Planck scale is actually relevant? I mean: it’s not because we can imagine what things might look like – they may look like those tiny little black holes, for example – that these things actually exist. GUT or string theorists obviously think they are thinking about something real. But, frankly, Feynman had a point when he said what he said about string theory, shortly before his untimely death in 1988: “I don’t like that they’re not calculating anything. I don’t like that they don’t check their ideas. I don’t like that for anything that disagrees with an experiment, they cook up an explanation—a fix-up to say, ‘Well, it still might be true.'”

It’s true that the so-called Standard Model does not look very nice. It’s not like Maxwell’s equations. It’s complicated. It’s got various ‘sectors’: the electroweak sector, the QCD sector, the Higgs sector,… So ‘it looks like it’s got too much going on’, as a friend of mine said when he looked at a new design for mountain bike suspension. :-) But, unlike mountain bike designs, there’s no real alternative to the Standard Model. So perhaps we should just accept it is what it is and, hence, in a way, accept Nature as we can see it. So perhaps we should just continue to focus on what’s here, before we reach the Great Desert, rather than wasting time on trying to figure out what things might look like on the other side, especially because we’ll never be able to test our theories about ‘the other side.’

On the other hand, we can see where the Great Desert sort of starts (somewhere near the 10³² Hz scale), and so it’s only natural to think it should also stop somewhere. In fact, we know where it stops: it stops at the 10⁴³ Hz scale, because everything beyond that doesn’t make sense. The question is: is there actually something there? Like fundamental strings or whatever you want to call it. Perhaps we should just stop where the Great Desert begins. And what’s the Great Desert anyway? Perhaps it’s a desert indeed, and so then there is absolutely nothing there. :-)

Hmm… There’s not all that much one can say about it. However, when looking at the history of physics, there’s one thing that’s really striking. Most of what physicists could think of, in the sense that it made physical sense, turned out to exist. Think of anti-matter, for instance. Paul Dirac thought it might exist, that it made sense for it to exist, and so everyone started looking for it, and Carl Anderson found it a few years later (in 1932). In fact, it had been observed before, but people just didn’t pay attention, so they didn’t want to see it, in a way. […] OK. I am exaggerating a bit, but you know what I mean. The 1930s are full of examples like that. There was a burst of scientific creativity, as the formalism of quantum physics was being developed, and the experimental confirmations of the theory just followed suit.

In the field of astronomy, or astrophysics I should say, it was the same with black holes. No one could really imagine the existence of black holes until the 1960s or so: they were thought of as a mathematical curiosity only, a logical possibility. However, the circumstantial evidence now is quite large and so… Well… It seems a lot of what we can think of actually has some existence somewhere. :-)

So… Who knows? […] I surely don’t. And so I need to get back to the grind and work my way through the rest of Feynman’s Lectures and the related math. However, this was a nice digression, and so I am grateful to my brother for initiating it. :-)

Self-inductance, mutual inductance, and the power and energy of inductors

As Feynman puts it, studying physics is not always about ‘the great and esoteric heights’. In fact, you usually have to come down from them fairly quickly – studying physics is similar to mountain climbing in that regard :-) – and study ‘relatively low-level subjects’, such as electrical circuits, which is what we’ll do in this and the next post.

As I’ve introduced some key concepts in a previous post already, let me recapitulate the basics, which include the concept of the electromotive force, which is basically the voltage, i.e. the potential difference, that’s produced in a loop or coil of wire as the magnetic flux changes. I also talked about the impedance in an AC circuit. Finally, we also discussed the power and energies involved. Important results from this previous discussion include (but are not limited to):

  1. A constant speed AC generator will create an alternating current with the emf, i.e. the voltage, varying as V0·sin(ωt).
  2. If we only have resistors as circuit elements, and the resistance in the circuit adds up to R, then the electric current in the circuit will be equal to I = Ɛ/R = V/R = (V0/R)·sin(ωt). So that’s Ohm’s Law, basically.
  3. The power that’s produced and consumed in an AC circuit is the product of the voltage and the current, so P = Ɛ·I = V·I. We also showed this electrical power is equal to the mechanical power dW/dt that makes the generator run.
  4. Finally, we explained the concept of impedance (denoted by Z) using Euler’s formula: Z = |Z|·e^(iθ), mentioning that, if other circuit elements than resistors are involved, such as inductors, then it’s quite likely that the current signal will lag the voltage signal, with the phase factor θ telling us by how much.

It’s now time to introduce those ‘other’ circuit elements. So let’s start with inductors here, and the concept of inductance itself. There’s a lot of stuff to them, and so let’s go over it point by point.

The concept of self-inductance

In its simplest form, an inductor is just a coil, but they come in all sizes and shapes. If you want to see what they might look like, just google some images of micro-electronic inductors, and then, just to see the range of applications, some images of inductors used for large-scale industrial applications. If you do so, you’re likely to see images of transformers too, because transformers work on the principle of mutual inductance, and so they involve two coils, i.e. two inductors.

Contrary to what you might expect, the concept of self-inductance (or inductance tout court) is quite simple: a changing current will cause a changing magnetic field and, hence, some emf. Now, it turns out that the induced emf is proportional to the change in current. So we’ve got another constant of proportionality here, just like when we defined resistance, or capacitance. So, in many ways, the inductance is just another proportionality coefficient. If we denote it by L – the symbol is said to honor the Russian physicist Heinrich Lenz, whom you know from Lenz’ Law – then we define it as:

L = −Ɛ/(dI/dt)

The dI/dt factor is, obviously, the time rate of change of the current, and the negative sign indicates that the emf opposes the change in current, so it will tend to cause an opposing current. That’s why the emf involved is often referred to as a ‘back emf’. So that’s Lenz’ Law basically. As you might expect, the physicists came up with yet another derived unit, the Henry, to honor yet another physicist, Joseph Henry, an American scientist who was a contemporary of Michael Faraday and independently discovered pretty much the same as Faraday: one henry (H) equals one volt·second per ampere: 1 H = 1 V·s/A.
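
Numerically, there’s nothing to it, but a one-liner may help to fix the units in your mind. The inductance and the rate of change of the current below are, of course, just made-up illustrative numbers:

```python
# Back emf of an (ideal) inductor: emf = -L·(dI/dt) -- illustrative numbers only
L = 0.5          # inductance in henry (hypothetical)
dI_dt = 2.0      # rate of change of the current in ampere per second (hypothetical)

emf = -L * dI_dt
print(f"emf = {emf} V")   # -1.0 V: the induced voltage opposes the change in the current
```

So 0.5 H times 2 A/s gives 1 V of ‘back emf’, which is just the 1 H = 1 V·s/A definition at work.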

The concept of mutual inductance

Feynman introduces the topic of inductance with a two-coil set-up, as shown below, noting that a changing current in coil 1 will induce some emf in coil 2, with a constant of proportionality he denotes by M21. Conversely, a changing current in coil 2 will induce some emf in coil 1, with a constant of proportionality M12. M12 and M21 are indeed constants: they depend on the geometry of the coils, including the length of the solenoid (l), its surface area (S) and the number of loop turns of the coils (N1 and N2).

mutual inductance

The next step in the analysis is then to acknowledge that each coil should also produce a ‘back emf’ in itself, which we can denote by M11 and M22 respectively, but then these constants are, of course, equal to the self-inductance of the coils so, taking into account the convention in regard to the sign of the self-inductance, we write:

L1 = −M11 and L2 = −M22

You will now wonder: what’s the total emf in each coil, taking into account that we do not only have mutual inductance but also self-inductance? Frankly, when I was a kid, and my father tried to tell me one or two things about this, it confused me very much. I could not imagine what happened in one coil, let alone in two coils. I had this vision of a current producing some ‘back-current’, and then the ‘back-current’ producing ‘back-current’ again, and so I could not imagine how one could solve this problem. So the image in my head was very much like that baking powder box which Feynman evokes when talking about the method of images to find the electric fields in situations with an easy geometry, so that’s the picture of a baking powder box which has on its label a picture of a baking powder box which has… Well… Etcetera. Unfortunately, my father didn’t push us to study math and, therefore, I knew that one could solve such problems mathematically – we’re talking a converging series here – but I did not know how, and that’s why I found it all very confusing.

Now I understand there’s one current only, and one potential difference only, and that the formulas do not involve some infinite series of terms. But… Well… I am not ashamed to say these problems are still testing the (limited) agility of my mind. The first thing to ‘get’ is that we’re talking a back emf, and so that’s not a current but a potential difference. In fact, as I explained in my post on the electromotive force, the term ‘force’ in emf is actually misleading, and may lead to that same erroneous vision that I had as a kid: forces generating counter-forces, that generate counter-forces, that generate counter-forces, etcetera. It’s not like that: we have some current – one current – in a coil and we’ll have some voltage – one voltage – across the coil. If the coil were a resistor instead of a coil, we’d find that the ratio of this voltage and the current would be some constant R = V/I. Now here we’re talking a coil indeed, so that’s a different circuit element, and we find some other ratio, L = −V/(dI/dt) = −Ɛ/(dI/dt). Why the minus sign? Well… As said, the induced emf will be such that it will tend to counter the change in the current, and current flows from positive to negative as per our convention.

But… Yes? So how does it work when we put this coil in some circuit, and how does the resistance of the inductor come into play? Relax. We’ve just been talking ideal circuit elements so far, and we’ve discussed only two: the resistor and the inductor. We’ll talk about voltage sources (or generators) and capacitors too, and then we’ll link all of these ideal circuit elements. In short, we’ll analyze some real-life electrical circuit soon, but first you need to understand the basics. Let me just note that an ideal inductor appears as a zero-resistance conductor in a direct current (DC) circuit, so it’s a short-circuit really! Please try to mentally separate out those ‘ideal’ circuit components. Otherwise you’ll never be able to make sense of it all!

In fact, there’s a good reason why Feynman starts with explaining mutual inductance before discussing a little circuit like the one below, which has an inductor and a voltage source. The two-coil situation above is effectively easier to understand, although it may not look like that at first. So let’s analyze that two-coil situation in more detail first. In other words, let me try to understand the situation that I didn’t understand as a kid. :-)

circuit with coil

Because of the law of superposition, we should add fluxes and changes in fluxes and, hence, we should also add the electromotive forces, i.e. the induced voltages. So, what we have here is that the total emf in coil 2 should be written as:

Ɛ2 = M21·(dI1/dt) + M22·(dI2/dt) = M21·(dI1/dt) – L2·(dI2/dt)

What we’re saying here is that the emf, i.e. the voltage across the coil, will indeed depend on the change in current in the other coil, but also on the change in current of the coil itself. Likewise, the total emf in coil 1 should be written as:

Ɛ1 = M12·(dI2/dt) + M11·(dI1/dt) = M12·(dI2/dt) – L1·(dI1/dt)

Of course, this does reduce to the simple L = −Ɛ/(dI/dt) if there’s one coil only. But so you see where it comes from and, while we do not have some infinite series :-) we do have a system of two equations here, and so let me say one or two things about it.

The first thing to note is that it is not so difficult to show that M21 is equal to M12, so we can simplify and write that M21 = M12 = M. Now, while I said ‘not so difficult’, I didn’t mean it’s easy and, because I don’t want this post to become too long, I’ll refer you to Feynman for the proof of this M21 = M12 = M equation. It’s a general proof for any two coils or ‘circuits’ of arbitrary shape and it’s really worth the read. However, I have to move on.

The second thing to note is that this coefficient M, which is referred to as the mutual inductance now (so singular instead of plural) depends on the ‘circuit geometry’ indeed. For a simple solenoid, Feynman calculates it as

M = −(1/(ε0c²))·(N1·N2)·S/l,

with l the length of the solenoid, S its surface area, and N1 and N2 the respective numbers of loop turns of the two coils. So, yes, only ‘geometry’ comes into play. [Note that’s quite obvious from the formula because a switch of the subscripts of N1 and N2 makes no difference, of course!] Now, it’s interesting to note that M is the same for, let’s say, N1 = 100 and N2 = 10 and for N1 = 20 and N2 = 50. In fact, because you’re familiar with what transformers do, i.e. transforming voltages, you may think that’s counter-intuitive. It’s not. The equality of M does not imply that Ɛ1 and Ɛ2 remain the same. Our set of equations is Ɛ1 = M·(dI2/dt) – L1·(dI1/dt) and Ɛ2 = M·(dI1/dt) – L2·(dI2/dt), and so L1 and L2 clearly do vary as N1 and N2 vary! So… Well… Yes. We’ve got a set of two equations with two independent variables (I1 and I2) and two dependent variables (Ɛ1 and Ɛ2). Of course, we could also phrase the problem the other way around: given two voltages, what are the currents? :-)
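
To put a number on it: noting that 1/(ε0c²) is just μ0, here’s a little sketch that computes the magnitude of M for a made-up geometry (all dimensions are purely illustrative), and which also shows that swapping turns between the coils doesn’t change M as long as the product N1·N2 stays the same:

```python
import math

eps0 = 8.8541878128e-12          # F/m
c = 299792458.0                  # m/s
mu0 = 1 / (eps0 * c**2)          # 1/(eps0·c²) is just mu0

# hypothetical solenoid geometry
r = 0.01                         # m, radius of the solenoid
S = math.pi * r**2               # m², cross-sectional area
l = 0.10                         # m, length of the solenoid

for N1, N2 in ((100, 10), (20, 50)):
    M = mu0 * N1 * N2 * S / l    # magnitude of the mutual inductance
    print(f"N1 = {N1}, N2 = {N2}: M = {M:.3e} H")
```

Both lines print the same value, because N1·N2 = 1000 in both cases—which is exactly the point made above.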

Of course, that makes us think of the power that goes in and out of a transformer. Indeed, you’ll remember that power is voltage times current. So what’s going on here in regard to that?

Well… There’s a thing with transformers, or with two-coil systems like this in general, that is referred to as coupling. The geometry of the situation will determine how much flux from one coil is linked with the flux of the other coil. If most, or all of it, is linked, we say the two coils are ‘tightly coupled’ or, in the limit, that they are fully coupled. There’s a measure for that, and it’s called the coefficient of coupling. Let’s first explore that concept of power once more.

Inductance, energy and electric power

It’s easy to see that we need electric power to get some current going. Now, as we pointed out in our previous post, the power is equal to the voltage times the current. It’s also equal, of course, to the amount of work done per second, i.e. the time rate of change of the energy W, so we write:

dW/dt = Ɛ·I

Now, we defined the self-inductance as L = −Ɛ/(dI/dt) and, therefore, we know that Ɛ = −L·(dI/dt), so we have:

dW/dt = −L·I·(dI/dt)

What is this? A differential equation? Yes and no. We’ve got not one but two functions of time here, W and I, and, while their derivatives with respect to time do appear in the equation, what we need to do is just integrate the two sides over time. We get: W = −(1/2)·L·I². Just check it by taking the time derivative of both sides. Of course, we can add any constant, to both sides in fact, but that’s just a matter of choosing some reference point. We’ll choose our constant to be zero, and also think about the energy that’s stored in the coil, i.e. U, which we define as:

U = −W = (1/2)·L·I²

Huh? What’s going on here? Well… It’s not an easy discussion, but let’s try to make sense of it. We have some changing current in the coil here but, obviously, some kind of inertia also: the coil itself opposes the change in current through the ‘back emf’. It requires energy, or power, to overcome the inertia. We may think of applying some voltage to offset the ‘back emf’, so we may effectively think of that little circuit with an inductor and a voltage source. The voltage V we’d need to apply to offset the inertia would, obviously, be equal to the ‘back emf’, but with its sign reversed, so we have:

V = − Ɛ = L·(dI/dt)

Now, it helps to think of what a current really is: it’s about electric charges that are moving at some velocity v because some force is being applied to them. As in any system, the power that’s being delivered is the dot product of the force and the velocity vectors (that ensures we only take the tangential component of the force into account), so if we have moving charges, the power that is being delivered to the circuit is (F·v)·n. What is F? It’s, obviously, qE, as the electric field is the force per unit charge, so E = F/q. But so we’re talking some circuit here and we need to think of the power being delivered to some infinitesimal element ds in the coil, and so that’s (F·v)·n·ds, which can be written as: (F·ds)·n·v. And then we integrate over the whole coil to find:

power

Now, you may or may not remember that the emf (Ɛ) is actually defined as the line integral ∫E·ds, taken around the entire coil and, hence, noting that E = F/q, and that the current I is equal to I = q·n·v, we got our power equation. Indeed, the integrand or kernel of our integral becomes F·n·v·ds = q·E·n·v·ds = I·E·ds. Hence, we get our power formula indeed: P = V·I, with V the potential difference, i.e. the voltage across the coil.

I am getting too much into the weeds here. The point is: we’ve got a full and complete analog to the concept of inertia in mechanics here: instead of some force F causing some mass m to change its velocity according to Newton’s Law, i.e. F = m·a = m·(dv/dt), we here have a potential difference V causing some current I to change according to the V = L·(dI/dt) law.

This is very confusing but, remember, the same equations must have the same solutions! So, in an electric circuit, the inductance is really like what the mass is in mechanics. Now, in mechanics, we’ll say that our mass has some momentum p = m·v, and we’ll also say that its kinetic energy is equal to (1/2)·m·v². We can do the same for our circuit: potential energy is continuously being converted into kinetic energy which, for our inductor, we write as U = (1/2)·L·I².

Just think about it by playing with one of the many online graphing tools. The graph below, for example, assumes the current builds up to some maximum. As it reaches its maximum, the stored energy will also max out. Now, you should not worry about the units here, or the scale of the graphs. The assumption is that I builds up from 0 to 1, and that L = 1, so that makes U what it is. Using a different constant for L, and/or different units for I, will change the scale of U too, but not its general shape, and that shape gives you the general idea.

power

The example above obviously assumes some direct current, so it’s a DC circuit: the current builds up, but then stabilizes at some maximum that we can find by applying Ohm’s Law to the resistance of the circuit: I = V/R. Resistance? But we were talking an ideal inductor? We are. If there’s no other resistance in the circuit, we’ll have a short-circuit, so the assumption is that we do have some resistance in the circuit and, therefore, we should also think of some energy loss to heat from the current in the resistance, but that’s not our worry here.

The illustration below is, perhaps, more interesting. Here we are, obviously, applying an alternating current, and so the current goes in one and then in the other direction, so I > 0, and then I < 0, etcetera. We’re assuming some nice sinusoidal curve for the current here (i.e. the blue curve), and so we get what we get for U (i.e. the red curve): the energy goes up and down between zero and some maximum amplitude that’s determined by the maximum current.

power 2

So, yes, it is, after all, quite intuitive: building up a current does require energy from some external source, which is used to overcome the ‘back emf’ in the inductor, and that energy is stored in the inductor itself. [If you still wonder why it’s stored in the inductor, think about the other question: where else would it be stored?] How is it stored? Look at the graph and think: it’s stored as kinetic energy of the charges, obviously. That explains why the energy is zero when the current is zero, and why the energy maxes out when the current maxes out. So, yes, it all makes sense! :-)
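
If you’d like to reproduce those two curves yourself, here’s a minimal sketch of the second (AC) case, with the same normalization as above (L = 1, current amplitude 1):

```python
import numpy as np

L = 1.0                              # henry (same normalization as in the graph above)
t = np.linspace(0, 2, 9)             # a few sample times over two periods (arbitrary units)
I = np.sin(2 * np.pi * t)            # sinusoidal current (the blue curve)
U = 0.5 * L * I**2                   # stored energy (the red curve)

for ti, Ii, Ui in zip(t, I, U):
    print(f"t = {ti:.2f}: I = {Ii:+.2f}, U = {Ui:.2f}")
```

The energy is zero whenever the current is zero, and maxes out at (1/2)·L·I² = 0.5 whenever the current is at ±1, regardless of its sign—exactly the behavior described above.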

Let’s now get back to that coupling constant.

The coupling constant

We can apply our reasoning to two coils. Indeed, we know that Ɛ1 = M·(dI2/dt) – L1·(dI1/dt) and Ɛ2 = M·(dI1/dt) – L2·(dI2/dt). So the power in the two-coils system is dW/dt = Ɛ1·I1 + Ɛ2·I2, so we have:

dW/dt = M·I1·(dI2/dt) – L1·I1·(dI1/dt) + M·I2·(dI1/dt) – L2·I2·(dI2/dt)

= – L1·I1·(dI1/dt) – L2·I2·(dI2/dt) + M·[I1·(dI2/dt) + I2·(dI1/dt)]

Integrating both sides, and equating U with −W once more, yields:

U = (1/2)·L1·I1² + (1/2)·L2·I2² + M·I1·I2

[Again, you should just take the time derivative to verify this. If you don’t forget to apply the product rule for the M·I1·I2 term, you’ll see I am not writing too much nonsense here. Also note that the sign of that M·I1·I2 term depends on the sign convention for M, i.e. on the relative sense of the two windings, so don’t be surprised if you see it written with a minus sign elsewhere.] Now, there’s an interesting algebraic transformation of this expression, and an equally interesting explanation why we’d re-write the expression as we do. Let me copy it from Feynman so I’ll be using his fancier L and M symbols now. :-)

explanation coupling constant

So what? Well… Taking into account that inequality above, we can write the relation between M and the self-inductances L1 and L2 using some constant k, which varies between 0 and 1 and which we’ll refer to as the coupling constant:

formula coupling constant 2

We refer to k as the coupling constant, for rather obvious reasons: if it’s near zero, the mutual inductance will be very small, and if it’s near one, then the coils are said to be ‘tightly coupled’, and the ‘mutual flux linkage’ is then maximized. As you can imagine, there’s a whole body of literature out there relating this coupling constant to the behavior of transformers or other circuits where mutual inductance plays a role.
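
By way of illustration, here’s a small sketch with made-up inductances. It computes k = M/√(L1·L2) and checks, by brute force over a grid of currents, that the stored energy never goes negative as long as M² ≤ L1·L2:

```python
import math
import numpy as np

L1, L2 = 2.0e-3, 8.0e-3       # hypothetical self-inductances (henry)
M = 3.0e-3                    # hypothetical mutual inductance; note M² = 9e-6 <= L1·L2 = 16e-6

k = M / math.sqrt(L1 * L2)    # coupling constant
print(f"k = {k:.3f}")         # 0.75 here: fairly tightly coupled, but not fully

# brute-force check: U = ½·L1·I1² + ½·L2·I2² + M·I1·I2 stays non-negative for any currents
I1, I2 = np.meshgrid(np.linspace(-10, 10, 201), np.linspace(-10, 10, 201))
U = 0.5 * L1 * I1**2 + 0.5 * L2 * I2**2 + M * I1 * I2
print(f"minimum U on the grid: {U.min():.3e} J")
```

The minimum is zero (for I1 = I2 = 0); try pushing M above √(L1·L2) ≈ 4×10⁻³ H and you’ll see negative energies appear, which is Nature’s way of telling you that k cannot exceed 1.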

The formula for self-inductance

We gave the formula for the mutual inductance of two coils that are arranged as one solenoid on top of the other (cf. the illustration I started with):

M = −(1/(ε0c²))·(N1·N2)·S/l

It’s a very easy calculation, so let me quickly copy it from Feynman:

calculation solenoid

You’ll say: where is the M here? This is a formula for the emf! It is, but M is the constant of proportionality in front, remember? So there you go. :-)

Now, you would think that getting a formula for the self-inductance L of some solenoid would be equally straightforward. It turns out that that is not the case. Feynman needs two full pages and… Well… By now, you should know how ‘dense’ his writing really is: if it weren’t so dense, you’d be reading Feynman yourself, rather than my ‘explanations’ of him. :-) So… Well… If you want to see how it works, just click on the link here and scroll down to the last two pages of his exposé on self-inductance. I’ll limit myself to just jotting down the formula he does obtain when he’s through the whole argument:

solenoid formula

See why he uses a fancier L than ‘my’ L? ‘His’ L is the length of the solenoid. :-) And, yes, r is the radius of the coil and n the number of turns per unit length in the winding. Also note this formula is valid only if L ≫ r, so the effects at the end of the solenoid can be neglected. OK. Done. :-)
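
If you’d like a number, here’s a tiny sketch assuming the standard long-solenoid result, L = μ0·n²·π·r²·l (with l the length of the solenoid), which – if I read Feynman correctly – is what his formula boils down to when the end effects can be neglected. The dimensions are purely illustrative:

```python
import math

mu0 = 4 * math.pi * 1e-7     # H/m

# hypothetical solenoid: 10 cm long, 1 cm radius, 1000 turns per metre
length = 0.10                # m (the fancy L in the formula above)
r = 0.01                     # m, radius of the coil
n = 1000                     # turns per metre

L_self = mu0 * n**2 * math.pi * r**2 * length    # long-solenoid self-inductance
print(f"L = {L_self * 1e6:.1f} µH")              # about 39.5 µH
```

Note how the self-inductance goes with the square of the turn density, while the mutual inductance formula above went with the simple product N1·N2.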

Well… That’s it for today! I am sorry to say but the next post promises to be as boring as this one because… Well… It’s on electric circuits again. :-(

Reconciling the wave-particle duality in electromagnetism

As I talked about Feynman’s equation for electromagnetic radiation in my previous post, I thought I should add a few remarks on wave-particle duality, but then I didn’t do it there, because my post would have become way too long. So let me add those remarks here. In fact, I’ve written about this before, and so I’ll just mention the basic ideas without going too much in detail. Let me first jot down the formula once again, as well as illustrate the geometry of the situation:



The gist of the matter is that light, in classical theory, is a traveling electromagnetic field caused by an accelerating electric charge and that, because light travels at speed c, it’s the acceleration at the retarded time t – r/c, i.e. a‘ = a(t – r/c), that enters the formula. You’ve also seen the diagrams that accompany this formula:

EM 1 EM 2

The two diagrams above show that the curve of the electric field in space is a “reversed” plot of the acceleration as a function of time. As I mentioned before, that’s quite obvious from the mathematical behavior of a function with an argument like the one above, i.e. a function F(t – r/c). When we write t – r/c, we basically measure distance units in seconds, instead of in meter. So we basically use c as the scale for both time as well as distance. I explained that in a previous post, so please have a look there if you’d want to see how that works.

So it’s pretty straightforward, really. However, having said that, when I see a diagram like the one above, so all of these diagrams plotting an E or B wave in space, I can’t help thinking it’s somewhat misleading: after all, we’re talking something traveling at the speed of light here and, therefore, its length – in our frame of reference – should be zero. And it is, obviously. Electromagnetic radiation comes packed in point-like, dimensionless photons: the length of something that travels at the speed of light must be zero.

Now, I don’t claim to know what’s going on exactly, but my thinking on it may not be far off the mark. We know that light is emitted and absorbed by atoms, as electrons go from one energy level to another, and the energy of the photons of light corresponds to the difference between those energy levels (i.e. a few electron-volt only, typically: it’s given by the E = h·ν relation). Therefore, we can look at a photon as a transient electromagnetic wave. It’s a very short pulse: the decay time for one such pulse of sodium light, i.e. one photon of sodium light, is 3.2×10⁻⁸ seconds. However, taking into account the frequency of sodium light (500 THz), that still makes for some 16 million oscillations, and a wave-train with a length of almost 10 meter. [Yes. Quite incredible, isn’t it?] So the photon could look like the transient wave I depicted below, except… Well… This wavetrain is traveling at the speed of light and, hence, we will not see it as a ten-meter long wave-train. Why not? Well… Because of the relativistic length contraction, it will effectively appear as a point-like particle to us.

Photon wave

So relativistic length contraction is why the wave and particle duality can be easily reconciled in electromagnetism: we can think of light as an irregular beam of point-like photons indeed, as one atomic oscillator after the other releases a photon, in no particularly organized way. So we can think of photons as transient wave-trains, but we should remind ourselves that they are traveling at the speed of light, so they’ll look point-like to us.
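
By the way, those wave-train numbers are easy to reproduce—a two-line check, using the decay time and frequency quoted above:

```python
c = 299792458.0      # m/s
tau = 3.2e-8         # s, decay time of the sodium transition mentioned above
f = 500e12           # Hz, rough frequency of sodium light

print(f"oscillations in one pulse: {f * tau:.2e}")    # ~1.6×10⁷, i.e. some 16 million
print(f"length of the wave train : {c * tau:.1f} m")  # ~9.6 m, i.e. almost 10 meter
```

So, yes: some 16 million oscillations, stretched over almost ten meter—in the photon’s ‘own world’, that is.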

Is such a view consistent with the results of the famous – or should I say infamous? – double-slit experiment? Well… Maybe. As I mentioned in one of my posts, it is rather remarkable that it is actually hard to find actual double-slit experiments that use actual detectors near the slits, and even harder to find such experiments involving photons! Indeed, experiments involving detectors near the slits are usually experiments with ‘real’ particles, such as electrons, for example. Now, a lot of advances have been made in the set-up of these experiments over the past five years, and one of these experiments is a 2010 experiment of an Italian team which suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. Now that throws some doubts on the traditional explanation of the results of the double-slit experiment.

The idea is shown below. The electron is depicted as an incoming plane wave which effectively breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron goes through but, at the same time, the barrier creates a spherical wave as it goes through. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [I hope I don’t have to remind you that, while being represented as ‘real’ waves here, the ‘waves’ are, obviously, complex-valued psi functions.]

double-slit experiment

In fact, to be precise, the experimenters note that there still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This does not solve the ‘mystery’ of the double-slit experiment, but it throws doubt on how it’s usually being explained. The mystery in such experiments is that, when we put detectors, it is either the detector at A or the detector at B that goes off. They should never go off together—”at half strength, perhaps”, as Feynman puts it. But so there are doubts here now. Perhaps the electron does go through both slits at the same time! And so that’s why I used italics when writing “even if an electron passed through both slits”: the electron, or the photon in a similar set-up, is not supposed to do that according to the traditional explanation of the results of the double-slit experiment! It’s one or the other, and the wavefunction collapses or reduces as it goes through. 

However, that’s where these so-called ‘weak measurement’ experiments now come in, like this 2010 experiment: it does not prove but indicates that interaction does not have to be that way. They strongly suggest that it is not all or nothing, that our observations should not necessarily destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (cf. the diagram above), being emitted by an atomic oscillator for a few tens of nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10⁻⁸ seconds), with the image of a photon as a particle that can be represented by a complex-valued probability amplitude function (cf. the diagram below). I look forward to that day. I think it will come soon.

Photon wave

Here I should add two remarks. First, a lot has been said about the so-called indivisibility of a photon, but inelastic scattering implies that photons are not monolithic: the photon loses energy to the electron and, hence, its wavelength changes. Now, you’ll say: the scattered photon is not the same photon as the incident photon, and you’re right. But… Well. Think about it. It does say something about the presumed oneness of a photon.


The other remark is on the mathematics of interference. Photons are bosons and, therefore, we have to add their amplitudes to get the interference effect. So you may try to think of an amplitude function, like Ψ = (1/√2π)·e^(iθ) or whatever, and think it’s just a matter of ‘splitting’ this function before it enters the two slits and then ‘putting it back together’, so to say, after our photon has gone through the slits. [For the detailed math of interference in quantum mechanics, see my page on essentials.] Well… No. It’s not that simple. The illustration with that plane wave entering the slits, and the cylindrical and/or spherical wave coming out, makes it obvious that something happens to our wave as it goes through the slit. As I said a couple of times already, the two-slit experiment is interesting, but the interference phenomenon – or diffraction as it’s called – involving one slit only is at least as interesting. So… Well… The analysis is not that simple. Not at all, really. :-)

The Liénard–Wiechert potentials and the solution for Maxwell’s equations

In my post on gauges and gauge transformations in electromagnetics, I mentioned the full and complete solution for Maxwell’s equations, using the electric and magnetic (vector) potential Φ and A. Feynman frames it nicely, so I should print it and put it on the kitchen door, so I can look at it every day. :-)


I should print the wave equation we derived in our previous post too. Hmm… Stupid question, perhaps, but why is there no wave equation above? I mean: in the previous post, we said the wave equation was the solution for Maxwell’s equations, didn’t we? The answer is simple, of course: the wave equation is a solution for waves originating from some source and traveling through free space, so that’s a special case. Here we have everything. Those integrals ‘sweep’ all over space, and so that’s real space, which is full of moving charges, and so there are waves everywhere. So the solution above is far more general and captures it all: it’s the potential at every point in space, and at every point in time, taking into account whatever else is there, moving or not moving. In fact, it is the general solution of Maxwell’s equations.

How do we find it? Well… I could copy Feynman’s 21st Lecture but I won’t do that. The solution is based on the formula for Φ and A for a small blob of charge, and then the formulas above just integrate over all of space. That solution for a small blob of charge, i.e. a point charge really, was first deduced in 1898, by a French engineer: Alfred-Marie Liénard. However, his equations did not get much attention, apparently, because a German physicist, Emil Johann Wiechert, worked on the same thing and found the very same equations just two years later. That’s why they are referred to as the Liénard-Wiechert potentials, so they both get credit for it, even if both of them worked it out independently. These are the equations:

electric potential

magnetic potential

Now, you may wonder why I am mentioning them, and you may also wonder how we get those integrals above, i.e. our general solution for Maxwell’s equations, from them. You can find the answer to your second question in Feynman’s 21st Lecture. :-) As for the first question, I mention them because one can derive two other formulas for E and B from them. It’s the formulas that Feynman uses in his first Volume, when studying light:


Now you’ll probably wonder how we can get these two equations from the Liénard-Wiechert potentials. They don’t look very similar, do they? No, they don’t. Frankly, I would like to give you the same answer as above, i.e. check it in Feynman’s 21st Lecture, but the truth is that the derivation is so long and tedious that even Feynman says one needs “a lot of paper and a lot of time” for that. So… Well… I’d suggest we just use all of those formulas and not worry too much about where they come from. If we can agree on that, we’re actually sort of finished with electromagnetism. All the chapters that follow Feynman’s 21st Lecture are applications indeed, so they do not add all that much to the core of the classical theory of electromagnetism.

So why did I write this post? Well… I am not sure. I guess I just wanted to sum things up for myself, so I can print it all out and put it on the kitchen door indeed. :-) Oh, and now that I think of it, I should add one more formula, and that’s the formula for spherical waves (as opposed to the plane waves we discussed in my previous post). It’s a very simple formula, and entirely what you’d expect to see:

spherical wave

The S function is the source function, and you can see that the formula is a Coulomb-like potential, but with the retarded argument. You’ll wonder: what is ψ? Is it E or B or what? Well… You can just substitute: ψ can be anything. Indeed, Feynman gives a very general solution for any type of spherical wave here. :-)

So… That’s it, folks. That’s all there is to it. I hope you enjoyed it. :-)

Addendum: Feynman’s equation for electromagnetic radiation

I talked about Feynman’s formula for electromagnetic radiation before, but it’s probably good to quickly re-explain it here. Note that it talks about the electric field only, as the magnetic field is so tiny and, in any case, if we have E then we can find B. So the formula is:


The geometry of the situation is depicted below. We have some charge q that, we assume, is moving through space, and so it creates some field E at point P. The er‘ vector is the unit vector from P to Q, so it points at the charge. Well… It points to where the charge was at the time just a little while ago, i.e. at the time t – r‘/c. Why? Well… Because the field needs some time to travel: we don’t know where q is right now, i.e. at time t. It might be anywhere. Perhaps it followed some weird trajectory during the time r‘/c, like the trajectory below.

radiation formula

So our er′ vector moves as the charge moves, and so it will also have velocity and, likely, some acceleration, but what we measure for its velocity and acceleration, i.e. the d(er′)/dt and d²(er′)/dt² in that Feynman equation, is also the retarded velocity and the retarded acceleration. But look at the terms in the equation. The first two terms have a 1/r′² in them, so these two effects diminish with the square of the distance. The first term is just Coulomb’s Law (note that the minus sign in front takes care of the fact that like charges repel and so the E vector will point in the other way). Well… It is and it isn’t, because of the retarded time argument, of course. And so we have the second term, which sort of compensates for that. Indeed, the d(er′)/dt is the time rate of change of er′ and, hence, if r′/c = Δt, then (r′/c)·d(er′)/dt is a first-order approximation of Δer′.

As Feynman puts it: “The second term is as though nature were trying to allow for the fact that the Coulomb effect is retarded, if we might put it very crudely. It suggests that we should calculate the delayed Coulomb field but add a correction to it, which is its rate of change times the time delay that we use. Nature seems to be attempting to guess what the field at the present time is going to be, by taking the rate of change and multiplying by the time that is delayed.” In short, the first two terms can be written as E = −(q/(4πε₀·r′²))·[er′ + Δer′] and, hence, it’s a sort of modified Coulomb Law that tries to guess what the electrostatic field at P should be right now, based on (a) the retarded Coulomb field and (b) its rate of change times the delay.
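To see how good that ‘guess’ is, here’s a small numerical sketch (Python, with an invented trajectory for the charge, so the numbers mean nothing in themselves): we compare the direction of the unit vector now with (a) the plain retarded direction and (b) the retarded direction plus its rate of change times the delay.

```python
import numpy as np

c = 3.0e8  # speed of light (m/s)

def charge_position(t):
    # Invented trajectory, for illustration only: a charge wiggling in y, ~10 m away.
    return np.array([10.0, 0.5 * np.sin(2.0e6 * t)])

def unit_vector_to_charge(t, P=np.array([0.0, 0.0])):
    d = charge_position(t) - P
    return d / np.linalg.norm(d)

t = 1.0e-6
delay = 10.0 / c                       # roughly r'/c for a charge some 10 m away
t_ret = t - delay

e_now = unit_vector_to_charge(t)       # what we would like to know
e_ret = unit_vector_to_charge(t_ret)   # what we actually 'see' (retarded)

# numerical rate of change of the retarded unit vector
h = 1.0e-12
de_dt = (unit_vector_to_charge(t_ret + h) - unit_vector_to_charge(t_ret - h)) / (2 * h)

e_guess = e_ret + delay * de_dt        # retarded value + rate of change times delay

print(np.linalg.norm(e_now - e_ret))    # error of the plain retarded value
print(np.linalg.norm(e_now - e_guess))  # much smaller error of the first-order guess
```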

Now, the third term has a 1/c² factor in front but, unlike the other two terms, this effect does not fall off with the square of the distance. So the formula below fully describes electromagnetic radiation, indeed, because it’s the only important term when we get ‘far enough away’, with ‘far enough’ meaning that the terms that fall off as 1/r′² have become so small that they’re no longer significant.

radiation formula 2

Of course, you’re smart, and so you’ll immediately note that, as r increases, that unit vector keeps wiggling but that effect will also diminish. You’re right. It does, but in a fairly complicated way. The acceleration of er′ has two components indeed. One is the transverse or tangential piece, because the end of er′ goes up and down, and the other is a radial piece because it stays on a sphere and so it changes direction. The radial piece is the smallest bit, and actually also varies as the inverse square of r when r is fairly large. The tangential piece, however, varies only inversely as the distance, so as 1/r. So, yes, the wigglings of er′ look smaller and smaller, inversely as the distance, but the tangential piece is and remains significant, because it does not vary as 1/r² but as 1/r only. That’s why you’ll usually see the law of radiation written in an even simpler way:

final law of radiation

This law reduces the whole effect to the component of the acceleration that is perpendicular to the line of sight only. It assumes the distance is huge as compared to the distance over which the charge is moving and, therefore, that r′ and r can be equated for all practical purposes. It also notes that the tangential piece is all that matters, and so it equates d²(er′)/dt² with ax/r. The whole thing is probably best illustrated as below: we have a generator driving charges up and down in G – so it’s an antenna really – and so we’ll measure a strong signal when putting the radiation detector D in position 1, but we’ll measure nothing in position 3. [The detector is, of course, another antenna, but with an amplifier for the signal.] But so here I am starting to talk about electromagnetic radiation once more, which was not what I wanted to do here, if only because Feynman does a much better job at that than I could ever do. :-)

radiator
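Still, just to make that last statement concrete, here’s a tiny sketch (Python, purely illustrative): only the component of the acceleration that is perpendicular to the line of sight contributes, so a detector placed broadside to the wiggling charges (position 1) picks up the full effect, while a detector on the axis of the motion (position 3) picks up nothing.

```python
import numpy as np

def radiated_field_magnitude(a, line_of_sight):
    """|E| ~ |a_perp|/r, dropping all constants: only the component of the
    acceleration perpendicular to the line of sight radiates."""
    r = np.linalg.norm(line_of_sight)
    n = line_of_sight / r
    a_perp = a - np.dot(a, n) * n
    return np.linalg.norm(a_perp) / r

a = np.array([0.0, 1.0, 0.0])  # charges being driven up and down along y

print(radiated_field_magnitude(a, np.array([100.0, 0.0, 0.0])))  # broadside: maximal
print(radiated_field_magnitude(a, np.array([0.0, 100.0, 0.0])))  # along the motion: zero
```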

Traveling fields: the wave equation and its solutions

We’ve climbed a big mountain over the past few weeks, post by post, :-) slowly gaining height, and carefully checking out the various routes to the top. But we are there now: we finally fully understand how Maxwell’s equations actually work. Let me jot them down once more:

Maxwell's equations

As for how real or unreal the E and B fields are, I gave you Feynman’s answer to it, so… Well… I can’t add to that. I should just note, or remind you, that we have a fully equivalent description of it all in terms of the electric and magnetic (vector) potential Φ and A, and so we can ask the same question about Φ and A. They explain real stuff, so they’re real in that sense. That’s what Feynman’s answer amounts to, and I am happy with it. :-)

What I want to do here is show how we can get from those equations to some kind of wave equation: an equation that describes how a field actually travels through space. So… Well… Let’s first look at that very particular wave function we used in the previous post to prove that electromagnetic waves propagate with speed c, i.e. the speed of light. The fields were very simple: the electric field had a y-component only, and the magnetic field a z-component only. Their magnitudes – i.e. their magnitude in the region the field had already reached as it travels outwards and fills the space – were given in terms of J, i.e. the surface current density going in the positive y-direction, and the geometry of the situation is illustrated below.


sheet of charge

The fields were, obviously, zero where the fields had not reached as they were traveling outwards. And, yes, I know that sounds stupid. But… Well… It’s just to make clear what we’re looking at here. :-)

We also showed what the wave would look like if we turned off its First Cause after some time T, i.e. if the moving sheet of charge stopped moving after time T. We’d have the following pulse traveling through space, a rectangular shape really:

wavefront

We can imagine more complicated shapes for the pulse, like the shape shown below. J goes from one unit to two units at time t = t1 and then to zero at t = t2. Now, the illustration on the right shows the electric field as a function of x at the time t shown by the arrow. We’ve seen this before when discussing waves: if the speed of travel of the wave is equal to c, then the distance covered after a time t is x = c·t, and the pattern is as shown below indeed: it mirrors what happened at the source x/c seconds ago. So we write:

equation 2


This idea of using the retarded time t’ = t − x/c in the argument of a wave function f – or, what amounts to the same, using x − c·t – is key to understanding wave functions. I’ve explained this in very simple language in a post for my kids and, if you don’t get this, I recommend you check it out. What we’re doing, basically, is converting something expressed in time units into something expressed in distance units, or vice versa, using the velocity of the wave as the scale factor, so time and distance are both expressed in the same unit, which may be seconds, or meters.

To see how it works, suppose we add some time Δt to the argument of our wave function f, so we’re looking at f[x−c(t+Δt)] now, instead of f(x−ct). Now, f[x−c(t+Δt)] = f(x−ct−cΔt), so we’ll get a different value for our function—obviously! But it’s easy to see that we can restore our wave function f to its former value by also adding some distance Δx = cΔt to the argument. Indeed, if we do so, we get f[x+Δx−c(t+Δt)] = f(x+cΔt–ct−cΔt) = f(x–ct). You’ll say: t − x/c is not the same as x–ct. It is and it isn’t: any function of x–ct is also a function of t − x/c, because we can write:


Here, I need to add something about the direction of travel. The pulse above travels in the positive x-direction, so that’s why we have x minus ct in the argument. For a wave traveling in the negative x-direction, we’ll have a wave function y = F(x+ct). In any case, I can’t dwell on this, so let me move on.
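Before we do move on, here’s a minimal numerical sketch of that retarded-argument idea (Python, with an arbitrary pulse shape that I just made up): the field at a distance x, now, is whatever the source was doing x/c seconds ago, and shifting both t and x by Δt and c·Δt changes nothing.

```python
import numpy as np

c = 3.0e8  # speed of light (m/s)

def source(t):
    # Arbitrary pulse shape at the source (x = 0), for illustration only.
    return np.exp(-((t - 1.0e-8) / 2.0e-9) ** 2)

def field(x, t):
    # A wave traveling in the positive x-direction: same shape, retarded argument.
    return source(t - x / c)

x, t = 6.0, 3.0e-8
print(field(x, t))            # what we see at x, now...
print(source(t - x / c))      # ...is what the source did x/c seconds ago (same number)

dt = 1.0e-9
print(field(x + c * dt, t + dt))  # adding dt to t and c*dt to x: same number again
```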

Now, Maxwell’s equations in free or empty space, where there are no charges or currents to interact with, reduce to:

Maxwell in free space

Now, how can we relate this set of complicated equations to a simple wave function? Let’s do the exercise for our simple Ey and Bz wave. Let’s start by writing out the first equation, i.e. ∇·E = 0, so we get:


Now, our wave does not vary in the y and z direction, so none of the components, including Ey and Ez, depend on y or z. It only varies in the x-direction, so ∂Ey/∂y and ∂Ez/∂z are zero. Note that the cross-derivatives ∂Ey/∂z and ∂Ez/∂y are also zero: we’re talking a plane wave here, the field varies only with x. However, because ∇·E = 0, ∂Ex/∂x must be zero and, hence, Ex must be zero.

Huh? What? How is that possible? You just said that our field does vary in the x-direction! And now you’re saying it doesn’t? Read carefully. I know it’s complicated business, but it all makes sense. Look at the function: we’re talking Ey, not Ex. Ey does vary as a function of x, but our field does not have an x-component, so Ex = 0. We have no cross-derivative ∂Ey/∂x in the divergence of E (i.e. in ∇·E = 0).

Huh? What? Let me put it differently. E has three components: Ex, Ey and Ez, and we have three space coordinates: x, y and z, so we have nine first-order derivatives. What I am saying is that all derivatives with respect to y and z are zero. That still leaves us with three derivatives: ∂Ex/∂x, ∂Ey/∂x, and ∂Ez/∂x. So… Because all derivatives with respect to y and z are zero, and because of the ∇·E = 0 equation, we know that ∂Ex/∂x must be zero. So, to make a long story short, I did not say anything about ∂Ey/∂x or ∂Ez/∂x. These may still be whatever they want to be, and they may vary in more or in less complicated ways. I’ll give an example of that in a moment.

Having said that, I do agree that I was a bit quick in writing that, because ∂Ex/∂x = 0, Ex must be zero too. Looking at the math only, Ex is not necessarily zero: it might be some non-zero constant. So… Yes. That’s a mathematical possibility. The static field from some charged condenser plate would be an example of a constant Ex field. However, the point is that we’re not looking at such static fields here: we’re talking dynamics here, and we’re looking at a particular type of wave: we’re talking a so-called plane wave. Now, the wave front of a plane wave is… Well… A plane. :-) So Ex is zero indeed. It’s a general result for plane waves: the electric field of a plane wave will always be at right angles to the direction of propagation.
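If you want to see that with a minimum of effort, here’s a short symbolic check (Python with sympy; the function f is left completely arbitrary): for a field that has a y-component only and varies only with x and t, the divergence is zero whatever the shape of f, so the first Maxwell equation is satisfied automatically by the field we’re assuming.

```python
import sympy as sp

x, y, z, t, c = sp.symbols('x y z t c', real=True)
f = sp.Function('f')

# A plane wave with a y-component only, varying only with x (and t):
Ex, Ey, Ez = sp.Integer(0), f(x - c * t), sp.Integer(0)

divergence = sp.diff(Ex, x) + sp.diff(Ey, y) + sp.diff(Ez, z)
print(divergence)  # 0: div E = 0 is satisfied, whatever the shape of f
```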

Hmm… I can feel your skepticism here. You’ll say I am arbitrarily restricting the field of analysis… Well… Yes. For the moment. It’s not an unreasonable restriction though. As I mentioned above, the field of a plane wave may still have components in both the y- and z-directions, as shown in the illustration below (for which the credit goes to Wikipedia), which visualizes the electric field of circularly polarized light. In any case, don’t worry too much about it. Let’s get back to the analysis. Just note we’re talking plane waves here. We’ll talk about non-plane waves, i.e. incoherent light waves, later. :-)

circular polarization

So we have plane waves and, therefore, a so-called transverse E field which we can resolve in two components: Ey and Ez. However, we wanted to study a very simple Ey field only. Why? Remember the objective of this lesson: it’s just to show how we go from Maxwell’s equations to the wave function, and so let’s keep the analysis as simple as we can for now: we can make it more general later. In fact, if we do the analysis now for non-zero Ey and zero Ez, we can do a similar analysis for non-zero Ez and zero Ey, and the general solution is going to be some superposition of two such fields, so we’ll have a non-zero Ey and Ez. Capito? :-) So let me write out Maxwell’s second equation, and use the results we got above, so I’ll incorporate the zero values for the derivatives with respect to y and z, and also the assumption that Ez is zero. So we get:

[By the way: note that, out of the nine derivatives, the curl involves only the (six) cross-derivatives. That’s linked to the neat separation between the curl and the divergence operator. Math is great! :-)]

Now, because of the flux rule (∇×E = –∂B/∂t), we can (and should) equate the three components of ∇×E above with the three components of –∂B/∂t, so we get:


[In case you wonder what it is that I am trying to do, patience, please! We’ll get where we want to get. Just hang in there and read on.] Now, ∂Bx/∂t = 0 and ∂By/∂t = 0 do not necessarily imply that Bx and By are zero: there might be some magnets and, hence, we may have some constant static field. However, that’s a matter of choosing a reference point or, more simply, assuming that empty space is effectively empty, and so we don’t have magnets lying around and so we assume that Bx and By are effectively zero. [Again, we can always throw more stuff in when our analysis is finished, but let’s keep it simple and stupid right now, especially because the Bx = By = 0 assumption is entirely in line with the Ex = Ez = 0 assumption.]

The equations above tell us what we know already: the E and B fields are at right angles to each other. However, note, once again, that this is a more general result for all plane electromagnetic waves, so it’s not only that very special caterpillar or butterfly field that we’re looking at here. [If you didn’t read my previous post, you won’t get the pun, but don’t worry about it. You need to understand the equations, not the silly jokes.]

OK. We’re almost there. Now we need Maxwell’s last equation. When we write it out, we get the following monstrously looking set of equations:


However, because of all of the equations involving zeroes above :-) only ∂Bz/∂x is not equal to zero, so the whole set reduces to one simple equation only:


Simplifying assumptions are great, aren’t they? :-) Having said that, it’s easy to be confused. You should watch out for the denominators: a ∂x and a ∂t are two very different things. So we have two equations now involving first-order derivatives:

  1. ∂Bz/∂t = −∂Ey/∂x
  2. c²·∂Bz/∂x = −∂Ey/∂t

So what? Patience, please!  :-) Let’s differentiate the first equation with respect to x and the second with respect to t. Why? Because… Well… You’ll see. Don’t complain. It’s simple. Just do it. We get:

  1. ∂[∂Bz/∂t]/∂x = −∂²Ey/∂x²
  2. ∂[c²·∂Bz/∂x]/∂t = −∂²Ey/∂t²

Now, the order of differentiation doesn’t matter, so the same mixed derivative ∂²Bz/∂x∂t appears on the left-hand side of both equations, and we can combine them: c²·(−∂²Ey/∂x²) = −∂²Ey/∂t². What we get is a differential equation of the second order that we’ve encountered already, when we were studying wave equations. In fact, it is the wave equation for one-dimensional waves:

In case you want to double-check, I did a few posts on this, but, if you don’t get this, well… I am sorry. You’ll need to do some homework. More in particular, you’ll need to do some homework on differential equations. The equation above is basically some constraint on the functional form of Ey. More in general, if we see an equation like:

∂²ψ/∂x² = (1/c²)·∂²ψ/∂t²

then the function ψ(x, t) must be some function of the form:

ψ(x, t) = f(x − c·t) + g(x + c·t)
So any function ψ like that will work. You can check it out by doing the necessary derivatives and plug them into the wave equation. [In case you wonder how you should go about this, Feynman actually does it for you in his Lecture on this topic, so you may want to check it there.]
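If you’d rather have the machine do the derivatives for you, here’s a short sympy sketch that plugs the superposition into the wave equation (f and g are left completely arbitrary):

```python
import sympy as sp

x, t, c = sp.symbols('x t c', real=True)
f, g = sp.Function('f'), sp.Function('g')

psi = f(x - c * t) + g(x + c * t)

# The one-dimensional wave equation: d2psi/dx2 - (1/c2)*d2psi/dt2 should vanish.
residual = sp.diff(psi, x, 2) - sp.diff(psi, t, 2) / c**2
print(sp.simplify(residual))  # 0: any such superposition is a solution
```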

In fact, the functions f(x − c·t) and g(x + c·t) themselves will also work as possible solutions. So we can drop one or the other, which amounts to saying that our ‘shape’ has to travel in some direction, rather than in both at the same time. :-) Indeed, from all of my explanations above, you know what f(x − c·t) represents: it’s a wave that travels in the positive x-direction. Now, it may be periodic, but it doesn’t have to be periodic. The f(x − c·t) function could represent any constant ‘shape’ that’s traveling in the positive x-direction at speed c. Likewise, the g(x + c·t) function could represent any constant ‘shape’ that’s traveling in the negative x-direction at speed c. As for super-imposing both…

Well… I suggest you check that post I wrote for my son, Vincent. It’s on the math of waves, but it doesn’t have derivatives and/or differential equations. It just explains how superimposition and all that works. It’s not very abstract, as it revolves around a vibrating guitar string. So, if you have trouble with all of the above, you may want to read that first. :-) The bottom line is that we can get any wavefunction we want by superimposing simple sinusoidals that are traveling in one or the other direction, and so that’s what the more general solution really says. Full stop. So that’s what we’re doing really: we add very simple waves to get more complicated waveforms. :-)

Now, I could leave it at this, but then it’s very easy to just go one step further, and that is to assume that Ez and, therefore, By are not zero. It’s just a matter of super-imposing solutions. Let me just give you the general solution. Just look at it for a while. If you understood all that I’ve said above, 20 seconds or so should be sufficient to say: “Yes, that makes sense. That’s the solution in two dimensions.” At least, I hope so! :-)

General solution two dimensions

OK. I should really stop now. But… Well… Now that we’ve got a general solution for all plane waves, why not be even bolder and think about what we could possibly say about three-dimensional waves? So then Ex and, therefore, Bx would not necessarily be zero either. After all, light can behave that way. In fact, light is likely to be non-polarized and, hence, Ex and, therefore, Bx are most probably not equal to zero!

Now, you may think the analysis is going to be terribly complicated. And you’re right. It would be if we’d stick to our analysis in terms of x, y and z coordinates. However, it turns out that the analysis in terms of vector equations is actually quite straightforward. I’ll just copy the Master here, so you can see His Greatness. :-)

waves in three dimensions

But what solution does an equation like (20.27) have? We can appreciate it’s actually three equations, i.e. one for each component, and so… Well… Hmm… What can we say about that? I’ll quote the Master on this too:

“How shall we find the general wave solution? The answer is that all the solutions of the three-dimensional wave equation can be represented as a superposition of the one-dimensional solutions we have already found. We obtained the equation for waves which move in the x-direction by supposing that the field did not depend on y and z. Obviously, there are other solutions in which the fields do not depend on x and z, representing waves going in the y-direction. Then there are solutions which do not depend on x and y, representing waves travelling in the z-direction. Or in general, since we have written our equations in vector form, the three-dimensional wave equation can have solutions which are plane waves moving in any direction at all. Again, since the equations are linear, we may have simultaneously as many plane waves as we wish, travelling in as many different directions. Thus the most general solution of the three-dimensional wave equation is a superposition of all sorts of plane waves moving in all sorts of directions.”

It’s the same thing once more: we add very simple waves to get more complicated waveforms. :-)

You must have fallen asleep by now or, else, be watching something else. Feynman must have felt the same. After explaining all of the nitty-gritty above, Feynman wakes up his students. He does so by appealing to their imagination:

“Try to imagine what the electric and magnetic fields look like at present in the space in this lecture room. First of all, there is a steady magnetic field; it comes from the currents in the interior of the earth—that is, the earth’s steady magnetic field. Then there are some irregular, nearly static electric fields produced perhaps by electric charges generated by friction as various people move about in their chairs and rub their coat sleeves against the chair arms. Then there are other magnetic fields produced by oscillating currents in the electrical wiring—fields which vary at a frequency of sixty cycles per second, in synchronism with the generator at Boulder Dam. But more interesting are the electric and magnetic fields varying at much higher frequencies. For instance, as light travels from window to floor and wall to wall, there are little wiggles of the electric and magnetic fields moving along at 186,000 miles per second. Then there are also infrared waves travelling from the warm foreheads to the cold blackboard. And we have forgotten the ultraviolet light, the x-rays, and the radiowaves travelling through the room.

Flying across the room are electromagnetic waves which carry music of a jazz band. There are waves modulated by a series of impulses representing pictures of events going on in other parts of the world, or of imaginary aspirins dissolving in imaginary stomachs. To demonstrate the reality of these waves it is only necessary to turn on electronic equipment that converts these waves into pictures and sounds.

If we go into further detail to analyze even the smallest wiggles, there are tiny electromagnetic waves that have come into the room from enormous distances. There are now tiny oscillations of the electric field, whose crests are separated by a distance of one foot, that have come from millions of miles away, transmitted to the earth from the Mariner II space craft which has just passed Venus. Its signals carry summaries of information it has picked up about the planets (information obtained from electromagnetic waves that travelled from the planet to the space craft).

There are very tiny wiggles of the electric and magnetic fields that are waves which originated billions of light years away—from galaxies in the remotest corners of the universe. That this is true has been found by “filling the room with wires”—by building antennas as large as this room. Such radiowaves have been detected from places in space beyond the range of the greatest optical telescopes. Even they, the optical telescopes, are simply gatherers of electromagnetic waves. What we call the stars are only inferences, inferences drawn from the only physical reality we have yet gotten from them—from a careful study of the unendingly complex undulations of the electric and magnetic fields reaching us on earth.

There is, of course, more: the fields produced by lightning miles away, the fields of the charged cosmic ray particles as they zip through the room, and more, and more. What a complicated thing is the electric field in the space around you! Yet it always satisfies the three-dimensional wave equation.”

So… Well… That’s it for today, folks. :-) We have some more gymnastics to do, still… But we’re really there. Or here, I should say: on top of the peak. What a view we have here! Isn’t it beautiful? It took us quite some effort to get on top of this thing, and we’re still trying to catch our breath as we struggle with what we’ve learned so far, but it’s really worthwhile, isn’t it? :-)

Maxwell’s equations and the speed of light

We know how electromagnetic waves travel through space: they do so because of the mechanism described in Maxwell’s equations: a changing magnetic field causes a (changing) electric field, and a changing electric field causes a (changing) magnetic field, as illustrated below.

Maxwell interaction

So we need some First Cause to get it all started :-) i.e. some current, i.e. some moving charge, but then the electromagnetic wave travels, all by itself, through empty space, completely detached from the cause. You know that by now – indeed, you’ve heard this a thousand times before – but, if you’re reading this, you want to know how it works exactly. :-)

In my post on the Lorentz gauge, I included a few links to Feynman’s Lectures that explain the nitty-gritty of this mechanism from various angles. However, they’re pretty horrendous to read, and so I just want to summarize them a bit—if only for myself, so as to remind myself what’s important and not. In this post, I’ll focus on the speed of light: why do electromagnetic waves – light – travel at the speed of light?

You’ll immediately say: that’s a nonsensical question. It’s light, so it travels at the speed of light. Sure, smart-arse! Let me be more precise: how can we relate the speed of light to Maxwell’s equations? That is the question here. Let’s go for it.

Feynman deals with the matter of the speed of an electromagnetic wave, and the speed of light, in a rather complicated exposé on the fields from some infinite sheet of charge that is suddenly set into motion, parallel to itself, as shown below. The situation looks – and actually is – very simple, but the math is rather messy because of the rather exotic assumptions: infinite sheets and infinite acceleration are not easy to deal with. :-) But so the whole point of the exposé is just to prove that the speed of propagation (v) of the electric and magnetic fields is equal to the speed of light (c), and it does a marvelous job at that. So let’s focus on that here only. So what I am saying is that I am going to leave out most of the nitty-gritty and just try to get to that v = c result as fast as I possibly can. So, fasten your seat belt, please.

sheet of charge

Most of the nitty-gritty in Feynman’s exposé is about how to determine the direction and magnitude of the electric and magnetic fields, i.e. E and B. Now, when the nitty-gritty business is finished, the grand conclusion is that both E and B travel out in both the positive as well as the negative x-direction at some speed v and sort of ‘fill’ the entire space as they do. Now, the region they are filling extends infinitely far in both the y- and z-direction but, because they travel along the x-axis, there are no fields (yet) in the region beyond x = ± v·t (t = 0 is the moment when the sheet started moving, and it moves in the positive y-direction). As you can see, the sheet of charge fills the yz-plane, and the assumption is that its speed goes from zero to u instantaneously, or very very quickly at least. So the E and B fields move out like a tidal wave, as illustrated below, and thereby ‘fill’ the space indeed, as they move out.

tidal wave

The magnitude of E and B is constant, but it’s not the same constant, and part of the exercise here is to determine the relationship between the two constants. As for their direction, you can see it in the first illustration: B points in the negative z-direction for x > 0 and in the positive z-direction for x < 0, while E‘s direction is opposite to u‘s direction everywhere, so E points in the negative y-direction. As said, you should just take my word for it, because the nitty-gritty on this – which we do not want to deal with here – is all in Feynman and so I don’t want to copy that.

The crux of the argument revolves around what happens at the wavefront itself, as it travels out. Feynman relates flux and circulation there. It’s the typical thing to do: it’s at the wavefront itself that the fields change: before they were zero, and now they are equal to that constant. The fields do not change anywhere else, so there’s no changing flux or circulation business to be analyzed anywhere else. So we define two loops at the wavefront itself: Γ1 and Γ2. They are normal to each other (cf. the top and side view of the situation below), because the E and B fields are normal to each other. And so then we use Maxwell’s equations to check out what happens with the flux and circulation there and conclude what needs to be concluded. :-)

top view side view

We start with rectangle Γ2. So one side is in the region where there are fields, and one side is in the region where the fields haven’t reached yet. There is some magnetic flux through this loop, and it is changing, so there is an emf around it, i.e. some circulation of E. The flux changes because the area in which B exists increases at speed v. Now, the time rate of change of the flux is, obviously, B times the rate of change of the area in which B exists: that area increases by L·v·Δt over some small time interval Δt, with L the width of the rectangle, so the time rate of change of the flux is (B·L·v·Δt)/Δt = B·L·v. Now, according to Faraday’s Law (see my previous post), this will be equal to minus the line integral of E around Γ2, which is E·L. So E·L = B·L·v and, hence, we find: E = v·B.

Interesting! To satisfy Faraday’s equation (which is just one of Maxwell’s equations in integral rather than in differential form), E must equal B times v, with v the speed of propagation of our ‘tidal’ wave. Now let’s look at Γ1. There we should apply:

Integral

Now the line integral is just B·L, and the right-hand side is E·L·v, so, not forgetting that c² in front—i.e. the square of the speed of light, as you know!—we get: c²·B = E·v, or E = (c²/v)·B.

Now, the E = v·B and E = (c2/v)·B equations must both apply (we’re talking one wave and one and the same phenomenon) and, obviously, that’s only possible if v = c2/v, i.e. if v = c. So the wavefront must travel at the speed of light! Waw ! That’s fast. :-) Yes. […] Jokes aside, that’s the result we wanted here: we just proved that the speed of travel of an electromagnetic wave must be equal to the speed of light.
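In case you want to see that last step done for you, here’s a two-line symbolic check (Python with sympy): feed in the two conditions we just derived and solve.

```python
import sympy as sp

E, B, v, c = sp.symbols('E B v c', positive=True)

eq1 = sp.Eq(E, v * B)             # from Faraday's law around Gamma_2
eq2 = sp.Eq(E, (c**2 / v) * B)    # from the 'other' flux rule around Gamma_1

print(sp.solve([eq1, eq2], [v, E]))  # v = c and E = c*B
```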

As an added bonus, we also showed the mechanism of travel. It’s obvious from the equations we used to prove the result: it works through the derivatives of the fields with respect to time, i.e. ∂E/∂t and ∂B/∂t.

Done! Great! Enjoy the view!

Well… Yes and no. If you’re smart, you’ll say: we got this result because of the c2 factor in that equation, so Maxwell had already put it in, so to speak. Waw! You really are a smart-arse, aren’t you? :-)

The thing is… Well… The answer is: no. Maxwell did not put it in. Well… Yes and no. Let me explain. Maxwell’s first equation was the electric flux law ∇·E = ρ/ε₀: the flux of E through a closed surface is proportional to the charge inside. So that’s basically another way of writing Coulomb’s Law, and ε₀ was just some constant in it, the electric constant. So it’s a constant of proportionality that depends on the unit in which we measure electric charge. The only reason that it’s there is to make the units come out alright, so if we’d measure charge not in coulomb (C) but in a unit equal to 1 C/ε₀, it would disappear. If we’d do that, our new unit would be equivalent to the charge of some 700,000 protons. You can figure that magical number yourself by checking the values of the proton charge and ε₀. :-)

OK. And then Ampère came up with the exact laws for magnetism, and they involved current and some other constant of proportionality, and Maxwell formalized that by writing ∇×B = μ₀·j, with μ₀ the magnetic constant. It’s not a flux law but a circulation law: currents cause circulation of B. We get the flux rule from it by integrating it. But currents are moving charges, and so Maxwell knew magnetism was related to the same thing: electric charge. So Maxwell knew the two constants had to be related. In fact, when putting the full set of equations together – there are four, as you know – Maxwell figured out that μ₀ times ε₀ would have to be equal to the reciprocal of c², with c the speed of propagation of the wave. So Maxwell knew that, whatever the unit of charge, we’d get two constants of proportionality, an electric and a magnetic constant, and that μ₀·ε₀ would be equal to 1/c². However, while he knew that, at the time, light and electromagnetism were considered to be separate phenomena, and so Maxwell did not say that c was the speed of light: the only thing his equations told him was that c is the speed of propagation of that ‘electromagnetic’ wave that came out of his equations.

The rest is history. In 1856, the great Wilhelm Eduard Weber – you’ve seen his name before, haven’t you? – did a whole bunch of experiments which measured the electric constant rather precisely, and Maxwell jumped on it and calculated all the rest, i.e. μ₀, and so then he took the reciprocal of the square root of μ₀·ε₀ and – Bang! – he had c, the speed of propagation of the electromagnetic wave he was thinking of. Now, c was some value of the order of 3×10⁸ m/s, and so that happened to be the same as the speed of light, which suggested that Maxwell’s c and the speed of light were actually one and the same thing!

Now, I am a smart-arse too :-) and, hence, when I first heard this story, I actually wondered how Maxwell could possibly know the speed of light at the time: Maxwell died many years before the Michelson-Morley experiment unequivocally established the value of the speed of light. [In case you wonder: the Michelson-Morley experiment was done in 1887. So I checked it. The fact is that the Michelson-Morley experiment concluded that the speed of light was an absolute value and that, in the process of doing so, they got a rather precise value for it, but the value of c itself had already been established, more or less, that is, by a Danish astronomer, Ole Römer, in 1676! He did so by carefully observing the timing of the repeating eclipses of Io, one of Jupiter’s moons. Newton mentioned his results in his Principia, which he wrote in 1687, duly noting that it takes about seven to eight minutes for light to travel from the Sun to the Earth. Done!] The whole story is fascinating, really, so you should check it out yourself. :-)

In any case, to make a long story short, Maxwell was puzzled by this mysterious coincidence, but he was bold enough to immediately point to the right conclusion, tentatively at least, and so he told the Cambridge Philosophical Society, in the very same year, i.e. 1856, that “we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena.”

So… Well… Maxwell still suggests light needs some medium here, so the ‘medium’ is a reference to the infamous aether theory, but that’s not the point: what he says here is what we all take for granted now: light is an electromagnetic wave. So now we know there’s absolutely no reason whatsoever to avoid the ‘inference’, but… Well… 160 years ago, it was quite a big deal to suggest something like that. :-)

So that’s the full story. I hope you liked it. Don’t underestimate what you just did: understanding an argument like this is like “climbing a great peak”, as Feynman puts it. So it is “a great moment” indeed. :-) The only thing left is, perhaps, to explain the ‘other’ flux rules I used above. Indeed, you know Faraday’s Law:


But that other one? Well… As I explained in my previous post, Faraday’s Law is the integral form of Maxwell’s second equation: −∂B/∂t = ∇×E. The ‘other’ flux rule above – so that’s the one with the c² in front and without a minus sign – is the integral form of Maxwell’s fourth equation: c²·∇×B = j/ε₀ + ∂E/∂t, taking into account that we’re talking a wave traveling in free space, so there are no charges and currents (it’s just a wave in empty space—whatever that means) and, hence, the Maxwell equation reduces to c²·∇×B = ∂E/∂t. Now, I could take you through the same gymnastics as I did in my previous post but, if I were you, I’d just apply the general principle that “the same equations must yield the same solutions” and so I’d just switch E for B and vice versa in Faraday’s equation. :-)

So we’re done… Well… Perhaps one more thing. We’ve got these flux rules above telling us that the electromagnetic wave will travel all by itself, through empty space, completely detached from its First Cause. But… […] Well… Again you may think there’s some trick here. In other words, you may think the wavefront has to remain connected to the First Cause somehow, just like the whip below is connected to some person whipping it. :-)


There’s no such connection. The whip is not needed. :-) If we’d switch off the First Cause after some time T, so our moving sheet stops moving, then we’d have the pulse below traveling through empty space. As Feynman puts it: “The fields have taken off: they are freely propagating through space, no longer connected in any way with the source. The caterpillar has turned into a butterfly!”


Now, the last question is always the same: what are those fields? What’s their reality? Here, I should refer you to one of the most delightful sections in Feynman’s Lectures. It’s on the scientific imagination. I’ll just quote the introduction to it, but I warmly recommend you go and check it out for yourself: it has no formulas whatsoever, and so you should understand all of it without any problem at all. :-)

“I have asked you to imagine these electric and magnetic fields. What do you do? Do you know how? How do I imagine the electric and magnetic field? What do I actually see? What are the demands of scientific imagination? Is it any different from trying to imagine that the room is full of invisible angels? No, it is not like imagining invisible angels. It requires a much higher degree of imagination to understand the electromagnetic field than to understand invisible angels. Why? Because to make invisible angels understandable, all I have to do is to alter their properties a little bit—I make them slightly visible, and then I can see the shapes of their wings, and bodies, and halos. Once I succeed in imagining a visible angel, the abstraction required—which is to take almost invisible angels and imagine them completely invisible—is relatively easy. So you say, “Professor, please give me an approximate description of the electromagnetic waves, even though it may be slightly inaccurate, so that I too can see them as well as I can see almost invisible angels. Then I will modify the picture to the necessary abstraction.”

I’m sorry I can’t do that for you. I don’t know how. I have no picture of this electromagnetic field that is in any sense accurate. I have known about the electromagnetic field a long time—I was in the same position 25 years ago that you are now, and I have had 25 years more of experience thinking about these wiggling waves. When I start describing the magnetic field moving through space, I speak of the E and B fields and wave my arms and you may imagine that I can see them. I’ll tell you what I see. I see some kind of vague shadowy, wiggling lines—here and there is an E and a B written on them somehow, and perhaps some of the lines have arrows on them—an arrow here or there which disappears when I look too closely at it. When I talk about the fields swishing through space, I have a terrible confusion between the symbols I use to describe the objects and the objects themselves. I cannot really make a picture that is even nearly like the true waves. So if you have some difficulty in making such a picture, you should not be worried that your difficulty is unusual.

Our science makes terrific demands on the imagination. The degree of imagination that is required is much more extreme than that required for some of the ancient ideas. The modern ideas are much harder to imagine. We use a lot of tools, though. We use mathematical equations and rules, and make a lot of pictures. What I realize now is that when I talk about the electromagnetic field in space, I see some kind of a superposition of all of the diagrams which I’ve ever seen drawn about them. I don’t see little bundles of field lines running about because it worries me that if I ran at a different speed the bundles would disappear, I don’t even always see the electric and magnetic fields because sometimes I think I should have made a picture with the vector potential and the scalar potential, for those were perhaps the more physically significant things that were wiggling.

Perhaps the only hope, you say, is to take a mathematical view. Now what is a mathematical view? From a mathematical view, there is an electric field vector and a magnetic field vector at every point in space; that is, there are six numbers associated with every point. Can you imagine six numbers associated with each point in space? That’s too hard. Can you imagine even one number associated with every point? I cannot! I can imagine such a thing as the temperature at every point in space. That seems to be understandable. There is a hotness and coldness that varies from place to place. But I honestly do not understand the idea of a number at every point.

So perhaps we should put the question: Can we represent the electric field by something more like a temperature, say like the displacement of a piece of jello? Suppose that we were to begin by imagining that the world was filled with thin jello and that the fields represented some distortion—say a stretching or twisting—of the jello. Then we could visualize the field. After we “see” what it is like we could abstract the jello away. For many years that’s what people tried to do. Maxwell, Ampère, Faraday, and others tried to understand electromagnetism this way. (Sometimes they called the abstract jello “ether.”) But it turned out that the attempt to imagine the electromagnetic field in that way was really standing in the way of progress. We are unfortunately limited to abstractions, to using instruments to detect the field, to using mathematical symbols to describe the field, etc. But nevertheless, in some sense the fields are real, because after we are all finished fiddling around with mathematical equations—with or without making pictures and drawings or trying to visualize the thing—we can still make the instruments detect the signals from Mariner II and find out about galaxies a billion miles away, and so on.

The whole question of imagination in science is often misunderstood by people in other disciplines. They try to test our imagination in the following way. They say, “Here is a picture of some people in a situation. What do you imagine will happen next?” When we say, “I can’t imagine,” they may think we have a weak imagination. They overlook the fact that whatever we are allowed to imagine in science must be consistent with everything else we know: that the electric fields and the waves we talk about are not just some happy thoughts which we are free to make as we wish, but ideas which must be consistent with all the laws of physics we know. We can’t allow ourselves to seriously imagine things which are obviously in contradiction to the known laws of nature. And so our kind of imagination is quite a difficult game. One has to have the imagination to think of something that has never been seen before, never been heard of before. At the same time the thoughts are restricted in a strait jacket, so to speak, limited by the conditions that come from our knowledge of the way nature really is. The problem of creating something which is new, but which is consistent with everything which has been seen before, is one of extreme difficulty.”

Isn’t that great? I mean: Feynman, one of the greatest physicists of all time, didn’t write what he wrote above when he was an undergrad student or so. No. He did so in 1964, when he was 45 years old, at the height of his scientific career! And it gets better, because Feynman then starts talking about beauty. What is beauty in science? Well… Just click and check what Feynman thinks about it. :-)

Oh… Last thing. So what is the magnitude of the E and B field? Well… You can work it out yourself, but I’ll give you the answer. The geometry of the situation makes it clear that the electric field has a y-component only, and the magnetic field a z-component only. Their magnitudes are given in terms of J, i.e. the surface current density going in the positive y-direction:


An introduction to electric circuits

In my previous post, I introduced electric motors, generators and transformers. They all work because of Faraday’s flux rule: a changing magnetic flux will produce some circulation of the electric field. The formula for the flux rule is given below:


It is a wonderful thing, really, but not easy to grasp intuitively. It’s one of these equations where I should quote Feynman’s introduction to electromagnetism: “The laws of Newton were very simple to write down, but they had a lot of complicated consequences and it took us a long time to learn about them all. The laws of electromagnetism are not nearly as simple to write down, which means that the consequences are going to be more elaborate and it will take us quite a lot of time to figure them all out.”

Now, among Maxwell’s Laws, this is surely the most complicated one! However, that shouldn’t deter us. :-) Recalling Stokes’ Theorem helps to appreciate what the integral on the left-hand side represents:

Stokes theorem

We’ve got a line integral around some closed loop Γ on the left and, on the right, we’ve got a surface integral over some surface S whose boundary is Γ. The illustration below depicts the geometry of the situation. You know what it all means. If not, I am afraid I have to send you back to square one, i.e. my posts on vector analysis. Yep. Sorry. Can’t keep copying stuff and make my posts longer and longer. :-)

Diagram stokes

To understand the flux rule, you should imagine that the loop Γ is some loop of electric wire, and then you just replace C by E, the electric field vector. The circulation of E, which is caused by the change in magnetic flux, is referred to as the electromotive force (emf), and it’s the tangential force (E·ds) per unit charge in the wire integrated over its entire length around the loop, which is denoted by Γ here, and which encloses a surface S.

Now, you can go from the line integral to the surface integral by noting Maxwell’s Law: −∂B/∂t = ∇×E. In fact, it’s the same flux rule really, but in differential form. As for (∇×E)n, i.e. the component of ∇×E that is normal to the surface, you know that the dot product of any vector with the normal unit vector yields its normal component. In any case, if you’re reading this, you should already be acquainted with all of this. Let’s explore the concept of the electromotive force, and then apply it to our first electric circuit. :-)

Indeed, it’s now time for a small series on circuits, and so we’ll start right here and right now, but… Well… First things first. :-)

The electromotive force: concept and units

The term ‘force’ in ‘electromotive force’ is actually somewhat misleading. There is a force involved, of course, but the emf is not a force. The emf is expressed in volts. That’s consistent with its definition as the circulation of E: a force times a distance amounts to work, or energy (one joule is one newton·meter), and because E is the force on a unit charge, the circulation of E is expressed in joule per coulomb, so that’s a voltage: 1 volt = 1 joule/coulomb. Hence, on the left-hand side of Faraday’s equation, we don’t have any dimension of time: it’s energy per unit charge, so it’s expressed in joule per coulomb. Full stop.

On the right-hand side, however, we have the time rate of change of the magnetic flux through the surface S. The magnetic flux is a surface integral, and so it’s a quantity expressed in [B]·m², with [B] the measurement unit for the magnetic field strength. The time rate of change of the flux is then, of course, expressed in [B]·m² per second, i.e. [B]·m²/s. Now what is the unit for the magnetic field strength B, which we denoted by [B]?

Well… [B] is a bit of a special unit: it is not measured as some force per unit charge, i.e. in newton per coulomb, like the electric field strength E. No. [B] is measured in (N/C)/(m/s). Why? Because the magnetic force is not F = qE but F = qv×B. Hence, so as to make the units come out alright, we need to express B in (N·s)/(C·m), which is a unit known as the tesla (1 T = N·s/C·m), so as to honor the Serbian-American genius Nikola Tesla. [I know it’s a bit of short and dumb answer, but the complete answer is quite complicated: it’s got to do with the relativity of the magnetic force, which I explained in another post: both the v in F = qv×B equation as well as the m/s unit in [B] should make you think: whose velocity? In which reference frame? But that’s something I can’t summarize in two lines, so just click the link if you want to know more. I need to get back to the lesson.]

Now that we’re talking units, I should note that the unit of flux also got a special name, the weber, so as to honor one of Germany’s most famous physicists, Wilhelm Eduard Weber: as you might expect, 1 Wb = 1 T·m2. But don’t worry about these strange names. Besides the units you know, like the joule and the newton, I’ll only use the volt, which got its name to honor some other physicist, Alessandro Volta, the inventor of the electrical battery. Or… Well… I might mention the watt as well at some point… :-)

So how does it work? On one side, we have something expressed per second – so that’s per unit time – and on the other we have something that’s expressed per coulomb – so that’s per unit charge. The link between the two is the power, so that’s the time rate of doing work. It’s expressed in joule per second. So… Well… Yes. Here we go: in honor of yet another genius, James Watt, the unit of power got its own special name too: the watt. :-) In the argument below, I’ll show that the power that is being generated by a generator, and that is being consumed in the circuit (through resistive heating, for example, or whatever else taking energy out of the circuit) is equal to the emf times the current. For the moment, however, I’ll just assume you believe me. :-)

We need to look at the whole circuit now, indeed, in which our little generator (i.e. our loop or coil of wire) is just one of the circuit elements. The units come out alright: the power = emf·current product is expressed in volt·coulomb/second = (joule/coulomb)·(coulomb/second) = joule/second. So, yes, it looks OK. But what’s going on really? How does it work, literally?

A short digression: on Ohm’s Law and electric power

Well… Let me first recall the basic concepts involved which, believe it or not, are probably easiest to explain by briefly recalling Ohm’s Law, which you’ll surely remember from your high-school physics classes. It’s quite simple really: we have some resistance in a little circuit, so that’s something that resists the passage of electric current, and then we also have a voltage source. Now, Ohm’s Law tells us that the ratio of (i) the voltage V across the resistance (so that’s between the two points marked as + and −) and (ii) the current I will be some constant. It’s the same as saying that V and I are directly proportional to each other. The constant of proportionality is referred to as the resistance itself and, while it’s often looked at as a property of the circuit itself, we may embody it in a circuit element itself: a resistor, as shown below.


So we write R = V/I, and the brief presentation above should remind you of the capacity of a capacitor, which was just another constant of proportionality. Indeed, instead of feeding a resistor (so all energy gets dissipated away), we could charge a capacitor with a voltage source, so that’s an energy storage device, and then we find that the ratio between (i) the charge on the capacitor and (ii) the voltage across the capacitor was a constant too, which we defined as the capacity of the capacitor, and so we wrote C = Q/V. So, yes, another constant of proportionality (there are many in electricity!).

In any case, the point is: to increase the current in the circuit above, you need to increase the voltage, but increasing both amounts to increasing the power that’s being consumed in the circuit, because the power is voltage times current indeed, so P = V·I (or v·i, if I use the small letters that are used in the two animations below). For example, if we’d want to double the current, we’d need to double the voltage, and so we’re quadrupling the power: (2·V)·(2·I) = 2²·V·I. So we have a square law for the power, which we get by substituting V for R·I or by substituting I for V/R, so we can write the power P as P = V²/R = I²·R. This square law says exactly the same: if you want to double the voltage or the current, you’ll actually have to double both and, hence, you’ll quadruple the power. Now let’s look at the animations below (for which credit must go to Wikipedia).

Electric_power_source_animation_1 Electric_load_animation_2

They show how energy is being used in an electric circuit in  terms of power. [Note that the little moving pluses are in line with the convention that a current is defined as the movement of positive charges, so we write I = dQ/dt instead of I = −dQ/dt. That also explains the direction of the field line E, which has been added to show that the power source effectively moves charges against the field and, hence, against the electric force.] What we have here is that, on one side of the circuit, some generator or voltage source will create an emf pushing the charges, and then some load will consume their energy, so they lose their push. So power, i.e. energy per unit time, is supplied, and is then consumed.
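Here’s the square law in numbers – a minimal Python sketch with made-up values, just to fix ideas:

```python
R = 10.0  # resistance in ohm (made-up value)

def power(V, R):
    I = V / R     # Ohm's law: the current follows from the voltage
    return V * I  # P = V*I = V**2/R

print(power(12.0, R))  # 14.4 W
print(power(24.0, R))  # 57.6 W: double the voltage and you quadruple the power
```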

Back to the emf…

Now, I mentioned that the emf is a ratio of two terms: the numerator is expressed in joule, and the denominator is expressed in coulomb. So you might think we’ve got some trade-off here—something like: we could double the energy as long as we also double the charge, and we’d still get the same emf. Or vice versa: we could, perhaps, halve the number of charges and load them with only half the energy. One thing is for sure: we can’t just do that at will, as I’ll show in a moment.

Hmm… Well… Let’s have a look at this line of reasoning by writing it down more formally.

  1. The time rate of change of the magnetic flux generates some emf, which we can and should think of as a property of the loop or the coil of wire in which it is being generated. Indeed, the magnetic flux through it depends on its orientation, its size, and its shape. So it’s really very much like the capacity of a capacitor or the resistance of a conductor. So we write: emf = Δ(flux)/Δt. [In fact, the induced emf tries to oppose the change in flux, so I should add the minus sign, but you get the idea.]
  2. For a uniform magnetic field, the flux is equal to the field strength B times the surface area S. [To be precise, we need to take the normal component of B, so the flux is B·S = B·S·cosθ.]  So the flux can change because of a change in B or because of a change in S, or because of both.
  3. The emf = Δ(flux)/Δt formula makes it clear that a very slow change in flux (i.e. the same Δ(flux) over a much larger Δt) will generate little emf. In contrast, a very fast change (i.e. the same Δ(flux) over a much smaller Δt) will produce a lot of emf. So, in that sense, emf is not like the capacity or resistance, because it’s variable: it depends on Δ(flux), as well as on Δt. However, you should still think of it as a property of the loop or the ‘generator’ we’re talking about here.
  4. Now, the power that is being produced or consumed in the circuit in which our ‘generator’ is just one of the elements, is equal to the emf times the current. The power is the time rate of change of the energy, and the energy is the work that’s being done in the circuit (which I’ll denote by ΔU), so we write: emf·current = ΔU/Δt.
  5. Now, the current is equal to the time rate of change of the charge, so I = ΔQ/Δt. Hence, the emf is equal to emf = (ΔU/Δt)/I = (ΔU/Δt)/(ΔQ/Δt) = ΔU/ΔQ. From this, it follows that: emf = Δ(flux)/Δt = ΔU/ΔQ, which we can re-write as:

Δ(flux) = ΔU·Δt/ΔQ

What this says is the following. For a given amount of change in the magnetic flux (so we treat Δ(flux) as constant in the equation above), we could do more work on the same charge (ΔQ) – we could double ΔU by moving the same charge over a potential difference that’s twice as large, for example – but then Δt must be cut in half. So the same change in magnetic flux can do twice as much work if the change happens in half of the time.

Now, does that mean the current is being doubled? We’re talking the same ΔQ and half the Δt, so… Well? No. The Δt here measures the time of the flux change, so it’s not the dt in I = dQ/dt. For the current to change, we’d need to move the same charge faster, i.e. over a larger distance over the same time. We didn’t say we’d do that above: we only said we’d move the charge across a larger potential difference: we didn’t say we’d change the distance over which they are moved.

OK. That makes sense. But we’re not quite finished. Let’s first try something else, to then come back to where we are right now via some other way. :-) Can we change ΔQ? Here we need to look at the physics behind it. What’s happening really is that the change in magnetic flux causes an induced current which consists of the free electrons in the Γ loop. So we have electrons moving in and out of our loop, and through the whole circuit really, but there are only so many free electrons per unit length in the wire. However, if we effectively double the voltage, then their speed will increase proportionally, so we’ll have more of them passing through per second. Now that effect surely impacts the current. It’s what we wrote above: all other things being the same, including the resistance, we’ll also double the current as we double the voltage.

So where is that effect in the flux rule? The answer is: it isn’t there. The circulation of E around the loop is what it is: it’s some energy per unit charge. Not per unit time. So our flux rule gives us a voltage, which tells us that we’re going to have some push on the charges in the wire, but it doesn’t tell us anything about the current. To know the current, we must know the velocity of the moving charges, which we can calculate from the push if we also get some other information (such as the resistance involved, for instance), but so it’s not there in the formula of the flux rule. You’ll protest: there is a Δt on the right-hand side! Yes, that’s true. But it’s not the Δt in the v = Δs/Δt equation for our charges. Full stop.

Hmm… I may have lost you by now. If not, please continue reading. Let me drive the point home by asking another question. Think about the following: we can re-write that Δ(flux) = ΔU·Δt/ΔQ equation above as Δ(flux) = (ΔU/ΔQ)·Δt equation. Now, does that imply that, with the same change in flux, i.e. the same Δ(flux), and, importantly, for the same Δt, we could double both ΔU as well as ΔQ? I mean: (2·ΔU)/(2·ΔQ) = ΔU/ΔQ and so the equation holds, mathematically that is. […] Think about it.

You should shake your head now, and rightly so, because, while the Δ(flux) = (ΔU/ΔQ)·Δt equation suggests that would be possible, it’s totally counter-intuitive. We’re changing nothing in the real world (what happens there is the same change of flux in the same amount of time), but we’d get twice the energy and twice the charge?! Of course, we could also put a 3 there, or 20,000, or minus a million. So who decides on what we get? You get the point: it is, indeed, not possible. Again, what we can change is the speed of the free electrons, but not their number, and to change their speed, you’ll need to do more work, and so the reality is that we’re always looking at the same ΔQ, so if we want a larger ΔU, then we’ll need a larger change in flux, or a shorter Δt during which that change in flux is happening.

So what can we do? We can change the physics of the situation. We can do so in many ways, like we could change the length of the loop, or its shape. One particularly interesting thing to do would be to increase the number of loops, so instead of one loop, we could have some coil with, say, N turns, so that’s N of these Γ loops. So what happens then? In fact, contrary to what you might expect, the ΔQ still doesn’t change as it moves into the coil and then from loop to loop to get out and then through the circuit: it’s still the same ΔQ. But the work that can be done by this current becomes much larger. In fact, two loops give us twice the emf of one loop, and N loops give us N times the emf of one loop. So then we can make the free electrons move faster, so they cover more distance in the same time (and you know work is force times distance), or we can move them across a larger potential difference over the same distance (and so then we move them against a larger force, so it also implies we’re doing more work). The first case is a larger current, while the second is a larger voltage. So what is it going to be?

Think about the physics of the situation once more: to make the charges move faster, you’ll need a larger force, so you’ll have a larger potential difference, i.e. a larger voltage. As for what happens to the current, I’ll explain that below. Before I do, let me talk some more basics.

In the exposé below, we’ll talk about power again, and also about load. What is load? Think about what it is in real life: when buying a battery for a big car, we’ll want a big battery, so we don’t look at the voltage only (they’re all 12-volt anyway). We’ll look at how many ampères it can deliver, and for how long. The starter motor in the car, for example, can suck up like 200 A, but for a very short time only, of course, as the car engine itself should kick in. So that’s why the capacity of batteries is expressed in ampère-hours.

Now, how do we get such large currents, such large loads? Well… Use Ohm’s Law: to get 200 A at 12 V, the resistance of the starter motor will have to be as low as 0.06 ohm. So large currents are associated with very low resistance. Think practical: a 240-volt, 60-watt light bulb will suck in 0.25 A and, hence, its internal resistance is about 960 Ω. Also think of what goes on in your house: we’ve got a lot of resistors in parallel consuming power there. The formula for the total resistance is 1/Rtotal = 1/R1 + 1/R2 + 1/R3 + … So more appliances means less resistance, and so that’s what draws in the larger current.
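
The numbers above are easy to check with a few lines of Python (the resistor values in the parallel example are just made-up illustrations):

```python
# Ohm's Law: R = V/I
print(12 / 200)                  # starter motor: 0.06 ohm
print(240 / 0.25)                # a 240 V, 60 W bulb draws 60/240 = 0.25 A, so R = 960 ohm

# Resistors in parallel: 1/R_total = 1/R1 + 1/R2 + 1/R3 + ...
def parallel(*resistances):
    return 1.0 / sum(1.0 / R for R in resistances)

print(parallel(960, 960, 480))   # 240 ohm: more appliances, less resistance, more current
```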

The point is: when looking at circuits, emf is one thing, but energy and power, i.e. the work done per second, are all that matters really. And so then we’re talking currents, but our flux rule does not say how much current our generator will produce: that depends on the load. OK. We really need to get back to the lesson now.

A circuit with an AC generator

The situation is depicted below. We’ve got a coil of wire of, let’s say, N turns of wire, and we’ll use it to generate an alternating current (AC) in a circuit.

[Illustration: an AC generator connected to a circuit]

The coil is really like the loop of wire in that primitive electric motor I introduced in my previous post, but so now we use the motor as a generator. To simplify the analysis, we assume we’ll rotate our coil of wire in a uniform magnetic field, as shown by the field lines B.


Now, our coil is not a closed loop, of course: the two ends of the coil are brought to external connections through some kind of sliding contacts, but that doesn’t change the flux rule: a changing magnetic flux will produce some emf and, therefore, some current in the coil.

OK. That’s clear enough. Let’s see what’s happening really. When we rotate our coil of wire, we change the magnetic flux through it. If S is the area of the coil, and θ is the angle between the magnetic field and the normal to the plane of the coil, then the flux through the coil will be equal to B·S·cosθ. Now, if we rotate the coil at a uniform angular velocity ω, then θ varies with time as θ = ω·t. Now, each turn of the coil will have an emf equal to the rate of change of the flux, i.e. d(B·S·cosθ)/dt. We’ve got N turns of wire, and so the total emf, which we’ll denote by Ɛ (yep, a new symbol), will be equal to:

Ɛ = N·S·B·ω·sin(ω·t)

Now, that’s just a nice sinusoidal function indeed, which will look like the graph below.

[Graph: the sinusoidal emf Ɛ as a function of time]

When no current is being drawn from the wire, this Ɛ will effectively be the potential difference between the two wires. What happens really is that the emf produces a current in the coil which pushes some charges out to the wire, and so then they’re stuck there for a while, and so there’s a potential difference between them, which we’ll denote by V, and that potential difference will be equal to Ɛ. It has to be equal to Ɛ because, if it were any different, we’d have an equalizing counter-current, of course. [It’s a fine point, so you should think about it.] So we can write:

V = Ɛ = N·S·B·ω·sin(ω·t)

So what happens when we do connect the wires to the circuit, so we’ve got that closed circuit depicted above (and below)?


Then we’ll have a current I going through the circuit, and Ohm’s Law then tells us that the ratio between (i) the voltage across the resistance in this circuit (we assume the connections between the generator and the resistor itself are perfect conductors) and (ii) the current will be some constant, so we have R = V/I and, therefore:

Ɛ = N·S·B·ω·sin(ω·t) = I·R

[To be fully complete, I should note that, when other circuit elements than resistors are involved, like capacitors and inductors, we’ll have a phase difference between the voltage and current functions, and so we should look at the impedance of the circuit, rather than its resistance. For more detail, see the addendum below this post.]
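
To make those formulas a bit more tangible, here’s a small numerical sketch of the generator feeding a resistor; the values of N, S, B, ω and R are just illustrative assumptions:

```python
import numpy as np

N, S, B = 50, 0.01, 0.5          # turns, coil area (m²), field strength (T) — illustrative
omega, R = 2 * np.pi * 50, 100.0 # 50 Hz rotation and a 100 ohm load — illustrative

t = np.linspace(0, 0.02, 9)      # one full turn of the coil, a few sample points
emf = N * S * B * omega * np.sin(omega * t)   # Ɛ = N·S·B·ω·sin(ω·t)
V = emf                          # the terminal voltage follows the emf
I = V / R                        # Ohm's Law for the resistive load
print(np.round(emf, 1))          # peak value N·S·B·ω ≈ 78.5 V
print(np.round(I, 3))            # the current is the same sinusoid, scaled by 1/R
```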

OK. Let’s now look at the power and energy involved.

Energy and power in the AC circuit

You’ll probably have many questions about the analysis above. You should. I do. The most remarkable thing, perhaps, is that this analysis suggests that the voltage doesn’t drop as we connect the generator to the circuit. It should, you’d think. Why doesn’t it? Why don’t the charges at both ends of the wire simply discharge through the circuit? In real life, there surely is such a tendency: sudden large changes in loading will effectively produce temporary changes in the voltage. But then it’s like Feynman writes: “The emf will continue to provide charge to the wires as current is drawn from them, attempting to keep the wires always at the same potential difference.”

So how much current is drawn from them? As I explained above, that depends not on the generator but on the circuit, and more in particular on the load, so that’s the resistor in this case. Again, the resistance is the (constant) ratio of the voltage and the current: R = V/I. So think about increasing or decreasing the resistance. If the voltage remains the same, it implies the current must decrease or increase accordingly, because R = V/I implies that I = V/R. So the current is inversely proportional to R, as I explained above when discussing car batteries and lamps and loads. :-)

Now, I still have to prove that the power provided by our generator is effectively equal to P = Ɛ·I but, if it is, it implies the power that’s being delivered will be inversely proportional to R. Indeed, when Ɛ and/or V remain what they are as we insert a larger resistance in the circuit, then P = Ɛ·I = Ɛ²/R, and so the power that’s being delivered would be inversely proportional to R. To be clear, we’d have a relation between P and R like the one below.


This is somewhat weird. Why? Well… I also have to show you that the power that goes into moving our coil in the magnetic field, i.e. the rate of mechanical work required to rotate the coil against the magnetic forces, is equal to the electric power Ɛ·I, i.e. the rate at which electrical energy is being delivered by the emf of the generator. However, I’ll postpone that for a while and, hence, I’ll just ask you, once again, to take me on my word. :-) Now, if that’s true, so if the mechanical power equals the electric power, then that implies that a larger resistance will reduce the mechanical power we need to maintain the angular velocity ω. Think of a practical example: if we’d double the resistance (i.e. we halve the load), and if the voltage stays the same, then the current would be halved, and the power would also be halved. And let’s think about the limit situations: as the resistance goes to infinity, the power that’s being delivered goes to zero, as the current goes to zero, while if the resistance goes to zero, both the current as well as the power would go to infinity!
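
A quick numerical illustration of that inverse relation (again with made-up values for Ɛ and R):

```python
emf = 12.0                       # illustrative value for Ɛ (treated as constant here)
for R in (1.0, 2.0, 4.0, 8.0):
    I = emf / R
    P = emf * I                  # P = Ɛ·I = Ɛ²/R
    print(R, I, P)               # doubling R halves both the current and the power
```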

Well… We actually know that’s also true in real life: actual generators consume more fuel when the load increases, so when they deliver more power, and much less fuel, so less power, when there’s no load at all. You’ll know that, at least when you’re living in a developing country with a lot of load shedding! :-) And the difference is huge: with no load, or just a little load, the generator consumes only some 10% of the fuel it needs when it’s fully loaded. It’s totally in line with what I wrote on the relationship between the resistance and the current that it draws in. So, yes, it does make sense:

An emf does produce more current if the resistance in the circuit is low (i.e. when the load is high), and the stronger currents do represent greater mechanical forces.

That’s a very remarkable thing. It means that, if we’d put a larger load on our little AC generator, it should require more mechanical work to keep the coil rotating at the same angular velocity ω. But… What changes? The change in flux is the same, the Δt is the same, and so what changes really? What changes is the current going through the coil, and it’s not a change in that ΔQ factor above, but a change in its velocity v.

Hmm… That all looks quite complicated, doesn’t it? It does, so let’s get back to the analysis of what we have here, so we’ll simply assume that we have some dynamic equilibrium obeying that formula above, and so I and R are what they are, and we relate them to Ɛ according to that equation above, i.e.:

Ɛ = N·S·B·ω·sin(ω·t) = I·R

Now let me prove those formulas on the power of our generator and in the circuit. We have all these charges in our coil that are receiving some energy. Now, the rate at which they receive energy is F·v.

Huh? Yes. Let me explain: the work that’s being done on a charge along some path is the line integral ∫ F·ds along this path. But the infinitesimal distance ds is equal to v·dt, as ds/dt = v (note that we write s and v as vectors, so the dot product with F gives us the component of F that is tangential to the path). So ∫ F·ds = ∫ (F·v)dt. So the time rate of change of the energy, which is the power, is F·v. Just take the time derivative of the integral. :-)

Now let’s assume we have n moving charges per unit length of our coil (so that’s in line with what I wrote about ΔQ above), then the power being delivered to any element ds of the coil is (F·v)·n·ds, which can be written as: (F·ds)·n·v. [Why? Because v and ds have the same direction: the direction of both vectors is tangential to the wire, always.] Now all we need to do to find out how much power is being delivered to the circuit by our AC generator is integrate this expression over the coil, so we need to find:


However, the emf (Ɛ) is defined as the line integral ∫ E·ds, taken around the entire coil, and E = F/q, and the current I is equal to I = q·n·v. So the power from our little AC generator is indeed equal to:

Power = Ɛ·I

So that’s done. Now I need to make good on my other promise, and that is to show that Ɛ·I product is equal to the mechanical power that’s required to rotate the coil in the magnetic field. So how do we do that?

We know there’s going to be some torque because of the current in the coil. Its formula is given by τ = μ×B. What magnetic field? Well… Let me refer you to my post on the magnetic dipole and its torque: it’s not the magnetic field caused by the current, but the external magnetic field, so that’s the B we’ve been talking about here all along. So… Well… I am not trying to fool you here. :-) However, the magnetic moment μ was not defined by that external field, but by the current in the coil and its area. Indeed, μ’s magnitude was the current times the area, so that’s N·I·S in this case. Of course, we need to watch out because μ is a vector itself and so we need the angle between μ and B to calculate that vector cross product τ = μ×B. However, if you check how we defined the direction of μ, you’ll see it’s normal to the plane of the coil and, hence, the angle between μ and B is the very same θ = ω·t that we started our analysis with. So, to make a long story short, the magnitude of the torque τ is equal to:

τ = (N·I·S)·B·sinθ

Now, we know the torque is also equal to the work done per unit of distance traveled (around the axis of rotation, that is), so τ = dW/dθ. Now dθ = d(ω·t) = ω·dt. So we can now find the work done per unit of time, so that’s the power once more:

dW/dt = ω·τ = ω·(N·I·S)·B·sinθ

But so we found that Ɛ = N·S·B·ω·sinθ, so… Well… We find that:

dW/dt = Ɛ·I

Now, this equation doesn’t sort out our question as to how much power actually goes in and out of the circuit as we put some load on it, but it is what we promised to do: I showed that the mechanical work we’re doing on the coil is equal to the electric energy that’s being delivered to the circuit. :-)
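
If you want, you can verify the dW/dt = ω·τ = Ɛ·I identity numerically, re-using the illustrative coil and load from the sketch above:

```python
import numpy as np

N, S, B = 50, 0.01, 0.5                     # same illustrative coil as before
omega, R = 2 * np.pi * 50, 100.0
theta = np.linspace(0, 2 * np.pi, 13)       # a few positions of the coil

emf = N * S * B * omega * np.sin(theta)     # Ɛ = N·S·B·ω·sinθ
I = emf / R                                 # the current through the resistive load
tau = (N * I * S) * B * np.sin(theta)       # τ = (N·I·S)·B·sinθ
print(np.allclose(omega * tau, emf * I))    # True: mechanical power = electrical power
```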

It’s all quite mysterious, isn’t it? It is. And we didn’t include other stuff that’s relevant here, such as the phenomenon of self-inductance: the varying current in the coil will actually produce its own magnetic field and, hence, in practice, we’d get some “back emf” in the circuit. This “back emf” is opposite to the current when it is increasing, and it is in the direction of the current when it is decreasing. In short, the self-inductance effect causes a current to have ‘inertia’: the inductive effects try to keep the flow constant, just as mechanical inertia tries to keep the velocity of an object constant. But… Well… I left that out. I’ll talk about it next time because…

[…] Well… It’s getting late in the day, and so I must assume this is sort of ‘OK enough’ as an introduction to what we’ll be busying ourselves with over the coming week. You take care, and I’ll talk to you again some day soon. :-)

Perhaps one little note, on a question that might have popped up when you were reading all of the above: so how do actual generators keep the voltage up? Well… Most AC generators are, indeed, so-called constant-speed devices. You can download some manuals from the Web, and you’ll find things like this: don’t operate at speeds more than 4% above the rated speed, or more than 1% below it. Fortunately, the so-called engine governor will take care of that. :-)

Addendum: The concept of impedance

In one of my posts on oscillators, I explain the concept of impedance, which is the equivalent of resistance, but for AC circuits. Just like resistance, impedance also sort of measures the ‘opposition’ that a circuit presents to a current when a voltage is applied, but it’s a complex ratio, as opposed to R = V/I. It’s literally a complex ratio because the impedance has a magnitude and a direction, or a phase as it’s usually referred to. Hence, one will often write the impedance (denoted by Z) using Euler’s formula:

Z = |Z|·e^(iθ)

The illustration below (credit goes to Wikipedia, once again) explains what’s going on. It’s a pretty generic view of the same AC circuit. The truth is: if we apply an alternating current, then the current and the voltage will both go up and down, but the current signal will usually lag the voltage signal, and the phase factor θ tells us by how much. Hence, using complex-number notation, we write:

V = I∗Z = I∗|Z|·e^(iθ)


Now, while that resembles the V = R·I formula, you should note the bold-face type for V and I, and the ∗ symbol I am using here for multiplication. First the ∗ symbol: that’s to make it clear we’re not talking a vector cross product A×B here, but a product of two complex numbers. The bold-face for V and I implies they’re like vectors, or like complex numbers: so they have a phase too and, hence, we can write them as:

  • V = |V|·e^(i(ωt + θV))
  • I = |I|·e^(i(ωt + θI))

To be fully complete – you may skip all of this if you want, but it’s not that difficult, nor very long – it all works out as follows. We write:

I∗Z = |I|·e^(i(ωt + θI))∗|Z|·e^(iθ) = |I|·|Z|·e^(i(ωt + θI + θ)) = |V|·e^(i(ωt + θV))

Now, this equation must hold for all t, so we can equate the magnitudes and phases and, hence, we get: |V| = |I||Z| and so we get the formula we need, i.e. the phase difference between our function for the voltage and our function for the current.

θV = θI + θ

Of course, you’ll say: voltage and current are something real, aren’t they? So what’s this about complex numbers? You’re right. I’ve used the complex notation only to simplify the calculus, so it’s only the real part of those complex-valued functions that counts.
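
In Python, complex numbers make this bookkeeping almost trivial; the R, L and ω values below are just illustrative assumptions for a series RL circuit:

```python
import math, cmath

omega = 2 * math.pi * 50                     # angular frequency (50 Hz) — illustrative
R, L = 100.0, 0.2                            # series resistor and inductor — illustrative
Z = R + 1j * omega * L                       # impedance of a series RL circuit: Z = R + iωL

I = 0.1                                      # current phasor: amplitude 0.1 A, zero phase
V = I * Z                                    # V = I∗Z
print(abs(V), math.degrees(cmath.phase(V)))  # |V| = |I|·|Z|, and V leads I by the phase θ of Z
```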

Oh… And also note that, as mentioned above, we do not have such lag or phase difference when only resistors are involved. So we don’t need the concept of impedance in the analysis above. With this addendum, I just wanted to be as complete as I can be. :-)

Induced currents

In my two previous posts, I presented all of the ingredients of the meal we’re going to cook now, most notably:

  1. The formula for the torque on a loop of a current in a magnetic field, and its energy: (i) τ = μ×B, and (ii) Umech = −μ·B.
  2. The Biot-Savart Law, which gives you the magnetic field that’s produced by wires carrying currents:

B(1) = −(1/4πε0c²)·∫ (I·e12×ds2)/r12²

Both ingredients are, obviously, relevant to the design of an electromagnetic motor, i.e. an ‘engine that can do some work’, as Feynman calls it. :-) Its principle is illustrated below.


The two formulas above explain how and why the coil goes around, and the coil can be made to keep going by arranging that the connections to the coil are reversed each half-turn by contacts mounted on the shaft. Then the torque is always in the same direction. That’s how a small direct current (DC) motor is made. My father made me make a couple of these thirty years ago, with a magnet, a big nail and some copper coil. I used sliding contacts, and they were the most difficult thing in the whole design. But now I found a very nice demo on YouTube of a guy whose system to ‘reverse’ the connections is wonderfully simple: he doesn’t use any sliding contacts. He just removes half of the insulation on the wire of the coil on one side. It works like a charm, but I think it’s not so sustainable, as it spins so fast that the insulation on the other side will probably come off after a while! :-)

Now, to make this motor run, you need current and, hence, 19th century physicists and mechanical engineers also wondered how one could produce currents by changing the magnetic field. Indeed, they could use Alessandro Volta’s ‘voltaic pile‘ to produce currents but it was not very handy: it consisted of alternating zinc and copper discs, with pieces of cloth soaked in salt water in-between!

Now, while the Biot-Savart Law goes back to 1820, it took another decade to find out how that could be done. Initially, people thought magnetic fields should just cause some current, but that didn’t work. Finally, Faraday unequivocally established the fundamental principle that electric effects are only there when something is changing. So you’ll get a current in a wire by moving it in a magnetic field, or by moving the magnet or, if the magnetic field is caused by some other current, by changing the current in that wire. It’s referred to as the ‘flux rule’, or Faraday’s Law. Remember: we’ve seen Gauss’ Law, then Ampère’s Law, and then that Biot-Savart Law, and so now it’s time for Faraday’s Law. :-) Faraday’s Law is Maxwell’s third equation really, aka the Maxwell-Faraday Law of Induction:

∇×E = −∂B/∂t

Now you’ll wonder: what’s flux got to do with this formula? ∇×E is about circulation, not about flux! Well… Let me copy Feynman’s answer:

Faraday's law

So… There you go. And, yes, you’re right, instead of writing Faraday’s Law as ∇×E = −∂B/∂t, we should write it as:


That’s easier to understand, and it’s also easier to work with, as we’ll see in a moment. So the point is: whenever the magnetic flux changes, there’s a push on the electrons in the wire. That push is referred to as the electromotive force, abbreviated as emf or EMF, and so it’s that line and/or surface integral above indeed. Let me paraphrase Feynman so you fully understand what we’re talking about here:

When we move our wire in a magnetic field, or when we move a magnet near the wire, or when we change the current in a nearby wire, there will be some net push on the electrons in the wire in one direction along the wire. There may be pushes in different directions at different places, but there will be more push in one direction than another. What counts is the push integrated around the complete circuit. We call this net integrated push the electromotive force (abbreviated emf) in the circuit. More precisely, the emf is defined as the tangential force per unit charge in the wire integrated over length, once around the complete circuit.

So that’s the integral. :-) And that’s how we can turn that motor above into a generator: instead of putting a current through the wire to make it turn, we can turn the loop, by hand or by a waterwheel or by whatever. Now, when the coil rotates, its wires will be moving in the magnetic field and so we will find an emf in the circuit of the coil, and so that’s how the motor becomes a generator.

Now, let me quickly interject something here: when I say ‘a push on the electrons in the wire’, what electrons are we talking about? How many? Well… I’ll answer that question in very much detail in a moment but, as for now, just note that the emf is some quantity expressed per coulomb or, as Feynman puts it above, per unit charge. So we’ll need to multiply it with the current in the circuit to get the power of our little generator.

OK. Let’s move on. Indeed, all I can do here is mention just a few basics, so we can move on to the next thing. If you really want to know all of the nitty-gritty, then you should just read Feynman’s Lecture on induced currents. That’s got everything. And, no, don’t worry: contrary to what you might expect, my ‘basics’ do not amount to a terrible pile of formulas. In fact, it’s all easy and quite amusing stuff, and I should probably include a lot more. But then… Well… I always need to move on… If not, I’ll never get to the stuff that I really want to understand. :-(

The electromotive force

We defined the electromotive force above, including its formula:


What are the units? Let’s see… We know B was measured not in newton per coulomb, like the electric field E, but in N·s/(C·m), because we had to multiply the magnetic field strength with the velocity of the charge to find the force per unit charge, cf. the F/q = v×B equation. Now what’s the unit in which we’d express that surface integral? We must multiply with m², so we get N·m·s/C. Now let’s simplify that by noting that one volt is equal to 1 N·m/C. [The volt has a number of definitions, but the one that applies here is that it’s the potential difference between two points that will impart one joule (i.e. 1 N·m) of energy to a unit of charge (i.e. 1 C) that passes between them.] So we can measure the magnetic flux in volt-seconds, i.e. V·s. And then we take the derivative with regard to time, so we divide by s, and so we get… Volt! The emf is measured in volt!

Does that make sense? I guess so: the emf causes a current, just like a potential difference, i.e. a voltage, and, therefore, we can and should look at the emf as a voltage too!

But let’s think about it some more, though. In differential form, Faraday’s Law is just that ∇×E = −∂B/∂t equation, so that’s just one of Maxwell’s four equations, and so we prefer to write it as the “flux rule”. Now, the “flux rule” says that the electromotive force (abbreviated as emf or EMF) on the electrons in a closed circuit is equal to the time rate of change of the magnetic flux it encloses. As mentioned above, we measure magnetic flux in volt-seconds (i.e. V·s), so its time rate of change is measured in volt (because the time rate of change is a quantity expressed per second), and so the emf is measured in volt, i.e. joule per coulomb, as 1 V = 1 N·m/C = 1 J/C. What does it mean?

The time rate of change of the magnetic flux can change because the surface covered by our loop changes, or because the field itself changes, or by both. Whatever the cause, it will change the emf, or the voltage, and so it will make the electrons move. So let’s suppose we have some generator generating some emf. The emf can be used to do some work. We can charge a capacitor, for example. So how would that work?

More charge on the capacitor will increase the voltage V of the capacitor, i.e. the potential difference V = Φ1 − Φ2 between the two plates. Now, we know that the increase of the voltage V will be proportional to the increase of the charge Q, and that the constant of proportionality is defined by the capacity C of the capacitor: C = Q/V. [How do we know that? Well… Have a look at my post on capacitors.] Now, if our capacitor has an enormous capacity, then its voltage won’t increase very rapidly. However, it’s clear that, no matter how large the capacity, its voltage will increase. It’s just a matter of time. Now, its voltage cannot be higher than the emf provided by our ‘generator’, because it will then want to discharge through the same circuit!

So we’re talking power and energy here, and so we need to put some load on our generator. Power is the rate of doing work, so it’s the time rate of change of energy, and it’s expressed in joule per second. The energy of our capacitor is U = (1/2)·Q²/C = (1/2)·C·V². [How do we know that? Well… Have a look at my post on capacitors once again. :-)] So let’s take the time derivative of U, noting that the voltage at any moment is V = Q/C. We get: dU/dt = d[(1/2)·Q²/C]/dt = (Q/C)·dQ/dt = V·dQ/dt. So that’s the power that the generator would need to supply to charge the capacitor. As I’ll show in a moment, the power supplied by a generator is, indeed, equal to the emf times the current, and the current is the time rate of change of the charge, so I = dQ/dt.
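
As a crude sketch of that charging process, here’s a simple Euler integration of a capacitor being charged through a resistor from a constant emf (all values are illustrative assumptions, and a real generator would of course deliver an alternating emf):

```python
emf, R, C = 12.0, 100.0, 1e-3      # volt, ohm, farad — illustrative values
Q, dt = 0.0, 1e-4                  # start with an uncharged capacitor; small time step (s)

for step in range(5000):           # integrate over 0.5 s, i.e. five time constants R·C
    V = Q / C                      # the capacitor voltage at this instant, V = Q/C
    I = (emf - V) / R              # the net push is the emf minus the capacitor's back-voltage
    Q += I * dt                    # dQ = I·dt

print(round(Q / C, 2))             # ≈ 11.9 V: the voltage approaches, but never exceeds, the emf
```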

So, yes, it all works out: the power that’s being supplied by our generator will be used to charge our capacitor. Now, you may wonder: what about the current? Where is the current in Faraday’s Law? The answer is: Faraday’s Law doesn’t have the current. It’s just not there. The emf is expressed in volt, and so that’s energy per coulomb, so it’s per unit charge. How much power a generator can and will deliver depends on its design, and the circuit and load that we will be putting on it. So we can’t say how many coulomb we will have. It all depends. But you can imagine that, if the loop would be bigger, or if we’d have a coil with many loops, then our generator would be able to produce more power, i.e. it would be able to move more electrons, so the mentioned power = (emf)×(current) product would be larger. :-)

Finally, to conclude, note Feynman’s definition of the emf: the tangential force per unit charge in the wire integrated over length around the complete circuit. So we’ve got force times distance here, but per unit charge. Now, force times distance is work, or energy, and so… Yes, emf is joule per coulomb, definitely! :-)

[…] Don’t worry too much if you don’t quite ‘get’ this. I’ll come back to it when discussing electric circuits, which I’ll do in my next posts.

Self-inductance and Lenz’s rule

We talked about motors and generators above. We also have transformers, like the one below. What’s going on here is that an alternating current (AC) produces a continuously varying magnetic field, which generates an alternating emf in the second coil, which produces enough power to light an electric bulb.


Now, the total emf in coil (b) is the sum of the emf’s of the separate turns of coil, so if we wind (b) with many turns, we’ll get a larger emf, so we can ‘transform’ the voltage to some other voltage. From your high-school classes, you should know how that works.

The thing I want to talk about here is something else, though. There is an induction effect in coil (a) itself. Indeed, the varying current in coil (a) produces a varying magnetic field inside itself, and the flux of this field is continually changing, so there is a self-induced emf in coil (a). The effect is called self-inductance, and so it’s the emf acting on a current itself when it is building up a magnetic field or, in general, when its field is changing in any way. It’s a most remarkable phenomenon, and so let me paraphrase Feynman as he describes it:

“When we gave “the flux rule” that the emf is equal to the rate of change of the flux linkage, we didn’t specify the direction of the emf. There is a simple rule, called Lenz’s rule, for figuring out which way the emf goes: the emf tries to oppose any flux change. That is, the direction of an induced emf is always such that if a current were to flow in the direction of the emf, it would produce a flux of B that opposes the change in B that produces the emf. In particular, if there is a changing current in a single coil (or in any wire), there is a “back” emf in the circuit. This emf acts on the charges flowing in the coil to oppose the change in magnetic field, and so in the direction to oppose the change in current. It tries to keep the current constant; it is opposite to the current when the current is increasing, and it is in the direction of the current when it is decreasing. A current in a self-inductance has “inertia,” because the inductive effects try to keep the flow constant, just as mechanical inertia tries to keep the velocity of an object constant.”

Hmm… That’s something you need to read a couple of times to fully digest it. There’s a nice demo on YouTube, showing an MIT physics video demonstrating this effect with a metal ring placed on the end of an electromagnet. You’ve probably seen it before: the electromagnet is connected to a current, and the ring flies into the air. The explanation is that the induced currents in the ring create a magnetic field opposing the change of field through it. So the ring and the coil repel just like two magnets with opposite poles. The effect is no longer there when a thin radial cut is made in the ring, because then there can be no current. The nice thing about the video is that it shows how the effect gets much more dramatic when an alternating current is applied, rather than a DC current. And it also shows what happens when you first cool the ring in liquid nitrogen. :-)

You may also notice the sparks when the electromagnet is being switched off. Believe it or not, that’s also related to a “back emf”. Indeed, when we disconnect a large electromagnet by opening a switch, the current is supposed to immediately go to zero but, in trying to do so, it generates a large “back emf”: large enough to develop an arc across the opening contacts of the switch. The high voltage is also not good for the insulation of the coil, as it might damage it. So that’s why large electromagnets usually include some extra circuit, which allows the “back current” to discharge less dramatically. But I’ll refer you to Feynman for more details, as any illustration here would clutter the exposé.

Eddy currents

I like educational videos, and so I should give you a few references here, but there are so many of them that I’ll let you google a few yourself. The most spectacular demonstration of eddy currents is the one you get in a superconductor: even back in the early 1960s, when Feynman wrote his Lectures, the effect of magnetic levitation was well known. Feynman illustrates the effect with the simple diagram below: when bringing a magnet near to a perfect conductor, such as tin below 3.8 K, eddy currents will create opposing fields, so that no magnetic flux enters the superconducting material. The effect is also referred to as the Meissner effect, after the German physicist Walther Meissner, who discovered it in 1933. Superconductivity itself was discovered much earlier (in 1911) by a Dutch physicist in Leiden, Heike Kamerlingh Onnes, who got a Nobel Prize for it.


Of course, we have eddy currents in less dramatic situations as well. The phenomenon of eddy currents is usually demonstrated by the braking of a sheet of metal as it swings back and forth between the poles of an electromagnet, as illustrated below (left). The illustration on the right shows how the eddy-current effect can be drastically reduced by cutting slots in the plate, so that’s like making a radial cut in our jumping ring. :-)

[Illustrations: eddy-current braking of a swinging metal plate (left) and the slotted plate that reduces the effect (right)]

The Faraday disc

The Faraday disc is interesting, not only from a historical point of view – the illustration below is a 19th century model, one that Michael Faraday may have used himself – but also because it seems to contradict the “flux rule”: as the disc rotates through a steady magnetic field, it will produce some emf, but there’s no change in the flux. How is that possible?

[Illustrations: a 19th-century Faraday disc generator, and a diagram of the Faraday disc]

The answer, of course, is that we are ‘cheating’ here: the material is moving, so we’re actually moving the ‘wire’, or the circuit if you want, so here we need to combine two equations:

F = q·(E + v×B) and ∇×E = −∂B/∂t

If we do that, you’ll see it all makes sense. :-) Oh… That Faraday disc is referred to as a homopolar generator, and it’s quite interesting. You should check out what happened to the concept in the Wikipedia article on it. The Faraday disc was apparently used as a source for power pulses in the 1950s. The thing below could store 500 mega-joules and deliver currents up to 2 mega-ampère, i.e. 2 million amps! Fascinating, isn’t it? :-)

[Illustration: the 500 MJ homopolar generator]

Magnetic dipoles and their torque and energy

We studied the magnetic dipole in very much detail in one of my previous posts but, while we talked about an awful lot of stuff there, we actually managed to not talk about the torque on it, when it’s placed in the magnetic field of other currents, or some other magnetic field tout court. Now, that’s what drives electric motors and generators, of course, and so we should talk about it, which is what I’ll do here. Let me first remind you of the concept of torque, and then we’ll apply it to a loop of current. :-)

The concept of torque

The concept of torque is easy to grasp intuitively, but the math involved is not so easy. Let me sum up the basics (for the detail, I’ll refer you to my posts on spin and angular momentum). In essence, for rotations in space (i.e. rotational motion), the torque is what the force is for linear motion:

  1. It’s the torque (τ) that makes an object spin faster or slower around some axis, just like the force would accelerate or decelerate that very same object when it would be moving along some curve.
  2. There’s also a similar ‘law of Newton’ for torque: you’ll remember that the force equals the time rate of change of a vector quantity referred to as (linear) momentum: F = dp/dt = d(mv)/dt = ma (the mass times the acceleration). Likewise, we have a vector quantity that is referred to as angular momentum (L), and we can write: τ (i.e. the Greek tau) = dL/dt.
  3. Finally, instead of linear velocity, we’ll have an angular velocity ω (omega), which is the time rate of change of the angle θ that defines how far the object has gone around (as opposed to the distance in linear dynamics, describing how far the object has gone along). So we have ω = dθ/dt. This is actually easy to visualize because we know that θ, expressed in radians, is actually the length of the corresponding arc on the unit circle. Hence, the equivalence with the linear distance traveled is easily ascertained.

There are many more similarities, like an angular acceleration: α = dω/dt = d2θ/dt2, and we should also note that, just like the force, the torque is doing work – in its conventional definition as used in physics – as it turns an object instead of just moving it, so we can write:

ΔW = τ·Δθ

So it’s all the same-same but different once more :-) and so now we also need to point out some differences. The animation below does that very well, as it relates the ‘new’ concepts – i.e. torque and angular momentum – to the ‘old’ concepts – i.e. force and linear momentum. It does so using the vector cross product, which is really all you need to understand the math involved. Just look carefully at all of the vectors involved, which you can identify by their colors, i.e. red-brown (r), light-blue (τ), dark-blue (F), light-green (L), and dark-green (p).


So what do we have here? We have vector quantities once again, denoted by symbols in bold-face. Having said that, I should note that τ, L and ω are ‘special’ vectors: they are referred to as axial vectors, as opposed to the polar vectors F, p and v. To put it simply: polar vectors represent something physical, and axial vectors are more like mathematical vectors, but that’s a very imprecise and, hence, essentially incorrect definition. :-) Axial vectors are directed along the axis of spin – so that is, strangely enough, at right angles to the direction of spin, or perpendicular to the ‘plane of the twist’ as Feynman calls it – and the direction of the axial vector is determined by a convention which is referred to as the ‘right-hand screw rule’. :-)

Now, I know it’s not so easy to visualize vector cross products, so it may help to first think of torque (also known, for some obscure reason, as the moment of the force) as a twist on an object or a plane. Indeed, the torque’s magnitude can be defined in another way: it’s equal to the tangential component of the force, i.e. F·sin(Δθ), times the distance between the object and the axis of rotation (we’ll denote this distance by r). This quantity is also equal to the product of the magnitude of the force itself and the length of the so-called lever arm, i.e. the perpendicular distance from the axis to the line of action of the force (this lever arm length is denoted by r0). So, we can define τ without the use of the vector cross-product, and in no less than three different ways actually (there’s a small numerical check of all three right after the list below). Indeed, the torque is equal to:

  1. The product of the tangential component of the force times the distance r: τ = r·Ft = r·F·sin(Δθ);
  2. The product of the length of the lever arm times the force: τ = r0·F;
  3. The work done per unit of distance traveled: τ = ΔW/Δθ or τ = dW/dθ in the limit.
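
Here’s the small numerical check I promised: it takes an arbitrary force applied at an arbitrary point in the plane (all numbers are made up) and computes the torque in the three ways above, plus the cross-product way we’ll define in a moment:

```python
import numpy as np

r = np.array([2.0, 1.0])           # point (in the plane) where the force is applied — made up
F = np.array([0.5, 1.5])           # the force itself — made up

# The 'vector' definition: the torque about the z-axis is τ = x·Fy − y·Fx
tau = r[0] * F[1] - r[1] * F[0]

# 1. Tangential component of the force times the distance: τ = r·F·sin(Δθ)
dtheta_rF = np.arctan2(F[1], F[0]) - np.arctan2(r[1], r[0])   # angle between r and F
tau1 = np.linalg.norm(r) * np.linalg.norm(F) * np.sin(dtheta_rF)

# 2. Lever arm times the force: τ = r0·F, with r0 = r·sin(Δθ)
tau2 = (np.linalg.norm(r) * np.sin(dtheta_rF)) * np.linalg.norm(F)

# 3. Work per unit angle: τ = ΔW/Δθ, rotating the application point by a tiny Δθ
dtheta = 1e-6
rot = np.array([[np.cos(dtheta), -np.sin(dtheta)],
                [np.sin(dtheta),  np.cos(dtheta)]])
tau3 = F @ (rot @ r - r) / dtheta  # work = force · displacement, divided by the angle

print(tau, tau1, tau2, tau3)       # all four are (approximately) equal to 2.5
```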

Phew! Yeah. I know. It’s not so easy… However, I regret to have to inform you that you’ll need to go even further in your understanding of torque. More specifically, you really need to understand why and how we define the torque as a vector cross product, and so please do check out that post of mine on the fundamentals of ‘torque math’. If you don’t want to do that, then just try to remember the definition of torque as an axial vector, which is:

τ = (τyz, τzx, τxy) = (τx, τy, τz) with

τx = τyz = yFz – zFy (i.e. the torque about the x-axis, i.e. in the yz-plane),

τy = τzx = zFx – xFz (i.e. the torque about the y-axis, i.e. in the zx-plane), and

τz = τxy = xFy – yFx (i.e. the torque about the z-axis, i.e. in the xy-plane).

The angular momentum L is defined in the same way:

L = (Lyz, Lzx, Lxy) = (Lx, Ly, Lz) with

Lx = Lyz = ypz – zpy (i.e. the angular momentum about the x-axis),

Ly = Lzx = zpx – xpz (i.e. the angular momentum about the y-axis), and

Lz = Lxy = xpy – ypx (i.e. the angular momentum about the z-axis).
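
Those component formulas are exactly what numpy’s cross product computes; a quick check with arbitrary vectors:

```python
import numpy as np

r = np.array([1.0, 2.0, 3.0])          # an arbitrary position vector
p = np.array([0.4, -0.5, 0.6])         # an arbitrary momentum vector
L = np.array([r[1]*p[2] - r[2]*p[1],   # Lx = y·pz − z·py
              r[2]*p[0] - r[0]*p[2],   # Ly = z·px − x·pz
              r[0]*p[1] - r[1]*p[0]])  # Lz = x·py − y·px
print(np.allclose(L, np.cross(r, p)))  # True
```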

Let’s now apply the concepts to a loop of current.

The forces on a current loop

The geometry of the situation is depicted below. I know it looks messy but let me help you identify the moving parts, so to speak. :-) We’ve got a loop with current and so we’ve got a magnetic dipole with some moment μ. From my post on the magnetic dipole, you know that μ’s magnitude is equal to |μ| = μ = (current)·(area of the loop) = I·a·b.

[Illustration: the geometry of a current loop in an external magnetic field B]

Now look at the B vectors, i.e. the magnetic field. Please note that these vectors represent some external magnetic field! So it’s not like what we did in our post on the dipole: we’re not looking at the magnetic field caused by our loop, but at how it behaves in some external magnetic field. Now, because it’s kinda convenient to analyze, we assume that the direction of our external field B is the direction of the z-axis, so that’s what you see in this illustration: the B vectors all point north. Now look at the force vectors, remembering that the magnetic force is equal to:

Fmagnetic = qv×B

So that gives the F1, F2, F3, and F4 vectors (so that’s the force on the first, second, third and fourth leg of the loop respectively) the magnitude and direction they’re having. Now, it’s easy to see that the opposite forces, i.e. the F1–F2 and F3–F4 pairs respectively, create a torque. The torque because of F1 and F2 is a torque which will tend to rotate the loop about the y-axis, so that’s a torque in the xz-plane, while the torque because of F3 and F4 will be some torque about the x-axis and/or the z-axis. As you can see, the torque is such that it will try to line up the moment vector μ with the magnetic field B. In fact, the geometry of the situation above is such that F3 and F4 have already done their job, so to speak: the moment vector μ is already lined up with the xz-plane, so there’s no net torque from that pair. However, that’s just because of the specifics of the situation here: the more general situation is that we’d have some torque about all three axes, and so we need to find that vector τ.

If we’d be talking some electric dipole, the analysis would be very straightforward, because the electric force is just Felectric = q·E, which we can also write as E = Felectric/q, so the field is just the force on one unit of electric charge, and so it’s (relatively) easy to see that we’d get the following formula for the torque vector:

τ = p×E

Of course, the p is the electric dipole moment here, not some linear momentum. [And, yes, please do try to check this formula. Sorry I can’t elaborate on it, but the objective of this blog is not to be a substitute for a textbook!]

Now, all of the analogies between the electric and magnetic dipole field, which we explored in the above-mentioned post of mine, would tend to make us think that we can write τ here as:

τ = μ×B

Well… Yes. It works. Now you may want to know why it works :-) and so let me give you the following hint. Each charge in a wire feels that Fmagnetic = qv×B force, so the total magnetic force on some volume ΔV, which I’ll denote by ΔF for a while, is the sum of the forces on all of the individual charges. So let’s assume we’ve got N charges per unit volume, then we’ve got N·ΔV charges in our little volume ΔV, so we write: ΔF = N·ΔV·q·v×B. You’re probably confused now: what’s the v here? It’s the (drift) velocity of the (free) electrons that make up our current I. Indeed, the protons don’t move. :-) So N·q·v is just the current density j, so we get: ΔF = j×B·ΔV, which implies that the force per unit volume is equal to j×B. But we need to relate it to the current in our wire, not the current density. Relax. We’re almost there. The ΔV in a wire is just its cross-sectional area A times some length, which I’ll denote by ΔL, so ΔF = j×B·ΔV becomes ΔF = j×B·A·ΔL. Now, j·A is the vector current I, so we get the simple result we need here: ΔF = I×B·ΔL, i.e. the magnetic force per unit length on a wire is equal to ΔF/ΔL = I×B.

Let’s now get back to our magnetic dipole and calculate F1 and F2. The length of ‘wire’ is the length of the leg of the loop, i.e. b, so we can write:

F1 = −F2 = b·I×B

So the magnitude of these forces is equal to F1 = F2 = I·B·b. Now, the length of the moment or lever arm is, obviously, equal to a·sinθ, so the magnitude of the torque is equal to the force times the lever arm (cf. the τ = r0·F formula above) and so we can write:

τ = I·B·b·a·sinθ

But I·a·b is the magnitude of the magnetic moment μ, so we get:

τ = μ·B·sinθ

Now that’s consistent with the definition of the vector cross product:

τ = μ×B = |μ|·|B|·sinθ·n = μ·B·sinθ·n
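
In fact, you can verify the τ = μ×B result numerically by summing the r×dF contributions of many small segments of the loop, with dF = I·dl×B; here’s a sketch with made-up numbers:

```python
import numpy as np

I, a, b = 2.0, 0.3, 0.2                    # current (A) and the sides of the loop (m) — made up
Bfield = np.array([0.0, 0.0, 0.5])         # uniform external field along the z-axis (T)
theta = 0.7                                # angle between the loop's normal and B

# Build the rectangular loop in 3D: u and v span its plane, n is its (unit) normal
n = np.array([np.sin(theta), 0.0, np.cos(theta)])
u = np.array([0.0, 1.0, 0.0])
v = np.cross(n, u)                         # completes the orthonormal triad, with u×v = n

corners = [ (a/2)*u + (b/2)*v, (-a/2)*u + (b/2)*v,
            (-a/2)*u - (b/2)*v, (a/2)*u - (b/2)*v, (a/2)*u + (b/2)*v ]

# Sum the torques r × dF over many small segments, with dF = I·dl×B
tau_num = np.zeros(3)
for start, end in zip(corners[:-1], corners[1:]):
    for k in range(100):
        p0 = start + (end - start) * k / 100.0
        p1 = start + (end - start) * (k + 1) / 100.0
        dl, rmid = p1 - p0, (p0 + p1) / 2.0
        tau_num += np.cross(rmid, I * np.cross(dl, Bfield))

mu = I * a * b * n                         # μ = (current)·(area)·n
print(np.allclose(tau_num, np.cross(mu, Bfield)))   # True: τ = μ×B
```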

Done! Now, electric motors and generators are all about work and, therefore, we also need to briefly talk about energy here.

The energy of a magnetic dipole

Let me remind you that we could also write the torque as the work done per unit of distance traveled, i.e. as τ = ΔW/Δθ or τ = dW/dθ in the limit. Now, the torque tries to line up the moment with the field, and so the energy will be lowest when μ and B are parallel, so we need to throw in a minus sign when writing:

 τ = −dU/dθ ⇔ dU = −τ·dθ

We should now integrate over the [0, θ] interval to find U, also using our τ = μ·B·sinθ formula. That’s easy, because we know that d(cosθ)/dθ = −sinθ, so that integral yields:

U = μ·B·(1 − cosθ) + a constant

If we choose the constant to be zero, and if we equate μ·B with 1, we get the blue graph below:

[Graph: the energy of the magnetic dipole as a function of θ]

The μ·B in the U = μ·B·(1 − cosθ) formula is just a scaling factor, obviously, so it determines the minimum and maximum energy. Now, you may want to limit the relevant range of θ to [0, π], but that’s not necessary: the energy of our loop of current does go up and down as shown in the graph. Just think about it: it all makes perfect sense!

Now, there is, of course, more energy in the loop than this U energy because energy is needed to maintain the current in the loop, and so we didn’t talk about that here. Therefore, we’ll qualify this ‘energy’ and call it the mechanical energy, which we’ll abbreviate by Umech. In addition, we could, and will, choose some other constant of integration, so that amounts to choosing some other reference point for the lowest energy level. Why? Because it then allows us to write Umech as a vector dot product, so we get:

Umech = −μ·B·cosθ = −μ·B

The graph is pretty much the same, but it now goes from −μ·B to +μ·B, as shown by the red graph in the illustration above.

Finally, you should note that the Umech = −μ·B formula is similar to what you’ll usually see written for the energy of an electric dipole: U = −p·E. So that’s all nice and good! However, you should remember that the electrostatic energy of an electric dipole (i.e. two opposite charges separated by some distance d) is all of the energy, as we don’t need to maintain some current to create the dipole moment!

Now, Feynman does all kinds of things with these formulas in his Lectures on electromagnetism but I really think this is all you need to know about it—for the moment, at least. :-)

The magnetic field of circuits: the Law of Biot and Savart

We studied the magnetic dipole in very much detail in one of my previous posts. While we talked about an awful lot of stuff there, we actually managed to not talk about the torque on it, when it’s placed in the magnetic field of other currents. Now, that’s what drives electric motors and generators, of course, and so we should talk about it, which is what I’ll do in my next post. Before doing so, however, I need to give you one or two extra formulas generalizing some of the results we obtained in our previous posts on magnetostatics. So that’s what I do under this heading: the magnetic field of circuits. The idea is simple: loops of current are not always nice squares or circles. Their shape might be quite irregular, indeed, like the loop below.

[Illustration: an irregularly shaped loop of current]

Of course, the same general formula should apply. So we can find the magnetic vector potential with the following integral:

A(1) = (1/4πε0c²)·∫ [j(2)/r12]·dV2

Just to make sure, let me re-insert its equivalent for electrostatics, so you can see they’re (almost) the same:

Φ(1) = (1/4πε0)·∫ [ρ(2)/r12]·dV2

But we’re talking a wire here, so how can we relate the current density j and the volume element dV to that? It’s easy: the illustration below shows that we can simply write:

j·dV = j·S·ds = I·ds

[Illustration: a volume element of the wire, relating j·dV to I·ds]

Therefore, we can write our integral for the vector potential as:

A(1) = (1/4πε0c²)·∫ I·ds2/r12

Of course, you should note the subtle change from a volume integral to a line integral, so it’s not all that straightforward, but we’re good to go. Now, in electrostatics, we actually had a fairly simple integral for the electric field itself:

E(1) = (1/4πε0)·∫ [ρ(2)·e12/r12²]·dV2

To be clear, E(1) is the field of a known charge distribution, which is represented by ρ(2), at point (1). The integral is almost the same as the one for Φ, but we’re talking vectors here (E and e12) rather than scalars (ρ and Φ), and you should also note the square in the denominator of the integral. :-)

As you might expect, there is a similar integral for B, which we find by… Well… We just need to calculate B, so that’s the curl of A:

B(1) = ∇×A(1) = ∇×[(1/4πε0c²)·∫ [j(2)/r12]·dV2]

How do we do that? It’s not so easy, so let me just copy the master himself:

[Feynman’s calculation of the curl, which yields B(1) = (1/4πε0c²)·∫ [j(2)×e12/r12²]·dV2]

So this integral gives B directly in terms of the known currents. The geometry involved is easy but, just in case, Feynman illustrates it, quite simply, as follows:


Now, there’s one more step to take, and then we’re done. If we’re talking a circuit of small wire, then we can replace j·dV by I·ds once more, and, hence, we get the Biot-Savart Law in its final form:

B(1) = −(1/4πε0c²)·∫ (I·e12×ds2)/r12²

Note the minus sign: it appears because we reversed the order of the vector cross product, and also note we actually have three integrals here, one for each component of B, so that’s just like that integral for A.
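
As a sanity check on the formula, here’s a small numerical integration of the Biot–Savart law for a circular loop, compared with the textbook result B = μ0·I/(2·R) at the centre; the current and radius are illustrative, and the code uses the equivalent ds×e ordering (the 1/4πε0c² factor in Feynman’s units is the same thing as μ0/4π):

```python
import numpy as np

mu0 = 4e-7 * np.pi                  # μ0, with μ0/4π = 1e-7 = 1/(4πε0c²)
I, R = 1.5, 0.1                     # current (A) and loop radius (m) — illustrative values

# Discretize a circular loop in the xy-plane and sum the contributions at its centre
phi = np.linspace(0.0, 2 * np.pi, 2001)
points = np.stack([R * np.cos(phi), R * np.sin(phi), np.zeros_like(phi)], axis=1)

B = np.zeros(3)
for p0, p1 in zip(points[:-1], points[1:]):
    ds = p1 - p0                    # a small segment of the wire
    r_vec = -(p0 + p1) / 2.0        # from the segment's midpoint to the field point (the centre)
    r = np.linalg.norm(r_vec)
    B += (mu0 / (4 * np.pi)) * I * np.cross(ds, r_vec / r) / r**2

print(B[2], mu0 * I / (2 * R))      # both ≈ 9.42e-6 T
```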

So… That’s it. :-) I’ll conclude with two small remarks:

  1. The law is named after Jean-Baptiste Biot and Félix Savart, two incredible Frenchmen (it’s really worthwhile checking their biographies on Wikipedia), who jotted it down in 1820, so that’s almost 200 years ago. Isn’t that amazing?
  2. You see we sort of got rid of the vector potential with this formula. So the question is: “What is the advantage of the vector potential if we can find B directly with a vector integral? After all, A also involves three integrals!” I’ll let Feynman reply to that question:

Because of the cross product, the integrals for B are usually more complicated. Also, since the integrals for A are like those of electrostatics, we may already know them. Finally, we will see that in more advanced theoretical matters (in relativity, in advanced formulations of the laws of mechanics, like the principle of least action to be discussed later, and in quantum mechanics), the vector potential plays an important role.

In fact, Feynman makes the point on the vector potential being relevant very explicit by just boldly stating two laws in quantum mechanics in which the magnetic and electric potential are used, not the magnetic or electric field. Indeed, it seems an external magnetic or electric field changes probability amplitudes. I’ll just jot down the two laws below, but leave it to you to decide whether or not you want to read the whole argument:

  • The magnetic change in the phase of a probability amplitude is (q/ħ)·∫ A·ds, taken along the trajectory.
  • The electric change in the phase is −(q/ħ)·∫ Φ·dt, taken over the time of the trajectory.

The key point that Feynman is making is that Φ and A are equally ‘real’ or ‘unreal’ as E and B in terms of explaining physical realities. I get the point, but I don’t find it necessary to copy the whole argument here. Perhaps it’s sufficient to just quote Feynman’s introduction to it, which says it all, in my humble opinion, that is:

“There are many changes in what concepts are important when we go from classical to quantum mechanics. We have already discussed some of them in Volume I. In particular, the force concept gradually fades away, while the concepts of energy and momentum become of paramount importance. You remember that instead of particle motions, one deals with probability amplitudes which vary in space and time. In these amplitudes there are wavelengths related to momenta, and frequencies related to energies. The momenta and energies, which determine the phases of wave functions, are therefore the important quantities in quantum mechanics. Instead of forces, we deal with the way interactions change the wavelength of the waves. The idea of a force becomes quite secondary—if it is there at all. When people talk about nuclear forces, for example, what they usually analyze and work with are the energies of interaction of two nucleons, and not the force between them. Nobody ever differentiates the energy to find out what the force looks like. In this section we want to describe how the vector and scalar potentials enter into quantum mechanics. It is, in fact, just because momentum and energy play a central role in quantum mechanics that A and Φ provide the most direct way of introducing electromagnetic effects into quantum descriptions.”

OK. That’s sufficient really. Onwards!