The energy of fields and the Poynting vector

For some reason, I always thought that Poynting was a Russian physicist, like Minkowski. He wasn’t. I just looked it up. Poynting was an Englishman, born near Manchester, and he taught in Birmingham. I should have known. Poynting is a very English name, isn’t it? My confusion probably stems from the fact that it was a Russian physicist, Nikolay Umov, who first proposed the basic concepts we are going to discuss here, i.e. the speed and direction of energy itself, or its movement. And as I am double-checking, I just learned that Hermann Minkowski is generally considered to be German-Jewish, not Russian. Makes sense. With Einstein and all that. His personal life story is actually quite interesting. You should check it out. 🙂

Let’s go for it. We’ve done a few posts on the energy in the fields already, but all in the context of electrostatics. Let me first walk you through the ideas we presented there.

The basic concepts: force, work, energy and potential

1. A charge q causes an electric field E, and E’s magnitude E is a simple function of the charge (q) and its distance (r) from the point that we’re looking at, which we usually write as P = (x, y, z). Of course, the origin of our reference frame here is q’s position. The formula is the simple inverse-square law that you (should) know: E ∼ q/r², and the proportionality constant is just Coulomb’s constant, which I think you wrote as ke in your high-school days and which, as you know, is there to make sure the units come out alright. So we could just write E = ke·q/r². However, just to make sure it does not look like a piece of cake 🙂 physicists write the proportionality constant as 1/4πε0, so we get:

E = (1/4πε0)·(q/r²)

Now, the field is the force on any unit charge (+1) we’d bring to P. This led us to think of energy, potential energy, because… Well… You know: energy is measured by work, so that’s some force acting over some distance. The potential energy of a charge increases if we move it against the field, so we wrote:

W = −∫ F•ds (integral taken from a to b)

Well… We actually gave the formula below in that post, so that’s the work done per unit charge. To interpret it, you just need to remember that F = qE, which is equivalent to saying that E is the force per unit charge.

W(unit) = −∫ E•ds (integral taken from a to b)

As for the F•ds or E•ds product in the integrals, that’s a vector dot product, which we need because it’s only the tangential component of the force that’s doing work, as evidenced by the formula F•ds = |F|·|ds|·cosθ = Ft·ds, and as depicted below.

[Illustration: only the tangential component Ft = |F|·cosθ of the force does work along the path element ds.]

Now, this allowed us to describe the field in terms of the (electric) potential Φ and the potential differences between two points, like the points a and b in the integral above. We have to choose some reference point, of course, some P0 defining zero potential, which is usually infinitely far away. So we wrote our formula for the work that’s being done on a unit charge, i.e. W(unit), as:

W(unit) = Φ(P) = −∫ E•ds (integral taken from the reference point P0 to P)

2. The world is full of charges, of course, and so we need to add all of their fields. But so now you need a bit of imagination. Let’s reconstruct the world by moving all charges out, and then we bring them back one by one. So we take q1 now, and we bring it back into the now-empty world. Now that does not require any energy, because there’s no field to start with. However, when we take our second charge q2, we will be doing work as we move it against the field or, if it’s an opposite charge, we’ll be taking energy out of the field. Huh? Yes. Think about it. All is symmetric. Just to make sure you’re comfortable with every step we take, let me jot down the formula for the force that’s involved. It’s just the Coulomb force of course:

F1 = (1/4πε0)·(q1·q2/r12²)·e12 = −F2

F1 is the force on charge q1, and F2 is the force on charge q2. Now, q1 and q2 may attract or repel each other, but the two forces will always be equal and opposite. The e12 vector makes sure the directions and signs come out alright, as it’s the unit vector from q2 to q1 (not from q1 to q2, as you might expect when looking at the order of the indices). So we would need to integrate this for r going from infinity to… Well… The distance between q1 and q2 – wherever they end up as we put them back into the world – so that’s what’s denoted by r12. Now I hate integrals too, but this is an easy one. Just note that ∫ r⁻²·dr = −1/r (plus a constant) and you’ll be able to figure out that what I’ll write now makes sense (if not, I’ll do a similar integral in a moment): the work done in bringing two charges together from a large distance (infinity) is equal to:

U = q1·q2/(4πε0·r12)

So now we should bring in q3 and then q4, of course. That’s easy enough. Bringing the first two charges into that world we had emptied took a lot of time, but now we can automate processes. Trust me: we’ll be done in no time. 🙂 We just need to sum over all of the pairs of charges qi and qj. So we write the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U = Σ(all pairs) qi·qj/(4πε0·rij)

Huh? Can we do that? I mean… Every new charge that we’re bringing in here changes the field, doesn’t it? It does. But it’s the magic of the superposition principle at work here. Our third charge q3 is associated with two pairs in this formula. Think of it: we’ve got the q1q3 and the q2q3 combinations, indeed. Likewise, our fourth charge q4 is to be paired up with three charges now: q1, q2 and q3. This formula takes care of it, and the ‘all pairs’ mention under the summation sign (Σ) reminds us we should watch we don’t double-count pairs: the q1q3 and q3q1 combinations, for example, count for one pair only, obviously. So, yes, we write ‘all pairs’ instead of the usual i, j subscripts. But then, yes, this formula takes care of it. We’re done!

Well… Not really, of course. We’ve still got some way to go before I can introduce the Poynting vector. 🙂 However, to make sure you ‘get’ the energy formula above, let me insert an extremely simple diagram so you’ve got a bit of a visual of what we’re talking about.

[Diagram: a few charges qi, with the distances rij between each pair of them.]
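If you like to see the formula at work, here’s a minimal numerical sketch of that ‘all pairs’ sum (the charges and positions are made up for illustration):

```python
# A minimal sketch of U = sum over all pairs of qi*qj/(4*pi*eps0*rij).
import itertools
import numpy as np

EPS0 = 8.854e-12  # vacuum permittivity, in C^2/(N*m^2)

def electrostatic_energy(charges, positions):
    """Total energy of a system of point charges, summed over unordered pairs."""
    U = 0.0
    for i, j in itertools.combinations(range(len(charges)), 2):
        r_ij = np.linalg.norm(positions[i] - positions[j])
        U += charges[i] * charges[j] / (4 * np.pi * EPS0 * r_ij)
    return U

q = [1e-9, -1e-9, 2e-9]  # three made-up charges, in coulomb
r = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)  # positions, in meter
print(electrostatic_energy(q, r))  # total energy, in joule
```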

3. Now, let’s take a step back. We just calculated the (potential) energy of the world (U), which is great. But perhaps we should also be interested in the world’s potential Φ, rather than its potential energy U. Why? Well, we’ll want to know what happens when we bring yet another charge in—from outer space or so. 🙂 And so then it’s easier to know the world’s potential, rather than its energy, because we can calculate the field from it using the E = −∇Φ formula. So let’s de- and re-construct the world once again 🙂 but now we’ll look at what happens with the field and the potential.

We know our first charge created a field with a field strength we calculated as:

E = (1/4πε0)·(q/r²)

So, when bringing in our second charge, we can use our Φ(P) integral to calculate the potential:

Φ(P) = −∫ E•ds (integral taken from P0 to P)

[Let me make a note here, just for the record. You probably think I am being pretty childish when talking about my re-construction of the world in terms of bringing all charges out and then back in again but, believe me, there will be a lot of confusion when we start talking about the energy of one charge, and that confusion can be avoided, to a large extent, when you realize that the idea (I mean the concept itself, really—not its formula) of a potential involves two charges really. Just remember: it’s the first charge that causes the field (and, of course, any charge causes a field), but calculating a potential only makes sense when we’re talking about some other charge. Just make a mental note of it. You’ll be grateful to me later.]

Let’s now combine the integral and the formula for E above. Because you hate integrals as much as I do, I’ll spell it out. The integrand of the Φ(P) integral is E·dr = q/(4πε0r²)·dr, and there’s a minus sign in front of the integral. Let’s bring q/4πε0 out for a while so we can focus on solving ∫(1/r²)dr. Now, ∫(1/r²)dr is equal to −1/r + k, and so the whole antiderivative is −q/(4πε0r) + k. Now, we integrate from r = ∞ to r and put the minus sign back in, so the definite integral is −[−q/(4πε0)]·[1/r − 1/∞] = [q/(4πε0)]·[1/r − 0] = q/(4πε0r). Let me present this somewhat nicer:

Φ(r) = q/(4πε0·r)

You’ll say: so what? Well… We’re done! The only thing we need to do now is add up the potentials of all of the charges in the world. So the formula for the potential Φ at a point which we’ll simply refer to as point 1, is:

Φ(1) = Σ qj/(4πε0·r1j), with the sum running over j = 2, 3,…

Note that our index j starts at 2, otherwise it doesn’t make sense: we’d have a division by zero for the q1/r11 term. Again, it’s an obvious remark, but not thinking about it can cause a lot of confusion down the line.
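As a quick numerical sanity check on that Φ = q/(4πε0·r) result (with made-up values for q and r): integrating the field strength from r out to infinity should reproduce the closed form.

```python
# Integrate E(r') = q/(4*pi*eps0*r'^2) from r to infinity and compare with
# the closed-form potential q/(4*pi*eps0*r).
import numpy as np
from scipy.integrate import quad

EPS0 = 8.854e-12
q, r = 1e-9, 0.5  # a made-up charge (C) and distance (m)

E = lambda rp: q / (4 * np.pi * EPS0 * rp**2)
phi_numeric, _ = quad(E, r, np.inf)   # work per unit charge brought in from infinity
phi_closed = q / (4 * np.pi * EPS0 * r)
print(phi_numeric, phi_closed)        # the two values agree to high precision
```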

4. Now, I am very sorry but I have to inform you that we’ll be talking charge densities and all that shortly, rather than discrete charges, so I have to give you the continuum version of this formula, i.e. the formula we’ll use when we’ve got charge densities rather than individual charges. That sum above then becomes an infinite sum (i.e. an integral), and qj becomes a variable which we write as ρ(2). [That’s totally in line with our index j starting at 2, rather than at 1.] We get:

Φ(1) = ∫ ρ(2)/(4πε0·r12)·dV2 (integral taken over all of space)

Just look at this integral, and try to understand it: we’re integrating over all of space – so we’re integrating the whole world, really 🙂 – and the ρ(2)·dV2 product in the integral is just the charge of an infinitesimally small volume of our world. So the whole integral is just the (infinite) sum of the contributions to the potential (at point 1) of all (infinitesimally small) charges that are around indeed. Now, there’s something funny here. It’s just a mathematical thing: we don’t need to worry about double-counting here. Why? Because we don’t have a product of two charge elements here. Just make a mental note of it, because it will be different in a moment.

Now we’re going to look at the continuum version of our energy formula indeed. Which energy formula? That electrostatic energy formula, which gave the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U = Σ(all pairs) qi·qj/(4πε0·rij)

Its continuum version is the following monster:

U = (1/2)·∫∫ ρ(1)·ρ(2)/(4πε0·r12)·dV1·dV2

Hmm… What kind of integral is that? We’ve got two integration variables here: dV1 and dV2. Yes. And we’ve also got a 1/2 factor now, because we do not want to double-count and, unfortunately, there is no convenient way of writing an integral like this that keeps track of the pairs. It’s a so-called double integral, but I’ll let you look up the math yourself. In any case, we can simplify this integral so you don’t need to worry about it too much. How do we simplify it? Well… Just look at that integral we got for Φ(1): we calculated the potential at point 1 by integrating the ρ(2)·dV2 product over all of space, so the integral above can be written as:

U = (1/2)·∫ ρ(1)·Φ(1)·dV1

But so this integral integrates the ρ(1)·Φ(1)·dV1 product over all of space, so that’s over all points in space. So we can just drop the index and write the whole thing as the integral of ρ·Φ·dV over all of space:

U = (1/2)·∫ ρ·Φ·dV
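To see where that 1/2 factor comes from in a discrete setting, here’s a tiny sketch (made-up charges and positions): summing qi·Φ(i) over all charges counts every pair twice, so half of that sum reproduces the pair-wise formula exactly.

```python
# The discrete analog of U = (1/2) * integral of rho*phi: half the sum of
# qi*phi(i), where phi(i) is the potential at charge i from all the others.
import itertools
import numpy as np

EPS0 = 8.854e-12
q = np.array([1e-9, -2e-9, 3e-9])               # made-up charges
pos = np.random.default_rng(0).random((3, 3))   # made-up positions

def pair_sum():
    return sum(q[i] * q[j] / (4 * np.pi * EPS0 * np.linalg.norm(pos[i] - pos[j]))
               for i, j in itertools.combinations(range(3), 2))

def half_q_phi():
    U = 0.0
    for i in range(3):
        phi_i = sum(q[j] / (4 * np.pi * EPS0 * np.linalg.norm(pos[i] - pos[j]))
                    for j in range(3) if j != i)  # potential at i from the others
        U += 0.5 * q[i] * phi_i
    return U

print(pair_sum(), half_q_phi())  # identical: the 1/2 undoes the double-counting
```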

5. It’s time for the hat-trick now. The equation above is mathematically equivalent to the following equation:

U = (ε0/2)·∫ E•E·dV

Huh? Yes. Let me make two remarks here. First on the math: the E = −∇Φ formula allows you to write the integrand of the integral above as E•E = (−∇Φ)•(−∇Φ) = (∇Φ)•(∇Φ). And then you may or may not remember that, when substituting E = −∇Φ in Maxwell’s first equation (∇•E = ρ/ε0), we got the following equality: ρ = −ε0·∇•(∇Φ) = −ε0·∇²Φ, so we can write ρΦ as −ε0·Φ·∇²Φ. However, that still doesn’t show the two integrals are the same thing. The proof is actually rather involved, so I’ll refer you to the post I mentioned, where you can check it.

The second remark is much more fundamental. The two integrals are mathematically equivalent, but are they also physically equivalent? What do I mean by that? Well… Look at it. The second integral implies that we can look at (ε0/2)·E•E = ε0E²/2 as an energy density, which we’ll denote by u, so we write:

u = (ε0/2)·E•E = ε0E²/2

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).

[Illustration (from Feynman): the energy density u inside a little volume element dV of the field.]

Now the question: what is the reality of that formula? Indeed, what we did when calculating U amounted to summarizing the whole Universe in some number U – and that’s kinda nice, of course! – but then what? Is u = ε0E²/2 anything real? Well… That’s what this post is about. So we’re finished with the introduction now. 🙂

Energy density and energy flow in electrodynamics

Before giving you any more formulas, let me answer the question right away: there is no doubt, in the classical theory of electromagnetism at least, that the energy density u is something very real. It has to be, because of the charge conservation law. Charges cannot just disappear in space, to then re-appear somewhere else. The charge conservation law is written as ∇•j = −∂ρ/∂t, and that makes it clear it’s a local conservation law. Therefore, charges can only disappear and re-appear through some current. We write dQ1/dt = ∫ (j•n)·da = −dQ2/dt, and here’s the simple illustration that comes with it:

[Illustration: charge flowing out of one region and into another through the surface that separates them.]

So we do not allow for any ‘non-local’ interactions here! Therefore, we say that, if energy goes away from a region, it’s because it flows away through the boundaries of that region. So that’s what the Poynting formulas are all about, and so I want to be clear on that from the outset.

Now, to get going with the discussion, I need to give you the formula for the energy density in electrodynamics. Its shape won’t surprise you:

u = ε0E²/2 + ε0c²B²/2 = (ε0/2)·(E•E + c²·B•B)

However, it’s just like the electrostatic formula: it takes quite a bit of juggling to get this from our electrodynamic equations, so, if you want to see how it’s done, I’ll refer you to Feynman. Indeed, I feel the derivation doesn’t matter all that much, because the formula itself is very intuitive: it’s really the thing everyone knows about a wave, electromagnetic or not: the energy in it is proportional to the square of its amplitude, and so that’s E•E = E² and B•B = B². Now, you also know that, for an electromagnetic wave, the magnitude of B is 1/c times that of E, so cB = E, and so that explains the extra c² factor in the second term.

The second formula is also very intuitive. Let me write it down:

∂u/∂t = −∇•S

Just look at it: u is the energy density, so that’s the amount of energy per unit volume at a given point, and so whatever flows out of that point must represent its time rate of change. As for the −∇•S expression… Well… Sorry, I can’t keep re-explaining things: the ∇• operator is the divergence, and it gives us the magnitude of a (vector) field’s source or sink at a given point. ∇•S is a scalar, and if it’s positive in a region, then that region is a source. Conversely, if it’s negative, then it’s a sink. To be precise, the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. So, in this case, it gives us the volume density of the flux of S. As you can see, the formula has exactly the same shape as ∇•j = −∂ρ/∂t.

So what is S? Well… Think about the more general formula for the flux out of some closed surface, which we get from integrating over the volume enclosed. It’s just Gauss’ Theorem:

∮ C•n·da = ∫ ∇•C·dV (the surface integral is taken over the closed surface bounding the volume of the second integral)

Just replace C by E, and think about what it meant: the flux of E was the field strength multiplied by the surface area, so it was the total flow of E. Likewise, S represents the flow of (field) energy. Let me repeat this, because it’s an important result:

S represents the flow of field energy.

Huh? What flow? Per unit area? Per second? How do you define such ‘flow’? Good question. Let’s do a dimensional analysis:

  1. E is measured in newton per coulomb, so [E•E] = [E²] = N²/C².
  2. B is measured in (N/C)/(m/s). [Huh? Well… Yes. I explained that a couple of times already. Just check it in my introduction to electric circuits.] So we get [B•B] = [B²] = (N²/C²)·(s²/m²), but the dimension of our c² factor is m²/s², so we’re left with N²/C². That’s nice, because the two terms we’re adding need to come in the same units.
  3. Now we need to look at ε0. That constant usually ‘fixes’ our units, but can we trust it to do the same now? Let’s see… One of the many ways in which we can express its dimension is [ε0] = C²/(N·m²), so if we multiply that with N²/C², we find that u is expressed in N/m². Wow! That’s kinda neat. Why? Well… Just multiply with m/m and its dimension becomes N·m/m³ = J/m³, so that’s joule per cubic meter: u has got the right unit for something that’s supposed to measure energy density!
  4. OK. Now, we take the time rate of change of u, and so both the right and left side of our ∂u/∂t = −∇•S formula are expressed in (J/m³)/s, which means that the dimension of S itself must be J/(m²·s). Just check it by writing it all out: ∇•S = ∂Sx/∂x + ∂Sy/∂y + ∂Sz/∂z, and so that’s something per meter so, to get the dimension of S itself, we need to go from cubic meter to square meter. Done! Let me highlight the grand result:

S is the energy flow per unit area and per second.
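If you’d like to see that bookkeeping done mechanically, here’s a small sketch that represents each unit as a dict of exponents over the N, C, m and s symbols used above, and re-traces the steps in the list:

```python
# Dimensional analysis by hand: a unit is a dict of exponents over the
# symbols N, C, m and s; multiplying units adds the exponents.
from collections import Counter

def mul(*units):
    out = Counter()
    for u in units:
        out.update(u)   # adds exponents (negative values are fine)
    return {k: v for k, v in out.items() if v != 0}

E    = {'N': 1, 'C': -1}                    # newton per coulomb
B    = {'N': 1, 'C': -1, 's': 1, 'm': -1}   # (N/C)/(m/s)
c2   = {'m': 2, 's': -2}                    # c squared
eps0 = {'C': 2, 'N': -1, 'm': -2}           # C^2/(N*m^2)

u_E = mul(eps0, E, E)        # (eps0/2)*E^2
u_B = mul(eps0, c2, B, B)    # (eps0*c^2/2)*B^2
print(u_E, u_B)              # both: {'N': 1, 'm': -2}, i.e. N/m^2 = J/m^3
```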

Now we’ve got its magnitude and its dimension, but what is its direction? Indeed, we’ve been writing S as a vector, but… Well… What’s its direction indeed?

Well… Hmm… I referred you to Feynman for the derivation of that u = ε0E²/2 + ε0c²B²/2 formula for u, and the direction of S – I should actually say, its complete definition – comes out of that derivation as well. So… Well… I think you should just believe what I’ll be writing here for S:

S = ε0c²·E×B

So it’s the vector cross product of E and B with ε0c² thrown in. It’s a simple formula really, and because I didn’t drag you through the whole argument, you should just quickly do a dimensional analysis again—just to make sure I am not talking too much nonsense. 🙂 So what’s the direction? Well… You just need to apply the usual right-hand rule:

[Illustration: the right-hand rule giving the direction of the E×B cross product.]

OK. We’re done! This S vector, which – let me repeat it – represents the energy flow per unit area and per second, is what is referred to as Poynting’s vector, and it’s a most remarkable thing, as I’ll show now. Let’s think about the implications of this thing.
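Before we get to the implications, here’s a minimal numpy sketch of the definition for a plane wave travelling in the x-direction (the field strength is made up): S indeed points in the direction of propagation, and its magnitude equals the energy density u times c, i.e. the energy density moving along at the speed of light.

```python
# S = eps0*c^2*(E x B) for a plane wave: E along y, B = E/c along z.
import numpy as np

EPS0, C = 8.854e-12, 2.998e8

E = np.array([0.0, 100.0, 0.0])        # N/C, along y (made-up amplitude)
B = np.array([0.0, 0.0, 100.0 / C])    # magnitude E/c, along z

S = EPS0 * C**2 * np.cross(E, B)       # energy flow, W/m^2
u = 0.5 * EPS0 * E @ E + 0.5 * EPS0 * C**2 * B @ B   # energy density, J/m^3

print(S)                           # points along x: the direction of propagation
print(np.linalg.norm(S), u * C)    # |S| = u*c
```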

Poynting’s vector in electrodynamics

The S vector is actually quite similar to the heat flow vector h, which we presented when discussing vector analysis and vector operators. The heat flow out of a surface element da is the area times the component of h perpendicular to da, so that’s (h•n)·da = hn·da. Likewise, we can write (S•n)·da = Sn·da. The units of S and h are also the same: joule per second and per square meter or, using the definition of the watt (1 W = 1 J/s), watt per square meter. In fact, if you google a bit, you’ll find that both h and S are referred to as a flux density:

  1. The heat flow vector h is the heat flux density vector, from which we get the heat flux through an area through the (h•n)·da = hn·da product.
  2. The energy flow vector S is the energy flux density vector, from which we get the energy flux through the (S•n)·da = Sn·da product.

The big difference, of course, is that we get h from a simpler vector equation:

h = −κ·∇T ⇔ (hx, hy, hz) = −κ·(∂T/∂x, ∂T/∂y, ∂T/∂z)

The vector equation for S is more complicated:

S = ε0c²·E×B

So it’s a vector product. Note that S will be zero if E = 0 and/or if B = 0. So S = 0 in electrostatics, i.e. when all charges are at rest and there are no currents. Let’s examine Feynman’s examples.

The illustration below shows the geometry of the E, B and S vectors for a light wave. It’s neat, and totally in line with what we wrote on the radiation pressure, or the momentum of light. So I’ll refer you to that post for an explanation, and to Feynman himself, of course.

[Illustration: the geometry of the E, B and S vectors for a light wave: S points in the direction of propagation.]

OK. The situation here is rather simple. Feynman gives a few others examples that are not so simple, like that of a charging capacitor, which is depicted below.

[Illustration: a charging capacitor: E between the plates, B circling around, and S pointing radially inward, toward the axis.]

The Poynting vector points inwards here, toward the axis. What does it mean? It means the energy isn’t actually coming down the wires, but from the space surrounding the capacitor. 

What? I know. It’s completely counter-intuitive, at first that is. You’d think it’s the charges. But it actually makes sense. The illustration below shows how we should think of it. The charges outside of the capacitor are associated with a weak, enormously spread-out field that surrounds the capacitor. So if we bring them to the capacitor, that field gets weaker, and the field between the plates gets stronger. So the field energy which is way out moves into the space between the capacitor plates indeed, and so that’s what Poynting’s vector tells us here.

[Illustration: the weak, enormously spread-out field of the charges far away from the capacitor, which weakens further as the field between the plates builds up.]

Hmm… Yes. You can be skeptical. You should be. But that’s how it works. The next illustration looks at a current-carrying wire itself. Let’s first look at the B and E vectors. You’re familiar with the magnetic field around a wire, so the B vector makes sense, but what about the electric field? Aren’t wires supposed to be electrically neutral? It’s a tricky question, and we handled it in our post on the relativity of fields. The positive and negative charges in a wire should cancel out, indeed, but then it’s the negative charges that move and, because of their movement, we have the relativistic effect of length contraction, so the volumes are different, and the positive and negative charge densities do not cancel out: the wire appears to be charged, so we do have a mix of E and B! Let me quickly give you the formula: E = λ/(2πε0·r), with λ the (apparent) charge per unit length, so it’s the same formula as for a long line of charge, or for a long uniformly charged cylinder.

So we have a non-zero E and B and, hence, a non-zero Poynting vector S, whose direction is radially inward, so there is a flow of energy into the wire, all around. What the hell? Where does it go? Well… There are a few possibilities here: the charges need kinetic energy to move, or they increase their potential energy as they move towards the terminals of our capacitor to increase the charge on the plates or, much more mundane, the energy may leave again in the form of heat. It looks crazy, but that’s how it is really. In fact, the more you think about it, the more logical it all starts to sound. Energy must be conserved locally, and so it’s just field energy going in and re-appearing in some other form. So it does make sense. But, yes, it’s weird, because no one bothered to teach us this in school. 🙂

[Illustration: a current-carrying wire: B circles the wire, E is radial, and the Poynting vector S points into the wire all around.]
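For the ‘heat’ possibility, here’s a back-of-the-envelope check with made-up numbers, following Feynman’s version of this example, in which the relevant E at the surface is the axial field that drives the current (i.e. the voltage drop per meter of wire): the inward Poynting flux, integrated around the surface, reproduces the I²R dissipation exactly.

```python
# At the wire surface: E = I*R' (axial, V/m), B = mu0*I/(2*pi*a) (tangential),
# and S = eps0*c^2*E*B points radially inward.
import numpy as np

MU0, EPS0, C = 4e-7 * np.pi, 8.854e-12, 2.998e8

I = 2.0          # current, A (made up)
R_per_m = 0.1    # resistance per meter of wire, ohm/m (made up)
a = 1e-3         # wire radius, m (made up)

E = I * R_per_m                  # axial electric field at the surface, V/m
B = MU0 * I / (2 * np.pi * a)    # magnetic field at the surface, T

S = EPS0 * C**2 * E * B          # magnitude of the inward Poynting vector, W/m^2
power_in = S * 2 * np.pi * a     # flux through the surface, per meter of wire
print(power_in, I**2 * R_per_m)  # both: 0.4 W per meter of wire
```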

The ‘craziest’ example is the one below: we’ve got a charge and a magnet here. All is at rest. Nothing is moving… Well… I’ll correct that in a moment. 🙂 The charge (q) causes a (static) Coulomb field, while our magnet produces the usual magnetic field, whose shape we (should) recognize: it’s the usual dipole field. So E and B are not changing. But so when we calculate our Poynting vector, we see there is a circulation of S. The E×B product is not zero. So what’s going on here?

[Illustration: a point charge sitting next to a permanent magnet: the Poynting vector S circulates around and around.]

Well… There is no net change in energy with time: the energy just circulates around and around. Everything which flows into one volume flows out again. As Feynman puts it: “It is like incompressible water flowing around.” What’s the explanation? Well… Let me copy Feynman’s explanation of this ‘craziness’:

“Perhaps it isn’t so terribly puzzling, though, when you remember that what we called a “static” magnet is really a circulating permanent current. In a permanent magnet the electrons are spinning permanently inside. So maybe a circulation of the energy outside isn’t so queer after all.”

So… Well… It looks like we do need to revise some of our ‘intuitions’ here. I’ll conclude this post by quoting Feynman on it once more:

“You no doubt get the impression that the Poynting theory at least partially violates your intuition as to where energy is located in an electromagnetic field. You might believe that you must revamp all your intuitions, and, therefore have a lot of things to study here. But it seems really not necessary. You don’t need to feel that you will be in great trouble if you forget once in a while that the energy in a wire is flowing into the wire from the outside, rather than along the wire. It seems to be only rarely of value, when using the idea of energy conservation, to notice in detail what path the energy is taking. The circulation of energy around a magnet and a charge seems, in most circumstances, to be quite unimportant. It is not a vital detail, but it is clear that our ordinary intuitions are quite wrong.”

Well… That says it all, I guess. As far as I am concerned, I feel the Poynting vector makes things actually easier to understand. Indeed, the E and B vectors were quite confusing, because we had two of them, and the magnetic field is, frankly, a weird thing. Just think about the units in which we’re measuring B: (N/C)/(m/s). I can’t imagine what a unit like that could possibly represent, so I must assume you can’t either. But so now we’ve got this Poynting vector that combines both E and B, and which represents the flow of the field energy. Frankly, I think that makes a lot of sense, and it’s surely much easier to visualize than E and/or B. [Having said that, of course, you should note that E and B do have their value, obviously, if only because they represent the lines of force, and so that’s something very physical too, of course. I guess it’s a matter of taste, to some extent, but so I’d tend to soften Feynman’s comments on the supposed ‘craziness’ of S.]

In any case… The next thing I should discuss is field momentum. Indeed, if we’ve got flow, we’ve got momentum. But I’ll leave that for my next post. This topic can’t be exhausted in one post only, indeed. 🙂 So let me conclude this post with a very nice illustration I got from the Wikipedia article on the Poynting vector. It shows the Poynting vector around a voltage source and a resistor, as well as what’s going on in-between. [Note that the magnetic field is given by the field vector H, which is related to B as follows: B = μ0(H + M), with M the magnetization of the medium. B and H are obviously just proportional in empty space, with μ0 as the proportionality constant.]

[Illustration (Wikipedia): the Poynting vector field around a DC circuit with a voltage source and a resistor.]

Re-visiting relativity and four-vectors: the proper time, the tensor and the four-force

Pre-script (dated 26 June 2020): Our ideas have evolved into a full-blown realistic (or classical) interpretation of all things quantum-mechanical. In addition, I note the dark force has amused himself by removing some material. So no use to read this. Read my recent papers instead. 🙂

Original post:

My previous post explained how four-vectors transform from one reference frame to the other. Indeed, a four-vector is not just some one-dimensional array of four numbers: it represents something—a physical vector that… Well… Transforms like a vector. 🙂 So what vectors are we talking about? Let’s see what we have:

  1. We knew the position four-vector already, which we’ll write as xμ = (ct, x, y, z) = (ct, x).
  2. We also proved that Aμ = (Φ, Ax, Ay, Az) = (Φ, A) is a four-vector: it’s referred to as the four-potential.
  3. We also know the momentum four-vector from the Lectures on special relativity. We write it as pμ = (E, px, py, pz) = (E, p), with E = γm0c² and p = γm0v, where γ = (1−v²/c²)^(−1/2) or, for c = 1, E = γm0 and γ = (1−v²)^(−1/2).

To show that it’s not just a matter of adding some fourth t-component to a three-vector, Feynman gives the example of the four-velocity vector. We have vx = dx/dt, vy = dy/dt and vz = dz/dt, but a vμ = (d(ct)/dt, dx/dt, dy/dt, dz/dt) = (c, dx/dt, dy/dt, dz/dt) ‘vector’ is, obviously, not a four-vector. [Why obviously? The inner product vμvμ is not invariant.] In fact, Feynman ‘fixes’ the problem by noting that ct, x, y and z have the ‘right behavior’, but the d/dt operator doesn’t. The d/dt operator is not an invariant operator. So how does he fix it then? He tries the (1−v²/c²)^(−1/2)·d/dt operator and, yes, it turns out we do get a four-vector then. In fact, we get that four-velocity vector uμ that we were looking for:

uμ = (1/(1−v²)^(1/2), vx/(1−v²)^(1/2), vy/(1−v²)^(1/2), vz/(1−v²)^(1/2))

[Note we assume we’re using equivalent time and distance units now, so c = 1 and v/c reduces to a new variable v.]

Now how do we know this is a four-vector? How can we prove this one? It’s simple. We can get it from our pμ = (E, p) by dividing it by m0, which is an invariant scalar in four dimensions too. Now, it is easy to see that a division by an invariant scalar does not change the transformation properties. So just write it all out, and you’ll see that pμ/m0 = uμ and, hence, that uμ is a four-vector too. 🙂

We’ve got an interesting thing here actually: division by an invariant scalar, or applying that (1−v²/c²)^(−1/2)·d/dt operator, which is referred to as an invariant operator, to a four-vector will give us another four-vector. Why is that? Let’s switch to equivalent time and distance units, so c = 1, to simplify the analysis that follows.

The invariant (1−v²)^(−1/2)·d/dt operator and the proper time s

Why is the (1−v²)^(−1/2)·d/dt operator invariant? Why does it ‘fix’ things? Well… Think about the invariant spacetime interval (Δs)² = Δt² − Δx² − Δy² − Δz² going to the limit (ds)² = dt² − dx² − dy² − dz². Of course, we can and should relate this to an invariant quantity s = ∫ ds. Just like Δs, this quantity also ‘mixes’ time and distance. Now, we could try to associate some derivative d/ds with it because, as Feynman puts it, “it should be a nice four-dimensional operation because it is invariant with respect to a Lorentz transformation.” Yes. It should be. So let’s relate ds to dt and see what we get. That’s easy enough: dx = vx·dt, dy = vy·dt, dz = vz·dt, so we write:

(ds)² = dt² − vx²·dt² − vy²·dt² − vz²·dt² ⇔ (ds)² = dt²·(1 − vx² − vy² − vz²) = dt²·(1 − v²)

and, therefore, ds = dt·(1−v²)^(1/2). So our operator d/ds is equal to (1−v²)^(−1/2)·d/dt, and we can apply it to any four-vector, as we are sure that, as an invariant operator, it’s going to give us another four-vector. I’ll highlight the result, because it’s important:

The d/ds = (1−v²)^(−1/2)·d/dt operator is an invariant operator for four-vectors.

For example, if we apply it to xμ = (t, x, y, z), we get the very same four-velocity vector uμ:

dxμ/ds = uμ = pμ/m0

Now, if you’re somewhat awake, you should ask yourself: what is this s, really, and what is this operator all about? Our new function s = ∫ ds is not the distance function, as it’s got both time and distance in it. Likewise, the invariant operator d/ds = (1−v²)^(−1/2)·d/dt has both time and distance in it (the distance is implicit in the v² factor). Still, it is referred to as the proper time along the path of a particle. Now why is that? If it’s got distance and time in it, why don’t we call it the ‘proper distance-time’ or something?

Well… The invariant quantity s actually is the time that would be measured by a clock that’s moving along, in spacetime, with the particle. Just think of it: in the reference frame of the moving particle itself, Δx, Δy and Δz must be zero, because it’s not moving in its own reference frame. So (Δs)² = Δt² − Δx² − Δy² − Δz² reduces to (Δs)² = Δt², and so we’re only adding time to s. Of course, this view of things implies that the proper time itself is fixed only up to some arbitrary additive constant, namely the setting of the clock at some event along the ‘world line’ of our particle, which is its path in four-dimensional spacetime. But… Well… In a way, s is the ‘genuine’ or ‘proper’ time coming with the particle’s reference frame, and so that’s why Einstein called it that. You’ll see (later) that it plays a very important role in general relativity theory (which is a topic we haven’t discussed yet: we’ve only touched special relativity, so no gravity effects).
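If you want to see the proper time at work numerically, here’s a minimal sketch (c = 1, with a made-up speed profile): it integrates ds = (1−v²)^(1/2)·dt along a path and shows that the on-board clock lags behind the coordinate time.

```python
# Proper time s = integral of sqrt(1 - v^2) dt along a particle's path (c = 1).
import numpy as np

t = np.linspace(0.0, 10.0, 100_001)   # coordinate time
v = 0.8 * np.sin(0.5 * t)             # some made-up speed profile, |v| < 1

ds = np.sqrt(1.0 - v**2)              # ds/dt = sqrt(1 - v^2)
s = np.trapz(ds, t)                   # proper time along the path

print(s, t[-1])                       # s < 10: the moving clock runs slow
```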

OK. I know this is simple and complicated at the same time: the math is (fairly) easy but, yes, it may be difficult to ‘understand’ this in some kind of intuitive way. But let’s move on.

The four-force vector fμ

We know the relativistically correct equation for the motion of some charge q. It’s just Newton’s Law F = dp/dt = d(mv)/dt. The only difference is that we are not assuming that m is some constant. Instead, we use the p = γm0v formula to get:

F = dp/dt = d[m0·v/(1−v²)^(1/2)]/dt

How can we get a four-vector for the force? It turns out that we get it when applying our new invariant operator to the momentum four-vector pμ = (E, p), so we write: fμ = dpμ/ds. But pμ = m0uμ = m0dxμ/ds, so we can re-write this as fμ = d(m0·dxμ/ds)/ds, which gives us a formula which is reminiscent of the Newtonian F = ma equation:

fμ = d(m0·dxμ/ds)/ds = m0·d²xμ/ds²

What is this thing? Well… It’s not so difficult to verify that the x-, y- and z-components are just our old-fashioned Fx, Fy and Fz multiplied by (1−v²)^(−1/2). The t-component is (1−v²)^(−1/2)·dE/dt. Now, dE/dt is the time rate of change of energy and, hence, it’s equal to the rate of doing work on our charge, which is equal to F•v. So we can write fμ as:

fμ = ((1−v²)^(−1/2)·F•v, (1−v²)^(−1/2)·F)

The force and the tensor

We will now derive that formula which we ended the previous post with. We start with calculating the spacelike components of fμ from the Lorentz formula F = q(E + v×B). [The terminology is nice, isn’t it? The spacelike components of the four-force vector! Now that sounds impressive, doesn’t it? But so… Well… It’s really just the old stuff we know already.] So we start with fx = (1−v²)^(−1/2)·Fx, and write it all out:

fx = q·[(1−v²)^(−1/2)·Ex + (1−v²)^(−1/2)·vy·Bz − (1−v²)^(−1/2)·vz·By]

What a monster! But, hey! We can ‘simplify’ this by substituting stuff by (1) the t-, x-, y- and z-components of the four-velocity vector uμ and (2) the components of our tensor Fμν = [Fij] = [∇iAj − ∇jAi] with i, j = t, x, y, z. We’ll also pop in the diagonal Fxx = 0 element, just to make sure it’s all there. We get:

fx = q·(ut·Fxt − ux·Fxx − uy·Fxy − uz·Fxz)

Looks better, doesn’t it? 🙂 Of course, it’s just the same, really. This is just an exercise in symbolism. Let me insert the electromagnetic tensor we defined in our previous post, just as a reminder of what that Fμν matrix actually is:

Fμν (rows and columns ordered t, x, y, z):

[  0    −Ex   −Ey   −Ez ]
[  Ex    0    −Bz    By ]
[  Ey    Bz    0    −Bx ]
[  Ez   −By    Bx    0  ]

(This is the layout that goes with the Fμν = ∇μAν − ∇νAμ definition, with Fxt = Ex and Fxy = −Bz, as used above.)

If you read my previous post, this matrix – or the concept of a tensor – has no secrets for you. Let me briefly summarize it, because it’s an important result as well. The tensor is (a generalization of) the cross-product in four-dimensional space. We take two vectors: aμ = (at, ax, ay, az) and bμ = (bt, bx, by, bz) and then we take cross-products of their components just like we did in three-dimensional space, so we write Tij = aibj − ajbi. Now, it’s easy to see that this combination implies that Tij = − Tji and that Tii = 0, which is why we only have six independent numbers out of the 16 possible combinations, and which is why we’ll get a so-called anti-symmetric matrix when we organize them in a matrix. In three dimensions, the very same definition of the cross-product Tij gives us 9 combinations, and only 3 independent numbers, which is why we represented our ‘tensor’ as a vector too! In four-dimensional space we can’t do that: six things cannot be represented by a four-vector, so we need to use this matrix, which is referred to as a tensor of the second rank in four dimensions. [When you start using words like that, you’ve come a long way, really. :-)]

[…] OK. Back to our four-force. It’s easy to get a similar one-liner for fy and fz too, of course, as well as for ft. But… Yes, ft… Is it the same thing really? Let me quickly copy Feynman’s calculation for ft:

ft = q·(ut·Ftt − ux·Ftx − uy·Fty − uz·Ftz) = q·(ux·Ex + uy·Ey + uz·Ez) = (1−v²)^(−1/2)·q·(E•v)

So the question is whether this amounts to the same thing as the (1−v²)^(−1/2)·F•v component we found above.

It does: remember that v×B and v are orthogonal, and so their dot product is zero indeed. So, to make a long story short, the four equations – one for each component of the four-force vector fμ – can be summarized in the following elegant equation:

fμ = q·uν·Fμν

Writing this all requires a few conventions, however. For example, Fμν is a 4×4 matrix and so uν has to be written as a 1×4 vector. And the formulas for the fx and ft components also make it clear that we want to use the +−−− signature here, so the convention for the signs in the uνFμν product is the same as that for the scalar product aμbμ. So, in short, you really need to interpret what’s being written here.
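If you’d like to see that contraction at work, here’s a small numerical sketch (c = 1), assuming the component layout of the Fμν matrix shown above and using a diagonal matrix for the +−−− signs. All field values and the velocity are made up. It checks the spacelike part of fμ against (1−v²)^(−1/2)·q(E + v×B), and the timelike part against (1−v²)^(−1/2)·q·E•v.

```python
# f_mu = q * u_nu * F_mu_nu (c = 1, +--- signature), checked against the
# ordinary Lorentz force F = q*(E + v x B).
import numpy as np

def em_tensor(E, B):
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return np.array([[0.0, -Ex, -Ey, -Ez],
                     [Ex,  0.0, -Bz,  By],
                     [Ey,  Bz,  0.0, -Bx],
                     [Ez, -By,  Bx,  0.0]])   # rows/columns ordered t, x, y, z

q = 1.0
E = np.array([0.3, -0.1, 0.2])   # made-up fields and velocity
B = np.array([0.0,  0.4, 0.1])
v = np.array([0.5,  0.2, -0.1])

gamma = 1.0 / np.sqrt(1.0 - v @ v)
u = gamma * np.array([1.0, *v])               # four-velocity u_mu
eta = np.diag([1.0, -1.0, -1.0, -1.0])        # the +--- signs in the contraction

f = q * em_tensor(E, B) @ eta @ u             # four-force f_mu
F = q * (E + np.cross(v, B))                  # ordinary Lorentz force

print(f[1:], gamma * F)                       # spacelike part: gamma*F
print(f[0],  gamma * q * (E @ v))             # timelike part: gamma*F.v = gamma*q*E.v
```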

A more important question, perhaps, is: what can we do with it? Well… Feynman’s evaluation of the usefulness of this formula is rather succinct: “Although it is nice to see that the equations can be written that way, this form is not particularly useful. It’s usually more convenient to solve for particle motions by using the F = q(E + v×B) = (1−v2)−1/2·d(m0v)/dt equations, and that’s what we will usually do.”

Having said that, this formula really makes good on the promise I started my previous post with: we wanted a formula, some mathematical construct, that effectively presents the electromagnetic force as one force, as one physical reality. So… Well… Here it is! 🙂

Well… That’s it for today. Tomorrow we’ll talk about energy and about a very mysterious concept—the electromagnetic mass. That should be fun! So I’ll c u tomorrow! 🙂

Relativistic transformations of fields and the electromagnetic tensor

Pre-script (dated 26 June 2020): Our ideas have evolved into a full-blown realistic (or classical) interpretation of all things quantum-mechanical. In addition, I note the dark force has amused himself by removing some material. So no use to read this. Read my recent papers instead. 🙂

Original post:

We’re going to do a very interesting piece of math here. It’s going to bring a lot of things together. The key idea is to present a mathematical construct that effectively presents the electromagnetic force as one force, as one physical reality. Indeed, we’ve been saying repeatedly that electromagnetism is one phenomenon only, but we’ve been writing it always as something involving two vectors: the electric field vector E and the magnetic field vector B. Of course, Lorentz’ force law F = q(E + v×B) makes it clear we’re talking one force only but… Well… There is a way of writing it all up that is much more elegant.

I have to warn you though: this post doesn’t add anything to the physics we’ve seen so far: it’s all math, really and, to a large extent, math only. So if you read this blog because you’re interested in the physics only, then you may just as well skip this post. Having said that, the mathematical concept we’re going to present is that of the tensor and… Well… You’ll have to get to know that animal sooner or later anyway, so you may just as well give it a try right now, and see whatever you can get out of this post.

The concept of a tensor further builds on the concept of the vector, which we liked so much because it allows us to write the laws of physics as vector equations, which do not change when going from one reference frame to another. In fact, we’ll see that a tensor can be described as a ‘special’ vector cross product (to be precise, we’ll show that a tensor is a ‘more general’ cross product, really). So the tensor and vector concepts are very closely related, but then… Well… If you think about it, the concept of a vector and the concept of a scalar are closely related, too! So we’re just moving up the value chain, so to speak: from scalar fields to vector fields to… Well… Tensor fields! And in quantum mechanics, we’ll introduce spinors, and so we also have spinor fields! Having said that, don’t worry about tensor fields. Let’s first try to understand tensors tout court. 🙂

So… Well… Here we go. Let me start with it all by reminding you of the concept of a vector, and why we like to use vectors and vector equations.

The invariance of physics and the use of vector equations

What’s a vector? You may think, naively, that any one-dimensional array of numbers is a vector. But… Well… No! In math, we may, effectively, refer to any one-dimensional array of numbers as a ‘vector’, perhaps, but in physics, a vector does represent something real, something physical, and so a vector is only a vector if it transforms like a vector under the transformation rules that apply when going from one frame of reference, i.e. one coordinate system, to another. Examples of vectors in three dimensions are: the velocity vector v, or the momentum vector p = m·v, or the position vector r.

Needless to say, the same can be said of scalars: mathematicians may define a scalar as just any real number, but it’s not like that in physics. A scalar in physics refers to something real, i.e. a scalar field, like the temperature (T) inside of a block of material. In fact, think about your first vector equation: it may have been the one determining the heat flow (h), i.e. h = −κ·∇T = (−κ·∂T/∂x, −κ·∂T/∂y, −κ·∂T/∂z). It immediately shows how scalar and vector fields are intimately related.

Now, when discussing the relativistic framework of physics, we introduced vectors in four dimensions, i.e. four-vectors. The most basic four-vector is the spacetime four-vector R = (ct, x, y, z), which is often referred to as an event, but it’s just a point in spacetime, really. So it’s a ‘point’ with a time as well as a spatial dimension, so it also has t in it, besides x, y and z. It is also known as the position four-vector but, again, you should think of a ‘position’ that includes time! Of course, we can re-write R as R = (ct, r), with r = (x, y, z), so here we sort of ‘break up’ the four-vector in a scalar and a three-dimensional vector, which is something we’ll do from time to time, indeed. 🙂

We also have a displacement four-vector, which we can write as ΔR = (c·Δt, Δr). There are other four-vectors as well, including the four-velocity, the four-momentum and the four-force four-vectors, which we’ll discuss later (in the last section of this post).

So it’s just like using three-dimensional vectors in three-dimensional physics, or ‘Newtonian’ physics, I should say: the use of four-vectors is going to allow us to write the laws of physics using vector equations, but in four dimensions, rather than three, so we get the ‘Einsteinian’ physics, the real physics, so to speak—or the relativistically correct physics, I should say. And so these four-dimensional vector equations will also not change when going from one reference frame to another, and so our four-vectors will be vectors indeed, i.e. they will transform like vectors under the transformation rules that apply when going from one frame of reference, i.e. one coordinate system, to another.

What transformation? Well… In Newtonian or Galilean physics, we had translations and rotations and what have you, but what we are interested in right now are ‘Einsteinian’ transformations of coordinate systems, so these have to ensure that all of the laws of physics that we know of, including the principle of relativity, still look the same. You’ve seen these transformation rules. We don’t call them the ‘Einsteinian’ transformation rules, but the Lorentz transformation rules, because it was a Dutch physicist (Hendrik Lorentz) who first wrote them down. So these rules are very different from the Newtonian or Galilean transformation rules which everyone assumed to be valid until the Michelson-Morley experiment unequivocally established that the speed of light did not respect the Galilean transformation rules. Very different? Well… Yes. In their mathematical structure, that is. Of course, when velocities are low, i.e. non-relativistic, then they yield the same result, approximately, that is. However, I explained that in my post on special relativity, and so I won’t dwell on that here.

Let me just jot down both sets of rules, assuming that the two reference frames move with respect to each other along the x-axis only, so the y- and z-components of the relative velocity u are zero.

[Table: the Lorentz transformation rules (left) next to the Galilean transformation rules (right).]

The Galilean or Newtonian rules are the simple rules on the right. Going from one reference frame to another (let’s call them S and S’ respectively) is just a matter of adding or subtracting speeds: if my car goes 100 km/h, and yours goes 120 km/h, then you will see my car falling behind at a speed of (minus) 20 km/h. That’s it. We could also rotate our reference frame, and our Newtonian vector equations would still look the same. As Feynman notes, smilingly, it’s what a lot of armchair philosophers think relativity theory is all about, but so it’s got nothing to do with it. It’s plain wrong!

In any case, back to vectors and transformations. The key to the so-called invariance of the laws of physics is the use of vectors and vector operators that transform like vectors. For example, if we defined A and B as (Ax, Ay, Az) and (Bx, By, Bz), then we knew that the so-called inner product A•B would look the same in all rotated coordinate systems, so we can write: A•B = A’•B’. So we know that if we have a product like that on both sides of an equation, we’re fine: the equation will have the same form in all rotated coordinate systems. Also, the gradient, i.e. our vector operator ∇ = (∂/∂x, ∂/∂y, ∂/∂z), when applied to a scalar function, gave three quantities that also transform like a vector under rotation. We also defined a vector cross product, which yielded a vector (as opposed to the inner product, i.e. the vector dot product, which yields a scalar):

(a×b)x = ay·bz − az·by, (a×b)y = az·bx − ax·bz, (a×b)z = ax·by − ay·bx

So how does this thing behave under a Galilean transformation? Well… You may or may not remember that we used this cross-product to define the angular momentum L, which was a cross product of the radius vector r and the momentum vector p = mv, as illustrated below. The animation also gives the torque τ, which is, loosely speaking, a measure of the turning force: it’s the cross product of r and F, i.e. the force on the lever-arm.

[Animation (Wikipedia): the angular momentum L = r×p of a particle, and the torque τ = r×F.]

The components of L are:

Lx = y·pz − z·py, Ly = z·px − x·pz, Lz = x·py − y·px

Now, we find that these three numbers, or objects if you want, transform in exactly the same way as the components of a vector. However, as Feynman points out, that’s a matter of ‘luck’ really. It’s something ‘special’. Indeed, you may or may not remember that we distinguished axial vectors from polar vectors. L is an axial vector, while r and p are polar vectors, and so we find that, in three dimensions, the cross product of two polar vectors will always yield an axial vector. Axial vectors are sometimes referred to as pseudovectors, which suggests that they are ‘not so real’ as… Well… Polar vectors, which are sometimes referred to as ‘true’ vectors. However, it doesn’t matter when doing these Newtonian or Galilean transformations: pseudo or true, both vectors transform like vectors. 🙂

But so… Well… We’re actually getting a bit of a heads-up here: if we’d be mixing (or ‘crossing’) polar and axial vectors, or mixing axial vectors only – so if we’d define something involving L and p (rather than r and p), say, or something involving L and τ – then we may not be so lucky, and then we’d have to carefully examine our cross-product, or whatever other product we’d want to define, because its components may not behave like a vector.

Huh? Whatever other product we’d want to define? Why are you saying that? Well… We actually can think of other products. For example, if we have two vectors a = (ax, ay, az) and b = (bx, by, bz), then we’ll have nine possible combinations of their components, which we can write as Tij = aibj. So that’s like Lxy, Lyz and Lzx really. Now, you’ll say: “No. It isn’t. We don’t have nine combinations here. Just three numbers.” Well… Think about it: we actually do have nine Lij combinations too here, as we can write: Lij = ri·pj – rj·pi. It just happens that, with this definition, only three of these combinations Lij are independent. That’s because the other six numbers are either zero or the opposite. Indeed, it’s easy to verify that Lij = –Lji , and Lii  = 0. So… Well… It turns out that the three components of our L = r×p ‘vector’ are actually a subset of a set of nine Lij numbers. So… Well… Think about it. We cannot just do whatever we want with our ‘vectors’. We need to watch out.

In fact, I do not want to get too much ahead of myself, but I can already tell you that the matrix with these nine Tij = aibj combinations is what is referred to as the tensor. To be precise, it’s referred to as a tensor of the second rank in three dimensions. The ‘second rank’, a.k.a. ‘degree’ or ‘order’, refers to the fact that we’ve got two indices, and the ‘three dimensions’ is because we’re using three-dimensional vectors. We’ll soon see that the electromagnetic tensor is also of the second rank, but it’s a tensor in four dimensions. In any case, I should not get ahead of myself. Just note what I am saying here: the tensor is like a ‘new’ product of two vectors, a new type of ‘cross’ product really (because we’re mixing the components, so to say), but it doesn’t yield a vector: it yields a matrix. For three-dimensional vectors, we get a 3×3 matrix. For four-vectors, we’ll get a 4×4 matrix. And so the full truth about our angular momentum vector L, is the following:

  1. There is a thing which we call the angular momentum tensor. It’s a 3×3 matrix, so it has nine elements, which are defined as: Lij = ri·pj − rj·pi. Because of this definition, it’s an antisymmetric tensor of the second order in three dimensions, so it’s got only three independent components.
  2. The three independent elements are the components of our ‘vector’ L, and picking them out and calling these three components a ‘vector’ is actually a ‘trick’ that only works in three dimensions. They really just happen to transform like a vector under rotation or under whatever Galilean transformation! [By the way, do you now understand why I was saying that we can look at a tensor as a ‘more general’ cross product? See also the little numerical sketch after this list.]
  3. In fact, in four dimensions, we’ll use a similar definition and define 16 elements Fij as Fij = ∇iAj − ∇jAi, using the two four-vectors ∇μ and Aμ (so we have 4×4 = 16 combinations indeed), out of which only six will be independent, for the very same reason: we have an antisymmetric combination here, with Fij = −Fji and Fii = 0. 🙂 However, because we cannot represent six independent things by four things, we do not get some other four-vector, and so that’s why we cannot apply the same ‘trick’ in four dimensions.
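Here’s the little numerical sketch promised above: it builds the antisymmetric Tij = ai·bj − aj·bi matrix for two made-up three-dimensional vectors and shows that its three independent elements are exactly the components of the cross product a×b.

```python
# The antisymmetric tensor T_ij = a_i*b_j - a_j*b_i in three dimensions,
# compared with the ordinary cross product.
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # made-up vectors
b = np.array([-1.0, 0.5, 2.0])

T = np.outer(a, b) - np.outer(b, a)   # T_ij = a_i*b_j - a_j*b_i
print(T)                              # antisymmetric: T = -T.T, zero diagonal

# The three independent elements are the components of a x b:
print(T[1, 2], T[2, 0], T[0, 1])      # (ay*bz - az*by, az*bx - ax*bz, ax*by - ay*bx)
print(np.cross(a, b))                 # the same three numbers
```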

However, here I am getting way ahead of myself and so… Well… Yes. Back to the main story line. 🙂 So let’s try to move to the next level of understanding, which is… Well…

Because of guys like Maxwell and Einstein, we now know that rotations are part of the Newtonian world, in which time and space are neatly separated, and that things are not so simple in Einstein’s world, which is the real world, as far as we know, at least! Under a Lorentz transformation, the new ‘primed’ space and time coordinates are a mixture of the ‘unprimed’ ones. Indeed, the new x’ is a mixture of x and t, and the new t’ is a mixture of x and t as well. [Yes, please scroll all the way up and have a look at the transformation on the left-hand side!]

So you don’t have that under a Galilean transformation: in the Newtonian world, space and time are neatly separated, and time is absolute, i.e. it is the same regardless of the reference frame. In Einstein’s world – our world – that’s not the case: time is relative, or local as Hendrik Lorentz termed it quite appropriately, and so it’s space-time – i.e. ‘some kind of union of space and time’, as Minkowski termed it – that transforms.

So that’s why physicists use four-vectors to keep track of things. These four-vectors always have three space-like components, but they also include one so-called time-like component. It’s the only way to ensure that the laws of physics are unchanged when moving with uniform velocity. Indeed, any true law of physics we write down must be arranged so that the invariance of physics (as a “fact of Nature”, as Feynman puts it) is built in, and so that’s why we use Lorentz transformations and four-vectors.

In the mentioned post, I gave a few examples illustrating how the Lorentz rules work. Suppose we're looking at some spaceship that is moving at half the speed of light (i.e. 0.5c) and that, inside the spaceship, some object is also moving at half the speed of light, as measured in the reference frame of the spaceship. Then we get the rather remarkable result that, from our point of view (i.e. our reference frame as observer on the ground), that object is not going as fast as light, as Newton or Galileo – and most present-day armchair philosophers 🙂 – would predict (0.5c + 0.5c = c). We'd see it move at a speed equal to vx = 0.8c. Huh? How do we know that? Well… We can derive a velocity formula from the Lorentz rules:

vx = (u + v)/(1 + u·v/c²), with u the speed of the object as measured in the spaceship, and v the speed of the spaceship itself

So now you can just put in the numbers: vx = (0.5c + 0.5c)/(1 + 0.5·0.5) = 0.8c. See?

Let’s do another example. Suppose we’re looking at a light beam inside the spaceship, so something that’s traveling at speed c itself in the spaceship. How does that look to us? The Galilean transformation rules say its speed should be 1.5c, but that can’t be true of course, and the Lorentz rules save us once more: vx = (0.5c + c)/(1 + 0.5·1) = c, so it turns out that the speed of light does not depend on the reference frame: it looks the same – both to the man in the ship as well as to the man on the ground. As Feynman puts it: “This is good, for it is, in fact, what the Einstein theory of relativity was designed to do in the first place—so it had better work!” 🙂
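By the way, these little calculations are easy to automate. Here's a quick sketch in Python – my own illustration, not something you'll find in Feynman – of the velocity-addition rule, working in units of c (so c = 1):

```python
# Relativistic addition of two collinear speeds, both expressed as fractions of c.
def add_velocities(u, v):
    """Combine speeds u and v (in units of c): v_x = (u + v)/(1 + u*v)."""
    return (u + v) / (1 + u * v)

print(add_velocities(0.5, 0.5))  # 0.8: not 1.0, as Galileo would have it
print(add_velocities(0.5, 1.0))  # 1.0: light moves at c in every reference frame
```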

So let's now apply relativity to electromagnetism. Indeed, that's what this post is all about! However, before I do so, let me re-write the Lorentz transformation rules for c = 1. We can equate the speed of light to one, indeed, when we measure time and distance in equivalent units. It's just a matter of ditching our seconds for meters (so our time unit becomes the time that light needs to travel a distance of one meter), or ditching our meters for seconds (so our distance unit becomes the distance that light travels in one second). You should be familiar with this procedure. If not, well… Check out my posts on relativity. So here's the same set of rules for c = 1:

t′ = (t − v·x)/√(1 − v²)
x′ = (x − v·t)/√(1 − v²)
y′ = y
z′ = z

They’re much easier to remember and work with, and so that’s good, because now we need to look at how these rules work with four-vectors and the various operations and operators we’ll be defining on them. Let’s look at that step by step.
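For those of you who like code better than symbols, here's what those c = 1 rules look like as a little Python function – a sketch of my own, so you can play with the numbers yourself:

```python
import math

def lorentz_boost(t, x, y, z, v):
    """Transform the event (t, x, y, z) to a frame moving at speed v (c = 1) along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t), y, z  # y and z are unaffected

print(lorentz_boost(1.0, 0.5, 0.0, 0.0, 0.5))  # the 'primed' coordinates
```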

Electrodynamics in relativistic notation

Let me copy the Universal Set of Equations and Their Solution once more:

frame

The solution for Maxwell’s equations is given in terms of the (electric) potential Φ and the (magnetic) vector potential A. I explained that in my post on this, so I won’t repeat myself too much here either. The only point you should note is that this solution is the result of a special choice of Φ and A, which we referred to as the Lorentz gauge. We’ll touch upon this condition once more, so just make a mental note of it.

Now, E and B do not correspond to four-vectors: they depend on x, y, z and t, but they have three components only: Ex, Ey, Ez, and Bx, By, and Bz respectively. So we have six independent terms here, rather than four things that, somehow, we could combine into some four-vector. [Does this ring a bell? It should. :-)] Having said that, it turns out that we can combine Φ and A into a four-vector, which we'll refer to as the four-potential and which we'll write as:

Aμ = (Φ, A) = (Φ, Ax, Ay, Az) = (At, Ax, Ay, Az) with At = Φ.

So that’s a four-vector just like R = (ct, x, y, z).

How do we know that Aμ is a four-vector? Well… Here I need to say a few things about those Lorentz transformation rules and, more importantly, about the required condition of invariance under a Lorentz transformation. So, yes, here we need to dive into the math.

Four-vectors and invariance under Lorentz transformations

When you were in high-school, you learned how to rotate your coordinate frame. You also learned that the distance of a point from the origin does not change under a rotation, so you'd write r′² = x′² + y′² + z′² = r² = x² + y² + z², and you'd say that r² is an invariant quantity under a rotation. Indeed, transformations leave certain things unchanged. From the Lorentz transformation rules themselves, it is easy to see that

c²·t′² − x′² − y′² − z′² = c²·t² − x² − y² − z², or,

if c = 1, that t′² − x′² − y′² − z′² = t² − x² − y² − z²,

is an invariant under a Lorentz transformation. We found the same for the so-called spacetime interval Δs² = Δr² − c²·Δt², which we write as Δs² = Δr² − Δt² as we chose our time or distance units such that c = 1. [Note that, from now on, we'll assume that's the case, so c = 1 everywhere. We can always change back to our old units when we're done with the analysis.] Indeed, such invariance allowed us to define spacelike, timelike and lightlike intervals using the so-called light cone emanating from a single event and traveling in all directions.

You should note that, for four-vectors, we do not have a simple sum of squares. Indeed, we don't write t² + x² + y² + z² but t² − x² − y² − z². So we've got a +−−− thing here or – it's just another convention – we could also work with a −+++ sum of terms. The convention is referred to as the signature, and we will use the +−−− signature here. Let's continue the story. Now, all four-vectors aμ = (at, ax, ay, az) have this property that:

a′t² − a′x² − a′y² − a′z² = at² − ax² − ay² − az².

[The primed quantities are, obviously, the quantities as measured in the other reference frame.] So. Well… Yes. 🙂 But… Well… Hmm… We can say that our four-potential vector is a four-vector, but so we still have to prove that. So we need to prove that Φ′² − A′x² − A′y² − A′z² = Φ² − Ax² − Ay² − Az² for our four-potential vector Aμ = (Φ, A). So… Yes… How can we do that? The proof is not so easy, but you need to go through it as it will introduce some more concepts and ideas you need to understand.
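Before we get to that proof, you may want to convince yourself of the invariance property itself. Here's a quick numerical sketch of mine, using the same kind of boost function as above (boost along x, c = 1):

```python
import math
import random

def boost(t, x, y, z, v):
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (t - v * x), g * (x - v * t), y, z

def interval(t, x, y, z):
    # The +--- 'length' of a four-vector
    return t**2 - x**2 - y**2 - z**2

event = [random.uniform(-1.0, 1.0) for _ in range(4)]
primed = boost(*event, 0.6)
print(interval(*event), interval(*primed))  # the two numbers coincide
```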

In my post on the Lorentz gauge, I mentioned that Maxwell’s equations can be re-written in terms of Φ and A, rather than in terms of E and B. The equations are:

∇²Φ − (1/c²)·∂²Φ/∂t² = −ρ/ε0
∇²A − (1/c²)·∂²A/∂t² = −j/(ε0c²)
∇•A = −(1/c²)·∂Φ/∂t

The expressions look rather formidable, but don't panic: just look at them. Of course, you need to be familiar with the operators that are being used here, so that's the Laplacian ∇² and the divergence operator ∇•, which are being applied to the scalar Φ and the vector A. I can't re-explain this. I am sorry. Just check my posts on vector analysis. You should also look at the third equation: that's just the Lorentz gauge condition, which we introduced when deriving these equations from Maxwell's equations. Having said that, it's the first and second equation which describe Φ and A as a function of the charges and currents in space, and so that's what matters here. So let's unfold the first equation. It says the following:

∂²Φ/∂x² + ∂²Φ/∂y² + ∂²Φ/∂z² − (1/c²)·∂²Φ/∂t² = −ρ/ε0

In fact, if we’d be talking free or empty space, i.e. regions where there are no charges and currents, then the right-hand side would be zero and this equation would then represent a wave equation, so some potential Φ that is changing in time and moving out at the speed c. Here again, I am sorry I can’t write about this here: you’ll need to check one of my posts on wave equations. If you don’t want to do that, you should believe me when I say that, if you see an equation like this:

∂²Ψ/∂x² − (1/c²)·∂²Ψ/∂t² = 0, then the function Ψ(x, t) must be some function

Ψ(x, t) = f(x − ct) + g(x + ct)

Now, that’s a function representing a wave traveling at speed c, i.e. the phase velocity. Always? Yes. Always! It’s got to do with the x − ct and/or x + ct  argument in the function. But, sorry, I need to move on here.
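Well… One quick aside for those who like to check such claims by machine: the verification takes only a few lines, assuming you have Python's sympy library at hand. A sketch of mine:

```python
import sympy as sp

x, t, c = sp.symbols('x t c', positive=True)
f = sp.Function('f')

# Any (twice-differentiable) function of x - c*t solves the wave equation:
psi = f(x - c * t)
lhs = sp.diff(psi, x, 2)             # second derivative with respect to x
rhs = sp.diff(psi, t, 2) / c**2      # (1/c^2) times the second time derivative
print(sp.simplify(lhs - rhs))        # prints 0: the equation is satisfied
```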

The unfolding of the equation with Φ makes it clear that we have four equations really. Indeed, the second equation is three equations: one for Ax, one for Ay, and one for Az respectively. The four quantities on the right-hand side of these equations are ρ, jx, jy and jz respectively, divided by ε0, which is a universal constant which does not change when going from one coordinate system to another. Now, the quantities ρ, jx, jy and jz transform like a four-vector. How do we know that? It’s just the charge conservation law. We used it when solving the problem of the fields around a moving wire, when we demonstrated the relativity of the electric and magnetic field. Indeed, the relevant equations were:

ρ′ = (ρ − v·jx)/√(1 − v²)
j′x = (jx − v·ρ)/√(1 − v²)
j′y = jy
j′z = jz

You can check that against the Lorentz transformation rules for c = 1. They're exactly the same: ρ transforms like t, and jx transforms like x. Hence, the (ρ, jx, jy, jz) vector is, effectively, a four-vector, and we'll denote it by jμ = (ρ, j). I now need to explain something else. [And, yes, I know this is becoming a very long story but… Well… That's how it is.]

It's about our operators ∇, ∇•, ∇× and ∇², so that's the gradient, the divergence, the curl and the Laplacian operator respectively: they all have a four-dimensional equivalent. Of course, that won't surprise you. 😦 Let me just jot all of them down, so we're done with that, and then I'll focus on the four-dimensional equivalent of the Laplacian ∇•∇ = ∇², which is referred to as the D'Alembertian, and which is denoted by □², because that's the one we need to prove that our four-potential vector is a real four-vector. [I know: □² is a tiny symbol for a pretty monstrous thing, but I can't help it: my editor tool is pretty limited.]

∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) (the four-dimensional equivalent of the gradient ∇)
∇μaμ = ∂at/∂t + ∂ax/∂x + ∂ay/∂y + ∂az/∂z (the four-dimensional equivalent of the divergence ∇•)
□² = ∇μ∇μ = ∂²/∂t² − ∂²/∂x² − ∂²/∂y² − ∂²/∂z² (the D'Alembertian, i.e. the four-dimensional equivalent of the Laplacian ∇²)

Now, we’re almost there. Just hang in for a little longer. It should be obvious that we can re-write those two equations with Φ, A, ρ and j, as:

□²Aμ = jμ/ε0

Just to make sure, let me remind you that Aμ = (Φ, A) and that jμ = (ρ, j). Now, our new D’Alembertian operator is just an operator—a pretty formidable operator but, still, it’s an operator, and so it doesn’t change when the coordinate system changes, so the conclusion is that, IF jμ = (ρ, j) is a four-vector – which it is – and, therefore, transforms like a four-vector, THEN the quantities Φ, Ax, Ay, and Az must also transform like a four-vector, which means they are (the components of) a four-vector.

So… Well… Think about it, but not too long, because it’s just an intermediate result we had to prove. So that’s done. But we’re not done here. It’s just the beginning, actually. :-/ Let me repeat our intermediate result:

Aμ = (Φ, A) is a four-vector. We call it the four-potential vector.

OK. Let’s continue. Let me first draw your attention to that expression with the D’Alembertian above. Which expression? This one:

□²Aμ = jμ/ε0

What about it? Well… You should note that the physics of that equation is just the same as Maxwell’s equations. So it’s one equation only, but it’s got it all.

It's quite a pleasure to re-write it in such elegant form. Why? Think about it: it's a four-vector equation: we've got a four-vector on the left-hand side, and a four-vector on the right-hand side. Therefore, this equation keeps its form under a Lorentz transformation and, hence, it directly shows the invariance of electrodynamics under the Lorentz transformation.

Huh? Yes. You may think about this a little longer. 🙂

To wrap this up, I should note that we can also express the gauge condition using our new four-vector notation. Indeed, we can write it as:

∇μAμ = ∂Φ/∂t + ∇•A = 0

It's referred to as the Lorentz condition and it is, effectively, a condition for invariance, i.e. it ensures that the four-vector equation above does stay in the form it is in for all reference frames. Note that we're re-writing it using the four-dimensional equivalent of the divergence operator ∇•, but so we don't have a dot between ∇μ and Aμ. In fact, the notation is pretty confusing, and it's easy to think we're talking some gradient, rather than the divergence. So let me therefore highlight the meaning of both once again. The notation looks the same, but we've got two very different things: the gradient operates on a scalar, while the divergence operates on a (four-)vector. Also note the +−−− signature is only there for the gradient, not for the divergence!

∇μφ = (∂φ/∂t, −∂φ/∂x, −∂φ/∂y, −∂φ/∂z) (the four-gradient of a scalar φ: a four-vector, with the +−−− signs)
∇μaμ = ∂at/∂t + ∂ax/∂x + ∂ay/∂y + ∂az/∂z (the four-divergence of a four-vector aμ: an invariant scalar, all plus signs)

You'll wonder why they didn't use some • or ∗ symbol, and the answer is: I don't know. I know it's hard to keep inventing symbols for all these different 'products' – the ⊗ symbol, for example, is reserved for tensor products, which we won't get into – but… Well… I think they could have done something here. 😦
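To pin the difference down once and for all, here's the two operations side by side in sympy – again, a sketch of my own, with c = 1:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')

def four_gradient(phi):
    """nabla_mu acting on a scalar: a four-vector, with the +--- signs."""
    return [sp.diff(phi, t), -sp.diff(phi, x), -sp.diff(phi, y), -sp.diff(phi, z)]

def four_divergence(a):
    """nabla_mu a_mu acting on a four-vector: an invariant scalar, all plus signs."""
    return sp.diff(a[0], t) + sp.diff(a[1], x) + sp.diff(a[2], y) + sp.diff(a[3], z)

# The Lorentz condition: the four-divergence of A_mu = (Phi, A) must vanish.
Phi = sp.Function('Phi')(t, x, y, z)
Ax, Ay, Az = [sp.Function(n)(t, x, y, z) for n in ('Ax', 'Ay', 'Az')]
print(four_divergence([Phi, Ax, Ay, Az]))  # dPhi/dt + div(A)
```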

In any case… Let’s move on. Before we do, please note that we can also re-write our conservation law for electric charge using our new four-vector notation. Indeed, you’ll remember that we wrote that conservation law as:

∇•j = −∂ρ/∂t

Using our new four-vector operator ∇μ, we can re-write that as ∇μjμ = 0. So all of electrodynamics can be summarized in just two equations—Maxwell's law and the charge conservation law:

□²Aμ = jμ/ε0 (Maxwell's equations)
∇μjμ = 0 (conservation of charge)

OK. We’re now ready to discuss the electromagnetic tensor. [I know… This is becoming an incredibly long and incredibly complicated piece but, if you get through it, you’ll admit it’s really worth it.]

The electromagnetic tensor

The whole analysis above was done in terms of the Φ and A potentials. It’s time to get back to our field vectors E and B. We know we can easily get them from Φ and A, using the rules we mentioned as solutions:

E = −∇Φ − ∂A/∂t
B = ∇×A

These two equations should not read like just more formulas. They are essential, and you should be able to jot them down anytime, anywhere. They should be on your kitchen door, in your toilet and above your bed. 🙂 For example, the second equation gives us the components of the magnetic field vector B:

Bx = ∂Az/∂y − ∂Ay/∂z
By = ∂Ax/∂z − ∂Az/∂x
Bz = ∂Ay/∂x − ∂Ax/∂y

Now, look at these equations. The x-component is equal to a couple of terms that involve only y– and z-components. The y-component is equal to something involving only x and z. Finally, the z-component only involves x and y. Interesting. Let’s define a ‘thing’ we’ll denote by Fzy and define as:

Fzy = ∂Az/∂y − ∂Ay/∂z

So now we can write: Bx = Fzy, By = Fxz, and Bz = Fyx. Now look at our equation for E. It turns out the components of E are equal to things like Fxt, Fyt and Fzt! Indeed, Fxt = ∂Ax/∂t − ∂At/∂x = Ex!

But… Well… No. 😦 The sign is wrong! Ex = −∂Ax/∂t−∂At/∂x, so we need to modify our definition of Fxt. When the t-component is involved, we’ll define our ‘F-things’ as:

Fxt = −∂Ax/∂t − ∂At/∂x (and, likewise, Fyt = −∂Ay/∂t − ∂At/∂y and Fzt = −∂Az/∂t − ∂At/∂z)

So we’ve got a plus instead of a minus. It looks quite arbitrary but, frankly, you’ll have to admit it’s sort of consistent with our +−−− signature for our four-vectors and, in just a minute, you’ll see it’s fully consistent with our definition of the four-dimensional vector operator ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z). So… Well… Let’s go along with it.

What about the Fxx, Fyy, Fzz and Ftt terms? Well… Fxx = ∂Ax/∂x − ∂Ax/∂x = 0, and it's easy to see that Fyy and Fzz are zero too. But Ftt? Well… It's a bit tricky but, applying our definitions carefully, we see that Ftt must be zero too. In any case, the Ftt = 0 result will become obvious as we will be arranging these 'F-things' in a matrix, which is what we'll do now. [Again: does this ring a bell? If not, it should. :-)]

Indeed, we’ve got sixteen possible combinations here, which Feynman denotes as Fμν, which is somewhat confusing, because Fμν usually denotes the 4×4 matrix representing all of these combinations. So let me use the subscripts i and j instead, and define Fij as:

Fij = ∇iAj − ∇jAi

with ∇i being the t-, x-, y- or z-component of ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and, likewise, Ai being the t-, x-, y- or z-component of Aμ = (Φ, Ax, Ay, Az). Just check it: Fzy = −∂Ay/∂z + ∂Az/∂y = ∂Az/∂y − ∂Ay/∂z = Bx, for example, and Fxt = −∂Φ/∂x − ∂Ax/∂t = Ex. So the +−−− convention works. [Also note that it’s easier now to see that Ftt = ∂Φ/∂t − ∂Φ/∂t = 0.]

We can now arrange the Fij in a matrix. This matrix is antisymmetric, because Fij = −Fji, and its diagonal elements are zero. [For those of you who love math: note that the diagonal elements of an antisymmetric matrix are always zero because of the Fij = −Fji constraint: just set i = j in the constraint, which gives you Fii = −Fii and, hence, Fii = 0.]
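Before we write that matrix out, here's a symbolic verification of the components in sympy – a sketch of my own, with c = 1, so don't mistake it for Feynman's text:

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
Phi = sp.Function('Phi')(t, x, y, z)
Ax, Ay, Az = [sp.Function(n)(t, x, y, z) for n in ('Ax', 'Ay', 'Az')]

A = [Phi, Ax, Ay, Az]   # the four-potential A_mu = (Phi, A)
nabla = [lambda f: sp.diff(f, t), lambda f: -sp.diff(f, x),
         lambda f: -sp.diff(f, y), lambda f: -sp.diff(f, z)]  # nabla_mu, +--- signs

# F_ij = nabla_i A_j - nabla_j A_i, with i and j running over t, x, y, z
F = sp.Matrix(4, 4, lambda i, j: nabla[i](A[j]) - nabla[j](A[i]))

Ex = -sp.diff(Phi, x) - sp.diff(Ax, t)   # from E = -grad(Phi) - dA/dt
Bx = sp.diff(Az, y) - sp.diff(Ay, z)     # from B = curl(A)
print(sp.simplify(F[1, 0] - Ex))         # F_xt - Ex = 0
print(sp.simplify(F[3, 2] - Bx))         # F_zy - Bx = 0
```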

Now that matrix is referred to as the electromagnetic tensor, and it's depicted below (we plugged c back in: remember that B's magnitude is 1/c times E's magnitude).

Fij (with i as the row index and j as the column index, both running over t, x, y, z):

         t        x        y        z
  t:     0       −Ex      −Ey      −Ez
  x:     Ex       0       −cBz      cBy
  y:     Ey       cBz      0       −cBx
  z:     Ez      −cBy      cBx      0

So… Well… Great ! We’re done! Well… Not quite. 🙂

We can get this matrix in a number of ways. The least complicated way is, of course, just to calculate all Fij components and then put them in a [Fij] matrix using the i as the row number and the j as the column number. You need to watch out with the conventions though, and so i and j start on t and end on z. 🙂

The other way to do it is to write the ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) operator as a 4×1 column vector, which you then multiply with the four-vector Aμ written as a 1×4 row vector. So [∇μAν] is then a 4×4 matrix, which we combine with its transpose, i.e. [∇μAν]T, as shown below. So what's written below is [∇μAν] − [∇μAν]T.

Fμν = [∇μAν] − [∇μAν]T, i.e. the 4×4 matrix of all the ∇μAν combinations minus its transpose, whose elements are the ∇νAμ combinations.

If you google, you'll see there's more than one way to go about it, so I'd recommend you just go through the motions and double-check the whole thing yourself—and please do let me know if you find any mistake! In fact, the Wikipedia article on the electromagnetic tensor writes the indices of the matrix above as subscripts rather than superscripts, i.e. it shows the same tensor in its so-called covariant form, but so I'll refer you to that article as I don't want to make things even more complicated here! As said, there are different conventions around, and so you need to double-check what is what really. 🙂

Where are we heading with all of this? The next thing is to look at the Lorentz transformation of these Fij = ∇iAj − ∇jAi components, because then we know how our E and B fields transform. Before we do so, however, we should note the more general results and definitions which we obtained here:

1. The Fμν matrix (a matrix is just a multi-dimensional array, of course) is a so-called tensor. It’s a tensor of the second rank, because it has two indices in it. We think of it as a very special ‘product’ of two vectors, not unlike the vector cross product a × b, whose components were also defined by a similar combination of the components of a and b. Indeed, we wrote:

(a×b)x = ay·bz − az·by
(a×b)y = az·bx − ax·bz
(a×b)z = ax·by − ay·bx

So one should think of a tensor as “another kind of cross product” or, preferably, and as Feynman puts it, as a “generalization of the cross product”.

2. In this case, the four-vectors are ∇μ = (∂/∂t, −∂/∂x, −∂/∂y, −∂/∂z) and Aμ = (Φ, Ax, Ay, Az). Now, you will probably say that ∇μ is an operator, not a vector, and you are right. However, we know that ∇μ behaves like a vector, and so this is just a special case. The point is: because the tensor is based on four-vectors, the Fμν tensor is referred to as a tensor of the second rank in four dimensions. In addition, because of the Fij = −Fji result, Fμν is an antisymmetric tensor of the second rank in four dimensions.

3. Now, the whole point is to examine how tensors transform. We know that the vector dot product, aka the inner product, remains invariant under a Lorentz transformation, both in three as well as in four dimensions, but what about the vector cross product, and what about the tensor? That’s what we’ll be looking at now.

The Lorentz transformation of the electric and magnetic fields

Cross products are complicated, and tensors will be complicated too. Let’s recall our example in three dimensions, i.e. the angular momentum vector L, which was a cross product of the radius vector r and the momentum vector p = mv, as illustrated below (the animation also gives the torque τ, which is, loosely speaking, a measure of the turning force).

[Animation: the angular momentum L = r×p of an orbiting mass, together with the torque τ = dL/dt]

The components of L are:

Lx = y·pz − z·py
Ly = z·px − x·pz
Lz = x·py − y·px

Now, this particular definition ensures that Lij turns out to be an antisymmetric object:

Lyz = y·pz − z·py = −Lzy = Lx
Lzx = z·px − x·pz = −Lxz = Ly
Lxy = x·py − y·px = −Lyx = Lz
Lxx = Lyy = Lzz = 0

So it’s a similar situation here. We have nine possible combinations, but only three independent numbers. So it’s a bit like our tensor in four dimensions: 16 combinations, but only 6 independent numbers.
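As a quick numerical illustration – a sketch of my own, assuming you have numpy at hand – here's how those three independent components pick out the cross product r×p:

```python
import numpy as np

r = np.array([1.0, 2.0, 3.0])     # the radius vector
p = np.array([0.4, -0.5, 0.6])    # the momentum vector

L = np.outer(r, p) - np.outer(p, r)               # L_ij = r_i*p_j - r_j*p_i
L_vector = np.array([L[1, 2], L[2, 0], L[0, 1]])  # (L_yz, L_zx, L_xy)

print(L_vector)        # [ 2.7  0.6 -1.3]
print(np.cross(r, p))  # the same three numbers: L_vector is just r x p
```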

Now, it so happens that these three numbers, or objects if you want, transform in exactly the same way as the components of a vector. However, as Feynman points out, that's a matter of 'luck' really. In fact, Feynman points out that, when we have two vectors a = (ax, ay, az) and b = (bx, by, bz), we'll have nine products Tij = ai·bj which will also form a tensor of the second rank (cf. the two indices) but which, in general, will not obey the transformation rules we got for the angular momentum tensor, which happened to be an antisymmetric tensor of the second rank in three dimensions.

To make a long story short, it’s not simple in general, and surely not here: with E and B, we’ve got six independent terms, and so we cannot represent six things by four things, so the transformation rules for E and B will differ from those for a four-vector. So what are they then?

Well… Feynman first works out the rules for the general antisymmetric vector combination Gij = ai·bj − aj·bi, with ai and bj the t-, x-, y- or z-component of the four-vectors aμ = (at, ax, ay, az) and bμ = (bt, bx, by, bz) respectively. The idea is to first get some general rules, and then replace Gij = ai·bj − aj·bi by Fij = ∇iAj − ∇jAi, of course! So let's apply the Lorentz rules, which – let me remind you – are the following ones:

t′ = (t − v·x)/√(1 − v²)
x′ = (x − v·t)/√(1 − v²)
y′ = y
z′ = z

So we get:

a′t = (at − v·ax)/√(1 − v²) and b′t = (bt − v·bx)/√(1 − v²)
a′x = (ax − v·at)/√(1 − v²) and b′x = (bx − v·bt)/√(1 − v²)
a′y = ay and b′y = by
a′z = az and b′z = bz

The rest is all very tedious: you just need to plug these things into the various Gij = ai·bj − aj·bi formulas. For example, for G′tx, we get:

G′tx = a′t·b′x − a′x·b′t
= [(at − v·ax)·(bx − v·bt) − (ax − v·at)·(bt − v·bx)]/(1 − v²)
= [(1 − v²)·at·bx − (1 − v²)·ax·bt]/(1 − v²)
= at·bx − ax·bt

Hey! That's just Gtx, so we find that G′tx = Gtx! What about the rest? Well… That yields something different. Let me shorten the story by simply copying Feynman here:

G′tx = Gtx
G′ty = (Gty − v·Gxy)/√(1 − v²)
G′tz = (Gtz − v·Gxz)/√(1 − v²)
G′xy = (Gxy − v·Gty)/√(1 − v²)
G′xz = (Gxz − v·Gtz)/√(1 − v²)
G′yz = Gyz

So… Done!
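Well… If you don't want to take these formulas on authority, you can verify the whole table numerically with a few lines of Python – take two random four-vectors, boost them, and compare:

```python
import math
import random

v = 0.6
g = 1.0 / math.sqrt(1.0 - v * v)

def boost(a):
    # Lorentz boost along x, c = 1; a = (a_t, a_x, a_y, a_z)
    return [g * (a[0] - v * a[1]), g * (a[1] - v * a[0]), a[2], a[3]]

def G(a, b):
    # The antisymmetric combination G_ij = a_i*b_j - a_j*b_i
    return [[a[i] * b[j] - a[j] * b[i] for j in range(4)] for i in range(4)]

a = [random.uniform(-1.0, 1.0) for _ in range(4)]
b = [random.uniform(-1.0, 1.0) for _ in range(4)]
Gp = G(boost(a), boost(b))   # boost first, then combine: the primed tensor
Gu = G(a, b)                 # the unprimed tensor

t_, x_, y_ = 0, 1, 2
print(math.isclose(Gp[t_][x_], Gu[t_][x_], abs_tol=1e-12))                         # G'_tx = G_tx
print(math.isclose(Gp[t_][y_], g * (Gu[t_][y_] - v * Gu[x_][y_]), abs_tol=1e-12))  # G'_ty
```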

So what?

Well… Now we just substitute. In fact, there are two alternative formulations of the Lorentz transformations of E and B. They are given below (note the units are such that c = 1):

E′x = Ex and B′x = Bx
E′y = (Ey − v·Bz)/√(1 − v²) and B′y = (By + v·Ez)/√(1 − v²)
E′z = (Ez + v·By)/√(1 − v²) and B′z = (Bz − v·Ey)/√(1 − v²)

or, equivalently, in vector form:

E′x = Ex and B′x = Bx
E′y = (E + v×B)y/√(1 − v²) and B′y = (B − v×E)y/√(1 − v²)
E′z = (E + v×B)z/√(1 − v²) and B′z = (B − v×E)z/√(1 − v²)

In addition, there is a third equivalent formulation which is more practical, and also simpler, even if it puts the c's back in. It re-defines the field components, distinguishing only two kinds:

  1. The 'parallel' components E|| and B|| along the x-direction (so called because they are parallel to the relative velocity of the S and S' reference frames), and
  2. The 'perpendicular' or 'total transverse' components E⊥ and B⊥, which are the vector sums of the y- and z-components.

So that gives us four equations only:

E′|| = E|| and B′|| = B||
E′⊥ = (E + v×B)⊥/√(1 − v²/c²)
B′⊥ = (B − v×E/c²)⊥/√(1 − v²/c²)

And, yes, we are done now. This is the Lorentz transformation of the fields. I am sure it has left you totally exhausted. Well… If not… […] It sure left me totally exhausted. 🙂
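If you'd rather play with these rules than memorize them, here's a minimal sketch (my own, with c = 1 and a boost along the x-direction):

```python
import math

def transform_fields(E, B, v):
    """Lorentz-transform the fields E and B (c = 1) to a frame moving at speed v along x."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_prime = (Ex, g * (Ey - v * Bz), g * (Ez + v * By))
    B_prime = (Bx, g * (By + v * Ez), g * (Bz - v * Ey))
    return E_prime, B_prime

# A purely electric field in one frame picks up a magnetic component in another:
print(transform_fields((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), 0.6))
```

So, yes: what looks like a purely electric field in one reference frame is a mix of an electric and a magnetic field in another.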

To lighten things up, let me insert an image of what the transformed field E actually looks like. The first image shows the field in the reference frame of the charge itself: we have a simple Coulomb field. The second image shows the charge flying by. Its electric field is 'squashed up'. To be precise, it's just as if the scale of x is squashed up by a factor √(1 − v²/c²). Let me refer you to Feynman for the detail of the calculations here.

[Illustration: the spherically symmetric Coulomb field of a charge at rest versus the 'squashed-up' field of the same charge moving at relativistic speed]

OK. So that’s it. You may wonder: what about that promise I made? Indeed, when I started this post, I said I’d present a mathematical construct that presents the electromagnetic force as one force only, as one physical reality, but so we’re back writing all of it in terms of two vectors—the electric field vector E and the magnetic field vector B. Well… What can I say? I did present the mathematical construct: it’s the electromagnetic tensor. So it’s that antisymmetric matrix really, which one can combine with a transformation matrix embodying the Lorentz transformation rules. So, I did what I promised to do. But you’re right: I am re-presenting stuff in the old style once again.

The second objection that you may have – in fact, that you should have – is that all of this has been rather tedious. And you're right. The whole thing just re-emphasizes the value of using the four-potential vector. It's obviously much easier to take that vector from one reference frame to another – so we just apply the Lorentz transformation rules to Aμ = (Φ, A) and get Aμ′ = (Φ′, A′) from it – and then calculate E′ and B′, rather than trying to remember those equations above. However, that's not the point, or…

Well… It is and it isn't. We wanted to get away from those two vectors E and B, and show that electromagnetism is really one phenomenon only, and so that's where the concept of the electromagnetic tensor came in. There were two objectives here: the first objective was to introduce you to the concept of tensors, which we'll need in the future. The second objective was to show you that, while Lorentz' force law – F = q(E + v×B) – makes it clear we're talking one force only, there is a way of writing it all up that is much more elegant.

I've introduced the concept of tensors here, so the first objective should have been achieved. As for the second objective, I'll discuss that in my next post, in which I'll introduce the four-velocity vector uμ as well as the four-force vector fμ. It will explain the following beautiful equation of motion:

fμ = m0·duμ/dτ, with τ the proper time

Now that looks very elegant and unified, doesn’t it? 🙂

[…] Hmm… No reaction. I know… You’re tired now, and you’re thinking: yet another way of representing the same thing? Well… Yes! So…

OK… Enough for today. Let’s follow up tomorrow.


Re-visiting the speed of light, Planck’s constant, and the fine-structure constant

Note: I have published a paper that is very coherent and fully explains what the fine-structure constant actually is. There is nothing magical about it. It’s not some God-given number. It’s a scaling constant – and then some more. But not God-given. Check it out: The Meaning of the Fine-Structure Constant. No ambiguity. No hocus-pocus.

Jean Louis Van Belle, 23 December 2018

Original post:

A brother of mine sent me a link to an article he liked. Now, because we share some interest in physics and math and other stuff, I looked at it and…

Well… I was disappointed. Despite the impressive credentials of its author – a retired physics professor – it was very poorly written. It made me realize how much badly written stuff is around, and I am glad I am no longer wasting my time on it. However, I do owe my brother some explanation of (a) why I think it was bad, and of (b) what, in my humble opinion, he should be wasting his time on. 🙂 So what is it all about?

The article talks about physicists deriving the speed of light from “the electromagnetic properties of the quantum vacuum.” Now, it’s the term ‘quantum‘, in ‘quantum vacuum’, that made me read the article.

Indeed, deriving the theoretical speed of light in empty space from the properties of the classical vacuum – aka empty space – is a piece of cake: it was done by Maxwell himself as he was figuring out his equations back in the 1850s (see my post on Maxwell’s equations and the speed of light). And then he compared it to the measured value, and he saw it was right on the mark. Therefore, saying that the speed of light is a property of the vacuum, or of empty space, is like a tautology: we may just as well put it the other way around, and say that it’s the speed of light that defines the (properties of the) vacuum!

Indeed, as I'll explain in a moment: the speed of light determines both the magnetic as well as the electric constant, μ0 and ε0, which are the (magnetic) permeability and the (electric) permittivity of the vacuum respectively. Both constants depend on the units we are working with (i.e. the units for electric charge, for distance, for time and for force – or for inertia, if you want, because force is defined in terms of overcoming inertia), but so they are just proportionality coefficients in Maxwell's equations. So once we decide what units to use in Maxwell's equations, then μ0 and ε0 are just proportionality coefficients which we get from c. So they are not separate constants really – I mean, they are not separate from c – and all of the 'properties' of the vacuum, including these constants, are in Maxwell's equations.

In fact, when Maxwell compared the theoretical value of c with its presumed actual value, he didn't compare c's theoretical value with the speed of light as measured by astronomers (like the 17th-century astronomer Ole Roemer, to whom our professor refers: he had a first go at it by suggesting some specific value for it based on his observations of the timing of the eclipses of one of Jupiter's moons), but with c's value as calculated from the experimental values of μ0 and ε0! So he knew very well what he was looking at. In fact, to drive home the point, it may also be useful to note that the Michelson-Morley experiment – which accurately measured the speed of light – was done some thirty years later. So Maxwell had already left this world by then—very much in peace, because he had solved the mystery all 19th century physicists wanted to solve through his great unification: his set of equations covers it all, indeed: electricity, magnetism, light, and even relativity!

I think the article my brother liked so much does a very lousy job in pointing all of that out, but that’s not why I wouldn’t recommend it. It got my attention because I wondered why one would try to derive the speed of light from the properties of the quantum vacuum. In fact, to be precise, I hoped the article would tell me what the quantum vacuum actually is. Indeed, as far as I know, there’s only one vacuum—one ’empty space’: empty is empty, isn’t it? 🙂 So I wondered: do we have a ‘quantum’ vacuum? And, if so, what is it, really?

Now, that is where the article is really disappointing, I think. The professor drops a few names (like the Max Planck Institute, the University of Paris-Sud, etcetera), and then, promisingly, mentions ‘fleeting excitations of the quantum vacuum’ and ‘virtual pairs of particles’, but then he basically stops talking about quantum physics. Instead, he wanders off to share some philosophical thoughts on the fundamental physical constants. What makes it all worse is that even those thoughts on the ‘essential’ constants are quite off the mark.

So… This post is just a ‘quick and dirty’ thing for my brother which, I hope, will be somewhat more thought-provoking than that article. More importantly, I hope that my thoughts will encourage him to try to grind through better stuff.

On Maxwell’s equations and the properties of empty space

Let me first say something about the speed of light indeed. Maxwell’s four equations may look fairly simple, but that’s only until one starts unpacking all those differential vector equations, and it’s only when going through all of their consequences that one starts appreciating their deep mathematical structure. Let me quickly copy how another blogger jotted them down: 🙂

And God said:

∇•E = ρ/ε0
∇•B = 0
∇×E = −∂B/∂t
c²∇×B = j/ε0 + ∂E/∂t

…and there was light.

As I showed in my above-mentioned post, the speed of light (i.e. the speed with which an electromagnetic pulse or wave travels through space) is just one of the many consequences of the mathematical structure of Maxwell’s set of equations. As such, the speed of light is a direct consequence of the ‘condition’, or the properties, of the vacuum indeed, as Maxwell suggested when he wrote that “we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena”.

Of course, while Maxwell still suggests light needs some ‘medium’ here – so that’s a reference to the infamous aether theory – we now know that’s because he was a 19th century scientist, and so we’ve done away with the aether concept (because it’s a redundant hypothesis), and so now we also know there’s absolutely no reason whatsoever to try to “avoid the inference.” 🙂 It’s all OK, indeed: light is some kind of “transverse undulation” of… Well… Of what?

We analyze light as traveling fields, represented by two vectors, E and B, whose direction and magnitude varies both in space as well as in time. E and B are field vectors, and represent the electric and magnetic field respectively. An equivalent formulation – more or less, that is (see my post on the Liénard-Wiechert potentials) – for Maxwell’s equations when only one (moving) charge is involved is:

E = −(q/4πε0)·[er′/r′² + (r′/c)·d(er′/r′²)/dt + (1/c²)·d²er′/dt²]

B = −er′×E/c

This re-formulation, which is Feynman's preferred formula for electromagnetic radiation, is interesting in a number of ways. It clearly shows that, while we analyze the electric and magnetic field as separate mathematical entities, they're one and the same phenomenon really, as evidenced by the B = −er′×E/c equation, which tells us the magnetic field from a single moving charge is always normal (i.e. perpendicular) to the electric field vector, and also that B's magnitude is 1/c times the magnitude of E, so |B| = B = |E|/c = E/c. In short, B is fully determined by E, or vice versa: if we have one of the two fields, we have the other, so they're 'one and the same thing' really—not in a mathematical sense, but in a real sense.

Also note that the magnitudes of E and B are just the same if we're using natural units, so if we equate c with 1. Finally, as I pointed out in my post on the relativity of electromagnetic fields, if we would switch from one reference frame to another, we'll have a different mix of E and B, but that different mix obviously describes the same physical reality. More in particular, if we'd be moving with the charges, the magnetic field sort of disappears to re-appear as an electric field. So the Lorentz force F = Felectric + Fmagnetic = qE + qv×B is one force really, and its 'electric' and 'magnetic' component appear the way they appear in our reference frame only. In some other reference frame, we'd have the same force, but its components would look different, even if they, obviously, would and should add up to the same. [Well… Yes and no… You know there are relativistic corrections to be made to the forces too, but that's a minor point, really. The force surely doesn't disappear!]

All of this reinforces what you know already: electricity and magnetism are part and parcel of one and the same phenomenon, the electromagnetic force field, and Maxwell’s equations are the most elegant way of ‘cutting it up’. Why elegant? Well… Click the Occam tab. 🙂

Now, after having praised Maxwell once more, I must say that Feynman's equations above have another advantage. In Maxwell's equations, we see two constants, the electric and magnetic constant (denoted by ε0 and μ0 respectively), and Maxwell's equations imply that the product of the electric and magnetic constant is the reciprocal of c²: μ0·ε0 = 1/c². So here we see ε0 and c only, so no μ0, so that makes it even more obvious that the magnetic and electric constant are related one to another through c.

[…] Let me digress briefly: why do we have c² in μ0·ε0 = 1/c², instead of just c? That's related to the relativistic nature of the magnetic force: think about that B = E/c relation. Or, better still, think about the Lorentz equation F = Felectric + Fmagnetic = qE + qv×B = q[E + (v/c)×(E×er)]: the 1/c factor is there because the magnetic force involves some velocity, and any velocity is always relative—and here I don't mean relative to the frame of reference but relative to the (absolute) speed of light! Indeed, it's the v/c ratio (usually denoted by β = v/c) that enters all relativistic formulas. So the left-hand side of the μ0·ε0 = 1/c² equation is best written as (1/c)·(1/c), with one of the two 1/c factors accounting for the fact that the 'magnetic' force is a relativistic effect of the 'electric' force, really, and the other 1/c factor giving us the proper relationship between the magnetic and the electric constant. To drive home the point, I invite you to think about the following:

  • μ0 is expressed in (V·s)/(A·m), while ε0 is expressed in (A·s)/(V·m), so the dimension in which the μ0·ε0 product is expressed is [(V·s)/(A·m)]·[(A·s)/(V·m)] = s²/m², so that's the dimension of 1/c².
  • Now, this dimensional analysis makes it clear that we can sort of distribute 1/c² over the two constants. All it takes is re-defining the fundamental units we use to calculate stuff, i.e. the units for electric charge, for distance, for time and for force – or for inertia, as explained above. But so we could, if we wanted, equate both μ0 as well as ε0 with 1/c.
  • Now, if we would then equate c with 1, we'd have μ0 = ε0 = c = 1. We'd have to define our units for electric charge, for distance, for time and for force accordingly, but it could be done, and then we could re-write Maxwell's set of equations using these 'natural' units.
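Before moving on, here's a quick numerical sanity check of that μ0·ε0 = 1/c² relation – just a few lines of Python with the (rounded) SI values:

```python
mu_0 = 4e-7 * 3.141592653589793   # magnetic constant, in (V*s)/(A*m)
epsilon_0 = 8.854187817e-12       # electric constant, in (A*s)/(V*m)
c = 299792458.0                   # speed of light, in m/s

print(mu_0 * epsilon_0)           # ~1.11265e-17 s^2/m^2
print(1 / c**2)                   # ~1.11265e-17 s^2/m^2: the same number
```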

In any case, the nitty-gritty here is less important: the point is that μ0 and ε0 are also related through the speed of light and, hence, they are 'properties' of the vacuum as well. [I may add that this is quite obvious if you look at their definition, but we're approaching the matter from another angle here.]

In any case, we’re done with this. On to the next!

On quantum oscillations, Planck’s constant, and Planck units 

The second thought I want to develop is about the mentioned quantum oscillation. What is it? Or what could it be? An electromagnetic wave is caused by a moving electric charge. What kind of movement? Whatever: the charge could move up or down, or it could just spin around some axis—whatever, really. For example, if it spins around some axis, it will have a magnetic moment and, hence, the field is essentially magnetic, but then, again, E and B are related and so it doesn't really matter if the first cause is magnetic or electric: that's just our way of looking at the world: in another reference frame, one that's moving with the charges, the field would essentially be electric. So the motion can be anything: linear, rotational, or non-linear in some irregular way. It doesn't matter: any motion can always be analyzed as the sum of a number of 'ideal' motions. So let's assume we have some elementary charge in space, and it moves and so it emits some electromagnetic radiation.

So now we need to think about that oscillation. The key question is: how small can it be? Indeed, in one of my previous posts, I tried to explain some of the thinking behind the idea of the ‘Great Desert’, as physicists call it. The whole idea is based on our thinking about the limit: what is the smallest wavelength that still makes sense? So let’s pick up that conversation once again.

The Great Desert lies between the 10³² and 10⁴³ Hz scale. 10³² Hz corresponds to a photon energy of Eγ = h·f = (4×10⁻¹⁵ eV·s)·(10³² Hz) = 4×10¹⁷ eV = 400,000 tera-electronvolt (1 TeV = 10¹² eV). I use the γ (gamma) subscript in my Eγ symbol for two reasons: (1) to make it clear that I am not talking the electric field E here but energy, and (2) to make it clear we are talking ultra-high-energy gamma-rays here.

In fact, γ-rays of this frequency and energy are theoretical only. Ultra-high-energy gamma-rays are defined as rays with photon energies higher than 100 TeV, which is the upper limit for very-high-energy gamma-rays, which have been observed as part of the radiation emitted by so-called gamma-ray bursts (GRBs): flashes associated with extremely energetic explosions in distant galaxies. Wikipedia refers to them as the 'brightest' electromagnetic events known to occur in the Universe. These rays are not to be confused with cosmic rays, which consist of high-energy protons and atomic nuclei stripped of their electron shells. Cosmic rays aren't rays really and, because they consist of particles with a considerable rest mass, their energy is even higher. The so-called Oh-My-God particle, for example, which is the most energetic particle ever detected, had an energy of 3×10²⁰ eV, i.e. 300 million TeV. But it's not a photon: its energy is largely kinetic energy, with the m in the E = m·c² formula being the relativistic mass m = γ·m0, which is much larger than the rest mass m0. To be precise: the mentioned particle was thought to be an iron nucleus, and it packed the equivalent energy of a baseball traveling at 100 km/h!

But let me refer you to another source for a good discussion on these high-energy particles, so I can get back to the energy of electromagnetic radiation. When I talked about the Great Desert in that post, I did so using the Planck-Einstein relation (E = h·f), which embodies the idea of the photon being valid always and everywhere and, importantly, at every scale. I also discussed the Great Desert using real-life light being emitted by real-life atomic oscillators. Hence, I may have given the (wrong) impression that the idea of a photon as a 'wave train' is inextricably linked with these real-life atomic oscillators, i.e. to electrons going from one energy level to the next in some atom. Let's explore these assumptions somewhat more.

Let's start with the second point. Electromagnetic radiation is emitted by any accelerating electric charge, so the atomic oscillator model is an assumption that should not be essential. And it isn't. For example, whatever is left of the nucleus after alpha or beta decay (i.e. a nuclear decay process resulting in the emission of an α- or β-particle) is likely to be in an excited state, and likely to emit a gamma-ray for about 10⁻¹² seconds, so that's a burst that's about 10,000 times shorter than the 10⁻⁸ seconds it takes for the energy of a radiating atom to die out. [As for the calculation of that 10⁻⁸ sec decay time – so that's like 10 nanoseconds – I've talked about this before but it's probably better to refer you to the source, i.e. one of Feynman's Lectures.]

However, what we’re interested in is not the energy of the photon, but the energy of one cycle. In other words, we’re not thinking of the photon as some wave train here, but what we’re thinking about is the energy that’s packed into a space corresponding to one wavelength. What can we say about that?

As you know, that energy will depend both on the amplitude of the electromagnetic wave as well as its frequency. To be precise, the energy is (1) proportional to the square of the amplitude, and (2) proportional to the frequency. Let's look at the first proportionality relation. It can be written in a number of ways, but one way of doing it is stating the following: if we know the electric field, then the amount of energy that passes per square meter per second through a surface that is normal to the direction in which the radiation is going (which we'll denote by S – the s from surface – in the formula below), must be proportional to the average of the square of the field. So we write S ∝ 〈E²〉, and so we should think about the constant of proportionality now. Now, let's not get into the nitty-gritty, and so I'll just refer to Feynman for the derivation of the formula below:

S = ε0c·〈E²〉

So the constant of proportionality is ε0c. [Note that, in light of what we wrote above, we can also write this as S = (1/(μ0·c))·〈(c·B)²〉 = (c/μ0)·〈B²〉, so that underlines once again that we're talking one electromagnetic phenomenon only really.] So that's a nice and rather intuitive result in light of all of the other formulas we've been jotting down. However, it is a 'wave' perspective. The 'photon' perspective assumes that, somehow, the amplitude is given and, therefore, the Planck-Einstein relation only captures the frequency variable: Eγ = h·f.
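Just to attach a number to that S = ε0c·〈E²〉 formula – the field value below is an arbitrary example of mine, nothing more:

```python
epsilon_0 = 8.854187817e-12   # electric constant, F/m
c = 299792458.0               # speed of light, m/s

E_squared_avg = 100.0         # <E^2> in (V/m)^2: an arbitrary example value

S = epsilon_0 * c * E_squared_avg
print(S, "W/m^2")             # ~0.265 W/m^2 of energy flux
```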

Indeed, ‘more energy’ in the ‘wave’ perspective basically means ‘more photons’, but photons are photons: they have a definite frequency and a definite energy, and both are given by that Planck-Einstein relation. So let’s look at that relation by doing a bit of dimensional analysis:

  • Energy is measured in electronvolt or, using SI units, joule: 1 eV ≈ 1.6×10⁻¹⁹ J. Energy is force times distance: 1 joule = 1 newton·meter, which means that a larger force over a shorter distance yields the same energy as a smaller force over a longer distance. The oscillations we're talking about here involve very tiny distances obviously. But the principle is the same: we're talking some moving charge q, and the power – which is the time rate of change of the energy – that goes in or out at any point of time is equal to dW/dt = F·v, with W the work that's being done by the charge as it emits radiation.
  • I would also like to add that, as you know, forces are related to the inertia of things. Newton’s Law basically defines a force as that what causes a mass to accelerate: F = m·a = m·(dv/dt) = d(m·v)/dt = dp/dt, with p the momentum of the object that’s involved. When charges are involved, we’ve got the same thing: a potential difference will cause some current to change, and one of the equivalents of Newton’s Law F = m·a = m·(dv/dt) in electromagnetism is V = L·(dI/dt). [I am just saying this so you get a better ‘feel’ for what’s going on.]
  • Planck's constant is measured in electronvolt·seconds (eV·s) or, using SI units, in joule·seconds (J·s), so its dimension is that of (physical) action, which is energy times time: [energy]·[time]. Again, a lot of energy during a short time yields the same energy as less energy over a longer time. [Again, I am just saying this so you get a better 'feel' for these dimensions.]
  • The frequency f is the number of cycles per time unit, so that's expressed per second, i.e. in hertz (Hz) = 1/second = s⁻¹.

So… Well… It all makes sense: [x joule] = [6.626×10⁻³⁴ joule·second]×[f cycles]/[1 second]. But let's try to deepen our understanding even more: what's the Planck-Einstein relation really about?

To answer that question, let’s think some more about the wave function. As you know, it’s customary to express the frequency as an angular frequency ω, as used in the wave function A(x, t) = A0·sin(kx − ωt). The angular frequency is the frequency expressed in radians per second. That’s because we need an angle in our wave function, and so we need to relate x and t to some angle. The way to think about this is as follows: one cycle takes a time T (i.e. the period of the wave) which is equal to T = 1/f. Yes: one second divided by the number of cycles per second gives you the time that’s needed for one cycle. One cycle is also equivalent to our argument ωt going around the full circle (i.e. 2π), so we write:  ω·T = 2π and, therefore:

ω = 2π/T = 2π·f

Now we’re ready to play with the Planck-Einstein relation. We know it gives us the energy of one photon really, but what if we re-write our equation Eγ = h·f as Eγ/f = h? The dimensions in this equation are:

[x joule]·[1 second]/[f cycles] = [6.626×10⁻³⁴ joule]·[1 second]

⇔ h = 6.626×10⁻³⁴ joule per cycle

So that means that the energy per cycle is equal to 6.626×10⁻³⁴ joule, i.e. the value of Planck's constant.

Let me rephrase this truly amazing result, so you appreciate it—perhaps: regardless of the frequency of the light (or our electromagnetic wave, in general) involved, the energy per cycle, i.e. per wavelength or per period, is always equal to 6.626×10⁻³⁴ joule or, using the electronvolt as the unit, 4.135667662×10⁻¹⁵ eV. So, in case you wondered, that is the true meaning of Planck's constant!

Now, if we have the frequency f, we also have the wavelength λ, because the velocity of the wave is the frequency times the wavelength: c = λ·f and, therefore, λ = c/f. So if we increase the frequency, the wavelength becomes smaller and smaller, and so we're packing the same amount of energy – admittedly, 4.135667662×10⁻¹⁵ eV is a very tiny amount of energy – into a space that becomes smaller and smaller. Well… What's tiny, and what's small? All is relative, of course. 🙂 So that's where the Planck scale comes in. If we pack that amount of energy into some tiny little space of the Planck dimension, i.e. a 'length' of 1.6162×10⁻³⁵ m, then it becomes a tiny black hole, and it's hard to think about how that would work.
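To make sure the numbers add up, here's a quick check – nothing more than the arithmetic above, done in Python with rounded constants:

```python
h = 6.62607e-34     # Planck's constant, J*s (rounded)
eV = 1.602177e-19   # joule per electronvolt (rounded)
c = 299792458.0     # speed of light, m/s

for f in (1e14, 1e32, 1e43):   # visible light, the edge of the Great Desert, the Planck scale
    E_photon = h * f           # photon energy, in joule
    print(f, "Hz:", E_photon / eV, "eV;", E_photon / f, "J per cycle")  # last number is always h

print(c / 1e43)    # the wavelength at 10^43 Hz: ~3e-35 m, i.e. the Planck scale indeed
```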

[…] Let me make a small digression here. I said it's hard to think about black holes but, of course, it's not because it's 'hard' that we shouldn't try it. So let me just mention a few basic facts. For starters, black holes do emit radiation! So they swallow stuff, but they also spit stuff out. More in particular, there is the so-called Hawking radiation, which Stephen Hawking predicted back in 1974.

Let me quickly make a few remarks on that: Hawking radiation is basically a form of blackbody radiation, so all frequencies are there, as shown below: the distribution of the various frequencies depends on the temperature of the black body, i.e. the black hole in this case. [The black curve is the curve that Lord Rayleigh and Sir James Jeans derived in the late 19th century, using classical theory only, so that’s the one that does not correspond to experimental fact, and which led Max Planck to become the ‘reluctant’ father of quantum mechanics. In any case, that’s history and so I shouldn’t dwell on this.]

[Graph: blackbody radiation spectra for a number of temperatures, together with the classical (Rayleigh–Jeans) curve, which does not correspond to experimental fact]

The interesting thing about blackbody radiation, including Hawking radiation, is that it reduces energy and, hence, the equivalent mass of our blackbody. So Hawking radiation reduces the mass and energy of black holes and is therefore also known as black hole evaporation. So black holes that lose more mass than they gain through other means are expected to shrink and ultimately vanish. Therefore, there are all kinds of theories that say why micro black holes, like that Planck-scale black hole we're thinking of right now, should be much larger net emitters of radiation than large black holes and, hence, why they should shrink and dissipate faster.

Hmm… Interesting… What do we do with all of this information? Well… Let’s think about it as we continue our trek on this long journey to reality over the next year or, more probably, years (plural). 🙂

The key lesson here is that space and time are intimately related because of the idea of movement, i.e. the idea of something having some velocity, and that it’s not so easy to separate the dimensions of time and distance in any hard and fast way. As energy scales become larger and, therefore, our natural time and distance units become smaller and smaller, it’s the energy concept that comes to the fore. It sort of ‘swallows’ all other dimensions, and it does lead to limiting situations which are hard to imagine. Of course, that just underscores the underlying unity of Nature, and the mysteries involved.

So… To relate all of this back to the story that our professor is trying to tell, it’s a simple story really. He’s talking about two fundamental constants basically, c and h, pointing out that c is a property of empty space, and h is related to something doing something. Well… OK. That’s really nothing new, and surely not ground-breaking research. 🙂

Now, let me finish my thoughts on all of the above by making one more remark. If you've read a thing or two about this – which you surely have – you'll probably say: this is not how people usually explain it. That's true, they don't. Anything I've seen about this just associates the 10⁴³ Hz scale with the 10²⁸ eV energy scale, using the same Planck-Einstein relation. For example, the Wikipedia article on micro black holes writes that "the minimum energy of a microscopic black hole is 10¹⁹ GeV [i.e. 10²⁸ eV], which would have to be condensed into a region on the order of the Planck length." So that's wrong. I want to emphasize this point because I've been led astray by it for years. It's not the total photon energy, but the energy per cycle that counts. Having said that, it is correct, however, and easy to verify, that the 10⁴³ Hz scale corresponds to a wavelength of the Planck scale: λ = c/f = (3×10⁸ m/s)/(10⁴³ s⁻¹) = 3×10⁻³⁵ m. The confusion between the photon energy and the energy per wavelength arises because of the idea of a photon: it travels at the speed of light and, hence, because of the relativistic length contraction effect, it is said to be point-like, to have no dimension whatsoever. So that's why we think of packing all of its energy in some infinitesimally small place. But you shouldn't think like that. The photon is dimensionless in our reference frame: in its own 'world', it is spread out, so it is a wave train. And it's in its 'own world' that the contradictions start… 🙂

OK. Done!

My third and final point is about what our professor writes on the fundamental physical constants, and more in particular on what he writes on the fine-structure constant. In fact, I could just refer you to my own post on it, but that’s probably a bit too easy for me and a bit difficult for you 🙂 so let me summarize that post and tell you what you need to know about it.

The fine-structure constant

The fine-structure constant α is a dimensionless constant which also illustrates the underlying unity of Nature, but in a way that’s much more fascinating than the two or three things the professor mentions. Indeed, it’s quite incredible how this number (α = 0.00729735…, but you’ll usually see it written as its reciprocal, which is a number close to 137.036…) links charge with the relative speeds, radii, and masses of fundamental particles and, therefore, how this number also links all of these concepts with each other. And, yes, the fact that it is, effectively, dimensionless, unlike h or c, makes it even more special. Let me quickly sum up what the very same number α all stands for:

(1) α is the square of the electron charge expressed in Planck units: α = eP².

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re/r). You’ll see this more often written as re = α²·r. Also note that this is an equation that does not depend on the units, in contrast to equations (1) (above) and (4) and (5) (below), which require you to switch to Planck units. It’s the square root of a ratio of two lengths and, hence, the units don’t matter. They fall away.

(3) α is the (relative) speed of an electron (i.e. the speed of the electron in the first Bohr orbit of the hydrogen atom): α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10⁻³⁵ m)/(5.391×10⁻⁴⁴ s) ≈ 3×10⁸ m/s = c. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long as we’re consistent and express both in the same units.]

(4) α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. Now, I think that’s, perhaps, the most amazing of all of the expressions for α. [If you don’t think that’s amazing, I’d really suggest you stop trying to study physics. :-)]

Also note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal to me = α/re = α/(α²·r) = 1/(α·r). So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to me = α/re = eP²/re. Using the Bohr radius, we get me = 1/(α·r) = 1/(eP²·r).

So… As you can see, this fine-structure constant really links all of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),…
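All of these relations are easy to verify numerically, by the way. Here’s a minimal sketch (Python, using approximate CODATA values; the Planck charge qP = √(4πε₀·ħ·c) is the only ingredient that’s not in the list above) which checks the value of α itself, as well as relations (1) and (2):

```python
import math

# Approximate CODATA values
e    = 1.602176634e-19    # electron charge (C)
hbar = 1.054571817e-34    # reduced Planck constant (J·s)
c    = 2.99792458e8       # speed of light (m/s)
eps0 = 8.8541878128e-12   # electric constant (F/m)
r_e  = 2.8179403262e-15   # classical electron radius (m)
r_B  = 5.29177210903e-11  # Bohr radius (m)

alpha = e**2 / (4 * math.pi * eps0 * hbar * c)
print(alpha, 1 / alpha)      # ≈ 0.0072973525…, ≈ 137.036

q_P = math.sqrt(4 * math.pi * eps0 * hbar * c)  # the Planck charge (≈ 1.876e-18 C)
print((e / q_P)**2)          # relation (1): the squared charge in Planck units ≈ α
print(math.sqrt(r_e / r_B))  # relation (2): √(re/r) ≈ α
```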

So… Why is it what it is?

Well… We all marvel at this, but what can we say about it, really? I struggle with how to interpret this, just as much – or, probably, much more 🙂 – as the professor who wrote the article I don’t like (because it’s so imprecise, and that’s what made me write all of what I am writing here).

Having said that, it’s obvious that it points to a unity beyond these numbers and constants, one that I am only beginning to appreciate for what it is: deep, mysterious, and very beautiful. But I don’t think that professor does a good job of showing how deep, mysterious and beautiful it all is. Then again, that’s up to you, my brother, and you, my imaginary reader, to judge, of course. 🙂

[…] I forgot to mention what I mean by ‘Planck units’. Well… Once again, I should refer you to one of my other posts. But, yes, that’s too easy for me and a bit difficult for you. 🙂 So let me just note that we get those Planck units by equating no less than five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (c = ħ = kB = ke = G = 1), and we can solve that to get the Planck units, i.e. the Planck length unit, the Planck time unit, the Planck mass unit, the Planck energy unit, the Planck charge unit and, finally (oft forgotten), the Planck temperature unit. That’s six units from five equations, but note that the mass and energy units are directly related because of the mass-energy equivalence relation E = mc², which simplifies to E = m if c is equated to 1. [I could also say something about the relation between temperature and (kinetic) energy, but I won’t, as it would only further confuse you.]
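To make that tangible, the sketch below (Python) uses the standard closed-form expressions that solving those five equations yields – you can look them up, or derive them yourself – and reproduces the Planck units we’ve been using:

```python
import math

c    = 2.99792458e8      # speed of light (m/s)
hbar = 1.054571817e-34   # reduced Planck constant (J·s)
G    = 6.67430e-11       # Newton's constant (m³/(kg·s²))
k_B  = 1.380649e-23      # Boltzmann's constant (J/K)
k_e  = 8.9875517923e9    # Coulomb's constant (N·m²/C²)

l_P = math.sqrt(hbar * G / c**3)  # Planck length ≈ 1.616×10⁻³⁵ m
t_P = math.sqrt(hbar * G / c**5)  # Planck time ≈ 5.391×10⁻⁴⁴ s
m_P = math.sqrt(hbar * c / G)     # Planck mass ≈ 2.176×10⁻⁸ kg
E_P = m_P * c**2                  # Planck energy ≈ 1.956×10⁹ J
q_P = math.sqrt(hbar * c / k_e)   # Planck charge ≈ 1.876×10⁻¹⁸ C
T_P = E_P / k_B                   # Planck temperature ≈ 1.417×10³² K

print(l_P, t_P, m_P, E_P, q_P, T_P)
```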

OK. Done! 🙂

Addendum: How to think about space and time?

If you read the argument on the Planck scale and constant carefully, then you’ll note that it does not depend on the idea of an indivisible photon. However, it does depend on that Planck-Einstein relation being valid always and everywhere. Now, the Planck-Einstein relation is, in its essence, a fairly basic result from classical electromagnetic theory: it incorporates quantum theory – remember: it’s the equation that allowed Planck to solve the black-body radiation problem, and so it’s why they call Planck the (reluctant) ‘Father of Quantum Theory’ – but it’s not quantum theory.

So the obvious question is: can we make this reflection somewhat more general, so we can think of the electromagnetic force as an example only? In other words: can we apply the thoughts above to any force, and to any movement, really?

The truth is: I haven’t advanced enough in my little study to give the equations for the other forces. Of course, we could think of gravity, and I developed some thoughts on what gravitational waves might look like, but nothing specific really. And then we have the shorter-range nuclear forces, of course: the strong force and the weak force. The laws involved are very different. The strong force involves color charges, and the way distances work is entirely different. So it would surely require a different analysis. However, the results should be the same. Let me offer some thoughts though:

  • We know that the relative strength of the nuclear force is much larger, because it pulls like charges (protons) together, despite the strong electromagnetic force that wants to push them apart! So the mentioned problem of trying to ‘pack’ some oscillation in some tiny little space should be worse with the strong force. And the strong force is there, obviously, at tiny little distances!
  • Even gravity should become important, because if we’ve got a lot of energy packed into some tiny space, its equivalent mass will ensure the gravitational forces also become important. In fact, that’s what the whole argument was all about!
  • There’s also all this talk about the fundamental forces becoming one at the Planck scale. I must, again, admit my knowledge is not advanced enough to explain how that would be possible, but I must assume that, if physicists are making such statements, the argument must be fairly robust.

So… Whatever charge or whatever force we are talking about, we’ll be thinking of waves or oscillations—or simply movement, but it’s always a movement in a force field, and so there’s power and energy involved (energy is force times distance, and power is the time rate of change of energy). So, yes, we should expect the same issues in regard to scale. And so that’s what’s captured by h.

As we’re talking about the smallest things possible here, I should also mention that there are other inconsistencies in electromagnetic theory, which should (also) have their parallel for other forces. For example, the idea of a point charge is mathematically inconsistent, as I show in my post on fields and charges. Charge, any charge really, must occupy some space. It cannot all be squeezed into one dimensionless point. So the reasoning behind the Planck time and distance scale is surely valid.

In short, the whole argument about the Planck scale and those limits is very valid. However, does it imply that our thinking about the Planck scale is actually relevant? I mean: it’s not because we can imagine what things might look like – they may look like those tiny little black holes, for example – that these things actually exist. GUT and string theorists obviously think they are thinking about something real. But, frankly, Feynman had a point when he said what he said about string theory, shortly before his untimely death in 1988: “I don’t like that they’re not calculating anything. I don’t like that they don’t check their ideas. I don’t like that for anything that disagrees with an experiment, they cook up an explanation—a fix-up to say, ‘Well, it still might be true.'”

It’s true that the so-called Standard Model does not look very nice. It’s not like Maxwell’s equations. It’s complicated. It’s got various ‘sectors’: the electroweak sector, the QCD sector, the Higgs sector,… So ‘it looks like it’s got too much going on’, as a friend of mine said when he looked at a new design for mountain-bike suspension. 🙂 But, unlike mountain-bike designs, there’s no real alternative to the Standard Model. So perhaps we should just accept it is what it is and, hence, in a way, accept Nature as we can see it. So perhaps we should just continue to focus on what’s here, before we reach the Great Desert, rather than wasting time on trying to figure out what things might look like on the other side, especially because we’ll never be able to test our theories about ‘the other side.’

On the other hand, we can see where the Great Desert sort of starts (somewhere near the 10³² Hz scale), and so it’s only natural to think it should also stop somewhere. In fact, we know where it stops: it stops at the 10⁴³ Hz scale, because everything beyond that doesn’t make sense. The question is: is there actually something there? Like fundamental strings, or whatever you want to call them. Perhaps we should just stop where the Great Desert begins. And what’s the Great Desert anyway? Perhaps it’s a desert indeed, and so then there is absolutely nothing there. 🙂

Hmm… There’s not all that much one can say about it. However, when looking at the history of physics, there’s one thing that’s really striking: most of what physicists could think of, in the sense that it made physical sense, turned out to exist. Think of anti-matter, for instance. Paul Dirac thought it might exist, that it made sense for it to exist, and so everyone started looking for it, and Carl Anderson found it a few years later (in 1932). In fact, it had been observed before, but people just didn’t pay attention, so they didn’t want to see it, in a way. […] OK. I am exaggerating a bit, but you know what I mean. The 1930s are full of examples like that. There was a burst of scientific creativity, as the formalism of quantum physics was being developed, and the experimental confirmations of the theory just followed suit.

In the field of astronomy, or astrophysics I should say, it was the same with black holes. No one could really imagine the existence of black holes until the 1960s or so: they were thought of as a mathematical curiosity only, a logical possibility. However, the circumstantial evidence is now quite large and so… Well… It seems a lot of what we can think of actually has some existence somewhere. 🙂

So… Who knows? […] I surely don’t. And so I need to get back to the grind and work my way through the rest of Feynman’s Lectures and the related math. However, this was a nice digression, and I am grateful to my brother for initiating it. 🙂


Reconciling the wave-particle duality in electromagnetism

As I talked about Feynman’s equation for electromagnetic radiation in my previous post, I thought I should add a few remarks on wave-particle duality, but I didn’t do it there, because my post would have become way too long. So let me add those remarks here. In fact, I’ve written about this before, so I’ll just mention the basic ideas without going into too much detail. Let me first jot down the formula once again, as well as illustrate the geometry of the situation:

formula

geometry

The gist of the matter is that light, in classical theory, is a traveling electromagnetic field caused by an accelerating electric charge and that, because light travels at speed c, it’s the acceleration at the retarded time t – r/c, i.e. a‘ = a(t – r/c), that enters the formula. You’ve also seen the diagrams that accompany this formula:

EM 1 EM 2

The two diagrams above show that the curve of the electric field in space is a “reversed” plot of the acceleration as a function of time. As I mentioned before, that’s quite obvious from the mathematical behavior of a function with an argument like the one above, i.e. a function F(t – r/c). When we write t – r/c, we basically measure distance in seconds instead of in meters. So we basically use c as the conversion factor between the time scale and the distance scale. I explained that in a previous post, so please have a look there if you’d want to see how that works.
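If you want to see that ‘reversal’ with your own eyes, here’s a small numerical sketch (Python; the pulse shape a(t) is just a made-up example, not the actual acceleration of any charge):

```python
import numpy as np

c = 1.0  # measure distance in light-seconds, so c = 1

def a(t):
    # Some acceleration pulse: a short, one-sided wiggle between t = 0 and t = 1
    return np.where((t > 0) & (t < 1), np.sin(2 * np.pi * 3 * t), 0.0)

# Snapshot of the field in space at a fixed time t0: E(x) ~ a(t0 - x/c)
t0 = 5.0
x = np.linspace(0, 10, 1000)
E_snapshot = a(t0 - x / c)

# History of the field in time at a fixed point x0: E(t) ~ a(t - x0/c)
x0 = 5.0
t = np.linspace(0, 10, 1000)
E_history = a(t - x0 / c)

# The spatial snapshot is the time history read off backwards:
print(np.allclose(E_snapshot, E_history[::-1]))  # True
```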

So it’s pretty straightforward, really. However, having said that, when I see a diagram like the ones above – all of these diagrams plotting an E or B wave in space – I can’t help thinking they’re somewhat misleading: after all, we’re talking about something traveling at the speed of light here and, therefore, its length – in our frame of reference – should be zero. And it is, obviously. Electromagnetic radiation comes packed in point-like, dimensionless photons: the length of something that travels at the speed of light must be zero.

Now, I don’t claim to know what’s going on exactly, but my thinking on it may not be far off the mark. We know that light is emitted and absorbed by atoms, as electrons go from one energy level to another, and the energy of the photons of light corresponds to the difference between those energy levels (typically a few electron-volts only: it’s given by the E = h·ν relation). Therefore, we can look at a photon as a transient electromagnetic wave. It’s a very short pulse: the decay time for one such pulse of sodium light, i.e. one photon of sodium light, is 3.2×10⁻⁸ seconds. However, taking into account the frequency of sodium light (500 THz), that still makes for some 16 million oscillations, and a wave-train with a length of almost 10 meters. [Yes. Quite incredible, isn’t it?] So the photon could look like the transient wave I depicted below, except… Well… This wave-train is traveling at the speed of light and, hence, we will not see it as a ten-meter-long wave-train. Why not? Well… Because of the relativistic length contraction, it will effectively appear as a point-like particle to us.

Photon wave
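The two numbers are easy to re-derive, by the way:

```python
decay_time = 3.2e-8  # s: the decay time of one pulse (one photon) of sodium light
freq = 5.0e14        # Hz: the frequency of sodium light (500 THz)
c = 3.0e8            # m/s

print(decay_time * freq)  # ≈ 1.6×10⁷: some 16 million oscillations
print(c * decay_time)     # ≈ 9.6 m: a wave-train of almost ten meters
```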

So relativistic length contraction is why the wave and particle duality can be easily reconciled in electromagnetism: we can think of light as an irregular beam of point-like photons indeed, as one atomic oscillator after the other releases a photon, in no particularly organized way. So we can think of photons as transient wave-trains, but we should remind ourselves that they are traveling at the speed of light, so they’ll look point-like to us.

Is such a view consistent with the results of the famous – or should I say infamous? – double-slit experiment? Well… Maybe. As I mentioned in one of my posts, it is rather remarkable that it is actually hard to find actual double-slit experiments that use actual detectors near the slits, and even harder to find such experiments involving photons! Indeed, experiments involving detectors near the slits are usually experiments with ‘real’ particles, such as electrons. Now, a lot of advances have been made in the set-up of these experiments over the past five years, and one of them is a 2010 experiment of an Italian team which suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. Now that throws some doubt on the traditional explanation of the results of the double-slit experiment.

The idea is shown below. The electron is depicted as an incoming plane wave which effectively breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron still goes through but, at the same time, the barrier turns the plane wave into a spherical wave. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [I hope I don’t have to remind you that, while they are represented as ‘real’ waves here, the ‘waves’ are, obviously, complex-valued psi functions.]

double-slit experiment

In fact, to be precise, the experimenters note that there still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This does not solve the ‘mystery’ of the double-slit experiment, but it throws doubt on how it’s usually being explained. The mystery in such experiments is that, when we put detectors, it is either the detector at A or the detector at B that goes off. They should never go off together—”at half strength, perhaps”, as Feynman puts it. But so there are doubts here now. Perhaps the electron does go through both slits at the same time! And so that’s why I used italics when writing “even if an electron passed through both slits”: the electron, or the photon in a similar set-up, is not supposed to do that according to the traditional explanation of the results of the double-slit experiment! It’s one or the other, and the wavefunction collapses or reduces as it goes through. 

However, that’s where these so-called ‘weak measurement’ experiments now come in, like this 2010 experiment: it does not prove, but it does indicate, that the interaction does not have to be that way. They strongly suggest that it is not all or nothing, and that our observations need not destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (cf. the diagram above), being emitted by an atomic oscillator for a few nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10–8 seconds) with the image of a photon as a particle that can be represented by a complex-valued probability amplitude function (cf. the diagram below). I look forward to that day. I think it will come soon.

Photon wave

Here I should add two remarks. First, a lot has been said about the so-called indivisibility of a photon, but inelastic scattering implies that photons are not monolithic: the photon loses energy to the electron and, hence, its wavelength changes. Now, you’ll say: the scattered photon is not the same photon as the incident photon, and you’re right. But… Well. Think about it. It does say something about the presumed oneness of a photon.

I15-72-Compton1

The other remark is on the mathematics of interference. Photons are bosons and, therefore, we have to add their amplitudes to get the interference effect. So you may try to think of an amplitude function, like Ψ = (1/√2π)·e^(iθ) or whatever, and think it’s just a matter of ‘splitting’ this function before it enters the two slits and then ‘putting it back together’, so to say, after our photon has gone through the slits. [For the detailed math of interference in quantum mechanics, see my page on essentials.] Well… No. It’s not that simple. The illustration with that plane wave entering the slits, and the cylindrical and/or spherical wave coming out, makes it obvious that something happens to our wave as it goes through the slit. As I said a couple of times already, the two-slit experiment is interesting, but the interference phenomenon – or diffraction, as it’s called – involving one slit only is at least as interesting. So… Well… The analysis is not that simple. Not at all, really. 🙂


Maxwell’s equations and the speed of light

Pre-script (dated 26 June 2020): Our ideas have evolved into a full-blown realistic (or classical) interpretation of all things quantum-mechanical. In addition, I note the dark force has amused himself by removing some material, which messed up the lay-out of this post as well. So no use to read this. Read my recent papers instead. 🙂

Original post:

We know how electromagnetic waves travel through space: they do so because of the mechanism described in Maxwell’s equations: a changing electric field causes a (changing) magnetic field, and a changing magnetic field causes a (changing) electric field, as illustrated below.

Maxwell interaction

So we need some First Cause to get it all started 🙂 i.e. some current, i.e. some moving charge, but then the electromagnetic wave travels, all by itself, through empty space, completely detached from the cause. You know that by now – indeed, you’ve heard this a thousand times before – but, if you’re reading this, you want to know how it works exactly. 🙂

In my post on the Lorentz gauge, I included a few links to Feynman’s Lectures that explain the nitty-gritty of this mechanism from various angles. However, they’re pretty horrendous to read, and so I just want to summarize them a bit—if only for myself, so as to remind myself what’s important and not. In this post, I’ll focus on the speed of light: why do electromagnetic waves – light – travel at the speed of light?

You’ll immediately say: that’s a nonsensical question. It’s light, so it travels at the speed of light. Sure, smart-arse! Let me be more precise: how can we relate the speed of light to Maxwell’s equations? That is the question here. Let’s go for it.

Feynman deals with the matter of the speed of an electromagnetic wave, and the speed of light, in a rather complicated exposé on the fields from some infinite sheet of charge that is suddenly set into motion, parallel to itself, as shown below. The situation looks – and actually is – very simple, but the math is rather messy because of the rather exotic assumptions: infinite sheets and infinite acceleration are not easy to deal with. 🙂 But the whole point of the exposé is just to prove that the speed of propagation (v) of the electric and magnetic fields is equal to the speed of light (c), and it does a marvelous job at that. So let’s focus on that here only. What I am saying is that I am going to leave out most of the nitty-gritty and just try to get to that v = c result as fast as I possibly can. So, fasten your seat belt, please.

sheet of charge

Most of the nitty-gritty in Feynman’s exposé is about how to determine the direction and magnitude of the electric and magnetic fields, i.e. E and B. Now, when the nitty-gritty business is finished, the grand conclusion is that both E and B travel out in both the positive as well as the negative x-direction at some speed v and sort of ‘fill’ the entire space as they do. Now, the region they are filling extends infinitely far in both the y- and z-direction but, because they travel along the x-axis, there are no fields (yet) in the region beyond x = ± v·t (t = 0 is the moment when the sheet started moving, and it moves in the positive y-direction). As you can see, the sheet of charge fills the yz-plane, and the assumption is that its speed goes from zero to u instantaneously, or very very quickly at least. So the E and B fields move out like a tidal wave, as illustrated below, and thereby ‘fill’ the space indeed, as they move out.

tidal wave

The magnitude of E and B is constant, but it’s not the same constant, and part of the exercise here is to determine the relationship between the two constants. As for their direction, you can see it in the first illustration: B points in the negative z-direction for x > 0 and in the positive z-direction for x < 0, while E‘s direction is opposite to u‘s direction everywhere, so E points in the negative y-direction. As said, you should just take my word for it, because the nitty-gritty on this – which we do not want to deal with here – is all in Feynman and so I don’t want to copy that.

The crux of the argument revolves around what happens at the wavefront itself, as it travels out. Feynman relates flux and circulation there. It’s the typical thing to do: it’s at the wavefront itself that the fields change: before they were zero, and now they are equal to that constant. The fields do not change anywhere else, so there’s no changing flux or circulation business to be analyzed anywhere else. So we define two loops at the wavefront itself: Γ1 and Γ2. They are normal to each other (cf. the top and side view of the situation below), because the E and B fields are normal to each other. And so then we use Maxwell’s equations to check out what happens with the flux and circulation there and conclude what needs to be concluded. 🙂

top view side view

We start with rectangle Γ2. One side is in the region where there are fields, and one side is in the region the fields haven’t reached yet. There is some magnetic flux through this loop, and it is changing, so there is an emf around it, i.e. some circulation of E. The flux changes because the area in which B exists increases at speed v. Indeed, the flux is B times the area in which B exists, and that area increases by L·v·Δt over some differential time interval Δt (L is the width of the rectangle). Hence, the time rate of change of the flux is (B·L·v·Δt)/Δt = B·L·v. Now, according to Faraday’s Law (see my previous post), this will be equal to minus the line integral of E around Γ2, which is E·L. So E·L = B·L·v and, hence, we find: E = v·B.

Interesting! To satisfy Faraday’s equation (which is just one of Maxwell’s equations in integral rather than in differential form), E must equal B times v, with v the speed of propagation of our ‘tidal’ wave. Now let’s look at Γ1. There we should apply:

Integral

Now the line integral is just B·L, and the right-hand side is E·L·v, so, not forgetting that c² in front—i.e. the square of the speed of light, as you know!—we get: c²·B = E·v, or E = (c²/v)·B.

Now, the E = v·B and E = (c²/v)·B equations must both apply (we’re talking one wave and one and the same phenomenon) and, obviously, that’s only possible if v = c²/v, i.e. if v = c. So the wavefront must travel at the speed of light! Waw! That’s fast. 🙂 Yes. […] Jokes aside, that’s the result we wanted here: we just proved that the speed of travel of an electromagnetic wave must be equal to the speed of light.
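If you want to double-check the algebra, here’s a two-line verification with sympy (the symbols are, of course, just stand-ins for the magnitudes in the argument above):

```python
import sympy as sp

E, B, v, c = sp.symbols('E B v c', positive=True)

# The two conditions we derived at the wavefront:
eqs = [sp.Eq(E, v * B),        # from loop Γ2 (Faraday): E = v·B
       sp.Eq(c**2 * B, E * v)] # from loop Γ1 (the 'other' flux rule): c²·B = E·v

print(sp.solve(eqs, [E, v], dict=True))  # [{E: B*c, v: c}]: the wave travels at c
```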

As an added bonus, we also showed the mechanism of travel. It’s obvious from the equations we used to prove the result: it works through the derivatives of the fields with respect to time, i.e. ∂E/∂t and ∂B/∂t.

Done! Great! Enjoy the view!

Well… Yes and no. If you’re smart, you’ll say: we got this result because of the c2 factor in that equation, so Maxwell had already put it in, so to speak. Waw! You really are a smart-arse, aren’t you? 🙂

The thing is… Well… The answer is: no. Maxwell did not put it in. Well… Yes and no. Let me explain. Maxwell’s first equation was the electric flux law ∇·E = ρ/ε₀: the flux of E through a closed surface is proportional to the charge inside. So that’s basically another way of writing Coulomb’s Law, and ε₀ was just some constant in it, the electric constant. So it’s a constant of proportionality that depends on the unit in which we measure electric charge. The only reason it’s there is to make the units come out alright: if we’d measure charge not in coulomb (C) but in a unit equal to 1 C/ε₀, it would disappear. If we’d do that, our new unit would be equivalent to the charge of some 700,000 protons. You can figure out that magical number yourself by checking the values of the proton charge and ε₀. 🙂

OK. And then Ampère came up with the exact laws for magnetism, and they involved currents and some other constant of proportionality, and Maxwell formalized that by writing ∇×B = μ₀·j, with μ₀ the magnetic constant. It’s not a flux law but a circulation law: currents cause circulation of B. We get the flux rule from it by integrating it. But currents are moving charges, and so Maxwell knew magnetism was related to the same thing: electric charge. So Maxwell knew the two constants had to be related. In fact, when putting the full set of equations together – there are four, as you know – Maxwell figured out that μ₀ times ε₀ would have to be equal to the reciprocal of c², with c the speed of propagation of the wave. So Maxwell knew that, whatever the unit of charge, we’d get two constants of proportionality, an electric and a magnetic constant, and that μ₀·ε₀ would be equal to 1/c². However, while he knew that, light and electromagnetism were, at the time, considered to be separate phenomena, and so Maxwell did not say that c was the speed of light: the only thing his equations told him was that c is the speed of propagation of that ‘electromagnetic’ wave that came out of his equations.

The rest is history. In 1856, the great Wilhelm Eduard Weber – you’ve seen his name before, haven’t you? – did a whole bunch of experiments which measured the electric constant rather precisely, and Maxwell jumped on it and calculated all the rest, i.e. μ₀, and so then he took the reciprocal of the square root of μ₀·ε₀ and – Bang! – he had c, the speed of propagation of the electromagnetic wave he was thinking of. Now, c was some value of the order of 3×10⁸ m/s, and so that happened to be the same as the speed of light, which suggested that Maxwell’s c and the speed of light were actually one and the same thing!
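You can redo Maxwell’s final step in one line, using today’s values for the two constants:

```python
import math

eps0 = 8.8541878128e-12  # the electric constant (F/m)
mu0  = 1.25663706212e-6  # the magnetic constant (H/m)

print(1 / math.sqrt(mu0 * eps0))  # ≈ 2.998×10⁸ m/s: the speed of light
```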

Now, I am a smart-arse too 🙂 and, hence, when I first heard this story, I actually wondered how Maxwell could possibly know the speed of light at the time: Maxwell died many years before the Michelson-Morley experiment unequivocally established the value of the speed of light. [In case you wonder: the Michelson-Morley experiment was done in 1887. So I checked it. The fact is that the Michelson-Morley experiment concluded that the speed of light was an absolute value and that, in the process of doing so, they got a rather precise value for it, but the value of c itself had already been established, more or less, that is, by a Danish astronomer, Ole Römer, in 1676! He did so by carefully observing the timing of the repeating eclipses of Io, one of Jupiter’s moons. Newton mentioned his results in his Principia, which he wrote in 1687, duly noting that it takes about seven to eight minutes for light to travel from the Sun to the Earth. Done! The whole story is fascinating, really, so you should check it out yourself. 🙂]

In any case, to make a long story short, Maxwell was puzzled by this mysterious coincidence, but he was bold enough to immediately point to the right conclusion, tentatively at least, and so he famously wrote that “we can scarcely avoid the inference that light consists in the transverse undulations of the same medium which is the cause of electric and magnetic phenomena.”

So… Well… Maxwell still suggests light needs some medium here – the ‘medium’ is a reference to the infamous aether theory – but that’s not the point: what he says here is what we all take for granted now: light is an electromagnetic wave. So now we know there’s absolutely no reason whatsoever to avoid the ‘inference’, but… Well… 160 years ago, it was quite a big deal to suggest something like that. 🙂

So that’s the full story. I hope you liked it. Don’t underestimate what you just did: understanding an argument like this is like “climbing a great peak”, as Feynman puts it. So it is “a great moment” indeed. 🙂 The only thing left is, perhaps, to explain the ‘other’ flux rules I used above. Indeed, you know Faraday’s Law:

emf

But that other one? Well… As I explained in my previous post, Faraday’s Law is the integral form of Maxwell’s second equation: ∇×E = −∂B/∂t. The ‘other’ flux rule above – the one with the c² in front and without a minus sign – is the integral form of Maxwell’s fourth equation: c²·∇×B = j/ε₀ + ∂E/∂t, taking into account that we’re talking about a wave traveling in free space, so there are no charges and currents (it’s just a wave in empty space—whatever that means) and, hence, the Maxwell equation reduces to c²·∇×B = ∂E/∂t. Now, I could take you through the same gymnastics as I did in my previous post but, if I were you, I’d just apply the general principle that “the same equations must yield the same solutions” and switch E for B and vice versa in Faraday’s equation. 🙂

So we’re done… Well… Perhaps one more thing. We’ve got these flux rules above telling us that the electromagnetic wave will travel all by itself, through empty space, completely detached from its First Cause. But… […] Well… Again you may think there’s some trick here. In other words, you may think the wavefront has to remain connected to the First Cause somehow, just like the whip below is connected to some person whipping it. 🙂

Bullwhip_effect

There’s no such connection. The whip is not needed. 🙂 If we’d switch off the First Cause after some time T, so our moving sheet stops moving, then we’d have the pulse below traveling through empty space. As Feynman puts it: “The fields have taken off: they are freely propagating through space, no longer connected in any way with the source. The caterpillar has turned into a butterfly!”

wavefront

Now, the last question is always the same: what are those fields? What’s their reality? Here, I should refer you to one of the most delightful sections in Feynman’s Lectures. It’s on the scientific imagination. I’ll just quote the introduction to it, but I warmly recommend you go and check it out for yourself: it has no formulas whatsoever, and so you should understand all of it without any problem at all. 🙂

“I have asked you to imagine these electric and magnetic fields. What do you do? Do you know how? How do I imagine the electric and magnetic field? What do I actually see? What are the demands of scientific imagination? Is it any different from trying to imagine that the room is full of invisible angels? No, it is not like imagining invisible angels. It requires a much higher degree of imagination to understand the electromagnetic field than to understand invisible angels. Why? Because to make invisible angels understandable, all I have to do is to alter their properties a little bit—I make them slightly visible, and then I can see the shapes of their wings, and bodies, and halos. Once I succeed in imagining a visible angel, the abstraction required—which is to take almost invisible angels and imagine them completely invisible—is relatively easy. So you say, “Professor, please give me an approximate description of the electromagnetic waves, even though it may be slightly inaccurate, so that I too can see them as well as I can see almost invisible angels. Then I will modify the picture to the necessary abstraction.”

I’m sorry I can’t do that for you. I don’t know how. I have no picture of this electromagnetic field that is in any sense accurate. I have known about the electromagnetic field a long time—I was in the same position 25 years ago that you are now, and I have had 25 years more of experience thinking about these wiggling waves. When I start describing the magnetic field moving through space, I speak of the E and B fields and wave my arms and you may imagine that I can see them. I’ll tell you what I see. I see some kind of vague shadowy, wiggling lines—here and there is an E and a B written on them somehow, and perhaps some of the lines have arrows on them—an arrow here or there which disappears when I look too closely at it. When I talk about the fields swishing through space, I have a terrible confusion between the symbols I use to describe the objects and the objects themselves. I cannot really make a picture that is even nearly like the true waves. So if you have some difficulty in making such a picture, you should not be worried that your difficulty is unusual.

Our science makes terrific demands on the imagination. The degree of imagination that is required is much more extreme than that required for some of the ancient ideas. The modern ideas are much harder to imagine. We use a lot of tools, though. We use mathematical equations and rules, and make a lot of pictures. What I realize now is that when I talk about the electromagnetic field in space, I see some kind of a superposition of all of the diagrams which I’ve ever seen drawn about them. I don’t see little bundles of field lines running about because it worries me that if I ran at a different speed the bundles would disappear, I don’t even always see the electric and magnetic fields because sometimes I think I should have made a picture with the vector potential and the scalar potential, for those were perhaps the more physically significant things that were wiggling.

Perhaps the only hope, you say, is to take a mathematical view. Now what is a mathematical view? From a mathematical view, there is an electric field vector and a magnetic field vector at every point in space; that is, there are six numbers associated with every point. Can you imagine six numbers associated with each point in space? That’s too hard. Can you imagine even one number associated with every point? I cannot! I can imagine such a thing as the temperature at every point in space. That seems to be understandable. There is a hotness and coldness that varies from place to place. But I honestly do not understand the idea of a number at every point.

So perhaps we should put the question: Can we represent the electric field by something more like a temperature, say like the displacement of a piece of jello? Suppose that we were to begin by imagining that the world was filled with thin jello and that the fields represented some distortion—say a stretching or twisting—of the jello. Then we could visualize the field. After we “see” what it is like we could abstract the jello away. For many years that’s what people tried to do. Maxwell, Ampère, Faraday, and others tried to understand electromagnetism this way. (Sometimes they called the abstract jello “ether.”) But it turned out that the attempt to imagine the electromagnetic field in that way was really standing in the way of progress. We are unfortunately limited to abstractions, to using instruments to detect the field, to using mathematical symbols to describe the field, etc. But nevertheless, in some sense the fields are real, because after we are all finished fiddling around with mathematical equations—with or without making pictures and drawings or trying to visualize the thing—we can still make the instruments detect the signals from Mariner II and find out about galaxies a billion miles away, and so on.

The whole question of imagination in science is often misunderstood by people in other disciplines. They try to test our imagination in the following way. They say, “Here is a picture of some people in a situation. What do you imagine will happen next?” When we say, “I can’t imagine,” they may think we have a weak imagination. They overlook the fact that whatever we are allowed to imagine in science must be consistent with everything else we know: that the electric fields and the waves we talk about are not just some happy thoughts which we are free to make as we wish, but ideas which must be consistent with all the laws of physics we know. We can’t allow ourselves to seriously imagine things which are obviously in contradiction to the known laws of nature. And so our kind of imagination is quite a difficult game. One has to have the imagination to think of something that has never been seen before, never been heard of before. At the same time the thoughts are restricted in a strait jacket, so to speak, limited by the conditions that come from our knowledge of the way nature really is. The problem of creating something which is new, but which is consistent with everything which has been seen before, is one of extreme difficulty.”

Isn’t that great? I mean: Feynman, one of the greatest physicists of all time, didn’t write what he wrote above when he was an undergrad student or so. No. He did so in 1964, when he was 45 years old, at the height of his scientific career! And it gets better, because Feynman then starts talking about beauty. What is beauty in science? Well… Just click and check what Feynman thinks about it. 🙂

Oh… One last thing. So what are the magnitudes of the E and B fields? Well… You can work it out yourself, but I’ll give you the answer. The geometry of the situation makes it clear that the electric field has a y-component only, and the magnetic field a z-component only. Their magnitudes are given in terms of J, i.e. the surface current density going in the positive y-direction:

equation
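The equation itself was in the image that got disabled but, if I recall Feynman’s result correctly, the magnitudes are E = J/(2·ε₀·c) and B = E/c (both evaluated at the retarded time, of course). A quick sketch, for a purely hypothetical unit surface current density:

```python
eps0 = 8.854e-12  # the electric constant (F/m)
c    = 3.0e8      # the speed of light (m/s)

J = 1.0  # hypothetical surface current density (A/m), in the positive y-direction

E = J / (2 * eps0 * c)  # magnitude of E (y-component only) ≈ 188 V/m
B = E / c               # magnitude of B (z-component only) ≈ 6.3×10⁻⁷ T
print(E, B)
```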


Music and Math

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much from the attack by the dark force—which is good, because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

I ended my previous post, on Music and Physics, by emphatically making the point that music is all about structure, about mathematical relations. Let me summarize the basics:

1. The octave is the musical unit, defined as the interval between two pitches with the higher frequency being twice the frequency of the lower pitch. Let’s denote the lower and higher pitch by a and b respectively, so we say that b‘s frequency is twice that of a.

2. We then divide the [a, b] interval (whose length is unity) into twelve equal sub-intervals, which define eleven notes in between a and b. The pitch of the notes in between is defined by the exponential function connecting a and b. What exponential function? The exponential function with base 2, so that’s the function y = 2^x.

Why base 2? Because of the doubling of the frequencies when going from a to b, and when going from b to b + 1, and from b + 1 to b + 2, etcetera. In music, we give a, b, b + 1, b + 2, etcetera the same name, or symbol: A, for example. Or Do. Or C. Or Re. Whatever. If we have the unit and the number of sub-intervals, all the rest follows. We just add a number to distinguish the various As, or Cs, or Gs, so we write A1, A2, etcetera. Or C1, C2, etcetera. The graph below illustrates the principle for the interval between C4 and C5. Don’t think the function is linear. It’s exponential: note the logarithmic frequency scale. To make the point, I also inserted another illustration (credit for that graph goes to another blogger).

Frequency_vs_name

equal-tempered-scale-graph-linear

You’ll wonder: why twelve sub-intervals? Well… That’s random. Non-Western cultures use a different number. Eight instead of twelve, for example—which is more logical, at first sight at least: eight intervals amounts to dividing the interval into two equal halves, and the halves into halves again, and then once more: so the length of the sub-interval is then 1/2·1/2·1/2 = (1/2)³ = 1/8. But why wouldn’t we divide by three, so we have 9 = 3·3 sub-intervals? Or by 27 = 3·3·3? Or by 16? Or by 5?

The answer is: we don’t know. The limited sensitivity of our ear demands that the intervals be cut up somehow. [You can do tests of the sensitivity of your ear to relative frequency differences online: it’s fun. Just try them! Some of the sites may recommend a hearing aid, but don’t take that crap.] So… The bottom line is that, somehow, mankind settled on twelve sub-intervals within our musical unit—or our sound unit, I should say. So it is what it is, and the ratio of the frequencies of two successive (semi)tones (e.g. C and C#, or E and F, as E and F are also separated by one half-step only) is 2^(1/12) = 1.059463… Hence, the pitch of each note is about 6% higher than the pitch of the previous note. OK. Next thing.
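Here’s what that gives, concretely, for the twelve equal-tempered steps up from A4 = 440 Hz (a minimal Python sketch; the octave numbers are omitted from the note names):

```python
A4 = 440.0
ratio = 2 ** (1 / 12)  # ≈ 1.059463: each note ~6% higher than the previous one

names = ['A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#']
for n, name in enumerate(names):
    print(f"{name:2s} {A4 * ratio**n:8.2f} Hz")
print(f"A  {A4 * 2:8.2f} Hz")  # one octave up: exactly twice the frequency
```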

3. What’s the similarity between C1, C2, C3 etcetera? Or between A1, A2, A3 etcetera? The answer is: harmonics. The frequency of the first overtone of a string tuned at pitch A3 (i.e. 220 Hz) is equal to the fundamental frequency of a string tuned at pitch A4 (i.e. 440 Hz). Likewise, the frequency of the (pitch of the) C4 note above (which is the so-called middle C) is 261.626 Hz, while the frequency of the (pitch of the) next C note (C5) is twice that frequency: 523.251 Hz. [I should quickly clarify the terminology here: a tone consists of several harmonics, with frequencies f, 2·f, 3·f,… n·f,… The first harmonic is referred to as the fundamental, with frequency f. The second, third, etc harmonics are referred to as overtones, with frequency 2·f, 3·f, etc.]

To make a long story short: our ear is able to identify the individual harmonics in a tone, and if the frequency of the first harmonic of one tone (i.e. the fundamental) is the same frequency as the second harmonic of another, then we feel they are separated by one musical unit.

Isn’t that most remarkable? Why would it be that way?

My intuition tells me I should look at the energy of the components. The energy theorem tells us that the total energy in a wave is just the sum of the energies in all of the Fourier components. Surely, the fundamental must carry most of the energy, and then the first overtone, and then the second. Really? Is that so?

Well… I checked online to see if there’s anything on that, but my quick check reveals there’s nothing much out there in terms of research: if you’d google ‘energy levels of overtones’, you’ll get hundreds of links to research on the vibrational modes of molecules, but nothing that’s related to music theory. So… Well… Perhaps this is my first truly original post! 🙂 Let’s go for it. 🙂

The energy in a wave is proportional to the square of its amplitude, and we must integrate over one period (T) of the oscillation. The illustration below should help you understand what’s going on. The fundamental mode of the wave is an oscillation with a wavelength (λ₁) that is twice the length of the string (L). For the second mode, the wavelength (λ₂) is just L. For the third mode, we find that λ₃ = (2/3)·L. More generally, the wavelength of the nth mode is λn = (2/n)·L.

modes

The illustration above shows that we’re talking sine waves here, differing in their frequency (or wavelength) only. [The speed of the wave (c), as it travels back and forth along the string, is constant, so frequency and wavelength are in that simple relationship: c = f·λ.] Simplifying and normalizing (i.e. choosing the ‘right’ units by multiplying scales with some proportionality constant), the energy of the first mode would be (proportional to):

Integral 1

What about the second and third modes? For the second mode, we have two oscillations per cycle, but we still need to integrate over the period of the first mode T = T1, which is twice the period of the second mode: T1 = 2·T2. Hence, T2 = (1/2)·T1. Therefore, the argument of the sine wave (i.e. the x variable in the integral above) should go from 0 to 4π. However, we want to compare the energies of the various modes, so let’s substitute cleverly. We write:

Integral 2

The period of the third mode is equal to T3 = (1/3)·T1. Conversely, T1 = 3·T3. Hence, the argument of the sine wave should go from 0 to 6π. Again, we’ll substitute cleverly so as to make the energies comparable. We write:

Integral 3

Now that is interesting! For a so-called ideal string, whose motion is the sum of a sinusoidal oscillation at the fundamental frequency f, another at the second harmonic frequency 2·f, another at the third harmonic 3·f, etcetera, we find that the energies of the various modes are proportional to the values in the harmonic series 1, 1/2, 1/3, 1/4,… 1/n,… etcetera. Again, Pythagoras’ conclusion was wrong (the ratios of the frequencies of individual notes do not respect simple ratios), but his intuition was right: the harmonic series ∑(1/n) (n = 1, 2,…, ∞) is very relevant in describing natural phenomena. It gives us the respective energies of the various natural modes of a vibrating string! In the graph below, the values are represented as areas. It is all quite deep and mysterious, really!

602px-Integral_Test

So now we know why we feel C4 and C5 have so much in common that we call them by the same name: C, or Do. It also helps us understand why the E and A tones have so much in common: the third harmonic of the 110 Hz A2 string corresponds to the fundamental frequency of the E4 string: both are (about) 330 Hz! Hence, E and A have ‘energy in common’, so to speak, but less ‘energy in common’ than two successive E notes, or two successive A notes, or two successive C notes (like C4 and C5).
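A quick check shows that, in the modern equal-tempered system, the match is very close but not exact, which is, of course, just the Pythagorean tension we’ll come back to below:

```python
A2 = 110.0
E4 = 440.0 * 2 ** (-5 / 12)  # E4 in equal temperament: five semitones below A4

print([A2 * n for n in range(1, 5)])  # harmonics of A2: [110.0, 220.0, 330.0, 440.0]
print(E4)                             # ≈ 329.63 Hz: close to, but not exactly, 330 Hz
```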

[…] Well… Sort of… In fact, the analysis above is quite appealing but – I hate to say it – it’s wrong, as I explain in my post scriptum to this post. It’s like Pythagoras’ number theory of the Universe: the intuition behind is OK, but the conclusions aren’t quite right. 🙂

Ideality versus reality

We’ve been talking ideal strings. Actual tones coming out of actual strings have a quality, which is determined by the relative amounts of the various harmonics present in the tone: the tone is not some simple sum of equal-amplitude sinusoidal functions. Actual tones have a waveform that may resemble the waveform I presented in my previous post, when discussing Fourier analysis. Let me insert that illustration once again (and let me also acknowledge its source once more: it’s Wikipedia). The red waveform is the sum of six sine functions, with harmonically related frequencies, but with different amplitudes. Hence, the energy levels of the various modes will not be proportional to the values in that harmonic series ∑(1/n), with n = 1, 2,…, ∞.

Fourier_series_and_transform

Das wohltemperierte Klavier

Nothing in what I wrote above relates to questions of taste like: why do I so seldom select a classical music channel on my online radio station? Or why am I not into hip hop, even if my taste for music is quite similar to that of the common crowd (as evidenced by the fact that I like ‘Listeners’ Top’ hit lists)?

Not sure. It’s an unresolved topic, I guess—involving rhythm and other ‘structures’ I did not mention. Indeed, all of the above just tells us a nice story about the structure of the language of music: it’s a story about the tones, and how they are related to each other. That relation is, in essence, an exponential function with base 2. That’s all. Nothing more, nothing less. It’s remarkably simple and, at the same time, endlessly deep. 🙂 But so it is not a story about the structure of a musical piece itself, of a pop song of Ellie Goulding, for instance, or one of Bach’s preludes or fugues.

That brings me back to the original question I raised in my previous post. It’s a question which was triggered, a long time ago, when I tried to read Douglas Hofstadter‘s Gödel, Escher, Bach, frustrated because my brother seemed to understand it, and I didn’t. So I put it down, and never ever looked at it again. So what is it really about, that famous piece of Bach?

Frankly, I am still not sure. As I mentioned in my previous post, musicians were struggling to find a tuning system that would allow them to easily transpose musical compositions. Transposing music amounts to changing the so-called key of a musical piece: moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. It’s a piece of cake now. In fact, increasing or decreasing the playback speed of a recording also amounts to transposing a piece: an increase or decrease of the playback speed by 6% will shift the pitch up or down by about one semitone. Why? Well… Go back to what I wrote above about that 12th root of 2. We’ve got the right tuning system now, and so everything is easy. Logarithms are great! 🙂

Back to Bach. Despite their admiration for the Greek ideas around aesthetics – and, most notably, their fascination with harmonic ratios! – (almost) all Renaissance musicians were struggling with the so-called Pythagorean tuning system, which was used until the 18th century and which was based on a correct observation (similar strings, under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the lengths of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etcetera) but a wrong conclusion (the frequencies of musical tones should also obey the same harmonic ratios), and Bach’s so-called ‘good’ temperament tuning system was designed such that a piece could, indeed, be played in most keys without sounding… well… out of tune. 🙂

Having said that, the modern ‘equal temperament’ tuning system, which prescribes that tuning should be done such that the notes are in the above-described simple logarithmic relation to each other, had already been invented. So the true question is: why didn’t Bach embrace it? Why did he stick to ratios? Why did it take so long for the right system to be accepted?

I don’t know. If you google, you’ll find a zillion possible explanations. As far as I can see, most are rather mystic. More importantly, most of them do not mention many facts. My explanation is rather simple: while Bach was, obviously, a musical genius, he may not have understood what an exponential, or a logarithm, is all about. Indeed, a quick read of summary biographies reveals that Bach studied a wide range of topics, like Latin and Greek, and theology—of course! But math is not mentioned. He didn’t write about tuning and all that: all of his time went into writing musical masterpieces!

What the biographies do mention is that he always found other people’s tunings unsatisfactory, and that he tuned his harpsichords and clavichords himself. Now that is quite revealing, I’d say! In my view, Bach couldn’t care less about the ratios. He knew something was wrong with the Pythagorean system (or the variants as were then used, which are referred to as meantone temperament) and, as a musical genius, he probably ended up tuning by ear. [For those who’d wonder what I am talking about, let me quickly insert a Wikipedia graph illustrating the difference between the Pythagorean system (and two of these meantone variants) and the equal temperament tuning system in use today.]

Meantone

So… What’s the point I am trying to make? Well… Frankly, I’d bet Bach’s own tuning was actually equal temperament, and so he should have named his masterpiece Das gleichtemperierte Klavier. Then we wouldn’t have all that ‘noise’ around it. 🙂

Post scriptum: Did you like the argument on the respective energy levels of the harmonics of an ideal string? Too bad. It’s wrong. I made a common mistake: when substituting variables in the integral, I ‘forgot’ to substitute the lower and upper bound of the interval over which I was integrating the function. The calculation below corrects the mistake and makes the required substitutions—for the first three modes at least. What’s going on here? Well… Nothing much… I just integrate over the length L, taking a snapshot at t = 0 (as mentioned, we can always shift the origin of our independent variable, so here we do it for time, and so that’s OK). Hence, the argument of our wave function sin(kx−ωt) reduces to kx, with k = 2π/λ, and λ1 = 2L, λ2 = L, λ3 = (2/3)·L for the first, second and third mode respectively. [As for solving the integral of the sine squared, you can google the formula, and please do check my substitutions. They should be OK, but… Well… We never know, do we? :-)]

energy integrals

[…] No… This doesn’t make all that much sense either. Those integrals yield the same energy for all three modes. Something must be wrong: shorter wavelengths (i.e. higher frequencies) are associated with higher energy levels. Full stop. So the ‘solution’ above can’t be right… […] You’re right. That’s where the time aspect comes into play. We were taking a snapshot, indeed, and the mean value of the sine squared function is 1/2 = 0.5, as should be clear from Pythagoras’ theorem: cos²x + sin²x = 1. So what I was doing amounts to integrating a constant function over the same-length interval. So… Well… Yes: no wonder I get the same value again and again.
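
For what it’s worth, here’s a quick numeric confirmation of that (a sketch in Python with NumPy; the string length and the unit amplitude are arbitrary choices of mine): integrating sin²(k·x) over the length of the string gives L/2 for every mode, which is exactly the ‘constant times interval length’ result described above.

```python
import numpy as np

L = 0.65                     # string length in m (an arbitrary choice)
N = 100_000
x = np.arange(N) * (L / N)   # sample points over [0, L)

for n in (1, 2, 3):          # first three modes
    lam = 2 * L / n          # wavelength of the n-th mode: 2L, L, (2/3)·L
    k = 2 * np.pi / lam      # wavenumber
    integral = np.sum(np.sin(k * x) ** 2) * (L / N)   # ∫ sin²(kx) dx over [0, L]
    print(n, round(integral, 6))                      # L/2 = 0.325 for every mode
```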

[…]

We need to integrate over the same time interval. You could do that, as an exercise, but there’s a more direct approach to it: the energy of a wave is directly proportional to its frequency, so we write: E ∼ f. If the frequency doubles, triples, quadruples etcetera, then its energy doubles, triples, quadruples etcetera too. But – remember – we’re talking one string only here, with a fixed wave speed c = λ·f – so f = c/λ (read: the frequency is inversely proportional to the wavelength) – and, therefore (assuming the same (maximum) amplitude), we get that the energy level of each mode is inversely proportional to the wavelength, so we find that E ∼ 1/λ.

Now, with direct or inverse proportionality relations, we can always invent some new unit that makes the relationship an identity, so let’s do that and turn it into an equation indeed. [And, yes, sorry… I apologize again to your old math teacher: he may not quite agree with the shortcut I am taking here, but he’ll see the logic behind it.] So… Remembering that λ1 = 2L, λ2 = L, λ3 = (2/3)·L, etcetera, we can then write:

E1 = (1/2)/L, E2 = (2/2)/L, E3 = (3/2)/L, E4 = (4/2)/L, E5 = (5/2)/L,…, En = (n/2)/L,…

That’s a really nice result, because… Well… In quantum theory, the permitted energy levels of a harmonic oscillator are equally spaced too, with the interval between them equal to h·f or, if you use the angular frequency to describe the wave (so that’s ω = 2π·f), equal to ħ·ω, with ħ = h/2π. So here we’ve got equally spaced energy levels as well, with the interval between the various levels equal to (1/2)/L.

You’ll say: So what? Frankly, if this doesn’t amaze you, stop reading—but then, if this doesn’t amaze you, you probably stopped reading a long time ago. 🙂 Look at what we’ve got here. We didn’t specify anything about that string: we didn’t care about its material or diameter or tension or how it was made (a wound guitar string is a terribly complicated thing!) or about whatever. Still, we know its fundamental (or normal) modes, and their frequency, nodes and energy depend on the length of the string only, with the ‘fundamental’ unit of energy being equal to the reciprocal length. Full stop. So all is just a matter of size and proportions. In other words, it’s all about structure. Absolute measurements don’t matter.

You may say: Bull****. What’s the conclusion? You still didn’t tell me anything about how the total energy of the wave is supposed to be distributed over its normal modes! 

That’s true. I didn’t. Why? Well… I am not sure, really. I presented a lot of stuff here, but I did not present a clear and unambiguous answer as to how the total energy of a string is distributed over its modes. Not for actual strings, nor for ideal strings. Let me be honest: I don’t know. I really don’t. Having said that, my gut instinct that most of the energy – of, let’s say, a C4 note – should be in the primary mode (i.e. in the fundamental frequency) must be right: otherwise we would not call it a C4 note. So let’s try to make some assumptions. However, before doing so, let’s first briefly touch base with reality.

For actual strings (or actual musical sounds), I suspect the analysis can be quite complicated, as evidenced by the following illustration, which I took from one of the many interesting sites on this topic. Let me quote the author: “A flute is essentially a tube that is open at both ends. Air is blown across one end and sound comes out the other. The harmonics are all whole number multiples of the fundamental frequency (436 Hz, a slightly flat A4 — a bit lower in frequency than is normally acceptable). Note how the second harmonic is nearly as intense as the fundamental. [My = blog writer’s 🙂 italics] This strong second harmonic is part of what makes a flute sound like a flute.”

Hmmm… What I see in the graph is a first overtone (i.e. the second harmonic) that is actually more intense than the fundamental, so what’s that all about? Can we actually associate a specific frequency with that tone? Not sure. :-/ So we’re in trouble already.

flute

If reality doesn’t match our thinking, what about ideality? Hmmm… What to say? As for ideal strings – or ideal flutes 🙂 – I’d venture to say that the most obvious distribution of energy over the various modes (or harmonics, when we’re talking sound) would be the Boltzmann distribution.

Huh? Yes. Have a look at one of my posts on statistical mechanics. It’s a weird thing: the distribution of molecular speeds in a gas, or the density of the air in the atmosphere, or whatever else involving many particles and/or a great degree of complexity (so many, or such a degree of complexity, that only some kind of statistical approach to the problem works)—all of that involves Boltzmann’s Law, which basically says the distribution function will be a function of the energy levels involved: f ∼ e^(–energy). So… Well… Yes. It’s the logarithmic scale again. It seems to govern the Universe. 🙂

Huh? Yes. That’s what I think: the distribution of the total energy of the oscillation should be some Boltzmann function, so it should depend on the energy of the modes: most of the energy will be in the lower modes, and most of that in the fundamental. […] Hmmm… It again begs the question: how much exactly?

Well… The Boltzmann distribution strongly resembles the ‘harmonic’ distribution shown above (1, 1/2, 1/3, 1/4, etc.), but it’s not quite the same. The graph below shows how they are similar and dissimilar in shape. You can experiment yourself with coefficients and all that, but your conclusion will be the same. As they say in Asia: they are “same-same but different.” 🙂 […] It’s like the ‘good’ and ‘equal’ temperament used when tuning musical instruments: the ‘good’ temperament – which is based on harmonic ratios – is good, but not good enough. Only the ‘equal’ temperament obeys the logarithmic scale and, therefore, is perfect. So, as I mentioned already, while my assumption isn’t quite right (the distribution is not harmonic, in the Pythagorean sense), the intuition behind it is OK. So it’s just like Pythagoras’ number theory of the Universe. Having said that, I’ll leave it to you to draw the correct conclusions from it. 🙂

graph
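
If you want to reproduce the comparison, here’s a minimal sketch (Python/NumPy; the Boltzmann coefficient β = 1 is an illustrative choice of mine, nothing more):

```python
import numpy as np

n = np.arange(1, 11)              # mode number 1..10
harmonic = 1.0 / n                # 'harmonic' weights: 1, 1/2, 1/3, ...
boltzmann = np.exp(-1.0 * n)      # Boltzmann-like weights e^(-beta*n), beta = 1
harmonic /= harmonic.sum()        # normalize both so the weights sum to 1
boltzmann /= boltzmann.sum()

for i in range(10):
    print(n[i], round(harmonic[i], 3), round(boltzmann[i], 3))
# Both fall off monotonically, but the Boltzmann weights fall off much
# faster: 'same-same but different' indeed.
```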

Music and Physics

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much from the attack by the dark force—which is good, because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

My first working title for this post was Music and Modes. Yes. Modes. Not moods. The relation between music and moods is an interesting research topic as well, but it’s not what I am going to write about. 🙂

It started with me thinking I should write something on modes indeed, because the concept of a mode of a wave, or of any oscillator really, is quite central to physics, both in classical physics as well as in quantum physics (quantum-mechanical systems are analyzed as oscillators too!). But I wondered how to approach it, as it’s a rather boring topic if you look at the math only. But then I was flying back from Europe to Asia, where I live, and, as I also play a bit of guitar, I suddenly wanted to know why we like music. And then I thought that’s a question you may have asked yourself at some point in time too! And so then I thought I should write about modes as part of a more interesting story: a story about music—or, to be precise, a story about the physics behind music. So… Let’s go for it.

Philosophy versus physics

There is, of course, a very simple answer to the question of why we like music: we like music because it is music. If it were not music, we would not like it. That’s a rather philosophical answer, and it probably satisfies most people. However, for someone studying physics, that answer can surely not be sufficient. What’s the physics behind it? I reviewed Feynman’s Lecture on sound waves in the plane, combined it with some other stuff I googled when I arrived, and then I wrote this post, which gives you a much less philosophical answer. 🙂

The observation at the center of the discussion is deceptively simple: why is it that similar strings (i.e. strings made of the same material, with the same thickness, etc.), under the same tension but differing in length, sound ‘pleasant’ when sounded together if – and only if – the ratio of the lengths of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etc. (i.e. like whatever other ratio of two small integers)?

You probably wonder: is that the question, really? It is. The question is deceptively simple indeed because, as you will see in a moment, the answer is quite complicated. So complicated, in fact, that the Pythagoreans didn’t have any answer. Nor did anyone else for that matter—until the 18th century or so, when musicians, physicists and mathematicians alike started to realize that a string (of a guitar, or a piano, or whatever instrument Pythagoras was thinking of at the time), or a column of air (in a pipe organ or a trumpet, for example), or whatever other thing that actually creates the musical tone, oscillates at numerous frequencies simultaneously.

The Pythagoreans did not suspect that a string, in itself, is a rather complicated thing – something which physicists refer to as a harmonic oscillator – and that its sound, therefore, is actually produced by many frequencies, instead of only one. The concept of a pure note, i.e. a tone that is free of harmonics (i.e. free of all other frequencies, except for the fundamental frequency), also didn’t exist at the time. And if it did, they would not have been able to produce a pure tone anyway: producing pure tones – or notes, as I’ll call them, somewhat inaccurately (I should say: a pure pitch) – is remarkably complicated, and they do not exist in Nature. If the Pythagoreans had been able to produce pure tones, they would have observed that pure tones do not give any sensation of consonance or dissonance, even if their relative frequencies respect those simple ratios. Indeed, repeated experiments, in which such pure tones are being produced, have shown that human beings can’t really say whether it’s a musical sound or not: it’s just sound, and it’s neither pleasant (or consonant, we should say) nor unpleasant (i.e. dissonant).

The Pythagorean observation is valid, however, for actual (i.e. non-pure) musical tones. In short, we need to distinguish between tones and notes (i.e. pure tones): they are two very different things, and the gist of the whole argument is that musical tones coming out of one (or more) string(s) under tension are full of harmonics and, as I’ll explain in a minute, that’s what explains the observed relation between the lengths of those strings and the phenomenon of consonance (i.e. sounding ‘pleasant’) or dissonance (i.e. sounding ‘unpleasant’).

Of course, it’s easy to say what I say above: it’s 2015 now, and so we have the benefit of hindsight. Back then – so that’s more than 2,500 years ago! – the simple but remarkable fact that the lengths of similar strings should respect some simple ratio if they are to sound ‘nice’ together triggered a fascination with number theory (in fact, the Pythagoreans actually established the foundations of what is now known as number theory). Indeed, Pythagoras felt that similar relationships should also hold for other natural phenomena! To mention just one example, the Pythagoreans believed that the orbits of the planets would also respect such simple numerical relationships, which is why they talked of the ‘music of the spheres’ (Musica Universalis).

We now know that the Pythagoreans were wrong. The proportions in the movements of the planets around the Sun do not respect simple ratios and, with the benefit of hindsight once again, it is regrettable that it took many courageous and brilliant people, such as Galileo Galilei and Copernicus, to convince the Church of that fact. 😦 Also, while Pythagoras’ observations in regard to the sounds coming out of whatever strings he was looking at were correct, his conclusions were wrong: the observation does not imply that the frequencies of musical notes should all be in some simple ratio one to another.

Let me repeat what I wrote above: the frequencies of musical notes are not in some simple relationship one to another. The frequency scale for all musical tones is logarithmic and, while that implies that we can, effectively, do some tricks with ratios based on the properties of the logarithmic scale (as I’ll explain in a moment), the so-called ‘Pythagorean’ tuning system, which is based on simple ratios, was plain wrong, even if it – or some variant of it (instead of the 3:2 ratio, musicians used the 5:4 ratio from about 1510 onwards) – was generally used until the 18th century! In short, Pythagoras was wrong indeed—in this regard at least: we can’t do much with those simple ratios.

Having said that, Pythagoras’ basic intuition was right, and that intuition is still very much what drives physics today: it’s the idea that Nature can be described, or explained (whatever that means), by quantitative relationships only. Let’s have a look at how it actually works for music.

Tones, noise and notes

Let’s first define and distinguish tones and notes. A musical tone is the opposite of noise, and the difference between the two is that musical tones are periodic waveforms, so they have a period T, as illustrated below. In contrast, noise is a non-periodic waveform. It’s as simple as that.

noise versus music

Now, from previous posts, you know we can write any periodic function as the sum of a potentially infinite number of simple harmonic functions, and that this sum is referred to as the Fourier series. I am just noting it here, so don’t worry about it for now. I’ll come back to it later.

You also know we have seven musical notes: Do-Re-Mi-Fa-Sol-La-Si or, more commonly in the English-speaking world, A-B-C-D-E-F-G. And then it starts again with A (or Do). So we have two notes, separated by an interval which is referred to as an octave (from the Latin octo, i.e. eight), with six notes in-between, so that’s eight notes in total. However, you also know that there are notes in-between, except between E and F and between B and C. They are referred to as semitones or half-steps. I prefer the term ‘half-step’ over ‘semitone’, because we’re talking notes really, not tones.

We have, for example, F-sharp (denoted by F#), which we can also call G-flat (denoted by Gb). It’s the same thing: a sharp (#) raises a note by a semitone (aka half-step), and a flat (b) lowers it by the same amount, so F# is Gb. That’s what’s shown below: in an octave, we have eight notes but twelve half-steps.

Frequency_vs_name

Let’s now look at the frequencies. The frequency scale above (expressed in oscillations per second, so that’s the hertz unit) is a logarithmic scale: frequencies double as we go from one octave to another: the frequency of the C4 note above (the so-called middle C) is 261.626 Hz, while the frequency of the next C note (C5) is double that: 523.251 Hz. [Just in case you’d want to know: the numbers 4 and 5 refer to their position on a standard 88-key piano keyboard: C4 is the fourth C key on the piano.]

Now, if we equate the interval between C4 and C5 with 1 (so the octave is our musical ‘unit’), then the interval between the twelve half-steps is, obviously, 1/12. Why? Because we have 12 half-steps in our musical unit. You can also easily verify that, because of the way logarithms work, the ratio of the frequencies of two notes that are separated by one half-step (between D# and E, for example) will be equal to 2^(1/12). Likewise, the ratio of the frequencies of two notes that are separated by n half-steps is equal to 2^(n/12). [In case you’d doubt, just do an example. For instance, if we’d denote the frequency of C4 as f0, and the frequency of C# as f1 and so on (so the frequency of D is f2, the frequency of C5 is f12, and everything else is in-between), then we can write the f2/f0 ratio as f2/f0 = (f2/f1)(f1/f0) = 2^(1/12)·2^(1/12) = 2^(2/12) = 2^(1/6). I must assume you’re smart enough to generalize this result yourself, and that f12/f0 is, obviously, equal to 2^(12/12) = 2^1 = 2, which is what it should be!]
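
Here’s that sanity check in a few lines of Python (A4 = 440 Hz, the conventional reference discussed further down, is used as the starting point):

```python
A4 = 440.0                 # the conventional tuning pitch (Hz)
half_step = 2 ** (1 / 12)

for n in range(13):        # the 12 half-steps above A4; n = 12 gives A5
    print(n, round(A4 * half_step ** n, 3))

print(half_step ** 2)      # two half-steps: 2^(2/12) = 2^(1/6) ≈ 1.1225
print(half_step ** 12)     # an octave: 2.0 (up to floating-point rounding)
```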

Now, because the frequencies of the various C notes are expressed as a number involving some decimal fraction (like 523.251 Hz, and the 0.251 is actually an approximation only), and because they are, therefore, a bit hard to read and/or work with, I’ll illustrate the next idea – i.e. the concept of harmonics – with the A instead of the C. 🙂

Harmonics

The lowest A on a piano is denoted by A0, and its frequency is 27.5 Hz. Lower A notes exist (we have one at 13.75 Hz, for instance) but we don’t use them, because they are near (or actually beyond) the limit of the lowest frequencies we can hear. So let’s stick to our grand piano and start with that 27.5 Hz frequency. The next A note is A1, and its frequency is 55 Hz. We then have A2, which is like the A on my (or your) guitar: its frequency is equal to 2×55 = 110 Hz. The next is A3, for which we double the frequency once again: we’re at 220 Hz now. The next one is the A in the illustration of the C scale above: A4, with a frequency of 440 Hz.

[Let me, just for the record, note that the A4 note is the standard tuning pitch in Western music. Why? Well… There’s no good reason really, except convention. Indeed, we can derive the frequency of any other note from that A4 note using our formula for the ratio of frequencies but, because of the properties of a logarithmic function, we could do the same using whatever other note really. It’s an important point: there’s no such thing as an absolute reference point in music: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and how many steps we want to have in-between (so that’s 12 steps—again, in Western music, that is), we get all the rest. That’s just how logarithms work. So music is all about structure, i.e. mathematical relationships. Again, Pythagoras’ conclusions were wrong, but his intuition was right.]

Now, the notes we are talking about here are all so-called pure tones. In fact, when I say that the A on our guitar is referred to as A2 and that it has a frequency of 110 Hz, I am actually making a huge simplification. Worse, I am lying when I say that: when you play a string on a guitar, or when you strike a key on a piano, all kinds of other frequencies – so-called harmonics – will resonate as well, and that’s what gives the quality to the sound: it’s what makes it sound beautiful. So the fundamental frequency (aka the first harmonic) is 110 Hz alright, but we’ll also have second, third, fourth, etc. harmonics with frequencies 220 Hz, 330 Hz, 440 Hz, etcetera. In music, the basic or fundamental frequency is referred to as the pitch of the tone and, as you can see, I often use the term ‘note’ (or pure tone) as a synonym for pitch—which is more or less OK, but not quite correct actually. [However, don’t worry about it: my sloppiness here does not affect the argument.]

What’s the physics behind it? Look at the illustration below (I borrowed it from the Physics Classroom site). The thick black line is the string, and the wavelength of its fundamental frequency (i.e. the first harmonic) is twice its length, so we write λ1 = 2·L or, the other way around, L = (1/2)·λ1. Now that’s the so-called first mode of the string. [One often sees the term fundamental or natural or normal mode, but the adjective is not really necessary. In fact, I find it confusing, although I sometimes find myself using it too.]

string

We also have a second, third, etc mode, depicted below, and these modes correspond to the second, third, etc harmonic respectively.

modes

For the second, third, etc. mode, the relationship between the wavelength and the length of the string is, obviously, the following: L = (2/2)·λ2 = λ2, L = (3/2)·λ3, etc. More in general, for the nth mode, L will be equal to L = (n/2)·λn, with n = 1, 2, etcetera. In fact, because L is supposed to be some fixed length, we should write it the other way around: λn = (2/n)·L.

What does it imply for the frequencies? We know that the speed of the wave – let’s denote it by c – as it travels up and down the string, is a property of the string, and it’s a property of the string only. In other words, it does not depend on the frequency. Now, the wave velocity is equal to the frequency times the wavelength, always, so we have c = f·λ. To take the example of the (classical) guitar string: its length is 650 mm, i.e. 0.65 m. Hence, the identities λ1 = (2/1)·L, λ2 = (2/2)·L, λ3 = (2/3)·L etc become λ1 = (2/1)·0.65 = 1.3 m, λ2 = (2/2)·0.65 = 0.65 m, λ3 = (2/3)·0.65 = 0.433.. m and so on. Now, combining these wavelengths with the above-mentioned frequencies, we get the wave velocity c = (110 Hz)·(1.3 m) = (220 Hz)·(0.65 m) = (330 Hz)·(0.433.. m) = 143 m/s.
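
In code, that’s a trivial check (same numbers as in the paragraph above):

```python
L = 0.65                          # classical guitar scale length (m)
freqs = [110.0, 220.0, 330.0]     # first three harmonics of the A2 string (Hz)

for n, f in enumerate(freqs, start=1):
    lam = (2 / n) * L             # wavelength of the n-th mode
    print(n, round(lam, 4), round(f * lam, 3))   # f·λ = 143.0 m/s every time
```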

Let me now get back to Pythagoras’ string. You should note that the frequencies of the harmonics produced by a simple guitar string are related to each other by simple whole number ratios. Indeed, the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1). The second and third harmonics have a 3:2 frequency ratio. The third and fourth harmonics a 4:3 ratio. The fifth and fourth harmonic 5:4, and so on and so on. They have to be. Why? Because the harmonics are simple multiples of the basic frequency. Now that is what’s really behind Pythagoras’ observation: when he was sounding similar strings with the same tension but different lengths, he was making sounds with the same harmonics. Nothing more, nothing less. 

Let me be quite explicit here, because the point that I am trying to make here is somewhat subtle. Pythagoras’ string is Pythagoras’ string: he talked similar strings. So we’re not talking some actual guitar or a piano or whatever other string instrument. The strings on (modern) string instruments are not similar, and they do not have the same tension. For example, the six strings of a guitar do not differ in length (they’re all 650 mm), but they do differ in tension. The six strings on a classical guitar also have a different diameter, and the first three strings are plain strings, as opposed to the bottom strings, which are wound. So the strings are not similar but very different indeed. To illustrate the point, I copied the values below for just one of the many commercially available guitar string sets.

tension

It’s the same for piano strings. While they are somewhat simpler (they’re all made of piano wire, which is very high-quality steel wire basically), they also differ—not only in length but in diameter as well, typically ranging from 0.85 mm for the highest treble strings to 8.5 mm (so that’s ten times 0.85 mm) for the lowest bass notes.

In short, Pythagoras was not playing the guitar or the piano (or whatever other more sophisticated string instrument that the Greeks surely must have had too) when he was thinking of these harmonic relationships. The physical explanation behind his famous observation is, therefore, quite simple: musical tones that have the same harmonics sound pleasant, or consonant, we should say—from the Latin con-sonare, which, literally, means ‘to sound together’ (from sonare = to sound and con = with). And otherwise… Well… Then they do not sound pleasant: they are dissonant.

To drive the point home, let me emphasize that, when we’re plucking a string, we produce a sound consisting of many frequencies, all in one go. One can see it in practice: if you strike a lower A string on a piano – let’s say the 110 Hz A2 string – then its second harmonic (220 Hz) will make the A3 string vibrate too, because it’s got the same frequency! And then its fourth harmonic will make the A4 string vibrate too, because they’re both at 440 Hz. Of course, the strength of these other vibrations (or their amplitude we should say) will depend on the strength of the other harmonics and we should, of course, expect that the fundamental frequency (i.e. the first harmonic) will absorb most of the energy. So we pluck one string, and so we’ve got one sound, one tone only, but numerous notes at the same time!

In this regard, you should also note that the third harmonic of our 110 Hz A2 string corresponds to the fundamental frequency of the E4 tone: both are 330 Hz! And, of course, the harmonics of E, such as its second harmonic (2·330 Hz = 660 Hz) correspond to higher harmonics of A too! To be specific, the second harmonic of our E string is equal to the sixth harmonic of our A2 string. If your guitar is any good, and if your strings are of reasonable quality too, you’ll actually see it: the (lower) E and A strings co-vibrate if you play the A major chord, but by striking the upper four strings only. So we’ve got energy – motion really – being transferred from the four strings you do strike to the two strings you do not strike! You’ll say: so what? Well… If you’ve got any better proof of the actuality (or reality) of various frequencies being present at the same time, please tell me! 🙂

So that’s why A and E sound very well together (A, E and C#, played together, make up the so-called A major chord): our ear likes matching harmonics. And so that’s why we like musical tones—or why we define those tones as being musical! 🙂 Let me summarize it once more: musical tones are composite sound waves, consisting of a fundamental frequency and so-called harmonics (so we’ve got many notes or pure tones altogether in one musical tone). Now, when other musical tones have harmonics that are shared, and we sound those tones too, we get the sensation of harmony, i.e. the combination sounds consonant.

Now, it’s not difficult to see that we will always have such shared harmonics if we have similar strings, with the same tension but different lengths, being sounded together. In short, what Pythagoras observed has nothing much to do with notes, but with tones. Let’s go a bit further in the analysis now by introducing some more math. And, yes, I am very sorry: it’s the dreaded Fourier analysis indeed! 🙂

Fourier analysis

You know that we can decompose any periodic function into a sum of a (potentially infinite) series of simple sinusoidal functions, as illustrated below. I took the illustration from Wikipedia: the red function s6(x) is the sum of six sine functions of different amplitudes and (harmonically related) frequencies. The so-called Fourier transform S(f) (in blue) relates the six frequencies with the respective amplitudes.

Fourier_series_and_transform

In light of the discussion above, it is easy to see what this means for the sound coming from a plucked string. Using the angular frequency notation (so we write everything using ω instead of f), we know that the normal or natural modes of oscillation have frequencies ω = 2π/T = 2πf  (so that’s the fundamental frequency or first harmonic), 2ω (second harmonic), 3ω (third harmonic), and so on and so on.

Now, there’s no reason to assume that all of the sinusoidal functions that make up our tone should have the same phase: some phase shift Φ may be there and, hence, we should write our sinusoidal function  not as cos(ωt), but as cos(ωt + Φ) in order to ensure our analysis is general enough. [Why not a sine function? It doesn’t matter: the cosine and sine function are the same, except for another phase shift of 90° = π/2.] Now, from our geometry classes, we know that we can re-write cos(ωt + Φ) as

cos(ωt + Φ) = [cos(Φ)cos(ωt) – sin(Φ)sin(ωt)]

We have a lot of these functions of course – one for each harmonic, in fact – and, hence, we should use subscripts, which is what we do in the formula below, which says that any function f(t) that is periodic with the period T can be written mathematically as:

Fourier series

You may wonder: what’s that period T? It’s the period of the fundamental mode, i.e. the first harmonic. Indeed, the period of the second, third, etc. harmonic will only be one half, one third etcetera of the period of the first harmonic. Indeed, T2 = (2π)/(2ω) = (1/2)·(2π)/ω = (1/2)·T1, and T3 = (2π)/(3ω) = (1/3)·(2π)/ω = (1/3)·T1, and so on. However, it’s easy to see that these functions also repeat themselves after two, three, etc. periods respectively. So all is alright, and the general idea behind the Fourier analysis is further illustrated below. [Note that both the formula as well as the illustration below (which I took from Feynman’s Lectures) add a ‘zero-frequency term’ a0 to the series. That zero-frequency term will usually be zero for a musical tone, because the ‘zero’ level of our tone will be zero indeed. Also note that the an and bn coefficients are, of course, equal to an = cos(Φn) and bn = –sin(Φn), so you can relate the illustration and the formula easily.]

Fourier 2

You’ll say: What the heck! Why do we need the mathematical gymnastics here? It’s just to understand that other characteristic of a musical tone: its quality (as opposed to its pitch). A so-called rich tone will have strong harmonics, while a pure tone will only have the first harmonic. All other characteristics – the difference between a tone produced by a violin as opposed to a piano – are then related to the ‘mix’ of all those harmonics.
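
It’s easy to verify that ‘mix’ idea numerically. Below is a minimal sketch (Python/NumPy; the 110 Hz fundamental and the 1.0/0.5/0.2 harmonic amplitudes are arbitrary choices of mine): we synthesize a ‘rich’ tone and then recover its harmonic content with the standard Fourier-coefficient integrals.

```python
import numpy as np

f0 = 110.0                   # fundamental (Hz)
T = 1.0 / f0                 # period of the fundamental
N = 20_000
t = np.arange(N) * (T / N)   # one full period, sampled N times
w = 2 * np.pi * f0

# A 'rich' tone: a strong fundamental plus weaker, phase-shifted harmonics.
tone = (1.0 * np.cos(w * t)
        + 0.5 * np.cos(2 * w * t + 0.3)
        + 0.2 * np.cos(3 * w * t + 1.1))

# Recover the a_n and b_n coefficients with the usual Fourier integrals,
# approximated here as discrete sums over one period.
for n in (1, 2, 3, 4):
    a_n = (2 / N) * np.sum(tone * np.cos(n * w * t))
    b_n = (2 / N) * np.sum(tone * np.sin(n * w * t))
    print(n, round(np.hypot(a_n, b_n), 3))   # amplitudes: 1.0, 0.5, 0.2, 0.0
```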

So we have it all now: pitch, loudness and quality—loudness being, of course, related to the magnitude of the air pressure changes as our waveform moves through the air. That’s what makes a musical tone. 🙂

Dissonance

As mentioned above, if the sounds are not consonant, they’re dissonant. But what is dissonance really? What’s going on? The answer is the following: when two frequencies are near to a simple fraction, but not exact, we get so-called beats, which our ear does not like.

Huh? Relax. The illustration below, which I copied from the Wikipedia article on piano tuning, illustrates the phenomenon. The blue wave is the sum of the red and the green wave, which are originally identical. But then the frequency of the green wave is increased, and so the two waves are no longer in phase, and the interference results in a beating pattern. Of course, our musical tone involves different frequencies and, hence, different periods T1, T2, T3, etcetera, but you get the idea: the higher harmonics also oscillate with period T1, and if the frequencies are not in some exact ratio, then we’ll have a similar problem: beats, and our ear will not like the sound.

220px-WaveInterference
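
Here’s the same phenomenon in a few lines of Python/NumPy (the 220 Hz and 224 Hz pair is just an example of mine): two sines whose frequencies differ by 4 Hz add up to a carrier wave modulated by a slow envelope, i.e. four beats per second.

```python
import numpy as np

f1, f2 = 220.0, 224.0             # two nearby frequencies (Hz)
t = np.linspace(0, 1, 44_100, endpoint=False)

two_sines = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Sum-to-product: sin(a) + sin(b) = 2·cos((a−b)/2)·sin((a+b)/2). The slow
# cosine factor is the beat envelope at (f2−f1)/2 = 2 Hz; loudness follows
# its absolute value, so we hear |f2−f1| = 4 beats per second.
envelope = 2 * np.cos(2 * np.pi * (f2 - f1) / 2 * t)
carrier = np.sin(2 * np.pi * (f1 + f2) / 2 * t)
print(np.allclose(two_sines, envelope * carrier))   # True
```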

Of course, you’ll wonder: why don’t we like beats in tones? We can ask that, can’t we? It’s like asking why we like music, isn’t it? […] Well… It is and it isn’t. It’s like asking why our ear (or our brain) likes harmonics. We don’t know. That’s how we are wired. The ‘physical’ explanation of what is musical and what isn’t only goes so far, I guess. 😦

Pythagoras versus Bach

From all of what I wrote above, it is obvious that the frequencies of the harmonics of a musical tone are, indeed, related by simple ratios of small integers: the frequencies of the first and second harmonics are in a simple 2 to 1 ratio (2:1); the second and third harmonics have a 3:2 frequency ratio; the third and fourth harmonics a 4:3 ratio; the fifth and fourth harmonic 5:4, etcetera. That’s it. Nothing more, nothing less.

In other words, Pythagoras was observing musical tones: he could not observe the pure tones behind them, i.e. the actual notes. However, aesthetics led Pythagoras, and all musicians after him – until the mid-18th century – to also think that the ratios of the frequencies of the notes within an octave should be simple ratios too. From what I explained above, it’s obvious that it should not work that way: the ratio of the frequencies of two notes separated by n half-steps is 2^(n/12), and, for most values of n, 2^(n/12) is not some simple ratio. [Why? Just take your pocket calculator and calculate the value of 2^(1/12): it’s 2^0.08333… = 1.0594630943… and so on… It’s an irrational number: there are no repeating decimals. Now, 2^(n/12) is equal to 2^(1/12)·2^(1/12)·…·2^(1/12) (n times). Why would you expect that product to be equal to some simple ratio?]

So – I said it already – Pythagoras was wrong—not only in this but also in other regards, such as when he espoused his views on the solar system, for example. Again, I am sorry to have to say that, but it is what it is: the Pythagoreans did seem to prefer mathematical ideas over physical experiment. 🙂 Having said that, musicians obviously didn’t know about any alternative to Pythagoras, and they had surely never heard about logarithmic scales at the time. So… Well… They did use the so-called Pythagorean tuning system. To be precise, they tuned their instruments by equating the frequency ratio between the first and the fifth tone in the C scale (i.e. the C and G, as they did not include the C#, D# and F# semitones when counting) with the ratio 3/2, and then they used other so-called harmonic ratios for the notes in-between.

Now, the 3/2 ratio is actually almost correct, because the actual frequency ratio is 2^(7/12) (the C and the G are seven half-steps apart, counting the semitones—not five!), and so that’s 1.4983, approximately. Now, that’s pretty close to 3/2 = 1.5, I’d say. 🙂 Using that approximation (which, I admit, is fairly accurate indeed), the tuning of the other strings would then also be done assuming certain ratios should be respected, like the ones below.

Capture

So it was all quite good. Having said that, good musicians, and some great mathematicians, felt something was wrong—if only because there were several so-called just intonation systems around (for an overview, check out the Wikipedia article on just intonation). More importantly, they felt it was quite difficult to transpose music using the Pythagorean tuning system. Transposing music amounts to changing the so-called key of a musical piece: what one does, basically, is move the whole piece up or down in pitch by some constant interval that is not equal to an octave. Today, transposing music is a piece of cake—Western music at least. But that’s only because all Western music is played on instruments that are tuned using that logarithmic scale (technically, it’s referred to as the 12-tone equal temperament (12-TET) system). When you use one of the Pythagorean systems for tuning, a transposed piece does not sound quite right.

The first mathematician who really seemed to know what was wrong (and, hence, who also knew what to do) was Simon Stevin, who wrote a manuscript based on the ’12th root of 2 principle’ around AD 1600. It shouldn’t surprise us: the thinking of this mathematician from Bruges would inspire John Napier’s work on logarithms. Unfortunately, while that manuscript describes the basic principles behind the 12-TET system, it didn’t get published (Stevin had to flee Bruges for Holland, because he was Protestant and the Spanish rulers at the time didn’t like that). Hence, musicians, while not quite understanding the math (or the physics, I should say) behind their own music, kept trying other tuning systems, as they felt it made their music sound better indeed.

One of these ‘other systems’ is the so-called ‘good’ temperament, which you surely heard about, as it’s referred to in Bach’s famous composition, Das Wohltemperierte Klavier, which he finalized in the first half of the 18th century. What is that ‘good’ temperament really? Well… It is what it is: it’s one of those tuning systems which made musicians feel better about their music for a number of reasons, all of which are well described in the Wikipedia article on it. But the main reason is that the tuning system that Bach recommended was a great deal better when it came to playing the same piece in another key. However, it still wasn’t quite right, as it wasn’t the equal temperament system (i.e. the 12-TET system) that’s in place now (in the West at least—the Indian music scale, for instance, is still based on simple ratios).

Why do I mention this piece of Bach? The reason is simple: you probably heard of it because it’s one of the main reference points in a rather famous book: Gödel, Escher, Bach—an Eternal Golden Braid. If not, then just forget about it. I am mentioning it because one of my brothers loves it. It’s on artificial intelligence. I haven’t read it, but I must assume Bach’s masterpiece is analyzed there because of its structure, not because of the tuning system that one’s supposed to use when playing it. So… Well… I’d say: don’t make that composition any more mystic than it already is. 🙂 The ‘magic’ behind it is related to what I said about A4 being the ‘reference point’ in music: since we’re using a universal logarithmic scale now, there’s no such thing as an absolute reference point in music any more: once we define our musical ‘unit’ (so that’s the so-called octave in Western music), and also define how many steps we want to have in-between (so that’s 12—in Western music, that is), we get all the rest. That’s just how logarithms work.

So, in short, music is all about structure, i.e. it’s all about mathematical relations, and about mathematical relations only. Again, Pythagoras’ conclusions were wrong, but his intuition was right. And, of course, it’s his intuition that gave birth to science: the simple ‘models’ he made – of how notes are supposed to be related to each other, or of our solar system – were, obviously, just the start of it all. And what a great start it was! Looking back once again, it’s rather sad that conservative forces (such as the Church) often got in the way of progress. In fact, I suddenly wonder: if scientists had not been bothered by those conservative forces, could mankind have sent people into space around the time that Charles V was born, i.e. around A.D. 1500 already? 🙂

Post scriptum: My example of the (lower) E and A guitar strings co-vibrating when playing the A major chord while striking the upper four strings only is somewhat tricky. The (lower) E and A strings are associated with lower pitches, and we said overtones (i.e. the second, third, fourth, etc. harmonics) are multiples of the fundamental frequency. So why is it that the lower strings co-vibrate? The answer is easy: they oscillate at the higher frequencies only. If you have a guitar, just try it. The two strings you do not pluck do vibrate—and very visibly so—but the low fundamental frequencies that would come out of them if you struck them are not audible. In short, they resonate at the higher frequencies only. 🙂

The example that Feynman gives is much more straightforward: his example mentions the lower C (or A, B, etc) notes on a piano causing vibrations in the higher C strings (or the higher A, B, etc string respectively). For example, striking the C2 key (and, hence, the C2 string inside the piano) will make the (higher) C3 string vibrate too. But few of us have a grand piano at home, I guess. That’s why I prefer my guitar example. 🙂


The Uncertainty Principle revisited

Pre-script (dated 26 June 2020): This post has become less relevant (even irrelevant, perhaps) because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

I’ve written a few posts on the Uncertainty Principle already. See, for example, my post on the energy-time expression for it (ΔE·Δt ≥ h). So why am I coming back to it once more? Not sure. I felt I left some stuff out. So I am writing this post to just complement what I wrote before. I’ll do so by explaining, and commenting on, the ‘semi-formal’ derivation of the so-called Kennard formulation of the Principle in the Wikipedia article on it.

The Kennard inequalities, σxσp ≥ ħ/2 and σEσt ≥ ħ/2, are more accurate than the more general Δx·Δp ≥ h and ΔE·Δt ≥ h expressions one often sees, which are an early formulation of the Principle by Niels Bohr, and which Heisenberg himself used when explaining the Principle in a thought experiment picturing a gamma-ray microscope. I presented Heisenberg’s thought experiment in another post, and so I won’t repeat myself here. I just want to mention that it ‘proves’ the Uncertainty Principle using the Planck-Einstein relations for the energy and momentum of a photon:

E = hf and p = h/λ

Heisenberg’s thought experiment is not a real proof, of course. But then what’s a real proof? The mentioned ‘semi-formal’ derivation looks more impressive, because more mathematical, but it’s not a ‘proof’ either (I hope you’ll understand why I am saying that after reading my post). The main difference between Heisenberg’s thought experiment and the mathematical derivation in the mentioned Wikipedia article is that the ‘mathematical’ approach is based on the de Broglie relation. That de Broglie relation looks the same as the Planck-Einstein relation (p = h/λ) but it’s fundamentally different.

Indeed, the momentum of a photon (i.e. the p we use in the Planck-Einstein relation) is not the momentum one associates with a proper particle, such as an electron or a proton, for example (so that’s the p we use in the de Broglie relation). The momentum of a particle is defined as the product of its mass (m) and velocity (v). Photons don’t have a (rest) mass, and their velocity is absolute (c), so how do we define momentum for a photon? There are a couple of ways to go about it, but the two most obvious ones are probably the following:

  1. We can use the classical theory of electromagnetic radiation and show that the momentum of a photon is related to the magnetic field (we usually only analyze the electric field), and the so-called radiation pressure that results from it. It yields the p = E/c formula, which we need to go from E = hf to p = h/λ, using the ubiquitous relation between the frequency, the wavelength and the wave velocity (c = λf). (In case you’re interested in the details, just click on the radiation pressure link.)
  2. We can also use the mass-energy equivalence E = mc². Hence, the equivalent mass of the photon is E/c², which is relativistic mass only. However, we can multiply that mass with the photon’s velocity, which is c, thereby getting the very same value for its momentum: p = (E/c²)·c = E/c.

So Heisenberg’s ‘proof’ uses the Planck-Einstein relations, as it analyzes the Uncertainty Principle more as an observer effect: probing matter with light, so to say. In contrast, the mentioned derivation takes the de Broglie relation itself as the point of departure. As mentioned, the de Broglie relations look exactly the same as the Planck-Einstein relationship (E = hf and p = h/λ) but the model behind is very different. In fact, that’s what the Uncertainty Principle is all about: it says that the de Broglie frequency and/or wavelength cannot be determined exactly: if we want to localize a particle, somewhat at least, we’ll be dealing with a frequency range Δf. As such, the de Broglie relation is actually somewhat misleading at first. Let’s talk about the model behind.

A particle, like an electron or a proton, traveling through space, is described by a complex-valued wavefunction, usually denoted by the Greek letter psi (Ψ) or phi (Φ). This wavefunction has a phase, usually denoted as θ (theta) which – because we assume the wavefunction is a nice periodic function – varies as a function of time and space. To be precise, we write θ as θ = ωt – kx or, if the wave is traveling in the other direction, as θ = kx – ωt.

I’ve explained this in a couple of posts already, including my previous post, so I won’t repeat myself here. Let me just note that ω is the angular frequency, which we express in radians per second, rather than cycles per second, so ω = 2π·f (one cycle covers 2π rad). As for k, that’s the wavenumber, which is often described as the spatial frequency, because it’s expressed in cycles per meter or, more often (and surely in this case), in radians per meter. Hence, if we freeze time, this number is the rate of change of the phase in space. Because one cycle is, again, 2π rad, and one cycle corresponds to the wave traveling one wavelength (i.e. λ meter), it’s easy to see that k = 2π/λ. We can use these definitions to re-write the de Broglie relations E = hf and p = h/λ as:

E = ħω and p = ħk, with ħ = h/2π

What about the wave velocity? For a photon, we have c = λf and, hence, c = (2π/k)(ω/2π) = ω/k. For ‘particle waves’ (or matter waves, if you prefer that term), it’s much more complicated, because we need to distinguish between the so-called phase velocity (vp) and the group velocity (vg). The phase velocity is what we’re used to: it’s the product of the frequency (the number of cycles per second) and the wavelength (the distance traveled by the wave over one cycle), or the ratio of the angular frequency and the wavenumber, so we have, once again, λf = ω/k = vp. However, this phase velocity is not the classical velocity of the particle that we are looking at. That’s the so-called group velocity, which corresponds to the velocity of the wave packet representing the particle (or ‘wavicle’, if you prefer that term), as illustrated below.

Wave_packet_(dispersion)

The animation below illustrates the difference between the phase and the group velocity even more clearly: the green dot travels with the ‘wavicles’, while the red dot travels with the phase. As mentioned above, the group velocity corresponds to the classical velocity of the particle (v). However, the phase velocity is a mathematical point that actually travels faster than light. It is a mathematical point only, which does not carry a signal (unlike the modulation of the wave itself, i.e. the traveling ‘groups’) and, hence, it does not contradict the fundamental principle of relativity theory: the speed of light is absolute, and nothing travels faster than light (except mathematical points, as you can, hopefully, appreciate now).

Wave_group (1)
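
Before moving on, here’s a quick numeric check of those two velocities (a sketch in Python/NumPy; I am assuming the rest energy is included in ħω, i.e. I use the relativistic relation E² = (pc)² + (mc²)² for the dispersion, which is one possible convention; the 2.2×10^6 m/s electron velocity is the typical α·c figure used further down):

```python
import numpy as np

hbar = 1.054571817e-34    # reduced Planck constant (J·s)
m = 9.109e-31             # electron mass (kg)
c = 299_792_458.0         # speed of light (m/s)
v = 2.2e6                 # a 'typical' electron velocity (m/s)

p = m * v / np.sqrt(1 - (v / c) ** 2)   # relativistic momentum
k = p / hbar                            # de Broglie: p = ħk

def omega(k):
    # dispersion from E² = (pc)² + (mc²)², with E = ħω and p = ħk
    return np.sqrt((c * k) ** 2 + (m * c ** 2 / hbar) ** 2)

v_phase = omega(k) / k                                 # ω/k
dk = 1e-6 * k
v_group = (omega(k + dk) - omega(k - dk)) / (2 * dk)   # numerical dω/dk

print(v_phase / c)                  # ≈ 136: the phase travels at ~136·c!
print(v_group)                      # ≈ 2.2e6 m/s: the classical velocity v
print(v_phase * v_group / c ** 2)   # ≈ 1: v_phase·v_group = c²
```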

The two animations above do not represent the quantum-mechanical wavefunction, because the functions that are shown are real-valued, not complex-valued. To imagine a complex-valued wave, you should think of something like the ‘wavicle’ below or, if you prefer animations, the standing waves underneath (i.e. C to H: A and B just present the mathematical model behind, which is that of a mechanical oscillator, like a mass on a spring indeed). These representations clearly show the real as well as the imaginary part of complex-valued wave-functions.

Photon wave

QuantumHarmonicOscillatorAnimation

With this general introduction, we are now ready for the more formal treatment that follows. So our wavefunction Ψ is a complex-valued function in space and time. A very general shape for it is one we used in a couple of posts already:

Ψ(x, t) ∝ e^(i(kx – ωt)) = cos(kx – ωt) + i·sin(kx – ωt)

If you don’t know anything about complex numbers, I’d suggest you read my short crash course on it in the essentials page of this blog, because I don’t have the space nor the time to repeat all of that. Now, we can use the de Broglie relationship relating the momentum of a particle with a wavenumber (p = ħk) to re-write our psi function as:

Ψ(x, t) ∝ e^(i(kx – ωt)) = e^(i(p·x/ħ – ωt))

Note that I am using the ‘proportional to’ symbol (∝) because I don’t worry about normalization right now. Indeed, from all of my other posts on this topic, you know that we have to take the absolute square of all these probability amplitudes to arrive at a probability density function, describing the probability of the particle effectively being at point x in space at point t in time, and that all those probabilities, over the function’s domain, have to add up to 1. So we should insert some normalization factor.

Having said that, the problem with the wavefunction above is not normalization really, but the fact that it yields a uniform probability density function. In other words, the particle position is extremely uncertain in the sense that it could be anywhere. Let’s calculate it using a little trick: the absolute square of a complex number equals the product of itself with its (complex) conjugate. Hence, if z = r·e^(iθ), then │z│² = z·z* = r·e^(iθ)·r·e^(–iθ) = r²·e^(iθ–iθ) = r²·e^0 = r². Now, in this case, assuming unique values for k, ω, p, which we’ll note as k0, ω0, p0 (and, because we’re freezing time, we can also write t = t0), we should write:

│Ψ(x)│² = │a0·e^(i(p0·x/ħ – ω0·t0))│² = │a0·e^(i·p0·x/ħ)·e^(–i·ω0·t0)│² = │a0·e^(i·p0·x/ħ)│²·│e^(–i·ω0·t0)│² = a0²
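
A quick numeric confirmation of that flat result (a sketch, with arbitrary values for all the constants):

```python
import numpy as np

hbar = 1.0                   # natural units; all values below are arbitrary
p0, omega0, t0, a0 = 2.5, 1.0, 0.7, 0.3

x = np.linspace(-10, 10, 9)
psi = a0 * np.exp(1j * (p0 * x / hbar - omega0 * t0))

print(np.abs(psi) ** 2)      # 0.09 everywhere: |Ψ|² = a0², perfectly flat
a, b = -1.0, 4.0
print((b - a) * a0 ** 2)     # P[a ≤ X ≤ b] = (b − a)·a0² = 0.45
```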

Note that, this time around, I did insert some normalization constant a0 as well, so that’s OK. But so the problem is that this very general shape of the wavefunction gives us a constant as the probability for the particle being somewhere between some point a and another point b in space. More formally, we get the area of a rectangle when we calculate the probability P[a ≤ X ≤ b] as we should calculate it, which is as follows:

integral

More specifically, because we’re talking one-dimensional space here, we get P[a ≤ X ≤ b] = (b–a)·a0². Now, you may think that such uniform probability makes sense. For example, an electron may be in some orbital around a nucleus, and so you may think that all ‘points’ on the orbital (or within the ‘sphere’, or whatever volume it is) may be equally likely. Or, in another example, we may know an electron is going through some slit and, hence, we may think that all points in that slit should be equally likely positions. However, we know that it is not the case. Measurements show that not all points are equally likely. For an orbital, we get complicated patterns, such as the one shown below, and please note that the different colors represent different complex numbers and, hence, different probabilities.

Hydrogen_eigenstate_n5_l2_m1

Also, we know that electrons going through a slit will produce an interference pattern—even if they go through it one by one! Hence, we cannot associate some flat line with them: it has to be a proper wavefunction which implies, once again, that we can’t accept a uniform distribution.

In short, uniform probability density functions are not what we see in Nature. They’re non-uniform, like the (very simple) non-uniform distributions shown below. [The left-hand side shows the wavefunction, while the right-hand side shows the associated probability density function: the first two are static (i.e. they do not vary in time), while the third one shows a probability distribution that does vary with time.]

StationaryStatesAnimation

I should also note that, even if you would dare to think that a uniform distribution might be acceptable in some cases (which, let me emphasize this, it is not), an electron can surely not be ‘anywhere’. Indeed, the normalization condition implies that, if we’d have a uniform distribution and if we’d consider all of space, i.e. if we let a go to –∞ and b to +∞, then a0 would tend to zero, which means we’d have a particle that is, literally, everywhere and nowhere at the same time.

In short, a uniform probability distribution does not make sense: we’ll generally have some idea of where the particle is most likely to be, within some range at least. I hope I made myself clear here.

Now, before I continue, I should make some other point as well. You know that the Planck constant (h or ħ) is unimaginably small: about 1×10−34 J·s (joule-second). In fact, I’ve repeatedly made that point in various posts. However, having said that, I should add that, while it’s unimaginably small, the uncertainties involved are quite significant. Let us indeed look at the value of ħ by relating it to that σxσp ≥ ħ/2 relation.

Let’s first look at the units. The uncertainty in the position should obviously be expressed in distance units, while momentum is expressed in kg·m/s units. So that works out, because 1 joule is the energy transferred (or work done) when applying a force of 1 newton (N) over a distance of 1 meter (m). In turn, one newton is the force needed to accelerate a mass of one kg at the rate of 1 meter per second per second (this is not a typing mistake: it’s an acceleration of 1 m/s per second, so the unit is m/s²: meter per second squared). Hence, 1 J·s = 1 N·m·s = 1 kg·(m/s²)·m·s = kg·m²/s. Now, that’s the same dimension as the ‘dimensional product’ for momentum and distance: m·kg·m/s = kg·m²/s.

Now, these units (kg, m and s) are all rather astronomical at the atomic scale and, hence, h and ħ are usually expressed in other dimensions, notably eV·s (electronvolt-second). However, using the standard SI units gives us a better idea of what we’re talking about. If we split the ħ = 1×10−34 J·s value (let’s forget about the 1/2 factor for now) ‘evenly’ over σx and σp – whatever that means: all depends on the units, of course!  – then both factors will have magnitudes of the order of 1×10−17: 1×10−17 m times 1×10−17 kg·m/s gives us 1×10−34 J·s.

You may wonder how this 1×10−17 m compares to, let’s say, the classical electron radius, for example. The classical electron radius is, roughly speaking, the ‘space’ an electron seems to occupy as it scatters incoming light. The idea is illustrated below (credit for the image goes to Wikipedia, as usual). The classical electron radius – or Thomson scattering length – is about 2.818×10−15 m, so that’s almost 300 times our ‘uncertainty’ (1×10−17 m). Not bad: it means that we can effectively relate our ‘uncertainty’ in regard to the position to some actual dimension in space. In this case, we’re talking the femtometer scale (1 fm = 10−15 m), and so you’ve surely heard of this before.

Thomson_scattering_geometry

What about the other ‘uncertainty’, the one for the momentum (1×10−17 kg·m/s)? What’s the typical (linear) momentum of an electron? Its mass, expressed in kg, is about 9.1×10−31 kg. We also know the (relative) velocity of an electron in an atom: it’s given by that magical number α = v/c, about which I wrote in some other posts already, so v = α·c ≈ 0.0073·3×10^8 m/s ≈ 2.2×10^6 m/s. Now, 9.1×10−31 kg times 2.2×10^6 m/s is about 2×10–26 kg·m/s, so our proposed ‘uncertainty’ in regard to the momentum (1×10−17 kg·m/s) is half a billion times larger than the typical value for it. Now that is, obviously, not so good. [Note that calculations like this are extremely rough. In fact, when one talks electron momentum, it’s usually angular momentum, which is ‘analogous’ to linear momentum, but angular momentum involves very different formulas. If you want to know more about this, check my post on it.]

Of course, now you may feel that we didn’t ‘split’ the uncertainty in a way that makes sense: those –17 exponents don’t work, obviously. So let’s take 1×10⁻²⁴ kg·m/s for σp, which is half of that ‘typical’ value we calculated. Then we’d have 1×10⁻¹⁰ m for σx (1×10⁻¹⁰ m times 1×10⁻²⁴ kg·m/s is, once again, 1×10⁻³⁴ J·s). But then that uncertainty suddenly becomes a huge number: 1×10⁻¹⁰ m is 1 angstrom. That’s the size of a whole atom! So it’s huge as compared to the pico- or femtometer scale (1 pm = 1×10⁻¹² m, 1 fm = 1×10⁻¹⁵ m) which we’d sort of expect to see when we’re talking electrons.
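By the way, you can verify these order-of-magnitude calculations with a few lines of code. Here’s a minimal sketch (in Python, with rounded constants, and ignoring the 1/2 factor, just like we did above):

```python
# Rough numerical check of the order-of-magnitude estimates above.
hbar = 1e-34      # J·s (rounded; the actual value is closer to 1.055e-34)
m_e = 9.1e-31     # electron mass in kg
alpha = 0.0073    # the fine-structure constant, i.e. v/c
c = 3e8           # speed of light in m/s

p_typical = m_e * alpha * c        # typical electron momentum: ~2e-24 kg·m/s
print(f"typical momentum: {p_typical:.1e} kg·m/s")

# The 'even' split of hbar over the two uncertainties:
sigma_x = sigma_p = hbar ** 0.5    # both ~1e-17
print(f"even split: sigma_x = {sigma_x:.0e} m, sigma_p = {sigma_p:.0e} kg·m/s")
print(f"sigma_p is {sigma_p / p_typical:.0e} times the typical momentum")

# Taking sigma_p equal to half the typical momentum instead:
sigma_p = p_typical / 2
print(f"implied sigma_x: {hbar / sigma_p:.0e} m")  # ~1e-10 m, i.e. one angstrom
```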

OK. Let me get back to the lesson. Why this digression? Not sure. I think I just wanted to show that the Uncertainty Principle involves ‘uncertainties’ that are extremely relevant: despite the unimaginable smallness of the Planck constant, these uncertainties are quite significant at the atomic scale. But back to the ‘proof’ of Kennard’s formulation. Here we need to discuss the ‘model’ we’re using. The rather simple animation below (again, credit for it has to go to Wikipedia) illustrates it wonderfully.

Sequential_superposition_of_plane_waves

Look at it carefully: we start with a ‘wave packet’ that looks a bit like a normal distribution, but it isn’t, of course. We have negative and positive values, and normal distributions don’t have that. So it’s a wave alright. Of course, you should, once more, remember that we’re only seeing one part of the complex-valued wave here (the real or imaginary part—it could be either). But so then we’re superimposing waves on it. Note the increasing frequency of these waves, and also note how the wave packet becomes increasingly localized with the addition of these waves. In fact, the so-called Fourier analysis, of which you’ve surely heard before, is a mathematical operation that does the reverse: it separates a wave packet into its individual component waves.

So now we know the ‘trick’ for reducing the uncertainty in regard to the position: we just add waves with different frequencies. Of course, different frequencies imply different wavenumbers and, through the de Broglie relationship, we’ll also have different values for the ‘momentum’ associated with these component waves. Let’s write these various values as kn, ωn, and pn respectively, with n going from 0 to N. Of course, our point in time remains frozen at t0. So we get a wavefunction that’s, quite simply, the sum of N component waves and so we write:

Ψ(x) = ∑ an·e^(i·(pn·x/ħ – ωn·t0)) = ∑ an·e^(–i·ωn·t0)·e^(i·pn·x/ħ) = ∑ An·e^(i·pn·x/ħ)

Note that, because of the e^(–i·ωn·t0) factor, we now have complex-valued coefficients An = an·e^(–i·ωn·t0) in front. More formally, we say that An represents the relative contribution of the mode pn to the overall Ψ(x) wave. Hence, we can write these coefficients as a function of p. Because Greek letters always make more of an impression, we’ll use the Greek letter Φ (phi) for it. 🙂 Now, we can go to the continuum limit and, hence, transform that sum above into an integral. So our wave function then becomes an integral over all possible modes, which we write as:

Ψ(x) = [1/√(2πħ)] ∫ Φ(p)·e^(i·p·x/ħ) dp

Don’t worry about that new 1/√2πħ factor in front. That’s, once again, something that has to do with normalization and scales. It’s the integral itself you need to understand. We’ve got that Φ(p) function there, which is nothing but our An coefficient, but for the continuum case. In fact, these relative contributions Φ(p) are now referred to as the amplitude of all modes p, and so Φ(p) is actually another wave function: it’s the wave function in the so-called momentum space.
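To make this a bit more tangible, here’s a small numerical sketch of the discrete version, i.e. the ∑ An·e^(i·pn·x/ħ) sum above. The choice of modes and (Gaussian) weights is entirely mine, just for illustration, but it shows the key effect: the more modes we add, the more localized the wave packet becomes.

```python
import numpy as np

hbar = 1.0                                 # natural units, just for illustration
x = np.linspace(-40, 40, 2000)

def wave_packet(n_modes, p0=1.0, spread=0.2):
    """Superpose n_modes plane waves exp(i*p*x/hbar), with Gaussian weights
    around p0 playing the role of the A_n coefficients."""
    p_values = np.linspace(p0 - 3 * spread, p0 + 3 * spread, n_modes)
    weights = np.exp(-((p_values - p0) / spread) ** 2)
    psi = sum(A * np.exp(1j * p * x / hbar) for A, p in zip(weights, p_values))
    return psi / np.abs(psi).max()         # crude normalization

# With only a few modes, the 'packet' repeats itself periodically across the
# domain; adding more modes pushes those repetitions away and leaves a single,
# increasingly well-localized packet.
for n in (2, 5, 50):
    width = np.ptp(x[np.abs(wave_packet(n)) > 0.5])   # rough width at half maximum
    print(f"{n:2d} modes -> packet width ~ {width:.1f}")
```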

You’ll probably be very confused now, and wonder where I want to go with that integral above. The point to note is simple: if we have that Φ(p) function, we can calculate (or derive, if you prefer that word) the Ψ(x) from it using that integral. Indeed, the integral above is referred to as the Fourier transform, and it’s obviously closely related to that Fourier analysis we introduced above.

Of course, there is also an inverse transform, which looks exactly the same: it just switches the wave functions (Ψ and Φ) and variables (x and p), and then (it’s an important detail!) it has a minus sign in the exponent. Together, the two functions – as defined by each other through these two integrals – form a so-called Fourier integral pair, also known as a Fourier transform pair, and the variables involved are referred to as conjugate variables. So momentum (p) and position (x) are conjugate variables and, likewise, energy and time are also conjugate variables (but I won’t expand on the time-energy relation here: please have a look at one of my other posts on that).

Now, I thought of copying and explaining the proof of Kennard’s inequality from Wikipedia’s article on the Uncertainty Principle (you need to click on the show button in the relevant section to see it), but that’s pretty boring math, and simply copying stuff is not my objective with this blog. More importantly, the proof has nothing to do with physics. Nothing at all. Indeed, it just proves a general mathematical property of Fourier pairs. More specifically, it proves that the more concentrated one function is, the more spread out its Fourier transform must be. In other words, it is not possible to arbitrarily concentrate both a function and its Fourier transform.

So, in this case, if we’d ‘squeeze’ Ψ(x), then its Fourier transform Φ(p) will ‘stretch out’, and so that’s what the proof in that Wikipedia article basically shows. In other words, there is some ‘trade-off’ between the ‘compaction’ of Ψ(x), on the one hand, and Φ(p), on the other, and so that is what the Uncertainty Principle is all about. Nothing more, nothing less.
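You can see that trade-off at work numerically too. The sketch below (my own toy illustration, with ħ set to 1 so that momentum and wavenumber coincide) takes Gaussian packets of decreasing width, computes their Fourier transform with numpy’s FFT, and measures both spreads: squeezing one side stretches the other, and the product stays put at 1/2.

```python
import numpy as np

x = np.linspace(-50, 50, 4096)
dx = x[1] - x[0]

def std(u, amplitude):
    """Standard deviation of u, weighted by the probability density |amplitude|^2."""
    w = np.abs(amplitude) ** 2
    w = w / w.sum()
    mean = (u * w).sum()
    return np.sqrt(((u - mean) ** 2 * w).sum())

for sigma in (4.0, 2.0, 1.0, 0.5):
    psi = np.exp(-x**2 / (2 * sigma**2))        # a Gaussian wave packet Psi(x)
    phi = np.fft.fftshift(np.fft.fft(psi))      # its Fourier transform Phi(k)
    k = np.fft.fftshift(np.fft.fftfreq(x.size, d=dx)) * 2 * np.pi
    sx, sk = std(x, psi), std(k, phi)
    print(f"sigma_x = {sx:.3f}, sigma_k = {sk:.3f}, product = {sx * sk:.3f}")
```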

But… Yes? What’s all this talk about ‘squeezing’ and ‘compaction’? We can’t change reality, can we? Well… Here we’re entering the philosophical field, of course. How do we interpret the Uncertainty Principle? It surely does look like our trying to measure something has some impact on the wavefunction. Indeed, our measurement – of either position or momentum – usually makes the wavefunction collapse: we suddenly know where the particle is and, hence, Ψ(x) seems to collapse into one point. Alternatively, we measure its momentum and, hence, Φ(p) collapses.

That’s intriguing. In fact, even more intriguing is the possibility that we may only partially affect those wavefunctions with measurements that are somewhat less ‘drastic’. It seems a lot of research is focused on that (just Google for partial collapse of the wavefunction, and you’ll find tons of references, including presentations like this one).

Hmm… I need to further study the topic. The decomposition of a wave into its component waves is obviously something that works well in physics—and not only in quantum mechanics but also in much more mundane examples. Its most general application is signal processing, in which we decompose a signal (which is a function of time) into the frequencies that make it up. Hence, our wavefunction model makes a lot of sense, as it obviously mirrors the physics involved in oscillators and harmonics.

Still… I feel it doesn’t answer the fundamental question: what is our electron really? What do those wave packets represent? Physicists will say questions like this don’t matter: as long as our mathematical models ‘work’, it’s fine. In fact, if even Feynman said that nobody – including himself – truly understands quantum mechanics, then I should just be happy and move on. However, for some reason, I can’t quite accept that. I should probably focus some more on that de Broglie relationship, p = h/λ, as it’s obviously as fundamental to my understanding of the ‘model’ of reality in physics as that Fourier analysis of the wave packet. So I need to do some more thinking on that.

The de Broglie relationship is not intuitive. In fact, I am not ashamed to admit that it actually took me quite some time to understand why we can’t just re-write the de Broglie relationship (λ = h/p) as an uncertainty relation itself: Δλ = h/Δp. Hence, let me be very clear on this:

Δx = h/Δp (that’s the Uncertainty Principle) but Δλ ≠ h/Δp !

Let me quickly explain why.

If the Δ symbol expresses a standard deviation (or some other measurement of uncertainty), we can write the following:

p = h/λ ⇒ Δp = Δ(h/λ) = h·Δ(1/λ) ≠ h/Δλ

So I can take h out of the brackets after the Δ symbol, because that’s one of the things that’s allowed when working with standard deviations. In particular, one can prove the following:

  1. The standard deviation of a constant is zero: Δ(k) = 0.
  2. The standard deviation is invariant under changes of location: Δ(x + k) = Δ(x).
  3. Finally, the standard deviation scales directly with the scale of the variable: Δ(kx) = |k|·Δ(x).

However, it is not the case that Δ(1/x) = 1/Δx. But let’s not focus on what we cannot do with Δx: let’s see what we can do with it. Δx equals h/Δp according to the Uncertainty Principle—if we take it as an equality, rather than as an inequality, that is. And then we have the de Broglie relationship: p = h/λ. Hence, Δx must equal:

Δx = h/Δp = h/[Δ(h/λ)] = h/[h·Δ(1/λ)] = 1/Δ(1/λ)

That’s obvious, but so what? As mentioned, we cannot write Δx = Δλ, because there’s no rule that says that Δ(1/λ) = 1/Δλ and, therefore, h/Δp ≠ Δλ. However, what we can do is define Δλ as an interval, or a length, defined by the difference between its lower and upper bound (let’s denote those two values by λa and λb respectively). Hence, we write Δλ = λb – λa. Note that this does not assume we have a continuous range of values for λ: we can have any number of wavelengths λ between λa and λb, but so you see the point: we’ve got a range of values λ, discrete or continuous, defined by some lower and upper bound.

Now, the de Broglie relation associates two values pa and pb with λa and λb respectively: pa = h/λa and pb = h/λb. Hence, we can similarly define the corresponding Δp interval as pa – pb. Note that, because we’re taking the reciprocal, we have to reverse the order of the values here: if λb > λa, then pa = h/λa > pb = h/λb. Hence, we can write Δp = Δ(h/λ) = pa – pb = h/λa – h/λb = h·(1/λa – 1/λb) = h·(λb – λa)/λaλb. In case you have a bit of difficulty, just draw some reciprocal functions (like the ones below), and have fun connecting intervals on the horizontal axis with intervals on the vertical axis using these functions.

graph

Now, h·(λb – λa)/λaλb is obviously something very different than h/Δλ = h/(λb – λa). So we can surely not equate the two and, hence, we cannot write that Δp = h/Δλ.
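A quick numerical illustration of the difference (with made-up wavelength bounds, obviously); the last two lines compute the Δx = 1/Δ(1/λ) value, which equals the λaλb/(λb – λa) ratio we’ll look at next:

```python
h = 6.626e-34                            # Planck's constant in J·s

lambda_a, lambda_b = 2.0e-12, 3.0e-12    # made-up wavelength bounds, in m
delta_lambda = lambda_b - lambda_a

# The de Broglie relation maps the bounds (note the order reversal):
p_a, p_b = h / lambda_a, h / lambda_b
delta_p = p_a - p_b

print(f"h/delta_lambda = {h / delta_lambda:.2e} kg·m/s")   # ~6.6e-22
print(f"delta_p        = {delta_p:.2e} kg·m/s")            # ~1.1e-22, so not the same thing!

print(f"delta_x = {h / delta_p:.2e} m")                          # ~6.0e-12
print(f"ratio   = {lambda_a * lambda_b / delta_lambda:.2e} m")   # the same number
```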

Having said that, the Δx = 1/Δ(1/λ) = λaλb/(λb – λa) that emerges here is quite interesting. We’ve got a ratio here, λaλb/(λb – λa), which shows that Δx depends only on the upper and lower bounds of the Δλ range. It does not depend on whether or not the interval is discrete or continuous.

The second thing that is interesting to note is that Δx depends not only on the difference between those two values (i.e. the length of the interval) but also on the values themselves: if the length of the interval, i.e. the difference between the two wavelengths, is the same, but the wavelengths as such are larger, then we get a higher value for Δx, i.e. a greater uncertainty in the position. Again, this shows that the relation between Δλ and Δx is not straightforward. But so we knew that already, and so I’ll end this post right here and right now. 🙂


The Strange Theory of Light and Matter (II)

If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only—so we’re not talking quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:

  1. At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
  2. At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.

In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience – light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts – they look pretty much the same in quantum physics: they are both represented by a wavefunction.

Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic field.

You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:

  1. Light doesn’t travel in a medium (like water or air: there is no aether), and
  2. The magnitudes of the electric and magnetic field (they are usually denoted by E and B) depend on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.

How ‘real’ is the quantum-mechanical wavefunction?

The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me 🙂 and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.

Amplitudes and probabilities are related as follows:

  1. Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
  2. Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
  3. We get the probabilities by taking the (absolute) square of the amplitudes.
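In code, an ‘arrow’ is just a complex number, and these rules become one-liners. A toy example (the amplitude values are made up, of course):

```python
# Amplitudes are complex numbers ('arrows'); probabilities are their absolute squares.
amp_path_1 = 0.3 + 0.4j   # made-up amplitude for one way an event can happen
amp_path_2 = 0.1 - 0.2j   # made-up amplitude for another, indistinguishable way

# For indistinguishable alternatives, we add the amplitudes first, then square:
print(abs(amp_path_1 + amp_path_2) ** 2)              # 0.20: the probability |A1 + A2|^2

# Compare with naively adding the two probabilities themselves:
print(abs(amp_path_1) ** 2 + abs(amp_path_2) ** 2)    # 0.30: interference makes the difference
```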

So photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment—struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”

He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.

Scales and senses

To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.

Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.

Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there’: colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we can’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20, because our averaging time is about a tenth of a second, whereas the bee’s averaging time is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!

Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye. 

Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 terahertz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.

What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”

Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. 🙂

Light versus matter

If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.

But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.

QuantumHarmonicOscillatorAnimation

Why diagram A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagram C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]

cos and sine

Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, mustn’t it?

Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they are the same family or genus perhaps, as they say in biology. Indeed, both of them are, obviously, being referred to as ‘elementary particles’ in the so-called Standard Model of physics. But so what makes an electron and a photon specific as a species? What are the differences?

There are quite a few, obviously:

1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed—and while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to c in those huge accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. 🙂

So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you’d want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.

Photon wave

Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere. 

At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there were some medium (like water or air or something like aether—which, let me remind you, isn’t there!), the medium itself would not be moving, or – I should be precise here – it would only move up and down as the wave propagates through space, as illustrated below. To be fully complete, I should add we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.

Simple_harmonic_motion_animation

When talking electromagnetic waves, we have no medium. These E and B vectors oscillate, but it is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point (admittedly, a point along the direction of travel) in space but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or anything like that.

Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field lines to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?

The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolutely have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example—as in the famous Uncertainty Principle (ΔxΔp > h).

You’ll say: Oh!—but then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10⁻⁸ seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of not less than 3 meters. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?

2. A more fundamental difference between photons and electrons is how they interact with each other.

From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.

The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –A–B. Likewise, the square of A–B is the same as the square of –(A–B) = –A+B.

vector addition

These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere: see my post on amplitudes and statistics, and so I won’t go into too much detail here. Or, in case you’d want something less than a full mathematical treatment, I can refer you to my previous post also, where I talked about the ‘stopwatch’ and the ‘phase’: the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But so that’s just convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).
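Here’s a two-line check of that claim, with two arbitrary ‘arrows’ (the lengths and angles are made up):

```python
import cmath, math

A = 0.6 * cmath.exp(1j * math.radians(30))    # an arrow of length 0.6 at 30°
B = 0.4 * cmath.exp(1j * math.radians(110))   # an arrow of length 0.4 at 110°

print(abs(A + B) ** 2, abs(-(A + B)) ** 2)    # equal: the final arrow's direction doesn't matter
print(abs(A - B) ** 2, abs(-A + B) ** 2)      # equal again: only the relative angle counts
```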

OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:

  1. The probability of getting a boson where there are already n present is n+1 times stronger than it would be if there were none before.
  2. In contrast, the probability of getting two electrons into exactly the same state is zero. 

The behavior of photons makes lasers possible: we can pile zillions of photons on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]

The behavior of electrons is referred to as the Pauli exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we find two electrons (instead of just one) in any atomic or molecular orbital.

So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’t it?

It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explains (almost) all the peculiarities of the material world, including us walking into doors. 🙂 If you want to check it out for yourself, just check Feynman’s Lectures for more details on this or, else, re-read my posts on it indeed.

3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below—and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined region of space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.

1280px-D_orbitals

In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.

Potentiality and interconnectedness

I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:

“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say that the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”

Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. 🙂 Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.

For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.

But what about the other assumptions in quantum mechanics? Some of them don’t make sense, do they? Well… I struggled for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than light). The probability may be extremely low, but it’s possible.

Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:

soccer

But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?

615px-Three_paths_from_A_to_B

In fact, there are at least three issues here:

  1. First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the cloth of the pool table and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
  2. While it’s quite amazing what’s possible, in the real world that is, in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even if we’d allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina to take some trajectory to the Sun and back, does it?
  3. Finally, and most fundamentally perhaps, even when we would assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?

Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:

“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”

It’s probably worth recalling the results of that experiment here—if only to help you judge whether or not Feynman fully answers the questions above!

The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.

set-up photons

First, we close one of the slits, and we find that 1% of the photons go through the other (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:

  1. If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too, assuming light is an electromagnetic wave. But so we have blobs of energy – photons – traveling one by one. So it’s really that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference experiments with large molecules as well—and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes, that we square the resulting arrow to get the final probability.
  2. If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: it will go through one of the slits only, i.e. either through A, or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it: we shouldn’t think there is some “sneaky way that the photon divides in two and then comes back together again.” It’s one way or the other, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
  3. When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagram (c) and (d) below. We get it using the same quantum-mechanical rules: we add amplitudes first, and then we square to get the probabilities.

double-slit photons - results

Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.

Many arrows
Few arrows

So where are all the strange non-linear paths here? Let me, in order to make sure you get what I am saying here, insert that illustration with the three crazy routes once again. What we’ve got above (Figures 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.

615px-Three_paths_from_A_to_B

Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again).  My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?

Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not so smart as those scientists but also because, as I pointed out above, one can’t explain any path that deviates from a straight line really, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…

Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!

You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. 🙂 But… Somehow, it starts to make sense to me. We allow for everything and then, indeed, weed out the crazy paths using our interference theory, and so we do end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space“—as Feynman puts it.

Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”

When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:

  1. We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
  2. But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
  3. And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other. [Of course, you’ll ask: what is too far? But I answered that question –  convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
  4. And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see’: light seems to use a small core of space indeed–a limited number of nearby paths.

You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.

You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s happening really when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.

However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I was looking for: it would be more like something that, when you touch it, curls up in a little ball. In any case… You know what I mean, I hope.

Mimosa_Pudica

You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons? 

Well… It’s obvious you and I cannot really imagine what a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]

Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detector that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, like the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.

What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. 🙂

Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. 🙂

The Strange Theory of Light and Matter (I)

I am of the opinion that Richard Feynman’s wonderful little common-sense introduction to the ‘uncommon-sensy’ theory of quantum electrodynamics (The Strange Theory of Light and Matter), which was published only a few years before his death, should be mandatory reading for high school students.

I actually mean that: it should just be part of the general education of the first 21st century generation. Either that or, else, the Education Board should include a full-fledged introduction to complex analysis and quantum physics in the curriculum. 🙂

Having praised it (just now, as well as in previous posts), I re-read it recently during a trek in Nepal with my kids – I just grabbed the smallest book I could find the morning we left 🙂 – and, frankly, I now think Ralph Leighton, who transcribed and edited these four short lectures, could have cross-referenced it better. Moreover, there are two or three points where Feynman (or Leighton?) may have sacrificed accuracy for readability. Let me recapitulate the key points and try to improve here and there.

Amplitudes and arrows

The booklet avoids scary mathematical terms and formulas but doesn’t avoid the fundamental concepts behind them, and it doesn’t avoid the kind of ‘deep’ analysis one needs to get some kind of ‘feel’ for quantum mechanics either. So what are the simplifications?

A probability amplitude (i.e. a complex number) is, quite simply, an arrow, with a direction and a length. Thus Feynman writes: “Arrows representing probabilities from 0% to 16% [as measured by the surface of the square which has the arrow as its side] have lengths from 0 to 0.4.” That makes sense: such a geometrical approach does away, for example, with the need to talk about the absolute square (i.e. the square of the absolute value, or the squared norm) of a complex number – which is what we need to calculate probabilities from probability amplitudes. So, yes, it’s a wonderful metaphor. We have arrows and surfaces now, instead of wave functions and absolute squares of complex numbers.

The way he combines these arrows makes sense too. He even notes the difference between photons (bosons) and electrons (fermions): for bosons, we just add arrows; for fermions, we need to subtract them (see my post on amplitudes and statistics in this regard).

There is also the metaphor for the phase of a wave function, which is a stroke of genius really (I mean it): the direction of the ‘arrow’ is determined by a stopwatch hand, which starts turning when a photon leaves the light source, and stops when it arrives, as shown below.

front and back reflection amplitude

OK. Enough praise. What are the drawbacks?

The illustration above accompanies an analysis of how light is either reflected from the front surface of a sheet of a glass or, else, from the back surface. Because it takes more time to bounce off the back surface (the path is associated with a greater distance), the front and back reflection arrows point in different directions indeed (the stopwatch is stopped somewhat later when the photon reflects from the back surface). Hence, the difference in phase (but that’s a term that Feynman also avoids) is determined by the thickness of the glass. Just look at it. In the upper part of the illustration above, the thickness is such that the chance of a photon reflecting off the front or back surface is 5%: we add two arrows, each with a length of 0.2, and then we square the resulting (aka final) arrow. Bingo! We get a surface measuring 0.05, or 5%.

Huh? Yes. Just look at it: if the angle between the two arrows were 90° exactly, we would get 0.08 or 8%, but the angle is a bit more than that here. In the lower part of the illustration, the thickness of the glass is such that the two arrows ‘line up’ and, hence, they form an arrow that’s twice the length of either arrow alone (0.2 + 0.2 = 0.4), with a square four times as large (0.16 = 16%). So… It all works like a charm, as Feynman puts it.
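You can redo this arithmetic for any thickness in a couple of lines. In the sketch below, the angle between the two arrows is the free parameter (it is set by the thickness of the glass); the 0.2 length is Feynman’s number, the sample angles are mine:

```python
import cmath, math

def reflection_probability(angle_deg):
    """Add two arrows of length 0.2 separated by the given angle, then square."""
    front = 0.2 + 0j
    back = 0.2 * cmath.exp(1j * math.radians(angle_deg))
    return abs(front + back) ** 2

# An angle of about 112° reproduces the 5% case of the upper diagram.
for angle in (0, 90, 112, 180):
    print(f"angle {angle:3d}° -> probability {reflection_probability(angle):.1%}")
```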

[…]

But… Hey! Look at the stopwatch for the front reflection arrows in the upper and lower diagram: they point in the opposite direction of the stopwatch hand! Well… Hmm… You’re right. At this point, Feynman just notes that we need an extra rule: “When we are considering the path of a photon bouncing off the front surface of the glass, we reverse the direction of the arrow.

He doesn’t say why. He just adds this random rule to the other rules – which most readers who read this book already know. But why this new rule? Frankly, this inconsistency – or lack of clarity – would wake me up at night. This is Feynman: there must be a reason. Why?

Initially, I suspected it had something to do with the two types of ‘statistics’ in quantum mechanics (i.e. those different rules for combining amplitudes of bosons and fermions respectively, which I mentioned above). But… No. Photons are bosons anyway, so we surely need to add, not subtract. So what is it?

[…] Feynman explains it later, much later – in the third of the four chapters of this little book, to be precise. It’s, quite simply, the result of the simplified model he uses in that first chapter. The photon can do anything really, and so there are many more arrows than just two. We actually should look at an infinite number of arrows, representing all possible paths in spacetime, and, hence, the two arrows (i.e. the one for the reflection from the front and back surface respectively) are combinations of many other arrows themselves. So how does that work?

An analysis of partial reflection (I)

The analysis in Chapter 3 of the same phenomenon (i.e. partial reflection by glass) is a simplified analysis too, but it’s much better – because there are no ‘random’ rules here. It is what Leighton promises to the reader in his introduction: “A complete description, accurate in every detail, of a framework onto which more advanced concepts can be attached without modification. Nothing has to be ‘unlearned’ later.

Well… Accurate in every detail? Perhaps not. But it’s good, and I still warmly recommend a reading of this delightful little book to anyone who’d ask me what to read as a non-mathematical introduction to quantum mechanics. I’ll limit myself here to just some annotations.

The first drawing (a) depicts the situation:

  1. A photon from a light source is being reflected by the glass. Note that it may also go straight through, but that’s a possibility we’ll analyze separately. We first assume that the photon is effectively being reflected by the glass, and so we want to calculate the probability of that event using all these ‘arrows’, i.e. the underlying probability amplitudes.
  2. As for the geometry of the situation: while the light source and the detector seem to be positioned at some angle from the normal, that is not the case: the photon travels straight down (and up again when reflected). It’s just a limitation of the drawing. It doesn’t really matter much for the analysis: we could look at a light beam coming in at some angle, but so we’re not doing that. It’s the simplest situation possible, in terms of experimental set-up that is. I just want to be clear on that.

partial reflection

Now, rather than looking at the front and back surface only (as Feynman does in Chapter 1), the glass sheet is now divided into a number of very thin sections: five, in this case, so we have six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths. That’s quite a simplification but it’s easy to see it doesn’t matter: adding more sections would result in many more arrows, but these arrows would also be much smaller, and so the final arrow would be the same.
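That last claim is easy to check numerically. Below is a toy version of the calculation (mine, not Feynman’s): N little arrows with equally spaced phases – which is what equal thin sections amount to in this straight-down geometry – each shrinking in length as 1/N. The final arrow settles down quickly as N grows:

```python
import cmath, math

def final_arrow(n_sections, total_phase_deg=120.0, total_length=0.2):
    """Sum n little arrows with equally spaced phases, each of length total_length/n."""
    step = math.radians(total_phase_deg) / n_sections
    return sum(
        (total_length / n_sections) * cmath.exp(1j * step * k)
        for k in range(n_sections)
    )

for n in (6, 60, 600):
    print(f"{n:3d} sections -> final arrow length {abs(final_arrow(n)):.4f}")
```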

The more significant simplification is that the paths are all straight paths, and that the photon is assumed to travel at the speed of light, always. If you haven’t read the booklet, you’ll say that’s obvious, but it’s not: a photon has an amplitude to go faster or slower than c but, as Feynman points out, these amplitudes cancel out over longer distances. Likewise, a photon can follow any path in space really, including terribly crooked paths, but these paths also cancel out. As Feynman puts it: “Only the paths near the straight-line path have arrows pointing in nearly the same direction, because their timings are nearly the same, and only these arrows are important, because it is from them that we accumulate a large final arrow.” That makes perfect sense, so there’s no problem with the analysis here either.

So let’s have a look at those six arrows in illustration (b). They point in a slightly different direction because the paths are slightly different and, hence, the distances (and, therefore, the timings) are different too. Now, Feynman (but I think it’s Leighton really) loses himself here in a digression on monochromatic light sources. A photon is a photon: it will have some wave function with a phase that varies in time and in space and, hence, illustration (b) makes perfect sense. [I won’t quote what he writes on a ‘monochromatic light source’ because it’s quite confusing and, IMHO, not correct.]

The stopwatch metaphor has only one minor shortcoming: the hand of a stopwatch rotates clockwise (obviously!), while the phase of an actual wave function goes counterclockwise with time. That’s just convention, and I’ll come back to it when I discuss the mathematical representation of the so-called wave function, which gives you these amplitudes. However, it doesn’t change the analysis, because it’s the difference in the phase that matters when combining amplitudes, so the clock can turn in either way indeed, as long as we’re agreed on it.

At this point, I can’t resist: I’ll just throw the math in. If you don’t like it, you can just skip the section that follows.

Feynman’s arrows and the wave function

The mathematical representation of Feynman’s ‘arrows’ is the wave function:

f = f(x–ct)

Is that the wave function? Yes. It is: it’s a function whose argument is x – ct, with x the position in space, and t the time variable. As for c, that’s the speed of light. We throw it in to make the units in which we measure time and position compatible. 

Really? Yes: f is just a regular wave function. To make it look somewhat more impressive, I could use the Greek symbol Φ (phi) or Ψ (psi) for it, but it’s just what it is: a function whose value depends on position and time indeed, so we write f = f(x–ct). Let me explain the minus sign and the c in the argument.

Time and space are interchangeable in the argument, provided we measure time in the ‘right’ units, and so that’s why we multiply the time in seconds with c, so the new unit of time becomes the time that light needs to travel a distance of one meter. That also explains the minus sign in front of ct: if we add one distance unit (i.e. one meter) to the argument, we have to subtract one time unit from it – the new time unit of course, so that’s the time that light needs to travel one meter – in order to get the same value for f. [If you don’t get that x–ct thing, just think a while about this, or make some drawing of a wave function. Also note that the spacetime diagram in illustration (b) above assumes the same: time is measured in a unit equivalent to distance, so the 45° line from the south-west to the north-east, which bounces back to the north-west, represents a photon traveling at speed c in space indeed: one unit of time corresponds to one meter of travel.]

Now I want to be a bit more aggressive. I said f is a simple function. That’s true and not true at the same time. It’s a simple function, but it gives you probability amplitudes, which are complex numbers – and you may think that complex numbers are, perhaps, not so simple. However, you shouldn’t be put off. Complex numbers are really like Feynman’s ‘arrows’ and, hence, fairly simple things indeed. They have two dimensions, so to say: an a- and a b-coordinate. [I’d say an x- and y-coordinate, because that’s what you usually see, but then I used the x symbol already for the position variable in the argument of the function, so you have to switch to a and b for a while now.]

These a- and b-coordinates are referred to as the real and imaginary part of a complex number respectively. The terms ‘real’ and ‘imaginary’ are confusing because both parts are ‘real’ – well… As real as numbers can be, I’d say. 🙂 They’re just two different directions in space: the real axis is the a-axis in coordinate space, and the imaginary axis is the b-axis. So we could write a complex number as an ordered pair of numbers (a, b). However, we usually write it as a number itself, and we distinguish the b-coordinate from the a-coordinate by writing an i in front: (a, b) = a + ib. So our function f = f(x–ct) is a complex-valued function: it will give you two numbers (an a and a b) instead of just one when you ‘feed’ it with specific values for x and t. So we write:

f = f(x–ct) = (a, b) = a + ib

So what’s the shape of this function? Is it linear or irregular or what? We’re talking a very regular wave function here, so its shape is ‘regular’ indeed. It’s a periodic function, so it repeats itself again and again. The animations below give you some idea of such ‘regular’ wave functions. Animations A and B show a real-valued ‘wave’: a ball on a string that goes up and down, for ever and ever. Animations C to H are – believe it or not – basically the same thing, but so we have two numbers going up and down. That’s all.

[Animations: quantum harmonic oscillator wave functions]

The wave functions above are, obviously, confined in space, and so the horizontal axis represents the position in space. What we see, then, is how the real and imaginary part of these wave functions vary as time goes by. [Think of the blue graph as the real part, and the imaginary part as the pinkish thing – or the other way around. It doesn’t matter.] Now, our wave function – i.e. the one that Feynman uses to calculate all those probabilities – is even more regular than those shown above: its real part is an ordinary cosine function, and its imaginary part is a sine. Let me write this in math:

f = f(x–ct) = a + ib = r·(cosφ + i·sinφ)

It’s really the most regular wave function in the world: the very simple illustration below shows how the two components of f vary as a function in space (i.e. the horizontal axis) while we keep the time fixed, or vice versa: it could also show how the function varies in time at one particular point in space, in which case the horizontal axis would represent the time variable. It is what it is: a sine and a cosine function, with the angle φ as its argument.

[Illustration: cosine and sine]

Note that a sine function is the same as a cosine function, but it just lags a bit. To be precise, the phase difference is 90°, or π/2 in radians (the radian (i.e. the length of the arc on the unit circle) is a much more natural unit to express angles, as it’s fully compatible with our distance unit and, hence, most – if not all – of our other units). Indeed, you may or may not remember the following trigonometric identities: sinφ = cos(π/2–φ) = cos(φ–π/2).

In any case, now we have some r and φ here, instead of a and b. You probably wonder where I am going with all of this. Where are the x and t variables? Be patient! You’re right. We’ll get there. I have to explain that r and φ first. Together, they are the so-called polar coordinates of Feynman’s ‘arrow’ (i.e. the amplitude). Polar coordinates are just as good as the Cartesian coordinates we’re used to (i.e. a and b). It’s just a different coordinate system. The illustration below shows how they are related to each other. If you remember anything from your high school trigonometry course, you’ll immediately agree that a is, obviously, equal to r·cosφ, and b is r·sinφ, which is what I wrote above. Just as good? Well… The polar coordinate system has some disadvantages (all of those expressions and rules we learned in vector analysis assume rectangular coordinates, and so we should watch out!) but, for our purpose here, polar coordinates are actually easier to work with, so they’re better.

[Illustration: a complex number in Cartesian and polar coordinates]
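If you want to play with this yourself, Python’s built-in complex type does all of the bookkeeping for us. Just a minimal sketch – the numbers are arbitrary, of course:

```python
import cmath

z = 3 + 4j                 # a + ib, with a = 3 and b = 4
r, phi = cmath.polar(z)    # polar coordinates: r = 5.0, phi ≈ 0.927 rad
print(r, phi)

# And back again: a = r·cosφ and b = r·sinφ.
print(cmath.rect(r, phi))  # (3+4j), up to rounding
```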

Feynman’s wave function is extremely simple because his ‘arrows’ have a fixed length, just like the stopwatch hand. They’re just turning around and around and around as time goes by. In other words, r is constant and does not depend on position and time. It’s the angle φ that’s turning and turning and turning as the stopwatch ticks while our photon is covering larger and larger distances. Hence, we need to find a formula for φ that makes it explicit how φ changes as a function in spacetime. That φ variable is referred to as the phase of the wave function. That’s a term you’ll encounter frequently and so I had better mention it. In fact, it’s generally used as a synonym for any angle, as you can see from my remark on the phase difference between a sine and cosine function.

So how do we express φ as a function of x and t? That’s where Euler’s formula comes in. Feynman calls it the most remarkable formula in mathematics – our jewel! And he’s probably right: of all the theorems and formulas, I guess this is the one we can’t do without when studying physics. I’ve written about this in another post, and repeating what I wrote there would eat up too much space, so I won’t do it and just give you that formula. A regular complex-valued wave function can be represented as a complex (natural) exponential function, i.e. an exponential function with Euler’s number e (i.e. 2.718…) as the base, and the complex number iφ as the (variable) exponent. Indeed, according to Euler’s formula, we can write:

f = f(x–ct) = a + ib = r·(cosφ + i·sinφ) = r·e^(iφ)

As I haven’t explained Euler’s formula (you should really have a look at my posts on it), you should just believe me when I say that r·e^(iφ) is an ‘arrow’ indeed, with length r and angle φ (phi), as illustrated above, with coordinates a = r·cosφ and b = r·sinφ. What you should be able to do now, is to imagine how that φ angle goes round and round as time goes by, just like Feynman’s ‘arrow’ goes round and round – just like a stopwatch hand indeed, except that our φ angle turns counterclockwise.
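You don’t need to believe me, by the way: Euler’s formula is easy to check numerically. A quick sketch (the values of r and φ are just examples):

```python
import cmath, math

r, phi = 2.0, math.pi / 3
arrow = r * cmath.exp(1j * phi)                  # r·e^(iφ)
same = r * (math.cos(phi) + 1j * math.sin(phi))  # r·(cosφ + i·sinφ)
print(arrow, same)                     # the very same complex number
print(abs(arrow), cmath.phase(arrow))  # and we recover r and φ
```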

Fine, you’ll say – but so we need a mathematical expression, don’t we? Yes, we do. We need to know how that φ angle (i.e. the variable in our r·e^(iφ) function) changes as a function of x and t indeed. It turns out that the φ in r·e^(iφ) can be substituted as follows:

r·e^(iφ) = r·e^(i(ωt–kx)) = r·e^(–ik(x–ct))

Huh? Yes. The phase (φ) of the probability amplitude (i.e. the ‘arrow’) is a simple linear function of x and t indeed: φ = ωt–kx = –k(x–ct). What about all these new symbols, k and ω? The ω and k in this equation are the so-called angular frequency and the wave number of the wave. The angular frequency is just the frequency expressed in radians per second (ω = 2πν), and you should think of the wave number as the frequency in space (k = 2π/λ). [I could write some more here, but I can’t make it too long, and you can easily look up stuff like this on the Web.] Now, the propagation speed c of the wave is, quite simply, the ratio of these two numbers: c = ω/k. [Again, it’s easy to show how that works, but I won’t do it here.]
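To make this less abstract, here’s a small numerical sketch of such a wave function, with a made-up wavelength and with time measured in meters (so c = 1). It just checks that the ‘arrow’ comes back to the same direction after one wavelength in space, or after one period in time:

```python
import cmath, math

c = 1.0                # time measured in meters, so c = 1
lam = 0.5              # a made-up wavelength
k = 2 * math.pi / lam  # wave number
omega = k * c          # angular frequency, so c = ω/k indeed

def f(x, t, r=1.0):
    """The amplitude r·e^(i(ωt – kx)) at position x and time t."""
    return r * cmath.exp(1j * (omega * t - k * x))

print(f(0, 0), f(lam, 0))      # same arrow after one wavelength in space
print(f(0, 0), f(0, lam / c))  # same arrow after one period in time
```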

Now you know it all, and so it’s time to get back to the lesson.

An analysis of partial reflection (II)

Why did I digress? Well… I think that what I write above makes much more sense than Leighton’s rather convoluted description of a monochromatic light source as he tries to explain those arrows in diagram (b) above. Whatever it is, a monochromatic light source is surely not “a device that has been carefully arranged so that the amplitude for a photon to be emitted at a certain time can be easily calculated.” That’s plain nonsense. Monochromatic light is light of a specific color, so all photons have the same frequency (or, to be precise, their wave functions all have the same well-defined frequency), but these photons are not in phase. Photons are emitted by atoms, as an electron moves from one energy level to another. Now, when a photon is emitted, what actually happens is that the atom radiates a train of waves only for about 10^–8 sec, so that’s about 10 billionths of a second. After 10^–8 sec, some other atom takes over, and then another atom, and so on. Each atom emits one photon, whose energy is the difference between the two energy levels that the electron is jumping between. So the phase of the light that is being emitted can really only stay the same for about 10^–8 sec. Full stop.

Now, what I write above on how atoms actually emit photons is a paraphrase of Feynman’s own words in his much more serious series of Lectures on Mechanics, Radiation and Heat. Therefore, I am pretty sure it’s Leighton who gets somewhat lost when trying to explain what’s happening. It’s not photons that interfere. It’s the probability amplitudes associated with the various paths that a photon can take. To be fully precise, we’re talking the photon here, i.e. the one that ends up in the detector, and so what’s going on is that the photon is interfering with itself. Indeed, that’s exactly what the ‘craziness’ of quantum mechanics is all about: we send electrons, one by one, through two slits, and we observe an interference pattern. Likewise, we’ve got one photon here, that can go various ways, and it’s those amplitudes that interfere, so… Yes: the photon interferes with itself.

OK. Let’s get back to the lesson and look at diagram (c) now, in which the six arrows are added. As mentioned above, it would not make any difference if we’d divide the glass in 10 or 20 or 1000 or a zillion ‘very thin’ sections: there would be many more arrows, but they would be much smaller ones, and they would cover the same circular segment: its two endpoints would define the same arc, and the same chord on the circle that we can draw when extending that circular segment. Indeed, the six little arrows define a circle, and that’s the key to understanding what happens in the first chapter of Feynman’s QED, where he adds two arrows only, but with a reversal of the direction of the ‘front reflection’ arrow. Here there’s no confusion – Feynman (or Leighton) eloquently describes what’s being done:

“There is a mathematical trick we can use to get the same answer [i.e. the same final arrow]: Connecting the arrows in order from 1 to 6, we get something like an arc, or part of a circle. The final arrow forms the chord of this arc. If we draw arrows from the center of the ‘circle’ to the tail of arrow 1 and to the head of arrow 6, we get two radii. If the radius arrow from the center to arrow 1 is turned 180° (“subtracted”), then it can be combined with the other radius arrow to give us the same final arrow! That’s what I was doing in the first lecture: these two radii are the two arrows I said represented the ‘front surface’ and ‘back surface’ reflections. They each have the famous length of 0.2.”

That’s what’s shown in part (d) of the illustration above and, in case you’re still wondering what’s going on, the illustration below should help you to make your own drawings now.
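Before you do, the ‘trick’ is easy to check numerically too. The little sketch below – with an arbitrary arc length and total turn, so don’t attach any meaning to the actual numbers – adds up n small arrows sweeping the same total angle: the final arrow (i.e. the chord) barely changes as n goes up, which is exactly the point made above.

```python
import cmath, math

def final_arrow(n, total_turn=math.pi / 2, arc_len=1.0):
    """Add n small arrows of equal length whose directions sweep
    a fixed total angle, like the arc in illustration (c)."""
    step = arc_len / n
    return sum(step * cmath.exp(1j * total_turn * (i + 0.5) / n)
               for i in range(n))

for n in (6, 100, 10000):
    print(n, final_arrow(n))  # the chord stays (almost) the same
```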

[Illustration: circular segment]

So… That explains the phenomenon Feynman wanted to explain, which is a phenomenon that cannot be explained in classical physics. Let me copy the original here:

[Illustration: iridescence]

Partial reflection by glass—a phenomenon that cannot be explained in classical physics? Really?

You’re right to raise an objection: partial reflection by glass can, in fact, be explained by the classical theory of light as an electromagnetic wave. The assumption then is that light is effectively being reflected by both the front and back surface and the reflected waves combine or cancel out (depending on the thickness of the glass and the angle of reflection indeed) to match the observed pattern. In fact, that’s how the phenomenon was explained for hundreds of years! The point to note is that the wave theory of light collapsed as technology advanced, and experiments could be made with very weak light hitting photomultipliers. As Feynman writes: “As the light got dimmer and dimmer, the photomultipliers kept making full-sized clicks—there were just fewer of them. Light behaved as particles!”

The point is that a photon behaves like an electron when going through two slits: it interferes with itself! As Feynman notes, we do not have any ‘common-sense’ theory to explain what’s going on here. We only have quantum mechanics, and quantum mechanics is an “uncommon-sensy” theory: a “strange” or even “absurd” theory, that looks “cockeyed” and incorporates “crazy ideas”. But… It works.

Now that we’re here, I might just as well add a few more paragraphs to fully summarize this lovely publication – if only because summarizing stuff like this helps me to come to terms with understanding things better myself!

Calculating amplitudes: the basic actions

So it all boils down to calculating amplitudes: an event is divided into alternative ways of how the event can happen, and the arrows for each way are ‘added’. Now, every way an event can happen can be further subdivided into successive steps. The amplitudes for these steps are then ‘multiplied’. For example, the amplitude for a photon to go from A to C via B is the ‘product’ of the amplitude to go from A to B and the amplitude to go from B to C.

I marked the terms ‘multiplied’ and ‘product’ with apostrophes, as if to say it’s not a ‘real’ product. But it is an actual multiplication: it’s the product of two complex numbers. Feynman does not explicitly compare this product to other products, such as the dot (•) or cross (×) product of two vectors, but he uses the ∗ symbol for multiplication here, which clearly distinguishes V∗W from V•W or V×W or, more simply, from the product of two ordinary numbers. [Ordinary numbers? Well… With ‘ordinary’ numbers, I mean real numbers, of course, but once you get used to complex numbers, you won’t like that term anymore, because complex numbers start feeling just as ‘real’ as other numbers – especially when you get used to the idea of those complex-valued wave functions underneath reality.]

Now, multiplying complex numbers, or ‘arrows’ to use QED’s simpler language, consists of adding their angles and multiplying their lengths. That being said, the arrows here all have a length smaller than one (because their square cannot be larger than one, because that square is a probability, i.e. a (real) number between 0 and 1), and so Feynman describes successive multiplication as successive ‘shrinks and turns’ of the unit arrow. That all makes sense – very much sense.
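In our complex-number language, a ‘shrink and turn’ is just one multiplication. A two-step example, with made-up amplitudes:

```python
import cmath, math

step1 = 0.8 * cmath.exp(1j * math.radians(30))  # shrink to 0.8, turn 30°
step2 = 0.5 * cmath.exp(1j * math.radians(45))  # shrink to 0.5, turn 45°

both = step1 * step2
print(abs(both))                        # 0.4: the lengths multiply
print(math.degrees(cmath.phase(both)))  # 75°: the angles add
```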

But what’s the basic action? As Feynman puts the question: “How far can we push this process of splitting events into simpler and simpler subevents? What are the smallest possible bits and pieces? Is there a limit?” He immediately answers his own question. There are three ‘basic actions’:

  1. A photon goes from one point (in spacetime) to another: this amplitude is denoted by P(A to B).
  2. An electron goes from one point to another: E(A to B).
  3. An electron emits and/or absorbs a photon: this is referred to as a ‘junction’ or a ‘coupling’, and the amplitude for this is denoted by the symbol j, i.e. the so-called junction number.

How do we find the amplitudes for these?

The amplitudes for (1) and (2) are given by so-called propagator functions, which give you the probability amplitude for a particle to travel from one place to another in a given time indeed, or to travel with a certain energy and momentum. Judging from the Wikipedia article on these functions, the subject-matter is horrendously complicated, and the formulas are too, even if Feynman says it’s ‘very simple’ – for a photon, that is. The key point to note is that any path is possible. Moreover, there are also amplitudes for photons to go faster or slower than the speed of light (c)! However, these amplitudes make smaller contributions, and cancel out over longer distances. The same goes for the crooked paths: the amplitudes cancel each other out as well.

What remains are the ‘nearby paths’. In my previous post (check the section on electromagnetic radiation), I noted that, according to classical wave theory, a light wave does not occupy any physical space: we have electric and magnetic field vectors that oscillate in a direction that’s perpendicular to the direction of propagation, but these do not take up any space. In quantum mechanics, the situation is quite different. As Feynman puts it: “When you try to squeeze light too much [by forcing it to go through a small hole, for example, as illustrated below], it refuses to cooperate and begins to spread out.” He explains this in the text below the second drawing: “There are not enough arrows representing the paths to Q to cancel each other out.”

[Illustrations: ‘many arrows’ (Figure 33) and ‘few arrows’ (Figure 34)]

Not enough arrows? We can subdivide space in as many paths as we want, can’t we? Do probability amplitudes take up space? And now that we’re asking the tougher questions, what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in this funny business?

Unfortunately, there’s not much of an attempt in the booklet to try to answer these questions. One can begin to formulate some kind of answer when doing some more thinking about these wave functions. To be precise, we need to start looking at their wavelength. The frequency of a typical photon (and, hence, of the wave function representing that photon) is astronomically high. For visible light, it’s in the range of 430 to 790 terahertz, i.e. 430–790×10^12 Hz. We can’t imagine such incredible numbers. Because the frequency is so high, the wavelength is unimaginably small. There’s a very simple and straightforward relation between wavelength (λ) and frequency (ν) indeed: c = λν. In words: the speed of a wave is the wavelength (i.e. the distance (in space) of one cycle) times the frequency (i.e. the number of cycles per second). So visible light has a wavelength in the range of 390 to 700 nanometer, i.e. 390–700 billionths of a meter. A meter is a rather large unit, you’ll say, so let me express it differently: it’s less than one micrometer, and a micrometer is just one thousandth of a millimeter. So, no, we can’t imagine that distance either.
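The arithmetic is simple enough to check, of course. A two-line sanity check of the c = λν relation for the frequency range I quoted:

```python
c = 299_792_458              # speed of light (m/s)
for nu in (430e12, 790e12):  # 430 and 790 THz
    print(f"{nu:.0e} Hz -> {c / nu * 1e9:.0f} nm")
# ≈ 697 nm and ≈ 379 nm: the 390–700 nm range quoted above indeed
```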

That being said, that wavelength is there, and it does imply that some kind of scale is involved. A wavelength covers one full cycle of the oscillation: it means that, if we travel one wavelength in space, our ‘arrow’ will point in the same direction again. Both drawings above (Figures 33 and 34) suggest the space between the two blocks is less than one wavelength. It’s a bit hard to make sense of the direction of the arrows but note the following:

  1. The phase difference between (a) the ‘arrow’ associated with the straight route (i.e. the ‘middle’ path) and (b) the ‘arrow’ associated with the ‘northern’ or ‘southern’ route (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is like a quarter of a full turn, i.e. 90°. [Note that the arrows for the northern and southern route to P point in the same direction, because they are associated with the same timing. The same is true for the two arrows in-between the northern/southern route and the middle path.]
  2. In Figure 34, the phase difference between the longer routes and the straight route is much less, like 10° only.

Now, the calculations involved in these analyses are quite complicated but you can see the explanation makes sense: the gap between the two blocks is much narrower in Figure 34 and, hence, the geometry of the situation does imply that the phase difference between the amplitudes associated with the ‘northern’ and ‘southern’ routes to Q is much smaller than the phase difference between those amplitudes in Figure 33 (there’s a small numerical sketch after the list below). To be precise,

  1. The phase difference between (a) the ‘arrow’ associated with the ‘northern route’ to Q and (b) the ‘arrow’ associated with the ‘southern’ route to Q (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is like three quarters of a full turn, i.e. 270°. Hence, the final arrow is very short indeed, which means that the probability of the photon going to Q is very low indeed. [Note that the arrows for the northern and southern route no longer point in the same direction, because they are associated with very different timings: the ‘southern route’ is shorter and, hence, faster.]
  2. In Figure 34, we have a phase difference between the shortest and longest route that is like 60° only and, hence, the final arrow is very sizable, so the probability of the photon going to Q is, accordingly, quite substantial.
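Here’s the numerical sketch I promised. The source, detector and gap sizes are entirely made up – I am not trying to reproduce Feynman’s figures exactly – but you can see the same logic at work: a wide gap spreads the phases of the paths to an off-axis detector Q over full turns (so the arrows cancel), while a narrow gap keeps them within a fraction of a turn (so they add up).

```python
import math

lam = 1.0                              # our unit of length: one wavelength
src, det = (-20.0, 0.0), (20.0, 7.0)   # source and off-axis detector Q

def turns(gap, n_paths=5):
    """Phase (in full turns) of each straight path through the gap,
    relative to the shortest one."""
    ys = [gap * (i / (n_paths - 1) - 0.5) for i in range(n_paths)]
    lengths = [math.dist(src, (0, y)) + math.dist((0, y), det) for y in ys]
    base = min(lengths)
    return [round((L - base) / lam, 2) for L in lengths]

print(turns(6.0))  # wide gap: spread of ~2 full turns -> arrows cancel
print(turns(0.5))  # narrow gap: spread of ~0.17 turns (~60°) -> arrows add
```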

OK… What did I say here about P(A to B)? Nothing much. I basically complained about the way Feynman (or Leighton, more probably) explained the interference or diffraction phenomenon, and tried to do a better job before tackling the subject itself: how do we get that P(A to B)?

A photon can follow any path from A to B, including the craziest ones (as shown below). Any path? Good players give a billiard ball extra spin that may make the ball move in a curved trajectory, and will also affect its collision with any other ball – but a trajectory like the one below? Why would a photon suddenly take a sharp turn left, or right, or up, or down? What’s the mechanism here? What are the ‘wheels and gears inside’ of the photon that (a) make a photon choose this path in the first place and (b) allow it to whirl, swirl and twirl like that?

[Illustration: three paths from A to B]

We don’t know. In fact, the question may make no sense, because we don’t know what actually happens when a photon travels through space. We know it leaves as a lump of energy, and we know it arrives as a similar lump of energy. When we actually put a detector to check which path is followed – by putting special detectors at the slits in the famous double-slit experiment, for example – the interference pattern disappears. So… Well… We don’t know how to describe what’s going on: a photon is not a billiard ball, and it’s not a classical electromagnetic wave either. It is neither. The only thing that we know is that we get probabilities that match the results of experiment if we accept these nonsensical assumptions and do all of the crazy arithmetic involved. Let me get back to the lesson.

Photons can also travel faster or slower than the speed of light (c is some 3×10^8 meter per second but, in our special time unit, it’s equal to one). Does that violate relativity? It doesn’t, apparently, but for the reasoning behind that I must, once again, refer you to more sophisticated writing.

In any case, if the mathematicians and physicists have to take into account both of these assumptions (any path is possible, and speeds higher or lower than c are possible too!), they must be looking at some kind of horrendous integral, mustn’t they?

They are. When everything is said and done, that propagator function is some monstrous integral indeed, and I can’t explain it to you in a couple of words – if only because I am struggling with it myself. 🙂 So I will just believe Feynman when he says that, when the mathematicians and physicists are finished with that integral, we do get some simple formula which depends on the value of the so-called spacetime interval between two ‘points’ – let’s just call them 1 and 2 – in space and time. You’ve surely heard about it before: it’s denoted by s² or I (or whatever) and it’s zero if an object moves at the speed of light, which is what light is supposed to do – but so we’re dealing with a different situation here. 🙂 To be precise, I consists of two parts:

  1. The distance d between the two points (1 and 2), i.e. Δr, which is just the square root of d² = Δr² = (x2–x1)² + (y2–y1)² + (z2–z1)². [This formula is just a three-dimensional version of the Pythagorean Theorem.]
  2. The ‘distance’ (or difference) in time, which is usually expressed in those ‘equivalent’ time units that we introduced above already, i.e. the time that light – traveling at the speed of light 🙂 – needs to travel one meter. We will usually see that component of I in a squared version too: Δt² = (t2–t1)², or, if time is expressed in the ‘old’ unit (i.e. seconds), then we write c²Δt² = c²(t2–t1)².

Now, the spacetime interval itself is defined as the excess of the squared distance (in space) over the squared time difference:

s² = I = Δr² – Δt² = (x2–x1)² + (y2–y1)² + (z2–z1)² – (t2–t1)²

You know we can then define time-like, space-like and light-like intervals, and these, in turn, define the so-called light cone. The spacetime interval can be negative, for example. In that case, Δt² will be greater than Δr², so there is no ‘excess’ of distance over time: it means that the time difference is large enough to allow for a cause–effect relation between the two events, and the interval is said to be time-like. In any case, that’s not the topic of this post, and I am sorry I keep digressing.
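Numerically, the classification is trivial, of course. A small sketch, with time measured in meters (so c·t rather than t):

```python
def interval(e1, e2):
    """The spacetime interval I = Δr² – Δt², with events given as
    (x, y, z, t) and t measured in meters (i.e. c·t)."""
    (x1, y1, z1, t1), (x2, y2, z2, t2) = e1, e2
    dr2 = (x2 - x1)**2 + (y2 - y1)**2 + (z2 - z1)**2
    return dr2 - (t2 - t1)**2

def classify(i):
    return "light-like" if i == 0 else ("space-like" if i > 0 else "time-like")

origin = (0, 0, 0, 0)
print(classify(interval(origin, (3, 0, 0, 3))))  # light-like: 3 m in 3 m of time
print(classify(interval(origin, (1, 0, 0, 3))))  # time-like: cause–effect possible
print(classify(interval(origin, (3, 0, 0, 1))))  # space-like
```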

The point to note is that the formula for the propagator favors light-like intervals: they are associated with large arrows. Space- and time-like intervals, on the other hand, will contribute much smaller arrows. In addition, the arrows for space- and time-like intervals point in opposite directions, so they will cancel each other out. So, when everything is said and done, over longer distances, light does tend to travel in a straight line and at the speed of light. At least, that’s what Feynman tells us, and I tend to believe him. 🙂

But so where’s the formula? Feynman doesn’t give it, probably because it would indeed confuse us. Just google ‘propagator for a photon’ and you’ll see what I mean. He does integrate the above conclusions in that illustration (b) though. What illustration? 

Oh… Sorry. You probably forgot what I am trying to do here, but so we’re looking at that analysis of partial reflection of light by glass. Let me insert it once again so you don’t have to scroll all the way up.

[Illustration: partial reflection]

You’ll remember that Feynman divided the glass sheet into five sections and, hence, there are six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths: these paths are all straight (so Feynman makes abstraction of all of the crooked paths indeed), and the other assumption is that the photon effectively traveled at the speed of light, whatever path it took (so Feynman also assumes the amplitudes for speeds higher or lower than c cancel each other out). So that explains the difference in time at emission from the light source. The longest path is the path to point X6 and then back up to the detector. If the photon took that path, it would have to have been emitted earlier in time – earlier as compared to the other possibilities, which take less time. So it would have to be emitted at T = T6. The direction of that ‘arrow’ is like one o’clock. The shorter paths are associated with shorter times (the difference between the time of arrival and departure is shorter), so T5 is associated with an arrow in the 12 o’clock direction, T4 is 11 o’clock, and so on, till T1, which points in the 9 o’clock direction.

But… What? These arrows also include the reflection, i.e. the interaction between the photon and some electron in the glass, don’t they? […] Right you are. Sorry. So… Yes. The event above involves four steps:

  1. A photon is emitted by the source at a time T = T1, T2, T3, T4, T5 or T6: we don’t know. Quantum-mechanical uncertainty. 🙂
  2. It goes from the source to one of the points X = X1, X2, X3, X4, X5 or X6 in the glass: we don’t know which one, because we don’t have a detector there.
  3. The photon interacts with an electron at that point.
  4. It makes its way back up to the detector at A.

Step 1 does not have any amplitude. It’s just the start of the event. Well… We start with the unit arrow pointing north actually, so its length is one and its direction is 12 o’clock. And so we’ll shrink and turn it, i.e. multiply it with other arrows, in the next steps.

Steps 2 and 4 are straightforward and are associated with arrows of the same length. Their direction depends on the distance traveled and/or the time of emission: it amounts to the same because we assume the speed is constant and exactly the same for the six possibilities (that speed is c = 1 obviously). But what length? Well… Some length according to that formula which Feynman didn’t give us. 🙂

So now we need to analyze the third of those three basic actions: a ‘junction’ or ‘coupling’ between an electron and a photon. At this point, Feynman embarks on a delightful story highlighting the difficulties involved in calculating that amplitude. A photon can travel following crooked paths and at devious speeds, but an electron is even worse: it can take what Feynman refers to as ‘one-hop flights’, ‘two-hop flights’, ‘three-hop flights’,… any ‘n-hop flight’ really. Each intermediate stop involves an additional amplitude, which is represented by n², with n some number that has been determined from experiment. The formula for E(A to B) then becomes a series of terms: P(A to B) + P(A to C)∗n²∗P(C to B) + P(A to D)∗n²∗P(D to E)∗n²∗P(E to B) + …

P(A to B) is the ‘one-hop flight’ here, while C, D and E are intermediate points, and P(A to C)∗n²∗P(C to B) and P(A to D)∗n²∗P(D to E)∗n²∗P(E to B) are the ‘two-hop’ and ‘three-hop’ flights respectively. Note that this calculation has to be made for all possible intermediate points C, D, E and so on. To make matters worse, the theory assumes that electrons can emit and absorb photons along the way, and then there’s a host of other problems, which Feynman tries to explain in the last and final chapter of his little book. […]
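Just to fix ideas on how such a series behaves, here’s a toy version. The amplitudes are completely made up (the real P’s are complex-valued functions of the spacetime interval, and a real calculation sums over all intermediate points, which I am not doing here), but it shows why the higher-order terms quickly become negligible: every extra hop brings in another small factor.

```python
# Toy version of E(A to B) = P + P·n²·P + P·n²·P·n²·P + …
# All numbers are made up, for illustration only.
n2 = 0.05 + 0.0j  # hypothetical n² factor per intermediate stop
P = 0.3 - 0.2j    # hypothetical propagator amplitude per hop

def E(max_hops):
    # A k-hop flight has k propagator factors and (k - 1) factors of n².
    return sum(P**k * n2**(k - 1) for k in range(1, max_hops + 1))

for hops in (1, 2, 3, 10):
    print(hops, E(hops))  # the series converges very quickly
```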

Hey! Stop it!

What?

You’re talking about E(A to B) here. You’re supposed to be talking about that junction number j.

Oh… Sorry. You’re right. Well… That junction number j is about –0.1. I know that looks like an ordinary number, but it’s an amplitude, so you should interpret it as an arrow. When you multiply it with another arrow, it amounts to a shrink to one-tenth, and half a turn. Feynman also entertains us with the difficulties of calculating this number but, you’re right, I shouldn’t be trying to copy him here – if only because it’s about time I finish this post. 🙂

So let me conclude it indeed. We can apply the same transformation (i.e. we multiply with j) to each of the six arrows we’ve got so far, and the result is those six arrows next to the time axis in illustration (b). And then we combine them to get that arc, and then we apply that mathematical trick to show we get the same result as in a classical wave-theoretical analysis of partial reflection.

Done. […] Are you happy now?

[…] You shouldn’t be. There are so many questions that have been left unanswered. For starters, Feynman never gives that formula for the length of P(A to B), so we have no clue about the length of these arrows and, hence, about that arc. If physicists know their length, it seems it has been calculated backwards – from those 0.2 arrows used in the classical wave theory of light. Feynman is actually quite honest about that, and simply writes:

“The radius of the arc [i.e. the arc that determines the final arrow] evidently depends on the length of the arrow for each section, which is ultimately determined by the amplitude S that an electron in an atom of glass scatters a photon. This radius can be calculated using the formulas for the three basic actions. […] It must be said, however, that no direct calculation from first principles for a substance as complex as glass has actually been done. In such cases, the radius is determined by experiment. For glass, it has been determined from experiment that the radius is approximately 0.2 (when the light shines directly onto the glass at right angles).”

Well… OK. I think that says enough. So we have a theory – or first principles at least – but we can’t use them to actually calculate anything. That actually sounds a bit like metaphysics to me. 🙂 In any case… Well… Bye for now!

But… Hey! You said you’d analyze how light goes straight through the glass as well?

Yes. I did. But I don’t feel like doing that right now. I think we’ve got enough stuff to think about right now, don’t we? 🙂

The Complementarity Principle

Pre-script (dated 26 June 2020): This post has become less relevant because my views on all things quantum-mechanical have evolved significantly as a result of my progression towards a more complete realist (classical) interpretation of quantum physics. Hence, we recommend you read our recent papers. I keep blog posts like these mainly because I want to keep track of where I came from. I might review them one day, but I currently don’t have the time or energy for it. 🙂

Original post:

Unlike what you might think when seeing the title of this post, it is not my intention to enter into philosophical discussions here: many authors have been writing about this ‘principle’, most of whom–according to eminent physicists–don’t know what they are talking about. So I have no intention to make a fool of myself here too. However, what I do want to do here is explore, in an intuitive way, how the classical and quantum-mechanical explanations of the phenomenon of the diffraction of light are different from each other–and fundamentally so–while, necessarily, having to yield the same predictions. It is in that sense that the two explanations should be ‘complementary’.

The classical explanation

I’ve done a fairly complete analysis of the classical explanation in my posts on Diffraction and the Uncertainty Principle (20 and 21 September), so I won’t dwell on that here. Let me just repeat the basics. The model is based on the so-called Huygens-Fresnel Principle, according to which each point in the slit becomes a source of a secondary spherical wave. These waves then interfere, constructively or destructively, and, hence, by adding them, we get the form of the wave at each point of time and at each point in space behind the slit. The animation below illustrates the idea. However, note that the mathematical analysis does not assume that the point sources are neatly separated from each other: instead of only six point sources, we have an infinite number of them and, hence, adding up the waves amounts to solving some integral (which, as you know, is an infinite sum).

[Animation: the Huygens-Fresnel Principle]
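You can actually reproduce the gist of that integral with a few lines of code: put a couple of hundred point sources across the slit, add their ‘arrows’ for each direction θ, and out comes the diffraction pattern, secondary bumps included. The wavelength and slit width below are made up, but of the right order of magnitude for visible light:

```python
import cmath, math

lam = 500e-9  # wavelength: 500 nm (illustrative)
d = 2e-6      # slit width: 2 µm (illustrative)
k = 2 * math.pi / lam

def intensity(theta, n_sources=200):
    """Far-field intensity: add the wavelets of n point sources across
    the slit; a source at height y contributes a phase k·y·sinθ."""
    ys = [d * (i / (n_sources - 1) - 0.5) for i in range(n_sources)]
    total = sum(cmath.exp(1j * k * y * math.sin(theta)) for y in ys)
    return abs(total / n_sources) ** 2

for deg in range(0, 61, 5):
    print(f"{deg:2d}°  {intensity(math.radians(deg)):.4f}")
# minima where d·sinθ = mλ (here: ~14.5° and 30°), smaller bumps in between
```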

We know what we are supposed to get: a diffraction pattern. The intensity of the light on the screen at the other side depends on (1) the slit width (d), (2) the wavelength of the light (λ), and (3) the angle (θ) at which we look at the screen, as shown below.

[Illustration: single-slit diffraction]

One point to note is that we have smaller bumps left and right. We don’t get that if we’d treat the slit as a single point source only, like Feynman does when he discusses the double-slit experiment for (physical) waves. Indeed, look at the image below: each of the slits acts as one point source only and, hence, the intensity curves I1 and I2 do not show a diffraction pattern. They are just nice Gaussian “bell” curves, albeit somewhat adjusted because of the angle of incidence (we have two slits above and below the center, instead of just one on the normal itself). So we have an interference pattern on the screen and, now that we’re here, let me be clear on terminology: I am going along with the widespread definition of diffraction being a pattern created by one slit, and the definition of interference as a pattern created by two or more slits. I am noting this just to make sure there’s no confusion.

[Illustration: double-slit interference of water waves]

That should be clear enough. Let’s move on to the quantum-mechanical explanation.

The quantum-mechanical explanation

There are several formulations of quantum mechanics: you’ve heard about matrix mechanics and wave mechanics. Roughly speaking, in matrix mechanics “we interpret the physical properties of particles as matrices that evolve in time”, while the wave mechanics approach is primarily based on these complex-valued wave functions–one for each physical property (e.g. position, momentum, energy). Both approaches are mathematically equivalent.

There is also a third approach, which is referred to as the path integral formulation, which  “replaces the classical notion of a single, unique trajectory for a system with a sum, or functional integral, over an infinity of possible trajectories to compute an amplitude” (all definitions here were taken from Wikipedia). This approach is associated with Richard Feynman but can also be traced back to Paul Dirac, like most of the math involved in quantum mechanics, it seems. It’s this approach which I’ll try to explain–again, in an intuitive way only–in order to show the two explanations should effectively lead to the same predictions.

The key to understanding the path integral formulation is the assumption that a particle–and a ‘particle’ may refer to both bosons (e.g. photons) or fermions (e.g. electrons)–can follow any path from point A to B, as illustrated below. Each of these paths is associated with a (complex-valued) probability amplitude, and we have to add all these probability amplitudes to arrive at the probability amplitude for the particle to move from A to B.

[Illustration: three paths from A to B]

You can find great animations illustrating what it’s all about in the relevant Wikipedia article but, because I can’t upload video here, I’ll just insert two illustrations from Feynman’s 1985 QED, in which he does what I try to do, and that is to approach the topic intuitively, i.e. without too much mathematical formalism. So probability amplitudes are just ‘arrows’ (with a length and a direction, just like a complex number or a vector), and finding the resultant or final arrow is a matter of just adding all the little arrows to arrive at one big arrow, which is the probability amplitude, which he denotes as P(A, B), as shown below.

[Illustration: adding arrows, from Feynman’s QED (1985)]

This intuitive approach is great and actually goes a very long way in explaining complicated phenomena, such as iridescence (the wonderful patterns of color on an oil film!), or the partial reflection of light by glass (anything between 0 and 16%!). All his tricks make sense. For example, different frequencies are interpreted as slower or faster ‘stopwatches’ and, as such, they determine the final direction of the arrows which, in turn, explains why blue and red light are reflected differently. And so on and so on. It all works. […] Up to a point.

Indeed, Feynman does get in trouble when trying to explain diffraction. I’ve reproduced his explanation below. The key to the argument is the following:

  1. If we have a slit that’s very wide, there are a lot of possible paths for the photon to take. However, most of these paths cancel each other out, and so that’s why the photon is likely to travel in a straight line. Let me quote Feynman: “When the gap between the blocks is wide enough to allow many neighboring paths to P and Q, the arrows for the paths to P add up (because all the paths to P take nearly the same time), while the paths to Q cancel out (because those paths have a sizable difference in time). So the photomultiplier at Q doesn’t click.” (QED, p.54)
  2. However, “when the gap is nearly closed and there are only a few neighboring paths, the arrows to Q also add up, because there is hardly any difference in time between them, either (see Fig. 34). Of course, both final arrows are small, so there’s not much light either way through such a small hole, but the detector at Q clicks almost as much as the one at P! So when you try to squeeze light too much to make sure it’s going only in a straight line, it refuses to cooperate and begins to spread out.” (QED, p. 55)

[Illustrations: ‘many arrows’ (Figure 33) and ‘few arrows’ (Figure 34)]

This explanation is as simple and intuitive as Feynman’s ‘explanation’ of diffraction using the Uncertainty Principle in his introductory chapter on quantum mechanics (Lectures, I-38-2), which is illustrated below. I won’t go into the detail (I’ve done that before) but you should note that, just like the explanation above, such explanations do not explain the secondary, tertiary etc bumps in the diffraction pattern.

[Illustration: diffraction of electrons]

So what’s wrong with these explanations? Nothing much. They’re simple and intuitive, but essentially incomplete, because they do not incorporate all of the math involved in interference. Incorporating the math means doing these integrals for

  1. Electromagnetic waves in classical mechanics: here we are talking ‘wave functions’ with some real-valued amplitude representing the strength of the electric and magnetic field; and
  2. Probability waves: these are complex-valued functions, with the complex-valued amplitude representing probability amplitudes.

The two should, obviously, yield the same result, but a detailed comparison between the approaches is quite complicated, it seems. Now, I’ve googled a lot of stuff, and I duly note that diffraction of electromagnetic waves (i.e. light) is conveniently analyzed by summing up complex-valued waves too, and, moreover, they’re of the same familiar type: ψ = A·e^(i(kx–ωt)). However, these analyses also duly note that it’s only the real part of the wave that has an actual physical interpretation, and that it’s only because working with natural exponentials (addition, multiplication, integration, derivation, etc) is much easier than working with sine and cosine waves that such complex-valued wave functions are used (also) in classical mechanics. In fact, note the fine print in Feynman’s illustration of interference of physical waves (Fig. 37-2): he calculates the intensities I1 and I2 by taking the square of the absolute amplitudes ĥ1 and ĥ2, and the hat indicates that we’re talking some complex-valued wave function here too.

Hence, we must be talking the same mathematical waves in both explanations, mustn’t we? In other words, we should get the same psi functions ψ = A·e^(i(kx–ωt)) in both explanations, shouldn’t we? Well… Maybe. But… Probably not. As far as I know–but I must be wrong–we cannot just re-normalize the E and B vectors in these electromagnetic waves in order to establish an equivalence with probability waves. I haven’t seen that being done (but I readily admit I still have a lot of reading to do) and so I must assume it’s not very clear-cut at all.

So what? Well… I don’t know. So far, I did not find a ‘nice’ or ‘intuitive’ explanation of a quantum-mechanical approach to the phenomenon of diffraction yielding the same grand diffraction equation, referred to as the Fresnel–Kirchhoff diffraction formula (see below), or one of its more comprehensible (because simplified) representations, such as the Fraunhofer diffraction formula, or the even easier formula which I used in my own post (you can google them: they’re somewhat less monstrous and–importantly–they work with real numbers only, which makes them easier to understand).

[Illustration: the Fresnel–Kirchhoff diffraction formula]

[…] That looks pretty daunting, doesn’t it? You may start to understand it a bit better by noting that (n, r) and (n, s) are angles, so that’s OK in a cosine function. The other variables also have fairly standard interpretations, as shown below, but… Admit it: ‘easy’ is something else, isn’t it?

[Illustration: the variables in the Fresnel–Kirchhoff formula]

So… Where are we here? Well… As said, I trust that both explanations are mathematically equivalent – just like matrix and wave mechanics 🙂 – and, hence, that a quantum-mechanical analysis will indeed yield the same formula. However, I think I’ll only understand physics truly if I’ve gone through all of the motions here.

Well then… I guess that should be some kind of personal benchmark to guide me on this journey, shouldn’t it? 🙂 I’ll keep you posted.

Post scriptum: To be fair to Feynman, and demonstrating his talent as a teacher once again, he actually acknowledges that the double-slit thought experiment uses simplified assumptions that do not include diffraction effects when the electrons go through the slit(s). He does so, however, only in one of the first chapters of Vol. III of the Lectures, where he comes back to the experiment to further discuss the first principles of quantum mechanics. I’ll just quote him: “Incidentally, we are going to suppose that the holes 1 and 2 are small enough that when we say an electron goes through the hole, we don’t have to discuss which part of the hole. We could, of course, split each hole into pieces with a certain amplitude that the electron goes to the top of the hole and the bottom of the hole and so on. We will suppose that the hole is small enough so that we don’t have to worry about this detail. That is part of the roughness involved; the matter can be made more precise, but we don’t want to do so at this stage.” So here he acknowledges that he omitted the intricacies of diffraction. I noted this only later. Sorry.

Diffraction and the Uncertainty Principle (II)

Pre-script (dated 26 June 2020): This post did not suffer too much from the attack on this blog by the dark force. It remains relevant. 🙂

Original post:

In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).

[Illustration: Airy disk spacing near the Rayleigh criterion]

What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.

The spatial resolution is also known as the angular resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth etc minimum occurs at an angle θ such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc, and so you can just forget about them for the time being.

[Illustration: the geometry of the array of oscillators]

For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:

θ = λ/L
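The quoted error bound is easy to verify, by the way:

```python
import math

# sinθ ≈ θ: the error stays below 1% for angles up to ~0.244 rad (14°)
for theta in (0.1, 0.244, 0.5):
    err = (theta - math.sin(theta)) / math.sin(theta)
    print(f"θ = {theta:.3f} rad -> error {err:.2%}")
# 0.17%, 1.00% and 4.29% respectively
```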

Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit 

If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited resolving power. How do we measure that?

The so-called Rayleigh criterion regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the distance is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle but it’s not the same: in fact, it’s a slightly wider angle. The analysis involved in calculating the angular resolution (we use the same symbol θ for it) is quite complicated, so I’ll skip that and just give you the result:

θ = 1.22λ/L

[Illustrations: two point sources / the Rayleigh criterion]

Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of the (aperture of) the lens. In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.

Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula without the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10^–9 m/(π/648,000) ≈ 0.12 m, so that’s about 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]
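Here’s that little calculation, so you can play with other wavelengths or resolutions (the 1.22 factor gives you Rayleigh’s slightly more demanding criterion):

```python
import math

lam = 580e-9               # yellow light: 580 nm
theta = math.pi / 648_000  # 1 arcsec in radians

print(lam / theta)         # Dawes-style θ = λ/L: ≈ 0.12 m, i.e. ~12 cm
print(1.22 * lam / theta)  # Rayleigh: ≈ 0.146 m, a bit larger
```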

This simplified formula is called Dawes’ limit, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.

Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.

Heisenberg’s Uncertainty Principle according to Heisenberg

I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggle through their medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to really understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.

Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so 🙂 – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5-3 eV light in the visible spectrum) so as to increase the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma-rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 exahertz and more (i.e. >10^19 Hz) and, hence, energies above 100 keV (i.e. 100,000 times more than photons in the visible light spectrum, and 1000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.

[Animation: Heisenberg’s gamma-ray microscope]

The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. 🙂

What Heisenberg basically meant to show is that this set-up would allow us to gather precise information on the position of the electron–because we would know where it was–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the exact relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle quantitatively, not qualitatively.

Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.

[Illustration: Heisenberg’s microscope (from the Wikipedia article)]

From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, and, hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be. The thing is: if the scattering angle is larger than ε, the photon won’t reach the light detector at the end of the microscope (that’s the flat top in the diagram above). So that’s why we can equate D with Δx: the electron can be anywhere within ±D/2 of the axis, so we write Δx = D. To put it differently: the assumption here is basically that this imaginary microscope ‘sees’ an area that is approximately as large as the lens. Using the small-angle approximation, ε/2 = λ/D gives D = 2λ/ε and, hence, we can write:

Δx = 2λ/ε

Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon keeps some but also gives some of its original momentum. In fact, what’s happening really is Compton scattering: the electron first absorbs the photon, and then emits another with a different energy and, hence, also with a different frequency and wavelength. However, what we do know is that the photon’s original momentum was equal to p = E/c = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.

Now, because we’re doing an analysis in one dimension only (x), we’ll only look at the momentum in that direction, i.e. px, and we’ll assume that all of the momentum of the photon before the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write px = h/λ. After the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:

  1. The scattered photon goes to the left edge of the lens. Hence, its horizontal momentum is negative (because it moves to the left) and the momentum px will be distributed over the electron and the photon such that px = p′ – h(ε/2)/λ′. Why the ε/2 factor? Well… That’s just trigonometry: the horizontal momentum of the scattered photon is obviously only a tiny fraction of its total momentum h/λ′, and that fraction is given by the angle ε/2.
  2. The scattered photon goes to the right edge of the lens. In that case, we write px = p″ + h(ε/2)/λ″.

Now, the spread in the momentum of the electron, which we’ll simply write as Δp, is obviously equal to:

Δp = p′ – p″ = [px + h(ε/2)/λ′] – [px – h(ε/2)/λ″] = h(ε/2)/λ′ + h(ε/2)/λ″

That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ′ and λ″. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when substituting both λ′ and λ″ for λ, i.e. the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:

Δp = p′ – p″ = h(ε/2)/λ′ + h(ε/2)/λ″ ≈ 2·h(ε/2)/λ = hε/λ = 2h/Δx

Now that yields ΔpΔx = 2h, which is an approximate expression of Heisenberg’s Uncertainty Principle indeed (don’t worry about the factor 2, as that’s something that comes with all of the approximations).
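If you want to check the arithmetic, here’s a quick Python sketch of the calculation above. The wavelength and the cone angle are just numbers I picked for illustration – the point is only that the product Δp·Δx comes out at 2h no matter what you plug in:

```python
# Toy numbers for Heisenberg's gamma-ray microscope (illustrative only).
h = 6.626e-34        # Planck's constant (J·s)

lam = 1.0e-12        # assumed gamma-ray wavelength: 1 picometer
eps = 0.2            # assumed cone angle ε, in radians

dx = 2 * lam / eps            # position uncertainty: Δx = 2λ/ε
dp = 2 * h * (eps / 2) / lam  # momentum spread: Δp ≈ 2·h(ε/2)/λ = hε/λ

print(f"Δx·Δp = {dx * dp:.3e} J·s (2h = {2 * h:.3e} J·s)")
```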

A final point perhaps: this is obviously a thought experiment. Not only because we don’t have gamma-ray microscopes (that’s not all that relevant, because we can effectively imagine constructing one) but, more importantly, because the experiment involves only one photon. A real microscope would organize a proper beam, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.

The interpretation

Now how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say. It’s part illustration, and part ‘proof’, I would say. The crucial assumptions here are:

  1. We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and when ‘colliding’, or interacting, with an electron, the photon will impart some momentum to that electron.
  2. Momentum is being conserved and, hence, the total (linear) momentum before and after the collision, considering both particles–i.e. (1) the incoming ray and the electron before the interaction and (2) the emitted photon and the electron that’s getting the kick after the interaction–must be the same.
  3. For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.

Now, these assumptions are then applied to an analysis of what we know to be true from experiment, and that’s the phenomenon of diffraction, part of which is the observation that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.

Bringing it all together then gives us a theory which is consistent with experiment and, hence, we then assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t have any ‘better’ theory.

But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory sounds strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when sending them through one or two slits, and so that’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. 🙂


Photons as strings

Pre-script written much later: In the meanwhile, we figured it all out. We found the common-sense interpretation of quantum physics. No ambiguity. No hocus-pocus. I keep posts like the one below online only to, one day, go back to where I went wrong. 🙂

Jean Louis Van Belle, 20 May 2020

In my previous post, I explored, somewhat jokingly, the grey area between classical physics and quantum mechanics: light as a wave versus light as a particle. I did so by trying to picture a photon as an electromagnetic transient traveling through space, as illustrated below. While actual physicists would probably deride my attempt to think of a photon as an electromagnetic transient traveling through space, the idea illustrates the wave-particle duality quite well, I feel.

Photon wave

Understanding light is the key to understanding physics. Light is a wave, as Thomas Young proved to the Royal Society of London in 1803, thereby demolishing Newton’s corpuscular theory. But its constituents, photons, behave like particles. According to modern-day physics, both were right. Just to put things in perspective, the thickness of the note card which Young used to split the light – ordinary sunlight entering his room through a pinhole in a window shutter – was 1/30 of an inch, or approximately 0.85 mm. Hence, in essence, this is a double-slit experiment with the two slits being separated by a distance of almost 1 millimeter. That’s enormous as compared to modern-day engineering tolerance standards: what was thin then, is obviously not considered to be thin now. Scale matters. I’ll come back to this.

Young’s experiment (from www.physicsclassroom.com)

Young experiment

The table below shows that the ‘particle character’ of electromagnetic radiation becomes apparent when its frequency is a few hundred terahertz, like the sodium light example I used in my previous post: sodium light, as emitted by sodium lamps, has a frequency of 500×10¹² oscillations per second and, therefore (the relation between frequency and wavelength is very straightforward: their product is the velocity of the wave, so for light we have the simple λf = c equation), a wavelength of 600 nanometer (600×10⁻⁹ meter).

Electromagnetic spectrum

However, whether something behaves like a particle or a wave also depends on our measurement scale: 0.85 mm was thin in Young’s time, and so it was a delicate experiment then, but now it’s a standard classroom experiment. The theory of light as a wave would hold until more delicate equipment refuted it. Such equipment came with another sense of scale. It’s good to remind oneself that Einstein’s “discovery of the law of the photoelectric effect”, which explained the photoelectric effect as the result of light energy being carried in discrete quantized packets of energy, now referred to as photons, goes back to 1905 only, and that the experimental apparatus which could measure it was not much older. So waves behave like particles if we look at them closely enough. Conversely, particles behave like waves if we look at them closely enough. So there is this zone where they are neither, the zone for which we invoke the mathematical formalism of quantum mechanics or, to put it more precisely, the formalism of quantum electrodynamics: that “strange theory of light and matter”, as Feynman calls it.

Let’s have a look at how particles became waves. It should not surprise us that the experimental apparatus needed to confirm that electrons–or matter in general–can actually behave like a wave is more recent than the 19th-century apparatuses which led Einstein to develop his ‘corpuscular’ theory of light (i.e. the theory of light as photons). The engineering tolerances involved are daunting. Let me be precise here. To be sure, the phenomenon of electron diffraction (i.e. electrons going through one slit and producing a diffraction pattern on the other side) had been confirmed experimentally by 1927, in the famous Davisson-Germer experiment. I am saying this because it’s rather famous indeed. First, because electron diffraction was a weird thing to contemplate at the time. Second, because it confirmed the de Broglie hypothesis only a few years after Louis de Broglie had advanced it (in 1924). And, third, because Davisson and Germer had never intended to set it up to detect diffraction: it was pure coincidence. In fact, the observed diffraction pattern was the result of a laboratory accident, and Davisson and Germer weren’t aware of other, conscious, attempts to prove the de Broglie hypothesis. 🙂 […] OK. I am digressing. Sorry. Back to the lesson.

The nanotechnology that was needed to confirm Feynman’s 1965 thought experiment on electron interference – i.e. electrons going through two slits and interfering with each other (rather than producing some diffraction pattern as they go through one slit only) and, equally significant as an experimental result, with themselves as they go through the slit(s) one by one! – was only developed over the past decades. In fact, it was only in 2008 (and again in 2012) that the experiment was carried out exactly the way Feynman describes it in his Lectures.

It is useful to think of what such experiments entail from a technical point of view. Have a look at the illustration below, which shows the set-up. The insert in the upper-left corner shows the two slits which were used in the 2012 experiment: they are each 62 nanometer wide – that’s 62×10⁻⁹ m! – and the distance between them is 272 nanometer, or 0.272 micrometer. [Just to be complete: they are 4 micrometer tall (4×10⁻⁶ m), and the thing in the middle of the slits is just a little support (150 nm) to make sure the slit width doesn’t vary.]

The second inset (in the upper-right corner) shows the mask that can be moved to close one or both slits partially or completely. The mask is 4.5 µm wide × 20 µm tall. Please do take a few seconds to contemplate the technology behind this feat: a nanometer is a millionth of a millimeter, so that’s a billionth of a meter, and a micrometer is a millionth of a meter. To imagine how small a nanometer is, you should imagine dividing one millimeter in ten, and then one of these tenths in ten again, and so on, six times in all. In fact, you actually cannot imagine that, because we live in the world we live in and, hence, our mind is used only to addition (and subtraction) when it comes to comparing sizes and – to a much more limited extent – to multiplication (and division): our brain is, quite simply, not wired to deal with exponentials and, hence, it can’t really ‘imagine’ these incredible (negative) powers. So don’t think you can really imagine it, because one can’t: in our mind, these scales exist only as mathematical constructs. They don’t correspond to anything we can actually make a mental picture of.

Electron double-slit set-up

The electron beam consisted of electrons with an (average) energy of 600 eV. That’s not an awful lot: 8.5 times more than the energy of an electron in orbit in an atom, whose energy would be some 70 eV, so the acceleration before they went through the slits was relatively modest. I’ve calculated the corresponding de Broglie wavelength of these electrons in another post (Re-Visiting the Matter-Wave, April 2014), using the de Broglie equations: f = E/h and λ = h/p. And, of course, you could just google the article on the experiment and read about it, but it’s a good exercise, and actually quite simple: just note that you’ll need to express the energy in joule (not in eV) to get it right. Also note that you need to include the rest mass of the electron in the energy. I’ll let you try it (or else just go to that post of mine). You should find a de Broglie wavelength of 50 picometer for these electrons, so that’s 50×10⁻¹² m. While that wavelength is less than a thousandth of the slit width (62 nm), and about 5,500 times smaller than the space between the two slits (272 nm), the interference effect was unambiguous in the experiment. I advise you to google the results yourself (or read that April 2014 post of mine if you want a summary): the experiment was done at the University of Nebraska-Lincoln in 2012.
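In case you don’t want to dig up that post, here’s a short Python sketch of that calculation (note how the rest energy of the electron – 511 keV – goes into the total energy, as I said):

```python
import math

# Relativistic de Broglie wavelength of the 600 eV electrons:
# λ = h/p, with p obtained from E² = (pc)² + (m0·c²)².
h = 6.626e-34          # Planck's constant (J·s)
c = 2.998e8            # speed of light (m/s)
eV = 1.602e-19         # one electronvolt, in joule
m0c2 = 0.511e6 * eV    # electron rest energy: 511 keV

KE = 600 * eV                         # kinetic energy of the beam
E_total = KE + m0c2                   # total energy includes the rest mass
pc = math.sqrt(E_total**2 - m0c2**2)  # momentum × c, in joule
lam = h * c / pc                      # de Broglie wavelength

print(f"λ ≈ {lam * 1e12:.1f} pm")     # ≈ 50 pm, as stated above
```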

Electrons and X-rays

To put everything in perspective: 50 picometer is like the wavelength of X-rays, and you can google similar double-slit experiments for X-rays: they also lose their ‘particle behavior’ when we look at them at this tiny scale. In short, scale matters, and the boundary between ‘classical physics’ (electromagnetics) and quantum physics (wave mechanics) is not clear-cut. If anything, it depends on our perspective, i.e. what we can measure, and we seem to be shifting that boundary constantly. In what direction?

Downwards obviously: we’re devising instruments that measure stuff at smaller and smaller scales, and what’s happening is that we can ‘see’ typical ‘particles’, including hard radiation such as gamma rays, as local wave trains. Indeed, the next step is clear-cut evidence for interference between gamma rays.

Energy levels of photons

We would not associate low-frequency electromagnetic waves, such as radio or radar waves, with photons. But light in the visible spectrum, yes. Obviously. […]

Isn’t that an odd dichotomy? If we see that, on a smaller scale, particles start to look like waves, why would the reverse not be true? Why wouldn’t we analyze radio or radar waves, on a much larger scale, as a stream of very (I must say extremely) low-energy photons? I know the idea sounds ridiculous, because the energies involved would be ridiculously low indeed. Think about it. The energy of a photon is given by the Planck relation: E = hν = hc/λ. For visible light, with wavelengths ranging from 800 nm (red) to 400 nm (violet or indigo), the photon energies range between 1.5 and 3 eV. Now, the shortest wavelengths for radar waves are in the so-called millimeter band, i.e. they range from 1 mm to 1 cm. A wavelength of 1 mm corresponds to a photon energy of 0.00124 eV. That’s close to nothing, of course, and surely not the kind of energy levels that we can currently detect.
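Here’s the same calculation as a small Python sketch, so you can check the 0.00124 eV figure (and the visible-light numbers) yourself:

```python
# Photon energy E = hν = hc/λ, evaluated for a few wavelengths.
h = 6.626e-34     # Planck's constant (J·s)
c = 2.998e8       # speed of light (m/s)
eV = 1.602e-19    # one electronvolt, in joule

for label, lam in [("violet (400 nm)", 400e-9),
                   ("red (800 nm)", 800e-9),
                   ("radar, mm band (1 mm)", 1e-3)]:
    E = h * c / (lam * eV)            # photon energy in eV
    print(f"{label:>22}: {E:.5f} eV")
```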

But you get the idea: there is a grey area between classical physics and quantum mechanics, and it’s our equipment–notably the scale of our measurements–that determines where that grey area begins and where it ends, and it seems to become larger and larger as the sensitivity of our equipment improves.

What do I want to get at? Nothing much. Just some awareness of scale, as an introduction to the actual topic of this post, and that’s some thoughts on a rather primitive string theory of photons. What !? 

Yes. Purely speculative, of course. 🙂

Photons as strings

I think my calculations in the previous post, as primitive as they were, actually provide quite some food for thought. If we’d treat a photon in the sodium light band (i.e. the light emitted by sodium, from a sodium lamp for instance) just like any other electromagnetic pulse, we would find it’s a pulse of some 10 meters long. We also made sense of this incredibly long distance by noting that, if we’d look at it as a particle (which is what we do when analyzing it as a photon), it should have zero size, because it moves at the speed of light and, hence, the relativistic length contraction effect ensures we (or any observer in whatever reference frame really, because light always moves at the speed of light, regardless of the reference frame) will see it as a zero-size particle.

Having said that, and knowing damn well that we have to treat the photon as an elementary particle, I would think it’s very tempting to think of it as a vibrating string.

Huh?

Yes. Let me copy that graph again. The assumption I started with is a standard one in physics, and not something that you’d want to argue with: photons are emitted when an electron jumps from a higher to a lower energy level and, for all practical purposes, this emission can be analyzed as the emission of an electromagnetic pulse by an atomic oscillator. I’ll refer you to my previous post – as silly as it is – for details on these basics: the atomic oscillator has a Q, and so there’s damping involved and, hence, the assumption that the electromagnetic pulse resembles a transient should not sound ridiculous. Because the electric field as a function in space is the ‘reversed’ image of the oscillation in time, there’s nothing blasphemous about the suggested shape.

Photon wave

Just go along with it for a while. First, we need to remind ourselves that what’s vibrating here is nothing physical: it’s an oscillating electromagnetic field. That being said, in my previous post, I toyed with the idea that the oscillation could actually also represent the photon’s wave function, provided we use a unit for the electric field that ensures that the area under the squared curve adds up to one, so as to normalize the probability amplitudes. Hence, I suggested that the field strength over the length of this string could actually represent the probability amplitudes, provided we choose an appropriate unit to measure the electric field.

But then I was joking, right? Well… No. Why not consider it? An electromagnetic oscillation packs energy, and the energy is proportional to the square of the amplitude of the oscillation. Now, the probability of detecting a particle is related to its energy, and such probability is calculated from taking the (absolute) square of probability amplitudes. Hence, mathematically, this makes perfect sense.
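Just to show the math of that normalization is trivial, here’s a toy sketch in Python: it takes a damped cosine – all numbers made up, of course – and rescales it so the area under the squared curve is exactly one, i.e. so that it could serve as a probability density:

```python
import numpy as np

# Toy normalization: a transient E(x) = e^(−x/ℓ)·cos(kx), rescaled so
# that the squared curve integrates to one. All numbers are illustrative.
x = np.linspace(0, 50, 20000)        # arbitrary length units
ell, k = 10.0, 2.0                   # assumed damping length, wave number
E = np.exp(-x / ell) * np.cos(k * x)

norm = np.sqrt(np.trapz(E**2, x))    # current area under the squared curve
E_normalized = E / norm

print(np.trapz(E_normalized**2, x))  # → 1.0: a valid probability density
```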

It’s quite interesting to think through the consequences, and I hope I will (a) understand enough of physics and (b) find enough time for this—one day! One interesting thing is that the field strength (i.e. the magnitude of the electric field vector) is a real number. Hence, if we equate these magnitudes with probability amplitudes, we’d have real probability amplitudes, instead of complex-valued ones. That’s not a very fundamental issue. It probably indicates we should also take into account the fact that the E vector also oscillates in the other direction that’s normal to the direction of propagation, i.e. the y-coordinate (assuming that the z-axis is the direction of propagation). To put it differently, we should take the polarization of the light into account. The figure below–which I took from Wikipedia again (by far the most convenient place to shop for images and animations: what would I do without it?)–shows how the electric field vector moves in the xy-plane indeed, as the wave travels along the z-axis. So… Well… I still have to figure it all out, but the idea surely makes sense.

Circular.Polarization.Circularly.Polarized.Light_Right.Handed.Animation.305x190.255Colors

Another interesting thing to think about is how the collapse of the wave function would come about. If we think of a photon as a string, it must have some ‘hooks’ which could cause it to ‘stick’ or ‘collapse’ into a ‘lump’ as it hits a detector. What kind of hook? What force would come into play?

Well… The interaction between the photon and the photodetector is electromagnetic, but we’re looking for some other kind of ‘hook’ here. What could it be? I have no idea. Having said that, we know that the weakest of all fundamental forces—gravity—becomes much stronger—very much stronger—as the distance becomes smaller and smaller. In fact, it is said that, if we go to the Planck scale, the strength of the force of gravity becomes quite comparable with the other forces. So… Perhaps it’s, quite simply, the equivalent mass of the energy involved that gets ‘hooked’, somehow, as it starts interacting with the photon detector. Hence, when thinking about a photon as an oscillating string of energy, we should also think of that string as having some inseparable (equivalent) mass that, once it’s ‘hooked’, has no other option than to ‘collapse into itself’. [You may note there’s no quantum theory for gravity as yet. I am not sure how, but I’ve got a gut instinct that tells me that may help to explain why a photon consists of one single ‘unbreakable’ lump, although I need to elaborate this argument obviously.]

You must be laughing aloud now. A new string theory–really?

I know… I know… I haven’t reached sophomore level and I am already wildly speculating… Well… Yes. What I am talking about here has probably nothing to do with current string theories, although my proposed string would also replace the point-like photon by a one-dimensional ‘string’. However, ‘my’ string is, quite simply, an electromagnetic pulse (a transient actually, for reasons I explained in my previous post). Naive? Perhaps. However, I note that the earliest version of string theory is referred to as bosonic string theory, because it only incorporated bosons, which is what photons are.

So what? Well… Nothing… I am sure others have thought of this too, and I’ll look into it. It’s surely an idea which I’ll keep in the back of my head as I continue to explore physics. The idea is just too simple and beautiful to disregard, even if I am sure it must be pretty naive indeed. Photons as ten-meter long strings? Let’s just forget about it. 🙂 Onwards !!! 🙂

Post Scriptum: The key to ‘closing’ this discussion is, obviously, to be found in a full-blown analysis of the relativity of fields. So, yes, I have not done all of the required ‘homework’ on this and the previous post. I apologize for that. If anything, I hope it helped you to also try to think somewhat beyond the obvious. I realize I wasted a lot of time trying to understand the pre-cooked ready-made stuff that’s ‘on the market’, so to say. I still am, actually. Perhaps I should first thoroughly digest Feynman’s Lectures. In fact, I think that’s what I’ll try to do in the next year or so. Sorry for any inconvenience caused. 🙂


The shape and size of a photon

Important post script (PS) – dated 22 December 2018: Dear readers of this post, this is one of the more popular posts of my blog but − in the meanwhile − I did move on, and quite a bit, actually! The analysis below is not entirely consistent: I got many questions on it, and I have been thinking differently as a result. The Q&A below sums up everything: I do think of the photon as a pointlike particle now, and Chapter VIII of my book sums up the photon model. At the same time, if you are really interested in this question – how should one think of a photon? – then it’s probably good you also read the original post. If anything, it shows you how easy it is to get confused.

Hi Brian – see section III of this paper: http://vixra.org/pdf/1812.0273v2.pdf

Feynman’s classical idea of an atomic oscillator is fine in the context of the blackbody radiation problem, but his description of the photon as a long wavetrain does not make any sense. A photon has to pack two things: (1) the energy difference between the Bohr orbitals and (2) Planck’s constant h, which is the (physical) action associated with one cycle of an oscillation (so it’s a force over a distance (the loop or the radius – depending on the force you’re looking at) over a cycle time). See section V of the paper for how the fine-structure constant pops up here – it’s, as usual, a sort of scaling constant, but this time it scales a force. In any case, the idea is that we should think of a photon as one cycle – rather than a long wavetrain. The one cycle makes sense: when you calculate field strength and force you get quite moderate values (not the kind of black-hole energy concentrations some people suggest). It also makes sense from a logical point of view: the wavelength is something real, and so we should think of the photon amplitude (the electric field strength) as being real as well – especially when you think of how that photon is going to interact or be absorbed into another atom.

Sorry for my late reply. It’s been a while since I checked the comments. Please let me know if this makes sense. I’ll have a look at your blog in the coming days. I am working on a new paper on the anomalous magnetic moment – which is not anomalous at all if you start to think about how things might be working in reality. After many years of study, I’ve come to the conclusion that quantum mechanics is a nice way of describing things, but it doesn’t help us in terms of understanding anything. When we want to understand something, we need to push the classical framework a lot further than we currently do. In any case, that’s another discussion. :-/

JL

 

OK. Now you can move on to the post itself. 🙂 Sorry if this is confusing the reader, but it is necessary to warn him. I think of this post now as still being here to document the history of my search for a ‘basic version of truth’, as someone called it. [For an even more recent update, see Chapter 8 of my book, A Realist Interpretation of Quantum Mechanics.]

Original post:

Photons are weird. All elementary particles are weird. As Feynman puts it, in the very first paragraph of his Lectures on Quantum Mechanics: “Historically, the electron, for example, was thought to behave like a particle, and then it was found that in many respects it behaved like a wave. So it really behaves like neither. Now we have given up. We say: ‘It is like neither.’ There is one lucky break, however—electrons behave just like light. The quantum behavior of atomic objects (electrons, protons, neutrons, photons, and so on) is the same for all, they are all ‘particle waves,’ or whatever you want to call them. So what we learn about the properties of electrons will apply also to all ‘particles,’ including photons of light.” (Feynman’s Lectures, Vol. III, Chapter 1, Section 1)

I wouldn’t dare to argue with Feynman, of course, but… What? Well… Photons are like electrons, and then they are not. Obviously not, I’d say. For starters, photons do not have mass or charge, and they are also bosons, i.e. ‘force-carriers’ (as opposed to matter-particles), and so they obey very different quantum-mechanical rules, which are referred to as Bose-Einstein statistics. I’ve written about that in other posts (see, for example, my post on Bose-Einstein and Fermi-Dirac statistics), so I won’t do that again here. It’s probably sufficient to remind the reader that these rules imply that the so-called Pauli exclusion principle does not apply to them: bosons like to crowd together, thereby occupying the same quantum state—unlike their counterparts, the so-called fermions or matter-particles: quarks (which make up protons and neutrons) and leptons (including electrons and neutrinos), which can’t do that. Two electrons, for example, can only sit on top of each other (or be very near to each other, I should say) if their spins are opposite (so that makes their quantum state different), and there’s no place whatsoever to add a third one because there are only two possible ‘directions’ for the spin: up or down.

From all that I’ve been writing so far, I am sure you have some kind of picture of matter-particles now, and notably of the electron: it’s not really point-like, because it has a so-called scattering cross-section (I’ll say more about this later), and we can find it somewhere taking into account the Uncertainty Principle, with the probability of finding it at point x at time t given by the absolute square of a so-called ‘wave function’ Ψ(x, t).

But what about the photon? Unlike quarks or electrons, they are really point-like, aren’t they? And can we associate them with a psi function too? I mean, they have a wavelength, obviously, which is given by the Planck-Einstein energy-frequency relation: E = hν, with h the Planck constant and ν the frequency of the associated ‘light’. But an electromagnetic wave is not like a ‘probability wave’. So… Do they have a de Broglie wavelength as well?

Before answering that question, let me present that ‘picture’ of the electron once again.

The wave function for electrons

The electron ‘picture’ can be represented in a number of ways but one of the more scientifically correct ones – whatever that means – is that of a spatially confined wave function representing a complex quantity referred to as the probability amplitude. The animation below (which I took from Wikipedia) visualizes such wave functions. As mentioned above, the wave function is usually represented by the Greek letter psi (Ψ), and it is often referred to as a ‘probability wave’ – by bloggers like me, that is 🙂 – but that term is quite misleading. Why? You surely know that by now: the wave function represents a probability amplitude, not a probability. [So, to be correct, we should say a ‘probability amplitude wave’, or an ‘amplitude wave’, but so these terms are obviously too long and so they’ve been dropped and everybody talks about ‘the’ wave function now, although that’s confusing too, because an electromagnetic wave is a ‘wave function’ too, but describing ‘real’ amplitudes, not some weird complex numbers referred to as ‘probability amplitudes’.]

StationaryStatesAnimation

Having said what I’ve said above, probability amplitude and probability are obviously related: if we take the (absolute) square of the psi function – i.e. if we take the (absolute) square of all these amplitudes Ψ(x, t) – then we get the actual probability of finding that electron at point x at time t. So then we get the so-called probability density functions, which are shown on the right-hand side of the illustration above. [As for the term ‘absolute’ square, the absolute square is the squared norm of the associated ‘vector’. Indeed, you should note that the square of a complex number can be negative as evidenced, for example, by the definition of i: i² = –1. In fact, if there’s only an imaginary part, then its square is always negative. Probabilities are real numbers between 0 and 1, and so they can’t be negative, and so that’s why we always talk about the absolute square, rather than the square as such.]
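A quick Python illustration of that point, with a made-up amplitude:

```python
# The square of a complex amplitude can be negative or complex; the
# *absolute* square |Ψ|² = Ψ·Ψ* is never negative, so it can serve as
# a probability.
psi = 0.6 + 0.8j                      # a made-up probability amplitude

print(psi**2)                         # (-0.28+0.96j): not a probability
print(abs(psi)**2)                    # 1.0 — the absolute square
print((psi * psi.conjugate()).real)   # same thing, computed as Ψ·Ψ*
```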

Below, I’ve inserted another image, which gives a static picture (i.e. one that is not varying in time) of the wave function of a real-life electron. To be precise: it’s the wave function for an electron on the 5d orbital of a hydrogen atom. You can see it’s much more complicated than those easy things above. However, the idea behind it is the same. We have a complex-valued function varying in space and in time. I took it from Wikipedia and so I’ll just copy the explanation here: “The solid body shows the places where the electron’s probability density is above a certain value (0.02), as calculated from the probability amplitude.” What about these colors? Well… The image uses the so-called HSL color system to represent complex numbers: each complex number is represented by a unique color, with a different hue (H), saturation (S) and lightness (L). Just google if you want to know how that works exactly.

Hydrogen_eigenstate_n5_l2_m1

OK. That should be clear enough. I wanted to talk about photons here. So let’s go for it. Well… Hmm… I realize I need to talk about some more ‘basics’ first. Sorry for that.

The Uncertainty Principle revisited (1)

The wave function is usually given as a function in space and time: Ψ = Ψ(x, t). However, I should also remind you that we have a similar function in the ‘momentum space’: if ψ is a psi function, then the function in the momentum space is a phi function, and we’ll write it as Φ = Φ(p, t). [As for the notation: x and p represent (three-dimensional) vectors here. Likewise, we use a capital letter for psi and phi so we don’t confuse them with, for example, the lower-case φ (phi) representing the phase of a wave function.]

The position-space and momentum-space wave functions Ψ and Φ are related through the Uncertainty Principle. To be precise: they are Fourier transforms of each other. Huh? Don’t be put off by that statement. In fact, I shouldn’t have mentioned it, but then it’s how one can actually prove or derive the Uncertainty Principle from… Well… From ‘first principles’, let’s say, instead of just jotting it down as some God-given rule. Indeed, as Feynman puts it: “The Uncertainty Principle should be seen in its historical context. If you get rid of all of the old-fashioned ideas and instead use the ideas that I’m explaining in these lectures—adding arrows for all the ways an event can happen—there is no need for an uncertainty principle!” However, I must assume you’re, just like me, not quite used to the new ideas as yet, and so let me just jot down the Uncertainty Principle once again, as some God-given rule indeed :-):

σx·σp ≥ ħ/2

This is the so-called Kennard formulation of the Principle: it measures the uncertainty about the exact position (x) as well as the momentum (p), in terms of the standard deviation (so that’s the σ (sigma) symbol) around the mean. To be precise, the assumption is that we cannot know the real x and p: we can only find some probability distribution for x and p, which is usually some nice “bell curve” in the textbooks. While the Kennard formulation is the most precise (and exact) formulation of the Uncertainty Principle (or uncertainty relation, I should say), you’ll often find ‘other’ formulations. These ‘other’ formulations usually write Δx and Δp instead of σx and σp, with the Δ symbol indicating some ‘spread’ or a similar concept—surely do not think of Δ as a differential or so! [Sorry for assuming you don’t know this (I know you do!) but I just want to make sure here!] Also, these ‘other’ formulations will usually (a) not mention the 1/2 factor, (b) use h instead of ħ (ħ = h/2π, as you know, and ħ is preferred when we’re talking things like angular frequency or other stuff involving the unit circle), or (c) put an equality (=) sign in, instead of an inequality sign (≥). Niels Bohr’s early formulation of the Uncertainty Principle actually does all of that:

Δx·Δp = h

So… Well… That’s a bit sloppy, isn’t it? Maybe. In Feynman’s Lectures, you’ll find an oft-quoted ‘application’ of the Uncertainty Principle leading to a pretty accurate calculation of the typical size of an atom (the so-called Bohr radius), which Feynman starts with an equally sloppy statement of the Uncertainty Principle, as he notes: “We needn’t trust our answer to within factors like 2, π, etcetera.” Frankly, I used to think that was ugly and, hence, doubted the ‘seriousness’ of such calculations. Now I know it doesn’t really matter, as the essence of the relationship is clearly not a 2, π or 2π factor. The essence is the uncertainty itself: it’s very tiny (and multiplying it with 2, π or 2π doesn’t make it much bigger) but so it’s there.

In this regard, I need to remind you of how tiny that physical constant ħ actually is: about 6.58×10⁻¹⁶ eV·s. So that’s a zero followed by a decimal point and fifteen zeroes: only then do we get the first significant digits (6582…). And if 10⁻¹⁶ doesn’t look tiny enough for you, then just think about how tiny the electronvolt unit is: it’s the amount of (potential) energy gained (or lost) by an electron as it moves across a potential difference of one volt (which, believe me, is nothing much really): if we’d express ħ in joule, then we’d have to add nineteen more zeroes, because 1 eV = 1.6×10⁻¹⁹ J. As for such phenomenally small numbers, I’ll just repeat what I’ve said many times before: we just cannot imagine such a small number. Indeed, our mind can sort of intuitively deal with addition (and, hence, subtraction), and with multiplication and division (but to some extent only), but our mind is not made to understand non-linear stuff, such as exponentials indeed. If you don’t believe me, think of the Richter scale: can you explain the difference between a 4.0 and a 5.0 earthquake? […] If the answer to that question took you more than a second… Well… I am right. 🙂 [The Richter scale is based on the base-10 exponential function: a 5.0 earthquake has a shaking amplitude that is 10 times that of an earthquake that registered 4.0 and, because energy is proportional to the square of the amplitude, that corresponds to an energy release that is 31.6 times that of the lesser earthquake.]

A digression on units

Having said what I said above, I am well aware of the fact that saying that we cannot imagine this or that is what most people say. I am also aware of the fact that they usually say that to avoid having to explain something. So let me try to do something more worthwhile here.

1. First, I should note that ħ is so small because the second, as a unit of time, is so incredibly large. All is relative, of course. 🙂 For sure, we should express time in a more natural unit at the atomic or sub-atomic scale, like the time that’s needed for light to travel one meter. Let’s do it. Let’s express time in a unit that I shall call a ‘meter‘. Of course, it’s not an actual meter (because it doesn’t measure any distance), but so I don’t want to invent a new word and surely not any new symbol here. Hence, I’ll just put apostrophes before and after: so I’ll write ‘meter’ or ‘m’. When adopting the ‘meter’ as a unit of time (one second of time is 3×10⁸ such ‘meters’), we get a value for ‘ħ‘ that is equal to (6.6×10⁻¹⁶ eV·s)(3×10⁸ ‘meter’/second) = 2×10⁻⁷ eV·’m’. Now, 2×10⁻⁷ is a number that is still too tiny to imagine. But then our ‘meter’ is still a rather huge unit at the atomic scale: we should take the ‘millimicron’, aka the ‘nanometer’ (1 nm = 1×10⁻⁹ m), or – even better because more appropriate – the ‘angstrom‘: 1 Å = 0.1 nm = 1×10⁻¹⁰ m. Indeed, the smallest atom (hydrogen) has a radius of 0.25 Å, while larger atoms will have a radius of about 1 or more Å. Now that should work, shouldn’t it? You’re right: we get a value for ‘ħ‘ equal to (6.6×10⁻¹⁶ eV·s)(3×10⁸ ‘m’/s)(1×10¹⁰ ‘Å’/’m’) ≈ 2,000 eV·’Å’, or some 200 eV·’nm’. So… What? Well… If anything, it shows ħ is not a small unit at the atomic or sub-atomic level! Hence, we actually can start imagining how things work at the atomic level when using more adequate units.

[Now, just to test your knowledge, let me ask you: what’s the wavelength of visible light in angstrom? […] Well? […] Let me tell you: 400 to 700 nm is 4000 to 7000 Å. In other words, the wavelength of visible light is quite sizable as compared to the size of atoms or electron orbits!]
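If you want to redo that conversion with more precise values for ħ and c, here’s a two-step sketch in Python (you’ll recognize the well-known ħc ≈ 197 eV·nm value in the result):

```python
# Re-expressing ħ with time measured in 'meters' (1 s = c 'meters') and
# then in 'angstrom' (1 'm' = 10¹⁰ 'Å').
hbar_eVs = 6.582e-16          # ħ in eV·s
c = 2.998e8                   # light-'meters' per second

hbar_eVm = hbar_eVs * c       # ħ in eV·'m'
hbar_eVA = hbar_eVm * 1e10    # ħ in eV·'Å'

print(f"ħ ≈ {hbar_eVm:.2e} eV·'m'")   # ≈ 2×10⁻⁷ eV·'m'
print(f"ħ ≈ {hbar_eVA:.0f} eV·'Å'")   # ≈ 1973 eV·'Å', i.e. ~197 eV·'nm'
```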

2. Secondly, let’s do a quick dimension analysis of that ΔxΔp = h relation and/or its more accurate expression σx·σp ≥ ħ/2.

A position (and its uncertainty or standard deviation) is expressed in distance units, while momentum… Euh… Well… What? […] Momentum is mass times velocity, so it’s kg·m/s. Hence, the dimension of the product on the left-hand side of the inequality is m·kg·m/s = kg·m²/s. So what about this eV·s dimension on the right-hand side? Well… The electronvolt is a unit of energy, and so we can convert it to joules. Now, a joule is a newton-meter (N·m), which is the unit for both energy and work: it’s the work done when applying a force of one newton over a distance of one meter. So we now have N·m·s for ħ, which is nice, because Planck’s constant (h or ħ—whatever: the choice for one of the two depends on the variables we’re looking at) is the quantum for action indeed. It’s a Wirkung as they say in German, so its dimension combines both energy as well as time.

To put it simply, it’s a bit like power, which is what we men are interested in when looking at a car or motorbike engine. 🙂 Power is the energy spent or delivered per second, so its dimension is J/s, not J·s. However, your mind can see the similarity in thinking here. Energy is a nice concept, be it potential (think of a water bucket above your head) or kinetic (think of a punch in a bar fight), but it makes more sense to us when adding the dimension of time (emptying a bucket of water over your head is different from walking in the rain, and the impact of a punch depends on the power with which it is being delivered). In fact, the best way to understand the dimension of Planck’s constant is probably to also write the joule in ‘base units’. Again, one joule is the amount of energy we need to move an object over a distance of one meter against a force of one newton. So one J·s is one N·m·s is (1) a force of one newton acting over a distance of (2) one meter over a time period equal to (3) one second.

I hope that gives you a better idea of what ‘action’ really is in physics. […] In any case, we haven’t answered the question. How do we relate the two sides? Simple: a newton is an oft-used SI unit, but it’s not an SI base unit, and so we should deconstruct it even more (i.e. write it in SI base units). If we do that, we get 1 N = 1 kg·m/s²: one newton is the force needed to give a mass of 1 kg an acceleration of 1 m/s per second. So just substitute and you’ll see the dimension on the right-hand side is kg·(m/s²)·m·s = kg·m²/s, so it comes out alright.
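If you like to see such bookkeeping done explicitly, here’s a minimal Python sketch that tracks base-unit exponents as a dictionary (my own toy convention, obviously – not some standard library):

```python
from collections import Counter

# A 'unit' is a dict mapping SI base unit → exponent; multiplying two
# quantities means adding the exponents.
def mul(*units):
    out = Counter()
    for u in units:
        out.update(u)
    return {base: power for base, power in out.items() if power != 0}

meter    = {"m": 1}
momentum = {"kg": 1, "m": 1, "s": -1}   # kg·m/s
newton   = {"kg": 1, "m": 1, "s": -2}   # N = kg·m/s²

print(mul(meter, momentum))             # Δx·Δp → kg·m²/s
print(mul(newton, meter, {"s": 1}))     # J·s = N·m·s → kg·m²/s: the same!
```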

Why this digression on units? Not sure. Perhaps just to remind you also that the Uncertainty Principle can also be expressed in terms of energy and time:

ΔE·Δt = h

Here there’s no confusion in regard to the units on both sides: we don’t need to convert to SI base units to see that they’re the same: [ΔE][Δt] = J·s.

The Uncertainty Principle revisited (2)

The ΔE·Δt = h expression is not so often used as an expression of the Uncertainty Principle. I am not sure why, and I don’t think it’s a good thing. Energy and time are also complementary variables in quantum mechanics, so it’s just like position and momentum indeed. In fact, I like the energy-time expression somewhat more than the position-momentum expression because it does not create any confusion in regard to the units on both sides: it’s just joules (or electronvolts) and seconds on both sides of the equation. So what?

Frankly, I don’t want to digress too much here (this post is going to become awfully long) but, personally, I found it hard, for quite a while, to relate the two expressions of the very same uncertainty ‘principle’ and, hence, let me show you how the two express the same thing really, especially because you may or may not know that there are even more pairs of complementary variables in quantum mechanics. So, I don’t know if the following will help you a lot, but it helped me to note that:

  1. The energy and momentum of a particle are intimately related through the (relativistic) energy-momentum relationship. Now, that formula, E² = p²c² + m₀²c⁴, which links energy, momentum and intrinsic mass (aka rest mass), looks quite monstrous at first. Hence, you may prefer a simpler form: pc = Ev/c. It’s the same really, as both are based on the relativistic mass-energy equivalence: E = mc² or, the way I prefer to write it: m = E/c². [Both expressions are the same, obviously, but we can ‘read’ them differently: m = E/c² expresses the idea that energy has an equivalent mass, defined as inertia, and so it makes energy the primordial concept, rather than mass.] Of course, you should note that m is the total mass of the object here, including both (a) its rest mass as well as (b) the equivalent mass it gets from moving at the speed v. So m, not m₀, is the concept of mass used to define p, and note how easy it is to demonstrate the equivalence of both formulas: pc = Ev/c ⇔ mvc = Ev/c ⇔ E = mc². In any case, the bottom line is: don’t think of the energy and momentum of a particle as two separate things; they are two aspects of the same ‘reality’, involving mass (a measure of inertia, as you know) and velocity (as measured in a particular (so-called inertial) reference frame).
  2. Time and space are intimately related through the universal constant c, i.e. the speed of light, as evidenced by the fact that we will often want to express distance not in meter but in light-seconds (i.e. the distance that light travels (in a vacuum) in one second) or, vice versa, express time in meter (i.e. the time that light needs to travel a distance of one meter).

These relationships are interconnected, and the following diagram shows how.

Uncertainty relations

The easiest way to remember it all is to apply the Uncertainty Principle, in both its ΔE·Δt = h as well as its Δp·Δx = h expressions, to a photon. A photon has no rest mass and its velocity v is, obviously, c. So the energy-momentum relationship is a very simple one: p = E/c. We then get both expressions of the Uncertainty Principle by simply substituting E for p, or vice versa, and remembering that time and position (or distance) are related in exactly the same way: the constant of proportionality is the very same. It’s c. So we can write: Δx = Δt·c and Δt = Δx/c. If you’re confused, think about it in very practical terms: because the speed of light is what it is, an uncertainty of a second in time amounts, roughly, to an uncertainty in position of some 300,000 km (c = 3×10⁸ m/s). Conversely, an uncertainty of some 300,000 km in the position amounts to an uncertainty in time of one second. That’s what the 1-2-3 in the diagram above is all about: please check if you ‘get’ it, because that’s ‘essential’ indeed.
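To make this concrete, here’s a tiny Python sketch for a photon, assuming – just for illustration – an uncertainty in time of one second:

```python
# For a photon, p = E/c and Δx = c·Δt, so the two expressions of the
# Uncertainty Principle are one and the same statement.
h = 6.626e-34    # Planck's constant (J·s)
c = 2.998e8      # speed of light (m/s)

dt = 1.0         # assumed: Δt = 1 s
dE = h / dt      # from ΔE·Δt = h

dx = c * dt      # Δx ≈ 300,000 km
dp = dE / c      # photon: Δp = ΔE/c

print(f"ΔE·Δt = {dE * dt:.3e} J·s")
print(f"Δp·Δx = {dp * dx:.3e} J·s")   # identical, by construction
```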

Back to ‘probability waves’

Matter-particles are not the same, but we do have the same relations, including that ‘energy-momentum duality’. The formulas are just somewhat more complicated because they involve mass and velocity (i.e. a velocity less than that of light). For matter-particles, we can see that energy-momentum duality not only in the relationships expressed above (notably the relativistic energy-momentum relation), but also in the (in)famous de Broglie relation, which associates some ‘frequency’ (f) to the energy (E) of a particle or, what amounts to the same, some ‘wavelength’ (λ) to its momentum (p):

λ = h/p and f = E/h

These two complementary equations give a ‘wavelength’ (λ) and/or a ‘frequency’ (f) of a de Broglie wave, or a ‘matter wave’ as it’s sometimes referred to. I am using, once again, apostrophes because the de Broglie wavelength and frequency are a different concept—different than the wavelength or frequency of light, or of any other ‘real’ wave (like water or sound waves, for example). To illustrate the differences, let’s start with a very simple question: what’s the velocity of a de Broglie wave? Well… […] So? You thought you knew, didn’t you?

Let me answer the question:

  1. The mathematically (and physically) correct answer involves distinguishing the group and phase velocity of a wave.
  2. The ‘easy’ answer is: the de Broglie wave of a particle moves with the particle and, hence, its velocity is, obviously, the speed of the particle which, for electrons, is usually non-relativistic (i.e. rather slow as compared to the speed of light).

To be clear on this, the velocity of a de Broglie wave is not the speed of light. So a de Broglie wave is not like an electromagnetic wave at all. They have nothing in common really, except for the fact that we refer to both of them as ‘waves’. 🙂
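Here’s a quick sketch of that distinction, re-using the 600 eV electrons from the double-slit discussion above. With ω = E/ħ and k = p/ħ (see the Feynman quote further below), the phase velocity is ω/k = E/p = c²/v, while the group velocity is the particle velocity v itself:

```python
import math

# Phase vs. group velocity of the de Broglie wave of a 600 eV electron.
c = 2.998e8               # speed of light (m/s)
m0c2 = 0.511e6            # electron rest energy, in eV
KE = 600                  # kinetic energy, in eV

E = KE + m0c2                    # total energy, eV
pc = math.sqrt(E**2 - m0c2**2)   # momentum × c, eV
v_group = (pc / E) * c           # particle velocity: v = pc²/E
v_phase = (E / pc) * c           # E/p = c²/v, faster than light

print(f"group velocity ≈ {v_group:.2e} m/s ({v_group/c:.3f} c)")
print(f"phase velocity ≈ {v_phase:.2e} m/s (> c: carries no signal)")
```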

The second thing to note is that, when we’re talking about the ‘frequency’ or ‘wavelength’ of ‘matter waves’ (i.e. de Broglie waves), we’re talking the frequency and wavelength of a wave with two components: it’s a complex-valued wave function, indeed, and so we get a real and imaginary part when we’re ‘feeding’ the function with some values for x and t.

Thirdly and, perhaps, most importantly, we should always remember the Uncertainty Principle when looking at the de Broglie relation. The Uncertainty Principle implies that we can actually not assign any precise wavelength (or, what amounts to the same, a precise frequency) to a de Broglie wave: if there is a spread in p (and, hence, in E), then there will be a spread in λ (and in f). In fact, I tend to think that it would be better to write the de Broglie relation as an ‘uncertainty relation’ in its own right:

Δλ = h/Δp and Δf = ΔE/h

Besides underscoring the fact that we have other ‘pairs’ of complementary variables, this ‘version’ of the de Broglie equation would also remind us continually of the fact that a ‘regular’ wave with an exact wavelength (so a Δλ equal to zero) would not give us any information about the momentum and, likewise, that an exact frequency (a Δf equal to zero) would not give us any information about timing. Indeed, as Δλ goes to zero (Δλ → 0), Δp must go to infinity (Δp → ∞) and, as Δf goes to zero, Δt must go to infinity (note that the Δf = ΔE/h and ΔE·Δt = h expressions combine into Δf = 1/Δt). That’s just the math involved in such expressions. 🙂

Jokes aside, I’ll admit I used to have a lot of trouble understanding this, so I’ll just quote the expert teacher (Feynman) on this to make sure you don’t get me wrong here:

“The amplitude to find a particle at a place can, in some circumstances, vary in space and time, let us say in one dimension, in this manner: Ψ = A·e^(i(ωt − kx)), where ω is the frequency, which is related to the classical idea of the energy through E = ħω, and k is the wave number, which is related to the momentum through p = ħk. [These are equivalent formulations of the de Broglie relations using the angular frequency and the wave number instead of wavelength and frequency.] We would say the particle had a definite momentum p if the wave number were exactly k, that is, a perfect wave which goes on with the same amplitude everywhere. The Ψ = A·e^(i(ωt − kx)) equation [then] gives the [complex-valued probability] amplitude, and if we take the absolute square, we get the relative probability for finding the particle as a function of position and time. This is a constant, which means that the probability to find [this] particle is the same anywhere.” (Feynman’s Lectures, I-48-5)

You may say or think: What’s the problem here really? Well… If the probability to find a particle is the same anywhere, then the particle can be anywhere and, for all practical purposes, that amounts to saying it’s nowhere really. Hence, that wave function doesn’t serve the purpose. In short, that nice Ψ = A·e^(i(ωt − kx)) function is completely useless in terms of representing an electron, or any other actual particle moving through space. So what to do?

The Wikipedia article on the Uncertainty Principle has this wonderful animation that shows how we can superimpose several waves, one on top of each other, to form a wave packet. Let me copy it below:

Sequential_superposition_of_plane_waves

So that’s the wave we want, indeed: a wave packet that travels through space but which is, at the same time, limited in space. Of course, you should note, once again, that it shows only one part of the complex-valued probability amplitude: just visualize the other part (imaginary if the wave above would happen to represent the real part, and vice versa if the wave would happen to represent the imaginary part of the probability amplitude). The animation basically illustrates a mathematical operation. To be precise, it involves a Fourier analysis or decomposition: it separates a wave packet into a finite or (potentially) infinite number of component waves. Indeed, note how, in the illustration above, the frequency of the component waves gradually increases (or, what amounts to the same, how the wavelength gets smaller and smaller) and how, with every wave we ‘add’ to the packet, it becomes increasingly localized. Now, you can easily see that the ‘uncertainty’ or ‘spread’ in the wavelength here (which we’ll denote by Δλ) is, quite simply, the difference between the wavelength of the ‘one-cycle wave’, which is equal to the space the whole wave packet occupies (which we’ll denote by Δx), and the wavelength of the ‘highest-frequency wave’. For all practical purposes, they are about the same, so we can write: Δx ≈ Δλ. Using Bohr’s formulation of the Uncertainty Principle, we can see the expression I used above (Δλ = h/Δp) makes sense: Δx = Δλ = h/Δp, so ΔλΔp = h.

[Just to be 100% clear on terminology: a Fourier decomposition is not the same as that Fourier transform I mentioned when talking about the relation between position and momentum in the Kennard formulation of the Uncertainty Principle, although these two mathematical concepts obviously have a few things in common.]
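You can reproduce that animation numerically, by the way. The sketch below – the central wave number and the spread are numbers I just picked – superposes a couple of hundred cosines and shows the packet dying out on a length scale of the order of 1/Δk:

```python
import numpy as np

# Superposing plane waves with a spread of wave numbers Δk produces a
# packet localized on a length scale Δx ~ 1/Δk.
x = np.linspace(-50, 50, 4001)
k0, dk = 2.0, 0.3                        # assumed central k and spread
ks = np.linspace(k0 - dk, k0 + dk, 200)

packet = sum(np.cos(k * x) for k in ks) / len(ks)

print("amplitude at x = 0 :", abs(packet[x.searchsorted(0)]))   # ≈ 1
print("amplitude at x = 30:", abs(packet[x.searchsorted(30)]))  # ≈ 0
```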

The wave train revisited

All I’ve said above is the ‘correct’ interpretation of the Uncertainty Principle and the de Broglie equation. To be frank, it took me quite a while to ‘get’ that—and, as you can see, it also took me quite a while to get ‘here’, of course. 🙂

In fact, I was confused, for quite a few years actually, because I never quite understood why there had to be a spread in the wavelength of a wave train. Indeed, we can all easily imagine a localized wave train with a fixed frequency and a fixed wavelength, like the one below, which I’ll re-use later. I’ve made this wave train myself: it’s a standard sine and cosine function multiplied by an ‘envelope’ function. As you can see, it’s a complex-valued thing indeed: the blue curve is the real part, and the imaginary part is the red curve.

Photon wave
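For what it’s worth, here’s how I’d produce such a wave train numerically: a complex exponential (cosine for the real part, sine for the imaginary part) times an envelope. The Gaussian envelope and all the numbers are just my choices for the sketch:

```python
import numpy as np

# A 'wave train' with one fixed wavelength: a complex exponential
# localized in space by an envelope function.
x = np.linspace(-10, 10, 2000)
k = 5.0                              # a single, fixed wave number
envelope = np.exp(-x**2 / 8.0)       # localizes the train in space

psi = envelope * np.exp(1j * k * x)  # blue curve: psi.real; red: psi.imag
prob = np.abs(psi)**2                # smooth and localized, like envelope²
```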

You can easily make a graph like this yourself. [Just use one of those online graph tools, or a sketch like the one above.] This thing is localized in space and, as mentioned above, it has a fixed frequency and wavelength. So all those enigmatic statements you’ll find in serious or less serious books (i.e. textbooks or popular accounts) on quantum mechanics saying that “we cannot define a unique wavelength for a short wave train” and/or saying that “there is an indefiniteness in the wave number that is related to the finite length of the train, and thus there is an indefiniteness in the momentum” (I am quoting Feynman here, so not one of the lesser gods) are – with all due respect for these authors, especially Feynman – just wrong. I’ve made another ‘short wave train’ below, but this time it depicts the real part of a (possible) wave function only.

graph (1)

Hmm… Now that one has a weird shape, you’ll say. It doesn’t look like a ‘matter wave’! Well… You’re right. Perhaps. [I’ll challenge you in a moment.] The shape of the function above is consistent, though, with the view of a photon as a transient electromagnetic oscillation. Let me come straight to the point by stating the basics: the view of a photon in physics is that photons are emitted by atomic oscillators. As an electron jumps from one energy level to the other, it seems to oscillate back and forth until it’s in equilibrium again, thereby emitting an electromagnetic wave train that looks like a transient.

Huh? What’s a transient? It’s an oscillation like the one above: its amplitude and, hence, its energy, gets smaller and smaller as time goes by. To be precise, its energy level has the same shape as the envelope curve below: E = E₀·e^(–t/τ). In this expression, we have τ as the so-called decay time, and one can show it’s the inverse of the so-called decay rate: τ = 1/γ, with γE = –dE/dt. In case you wonder, check it out on Wikipedia: it’s one of the many applications of the natural exponential function. We’re talking a so-called exponential decay here indeed, which involves a quantity (in this case, the amplitude and/or the energy) decreasing at a rate that is proportional to its current value, with the coefficient of proportionality being γ. So we write that as γE = –dE/dt in mathematical notation. 🙂

decay time
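Just to make that τ = 1/γ claim concrete, the check is a one-liner:

$$E(t) = E_0 e^{-t/\tau} \;\Rightarrow\; \frac{dE}{dt} = -\frac{1}{\tau} E_0 e^{-t/\tau} = -\frac{1}{\tau} E(t)$$

So the decay rate is indeed γ = 1/τ or, the other way around, τ = 1/γ.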

I need to move on. All of what I wrote above was ‘plain physics’, but what I really want to explore in this post is a crazy hypothesis: could these wave trains above – I mean the wave trains with the fixed frequency and wavelength – possibly represent a de Broglie wave for a photon?

You’ll say: of course not! But, let’s be honest, you’d have some trouble explaining why. The best answer you could probably come up with is: because no physics textbook says something like that. You’re right. It’s a crazy hypothesis because, when you ask a physicist (believe it or not, but I actually went through the trouble of asking two nuclear scientists), they’ll tell you that photons are not to be associated with de Broglie waves. [You’ll say: why didn’t you try looking for an answer on the Internet? I actually did but – unlike what I am used to – I got very confusing answers on this one, so I gave up trying to find some definite answer on this question on the Internet.]

However, these negative answers don’t discourage me from trying to do some more freewheeling. Before discussing whether or not the idea of a de Broglie wave for a photon makes sense, let’s think about mathematical constraints. I googled a bit, but I can see only one, actually: the amplitudes of a de Broglie wave are subject to a normalization condition. Indeed, when everything is said and done, all probabilities must take a value between 0 and 1, and they must also add up to exactly 1. So that’s a normalization condition that obviously imposes some constraints on the (complex-valued) probability amplitudes of our wave function.
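To see what that normalization amounts to in practice, here’s a quick numerical sketch (my own toy wave train again): on a discretized x-axis, the squared magnitudes of the amplitudes, times the grid spacing, must sum to exactly 1.

```python
import numpy as np

x = np.linspace(-10, 10, 1000)
dx = x[1] - x[0]
psi = np.exp(-x**2 / 8) * np.exp(1j * 5 * x)  # some un-normalized wave train

# Divide by the square root of the total probability, so that the
# squared magnitudes integrate to one.
psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)
print(np.sum(np.abs(psi)**2) * dx)  # 1.0, up to rounding
```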

But let’s get back to the photon. Let me remind you of what happens when a photon is being emitted by inserting the two diagrams below, which gives the energy levels of the atomic orbitals of electrons.

Energy Level Diagrams

So an electron absorbs or emits a photon – i.e. radiation – when it goes from one energy level to another. And, of course, you will also remember that the frequency of the absorbed or emitted light is related to those energy levels. More specifically, the frequency of the light emitted in a transition from, let’s say, energy level E3 to E1 will be written as ν31 = (E3 – E1)/h. This frequency will be one of the so-called characteristic frequencies of the atom and will define a specific so-called spectral emission line.
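Just to put a number on that ν31 formula: for hydrogen, with the standard En = –13.6 eV/n² energy levels, a quick check (mine, so do double-check it) gives the Lyman-beta line:

```python
h = 6.626e-34   # Planck's constant (J·s)
eV = 1.602e-19  # one electronvolt in joules
c = 3e8         # speed of light (m/s)

E1 = -13.6 * eV         # ground state of hydrogen
E3 = -13.6 * eV / 3**2  # second excited state
nu_31 = (E3 - E1) / h
print(nu_31)      # about 2.9e15 Hz
print(c / nu_31)  # about 1.03e-7 m, i.e. 103 nm: the Lyman-beta line
```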

Now, from a mathematical point of view, there’s no difference between that ν31 = (E3 – E1)/h equation and the de Broglie equation, f = E/h, which assigns a de Broglie wave to a particle. But, of course, from all that I wrote above, it’s obvious that, while these two formulas are the same from a math point of view, they represent very different things. Again, let me repeat what I said above: a de Broglie wave is a matter-wave and, as such, it has nothing to do with an electromagnetic wave.

Let me be even more explicit. A de Broglie wave is not a ‘real’ wave, in a sense (but, of course, that’s a very unscientific statement to make); it’s a psi function, so it represents those weird mathematical quantities – complex probability amplitudes – which allow us to calculate the probability of finding the particle at position x or, if it’s a wave function for momentum-space, of finding a value p for its momentum. In contrast, a photon that’s emitted or absorbed represents a ‘real’ disturbance of the electromagnetic field propagating through space. Hence, that frequency ν is something very different from f, which is why we use another symbol for it (ν is the Greek letter nu, not to be confused with the v symbol we use for velocity). [Of course, you may wonder how ‘real’ or ‘unreal’ an electromagnetic field is but, in the context of this discussion, let me assure you we should look at it as something that’s very real.]

That being said, we also know light is emitted in discrete energy packets: in fact, that’s how photons were defined originally, first by Planck and then by Einstein. Now, when an electron falls from one energy level in an atom to another (lower) energy level, it emits one – and only one – photon with that particular wavelength and energy. The question then is: how should we picture that photon? Does it also have some more or less defined position in space, and some momentum? The answer is definitely yes, on both accounts:

  1. Subject to the constraints of the Uncertainty Principle, we know, more or less indeed, when a photon leaves a source and when it hits some detector. [And, yes, due to the ‘Uncertainty Principle’ or, as Feynman puts it, the rules for adding arrows, it may not travel in a straight line and/or at the speed of light—but that’s a discussion that, believe it or not, is not directly relevant here. If you want to know more about it, check one or more of my posts on it.]
  2. We also know light has a very definite momentum, which I’ve calculated elsewhere, so I’ll just note the result: p = E/c. It’s a ‘pushing momentum’ referred to as radiation pressure, and it’s in the direction of travel indeed.

In short, it does make sense, in my humble opinion that is, to associate some wave function with the photon, and then I mean a de Broglie wave. Just think about it yourself. You’re right to say that a de Broglie wave is a ‘matter wave’, and photons aren’t matter but, having said that, photons do behave like electrons, don’t they? There’s diffraction (when you send a photon through one slit) and interference (when photons go through two slits, altogether or – amazingly – one by one), so it’s the same weirdness as electrons indeed. So why wouldn’t we associate some kind of wave function with them?

You can react in one of three ways here. The first reaction is: “Well… I don’t know. You tell me.” Well… That’s what I am trying to do here. 🙂

The second reaction may be somewhat more to the point. For example, those who’ve read Feynman’s QED: The Strange Theory of Light and Matter could say: “Of course, why not? That’s what we do when we associate a photon going from point A to B with an amplitude P(A to B), isn’t it?”

Well… No. I am talking about something else here. Not some amplitude associated with a path in spacetime, but a wave function giving an approximate position of the photon.

The third reaction may be the same as the reaction of those two nuclear scientists I asked: “No. It doesn’t make sense. We do not associate photons with a de Broglie wave.” But they didn’t tell me why because… Well… They didn’t have the time to entertain a guy like me, so I didn’t dare to push the question, and I continued to explore it in more detail myself.

So I’ve done that, and I thought of one reason why the question, perhaps, may not make all that much sense: a photon travels at the speed of light and, therefore, it has no length. Hence, doing what I am doing below – i.e. associating the electromagnetic transient with a de Broglie wave – might not make sense.

Maybe. I’ll let you judge. Before developing the point, I’ll raise two objections to the ‘objection’ raised above (i.e. the statement that a photon has no length). First, if we’re looking at the photon as some particle, it will obviously have no length. However, an electromagnetic transient is just what it is: an electromagnetic transient. I’ve seen nothing that makes me think its length should be zero. In fact, if that were the case, the concept of an electromagnetic wave itself would not make sense, as its ‘length’ would always be zero. Second, even if – somehow – the length of the electromagnetic transient would be reduced to zero because of its speed, we can still imagine that we’re looking at the emission of an electromagnetic pulse (i.e. a photon) using the reference frame of the photon, so that we’re traveling at speed c, ‘riding’ with the photon, so to say, as it’s being emitted. Then we would ‘see’ the electromagnetic transient as it’s being radiated into space, wouldn’t we?

Perhaps. I actually don’t know. That’s why I wrote this post and hope someone will react to it. I really don’t know, so I thought it would be nice to just freewheel a bit on this question. So be warned: nothing of what I write below has been researched really, so critical comments and corrections from actual specialists are more than welcome.

The shape of a photon wave

As mentioned above, the answer in regard to the definition of a photon’s position and momentum is, obviously, unambiguous. Perhaps we have to stretch whatever we understand of Einstein’s (special) relativity theory, but we should be able to draw some conclusions, I feel.

Let me say one thing more about the momentum here. As said, I’ll refer you to one of my posts for the detail but, all you should know here is that the momentum of light is related to the magnetic field vector, which we usually never mention when discussing light because it’s so tiny as compared to the electric field vector in our inertial frame of reference. Indeed, the magnitude of the magnetic field vector is equal to the magnitude of the electric field vector divided by c ≈ 3×10⁸ m/s, so we write B = E/c. Now, the E here stands for the electric field, so let me use W to refer to the energy instead of E. Using the B = E/c equation and a fairly straightforward calculation of the work that can be done by the associated force on a charge that’s being put into this field, we get that famous equation which we mentioned above already: the momentum of a photon is its total energy divided by c, so we write p = W/c. You’ll say: so what? Well… Nothing. I just wanted to note we get the same p = W/c equation indeed, but from a very different angle of analysis here. We didn’t use the energy-momentum relation here at all! In any case, the point to note is that the momentum of a photon is only a tiny fraction of its energy (p = W/c), and that the associated magnetic field vector is also just a tiny fraction of the electric field vector (B = E/c).
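To put some numbers on those relations – my own back-of-the-envelope check, using the 500 THz sodium light we’ll meet again further below – the W = hf and p = W/c formulas give:

```python
h = 6.626e-34  # Planck's constant (J·s)
c = 3e8        # speed of light (m/s)

f = 500e12   # 500 THz
W = h * f    # photon energy: about 3.3e-19 J (roughly 2 eV)
p = W / c    # photon momentum: about 1.1e-27 kg·m/s
print(W, p)  # tiny numbers indeed, just like B = E/c is tiny
```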

But so it’s there and, in fact, when adopting a moving reference frame, the mix of E and B (i.e. the electric and magnetic field) becomes an entirely different one. One of the ‘gems’ in Feynman’s Lectures is the exposé on the relativity of electric and magnetic fields indeed, in which he analyzes the electric and magnetic field caused by a current, and in which he shows that, if we switch our inertial reference frame for that of the moving electrons in the wire, the ‘magnetic’ field disappears, and the whole electromagnetic effect becomes ‘electric’ indeed.

I am just noting this because I know I should do a similar analysis for the E and B ‘mixture’ involved in the electromagnetic transient that’s being emitted by our atomic oscillator. However, I’ll admit I am not quite comfortable enough with the physics nor the math involved to do that, so… Well… Please do bear this in mind as I will be jotting down some quite speculative thoughts in what follows.

So… A photon is, in essence, an electromagnetic disturbance and so, when trying to picture a photon, we can think of some oscillating electric field vector traveling through–and also limited in–space. [Note that I am leaving the magnetic field vector out of the analysis from the start, which is not ‘nice’ but, in light of that B = E/c relationship, I’ll assume it’s acceptable.] In short, in the classical world – and in the classical world only, of course – a photon must be some electromagnetic wave train, like the one below–perhaps.

Photon - E

But why would it have that shape? I only suggested it because it has the same shape as Feynman’s representation of a particle (see below) as a ‘probability wave’ traveling through–and limited in–space.

Wave train

So, what about it? Let me first remind you once again (I just can’t stress this point enough, it seems) that Feynman’s representation – and most are based on his, it seems – is misleading because it suggests that ψ(x) is some real number. It’s not. In the image above, the vertical axis should not represent some real number (and it surely should not represent a probability, i.e. some real positive number between 0 and 1) but a probability amplitude, i.e. a complex number in which both the real and imaginary part are important. Just to be fully complete (in case you forgot): such a complex-valued wave function ψ(x) will give you all the probabilities you need when you take its (absolute) square, but so… Well… We’re really talking a different animal here, and the image above gives you only one part of the complex-valued wave function (either the real or the imaginary part), while it should give you both. That’s why I find my graph below much better. 🙂 It’s the same really, but it shows both the real and the imaginary part of a wave function.

Photon wave

But let me go back to the first illustration: the vertical axis of the first illustration is not ψ but E – the electric field vector. So there’s no imaginary part here: just a real number, representing the strength – or magnitude, I should say – of the electric field E as a function of the space coordinate x. [Can magnitudes be negative? The honest answer is: no, they can’t. But just think of a negative value as representing the field vector pointing the other way.]

Regardless of the shortcomings of this graph, including the fact that we only have some real-valued oscillation here, would it work as a ‘suggestion’ of what a real-life photon could look like?

Of course, you could try to not answer that question by mumbling something like: “Well… It surely doesn’t represent anything coming near to a photon in quantum mechanics.” But… Well… That’s not my question here: I am asking you to be creative and ‘think outside of the box’, so to say. 🙂

So you should say ‘No!’ because of some other reason. What reason? Well… If a photon is an electromagnetic transient – in other words, if we adopt a purely classical point of view – it’s going to be a transient wave indeed, and so then it should walk, talk and even look like a transient. 🙂 Let me quickly jot down the formula for the (vertical) component of E as a function of the acceleration of some charge q:

EMR law
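In case the formula doesn’t display: what I have in mind here is Feynman’s radiation law which – if I copy it correctly – reads

$$E(t) = -\,\frac{q\,a(t - r/c)}{4\pi\varepsilon_0 c^2 r}$$

with a the component of the acceleration that’s normal to our line of sight, and t – r/c the retarded time.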

The charge q (i.e. the source of the radiation) is, of course, our electron that’s emitting the photon as it jumps from a higher to a lower energy level (or, vice versa, absorbing it). This formula basically states that the magnitude of the electric field (E) is proportional to the acceleration (a) of the charge (with t – r/c the retarded argument). Hence, the suggested shape of E as a function of x as shown above would imply that the acceleration of the electron is (a) initially quite small, (b) then becomes larger and larger to reach some maximum, and (c) then becomes smaller and smaller again to die down completely. In short, it does match the definition of a transient wave sensu stricto (Wikipedia defines a transient as “a short-lived burst of energy in a system caused by a sudden change of state”), but it’s not likely to represent any real transient. So we can’t exclude it, but a real transient is much more likely to look like what’s depicted below: no gradual increase in amplitude, but big swings initially, which then dampen to zero. In other words, if our photon is a transient electromagnetic disturbance caused by a ‘sudden burst of energy’ (which is what that electron jump is, I would think), then its representation will, much more likely, resemble a damped wave, like the one below, rather than Feynman’s picture of a moving matter-particle.

graph (1)
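That graph, too, is easy to make yourself. A sketch with arbitrary numbers: the energy decays as e^(–t/τ), so the amplitude must decay as e^(–t/2τ), and we multiply that envelope with the oscillation itself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Big swings first, then an exponential die-out: a damped transient.
t = np.linspace(0, 10, 2000)
tau = 2.0  # decay time, in arbitrary units
E = np.exp(-t / (2 * tau)) * np.cos(20 * t)

plt.plot(t, E)
plt.xlabel('t')
plt.ylabel('E')
plt.show()
```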

In fact, we’d have to flip the image, both vertically and horizontally, because the acceleration of the source and the field are related as shown below. The vertical flip is because of the minus sign in the formula for E(t). The horizontal flip is because of the minus sign in the (t – r/c) term, the retarded argument: if we add a little time (Δt), we get the same value for a(t – r/c) as we would have if we had subtracted a little distance: Δr = cΔt. So that’s why E as a function of r (or of x), i.e. as a function in space, is a ‘reversed’ plot of the acceleration as a function of time.

wave in space

So we’d have something like below.

Photon wave

What does this resemble? It’s not a vibrating string (although I do start to understand the attractiveness of string theory now: vibrating strings are great as energy storage systems, so the idea of a photon being some kind of vibrating string sounds great, doesn’t it?). It doesn’t resemble a bullwhip effect either, because the oscillation of a whip is confined by a different envelope (see below). And, no, it’s definitely not a trumpet either. 🙂

800px-Bullwhip_effect

It’s just what it is: an electromagnetic transient traveling through space. Would this be realistic as a ‘picture’ of a photon? Frankly, I don’t know. I’ve looked at a lot of stuff but didn’t find anything on this really. The easy answer, of course, is quite straightforward: we’re not interested in the shape of a photon because we know it is not an electromagnetic wave. It’s a ‘wavicle’, just like an electron.

[…] Sure. I know that too. Feynman told me. 🙂 But then why wouldn’t we associate some wave function with it? Please tell me, because I really can’t find much of an answer to that question in the literature, and so that’s why I am freewheeling here. So just go along with me for a while, and come up with another suggestion. As I said above, your bet is as good as mine. All that I know is that there’s one thing we need to explain when considering the various possibilities: a photon has a very well-defined frequency (which defines its color in the visible light spectrum) and so our wave train should – in my humble opinion – also have that frequency. At least for ‘quite a while’—and then I mean ‘most of the time’, or ‘on average’ at least. Otherwise the concept of a frequency – or a wavelength – wouldn’t make much sense. Indeed, if the photon has no defined wavelength or frequency, then we could not perceive it as some color (as you may or may not know, the sense of ‘color’ is produced by our eye and brain, but so it’s definitely associated with the frequency of the light). A photon should have a color (in physics, that means a frequency) because, when everything is said and done, that’s what the Planck relation is all about.

What would be your alternative? I mean… Doesn’t it make sense to think that, when jumping from one energy level to the other, the electron would initially sort of overshoot its new equilibrium position, to then overshoot it again on the other side, and so on and so on, but with an amplitude that becomes smaller and smaller as the oscillation dies out? In short, if we look at radiation as being caused by atomic oscillators, why would we not go all the way and think of them as oscillators subject to some damping force? Just think about it. 🙂

The size of a photon wave

Let’s forget about the shape for a while and think about size. We’ve got an electromagnetic wave train here. So how long would it be? Well… Feynman calculated the Q of these atomic oscillators: it’s of the order of 10⁸ (see his Lectures, I-32-3: it’s a wonderfully simple exercise, and one that really shows his greatness as a physics teacher) and, hence, this wave train will last about 10⁻⁸ seconds (that’s the time it takes for the radiation to die out by a factor 1/e). To give a somewhat more precise example: for sodium light, which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), the radiation will last about 3.2×10⁻⁸ seconds. [To be precise, that’s the time it takes for the radiation’s energy to die out by a factor 1/e (i.e. the so-called decay time τ), so the wave train will actually last longer, but the amplitude becomes quite small after that time.]

So that’s a very short time but still, taking into account the rather spectacular frequency (500 THz) of sodium light, that still makes for some 16 million oscillations and, taking into account the rather spectacular speed of light (3×10⁸ m/s), that makes for a wave train with a length of, roughly, 9.6 meters. Huh? 9.6 meters!?

You’re right. That’s an incredible distance: it’s like infinity on an atomic scale!
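You can double-check the arithmetic in two lines:

```python
c = 3e8       # speed of light (m/s)
f = 500e12    # frequency of sodium light (Hz)
tau = 3.2e-8  # decay time of the radiation (s)

print(f * tau)  # about 1.6e7: the '16 million oscillations'
print(c * tau)  # about 9.6 m: the length of the wave train
```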

So… Well… What to say? Such length surely cannot match the picture of a photon as a fundamental particle which cannot be broken up, can it? So it surely cannot be right because, if this would be the case, then there surely must be some way to break this thing up and, hence, it cannot be ‘elementary’, can it?

Well… Maybe. But think it through. First note that we will not see the photon as a 10-meter long string, because it travels at the speed of light indeed, and so the length contraction effect ensures that its length, as measured in our reference frame (and from whatever ‘real-life’ reference frame actually, because the speed of light will always be c, regardless of the speeds we mortals could ever reach, including speeds close to c), is zero.

So, yes, I surely must be joking here but, as far as jokes go, I can’t help thinking this one is fairly robust from a scientific point of view. Again, please do double-check and correct me, but all what I’ve written so far is not all that speculative. It corresponds to all what I’ve read about it: only one photon is produced per electron in any de-excitation, and its energy is determined by the number of energy levels it drops, as illustrated (for a simple hydrogen atom) below. For those who continue to be skeptical about my sanity here, I’ll quote Feynman once again:

“What happens in a light source is that first one atom radiates, then another atom radiates, and so forth, and we have just seen that atoms radiate a train of waves only for about 10⁻⁸ sec; after 10⁻⁸ sec, some atom has probably taken over, then another atom takes over, and so on. So the phases can really only stay the same for about 10⁻⁸ sec. Therefore, if we average for very much more than 10⁻⁸ sec, we do not see an interference from two different sources, because they cannot hold their phases steady for longer than 10⁻⁸ sec. With photocells, very high-speed detection is possible, and one can show that there is an interference which varies with time, up and down, in about 10⁻⁸ sec.” (Feynman’s Lectures, I-34-4)

600px-Hydrogen_transitions

So… Well… Now it’s up to you. I am going along here with the assumption that a photon in the visible light spectrum, from a classical world perspective, should indeed be something that’s several meters long and packs a few million oscillations. So, while we usually measure stuff in seconds, or hours, or years – and, hence, while we would think that 10⁻⁸ seconds is short – a photon would actually be a very stretched-out transient that occupies quite a lot of space. I should also add that, in light of that number of ten meters, the damping seems to happen rather slowly!

[…]

I can see you shaking your head now, for various reasons.

First because this type of analysis is not appropriate. […] You think so? Well… I don’t know. Perhaps you’re right. Perhaps we shouldn’t try to think of a photon as being something different than a discrete packet of energy. But then we also know it is an electromagnetic wave. So why wouldn’t we go all the way?

Second, I guess you may find the math involved in this post not to your liking, even if it’s quite simple and I am not doing anything spectacular here. […] Well… Frankly, I don’t care. Let me bulldozer on. 🙂

What about the ‘vertical’ dimension, the y and the z coordinates in space? We’ve got this long snaky thing: how thick-bodied is it?

Here, we need to watch our language. While it’s fairly obvious to associate a wave with a cross-section that’s normal to its direction of propagation, it is not obvious to associate a photon with the same thing. Not at all, actually: as that electric field vector E oscillates up and down (or goes round and round, as shown in the illustration below, which is an image of a circularly polarized wave), it does not actually take up any space. Indeed, the electric and magnetic field vectors E and B have a direction and a magnitude in space, but they’re not representing something that actually occupies some smaller or larger core in space.

Circular.Polarization.Circularly.Polarized.Light_Right.Handed.Animation.305x190.255Colors

Hence, the vertical axis of that graph showing the wave train does not indicate some spatial position: it’s not a y-coordinate but the magnitude of an electric field vector. [Just to underline the fact that the magnitude E has nothing to do with spatial coordinates: note that its value depends on the unit we use to measure field strength (so that’s newton/coulomb, if you want to know), so it’s really got nothing to do with an actual position in space-time.]

So, what can we say about it? Nothing much, perhaps. But let me try.

Cross-sections in nuclear physics

In nuclear physics, the term ‘cross-section’ would usually refer to the so-called Thomson scattering cross-section of an electron (or any charged particle really), which can be defined rather loosely as the target area for the incident wave (i.e. the photons): it is, in fact, a surface which can be calculated from what is referred to as the classical electron radius, which is about 2.82×10⁻¹⁵ m. Just to compare: you may or may not remember the so-called Bohr radius of an atom, which is about 5.29×10⁻¹¹ m, so that’s a length that’s about 20,000 times longer. To be fully complete, let me give you the exact value for the Thomson scattering cross-section of an electron: 6.65×10⁻²⁹ m² (note that this is a surface indeed, so we have meters squared as a unit, not meters).
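In fact, that cross-section is not an independent number: it follows from the classical electron radius as σ = (8π/3)·r_e², so you can check the figure above yourself:

```python
import math

r_e = 2.82e-15  # classical electron radius (m)
sigma = (8 * math.pi / 3) * r_e**2
print(sigma)    # about 6.66e-29 m^2: the figure quoted above, up to rounding
```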

Now, let me remind you – once again – that we should not associate the oscillation of the electric field vector with something actually happening in space: an electromagnetic field does not move in a medium and, hence, it’s not like a water or sound wave, which makes molecules go up and down as it propagates through its medium. To put it simply: there’s nothing that’s wriggling in space as that photon is flashing through space. However, when it does hit an electron, that electron will effectively ‘move’ (or vibrate or wriggle or whatever you can imagine) as a result of the incident electromagnetic field.

That’s what’s depicted and labeled below: there is a so-called ‘radial component’ of the electric field, and I would say: that’s our photon! [What else would it be?] The illustration below shows that this ‘radial’ component is just E for the incident beam and that, for the scattered beam, it is, in fact, determined by the electron motion caused by the incident beam through that relation described above, in which a is the normal component (i.e. normal to the direction of propagation of the outgoing beam) of the electron’s acceleration.

Thomson_scattering_geometry

Now, before I proceed, let me remind you once again that the above illustration is, once again, one of those illustrations that only want to convey an idea, and so we should not attach too much importance to it: the world at the smallest scale is best not represented by a billiard-ball model. In addition, I should also note that the illustration above was taken from the Wikipedia article on elastic scattering (i.e. Thomson scattering), which is only a special case of the more general Compton scattering that actually takes place. It is, in fact, the low-energy limit. Photons with higher energy will usually be absorbed, and then there will be a re-emission but, in the process, there will be a loss of energy in this ‘collision’ and, hence, the scattered light will have lower energy (and, hence, lower frequency and longer wavelength). But – hey! – now that I think of it: that’s quite compatible with my idea of damping, isn’t it? 🙂 [In case you think I’ve gone crazy: I am really joking here. When it’s Compton scattering, there’s no ‘lost’ energy: the electron will recoil and, hence, its momentum will increase. That’s what’s shown below (credit goes to the HyperPhysics site).]

compton4

So… Well… Perhaps we should just assume that a photon is a long wave train indeed (as mentioned above, ten meters is very long indeed: not an atomic scale at all!), but that its effective ‘radius’ should be of the same order as the classical electron radius. So what’s that order? If it’s more or less the same radius, then it would be in the order of femtometers (1 fm = 1 fermi = 1×10⁻¹⁵ m). That’s good, because that’s a typical length scale in nuclear physics. For example, it would be comparable with the radius of a proton. So we look at a photon here as something very different – because it’s so incredibly long – but as something which does have some kind of ‘radius’ that is normal to its direction of propagation and equal to or smaller than the classical electron radius. [Now that I think of it, we should probably think of it as being substantially smaller. Why? Well… An electron is obviously fairly massive as compared to a photon (if only because an electron has a rest mass and a photon hasn’t) and so… Well… When everything is said and done, it’s the electron that absorbs a photon – not the other way around!]

Now, that radius determines the area in which it may produce some effect, like hitting an electron, for example, or being detected in a photon detector, which is just what this so-called radius of an atom or an electron is all about: the area which is susceptible to being hit by some particle (including a photon), or which is likely to emit some particle (including a photon). What it is exactly, we don’t know: it’s still as spooky as an electron and, therefore, it also does not make all that much sense to talk about its exact position in space. However, if we’d talk about its position, then we should obviously also invoke the Uncertainty Principle, which will give us some upper and lower bounds for its actual position, just like it does for any other particle: the uncertainty about its position will be related to the uncertainty about its momentum, and more knowledge about the former implies less knowledge about the latter, and vice versa. Therefore, we can also associate some complex wave function with this photon which is – for all practical purposes – a de Broglie wave. Now how should we visualize that wave?

Well… I don’t know. I am actually not going to offer anything specific here. First, it’s all speculation. Second, I think I’ve written too much rubbish already. However, if you’re still reading, and you like this kind of unorthodox application of electromagnetics, then the following remarks may stimulate your imagination.

The first thing to note is that we should not end up with a wave function that, when squared, gives us a constant probability for each and every point in space. No. The wave function needs to be confined in space and, hence, we’re also talking a wave train here – and a very short one in this case. So… Well… What about linking its amplitude to the amplitude of the field for the photon? In other words, the probability amplitude could, perhaps, be proportional to the amplitude of E, with the proportionality factor being determined by (a) the unit in which we measure E (i.e. newton/coulomb) and (b) the normalization condition.

OK. I hear you say it now: “Ha-ha! Got you! Now you’re really talking nonsense! How can a complex number (the probability amplitude) be proportional to some real number (the field strength)?”

Well… Be creative. It’s not that difficult to imagine some linkages. First, the electric field vector has both a magnitude and a direction. Hence, there’s more to E than just its magnitude. Second, you should note that the real and imaginary parts of a complex-valued wave function are a simple sine and cosine function respectively, and these two functions are the same really, except for a phase difference of π/2. In other words, if we have a formula for the real part of a wave function, we have a formula for its imaginary part as well. So… Your remark is to the point, and then it isn’t.
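To show what I mean, here’s a toy construction – entirely mine, so take it for what it’s worth: take the real-valued transient, keep its envelope, and pair every cosine with the sine that lags it by π/2. That turns the real oscillation into a complex-valued function whose squared magnitude is just the squared envelope.

```python
import numpy as np

t = np.linspace(0, 10, 2000)
tau, omega = 2.0, 20.0
envelope = np.exp(-t / (2 * tau))

E_real = envelope * np.cos(omega * t)     # the 'real' transient
psi = envelope * np.exp(-1j * omega * t)  # cos - i*sin, same envelope

print(np.allclose(psi.real, E_real))             # True: real parts match
print(np.allclose(np.abs(psi)**2, envelope**2))  # True: a smooth probability
```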

OK, you’ll say, but then how exactly would you link the E vector with the ψ(x, t) function for a photon? Well… Frankly, I am a bit exhausted now, so I’ll leave any further speculation to you. The whole idea of a de Broglie wave for a photon, with the (complex-valued) amplitude having some kind of ‘proportional’ relationship to the (magnitude of the) electric field vector, makes sense to me, although we’d have to be innovative about what that ‘proportionality’ exactly is.

Let me conclude this speculative business by noting a few more things about our ‘transient’ electromagnetic wave:

1. First, it’s obvious that the usual relations between (a) energy (W), (b) frequency (f) and (c) amplitude (A) hold. If we increase the frequency of a wave, we’ll have a proportional increase in energy (twice the frequency is twice the energy), with the factor of proportionality being given by the Planck-Einstein relation: W = hf. But if we’re talking amplitudes (for which we do not have a formula, which is why we’re engaging in those assumptions on the shape of the transient wave), we should not forget that the energy of a wave is proportional to the square of its amplitude: W ∼ A². Hence, a linear increase of the amplitudes results in a quadratic increase in energy (e.g. if you double all amplitudes, you’ll pack four times more energy in that wave).

2. Both factors come into play when an electron emits a photon. Indeed, if the difference between the two energy levels is larger, then the photon will not only have a higher frequency (i.e. we’re talking light (or electromagnetic radiation) in the upper ranges of the spectrum then) but one should also expect that the initial overshooting – and, hence, the initial oscillation – will also be larger. In short, we’ll have larger amplitudes. Hence, higher-energy photons will pack even more energy upfront. They will also have higher frequency, because of the Planck relation. So, yes, both factors would come into play.

What about the length of these wave trains? Would it make them shorter? Yes. I’ll refer you to Feynman’s Lectures to verify that the wavelength appears in the numerator of the formula for Q. Hence, higher frequency means shorter wavelength and, hence, lower Q. Now, I am not quite sure (I am not sure about anything I am writing here it seems) but this may or may not be the reason for yet another statement I never quite understood: photons with higher and higher energy are said to become smaller and smaller, and when they reach the Planck scale, they are said to become black holes.

Hmm… I should check on that. 🙂

Conclusion

So what’s the conclusion? Well… I’ll leave it to you to think about this. As said, I am a bit tired now and so I’ll just wrap this up, as this post has become way too long anyway. Let me, before parting, offer the following bold suggestion in terms of finding a de Broglie wave for our photon: perhaps that transient above actually is the wave function.

You’ll say: What !? What about normalization? All probabilities have to add up to one and, surely, those magnitudes of the electric field vector wouldn’t add up to one, would they?

My answer to that is simple: that’s just a question of units, i.e. of normalization indeed. So just measure the field strength in some other unit and it will come all right.

[…] But… Yes? What? Well… Those magnitudes are real numbers, not complex numbers.

I am not sure how to answer that one, but there are two things I could say:

  1. Real numbers are complex numbers too: it’s just that their imaginary part is zero.
  2. When working with waves, and especially with transients, we’ve always represented them using the complex exponential function. For example, we would write a wave function whose amplitude varies sinusoidally in space and time as A·exp[i(ωt − kr)], with ω the (angular) frequency and k the wave number (i.e. 2π divided by the wavelength, so that’s radians per unit distance).
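Indeed, Euler’s formula shows the sine and the cosine sitting inside that complex exponential:

$$A e^{i(\omega t - kr)} = A\cos(\omega t - kr) + iA\sin(\omega t - kr), \quad k = \frac{2\pi}{\lambda}$$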

So, frankly, think about it: where is the photon? It’s that ten-meter long transient, isn’t it? And the probability to find it somewhere is the (absolute) square of some complex number, right? And then we have a wave function already, representing an electromagnetic wave, for which we know that the energy which it packs is the square of its amplitude, as well as being proportional to its frequency. We also know we’re more likely to detect something with high energy than something with low energy, don’t we? So… Tell me why the transient itself would not make for a good psi function?

But then what about these probability amplitudes being a function of the y and z coordinates?

Well… Frankly, I’ve started to wonder if a photon actually has a radius. If it doesn’t have a mass, it’s probably the only real point-like particle (i.e. a particle not occupying any space) – as opposed to all other matter-particles, which do have mass.

Why?

I don’t know. Your guess is as good as mine. Maybe our concepts of amplitude and frequency of a photon are not very relevant. Perhaps it’s only energy that counts. We know that a photon has a more or less well-defined energy level (within the limits of the Uncertainty Principle) and, hence, our ideas about how that energy actually gets distributed over the frequency, the amplitude and the length of that ‘transient’ have no relation with reality. Perhaps we like to think of a photon as a transient electromagnetic wave, because we’re used to thinking in terms of waves and fields, but perhaps a photon is just a point-like thing indeed, with a wave function that’s got the same shape as that transient. 🙂

Post scriptum: Perhaps I should apologize to you, my dear reader. It’s obvious that, in quantum mechanics, we don’t think of a photon as having some frequency and some wavelength and some dimension in space: it’s just an elementary particle with energy interacting with other elementary particles with energy, and we use these coupling constants and what have you to work with them. So we don’t usually think of photons as ten-meter long transients moving through space. So, when I write that “our concepts of amplitude and frequency of a photon are maybe not very relevant” when trying to picture a photon, and that “perhaps, it’s only energy that counts”, I actually don’t mean “maybe” or “perhaps“. I mean: Of course! […] In the quantum-mechanical world view, that is.

So I apologize for, perhaps, posting what may or may not amount to plain nonsense. However, as all of this nonsense helps me to make sense of these things myself, I’ll just continue. 🙂 I seem to move very slowly on this Road to Reality, but the good thing about moving slowly, is that it will − hopefully − give me the kind of ‘deeper’ understanding I want, i.e. an understanding beyond the formulas and mathematical and physical models. In the end, that’s all that I am striving for when pursuing this ‘hobby’ of mine. Nothing more, nothing less. 🙂 Onwards!


Babushka thinking

Pre-scriptum (dated 26 June 2020): This is an interesting post. I think my thoughts on the relevance of scale – especially the role of the fine-structure constant in this regard – have evolved considerably, so you should probably read my papers instead of these old blog posts.

Original post:

What is that we are trying to understand? As a kid, when I first heard about atoms consisting of a nucleus with electrons orbiting around it, I had this vision of worlds inside worlds, like a set of babushka dolls, one inside the other. Now I know that this model – which is nothing but the 1911 Rutherford model basically – is plain wrong, even if it continues to be used in the logo of the International Atomic Energy Agency, or the US Atomic Energy Commission. 

IAEA logo US_Atomic_Energy_Commission_logo

Electrons are not planet-like things orbiting around some center. If one wants to understand something about the reality of electrons, one needs to familiarize oneself with complex-valued wave functions whose values represent a weird quantity referred to as a probability amplitude and, contrary to what you may think (unless you read my blog, or if you just happen to know a thing or two about quantum mechanics), the relation between that amplitude and the concept of probability tout court is not very straightforward.

Familiarizing oneself with the math involved in quantum mechanics is not an easy task, as evidenced by all those convoluted posts I’ve been writing. In fact, I’ve been struggling with these things for almost a year now and I’ve started to realize that Roger Penrose’s Road to Reality (or should I say Feynman’s Lectures?) may lead nowhere – in terms of that rather spiritual journey of trying to understand what it’s all about. If anything, they made me realize that the worlds inside worlds are not the same. They are different – very different.

When everything is said and done, I think that’s what’s nagging us as common mortals. What we are all looking for is some kind of ‘Easy Principle’ that explains All and Everything, and we just can’t find it. The point is: scale matters. At the macro-scale, we usually analyze things using some kind of ‘billiard-ball model’. At a smaller scale, let’s say the so-called wave zone, our ‘law’ of radiation holds, and we can analyze things in terms of electromagnetic or gravitational fields. But then, when we further reduce scale, by another order of magnitude really – when trying to get very close to the source of radiation, or when we try to analyze what is oscillating really – we get in deep trouble: our easy laws no longer hold, and the equally easy math – easy is relative, of course 🙂 – which we use to analyze fields or interference phenomena becomes totally useless.

Religiously inclined people would say that God does not want us to understand all or, taking a somewhat less selfish picture of God, they would say that Reality (with a capital R to underline its transcendental aspects) just can’t be understood. Indeed, it is rather surprising – in my humble view at least – that things do seem to get more difficult as we drill down: in physics, it’s not the bigger things – like understanding thermonuclear fusion in the Sun, for example – but the smallest things which are difficult to understand. Of course, that’s partly because physics leaves some of the bigger things which are actually very difficult to understand – like how a living cell works, for example, or how our eye or our brain works – for other sciences to study (biology and biochemistry for cells, for example). In that respect, physics may actually be described as the science of the smallest things. The surprising thing, then, is that the smallest things are not necessarily the simplest things – on the contrary.

Still, that being said, I can’t help feeling some sympathy for the simpler souls who think that, if God exists, he seems to throw up barriers as mankind tries to advance its knowledge. Isn’t it strange, indeed, that the math describing the ‘reality’ of electrons and photons (i.e. quantum mechanics and quantum electrodynamics), as complicated as it is, becomes even more complicated – and, important to note, also much less accurate – when it’s used to try to describe the behavior of quarks and gluons? Additional ‘variables’ are needed (physicists call these ‘variables’ quantum numbers; however, when everything is said and done, that’s what quantum numbers actually are: variables in a theory), and the agreement between experimental results and predictions in QCD is not as obvious as it is in QED.

Frankly, I don’t know much about quantum chromodynamics – nothing at all to be honest – but when I read statements such as “analytic or perturbative solutions in low-energy QCD are hard or impossible due to the highly nonlinear nature of the strong force” (I just took this one line from the Wikipedia article on QCD), I instinctively feel that QCD is, in fact, a different world as well – and then I mean different from QED, in which analytic or perturbative solutions are the norm. Hence, I already know that, once I’ll have mastered Feynman’s Volume III, it won’t help me all that much to get to the next level of understanding: understanding quantum chromodynamics will be yet another long grind. In short, understanding quantum mechanics is only a first step.

Of course, that should not surprise us, because we’re talking very different orders of magnitude here: femtometers (10⁻¹⁵ m), in the case of electrons, as opposed to attometers (10⁻¹⁸ m) or even zeptometers (10⁻²¹ m) when we’re talking quarks. Hence, if past experience (I mean the evolution of scientific thought) is any guide, we actually should expect an entirely different world. Babushka thinking is not the way forward.

Babushka thinking

What’s babushka thinking? You know what babushkas are, don’t you? These dolls inside dolls. [The term ‘babushka’ is actually Russian for an old woman or grandmother, which is what these dolls usually depict.] Babushka thinking is the fallacy of thinking that worlds inside worlds are the same. It’s what I did as a kid. It’s what many of us still do. It’s thinking that, when everything is said and done, it’s just a matter of not being able to ‘see’ small things and that, if we had the appropriate equipment, we actually would find the same doll within the larger doll – the same but smaller – and then again the same doll within that smaller doll. In Asia, they have this funny expression: “Same-same but different.” Well… That’s what babushka thinking is all about: thinking that you can apply the same concepts, tools and techniques to what is, in fact, an entirely different ballgame.

First_matryoshka_museum_doll_open

Let me illustrate it. We discussed interference. We could assume that the laws of interference, as described by superimposing various waves, always hold, at every scale, and that it’s just the crudeness of our detection apparatus that prevents us from seeing what’s going on. Take two light sources, for example, and let’s say they are a billion wavelengths apart – so that’s anything between 400 and 700 meters for visible light (because the wavelength of visible light is 400 to 700 billionths of a meter). So then we won’t see any interference indeed, because we can’t register it. In fact, no standard equipment can. The interference term oscillates wildly up and down, from positive to negative and back again, if we move the detector just a tiny bit left or right – not more than the thickness of a hair (i.e. 0.07 mm or so). Hence, the range of angles θ (remember that angle θ was the key variable when calculating solutions for the resultant wave in previous posts) that are being covered by our eye – or by any standard sensor really – is so wide that the positive and negative interference averages out: all that we ‘see’ is the sum of the intensities of the two lights. The terms in the interference term cancel each other out. However, we are still essentially correct in assuming there actually is interference: we just cannot see it – but it’s there.
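If you want to see that averaging happen, here’s a toy version of the argument – made-up geometry, but the right orders of magnitude:

```python
import numpy as np

lam = 500e-9   # wavelength (m)
d = 1e9 * lam  # source separation: a billion wavelengths, i.e. 500 m

# Sweep the viewing angle over a mere microradian: the interference
# term cos(delta) still runs through a thousand full cycles.
thetas = np.linspace(0, 1e-6, 100001)
delta = 2 * np.pi * d * np.sin(thetas) / lam
print(np.cos(delta).mean())  # about 0: the cross-term averages out,
                             # leaving just the sum of the intensities
```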

Reinforcing the point, I should also note that, apart from this issue of ‘distance scale’, there is also the scale of time. Our eye has a tenth-of-a-second averaging time. That’s a huge amount of time when talking fundamental physics: remember that an atomic oscillator – despite its incredibly high Q – emits radiation for only about 10⁻⁸ seconds, so that’s a hundred-millionth of a second. Then another atom takes over, and another – and so that’s why we get unpolarized light: it’s all the same frequencies (because the electron oscillators radiate at their resonant frequencies), but there is no fixed phase difference between all of these pulses. The interference between all of these pulses should result in ‘beats’ – as they interfere positively or negatively – but it all cancels out for us, because it’s too fast.

Indeed, while the ‘sensors’ in the retina of the human eye (there are actually four kinds of cells there, but the principal ones are referred to as ‘rod’ and ‘cone’ cells respectively) are, apparently, sensitive enough to register individual photons, the “tenth-of-a-second averaging” time means that the cells – which are interconnected and really ‘pre-process’ the light – will just amalgamate all those individual pulses into one signal of a certain color (frequency) and a certain intensity (energy). As one scientist puts it: “The neural filters only allow a signal to pass to the brain when at least about five to nine photons arrive within less than 100 ms.” Hence, that signal will not keep track of the spacing between those photons.

In short, information gets lost. But that, in itself, does not invalidate babushka thinking. Let me visualize it with a not-very-mathematically-rigorous illustration. Suppose that we have some very regular wave train coming in, like the one below: one wave train consisting of three ‘groups’ separated by ‘nodes’.

Graph

All will depend on the period of the wave as compared to that one-tenth-of-a-second averaging time. In fact, we have two ‘periods’ here: the periodicity of the group – which is related to the concept of group velocity, and with which I’ll associate a ‘group wavelength’ and a ‘group period’ – and the periodicity of the actual oscillation, i.e. the carrier wave. [In case you haven’t heard of these terms before, don’t worry: I hadn’t either. :-)] Now, if one tenth of a second covers two or all three of the groups between the nodes (which means that one tenth of a second is a multiple of the group period Tg), then even the envelope of the wave does not matter much in terms of ‘signal’: our brain will just get one pulse that averages it all out. We will see none of the detail of this wave train. Our eye will just get light in (remember that the intensity of the light is the square of the amplitude, so the negative amplitudes make contributions too), but we cannot distinguish any particular pulse: it’s just one signal. This is the most common situation when we are talking about electromagnetic radiation: many photons arrive, but our eye just sends one signal to the brain: “Hey Boss! Light of color X and intensity Y coming from direction Z.”

In fact, it’s quite remarkable that our eye can distinguish colors in light of the fact that the wavelengths of the various colors (violet, blue, green, yellow, orange and red) differ by 30 to 40 billionths of a meter only! Better still: if the signal lasts long enough, we can distinguish shades whose wavelengths differ by 10 or 15 nm only, so that’s a difference of 1% or 2% only. In case you wonder how it works: Feynman devotes no less than two chapters in his Lectures to the physiology of the eye – not something you’ll find in other physics handbooks! There are apparently three pigments in the cells in our eyes, each sensitive to color in a different way, and it is “the spectral absorption in those three pigments that produces the color sense.” So it’s a bit like the RGB system in a television – but more complicated, of course!

But let’s go back to our wave there and analyze the second possibility. If a tenth of a second covers less than that ‘group period’, then it’s different: we will actually see the individual groups as two or three separate pulses. Hence, in that case, our eye – or whatever detector (another detector will just have another averaging time) – will average over a group, but not over the whole wave train. [Just in case you wonder how we humans compare with other living beings: from what I wrote above, it’s obvious we can see ‘flicker’ only if the oscillation is no faster than 10 or 20 Hz. The eye of a bee is made to see the vibrations of the feet and wings of other bees and, hence, its averaging time is much shorter, like a hundredth of a second, and, hence, it can see flicker up to 200 oscillations per second! In addition, the eye of a bee is sensitive over a much wider range of ‘color’ – it sees UV light down to a wavelength of 300 nm (whereas we don’t see light with a wavelength below 400 nm) – and, to top it all off, it has got a special sensitivity for polarized light, so light that gets reflected or diffracted looks different to the bee.]

Let’s go to the third and final case. If a tenth of a second would cover less than the period of the so-called carrier wave, i.e. the actual oscillation, then we would be able to distinguish the individual peaks and troughs of the carrier wave!
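The three cases are easy to simulate. A sketch with arbitrary numbers – a 1000 Hz carrier modulated by a 10 Hz envelope – where only the size of the averaging window changes:

```python
import numpy as np

t = np.linspace(0, 1, 100000)  # one second, sampled at 100 kHz
signal = np.cos(2 * np.pi * 10 * t) * np.cos(2 * np.pi * 1000 * t)
intensity = signal**2          # detectors respond to intensity

def averaged(window_seconds):
    n = int(window_seconds * 100000)
    return np.convolve(intensity, np.ones(n) / n, mode='valid')

flat = averaged(0.5)       # window >> group period: one featureless signal
groups = averaged(0.01)    # window < group period: the groups show up
detail = averaged(0.0001)  # window < carrier period: individual peaks show up
```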

Of course, this discussion is not limited to our eye as a sensor: any instrument will be able to measure individual phenomena only within a certain range, with an upper and a lower range, i.e. the ‘biggest’ thing it can see, and the ‘smallest’. So that explains the so-called resolution of an optical or an electron microscope: whatever the instrument, it cannot really ‘see’ stuff that’s smaller than the wavelength of the ‘light’ (real light or – in the case of an electron microscope – electron beams) it uses to ‘illuminate’ the object it is looking at. [The actual formula for the resolution of a microscope is obviously a bit more complicated, but this statement does reflect the gist of it.]

However, all that I am writing above, suggests that we can think of what’s going on here as ‘waves within waves’, with the wave between nodes not being any different – in substance that is – as the wave as a whole: we’ve got something that’s oscillating, and within each individual oscillation, we find another oscillation. From a math point of view, babushka thinking is thinking we can analyze the world using Fourier’s machinery to decompose some function (see my posts on Fourier analysis). Indeed, in the example above, we have a modulated carrier wave (it is an example of amplitude modulation – the old-fashioned way of transmitting radio signals), and we see a wave within a wave and, hence, just like the Rutherford model of an atom, you may think there will always be ‘a wave within a wave’.

In this regard, you may think of fractals too: fractals are repeating or self-similar patterns that are always there, at every scale. However, the point to note is that fractals do not represent an accurate picture of how reality is actually structured: worlds within worlds are not the same.

Reality is no onion

Reality is not some kind of onion, from which you peel off a layer and then you find some other layer, similar to the first: “same-same but different”, as they’d say in Asia. The Coast of Britain is, in fact, finite, and the grain of sand you’ll pick up at one of its beaches will not look like the coastline when you put it under a microscope. In case you don’t believe me: I’ve inserted a real-life photo below. The magnification factor is a rather modest 300 times. Isn’t this amazing? [The credit for this nice picture goes to a certain Dr. Gary Greenberg. Please do google his stuff. It’s really nice.]

sand-grains-under-microscope-gary-greenberg-1

In short, fractals are wonderful mathematical structures but – in reality – there are limits to how small things get: we cannot carve a babushka doll out of the cellulose and lignin molecules that make up most of what we call wood. Likewise, the atoms that make up the D-glucose chains in the cellulose will never resemble the D-glucose chains themselves. Hence, the babushka doll, the D-glucose chains that make up wood, and the atoms that make up the molecules within those macro-molecules are three different worlds. They’re not like layers of the same onion. Scale matters. The worlds inside worlds are different, and fundamentally so: not “same-same but different” but just plain different. Electrons are no longer point-like negative charges when we look at them at close range.

In fact, that’s the whole point: we can’t look at them at close range because we can’t ‘locate’ them. They aren’t particles. They are these strange ‘wavicles’ which we described, physically and mathematically, with a complex wave function relating their position (or their momentum) to some probability amplitude, and we also need to remember those funny rules for adding these amplitudes, depending on whether the ‘wavicle’ obeys Fermi or Bose statistics.

Weird, but – come to think of it – not more weird, in terms of mathematical description, than these electromagnetic waves. Indeed, when jotting down all these equations and developing all those mathematical arguments, one often tends to forget that we are not talking about some physical wave here. The field vector E (or B) is a mathematical construct: it tells us what force a charge will feel when we put it here or there. It’s not like a water or sound wave that makes some medium (water or air) actually move. The field is an influence that travels through empty space. But how can something actually travel through empty space? When it’s truly empty, you can’t travel through it, can you?

Oh – you’ll say – but we’ve got these photons, don’t we? Waves are not actually waves: they come in little packets of energy – photons. Yes. You’re right. But, as mentioned above, these photons aren’t little bullets – or particles if you want. They’re as weird as the wave and, in any case, even a billiard ball view of the world is not very satisfying: what happens exactly when two billiard balls collide in a so-called elastic collision? What are the springs on the surface of those balls – in light of the quick reaction, they must be more like little explosive charges that detonate on impact, mustn’t they? – that make the two balls recoil from each other?

So any mathematical description of reality becomes ‘weird’ when you keep asking questions, like that little child I was – and I still am, in a way, I guess. Otherwise I would not be reading physics at the age of 45, would I? 🙂

Conclusion

Let me wrap up here. All of what I’ve been blogging about over the past few months concerns the classical world of physics. It consists of waves and fields on the one hand, and solid particles on the other – electrons and nucleons. But so we know it’s not like that when we have more sensitive apparatuses, like the apparatus used in that 2012 double-slit electron interference experiment at the University of Nebraska–Lincoln, which I described at length in one of my earlier posts. That apparatus allowed control of two slits – both not more than 62 nanometers wide (so that’s the difference between the wavelengths of dark-blue and light-blue light!) – and the monitoring of single-electron detection events. Back in 1963, Feynman already knew what this experiment would yield as a result. He was sure about it, even if he thought such an instrument could never be built. [To be fully correct, he did have some vague idea about a new science of manipulating things at the smallest scales – his famous 1959 talk inspired what we now call ‘nanotechnology’ – but what we can do today surpasses, most probably, all his expectations at the time. Too bad he died too young to see his dreams come true.]

The point to note is that this apparatus does not show us another layer of the same onion: it shows an entirely different world. While it’s part of reality, it’s not ‘our’ reality, nor is it the ‘reality’ of what’s being described by classical electromagnetic field theory. It’s different – and fundamentally so, as evidenced by those weird mathematical concepts one needs to introduce to sort of start to ‘understand’ it.

So… What do I want to say here? Nothing much. I just had to remind myself where I am right now. I myself often still fall prey to babushka thinking. We shouldn’t. We should wonder about the wood these dolls are made of. In physics, the wood seems to be math. The models I’ve presented in this blog are weird: what are those fields? And just how do they exert a force on some charge? What’s the mechanics behind it all? To these questions, classical physics does not really have an answer.

But, of course, quantum mechanics does not have a very satisfactory answer either: what does it mean when we say that the wave function collapses? Out of all of the possibilities in that wonderful indeterminate world ‘inside’ the quantum-mechanical universe, one was ‘chosen’ as something that actually happened: a photon imparts momentum to an electron, for example. We can describe it, mathematically, but – somehow – we still don’t really understand what’s going on.

So what’s going on? We open a doll, and we do not find another doll that is smaller but similar. No. What we find is a completely different toy. However – Surprise ! Surprise ! – it’s something that can be ‘opened’ as well, to reveal even weirder stuff, for which we need even weirder ‘tools’ to somehow understand how it works (like lattice QCD, if you’d want an example: just google it if you want to get an inkling of what that’s about). Where is this going to end? Did it end with the ‘discovery’ of the Higgs particle? I don’t think so.

However, with the ‘discovery’ (or, to be generous, let’s call it an experimental confirmation) of the Higgs particle, we may have hit a wall in terms of verifying our theories. At the center of a set of babushka dolls, you’ll usually have a little baby: a solid little thing that is not like the babushkas surrounding it: it’s young, male and solid, as opposed to the hollow babushkas around it. Well… It seems that, in physics, we’ve got several of these little babies inside: electrons, photons, quarks, gluons, Higgs particles, etcetera. And we don’t know what’s ‘inside’ of them. Just that they’re different. Not “same-same but different”. No. Fundamentally different. So we’ve got a lot of ‘babies’ inside of reality, very different from the ‘layers’ around them, which make up ‘our’ reality. Hence, ‘Reality’ is not a fractal structure. What is it? Well… I’ve started to think we’ll never know. For all of the math and wonderful intellectualism involved, do we really get closer to an ‘understanding’ of what it’s all about?

I am not sure. The more I ‘understand’, the less I ‘know’ it seems. But then that’s probably why many physicists still nurture an acute sense of mystery, and why I am determined to keep reading. 🙂

Post scriptum: On the issue of the ‘mechanistic universe’ and the (related) issue of determinability and indeterminability: that’s not what I wanted to write about above, because I consider that solved. This post is meant to convey some wonder – about the different models of understanding that we need to apply at different scales. It’s got little to do with determinability or not. I think that issue got solved a long time ago, and I’ll let Feynman summarize that discussion:

“The indeterminacy of quantum mechanics has given rise to all kinds of nonsense and questions on the meaning of freedom of will, and of the idea that the world is uncertain. […] Classical physics is also indeterminate. It is true, classically, that if we knew the position and the velocity of every particle in the world, or in a box of gas, we could predict exactly what would happen. And therefore the classical world is deterministic. Suppose, however, we have a finite accuracy and do not know exactly where just one atom is, say to one part in a billion. Then as it goes along it hits another atom, and because we did not know the position better than one part in a billion, we find an even larger error in the position after the collision. And that is amplified, of course, in the next collision, so that if we start with only a tiny error it rapidly magnifies to a very great uncertainty. […] Speaking more precisely, given an arbitrary accuracy, no matter how precise, one can find a time long enough that we cannot make predictions valid for that long a time. That length of time is not very large. It is not that the time is millions of years if the accuracy is one part in a billion. The time goes only logarithmically with the error. In only a very, very tiny time – less than the time it took to state the accuracy – we lose all our information. It is therefore not fair to say that from the apparent freedom and indeterminacy of the human mind, we should have realized that classical ‘deterministic’ physics could not ever hope to understand, and to welcome quantum mechanics as a release from a completely ‘mechanistic’ universe. For already in classical mechanics, there was indeterminability from a practical point of view.” (Feynman, Lectures, 1963, p. 38-10)

That really says it all, I think. I’ll just continue to keep my head down – i.e. stay away from philosophy as for now – and try to find a way to open the toy inside the toy. 🙂

Light: relating waves to photons

Pre-scriptum (dated 26 June 2020): Some of the relevant illustrations in this post were removed as a result of an attack by the dark force. In any case, my ideas on the nature of light and photons have evolved considerably, so you should probably read my papers instead of these old blog posts.

Original post:

This is a concluding note on my ‘series’ on light. The ‘series’ gave you an overview of the ‘classical’ theory: light as an electromagnetic wave. It was very complete, including relativistic effects (see my previous post). I could have added more – there’s an equivalent of four-vectors, for example, when we’re dealing with frequencies and wave numbers: quantities that transform like space and time under the Lorentz transformations – but you got the essence.

One point we never touched upon, though, was that magnetic field vector. It is there. It is tiny because of that 1/c factor, but it’s there. We wrote it as

B = –er′×E/c

All symbols in bold are vectors, of course. The force is another vector cross-product: F = qv×B, and you need to apply the usual right-hand screw rule to find the direction of the force. As it turns out, that force – as tiny as it is – is actually oriented in the direction of propagation, and it is what is responsible for the so-called radiation pressure.

So, yes, there is a ‘pushing momentum’. How strong is it? What power can it deliver? Can it indeed make space ships sail? Well… The magnitude of the unit vector er’ is obviously one, so it’s the values of the other vectors that we need to consider. If we substitute and average F, the thing we need to find is:

〈F〉 = q〈vE〉/c

But the charge q times the field is the electric force, and the force on the charge times the velocity is the work dW/dt being done on the charge. So that should equal the energy that is being absorbed from the light per second. Now, I didn’t look at that much. It’s actually one of the very few things I left out – but I’ll refer you to Feynman’s Lectures if you want to find out more: there’s a fine section on light scattering, introducing the notion of the Thomson scattering cross section, but – as said – I think you had enough as for now. Just note that 〈F〉 = [dW/dt]/c and, hence, that the momentum that light delivers is equal to the energy that is absorbed (dW/dt) divided by c.
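To get a feel for how tiny that radiation pressure is, here’s a quick back-of-the-envelope sketch in Python. The only physics in it is the 〈F〉 = [dW/dt]/c relation we just found (with a factor of two for a perfectly reflecting surface, which reverses the momentum instead of just absorbing it); the numbers – the solar intensity near Earth and a 100 m × 100 m sail – are illustrative assumptions of mine, not anything from the argument above.

```python
# Radiation pressure from the <F> = (dW/dt)/c relation.
c = 299_792_458            # speed of light (m/s)
intensity = 1361.0         # solar intensity near Earth (W/m^2) -- assumed value
sail_area = 100.0 * 100.0  # a hypothetical 100 m x 100 m solar sail (m^2)

pressure_absorbing = intensity / c       # perfectly absorbing surface (N/m^2)
pressure_reflecting = 2 * intensity / c  # perfect mirror: momentum is reversed

force = pressure_reflecting * sail_area  # total force on the sail (N)
print(f"pressure (mirror): {pressure_reflecting:.2e} N/m^2")
print(f"force on the sail: {force:.3f} N")  # about 0.09 N -- tiny, but it never stops pushing
```

So, yes, light can make space ships sail – but very, very gently.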

So the momentum carried is 1/c times the energy. Now, you may remember that Planck solved the ‘problem’ of black-body radiation – an anomaly that physicists couldn’t explain at the end of the 19th century – by re-introducing a corpuscular element into the theory of light: light energy comes in discrete packets – photons, as we call them now. But we all know that photons are not the kind of ‘particles’ that the Greek and medieval corpuscular theories of light envisaged. They have a particle-like character – just as much as they have a wave-like character. They are actually neither, and they are physically and mathematically being described by these wave functions – which, in turn, are functions describing probability amplitudes. But I won’t entertain you with that here, because I’ve written about that in other posts. Let’s just go along with the ‘corpuscular’ theory of photons for a while.

Photons also have energy (which we’ll write as W instead of E, just to be consistent with the symbols above) and momentum (p), and Planck’s Law says how much:

W = hf and p = W/c

So that’s good: we find the same multiplier 1/c here for the momentum of a photon. In fact, this is more than just a coincidence of course: the “wave theory” of light and Planck’s “corpuscular theory” must of course link up, because they are both supposed to help us understand real-life phenomena.
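Just to put a number on Planck’s law: a minimal sketch evaluating W = hf and p = W/c for a single photon. The wavelength (a green 532 nm photon) is just my example choice.

```python
# Energy and momentum of one photon, via W = h*f and p = W/c.
h = 6.626e-34        # Planck's constant (J*s)
c = 299_792_458      # speed of light (m/s)

wavelength = 532e-9  # an example green photon (m) -- illustrative choice
f = c / wavelength   # frequency (Hz)

W = h * f            # photon energy (J), about 3.7e-19 J
p = W / c            # photon momentum (kg*m/s), about 1.2e-27
print(f"f = {f:.3e} Hz, W = {W:.3e} J, p = {p:.3e} kg*m/s")
```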

There are even more nice surprises. We spoke about polarized light, and we showed how the end of the electric field vector describes a circular or elliptical motion as the wave travels through space. It turns out that we can actually relate that to some kind of angular momentum of the wave (I won’t go into the details though – because I think the previous posts have really been too heavy on equations and complicated mathematical arguments) and that we could also relate it to a model of photons carrying angular momentum, “like spinning rifle bullets” – as Feynman puts it.

However, he also adds: “But this ‘bullet’ picture is as incomplete as the ‘wave’ picture.” And so that’s true and that should be it. And it will be it. I will really end this ‘series’ now. It was quite a journey for me, as I am making my way through all of these complicated models and explanations of how things are supposed to work. But a fascinating one. And it sure gives me a much better feel for the ‘concepts’ that are hastily explained in all of these ‘popular’ books dealing with science and physics, hopefully preparing me better for what I should be doing, and that’s to read Penrose’s advanced mathematical theories.


Reflecting on complex numbers (again)

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – did not suffer much from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This will surely not be my most readable post – if only because it’s soooooo long and – at times – quite ‘philosophical’. Indeed, it’s not very rigorous or formal, unlike those posts on complex analysis I wrote last year. At the same time, I think this post digs ‘deeper’, in a sense. Indeed, I really wanted to get to the heart of the ‘magic’ behind complex numbers. I’ll let you judge if I achieved that goal.

Complex numbers: why are they useful?

The previous post demonstrated the power of complex numbers (i.e. what they are used for), but it didn’t say much about what they really are. Indeed, we had a simple differential equation–an expression modeling an oscillator (read: a spring with a mass on it), with two terms only: d²x/dt² = –ω²x–but so we could not solve it because of the minus sign in front of the term with the x.

Indeed, the so-called characteristic equation for this differential equation is r² = –ω² and so we’re in trouble here because there is no real-valued r that solves this. However, allowing complex-valued roots (r = ±iω) to solve the characteristic equation does the trick. Let’s analyze what we did (and don’t worry if you don’t ‘get’ this: it’s not essential to understand what follows):

  • Using those complex roots, we wrote the general solution for the differential equation as Ae^(iωt) + Be^(–iωt). Now, note that everything is complex in this general solution, not only the e^(iωt) and e^(–iωt) ‘components’ but also the (arbitrary) coefficients A and B.
  • However, because we wanted to find a real-valued function in the end (remember: x is a vertical displacement from an equilibrium position x = 0, so that’s ‘real’ indeed), we imposed the condition that Ae^(iωt) and Be^(–iωt) had to be each other’s complex conjugate. Hence, B must be equal to A* and our ‘general’ (real-valued) solution was Ae^(iωt) + A*e^(–iωt). So we only have one complex (but equally arbitrary) coefficient now – A – and we get the other one (A*) for free, so to say.
  • Writing A in polar notation, i.e. substituting A = A0e^(iΔ), which implies that A* = A0e^(–iΔ), yields A0e^(iΔ)e^(iωt) + A0e^(–iΔ)e^(–iωt) = A0[e^(i(ωt + Δ)) + e^(–i(ωt + Δ))].
  • Expanding this, using Euler’s formula (and the fact that cos(–α) = cosα but sin(–α) = –sinα) then gives us, finally, the following (real-valued) functional form for x:

A0[cos(ωt + Δ) + isin(ωt + Δ) + cos(ωt + Δ) – isin(ωt + Δ)]

= 2A0cos(ωt + Δ) = x0cos(ωt + Δ), with x0 = 2A0

That’s easy enough to follow, I guess (everything is relative of course), but do we really understand what we’re doing here? Let me rephrase what’s going on here:

  • In the initial problem, our dependent variable x(t) was the vertical displacement, so that was a real-valued function of a real-valued (independent) variable (time).
  • Now, we kept the independent variable t real – time is always real, never imaginary 🙂 – but so we made x = x(t) a complex (dependent) variable by equating x(t) with the complex-valued exponential e^(rt). So we’re doing a substitution here really.
  • Now, if e^(rt) is complex-valued, it means, of course, that r is complex and so that allows us to equate r with the square root of a negative number (r = ±iω).
  • We then plug these imaginary roots back in and get a general complex-valued solution (as expected).
  • However, we then impose the condition that the imaginary part of our solution should be zero.

In other words, we had a family of complex-valued functions as a general solution for the differential equation, but we limited the solution set to a somewhat less general solution including real-valued functions only.
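A quick numerical check never hurts here. The sketch below – with arbitrary example values for ω, A0 and Δ – verifies that Ae^(iωt) + A*e^(–iωt) is indeed real-valued, and equal to 2A0cos(ωt + Δ) at every point in time.

```python
import cmath, math

# Arbitrary example values -- any choice should work.
omega, A0, delta = 2.0, 1.5, 0.7
A = A0 * cmath.exp(1j * delta)   # A in polar notation: A = A0*e^(i*delta)

for t in [0.0, 0.3, 1.1, 2.5]:
    z = A * cmath.exp(1j * omega * t) + A.conjugate() * cmath.exp(-1j * omega * t)
    x = 2 * A0 * math.cos(omega * t + delta)
    # The imaginary parts cancel exactly; what's left is the real cosine.
    assert abs(z.imag) < 1e-12 and abs(z.real - x) < 1e-12
    print(f"t = {t:.1f}: x = {x:+.6f}")
```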

OK. We all get this. But it doesn’t mean we ‘understand’ complex numbers. Let’s try to take the magic out of those complex numbers.

Complex numbers: what are they?

I’ve devoted two or three posts to this already (October-November 2013) but let’s go back to basics. Let’s start with that imaginary unit i. The essence of i – and, yes, I am using the term ‘essence’ in a very ‘philosophical’ sense here I guess: i‘s intrinsic nature, so to speak – is that its square is equal to minus one: i² = –1.

That’s it really. We don’t need more. Of course, we can associate i with lots of other things if we would want to (and we will, of course!), such as Euler’s formula for example, but these associations are not essential – or not as essential as this definition I should say. Indeed, while that ‘rule’ or ‘definition’ is totally weird and – at first sight – totally random, it’s the only one we need: all other arithmetic rules do not change and, in fact, it’s just that one extra rule that allows us to deal with any algebraic equation – so that’s literally every equation involving addition, multiplication and exponentiation (so that’s every polynomial basically). However, stating that i² = –1 still doesn’t answer the question: what is a complex number really?

In order to not get too confused, I’ve started to think we should just take complex numbers at face value: it’s the sum of (i) some real number and (ii) a so-called imaginary part, which consists of another real number multiplied with i. [So the only ‘imaginary’ bit is, once again, i: all the rest is real! ] Now, when I say the ‘sum’, then that’s not some kind of ‘new’ sum. Well… Let me qualify that. It’s not some kind of ‘new’ sum because we’re just adding two things the way we’re used to: two and two apples are four apples, and one orange plus two more is three. However, it is true that we’re adding two separate beasts now, so to say, and so we do keep the things with an i in them separate from the real bits. In short, we do keep the apples and the oranges separate.

Now, I would like to be able to say that multiplication of complex numbers is just as straightforward as adding them, but that’s not true. When we multiply complex numbers, that i² = –1 rule kicks in and produces some ‘effects’ that are logical but not all that ‘straightforward’ I’d say.

Let’s take a simple example–but a significant one (if only because we’ll use the result later): let’s multiply a complex number with itself, i.e. let’s take the square of a complex number. We get (a + bi)² = (a + bi)(a + bi) = a·a + a·(bi) + (bi)·a + (bi)·(bi) = a² + 2abi + b²i² = a² + 2abi – b². That’s very different from the square of a real sum a + b: (a + b)² = a² + 2ab + b². How? Just look at it: we’ve got a real bit (a² – b²) and then an imaginary bit (2abi). So what?
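[As an aside: Python has complex numbers built in, so you can check this rule in a few lines. The values a = 3 and b = 2 are arbitrary examples.]

```python
# Squaring a complex number: (a + bi)^2 = (a^2 - b^2) + 2ab*i
a, b = 3.0, 2.0
z = complex(a, b)          # z = a + bi = 3 + 2i
print(z * z)               # -> (5+12j)
print(a*a - b*b, 2*a*b)    # -> 5.0 12.0: the real and imaginary parts match
```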

Well… The thumbnail graph below illustrates the difference for a = b: it maps x to (a) 4x² [i.e. (x + x)²] and (b) 2x² [i.e. the coefficient of i in (x + ix)² = 2x²i] respectively. Indeed, when we’re squaring real numbers, we get (a + b)² = 4a²–i.e. a ‘real bit’ only, of course!–but when we’re squaring complex numbers, we need to keep track of two components: the real part and the imaginary part. However, the real part (a² – b²) is zero in this case (a = b), and so it’s only the imaginary part 2abi = 2a²i that counts!

[Thumbnail graph: y = 4x² versus y = 2x²]

That’s kids’ stuff, you’ll say… In fact, when you’re a mathematician, you’ll say it’s a nonsensical graph. Why? Because it compares an apple and an orange really: we want to show 2x²i really, not 2x².

That’s true. However, that’s why the graph is actually useful. The red graph introduces a new idea, and by a ‘new’ idea I mean something that’s not inherent in the i² = –1 identity: it associates i with the vertical axis in the two-dimensional plane.

Hmm… This is an idea that is ‘nice’ – very nice actually – but, once again, I should note that it’s not part of i‘s essence. Indeed, the Italian mathematicians who first ‘invented’ complex numbers in the 16th century (Tartaglia (‘the Stammerer’) and da Vinci’s friend Cardano) introduced roots of –1 because they needed them to solve algebraic equations. That’s it. Full stop. It was only much later (a few hundred years later, in fact, with Wessel, Argand and Gauss) that imaginary numbers (like 2x²i) became associated with the vertical coordinate axis. To my readers who have managed not to fall asleep while reading this: please continue till the end, and you will understand why I am saying the idea of a geometrical interpretation is ‘not essential’.

To the same readers, I’ll also say the following, however: if we do associate complex numbers with a second dimension, then we can associate the algebraic operations with things we can visualize in space. Most of you–all of you I should say–know that already, obviously, but let’s just have a look at that to make sure we’re on the same page.

A very basic thing in physical mathematics is reversing the direction of something. Things go in one direction, but we should be able to visualize them going in the opposite direction. We may associate this with a variable going from 0 to infinity (+∞): it may be time (t), or a time-dependent variable x, y or z. Of course, we know what we have here: we think of the positive real axis. So, what we do when we multiply with –1 is reversing its direction, and so then we’re talking the negative real axis: a variable going from 0 to minus infinity (–∞). Therefore, we can associate multiplication by –1 with a rotation around the center (i.e. around the zero point) by 180 degrees (i.e. by π, in radians).

[Illustration: multiplication by –1 and by i as rotations in the complex plane]

You may think that’s a weird way of looking at multiplication by minus one. Well… Yes and no. But think of it: the concept of negative numbers is actually as ‘weird’ as the concept of the imaginary unit in a way. I mean… Think about it: we’re used to use negative numbers because we learned about them when we were very small kids but what are they really? What does it mean to have minus three apples? You know the answer of course: it probably means that you owe someone three apples but that you don’t have any right now. 🙂 […] But that’s not the point here. I hope you see what I mean: negative numbers are weird too, in a sense. Indeed, we should be aware of the fact that we often look at concepts as being ‘weird’ because we weren’t exposed to them early enough: the great mathematician Leonhard Euler thought complex numbers were so ‘essential’ to math and, hence, so ‘natural’ that he thought kids should learn complex numbers as soon as they started learning ‘real’ numbers. In fact, he probably thought we should only be using complex numbers because… Well… They make the arithmetic space complete, so to say. […] But then I guess that’s because Euler understood complex numbers in a way we don’t, which is why I am writing about them here. 🙂

OK. Back to the main story line. In order to understand complex numbers somewhat better, it is actually useful – but, again, not necessarily essential – to think of i as half of that rotation, i.e. a rotation by 90 degrees only, clockwise or counterclockwise, as illustrated above: multiplication with i means a counterclockwise rotation by 90 degrees (or π/2 radians) and multiplication with –i means a clockwise rotation by the same amount. Again, the minus sign gives the direction here: clockwise or counterclockwise. It works indeed: i·i = (–i)·(–i) = –1.
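Again, this is trivial to verify with Python’s complex type. A tiny sketch: multiplying by i shifts the angle of a point by +90 degrees, multiplying by –i shifts it by –90 degrees, and doing either twice amounts to multiplying by –1.

```python
import cmath, math

z = complex(3, 1)             # an arbitrary point in the plane
for w, label in [(1j, "i"), (-1j, "-i")]:
    rotated = z * w
    # The argument (angle) shifts by a quarter turn; the magnitude is unchanged.
    shift = math.degrees(cmath.phase(rotated) - cmath.phase(z))
    print(f"multiplying by {label}: angle shift = {shift:+.0f} degrees")

print(1j * 1j, (-1j) * (-1j))  # both print (-1+0j): two quarter turns = one half turn
```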

OK. Let’s wrap this up: we might say that

  • a positive real number is associated with some (absolute) quantity (i.e. a magnitude);
  • a minus sign says: “Go the opposite way! Go back! Subtract!”– so it’s associated with the opposite direction or the opposite of something in general; and, finally,
  • the imaginary unit adds a second dimension: instead of moving on a line only, we can now walk around on a plane.

Once we understand that, it’s easy to understand why, in most applications of complex numbers, you’ll see the polar notation for complex numbers. Indeed, instead of writing a complex number z as z = a + ib, we’ll usually see it written as:

z = re^(iθ) with e^(iθ) = cosθ + isinθ

Huh? Well… Yes. Let me throw it in here straight away. You know this formula: it’s Euler’s formula. The so-called ‘magical’ formula! Indeed, Feynman calls it ‘our jewel’: the ‘most remarkable formula in mathematics’ as he puts it. Waw ! If he says so, it must be right. 🙂 So let’s try to understand it.

Is it magical really? Well… I guess the answer is ‘Yes’ and ‘No’ at the same time:

  • No. There is no ‘magic’ here. Associating the real part a and the imaginary part b with a magnitude r and an angle θ (a = rcosθ and b = rsinθ) is actually just an application of the Pythagorean theorem, so that’s ‘magic’ you learnt when you were very little and, hence, it does not look like magic anymore. [Although you should try to appreciate its ‘magic’ once again, I feel. Remember that you heard about the Pythagorean theorem because your teacher wanted to tell you what the square root of 2 actually is: a so-called irrational number that we get by taking the ‘one-half power’ of 2, i.e. 2^(1/2) = 2^0.5, or, what amounts to the same, the square root of 2. Of course, you and I are both used to irrational numbers now, like 2^(1/2), but they are also ‘weird’. As weird as i. In fact, it is said that the Greek mathematician who claimed their existence was exiled, because these irrational numbers did not fit into the (early) Pythagorean school of thought. Indeed, that school of thought wanted to reduce geometry to whole numbers and their ratios only. So there was no place for irrational numbers there!]
  • Yes. It is ‘magical’. Associating e^(iθ) – so that’s a complex exponential function really! – with the unit circle is something you learnt much later in life only, if ever. It’s a strange thing indeed: we have a real (but, I admit, irrational) number here – e is 2.718 followed by an infinite number of decimals as you know, just like π – and then we raise it to the power iθ, so that’s i once again multiplied by a real number θ (i.e. the so-called phase or – to put it simply – the angle). By now, we know what it means to multiply something with i, and–of course–we also know what exponentiation is (it’s just a shorthand for repeated multiplication), but we haven’t defined complex exponentials yet.

In fact… That’s what we’re going to do here. But in a rather ‘weird’ way as you will see: we won’t define them really but we’ll calculate them. For the moment, however, we’ll leave it at this and just note that, through Euler’s relation, we can see how a fraction or a multiple of a power of i, e.g. i^0.1 or i^2.3, corresponds to a fraction or a multiple of the angle associated with i, i.e. 0.1 times π/2 or 2.3 times π/2 (indeed, i^s = e^(i·s·π/2)). In other words, Euler’s formula shows how the second (spatial) dimension is associated with the concept of the angle.

[…] And then the third (spatial) dimension is, of course, easy to add: it’s just an angle in another direction. What direction? Well… An angle away from the plane that we just formed by introducing that first angle. 🙂 […] So, from our zero point (here and now), we use a ruler to draw lines, and then a compass to measure angles away from that line, and then we create a plane, and then we can just add dimensions as we please by adding more ‘angles’ away from what we already have (a line, or a plane, and any higher-dimensional thing really).

Dimensions

I feel I need to digress briefly here, just to make sure we’re on the same page. Dimensions. What is a dimension in physics or in math? What do we mean if we say that spacetime is a four-dimensional continuum? From what we wrote above, the concept of a spatial dimension should be obvious: we have three dimensions in space (the x, y and z direction), and so we need three numbers indeed to describe the position of an object, from our point of view that is (i.e. in our reference frame).

But so we also have a fourth number: time. By now, you also know that, just like position and/or motion in space, time is relative too: that is relative to some frame of reference indeed. So, yes, we need four numbers, i.e. four dimensions, to describe an event in spacetime. That being said, time is obviously still something different (I mean different than space), despite the fact that Einstein’s relativity theory relates it to space: indeed, we showed in our post on (special) relativity that there’s no such thing as absolute time. However, that actually reinforces the point: a point in time is something fundamentally different than a point in space. Despite the fact that

  1. Time is just like a space dimension in the physical-mathematical meaning of the term ‘dimension’ (a dimension of a space or an object is one of the coordinates that is needed to specify a point within that space, or to ‘locate’ the object – both in time and space that is); and that,
  2. We can express distance and time in the same units because the speed of light is absolute (so that allows us to express time in meters, despite the fact that time is relative or “local”, as Hendrik Lorentz called it); and that, finally,
  3. If we do that (i.e. if we express time and distance in equivalent units), the equations for space and time in the Lorentz transformation equations mirror each other nicely – ‘mixing’ the space and time variables in the same way, so to say – and, therefore, space and time do form a ‘kind of union’, as Minkowski famously said;

Despite all that, time and space are fundamentally different things. Perhaps not for God – because He (or She, or It?) is said to be Everywhere Always – but surely for us, humans. For us, humans, always busy constructing that mental space with our ruler and our compass, time is and remains the one and only truly independent variable. Indeed, for us, mortal beings, the clocks just tick (locally indeed – that’s why I am using a plural: clocks – but that doesn’t change the fact they’re ticking, and in one direction only).

And so things happen and equations such as the one we started with – i.e. the differential equation modeling the behavior of an oscillator – show us how they happen. In one of my previous posts, I also showed why the laws of physics do not allow us to reverse time, but I won’t talk about that here. Let’s get back to complex numbers. Indeed, I am only talking about dimensions here because, despite all I wrote above about the imaginary axis in the complex plane, the thing to note here is that we did not use complex numbers in the physical-mathematical problem above to bring in an extra spatial dimension.

We just did it because we could not solve the equation with one-dimensional numbers only: we needed to take the square root of a negative number and we couldn’t. That was it basically. So there was no intention of bringing in a y- or z-dimension, and we didn’t. If we would have wanted to do that, we would have had to insert another dependent variable in the differential equation, and so it would have become a so-called partial differential equation in two or three dependent variables (x, y and z), with time – once again – as the independent variable (t). [Differential equations in one variable only (real- or complex-valued), like the ones we’re used to now, are referred to as ordinary differential equations, as opposed to… no, not extraordinary, but partial differential equations.]

In fact, if we would have generalized to two- or three-dimensional space, we would have run into the same type of problem (roots of negative numbers) when trying to solve the partial differential equation and so we would have needed complex-valued variables to solve it analytically in this case too. So we would have three ‘dimensions’ but each ‘dimension’ would be associated with complex (i.e. ‘two-dimensional’) numbers. Is this getting complicated? I guess so.

The point is that, when studying physics or math, we will have to get used to the fact that these ‘two-dimensional numbers’ which we introduced, i.e. complex numbers, are actually more ‘natural’ ‘numbers’ to work with from a purely analytic point of view (as for the meaning of ‘analytic’, just read it as ‘logical problem-solving’), especially when we write them in their polar form, i.e. as complex exponentials. We can then take advantage of the wonderful property that they already are a functional form (z = re^(iθ)), so to speak, and that their first, second, etcetera derivatives are easy to calculate, because that ‘functional form’ is an exponential, and exponentials come back to themselves when taking the derivative (with the coefficient in the exponent in front). That makes the differential equation a simple algebraic equation (i.e. without derivatives involved), which is easy to solve.

In short, we should just look at complex numbers here (i.e. in the context of my three previous posts, or in the context of differential equations in general) as a computational device, not as an attempt to add an extra spatial dimension to the analysis.

Now, that’s probably the reason why Feynman inserts a chapter on ‘algebra’ that, at first, does not seem to make much sense. As usual, however, I worked through it and then found it to be both instructive as well as intriguing because it makes the point that complex exponentials are, first and foremost, an algebraic thing, not a geometrical thing.

I’ll try to present his argument here but don’t worry if you can’t or don’t want to follow it all the way through because… Well… It’s a bit ‘weird’ indeed, and I must admit I haven’t quite come to terms with it myself. On the other hand, if you’re ready for some thinking ‘outside of the box’, I assure you that I haven’t found anything like this in a math textbook or on the Web. This proves the fact that Feynman was a bit of a maverick… Well… In any case, I’ll let you judge. Now that you’re here, I would really encourage you to read the whole thing, as loooooooong as it is.

Complex exponentials from an algebraic point of view: introduction

Exponentiation is nothing but repeated multiplication. That’s easy to understand when the exponents are integers: a to the power n (a^n) is a×a×a×a×… etcetera – repeated n times, so we have n factors (all equal to a) in the product. That’s very straightforward.

Now, to understand rational exponents (so that’s an m/n exponent, with m and n integers), we just need to understand one thing more, and that is the inverse operation of exponentiation, i.e. the nth root. We then get a^(m/n) = (a^m)^(1/n). So, that’s easy too. […] Well… No. Not that easy. In fact, our problems start right here:

  • If n is even, and a is a positive real number, we have two (real) nth roots: ±a^(1/n).
  • However, if a is negative (and n is still even obviously), then we have a problem. There’s no real nth root of a in that case. That’s why Cardano invented i: we’ll associate an even root of a negative real number with two complex-valued roots.
  • What if n is odd? Then we have only one real root: it’s positive when a is positive, and negative when a is negative. Done.

But let’s not complicate matters from the start. The point here is to do some algebra that should help us to understand complex exponentials. However, I will make one small digression, and that’s on logarithmic functions. It’s not essential but, again, useful. […] Well… Maybe. 🙂 I hope so. 🙂

We know that exponentials are actually associated with two inverse operations:

  1. Given some value y and some number n, we can take the nth root of y (y^(1/n)) to find the original base x for which y = x^n.
  2. Given some value y and some number a, we can take the logarithm (to base a) of y to find the original exponent x for which y = a^x.

In the first case, the problem is: given n, find x for which y = x^n. In the second case, the problem is: given a, find x for which y = a^x. Is that complicated? Probably. In order to further confuse you, I’ve inserted a thumbnail graph with y = 2^x (so that’s the exponential function with base 2) and y = log₂x (so that’s the logarithmic function with base 2). You can see these two functions mirror each other, with the x = y line as the mirror axis.

[Thumbnail graph: y = 2^x and y = log₂x, mirror images in the line y = x]
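In code, the two inverse operations are just two different one-liners. A minimal sketch with base 2 and exponent 3:

```python
import math

y = 2 ** 3                 # exponentiation: base 2, exponent 3 -> 8
base = y ** (1 / 3)        # the nth root recovers the base -> 2.0 (up to rounding)
exponent = math.log(y, 2)  # the logarithm recovers the exponent -> 3.0 (up to rounding)
print(y, base, exponent)
```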

We usually find logarithms more ‘difficult’ than roots (I do, for sure), but that’s just because we usually learn about them much later in life–like in a senior high school class, for example, as opposed to a junior high school class (I am just guessing, but you know what I mean).

In addition, we have these extra symbols ‘log‘–L-O-G :-)–to express the function. Indeed, we use just two symbols to write the y = 2^x function: 2 and x – and then the meaning is clear from where we write these: we write 2 in normal script and x as a superscript and so we know that’s exponentiation. But so we’re not so economical for the logarithmic function. Not at all. In fact, we use three symbols for the logarithmic function: (1) ‘log’ (which is quite verbose as a symbol in itself, because it consists of three letters), (2) 2 and (3) x. That’s not economical at all! Indeed, why don’t we just write y = ₂x or something? So that’s a subscript in front, instead of a superscript behind. It would work. It’s just a matter of getting used to it, i.e. it’s just a convention in other words.

Of course, I am joking a bit here but you get my point: in essence, the logarithmic function should not come across as being more ‘difficult’ or less ‘natural’ than the exponential function: exponentiation involves two numbers – a base and an exponent – and, hence, it’s logical that we have two inverse operations, rather than one. [You’ll say that a sum or a product involves (at least) two terms or two factors as well, so why don’t they have two inverse operations? Well… Addition and multiplication are commutative operations: a+b = b+a, and a·b = b·a. Exponentiation isn’t: a^n ≠ n^a. That’s why. Check it: 2×3 = 3×2, but 2³ = 8 ≠ 3² = 9.]

Now, apart from us ‘liking’ exponential functions more than logarithmic functions because of the non-relevant fact that we learned about log functions only much later in our life, we will usually also have a strong preference for one or the other base for an exponential. The most preferred base is, obviously, ten (10). We use that base in so-called scientific notations for numbers. For example: the elementary charge (i.e. the charge of an electron) is approximately –1.6×10^(–19) coulombs. […] Oh… We have a minus sign in the exponent here (–19). So what’s that? Sorry. I forgot to mention that. But it’s easy: a^(–n) = (a^n)^(–1) = 1/a^n.

Our most preferred base is 10 because we have a decimal system, and we have a decimal system because we have ten fingers. Indeed, the Maya used a base-20 system because they used their toes to count as well (so they counted in twenties instead of tens), and it also seems that some tribes had octal (base-8) systems because they used the spaces between their fingers, rather than the fingers themselves. And, of course, we all know that computers use a base-2 system because… Well… Because they’re computers. In any case, 10 is called the common base, because… Well… Because it’s common.

However, by now you know that, in physics and mathematics, we prefer that strange number e as a base. However, remember it’s not that strange: it’s just a number like π. Why do we call it ‘natural’? Because of that nice property: the derivative of the exponential function e^x comes back to itself: d(e^x)/dx = e^x. That’s not the case for 10^x. In fact, taking the derivative of 10^x is pretty easy too: we just need to put a coefficient in front. To be specific, we need to put the logarithm (to base e) of the base of our exponential function (i.e. 10) in front: d(10^x)/dx = 10^x·ln(10). [Ln(10) is yet another notation that has been introduced, it seems, to confuse young kids and ensure they hate logarithms: ln(10) is just logₑ(10) or, if I would have had my way in terms of conventions (which would ensure an ‘economic’ use of symbols), we could also write ln(10) = ₑ10. :-)]

Stop! I am going way too fast here. We first need to define what irrational powers are! Indeed, from all that I’ve written so far, you can imagine what a^(m/n) is (a^(m/n) = (a^m)^(1/n)), but what if the exponent is not such a ratio of integers? What if it equals the square root of 2, for example? In other words, what is 10^x or e^x or 2^x or whatever for irrational exponents?

We all sort of ‘know’ what irrationals are: it involves limits, infinitesimals, fractions of fractions, Dedekind cuts. Whatever. Even if you don’t understand a word of what I am writing here, you do – intuitively: irrationals can be approximated by fractions of fractions. The grand idea is that we divide some number by 2, and then we divide by 2 once again (so we divide by 4), and then once again (so we take 1/8), and again (1/16), and so on and so on. That’s the spirit of Dedekind’s cuts (the actual construction is a bit different, but the idea – closing in on a number with ever finer rational approximations – is the same). Of course, dividing by two is a pretty random way of cutting things up. Why don’t we divide by three, or by four, for example? Well… It’s the same as with those other ‘natural’ numbers: we have to start somewhere and so this ‘binary’ way of cutting things up is probably the most ‘natural’. 🙂 [Have you noticed how many ‘natural’ numbers we’ve mentioned already: 10, e, π, 2… And one (1) itself of course. :-)]

So we’ll use something like Dedekind cuts for irrational powers as well. We’ll define them as a sort of limit (in fact, that’s exactly what they are) and so we have to find some approximation (or convergence) process that allows us to do so.

We’ll start with base 10 here because, as mentioned above, base 10 comes across as more ‘natural’ (or ‘common’) to us non-mathematicians than the so-called ‘natural’ base e. However, I should note that the base doesn’t matter much because it’s quite easy to switch from one base to another. Indeed, we can always write a^s = (b^k)^s = b^(k·s) = b^t with a = b^k and t = k·s (as for k: k is obviously the logarithm of a to base b). From this simple formula, you can see that changing base amounts to changing the horizontal scale: we replace s by t = k·s. That’s it. So don’t worry about our choice of base. 🙂
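That base-change rule is easy to check numerically – here with a = 10 and b = e (so k = ln(10)), which happens to be exactly the switch we’ll make at the very end of this post:

```python
import math

k = math.log(10)        # k = ln(10), about 2.302585 -- keep this number in mind!
for s in [0.1, 0.5, 2.0]:
    # 10^s and e^(k*s) agree to within floating-point rounding.
    print(10 ** s, math.e ** (k * s))
```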

Complex exponentials from an algebraic point of view: well… Not the introduction 🙂

Ouf! So much stuff! But so here we go. We take base 10 and see what such an approximation of an irrational power of 10 (10^x) looks like. Of course, we can write any irrational number x as some (positive or negative) integer plus an endless series of decimals after the zero (e.g. e = 2 + 0.7182818284590452… etc). So let’s just focus on numbers between 0 and 1 as for now (so we’ll take the integer out of the total, so to speak). In fact, before we start, I’ll cheat and show you the result, just to make sure you can follow the argument a bit.

[Graph: the smooth curve of 10^x]

Yes. That’s what 10^x looks like, but so we don’t know that yet because we don’t know what irrational powers are, and so we can’t make a graph like that–yet. We only know very general things right now, such as:

  • 10^0 = 1 and 10^1 = 10 etcetera.
  • Most importantly, we know that 10^(m/n) = (10^m)^(1/n) = (10^(1/n))^m for integer m and n.

In fact, we’ll use the second fact to calculate 10^x for x = 1/2, 1/4, 1/8, 1/16, and so on and so on. We’ll go all the way down to where x becomes a fraction very close to zero: that’s the table below. Note that the x values in the table are rational fractions 1/2, 1/4, 1/8 etcetera indeed, so x is not an irrational exponent: x is a real number but rational, so x can be expressed either as a fraction of two integers m and n (m = 1 and n = 2, 4, 8, 16, 32 and so on here), or as a decimal number with a finite number of decimals behind the decimal point (0.5, 0.25, 0.125, 0.0625 etcetera).

[Table: 10^x for x = 1/2, 1/4, 1/8, …, 1/1024 (successive square roots of 10), together with the ratio (10^x – 1)/x]

The third column gives the value 10^x for these fractions x = 1/2, 1/4, 1/8 etcetera. How do we get these? Hmm… It’s true. I am jumping over another hurdle here. The key assumption behind the table is that we know how to take the square root of a number, so that we can calculate 10^(1/2), to quite some precision indeed, as 10^(1/2) = 3.162278 (and there’s more decimals but we’re not too interested in them right now), and then that we can take the square root of that value (3.162278). That’s quite an assumption indeed.

However, if we don’t want this post to become a book in itself, then I must assume we can do that. In fact, I’ve done it with a calculator here but, before there were calculators, this kind of calculations could and had to be done with a table of logarithms. That’s because of a very convenient property of logarithms: log(a·b) = log(a) + log(b), whatever the base. However, as said, I should be writing a post here only, not a book. [Already now, this post beats the record in terms of length and verbosity…] So I’ll just ask you to accept that – at this stage – we know how to calculate the square root of something and, therefore, to accept that we can take the square root not only of 10 but of any number really, including 3.162278, and then the root of that number, and then the root of that result, and so on and so on. So that gives us the values in the third column of the table above: they’re successive square roots. [Please do double-check! It will help you to understand what I am writing about here.]

So… Back to the main story. What we are doing in the table above is to take the square root in succession, so that’s (10^(1/2))^(1/2) = 10^(1/4), and then again: (10^(1/4))^(1/2) = 10^(1/8), and then again: (10^(1/8))^(1/2) = 10^(1/16), so we get 10^(1/2), 10^(1/4), 10^(1/8), 10^(1/16), 10^(1/32) and so on and so on. All the way down. Well… Not all the way down. In fact, in the table above, we stop after ten iterations already, so that’s when x = 1/1024. [Note that 1/1024 is 2 to the power minus 10: 2^(–10) = 1/2^10 = 1/1024. I am just throwing that in here because that little ‘fact’ will come in handy later.]

Why do we stop after ten iterations? Well… Actually, there’s no real good reason to stop at exactly ten iterations. We could have 15 iterations: then x would be 1/2^15 = 1/32768. Or 20 (x = 1/1048576). Or 39 (x = 1/too many digits to write down). Whatever. However, we start to notice something interesting that actually allows us to stop. We note that 10 to the power x (10^x) tends to one as x becomes very small.

Now you’re laughing. Well… Surely ! That’s what we’d expect, isn’t it? 10^0 = 1. Is that the grand conclusion?

No.

The question is how small should x be? That’s where the fourth column of the table above comes in. We’re calculating a number there that converges to some value quite near to 2.3 as x goes to zero and – importantly – it converges rather quickly. In fact, if you’d do the calculations yourself, you’d see that it converges to 2.302585 after a while. [With Excel or some similar application, you can do 20 or more iterations in no time, and so that’s what you’ll find.]

Of course, we can keep going and continue adding zillions of decimals to this number but we don’t want to do that: 2.302585 is fine. We don’t need any more decimals. Why? Well… We’re going to use this number to approximate 10^x near x = 0: it turns out that we can get a really good approximation of 10^x near x = 0 using that 2.302585 factor, so we can write that

10^x ≈ 1 + 2.302585·x

That approximation is the last column in the table above. In order to show you how good it is as an ‘approximation’, I’ve plotted the actual values for 10^x (blue markers) and the approximated values for 10^x (black markers) using that 1 + 2.302585·x formula. You can see it’s a pretty good match indeed if x is small. And ‘small’ here is not that small: a ratio like x = 1/8 (i.e. x = 0.125) is good enough already! In fact, the graph below shows that 1/16 = 0.0625 is almost perfect! So we don’t need to ‘go down’ too far: ten iterations is plenty!

[Graph: actual values of 10^x (blue markers) versus the linear approximation 1 + 2.302585·x (black markers) for small x]

I’ve probably ‘lost’ you by now. What are we doing here really? How did we get that linear approximation formula, and why do we need it? Well… See the last column: we calculate (10^x – 1)/x, so that’s the difference between 10^x and 1 divided by the (fractional) exponent x, and we see, indeed, that that number converges to a value very near to 2.302585. Why? Well… What we are actually doing is calculating the gradient of 10^x, i.e. the slope of the tangent line to the (non-linear) 10^x curve. That’s what’s shown in the graph below.

[Graph: the tangent line to the 10^x curve at x = 0, with slope 2.302585]

Working backwards, we can then re-write (10^x – 1)/x ≈ 2.302585 as 10^x ≈ 1 + 2.302585·x indeed.

So what we’ve got here is quite standard: we know we can approximate a non-linear curve with a linear curve, using the gradient near the point that we’re observing (and so that’s near the point x = 0 in this case) and so that‘s what we’re doing here.
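You can reproduce the whole table, by the way, with a few lines of Python: take successive square roots of 10 and watch (10^x – 1)/x converge. [This is just the procedure described above put in a loop; the limit it closes in on is, of course, ln(10) = 2.302585…]

```python
import math

value, x = 10.0, 1.0
for _ in range(10):                 # ten successive square roots
    value, x = math.sqrt(value), x / 2
    gradient = (value - 1) / x      # the last column of the table
    print(f"x = 1/{int(1 / x):>4}   10^x = {value:.6f}   (10^x - 1)/x = {gradient:.6f}")

print("ln(10) =", math.log(10))     # 2.302585... -- the value the column converges to
```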

Of course, you should remember that we cannot actually plot a smooth curve like that, for the moment that is, because we can only calculate 10^x for rational real numbers. However, it’s easy to generalize and just ‘fill the gaps’ so to speak, and so that’s how irrational powers are defined really.

Hmm… So what’s the next step? Well… The next step is not to continue and continue and continue and continue etcetera to show that the smooth curve above is, indeed, the graph of 10^x. No. The next step is to use that linear approximation to algebraically calculate the value of 10^(is), so that’s a power of 10 with a complex exponent.

HUH!? 

Yes. That’s the gem I found in Feynman’s 1963 Lectures. [Well… One of the gems, I should say. There are many. :-)]

It’s quite interesting. In his little chapter on ‘algebra’ (Lectures, I-22), Feynman just assumes that this ‘law’ that 10^x ≈ 1 + 2.302585·x is not only ‘correct’ for small real fractions x but also for very small complex fractions, and then he just reverses the procedure above to calculate 10^(is) for larger and larger values of s. Let’s see how that goes.

However, let’s first switch the variable from x to s, because we’re talking complex numbers now. Indeed, I can’t use the symbol x as I used it above anymore because x is now the real part of some complex number 10^(is) = x + iy. In addition, I should note that Feynman introduces this delta (Δ). The idea behind is to make things somewhat easier to read by relating s to an integer: Δ = 1024s, so Δ = 1, 2, 4, 8,… 1024 for s = 1/1024, 1/512, 1/256 etcetera (see the second column in the table below). I am not entirely sure why he does that: Feynman must think fractions are harder to ‘read’. [Frankly, the introduction of this Δ makes Feynman’s exposé somewhat harder to ‘read’ IMHO – but that’s just a matter of taste, I guess.] Of course, the approximation then becomes

10^(is) ≈ 1 + 2.302585i·Δ/1024 = 1 + 0.0022486i·Δ.

The table below is the one that Feynman uses. The important thing is that you understand the first line in this table: 10^(i/1024) ≈ 1 + 0.00225i·Δ = 1 + 0.00225i·1 = 1 + 0.00225i. And then we go to the second line: 10^(i/512) = 10^(i/1024)·10^(i/1024) = 10^(2i/1024), so we’re doing the reverse thing here: we don’t take square roots but we square what we’ve found already. So we multiply 1 + 0.00225i with itself and get (1 + 0.00225i)(1 + 0.00225i) = 1 + 2·0.00225i + 0.00225²·i² = 1 – 0.000005 + 0.0045i ≈ 0.999995 + 0.0045i.

[Table: Feynman’s successive squarings, working back up from 10^(i/1024) to 10^i]

Let’s go to the third line now. In fact, what we’re doing here is working our way back up, i.e. all the way from s = 1/1024 to s = 1. And that’s where the ‘magic’ of i (i.e. the fact that i² = –1) is starting to show: (0.999995 + 0.0045i)² = 0.99999 + 2·0.999995·0.0045i + 0.0045²·i² = 0.99997 + 0.009i. So the real part of 10^(is) is changing as well – it is decreasing in fact! Why is that? Because of the term with the i² factor! [I write 0.99997 instead of 0.99996 because I round up here, while Feynman consistently rounds down.]

So now the game is clear: we take larger and larger fractions s (so the exponents are i/512, i/256, i/128, etcetera), and calculate 10^(is) by squaring the previous result. After ten iterations, we get the grand result for s = 1, i.e. is = i:

10^i = –0.66928 + 0.74332i (more or less that is)

Note the minus sign in front of the real part, and look at the intermediate values for x and y too. Isn’t that remarkable?
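Here’s the whole procedure compressed into a loop – Feynman’s table in five lines of Python, if you want. We start from the linear approximation 10^(i/1024) ≈ 1 + 2.302585i/1024 and square ten times; the exact value 10^i = e^(i·ln10) is printed for comparison. [The loop gives –0.670 + 0.746i rather than the –0.66928 + 0.74332i above, simply because we’re not rounding down at each step the way Feynman’s table does.]

```python
import cmath, math

LN10 = 2.302585                   # the gradient we found above
z = 1 + 1j * LN10 / 1024          # 10^(i/1024) ~ 1 + 2.302585i/1024
for _ in range(10):               # squaring doubles the exponent each time:
    z = z * z                     # i/1024 -> i/512 -> ... -> i/1 = i

print("iterated:", z)                              # ~ -0.670 + 0.746i
print("exact:   ", cmath.exp(1j * math.log(10)))   # 10^i = cos(ln 10) + i*sin(ln 10)
```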

OK. Waw ! But… So what? What’s next?

Well… To graph 10^(is), we should not just keep squaring things because that amounts to doubling the exponent again and again and so that means the argument is just making larger and larger jumps along the positive real axis really (see that graph that I made above: the distance between the successive values of x gets larger and larger, and so that’s a bad recipe for a smooth graph).

So what can we do? Well… We should just take a sufficiently small power, i/8 for example, and multiply that with 1, 2, 3 etcetera so we get something more ‘regular’. That’s what’s done in the table below and what’s represented in the graph underneath (to get the scale of the horizontal axis, note that s = p/8).

[Table: 10^(is) for s = p/8, p = 1, 2, 3, …]

[Graph: the real part (x) and imaginary part (y) of 10^(is), both oscillating between –1 and +1]

Hey! Look at that! There we are! That’s the graph we were looking for: it shows a (complex) exponential (10^(is)) as a periodic (complex-valued) function, with the real part behaving like a cosine function and the imaginary part behaving like a sine function.

Note the upper and lower bounds: +1 and –1. Indeed, it doesn’t seem to matter whether we use 10 or e as a base: the x and y part oscillate between –1 and +1. So, whatever the base, we’ll see the same pattern: the base only changes the scale of the horizontal axis (i.e. s). However, that being said, because of this scale factor, I do need to say ‘like’ a cosine/sine function when discussing that graph above. So I cannot say they are a cosine and a sine function. Feynman calls these functions algebraic sine and cosine functions.
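That scale factor is just ln(10), by the way: the real part of 10^(is) equals cos(s·ln10), as the little check below illustrates. [It anticipates the base switch 10^(is) = e^(i·s·ln10) that we’ll make official in a minute, and it uses Python’s cmath library, so it’s a verification rather than a derivation.]

```python
import cmath, math

LN10 = math.log(10)
for s in [0.0, 0.25, 0.5, 1.0]:
    z = cmath.exp(1j * s * LN10)   # 10^(is), via the base switch 10 = e^(ln 10)
    # The real part is the 'algebraic cosine': a cosine with a stretched axis.
    print(f"s = {s:4.2f}: Re = {z.real:+.5f}   cos(s*ln10) = {math.cos(s * LN10):+.5f}")
```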

But – remember! – we can always switch base through a clever substitution, so 10^(is) = e^(it) with t = s·ln(10), and recalculate stuff to whatever number of decimals behind the decimal point we’d want. So let’s do that: let’s switch to base e. WOW! What happens?

We then [Finally! you’ll say!] get values that – Surprise ! Surprise ! – correspond to the real cosine and sine functions. That then, in turn, allows us to substitute the ‘algebraic’ cosine and sine functions by the ‘real’ ones in an equation that – Yes! – is Euler’s formula itself:

e^(it) = cos(t) + isin(t)

So that’s it. End of story.

[…]

You’ll say: So what? Well… Not sure what to say. I think this is rather remarkable. This is not the formal mathematical proof of Euler’s formula (at least not of the kind that you’ll find in a textbook or on Wikipedia). No, we are just calculating the values x and y of e^(it) = x + iy using an approximation process used to calculate real powers and then, well… Just some bold assumption involving infinitesimals really.

I think this is amazing stuff (even if I’ll downplay that statement a bit in my post scriptum). I really don’t understand these things the way I would like to understand them. I guess I just haven’t got the right kind of brain for these things. 😦 Indeed, just think about it: when we have the real exponential e^x, then we’ve got that typical ‘rocket’ graph (i.e. the blue one in the graph below): just something blasting away indeed. But when we put i in the exponent (e^(ix)), then we get two components oscillating up and down like the cosine and sine function. Well… Not only like the cosine and sine function: the green and red lines – i.e. the real and imaginary part of e^(ix)! – actually are the cosine and sine function!

[Graph: the real exponential e^x (blue) together with the real and imaginary parts of e^(ix) (green and red)]

Do you understand this in an intuitive way? Yes? You do? Waw ! Please write me and tell me how. I don’t. 😦

Oh well… The good thing about it is… Well… At least complex numbers will always stay ‘magical’ to me. 🙂

Post scriptum: When I write, above, that I don’t understand this in an intuitive way, I don’t mean to say it’s not logical. In fact, it is. It has to be, of course, because we’re talking math here! 🙂

The logic is pretty clear indeed. We have an exponential function here (y = 10^x) and we’re evaluating that function in the neighborhood of x = 0 (we do it on the positive side only but we could, of course, do the same analysis on the other side as well). So then we use that very general mathematical procedure of calculating approximate values for the (non-linear) 10^x curve using the gradient. So we plug in some differential value for x (in differential terms, we’d write Δx – but so the delta symbol here has nothing to do with Feynman’s Δ above) and, of course, we find Δy = 2.302585·Δx. So we add that to 1 (the value of 10^x at point x = 0) and, then, we go through these iterations, not using that linear equation any more, but the very fundamental property of an exponential function that 10^(2x) = (10^x)². So we start with an approximate value, but then the value we plug into these iterative calculations is the square of the previous value. So, to calculate the next points, we do not use an approximation method any more, but we just square the first result, and then the second and so on and so on, and that’s just calculation, not approximation.

[In fact, you may still wonder and think that it’s quite remarkable that the points we calculate using this process are so accurate, but that’s due to the rapid convergence of that value we found for the gradient. Well… Yes and no. Here I must admit that Feynman (and I) cheated a bit, because we used a rather precise value for the gradient: 2.302585, so that’s six digits after the decimal point. Now, that value is actually calculated based on twenty (rather than ten) iterations when ‘going down’. But that little factoid is not embarrassing, because it doesn’t change much: the argument itself is sound. Very sound.]

OK… That’s easy enough to understand. The thing that is not easy to understand – intuitively, that is – is that we can just insert some complex differential Δs into that Δy = 2.302585·Δx equation. Isn’t it ‘weird’, indeed, that we can just use a complex fraction s = i/1024 to calculate our first point, instead of a real fraction x = 1/1024? It is. That’s the only thing really. Indeed, once we’ve done that, it’s plain sailing again: we just square the result to get the next result, and then we square that again, and so on and so on. However, that being said, the difference is that the ‘magic’ of i now comes into play. When squaring, we do not get the plain square of a real number but (a + bi)^2 = a^2 – b^2 + 2abi. So it’s that minus sign and the i that give an entirely different ‘dynamic’ to how the function evolves from there (i.e. different as compared to working with a real base only). It’s all quite remarkable really, because we start off with a really tiny value for b here: 0.00225 to be precise, so that’s roughly 1/445! [Of course, the real part a, at the point from where we start doing these iterations, is one.]
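Here’s the same iteration with the complex fraction s = i/1024 plugged in – again just a sketch, using Python’s built-in complex type – so you can watch the minus sign in (a + bi)^2 do its work:

```python
import cmath

gradient = 2.302585
s = 1j / 1024                     # the complex fraction i/1024
z = 1 + gradient * s              # 1 + 0.00225i: note the tiny b!

for _ in range(10):               # the same trick: square ten times
    z = z * z

print(z)                          # ~(-0.670 + 0.746i)
print(cmath.exp(2.302585j))       # 10^i = e^(i·ln 10) ~ (-0.668 + 0.744i)
```

The successive squares creep around the unit circle and, after ten squarings, land close to 10^i, exactly as the story above says they should.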

But so that first step is ‘weird’ indeed. Why is it no problem whatsoever to insert the complex fraction s = i/1024 into 1 + 2.302585·s, instead of the real fraction 1/1024, and then, afterwards, to square the complex numbers that we’re getting, instead of real numbers?

It just doesn’t feel right, does it? I must admit that, at first, I felt that Feynman was doing something ‘illegal’ too. But, obviously, he’s not. It’s plain mathematical logic. We have two functions here: one is linear (y = 1 + 2.302585·x), and the other is quadratic (y = x^2), and so what’s happening really is that, at the point x = 0, we change the function. We do not substitute x for ix really, but y = 10^x for y = 10^(ix). So we still have an independent real variable x but, instead of a real-valued y = 10^x function, we now have a complex-valued y = 10^(ix) function.

However, the ‘output’ of that function, of course, is a complex y, not a real y. In our case, because we’re plotting a function really – to be precise, we’re calculating the exponential function y = 10^(ix) through all these iterations – we get a complex-valued function of the shape that, by now, we know so well.

So it is ‘discontinuous’ in a way, and so I can’t say all that much about it. Look at the graph below where, once again, we have the real exponential function e^x and then the two components of the complex exponential e^(ix). This time, I’ve plotted them on both sides of the zero point, because they’re continuous on both sides indeed. Imagine we’re walking along this blue e^x curve from some negative x to zero. We’re familiar with the path. It has, for instance, that property we exploited above: as we doubled the ‘input’ (so from x we went to 2x), the ‘output’ went up not as the double but as the square of the original value: e^(2x) = (e^x)^2. And then we also know that, around the point x = 0, we can approximate it with a linear function. In fact, in this case, the linear approximation is super-simple: y = 1 + x. Indeed, the gradient of e^x at the point x = 0 is equal to 1! So, yes, we know and understand that blue curve. But then we arrive at the point x = 0 and we decide something radical: we change the function!

[Graph: e^x plotted on both sides of zero, together with the real and imaginary parts of e^(ix).]

Yes. That’s what we’re really doing in that very lengthy story above: e^(ix) is a complex-valued function of the real variable x. That’s something different. However, we continue to say that the approximation y = 1 + x must also be valid for complex x and y. So we say that e^(ix) ≈ 1 + ix. Is that wrong? No. Not at all. Functional forms are functional forms and gradients are gradients: d(e^(ix))/dx = i·e^(ix), and i·e^(ix) at x = 0 is equal to i·e^0 = i! Hence, e^(ix) ≈ 1 + ix is a perfectly legitimate linear approximation. And then it’s just the same thing again: we use that iteration mechanism to calculate successive squares of complex numbers because, for complex exponentials as well, we have e^(2ix) = (e^(ix))^2.

So. The ‘magic’ is a lot of ‘confusion’ really. The point to note is that we do have a different function here: e^(ix) and e^x ‘look’ similar – it’s just that i, right? – but, in fact, when we replace x by ix in the exponent of e, that’s quite a radical change. We can use the same linear approximation at x = 0 but then it’s over. Our blue graph stops: we’re no longer walking along it. I can’t even say it bifurcates, so to say, into the red and the green one, because it doesn’t. We’re talking apples and oranges indeed, and so the comparison is quickly done: they’re different. Full stop.

Is there any geometrical relationship between all these curves? Well… Yes and no. I can see one, at the very start: the gradient of our e^x function at x = 0 is equal to unity (i.e. 1), and so that’s the same gradient as the gradient of the imaginary part of our new e^(ix) function (the gradient of the real part is zero, before it turns negative). But that’s just… I mean… That just comes out of Euler’s formula: e^(i·0) = cos(0) + i·sin(0). Honestly, it’s no use to try to be smart here and think about stuff like that. We’re no longer walking on the blue curve. We’re looking at a new function: a complex-valued function e^(ix) (instead of a real-valued function e^x) of a real variable (x). That’s it. Just don’t try to relate the two too much: you switched functions. Full stop. It’s like changing trains! 🙂

So… What’s the conclusion? Well… I’d say: “Complex numbers can be analyzed as extensions of real numbers, so to say, but – frankly – they are different.”

[…]

I’ll probably never understand complex numbers in the way I would like to understand them – that is, like I understand that one plus one is two. However, this rather lengthy foray into the complex forest has helped me somewhat. I hope it helped you too.

Some content on this page was disabled on June 16, 2020 as a result of a DMCA takedown notice from The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

Time reversal and CPT symmetry (III)

Pre-scriptum (dated 26 June 2020): While my posts on symmetries (and why they may or may not be broken) are somewhat mutilated (removal of illustrations and other material) as a result of an attack by the dark force, I am happy to see a lot of it survived more or less intact. While my views on the true nature of light, matter and the force or forces that act on them – all of the stuff that explains symmetries or symmetry-breaking, in other words – have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. 🙂

Original post:

Although I concluded my previous post by saying that I would not write anything more about CPT symmetry, I feel like I have done an injustice to Val Fitch, James Cronin, and all those other researchers who spent many man-years to painstakingly demonstrate how the weak force does not always respect the combined charge-parity (C-P) symmetry. Indeed, I did not want to denigrate their efforts when I noted that:

  1. These decaying kaons (i.e. the particles that are used to demonstrate the CP symmetry-breaking phenomenon) are rather exotic and very short-lived particles; and
  2. Researchers have not been able to find many other traces of non-respect of CP symmetry, except when studying heavier versions of these kaons (the so-called B- and D-mesons), which could only be produced in higher volumes in newer (read: higher-energy) particle colliders (so that’s in the last ten or fifteen years only) – and these B- and D-mesons are even more rare and even less stable.

CP violation is CP violation: it’s plain weird, especially when Fermilab and CERN experiments observed direct CP violation in kaon decay processes. [Remember that the original 1964 Fitch-Cronin experiment could not directly observe CP violation: in their experiment, CP violation in neutral kaon decay processes could only be deduced from other (unexpected) decay processes.]

Why? When one reverses all of the charges and other variables (such as parity which – let me remind you – has to do with ‘left-handedness’ and ‘right-handedness’ of particles), then the process should go in the other direction in an exactly symmetric way. Full stop. If not, there’s some kind of ‘leakage’ so to say, and such ‘leakage’ would be ‘kind-of-OK’ when we’d be talking some kind of chemical or biological process, but it’s obviously not ‘kind-of-OK’ when we’re talking one of the fundamental forces. It’s just not ‘logical’.

Feynman versus ‘t Hooft: pro and contra CP-symmetry breaking

A remark that is much more relevant than the two comments above is that one of the most brilliant physicists of the 20th century, Richard Feynman, seemed to have refused to entertain the idea of CP-symmetry breaking. Indeed, while, in his 1965 Lectures, he devotes quite a bit of attention to Chien-Shiung Wu’s 1956 experiment with decaying cobalt-60 nuclei (i.e. the experiment which first demonstrated parity violation, i.e. the breaking of P-symmetry), he does not mention the 1964 Fitch-Cronin experiment, and all of his writing in these Lectures makes it very clear that he not only strongly believes that the combined CP symmetry holds, but that it’s also the only ‘symmetry’ that matters really, and the only one that Nature truly respects–always.

So Feynman was wrong. Of course, these Lectures were published less than a year after the 1964 Fitch-Cronin experiment and, hence, you might think he would have changed his ideas on the possibility of Nature not respecting CP-symmetry–just like Wolfgang Pauli, who could only accept the reality of Nature not respecting reflection symmetry (P-symmetry) after repeated experiments re-confirmed the results of Wu’s original 1956 experiment.

But – No! – Feynman’s 1985 book on quantum electrodynamics (QED) –so that’s five years after Fitch and Cronin got a Nobel Prize for their discovery– is equally skeptical on this point: he basically states that the weak force is “not well understood” and that he hopes that “a more beautiful and, hence, more accurate understanding” of things will emerge.

OK, you will say, but Feynman passed away shortly after (he died from a rare form of cancer in 1988) and, hence, we should now listen to the current generation of physicists.

You’re obviously right, so let’s look around. Hmm… Gerard ‘t Hooft? Yes ! He is 67 now but – despite his age – it is obvious that he surely qualifies as a ‘next-generation’ physicist. He got his Nobel Prize for “elucidating the quantum structure of electroweak interactions” (read: for clarifying how the weak force actually works) and he is also very enthusiastic about all these Grand Unified Theories (most notably string and superstring theory) and so, yes, he should surely know, shouldn’t he?

I guess so. However, even ‘t Hooft writes that these experiments with these ‘crazy kaons’ – as he calls them – show ‘violation’ indeed, but that it’s marginal: the very same experiments also show near-symmetry. What’s near-symmetry? Well… Just what the term says: the weak force is almost symmetrical. Hence, CP-symmetry is the norm and CP-asymmetry is only a marginal phenomenon. That being said, it’s there and, hence, it should be explained. How?

‘t Hooft himself writes that one could actually try to interpret the results of the experiment by adding some kind of ‘fifth’ force to our world view – a “super-weak force” as he calls it, which would interfere with the weak force only.

To be fair, he immediately adds that introducing such ‘fifth force’ doesn’t really solve the “mystery” of CP asymmetry, because, while we’d restore the principle of CP symmetry for the weak force interactions, we would then have to explain why this ‘super-weak’ force does not respect it. In short, we cannot just reason the problem away. Hence, ‘t Hooft’s conclusion in his 1996 book on The Ultimate Building Blocks of the universe is quite humble: “The deeper cause [of CP asymmetry] is likely to remain a mystery.” (‘t Hooft, 1996, Chapter 7: The crazy kaons)

What about other explanations? For example, you might be tempted to think these two or three exceptions per thousand cases respecting the general rule must have something to do with quantum-mechanical uncertainty: when everything is said and done, we’re dealing with probabilities in quantum mechanics, aren’t we? Hence, exceptions do occur and are actually expected to occur.

No. Quantum indeterminism is not applicable here. While working with probability amplitudes and probabilities is effectively equivalent to stating some general rules involving some average or mean value and then some standard deviation from that average, we’ve got something else going on here: Fitch and Cronin took a full six months indeed–repeating the experiment over and over and over again–to firmly establish a statistically significant bias away from the theoretical average. Hence, even if the bias is only 0.2% or 0.3%, it is a statistically significant difference between the probability of a process going one way, and the probability of that very same process going the other way.

So what? There are so many non-reversible processes and asymmetries in this world: why don’t we just accept this? Well… I’ll just refer to my previous post on this one: we’re talking a fundamental force here – not some chemical reaction – and, hence, if we reverse all of the relevant charges (including things such as left-handed or right-handed spin), the reaction should go the other way, and with exactly the same probability. If it doesn’t, it’s plain weird. Full stop.

OK. […] But… Perhaps there is some external phenomenon affecting these likelihoods, like these omnipresent solar neutrinos indeed, which I mentioned in a previous post and which are all left-handed. So perhaps we should allow these to enter the equation as well. […] Well… I already said that would make sense–to some extent at least– because there is some flimsy evidence of solar flares affecting radioactive decay rates (solar flares and neutrino outbursts are closely related, so if solar flares impact radioactive decay, we could or should expect them to meddle with any beta decay process really). That being said, it would not make sense from other, more conventional, points of view: we cannot just ‘add’ neutrinos to the equation because then we’d be in trouble with the conservation laws, first and foremost the energy conservation law! So, even if we would be able to work out some kind of theoretical mechanism involving these left-handed solar neutrinos (which are literally all over the place, bombarding us constantly even if they’re very hard to detect), thus explaining the observed P-asymmetry, we would then have to explain why it violates the energy conservation law! Well… Good luck with that, I’d say!

So it is a conundrum really. Let me sum up the above discussion in two bullet points:

  1. While kaons are short-lived particles because of the presence of the second-generation (and, hence, unstable) s-quark, they are real particles (so they are not some resonance or some so-called virtual particle). Hence, studying their behavior in interactions with any force field (and, most notably, their behavior in regard to the weak force) is extremely relevant, and the observed CP asymmetry–no matter how small–is something which should really grab our attention.
  2. The philosophical implications of any form of non-respect of the combined CP symmetry for our common-sense notion of time are truly profound and, therefore, the Fitch-Cronin experiment rightly deserves a lot of accolades.

So let’s analyze these ‘philosophical implications’ (which is just a somewhat ‘charged’ term for the linkage between CP- and time-symmetry which I want to discuss here) somewhat more in detail.

Time reversal and CPT symmetry

In the previous posts, I said it’s probably useful to distinguish (a) time-reversal as a (loosely defined) philosophical concept from (b) the mathematical definition of time-reversal, which is much more precise and unambiguous. It’s the latter which is generally used in physics, and it amounts to putting a minus sign in front of all time variables in any equation describing some situation, process or system in physics. That’s it really. Nothing more.
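Just to illustrate that definition with the simplest example I can think of (my example, not Feynman’s): take a trajectory x(t) and substitute t for –t. The velocity v = dx/dt changes sign, but the acceleration a = d²x/dt² – and, hence, Newton’s F = ma – comes out unchanged. That’s why a movie of a frictionless pendulum played backwards looks perfectly fine: the equation doesn’t notice the minus sign.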

The point that I wanted to make is that true time reversal – i.e. time-reversal in the ‘philosophical’ or ‘common-sense’ interpretation – also involves a reversal of the forces, and that’s done through reversing all charges causing those forces. I used the example of the movie as a metaphor: most movies, when played backwards, do not make sense, unless we reverse the forces. For example, seeing an object ‘fall back’ to where it was (before it started falling) in a movie playing backwards makes sense only if we would assume that masses repel, instead of attract, each other. Likewise, any static or dynamic electromagnetic phenomena we would see in that backwards playing movie would make sense only if we would assume that the charges of the protons and electrons causing the electromagnetic fields involved would be reversed. How? Well… I don’t know. Just imagine some magic.

In such a world view – i.e. a world view which connects the arrow of time with the real-life forces that cause our world to change – I also looked at the left- and right-handedness of particles as some kind of ‘charge’, because it co-determines how the weak force plays out. Hence, any phenomenon in the movie having to do with the weak force (such as beta decay) could also be time-reversed by making left-handed particles right-handed, and right-handed particles left-handed. In short, I said that, when it comes to time reversal, only a full CPT-transformation makes sense – from a philosophical point of view, that is.

Now, reversing left- and right-handedness amounts to a P-transformation (and don’t interrupt me now by asking why physicists use this rather awkward word ‘parity’ for what’s left- and right-handedness really), just like a C-transformation amounts to reversing electric and ‘color’ charges (‘color’ charges are the charges involved in the strong nuclear force).

Now, if only a full CPT transformation makes sense, then CP-reversal should also mean T-reversal, and vice versa. Feynman’s story about “the guy in the ‘other’ universe” (see my previous post) was quite instructive in that regard, and so let’s look at the finer points of that story once again.

Is ‘another’ world possible at all?

Feynman’s assumption was that we’ve made contact (don’t ask how: somehow) with some other intelligent being living in some ‘other’ world somewhere ‘out there’, and that there are no visual or other common references. That’s all rather vague, you’ll say, but just hang in there and try to see where we’re going with this story. Most notably, the other intelligent being – but let’s call ‘it’ a she instead of ‘a guy’ or ‘a Martian’ – cannot see the universe as we see it: we can’t describe, for instance, the Big and Small Dipper and explain what ‘left’ and ‘right’ mean with reference to such constellations, because she’s somehow sealed off from all that (so she lives in a totally different corner of the universe really).

In contrast, we would be able, most probably, to explain and share the concepts of ‘upwards’ and ‘downwards’ by assuming that she is also attracted by some center of gravity nearby, just like we are attracted downwards by our Earth. Then, after many more hours and days, weeks, months or even years of tedious ‘discussions’, we would probably be able to describe electric currents and explain electromagnetic phenomena, and then, hopefully, she would find out that the laws in her corner of the universe are exactly the same, and so we could thus explain and share the notion of a ‘positive’ and a ‘negative’ charge, and the notion of a magnetic ‘north’ and ‘south’ pole.

However, at this point the story becomes somewhat more complicated, because – as I tried to explain in my previous post – her ‘positive’ electric charge (+) and her magnetic ‘north’ might well be our ‘negative’ electric charge (–) and our magnetic ‘south’. Why? It’s simple: the electromagnetic force respects both charge symmetry and parity symmetry, and so there is no way of defining any absolute sense of ‘left’ and ‘right’, or of (magnetic) ‘north’ and (magnetic) ‘south’, with reference to the electromagnetic force alone. [If you don’t believe it, just look at my previous post and study the examples.]

Talking about the strong force wouldn’t help either, because it also fully respects charge symmetry.

Huh? Yes. Just go through my previous post which – I admit – was probably quite confusing, but which made the point that a ‘mirror-image’ world would work just as well… except when it comes to the weak force. Indeed, atomic decay processes (beta decay) do distinguish between ‘left-handed’ and ‘right-handed’ particles (as measured by their spin) in an absolute sense (see the illustration of decaying muons and their mirror image in the previous post). Hence, it’s simple: in order to make sure her ‘left’ and her ‘right’ are the same as ours, we should just ask her to perform those beta decay experiments demonstrating that parity (or P-symmetry) is not being conserved. Then, based on our common definition of what’s ‘up’ and ‘down’ (the commonality of these notions being based on the effects of gravity which, we assume, are the same in both worlds), we could agree that ‘right’ is ‘right’ indeed, and that ‘left’ is ‘left’ indeed.

Now, you will remember there was one ‘catch’ here: if ever we would want to set up an actual meeting with her (just assume that we’ve finally figured out where she is and so we (or she) are on our way to meet each other), we would have to ask her to respect protocol and put out her right hand to greet us, not her left. The reason is the following: while ‘right-handed’ and ‘left-handed’ matter behave differently when it comes to weak force interactions (read: atomic decay processes)–which is how we can distinguish between ‘left’ and ‘right’ in the first place, in some kind of absolute sense that is–the combined CP symmetry implies that right-handed matter and left-handed anti-matter behave just the same–and, of course, the same goes for ‘left-handed’ matter and ‘right-handed’ anti-matter. Hence, after we would have had a painstakingly long exchange on broken P-symmetry to ensure we are talking about the same thing, we would still not know for sure: she might be living in a world of anti-matter indeed, in which case her ‘right’ would actually be ‘left’ for us, and her ‘left’ would be ‘right’.

Hence, if, after all that talk on P-symmetry and doing all those experiments involving P-asymmetry, she actually would put out her left hand when meeting us physically–instead of the agreed-upon right hand… Then… Well… Don’t touch it. 🙂

There is a way out of course. And, who knows, perhaps she was just trying to be humorous, and so perhaps she smiled and apologized for the confusion in the meanwhile. But then… […] Hmm… I am not sure if such a bad joke would make for a good start of a relationship, even if it would obviously demonstrate superior intelligence. 🙂

Indeed, the Fitch-Cronin experiment brings an additional twist to this potentially romantic story between two intelligent beings from two ‘different’ worlds. In fact, the Fitch-Cronin experiment actually rules out this theoretical possibility of mutual destruction and, therefore, the possibility of two ‘different’ worlds.

The argument goes straight to the heart of our philosophical discussion on time reversal. Indeed, whatever you may or may not have understood from this and my previous posts on CPT symmetry, the key point is that the combined CPT symmetry cannot be violated.

Why? Well… That’s plain logic: the real world does not care about our conventions, so reversing all of our conventions, i.e.

  1. Changing all particles to antiparticles by reversing all charges (C),
  2. Turning all right-handed particles into left-handed particles and vice versa (P), and
  3. Changing the sign of time (T),

describes a world truly going back in time.

Now, ‘her’ world is not going back in time. Why? Well… Because we can actually talk to her, it is obvious that her ‘arrow of time’ points in the same direction as ours, so she is not living in a world that is going back in time. Full stop. Therefore, any experiment involving a combined CP asymmetry (i.e. CP violation) should yield the same results and, hence, she should find the same bias, i.e. a bias going in the very same direction (whether we label that direction ‘left to right’ or ‘right to left’ depends on our conventions, which we ‘re-set’ as we talked to her and which, hence, we share, based on the results of all those beta decay experiments we did to ensure we’re really talking about the ‘same’ direction, and not its opposite).

Is this confusing? It sure is. But let me rephrase the logic. Perhaps it helps.

  1. Combined CPT symmetry implies that if the combined CP-symmetry is broken, then T-symmetry is also broken. Hence, the experimentally established fact of broken CP symmetry (even if it’s only 2 or 3 times per thousand) ensures that the ‘arrow of time’ points in one direction, and in one direction only. To put it simply: we cannot reverse time in a world which does not (fully) respect the principle of CP symmetry.
  2. Now, if you and I can exchange meaningful signals (i.e. communicate), then your and my ‘arrow of time’ obviously point in the same direction. To put it simply, we’re actors in the same movie, and whether or not it is being played backwards doesn’t matter anymore: the point is that the two of us share the same arrow of time. In other words, God did not do any combined CPT-transformation trick on your world as compared to mine, and vice versa.
  3. Hence, ‘your’ world is ‘my’ world and vice versa. So we live in the same world with the very same symmetries and asymmetries.

Now apply this logic to our imaginary new friend (‘she’) and (I hope) you’ll get the point.

To make a long story short, and also to conclude our philosophical digressions here on a pleasant (romantic) note: the fact that we would be able to communicate with her implies that she’d be living in the same world as ours. We know that now, for sure, because of the broken CP symmetry: indeed, if her ‘time arrow’ points in the same direction, then CP symmetry will be broken in just the very same way in ‘her’ world (i.e. the ‘bias’ will have the same direction, in an absolute sense) as it is broken in ‘our’ world.

In short, there are only two possible worlds: (1) this world and (2) one and only one ‘other’ world. This ‘other’ world is our world under a full CPT-transformation: the whole movie played backwards in other words, but with all ‘charges’ affecting forces – in whatever form and shape they come (electric charge, color charge, spin, and what have you) reversed or – using that awful mathematical term – ‘negated’.

In case you’d wonder (1): I consider the many-worlds interpretation of quantum mechanics as… Well… Nonsense. CPT symmetry allows for two worlds only. Maximum two. 🙂

In case you’d wonder (2): An oscillating-universe theory, or some kind of cyclic thing (so Big Bangs followed by Big Crunches), is not incompatible with my ‘two-possible-worlds’ view of things. However, these ‘oscillations’ would all take place in the same world really, because the arrow of time isn’t really being reversed: Big Bangs and Big Crunches do not reverse charges and parities – at least not to my knowledge.

But, of course, who knows?

Postscripts:

1. You may wonder what ‘other’ asymmetries I am hinting at in this post here. It’s quite simple. It’s everything you see around you, including the workings of the increasing-entropy law. However, if I would have to choose one asymmetry in this world (the real world), as an example of a very striking and/or meaningful asymmetry, it’s the preponderance of matter over anti-matter, including the preponderance of (left-handed) neutrinos over (right-handed) antineutrinos. Indeed, I can’t shake off that feeling that neutrino physics is going to spring some surprises in the coming decades.

[When you’d google a bit in order to get some more detail on neutrinos (and solar neutrinos in particular, which are the kind of neutrinos that are affecting us right now and right here), you’ll probably get confused by a phenomenon referred to as neutrino oscillation (which refers to a process in which neutrinos change ‘flavor’) but so the basic output of the Sun’s nuclear reactor is neutrinos, not anti-neutrinos. Indeed, the (general) reaction involves two protons combining to form one (heavy) hydrogen atom (i.e. deuterium, which consists of one neutron, one proton and one electron), thereby ejecting one positron (e+) and one (electron) neutrino (ve). In any case, this is not the place to develop the point. I’ll leave that for my next post.]

2. Whether or not you like the story about ‘her’ above, you should have noticed that something we could loosely refer to as ‘degrees of freedom’ is playing a role here:

  1. We know that T-symmetry has not been broken: ‘her’ arrow of time points in the same direction.
  2. Therefore, the combined CP-symmetry of ‘her’ world is broken in the same way as in our world.
  3. If the combined CP-symmetry in ‘her’ world is broken in the same way as in ‘our’ world, the individual C and P symmetries have to be broken in the very same way. In other words, it’s the same world indeed. Not some anti-matter world.

As I am neither a physicist nor a mathematician, and not a philosopher either, please do feel free to correct any logical errors you may identify in this piece. Personally, I feel the logic connecting CP violation and individual C- and P-violation needs further ‘flesh on the bones’, but the core argument is pretty solid I think. 🙂

3. What about the increasing entropy law in this story? What happens to it if we reverse time, charge and parity? Well… Nothing. It will remain valid, as always. So that’s why an actual movie being played backwards with charges and parities reversed will still not make any sense to us: things that are broken don’t repair themselves and, hence, at the system level, there’s another type of irreducible ‘arrow of time’, it seems. But you’ll have to admit that the character of that entropy ‘law’ is very different from these ‘fundamental’ force laws. And then just think about it: isn’t it extremely improbable how we human beings have evolved in this universe? And how we are seemingly capable of understanding ourselves and this universe? We don’t violate the entropy law obviously (on the contrary: we’re obviously messing up our planet), but I feel we do negate it in a way that escapes the kind of logical thinking that underpins the story I wrote above. But such remarks have nothing to do with math or physics and, hence, I will refrain from them.

4. Finally, for those who’d feel like making some kind of ‘feminist’ remark on my use of ‘us’ and ‘her’: the use of ‘her’ is meant to underline the idea of the ‘other’ and, hence, as a male writer, using ‘her’ to underscore that ‘other’ dimension comes naturally and shouldn’t be criticized. The element which could/should bother a female reader of such ‘thought experiments’ is that we seem to assume that the ‘other’ intelligent being is actually somewhat ‘dumber’ than us, because the story above assumes we are actually explaining the experiments of the Wu and Fitch-Cronin teams to ‘her’, instead of the other way around. That’s why I inserted the possibility of ‘her’ pulling a practical joke on us by offering us her left hand: if ‘she’ is equally or even more intelligent than we are, then she’d surely have figured out that there’s no need to be worried about the ‘other’ being made of anti-matter. 🙂

Time reversal and CPT symmetry (II)

Pre-scriptum (dated 26 June 2020): While my posts on symmetries (and why they may or may not be broken) are somewhat mutilated (removal of illustrations and other material) as a result of an attack by the dark force, I am happy to see a lot of it survived more or less intact. While my views on the true nature of light, matter and the force or forces that act on them – all of the stuff that explains symmetries or symmetry-breaking, in other words – have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. 🙂

Original post:

My previous post touched on many topics and, hence, I feel I was not quite able to exhaust the topic of parity violation (let’s just call it mirror asymmetry: that’s more intuitive). Indeed, I was rather casual in stating that:

  1. We have ‘right-handed’ and ‘left-handed’ matter, and they behave differently–at least with respect to the weak force–and, hence, we have some kind of absolute distinction between left and right in the real world.
  2. If ‘right-handed’ matter and ‘left-handed’ matter are not the same, then ‘right-handed’ antimatter and ‘left-handed’ antimatter are not the same either.
  3. CP symmetry connects the two: right-handed matter behaves just like left-handed antimatter, and right-handed antimatter behaves just like left-handed matter.

There are at least two problems with this:

  1. In previous posts, I mentioned the so-called Fitch-Cronin experiment which, back in 1964, provided evidence that ‘Nature’ also violated the combined CP-symmetry. In fact, I should be precise here and say the weak force, instead of ‘Nature’, because all these experiments investigate the behavior of the weak force only. Having said that, it’s true I mentioned this experiment in a very light-hearted manner – too casual really: I just referred to my simple diagrams illustrating what true time reversal entails (a reversal of the forces and, hence, of the charges causing those forces), and that was how I sort of shrugged it all off.
  2. In such a simplistic world view, the question is not so much why the weak force violates mirror symmetry, but why gravity, electromagnetism and the strong force actually respect it!

Indeed, you don’t get a Nobel Prize for stating the obvious and, hence, if Val Fitch and James Cronin got one for that CP-violation experiment, C/P or CP violation cannot be trivial matters.

P-symmetry revisited

So let’s have another look at mirror symmetry–also known as reflection symmetry– by following Feynman’s example: let us actually build a ‘left-hand’ clock, and let’s do it meticulously, as Feynman describes it: “Every time there is a screw with a right-hand thread in one, we use a screw with a left-hand thread in the corresponding place of the other; where one is marked ‘IV’ on the face, we mark a ‘VI’ on the face of the other; each coiled spring is twisted one way in one clock and the other way in the mirror-image clock; when we are all finished, we have two clocks, both physical, which bear to each other the relation of an object and its mirror image, although they are both actual, material objects. Now the question is: If the two clocks are started in the same condition, the springs wound to corresponding tightnesses, will the two clocks tick and go around, forever after, as exact mirror images?”

The answer seems to be obvious: of course they will! Indeed, we do observe that P symmetry is being respected, as shown below:

[Illustration: P symmetry – the two clocks running as exact mirror images.]

You may wonder why we have to go through the trouble of building another clock. Why can’t we just take one of these transparent ‘mystery clocks’, go around it, and watch its hand(s) move standing behind it? The answer is simple: that’s not what mirror symmetry is about. As Feynman puts it: a mirror reflection “turns the whole space inside out.” So it’s not like a simple translation or a rotation of space. Indeed, when we move around the clock to watch it from behind, all we do is rotate our reference frame (with a rotation angle equal to 180 degrees). That’s all. So we just change the orientation of the clock (and, hence, we watch it from behind indeed), but we are not changing left for right and right for left.

Rotational symmetry is a symmetry as well, and the fact that the laws of Nature are invariant under rotation is actually less obvious than you may think (because you’re used to the idea). However, that’s not the point here: rotational symmetry is something else than reflection (mirror) symmetry. Let me make that clear by showing how the clock might run when it would not respect P-symmetry.

[Illustration: P asymmetry – how the mirror-image clock would run if P-symmetry were broken.]

You’ll say: “That’s nonsense.” If we build that mirror-image clock and also wind it up in the ‘other’ direction (‘other’ as compared to our original clock), then the mirror-image clock can’t run that way. Is that nonsense? Nonsensical is actually the word that Wolfgang Pauli used when he heard about Chien-Shiung Wu’s 1956 experiment (i.e. the first experiment that provided solid evidence for the fact that the weak force – in beta decay for instance – does not respect P-symmetry), but so he had to retract his words when repeated beta decay experiments confirmed Wu’s findings.

Of course, the mirror-image clock above (i.e. the one running clockwise) breaks P-symmetry in a very ‘symmetric’ way. In fact, you’ll agree that the hands of that mirror-image clock might actually turn ‘clockwise’ if its machinery would be completely reversible, so we could wind up its springs in the same way as the original clock. But that’s cheating obviously. However, it’s a relevant point and, hence, to be somewhat more precise I should add that Wu’s experiment (and the other beta decay experiments which followed after hers) actually only found a strong bias in the direction of decay: not all of the beta rays (beta rays consist of electrons really – check the illustration in my previous post for more details) went ‘up’ (or ‘down’ in the mirror-reversed arrangement), but most of them did. 

[Illustration: the Wu experiment.]

OK. We got that. Now how do we explain it? The key to explaining the phenomenon observed by Wu and her team, is the spin of the cobalt-60 nuclei or, in the muon decay experiment described in my previous post, the spin of the muons. It’s the spin of these particles that makes them ‘left-handed’ or ‘right-handed’ and the decay direction is (mostly) in the direction of the axial vector that’s associated with the spin direction (this axial vector is the thick black arrow in the illustration below).

[Illustration: the axial vector associated with the spin direction in muon decay.]

Hmm… But we’ve got spinning things in (mechanical) clocks as well, don’t we? Yes. We have flywheels and balance wheels and lots of other spinning stuff in a mechanical clock, but these wheels are not equivalent to spinning muons or other elementary particles: the wheels in a clock preserve and transfer angular momentum.

OK… But… […] But isn’t that what we are talking about here? Angular momentum?

No. Electrons spinning around a nucleus have angular momentum as well – referred to as orbital angular momentum – but it’s not the same thing as spin which, somewhat confusingly, is often referred to as intrinsic angular momentum. In short, we could make a detailed analysis of how our clock and its mirror image actually work, and we would find that all of the axial vectors associated with the flywheels, balance wheels and springs in the clock are effectively reversed in the mirror-image clock. In contrast with the weak decay example, however, their reversed directions actually explain why the mirror-image clock turns counter-clockwise (from our point of view, that is), just like the image of the original clock in the mirror does – and, therefore, why a ‘left-handed’ mechanical clock actually respects P-symmetry, instead of breaking it.

Axial and polar vectors in physics

In physics, we encounter such axial vectors everywhere. They show the axis of spin, and their direction is determined by the direction of spin through one of two conventions: the ‘right-hand screw rule’, or the ‘left-hand screw rule’. Physicists have settled on the former, so let’s work with that for the time being.

The other type of vector is a polar vector. That’s an ‘honest’ vector as Feynman calls it–depicting ‘real’ things such as, for example, a step in space, or some force acting in some direction. The figures below (which I took from Feynman’s Lectures) illustrate the idea (and please do note the care with which Feynman reversed the direction of the arrows above the r and ω in the mirror image):

  1. When mirrored, a polar vector “changes its head, just as the whole space turns inside out.”
  2. An axial vector behaves differently when mirrored. It changes too, but in a very different way: it is usually reversed with respect to the geometry of the whole space, as illustrated in the muon decay image above. However, in the illustration below, that is not the case, because the angular velocity ‘vector’ is not reversed when mirrored. So it’s all quite subtle, and one has to watch carefully what’s going on when we do such mirror reflections.

[Illustration: how axial and polar vectors transform under mirror reflection.]

What’s the third figure about? Well… While it’s not that difficult to visualize all of the axial vectors in a mechanical clock, it’s quite another matter to do the same for electromagnetic forces, and then to explain why these electromagnetic forces also respect mirror symmetry, just like the mechanical clock does. But let me try.

When an electric current goes through a solenoid, the solenoid becomes a magnet, especially when wrapped around an iron core. The direction and strength of the magnetic field are given by the magnetic field vector B, and the force on an electrically charged particle moving through such a magnetic field will be equal to F = qv×B. That’s a so-called vector cross product, and we’ve seen it before: a×b = n·|a|·|b|·sinθ, so we take (1) the magnitudes of a and b, (2) the sine of the angle between them, and (3) the unit vector (n) perpendicular to (the plane containing) a and b; multiply it all; and there we are: that’s the result. But – Hey! Wait a minute! – there are two unit vectors perpendicular to a and b. So how does that work out?

Well… As you might have guessed, there is another right-hand rule here, as shown below.

[Illustration: the right-hand rule for the cross product.]

Now how does that work out for our magnetic field? If we mirror the set-up and let an electron move through the field? Well… Let’s do the math for an electron moving into this screen, so in the direction that you are watching.

In the first set-up, the B vector points upwards and, hence, the electron will deviate in the direction given by that cross product above: qv×B. In other words, it will move sideways as it moves away from you, into the field. In which direction? Well… Just turn that hand above about 90 degrees and you have the answer: right. Oh… No. It’s left, because q is negative. Right.
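If you want to check the math rather than your fingers, here’s a tiny numerical verification (a sketch: the coordinates are my own choice – x to the right, y up, z pointing out of the screen towards you):

```python
def cross(a, b):
    """The right-hand-rule cross product of two 3-vectors."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

q = -1            # the electron's charge (in units of e)
v = (0, 0, -1)    # moving into the screen, away from you
B = (0, 1, 0)     # the field points upwards

F = tuple(q * c for c in cross(v, B))
print(F)          # (-1, 0, 0): the force points left, as argued above
```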

In the mirror-image set-up, we have a B’ vector pointing in the opposite direction so… Hey ! Mirror symmetry is not being respected, is it?

Well… No. Remember that we must change everything, including our conventions, so the ‘right-hand rules’ above become ‘left-hand rules’, as shown below for example. Surely you’re joking, Mr. Feynman!

[Illustration: the screw rules under a parity transformation – the right-hand rule becomes a left-hand rule.]

Well… No. F and v are polar vectors and, hence, “their head might change, just as the whole space turns inside out”, but that’s not the case now, because they’re parallel to the mirror. In short, the force F on the electron will still be the same: it will deviate leftwards. I tried to draw that below, but it’s hard to make that red line look like it’s a line going away from you.

[Illustration: the electron still deviates leftwards in the mirror-image set-up.]

But that can’t be true, you’ll say. The field lines go from north to south, and so we have that B’ vector pointing downwards now.

No, we don’t. Or… Well… Yes. It all depends on our conventions. 🙂  

Feynman’s switch to ‘left-hand rules’ also involves renaming the magnetic poles, so all magnetic north poles are now referred to as ‘south’ poles, and all magnetic south poles are now referred to as ‘north’ poles, and so that’s why he has a B’ vector pointing downwards. Hence, he does not change the convention that magnetic field lines go from north to south, but his ‘north’ pole (in the mirror-image drawing) is actually a ‘south’ pole. Capito? 🙂

[…] OK. Let me try to explain it once again. In reality, it does not matter whether or not a solenoid is wound clockwise or counterclockwise (or, to use the terminology introduced above, whether our solenoid is left-handed or right-handed). The important thing is that the current through the solenoid flows from the top to the bottom. We can only reverse the poles – in reality – if we reverse the electric current, but so we don’t do that in our mirror-image set-up. Therefore, the force F on our charged particle will not change, and B’ is an axial vector alright but this axial vector does not represent the actual magnetic field.

[…] But… If we change these conventions, it should represent the magnetic field, shouldn’t it? And how do we calculate that force then?

OK. If you insist. Here we go:

  1. So we change ‘right’ to ‘left’ and ‘left’ to ‘right’, and our cross-product rule becomes a ‘left-hand’ rule.
  2. But our electrons still go from ‘top’ to ‘bottom’. Hence, the (magnetic) force on a charged particle won’t change.
  3. But if the result has to be the same, then B needs to become –B, or so that’s B’ in our ‘left-handed’ coordinate system.
  4. We can now calculate F using the ‘left-handed’ cross-product rule and – because we did not change the convention that field lines go from north to south – we’ll also rename our poles.
  5. Yippee ! All comes out all right: our electron goes left. Sorry. Right. Huh? Yes. Because we’ve agreed to replace ‘left’ by ‘right’, remember? 🙂

[…]
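For those who prefer numbers over words: the whole bookkeeping exercise above can be checked in a few lines. A ‘left-hand rule’ cross product is just the negative of the right-handed one, so reversing B and the handedness of the rule at the same time leaves the force untouched (a sketch, using the same toy numbers as before):

```python
def cross_right(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def cross_left(a, b):
    # the 'left-hand rule' picks the opposite perpendicular unit vector
    return tuple(-c for c in cross_right(a, b))

q, v, B = -1, (0, 0, -1), (0, 1, 0)
B_prime = (0, -1, 0)                               # renamed poles: B' = -B

F1 = tuple(q * c for c in cross_right(v, B))       # original conventions
F2 = tuple(q * c for c in cross_left(v, B_prime))  # mirrored conventions

print(F1, F2)     # identical: the physics does not change
```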

If you didn’t get any of this, don’t worry. There is actually a much more comprehensible illustration of the mirror symmetry of electromagnetic forces. If we hang two wires next to each other, as below, and send a current through them, they will attract if the two currents are in the same direction, and they will repel when the currents are opposite. However, it doesn’t matter whether the current goes from left to right or from right to left. As long as the two currents have the same direction (left or right), it’s fine: there will be attraction. That’s all it takes to demonstrate P-symmetry for electromagnetism.

[Illustration: two parallel wires attracting each other when carrying currents in the same direction.]
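In formula form: the force per unit length between two long parallel wires is F/L = μ0·I1·I2/(2πd), so the sign of the force depends on the product I1·I2 only – flip both currents and nothing changes. A minimal numerical check (the formula is the standard one; the numbers are arbitrary):

```python
import math

MU_0 = 4e-7 * math.pi        # the magnetic constant (T·m/A)

def force_per_length(i1, i2, d):
    """Force per meter between two parallel wires; > 0 means attraction."""
    return MU_0 * i1 * i2 / (2 * math.pi * d)

print(force_per_length(10, 10, 0.01))    # both currents one way: attraction
print(force_per_length(-10, -10, 0.01))  # both reversed: the same attraction
print(force_per_length(10, -10, 0.01))   # opposite currents: repulsion (< 0)
```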

The Fitch-Cronin experiment

I guess I caused an awful lot of confusion above. Just forget about it all and take one single message home: the electromagnetic force does not care about the axial vector of spinning particles, but the weak force does.

Is that shocking?

No. There are plenty of examples in the real world showing that the direction of ‘spin’ does matter. For instance, to unlock a right-hinged door, you turn the key to the right (i.e. clockwise). The other direction doesn’t work. While I am sure physicists won’t like such simplistic statements, I think that accepting that Nature has similar ‘left-handed’ and ‘right-handed’ mechanisms is not the kind of theoretical disaster that Wolfgang Pauli thought it was. If anything, we just should marvel at the fact that gravity, electromagnetism and the strong force are P- and C-symmetric indeed, and further investigate why the weak force does not have such nice symmetries. Indeed, it respects the combined CPT symmetry, but that amounts to saying that our world sort of makes sense, so that ain’t much.

In short, our understanding of that weak force is probably messy and, as Feynman points out: “At the present level of understanding, you can still see the “seams” in the theories; they have not yet been smoothed out so that the connection becomes more beautiful and, therefore, probably more correct.” (QED, 1985, p. 142). However, let’s stop complaining about our ‘limited understanding’ and so let’s work with what we do understand right now. Hence, let’s have a look at that Fitch-Cronin experiment now and see how ‘weird’ or, on the contrary, how ‘understandable’ it actually is.

To situate the Fitch-Cronin experiment, we first need to say something more about that larger family of mesons, of which the kaons are just one branch. In fact, in case you’re not interested in this story as such, I’d suggest you just read it as a very short introduction to the Standard Model, as it gives a nice short overview of all matter-particles – which is always useful, I’d think.

Hadrons, mesons and baryons

You may or may not remember that mesons are unstable particles consisting of one quark and one anti-quark (so mesons consist of two quarks, but one of them should be an anti-quark). As such, mesons are to be distinguished from the ‘other’ group within the larger group of hadrons, i.e. the baryons, which are made of three quarks. [The term ‘hadrons’ itself is nothing but a catch-all for all particles consisting of quarks.]

The most prominent representatives of the baryon family are the (stable) neutron and proton, i.e. the nucleons, which consist of u and d quarks. However, there are unstable baryons as well. These unstable baryons involve the heavier (second-generation) c or s quarks, or the super-heavy (third-generation) b quark. [As for the top quark (t), that’s so high-energy (and, hence, so short-lived) that baryons made of a t quark (so-called ‘top-baryons’) are not expected to exist but, then, who knows really?]

But kaons are mesons, and so I won’t say anything more about baryons. The two illustrations below should be sufficient to situate the discussion.

[Illustration: a first classification of particles.]

[Illustration: the Standard Model of elementary particles.]

Kaons are just one branch of the meson family. There are, for instance, heavier versions of the kaons, referred to as B- and D-mesons. Let me quickly introduce these:

  1. The ‘B’ in ‘B-meson’ refers to the fact that one of the quarks in a B-meson is a b-quark: a b (bottom) quark is a much heavier (third-generation) version of the (second-generation) s-quark.
  2. As for the ‘D’ in D-meson, I have no idea. D-mesons will always consist of a c-quark or anti-quark, combined with a lighter d, u or s (anti-)quark, but so there’s no obvious relationship between a D-meson and a d-quark. Sorry.
  3. If you look at the quark table above, you’ll wonder whether there are any top-mesons, i.e. mesons consisting of a t quark or anti-quark. The answer to that question seems to be negative: t quarks disintegrate too fast, it is said. [So that resembles the remark on the possibility of t-baryons.] If you’d google a bit on this, you’ll find that, in essence, we haven’t found any t-mesons as yet, but their potential existence should not be excluded.

Anything else? Yes. There’s a lot more around actually. Besides (1) kaons, (2) B-mesons and (3) D-mesons, we also have (4) pions (i.e. a combination of a u and a d, or their anti-matter counterparts), (5) rho-mesons (ρ-mesons can be thought of as excited (higher-energy) pions), and (6) eta-mesons (η-mesons are a rapidly decaying mixture of u, d and s quarks or their anti-matter counterparts), as well as a whole bunch of (temporary) particles consisting of a quark and its own anti-matter counterpart, notably the (7) phi (a φ consists of an s and an anti-s), psi (a ψ consists of a c and an anti-c) and upsilon (an ϒ consists of a b and an anti-b) particles (so all these particles are their own anti-particles).

So it’s quite a zoo indeed, but let’s zoom in on those ‘crazy’ kaons. [‘Crazy kaons’ is the epithet that Gerard ‘t Hooft reserved for them in his In Search of the Ultimate Building Blocks (1996).] What are they really? 

Crazy kaons

Kaons, also known as K-mesons, are, first of all, mesons, i.e. particles made of one quark and one anti-quark (as opposed to baryons, which are made of three quarks, e.g. protons and neutrons). All mesons are unstable: at best, they last a few hundredths of a microsecond, and most kaons have much shorter lifetimes than that. Where do we find them? We usually create them in particle colliders and other sophisticated machinery (the Fitch-Cronin experiment used kaon beams), but we can also find them as a decay product in (secondary) cosmic rays (cosmic rays consist of very high-energy particles, and they produce ‘showers’ of secondary particles as they hit our atmosphere).

They come in three varieties: neutral and positively or negatively charged, so we have a K0, a K+, and a K, in principle that is (the story will become more complicated later). What they have in common is that one of the quarks is the rather heavy s-quark (s stands for ‘strange’ but you know what Feynman – and others – think of that name: it’s just a strange name indeed, and so don’t worry too much about it). An s-quark is a so-called second-generation matter-particle and that’s why the kaon is unstable: all second-generation matter-particles are unstable. The second quark is just an ordinary u- or d-quark, i.e. the type of quark you’d find in the (stable) proton or neutron.

But what about the electric charge? Well… I should be complete. The quarks might be anti-quarks as well. That’s nothing to worry about, as you’ll remember: anti-matter is just matter with the charges reversed. So a K0 consists of an s quark and an anti-d quark or – and this is the key to understanding the experiment actually – a K0 can also consist of an anti-s quark and a (normal) d quark. Note that the s and d quarks have a charge of –1/3 (and their anti-quarks a charge of +1/3), so the total charge comes out alright: zero. [As for the other kaons, a K+ consists of a u and an anti-s quark (the u quark has charge +2/3, and so we have +1 as the total charge), and the K– consists of an anti-u and an s quark (and, hence, we have –1 as the charge), but we actually don’t need them any more for our story.]
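If you want to double-check the charge arithmetic, a few lines suffice (quark charges in units of e; an anti-quark simply flips the sign):

```python
charge = {'u': 2/3, 'd': -1/3, 's': -1/3}   # quark charges in units of e

def meson_charge(quark, antiquark):
    # an anti-quark carries the opposite charge of the corresponding quark
    return charge[quark] - charge[antiquark]

print(meson_charge('d', 's'))   # K0:  -1/3 + 1/3 =  0
print(meson_charge('u', 's'))   # K+:   2/3 + 1/3 = +1
print(meson_charge('s', 'u'))   # K-:  -1/3 - 2/3 = -1
```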

So that’s simple enough. Well… No. Unfortunately, the story is, indeed, more complicated than that. The actual kaons in a neutral kaon beam come in two varieties that are a mix of the two above-mentioned neutral K states: a K-short (KS) has a lifetime of about 9×10–11 s, while a K-long (KL) has a lifetime of about 5.2×10–8 s. Hence, at the end of the beam, we’re sure to find KL kaons only.

Huh? A mix of two particle states… You’re talking superposition here? Well… Yes. Sort of. In fact, as for what KL and KS actually are, that’s a long and complex story involving what is referred to as a neutral particle oscillation process. In essence, neutral particle oscillation occurs when a (neutral) particle and its antiparticle are different but decay into the same final state. It is then possible for the decay and its time-reversed process to contribute to oscillations that turn the one into the other, and vice versa, so we can write A → Δ → B → Δ → A → etcetera, where A is the particle, B is the antiparticle, and Δ is the common set of particles into which both can decay. So there’s an oscillation phenomenon from one state to the other here, and all the things I noted about interference obviously come into play.

In any case, to make a very long and complicated story short, I’ll summarize it as follows: if CP symmetry holds, then one can show that this oscillation process should result in a very clear-cut situation: a mixed beam of long-lived and short-lived kaons, i.e. a mix of KL and KS. Both decay differently: a K-short particle decays into two pions only, while a K-long particle decays into three pions.
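For the record – and with the caveat that sign and phase conventions differ between textbooks – these two beam states are usually written as superpositions of the two neutral quark states mentioned above. Choosing phases such that CP(K0) = K̄0, we have KS ≈ (K0 + K̄0)/√2, with CP eigenvalue +1 (hence the two-pion decay), and KL ≈ (K0 – K̄0)/√2, with CP eigenvalue –1 (hence the three-pion decay).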

That is illustrated below: at the end of the 17.4 m beam, one should only see three-pion decay events. However, that’s not what Fitch and Cronin measured: they actually saw one two-pion decay event for every 500 decay events (on average, that is)! [I introduced the pion species in the more general discussion on mesons above: you’ll remember they consist of first-generation quarks only. But don’t worry about it: just note that the K-long and K-short particles decay differently. And don’t be confused by the π notation below: it has nothing to do with a circle or so – 2π just means two pions.]

[Illustration: kaon beam]

That means that the kaon decay processes involved do not observe the assumed CP symmetry and, because it’s the weak force that’s causing those decays, that the weak force itself does not respect CP symmetry.

Why is that so?

You may object that these lifetimes are just averages and, hence, that perhaps we see these two-pion decays at the end of the beam because some of the K-short particles actually survived much longer!

No. That’s to be ruled out: the short-lived particles cannot be observable more than a few centimeters down the beam line. To show that, one can calculate the time required for the population of K-short particles to drop to 1/500 of its original size. With the stated lifetime (9×10–11 s), that gives a time of about 5.6×10–10 seconds (t = τ·ln 500 ≈ 6.2τ). At nearly the speed of light, this corresponds to a distance of about 17 centimeters, and so that’s only about 1/100 of the length of Cronin and Fitch’s beam tube.

But what about the fact that particles live longer when they’re going fast? You are right: the number above ignores relativistic time dilation. The lifetime as seen in the laboratory frame is, indeed, ‘dilated’ by the relativity factor γ, and at 0.98c (i.e. the speed of these kaons), γ ≈ 5, so this time dilation effect is very substantial. However, re-calculating the distance just gives a revised distance equal to 17γ cm, i.e. about 85 cm. Hence, even with kaons speeding at 0.98c, the population would be down by a factor of 500 by the time they got a meter down the beam tube. So, for any realistic particle velocity, all of these K-short particles should have decayed long before they get to the end of the beam line.
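
If you want to check those numbers yourself, here’s a quick back-of-the-envelope script – just the arithmetic above, nothing more, using the lifetime and speed quoted in the text:

```python
import math

c = 3.0e8          # speed of light (m/s)
tau_KS = 9e-11     # K-short mean lifetime in its rest frame (s)
beta = 0.98        # kaon speed as a fraction of c
L_beam = 17.4      # length of the Cronin-Fitch beam tube (m)

# Time for an exponentially decaying population to drop to 1/500.
t_500 = tau_KS * math.log(500)

# Relativity factor, and the distance traveled with and without time dilation.
gamma = 1 / math.sqrt(1 - beta**2)
d_naive = beta * c * t_500
d_dilated = gamma * d_naive

print(f"gamma = {gamma:.2f}")                                 # ~5.03
print(f"1/500 distance (no dilation):   {d_naive:.2f} m")     # ~0.17 m
print(f"1/500 distance (with dilation): {d_dilated:.2f} m")   # ~0.83 m
print(f"beam length / dilated distance: {L_beam/d_dilated:.0f}x")
```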

Fitch and Cronin did not see that, however: they saw one two-pion decay event for every 500 decay events, i.e. two per thousand (0.2%), and, hence, that is very significant. While the reasoning is complex (these oscillations and the quantum-mechanical calculations involved are not easy to understand), the result clearly shows that the kaon decay process does not observe CP symmetry.

OK. So what? How does this violate charge and parity symmetry? Well… That’s a complicated story, which involves a deeper understanding of how the initial and final states of such processes incorporate CP values, and then showing how these values differ. That’s a story that requires a master’s degree in physics, I must assume, and I don’t have one. But I can sort of sense the point, and I would suggest we just accept it here. [To be precise, the Fitch–Cronin experiment is an indirect ‘proof’ of CP violation only: as mentioned below, it was only in 1999 that experiments were able to demonstrate direct CP violation.]

OK. So what? Do we see it somewhere else? Well… Fitch and Cronin got a Nobel Prize for this only sixteen years later, i.e. in 1980, and then it took researchers another twenty years to find CP violation in some other process. To be very precise, only in 1999 (i.e. 35 years after the Fitch–Cronin findings) could Fermilab and CERN conclude a series of experiments demonstrating direct CP violation in (neutral) kaon decay processes (as mentioned above, the Fitch–Cronin experiment only shows indirect CP violation). That then set the stage for a ‘new’ generation of experiments involving B-mesons and D-mesons, i.e. mesons consisting of even heavier quarks (b or c quarks), so these are things that are even less stable than kaons. So… Well… Perhaps you’re right. There are not all that many examples really.

Aha! So what?

Well… Nothing. That’s it. These ‘broken symmetries’ exist, without any doubt, but – you’re right – they seem to be a marginal phenomenon in Nature. I’ll just conclude by quoting Feynman once again (Vol. I, section 52-9):

“The marvelous thing about it all is that for such a wide range of important phenomena–nuclear forces, electrical phenomena, and gravitation–over a tremendous range of physics, all the laws for these seem to be symmetrical. On the other hand, this little extra piece says, “No, the laws are not symmetrical!” How is it that Nature can be almost symmetrical, but not perfectly symmetrical? […] No one has any idea why. […] Perhaps God made the laws only nearly symmetrical so that we should not be jealous of His perfection.”

Hmm… That’s the last line of the first volume of his Lectures (there are three of them), and so that should end the story really.

However, I would personally not like to involve God in such discussions. When all is said and done, we are talking about atomic decay processes here. Now, I’ve already said that I am not a physicist (my only ambition is to understand some of what they are investigating), but I cannot accept that these decay processes are entirely random. I am not saying there are some ‘hidden variables’ here. No. That would amount to challenging the Copenhagen interpretation of quantum mechanics, which I won’t.

But when it comes to the weak force, I’ve got a feeling that neutrino physics may provide the answer: the Earth is being bombarded with neutrinos, and their ‘intrinsic parity’ is all the same: all of them are left-handed. In fact, that’s why weak interactions that emit neutrinos or antineutrinos violate P-symmetry! It’s a very primitive statement – and not backed up by anything I have read so far – but I’ve got a feeling that the weak force does not only involve the emission of neutrinos or antineutrinos: I think they enter the equation as well.

That’s a preposterous and totally random statement, you’ll say.

Yes. […] But I feel I am onto something, and I’ll explore it as well as I can – if only to find out why I am so damn wrong. I can only say that, if and when neutrino physics allows us to tentatively confirm this random and completely uninformed hypothesis, we would have an explanation that is much more in line with the answers astrophysicists give to questions related to other observable asymmetries, such as, for example, the imbalance between matter and anti-matter.

However, I know that I am just babbling now, and that nobody takes this seriously anyway and, hence, I will conclude my series on CPT symmetry right here and now. 🙂
