The reality of the wavefunction

If you haven’t read any of my previous posts on the geometry of the wavefunction (this link goes to the most recent one of them), then don’t attempt to read this one. It brings too much stuff together to be comprehensible. In fact, I am not even sure if I am going to understand what I write myself. 🙂 [OK. Poor joke. Acknowledged.]

Just to recap the essentials, I part ways with mainstream physicists in regard to the interpretation of the wavefunction. For mainstream physicists, the wavefunction is just some mathematical construct. Nothing real. Of course, I acknowledge mainstream physicists have very good reasons for that, but… Well… I believe that, if there is interference, or diffraction, then something must be interfering, or something must be diffracting. I won’t dwell on this because… Well… I have done that too many times already. My hypothesis is that the wavefunction is, in effect, a rotating field vector, so it’s just like the electric field vector of a (circularly polarized) electromagnetic wave (illustrated below).

Of course, it must be different, and it is. First, the (physical) dimension of the field vector of the matter-wave must be different. So what is it? Well… I am tempted to associate the real and imaginary component of the wavefunction with a force per unit mass (as opposed to the force per unit charge dimension of the electric field vector). Of course, the newton/kg dimension reduces to the dimension of acceleration (m/s2), so that’s the dimension of a gravitational field.

Second, I also am tempted to think that this gravitational disturbance causes an electron (or any matter-particle) to move about some center, and I believe it does so at the speed of light. In contrast, electromagnetic waves do not involve any mass: they’re just an oscillating field. Nothing more. Nothing less. Why would I believe there must still be some pointlike particle involved? Well… As Feynman puts it: “When you do find the electron some place, the entire charge is there.” (Feynman’s Lectures, III-21-4) So… Well… That’s why.

The third difference is one that I thought of only recently: the plane of the oscillation cannot be perpendicular to the direction of motion of our electron, because then we can’t explain the direction of its magnetic moment, which is either up or down when traveling through a Stern-Gerlach apparatus. I am more explicit on that in the mentioned post, so you may want to check there. 🙂

I wish I mastered the software to make animations such as the one above (for which I have to credit Wikipedia), but so I don’t. You’ll just have to imagine it. That’s great mental exercise, so… Well… Just try it. 🙂

Let’s now think about rotating reference frames and transformations. If the z-direction is the direction along which we measure the angular momentum (or the magnetic moment), then the up-direction will be the positive z-direction. We’ll also assume the y-direction is the direction of travel of our elementary particle—and let’s just consider an electron here so we’re more real. 🙂 So we’re in the reference frame that Feynman used to derive the transformation matrices for spin-1/2 particles (or for two-state systems in general). His ‘improved’ Stern-Gerlach apparatus—which I’ll refer to as a beam splitter—illustrates this geometry.

Modified Stern-Gerlach

So I think the magnetic moment—or the angular momentum, really—comes from an oscillatory motion in the x– and y-directions. One is the real component (the cosine function) and the other is the imaginary component (the sine function), as illustrated below. Circle_cos_sin

So the crucial difference with the animations above (which illustrate left- and a right-handed polarization respectively) is that we, somehow, need to imagine the circular motion is not in the xz-plane, but in the yz-plane. Now what happens if we change the reference frame?

Well… That depends on what you mean by changing the reference frame. Suppose we’re looking in the positive y-direction—so that’s the direction in which our particle is moving—, then we might imagine how it would look like when we would make a 180° turn and look at the situation from the other side, so to speak. Now, I did a post on that earlier this year, which you may want to re-read. When we’re looking at the same thing from the other side (from the back side, so to speak), we will want to use our familiar reference frame. So we will want to keep the z-axis as it is (pointing upwards), and we will also want to define the x– and y-axis using the familiar right-hand rule for defining a coordinate frame. So our new x-axis and our new y-axis will the same as the old x- and y-axes but with the sign reversed. In short, we’ll have the following mini-transformation: (1) z‘ = z, (2) x’ = −x, and (3) y’ = −y.

So… Well… If we’re effectively looking at something real that was moving along the y-axis, then it will now still be moving along the y’-axis, but in the negative direction. Hence, our elementary wavefunction eiθ = cosθ + i·sinθ will transform into −cosθ − i·sinθ = −cosθ − i·sinθ = cosθ − i·sinθ. It’s the same wavefunction. We just… Well… We just changed our reference frame. We didn’t change reality.

Now you’ll cry wolf, of course, because we just went through all that transformational stuff in our last post. To be specific, we presented the following transformation matrix for a rotation along the z-axis:rotation matrix

Now, if φ is equal to 180° (so that’s π in radians), then these eiφ/2 and eiφ/2/√2 factors are equal to eiπ/2 = +i and eiπ/2 = −i respectively. Hence, our eiθ = cosθ + i·sinθ becomes…

Hey ! Wait a minute ! We’re talking about two very different things here, right? The eiθ = cosθ + i·sinθ is an elementary wavefunction which, we presume, describes some real-life particle—we talked about an electron with its spin in the up-direction—while these transformation matrices are to be applied to amplitudes describing… Well… Either an up– or a down-state, right?

Right. But… Well… Is it so different, really? Suppose our eiθ = cosθ + i·sinθ wavefunction describes an up-electron, then we still have to apply that eiφ/2 = eiπ/2 = +i factor, right? So we get a new wavefunction that will be equal to eiφ/2·eiθ = eiπ/2·eiθ = +i·eiθ = i·cosθ + i2·sinθ = sinθ − i·cosθ, right? So how can we reconcile that with the cosθ − i·sinθ function we thought we’d find?

We can’t. So… Well… Either my theory is wrong or… Well… Feynman can’t be wrong, can he? I mean… It’s not only Feynman here. We’re talking all mainstream physicists here, right?

Right. But think of it. Our electron in that thought experiment does, effectively, make a turn of 180°, so it is going in the other direction now ! That’s more than just… Well… Going around the apparatus and looking at stuff from the other side.

Hmm… Interesting. Let’s think about the difference between the sinθ − i·cosθ and cosθ − i·sinθ functions. First, note that they will give us the same probabilities: the square of the absolute value of both complex numbers is the same. [It’s equal to 1 because we didn’t bother to put a coefficient in front.] Secondly, we should note that the sine and cosine functions are essentially the same. They just differ by a phase factor: cosθ = sin(θ + π/2) and −sinθ = cos(θ + π/2). Let’s see what we can do with that. We can write the following, for example:

sinθ − i·cosθ = −cos(θ + π/2) − i·sin(θ + π/2) = −[cos(θ + π/2) + i·sin(θ + π/2)] = −ei·(θ + π/2)

Well… I guess that’s something at least ! The ei·θ and −ei·(θ + π/2) functions differ by a phase shift and a minus sign so… Well… That’s what it takes to reverse the direction of an electron. 🙂 Let us mull over that in the coming days. As I mentioned, these more philosophical topics are not easily exhausted. 🙂

Transforming amplitudes for spin-1/2 particles

Some say it is not possible to fully understand quantum-mechanical spin. Now, I do agree it is difficult, but I do not believe it is impossible. That’s why I wrote so many posts on it. Most of these focused on elaborating how the classical view of how a rotating charge precesses in a magnetic field might translate into the weird world of quantum mechanics. Others were more focused on the corollary of the quantization of the angular momentum, which is that, in the quantum-mechanical world, the angular momentum is never quite all in one direction only—so that explains some of the seemingly inexplicable randomness in particle behavior.

Frankly, I think those explanations help us quite a bit already but… Well… We need to go the extra mile, right? In fact, that’s drives my search for a geometric (or physical) interpretation of the wavefunction: the extra mile. 🙂

Now, in one of these many posts on spin and angular momentum, I advise my readers – you, that is – to try to work yourself through Feynman’s 6th Lecture on quantum mechanics, which is highly abstract and, therefore, usually skipped. [Feynman himself told his students to skip it, so I am sure that’s what they did.] However, if we believe the physical (or geometric) interpretation of the wavefunction that we presented in previous posts is, somehow, true, then we need to relate it to the abstract math of these so-called transformations between representations. That’s what we’re going to try to do here. It’s going to be just a start, and I will probably end up doing several posts on this but… Well… We do have to start somewhere, right? So let’s see where we get today. 🙂

The thought experiment that Feynman uses throughout his Lecture makes use of what Feynman’s refers to as modified or improved Stern-Gerlach apparatuses. They allow us to prepare a pure state or, alternatively, as Feynman puts it, to analyze a state. In theory, that is. The illustration below present a side and top view of such apparatus. We may already note that the apparatus itself—or, to be precise, our perspective of it—gives us two directions: (1) the up direction, so that’s the positive direction of the z-axis, and (2) the direction of travel of our particle, which coincides with the positive direction of the y-axis. [This is obvious and, at the same time, not so obvious, but I’ll talk about that in my next post. In this one, we basically need to work ourselves through the math, so we don’t want to think too much about philosophical stuff.]

Modified Stern-Gerlach

The kind of questions we want to answer in this post are variants of the following basic one: if a spin-1/2 particle (let’s think of an electron here, even if the Stern-Gerlach experiment is usually done with an atomic beam) was prepared in a given condition by one apparatus S, say the +S state, what is the probability (or the amplitude) that it will get through a second apparatus T if that was set to filter out the +T state?

The result will, of course, depend on the angles between the two apparatuses S and T, as illustrated below. [Just to respect copyright, I should explicitly note here that all illustrations are taken from the mentioned Lecture, and that the line of reasoning sticks close to Feynman’s treatment of the matter too.]

basic set-up

We should make a few remarks here. First, this thought experiment assumes our particle doesn’t get lost. That’s obvious but… Well… If you haven’t thought about this possibility, I suspect you will at some point in time. So we do assume that, somehow, this particle makes a turn. It’s an important point because… Well… Feynman’s argument—who, remember, represents mainstream physics—somehow assumes that doesn’t really matter. It’s the same particle, right? It just took a turn, so it’s going in some other direction. That’s all, right? Hmm… That’s where I part ways with mainstream physics: the transformation matrices for the amplitudes that we’ll find here describe something real, I think. It’s not just perspective: something happened to the electron. That something does not only change the amplitudes but… Well… It describes a different electron. It describes an electron that goes in a different direction now. But… Well… As said, these are reflections I will further develop in my next post. 🙂 Let’s focus on the math here. The philosophy will follow later. 🙂 Next remark.

Second, we assume the (a) and (b) illustrations above represent the same physical reality because the relative orientation between the two apparatuses, as measured by the angle α, is the same. Now that is obvious, you’ll say, but, as Feynman notes, we can only make that assumption because experiments effectively confirm that spacetime is, effectively, isotropic. In other words, there is no aether allowing us to establish some sense of absolute direction. Directions are relativerelative to the observer, that is… But… Well… Again, in my next post, I’ll argue that it’s not because directions are relative that they are, somehow, not real. Indeed, in my humble opinion, it does matter whether an electron goes here or, alternatively, there. These two different directions are not just two different coordinate frames. But… Well… Again. The philosophy will follow later. We need to stay focused on the math here.

Third and final remark. This one is actually very tricky. In his argument, Feynman also assumes the two set-ups below are, somehow, equivalent.

equivalent set-up

You’ll say: Huh? If not, say it! Huh? 🙂 Yes. Good. Huh? Feynman writes equivalentnot the same because… Well… They’re not the same, obviously:

  1. In the first set-up (a), T is wide open, so the apparatus is not supposed to do anything with the beam: it just splits and re-combines it.
  2. In set-up (b) the T apparatus is, quite simply, not there, so… Well… Again. Nothing is supposed to happen with our particles as they come out of S and travel to U.

The fundamental idea here is that our spin-1/2 particle (again, think of an electron here) enters apparatus U in the same state as it left apparatus S. In both set-ups, that is! Now that is a very tricky assumption, because… Well… While the net turn of our electron is the same, it is quite obvious it has to take two turns to get to U in (a), while it only takes one turn in (b). And so… Well… You can probably think of other differences too. So… Yes. And no. Same-same but different, right? 🙂

Right. That is why Feynman goes out of his way to explain the nitty-gritty behind: he actually devotes a full page in small print on this, which I’ll try to summarize in just a few paragraphs here. [And, yes, you should check my summary against Feynman’s actual writing on this.] It’s like this. While traveling through apparatus T in set-up (a), time goes by and, therefore, the amplitude would be different by some phase factor δ. [Feynman doesn’t say anything about this, but… Well… In the particle’s own frame of reference, this phase factor depend on the energy, the momentum and the time and distance traveled. Think of the argument of the elementary wavefunction here: θ = (E∙t – px)/ħ).] Now, if we believe that the amplitude is just some mathematical construct—so that’s what mainstream physicists (not me!) believe—then we could effectively say that the physics of (a) and (b) are the same, as Feynman does. In fact, let me quote him here:

“The physics of set-up (a) and (b) should be the same but the amplitudes could be different by some phase factor without changing the result of any calculation about the real world.”

Hmm… It’s one of those mysterious short passages where we’d all like geniuses like Feynman (or Einstein, or whomever) to be more explicit on their world view: if the amplitudes are different, can the physics really be the same? I mean… Exactly the same? It all boils down to that unfathomable belief that, somehow, the particle is real but the wavefunction that describes it, is not. Of course, I admit that it’s true that choosing another zero point for the time variable would also change all amplitudes by a common phase factor and… Well… That’s something that I consider to be not real. But… Well… The time and distance traveled in the apparatus is the time and distance traveled in the apparatus, right?

Bon… I have to stay away from these questions as for now—we need to move on with the math here—but I will come back to it later. But… Well… Talking math, I should note a very interesting mathematical point here. We have these transformation matrices for amplitudes, right? Well… Not yet. In fact, the coefficient of these matrices are exactly what we’re going to try to derive in this post, but… Well… Let’s assume we know them already. 🙂 So we have a 2-by-2 matrix to go from S to T, from T to U, and then one to go from S to U without going through T, which we can write as RSTRTU,  and RSU respectively. Adding the subscripts for the base states in each representation, the equivalence between the (a) and (b) situations can then be captured by the following formula:

phase factor

So we have that phase factor here: the left- and right-hand side of this equation is, effectively, same-same but different, as they would say in Asia. 🙂 Now, Feynman develops a beautiful mathematical argument to show that the eiδ factor effectively disappears if we convert our rotation matrices to some rather special form that is defined as follows:


I won’t copy his argument here, but I’d recommend you go over it because it is wonderfully easy to follow and very intriguing at the same time. [Yes. Simple things can be very intriguing.] Indeed, the calculation below shows that the determinant of these special rotation matrices will be equal to 1.

det is one

So… Well… So what? You’re right. I am being sidetracked here. The point is that, if we put all of our rotation matrices in this special form, the eiδ factor vanishes and the formula above reduces to:

reduced formula

So… Yes. End of excursion. Let us remind ourselves of what it is that we are trying to do here. As mentioned above, the kind of questions we want to answer will be variants of the following basic one: if a spin-1/2 particle was prepared in a given condition by one apparatus (S), say the +S state, what is the probability (or the amplitude) that it will get through a second apparatus (T) if that was set to filter out the +T state?

We said the result would depend on the angles between the two apparatuses S and T. I wrote: angles—plural. Why? Because a rotation will generally be described by the three so-called Euler angles:  α, β and γ. Now, it is easy to make a mistake here, because there is a sequence to these so-called elemental rotations—and right-hand rules, of course—but I will let you figure that out. 🙂

The basic idea is the following: if we can work out the transformation matrices for each of these elemental rotations, then we can combine them and find the transformation matrix for any rotation. So… Well… That fills most of Feynman’s Lecture on this, so we don’t want to copy all that. We’ll limit ourselves to the logic for a rotation about the z-axis, and then… Well… You’ll see. 🙂

So… The z-axis… We take that to be the direction along which we are measuring the angular momentum of our electron, so that’s the direction of the (magnetic) field gradient, so that’s the up-axis of the apparatus. In the illustration below, that direction points out of the page, so to speak, because it is perpendicular to the direction of the x– and the y-axis that are shown. Note that the y-axis is the initial direction of our beam.

rotation about z

Now, because the (physical) orientation of the fields and the field gradients of S and T is the same, Feynman says that—despite the angle—the probability for a particle to be up or down with regard to and T respectively should be the same. Well… Let’s be fair. He does not only say that: experiment shows it to be true. [Again, I am tempted to interject here that it is not because the probabilities for (a) and (b) are the same, that the reality of (a) and (b) is the same, but… Well… You get me. That’s for the next post. Let’s get back to the lesson here.] The probability is, of course, the square of the absolute value of the amplitude, which we will denote as C+C, C’+, and C’ respectively. Hence, we can write the following:

same probabilities

Now, the absolute values (or the magnitudes) are the same, but the amplitudes may differ. In fact, they must be different by some phase factor because, otherwise, we would not be able to distinguish the two situations, which are obviously different. As Feynman, finally, admits himself—jokingly or seriously: “There must be some way for a particle to know that it has turned the corner at P1.” [P1 is the midway point between and in the illustration, of course—not some probability.]

So… Well… We write:

C’+ = eiλ ·C+ and C’ = eiμ ·C

Now, Feynman notes that an equal phase change in all amplitudes has no physical consequence (think of re-defining our t0 = 0 point), so we can add some arbitrary amount to both λ and μ without changing any of the physics. So then we can choose this amount as −(λ + μ)/2. We write:

subtracting a number

Now, it shouldn’t you too long to figure out that λ’ is equal to λ’ = λ/2 + μ/2 = −μ’. So… Well… Then we can just adopt the convention that λ = −μ. So our C’+ = eiλ ·C+ and C’ = eiμ ·C equations can now be written as:

C’+ = eiλ ·C+ and C’ = eiλ·C

The absolute values are the same, but the phases are different. Right. OK. Good move. What’s next?

Well… The next assumption is that the phase shift λ is proportional to the angle (α) between the two apparatuses. Hence, λ is equal to λ = m·α, and we can re-write the above as:

C’+ = ei·C+ and C’ = ei·C

Now, this assumption may or may not seem reasonable. Feynman justifies it with a continuity argument, arguing any rotation can be built up as a sequence of infinitesimal rotations and… Well… Let’s not get into the nitty-gritty here. [If you want it, check Feynman’s Lecture itself.] Back to the main line of reasoning. So we’ll assume we can write λ as λ = m·α. The next question then is: what is the value for m? Now, we obviously do get exactly the same physics if we rotate by 360°, or 2π radians. So we might conclude that the amplitudes should be the same and, therefore, that ei = eim·2π has to be equal to one, so C’+ = C+ and C’ = C . That’s the case if m is equal to 1. But… Well… No. It’s the same thing again: the probabilities (or the magnitudes) have to be the same, but the amplitudes may be different because of some phase factor. In fact, they should be different. If m = 1/2, then we also get the same physics, even if the amplitudes are not the same. They will be each other’s opposite:

same physical state

Huh? Yes. Think of it. The coefficient of proportionality (m) cannot be equal to 1. If it would be equal to 1, and we’d rotate by 180° only, then we’d also get those C’+ = −C+ and C’ = −C equations, and so these coefficients would, therefore, also describe the same physical situation. Now, you will understand, intuitively, that a rotation of the apparatus by 180° will not give us the same physical situation… So… Well… In case you’d want a more formal argument proving a rotation by 180° does not give us the same physical situation, Feynman has one for you. 🙂

I know that, by now, you’re totally tired and bored, and so you only want the grand conclusion at this point. Well… All of what I wrote above should, hopefully, help you to understand that conclusion, which – I quote Feynman here – is the following:

If we know the amplitudes C+ and C of spin one-half particles with respect to a reference frame S, and we then use new base states, defined with respect to a reference frame T which is obtained from S by a rotation α around the z-axis, the new amplitudes are given in terms of the old by the following formulas:


[Feynman denotes our angle α by phi (φ) because… He uses the Euler angles a bit differently. But don’t worry: it’s the same angle.]

What about the amplitude to go from the C to the C’+ state, and from the C+ to the C’ state? Well… That amplitude is zero. So the transformation matrix is this one:

rotation matrix

Let’s take a moment and think about this. Feynman notes the following, among other things: “It is very curious to say that if you turn the apparatus 360° you get new amplitudes. [They aren’t really new, though, because the common change of sign doesn’t give any different physics.] But if something has been rotated by a sequence of small rotations whose net result is to return it to the original orientation, then it is possible to define the idea that it has been rotated 360°—as distinct from zero net rotation—if you have kept track of the whole history.”

This is very deep. It connects space and time into one single geometric space, so to speak. But… Well… I’ll try to explain this rather sweeping statement later. Feynman also notes that a net rotation of 720° does give us the same amplitudes and, therefore, cannot be distinguished from the original orientation. Feynman finds that intriguing but… Well… I am not sure if it’s very significant. I do note some symmetries in quantum physics involve 720° rotations but… Well… I’ll let you think about this. 🙂

Note that the determinant of our matrix is equal to a·b·ceiφ/2·eiφ/2 = 1. So… Well… Our rotation matrix is, effectively, in that special form! How comes? Well… When equating λ = −μ, we are effectively putting the transformation into that special form.  Let us also, just for fun, quickly check the normalization condition. It requires that the probabilities, in any given representation, add to up to one. So… Well… Do they? When they come out of S, our electrons are equally likely to be in the up or down state. So the amplitudes are 1/√2. [To be precise, they are ±1/√2 but… Well… It’s the phase factor story once again.] That’s normalized: |1/√2|2 + |1/√2|2 = 1. The amplitudes to come out of the apparatus in the up or down state are eiφ/2/√2 and eiφ/2/√2 respectively, so the probabilities add up to |eiφ/2/√2|2 + |eiφ/2/√2|2 = … Well… It’s 1. Check it. 🙂

Let me add an extra remark here. The normalization condition will result in matrices whose determinant will be equal to some pure imaginary exponential, like eiα. So is that what we have here? Yes. We can re-write 1 as 1 = ei·0 = e0, so α = 0. 🙂 Capito? Probably not, but… Well… Don’t worry about it. Just think about the grand results. As Feynman puts it, this Lecture is really “a sort of cultural excursion.” 🙂

Let’s do a practical calculation here. Let’s suppose the angle is, effectively, 180°. So the eiφ/2 and eiφ/2/√2 factors are equal to eiπ/2 = +i and eiπ/2 = −i, so… Well… What does that mean—in terms of the geometry of the wavefunction? Hmm… We need to do some more thinking about the implications of all this transformation business for our geometric interpretation of he wavefunction, but so we’ll do that in our next post. Let us first work our way out of this rather hellish transformation logic. 🙂 [See? I do admit it is all quite difficult and abstruse, but… Well… We can do this, right?]

So what’s next? Well… Feynman develops a similar argument (I should say same-same but different once more) to derive the coefficients for a rotation of ±90° around the y-axis. Why 90° only? Well… Let me quote Feynman here, as I can’t sum it up more succinctly than he does: “With just two transformations—90° about the y-axis, and an arbitrary angle about the z-axis [which we described above]—we can generate any rotation at all.”

So how does that work? Check the illustration below. In Feynman’s words again: “Suppose that we want the angle α around x. We know how to deal with the angle α α around z, but now we want it around x. How do we get it? First, we turn the axis down onto x—which is a rotation of +90°. Then we turn through the angle α around z’. Then we rotate 90° about y”. The net result of the three rotations is the same as turning around x by the angle α. It is a property of space.”

full rotation

Besides helping us greatly to derive the transformation matrix for any rotation, the mentioned property of space is rather mysterious and deep. It sort of reduces the degrees of freedom, so to speak. Feynman writes the following about this:

“These facts of the combinations of rotations, and what they produce, are hard to grasp intuitively. It is rather strange, because we live in three dimensions, but it is hard for us to appreciate what happens if we turn this way and then that way. Perhaps, if we were fish or birds and had a real appreciation of what happens when we turn somersaults in space, we could more easily appreciate such things.”

In any case, I should limit the number of philosophical interjections. If you go through the motions, then you’ll find the following elemental rotation matrices:

full set of rotation matrices

What about the determinants of the Rx(φ) and Ry(φ) matrices? They’re also equal to one, so… Yes. A pure imaginary exponential, right? 1 = ei·0 = e0. 🙂

What’s next? Well… We’re done. We can now combine the elemental transformations above in a more general format, using the standardized Euler angles. Again, just go through the motions. The Grand Result is:

euler transformatoin

Does it give us normalized amplitudes? It should, but it looks like our determinant is going to be a much more complicated complex exponential. 🙂 Hmm… Let’s take some time to mull over this. As promised, I’ll be back with more reflections in my next post.

Quantum-mechanical magnitudes

As I was writing about those rotations in my previous post (on electron orbitals), I suddenly felt I should do some more thinking on (1) symmetries and (2) the concept of quantum-mechanical magnitudes of vectors. I’ll write about the first topic (symmetries) in some other post. Let’s first tackle the latter concept. Oh… And for those I frightened with my last post… Well… This should really be an easy read. More of a short philosophical reflection about quantum mechanics. Not a technical thing. Something intuitive. At least I hope it will come out that way. 🙂

First, you should note that the fundamental idea that quantities like energy, or momentum, may be quantized is a very natural one. In fact, it’s what the early Greek philosophers thought about Nature. Of course, while the idea of quantization comes naturally to us (I think it’s easier to understand than, say, the idea of infinity), it is, perhaps, not so easy to deal with it mathematically. Indeed, most mathematical ideas – like functions and derivatives – are based on what I’ll loosely refer to as continuum theory. So… Yes, quantization does yield some surprising results, like that formula for the magnitude of some vector J:Magnitude formulasThe J·J in the classical formula above is, of course, the equally classical vector dot product, and the formula itself is nothing but Pythagoras’ Theorem in three dimensions. Easy. I just put a + sign in front of the square roots so as to remind you we actually always have two square roots and that we should take the positive one. 🙂

I will now show you how we get that quantum-mechanical formula. The logic behind it is fairly straightforward but, at the same time… Well… You’ll see. 🙂 We know that a quantum-mechanical variable – like the spin of an electron, or the angular momentum of an atom – is not continuous but discrete: it will have some value = jj-1, j-2, …, -(j-2), -(j-1), –j. Our here is the maximum value of the magnitude of the component of our vector (J) in the direction of measurement, which – as you know – is usually written as Jz. Why? Because we will usually choose our coordinate system such that our z-axis is aligned accordingly. 🙂 Those values jj-1, j-2, …, -(j-2), -(j-1), –j are separated by one unit. That unit would be Planck’s quantum of action ħ ≈ 1.0545718×10−34 N·m·s – by the way, isn’t it amazing we can actually measure such tiny stuff in some experiment? 🙂 – if J would happen to be the angular momentum, but the approach here is more general – action can express itself in various ways 🙂 – so the unit doesn’t matter: it’s just the unit, so that’s just one. 🙂 It’s easy to see that this separation implies must be some integer or half-integer. [Of course, now you might think the values of a series like 2.4, 1.4, 0.4, -0.6, -1.6 are also separated by one unit, but… Well… That would violate the most basic symmetry requirement so… Well… No. Our has to be an integer or a half-integer. Please also note that the number of possible values for is equal to 2j+1, as we’ll use that in a moment.]

OK. You’re familiar with this by now and so I should not repeat the obvious. To make things somewhat more real, let’s assume = 3/2, so =  3/2, 1/2, -1/2 or +3/2. Now, we don’t know anything about the system and, therefore, these four values are all equally likely. Now, you may not agree with this assumption but… Well… You’ll have to agree that, at this point, you can’t come up with anything else that would make sense, right? It’s just like a classical situation: J might point in any direction, so we have to give all angles an equal probability. [In fact, I’ll show you – in a minute or so – that you actually have a point here: we should think some more about this assumption – but so that’s for later. I am asking you to just go along with this story as for now.]

So the expected value of Jz is E[Jz] is equal to E[Jz] = (1/4)·(3/2)+(1/4)·(1/2)+(1/4)·(-1/2)+(1/4)·(-3/2) = 0. Nothing new here. We just multiply probabilities with all of the possible values to get an expected value. So we get zero here because our values are distributed symmetrically around the zero point. No surprise. Now, to calculate a magnitude, we don’t need Jbut Jz2. In case you wonder, that’s what this squaring business is all about: we’re abstracting away from the direction and so we’re going to square both positive as well as negative values to then add it all up and take a square root. Now, the expected value of Jz2 is equal to E[Jz] = (1/4)·(3/2)2+(1/4)·(1/2)2+(1/4)·(-1/2)2+(1/4)·(-3/2)2 = 5/4 = 1.25. Some positive value.

You may note that it’s a bit larger than the average of the absolute value of our variable, which is equal to (|3/2|+|1/2|+|-1/2|+|-3/2|)/4 = 1, but that’s just because the squaring favors larger values 🙂 Also note that, of course, we’d also get some positive value if Jwould be a continuous variable over the [-3/2, +3/2] interval, but I’ll let you think about what positive value we’d get for Jzassuming Jz is uniform distributed over the [-3/2, +3/2] interval, because that calculation is actually not so straightforward as it may seem at first. In any case, these considerations are not very relevant to our story here, so let’s move on.

Of course, our z-direction was random, and so we get the same thing for whatever direction. More in particular, we’ll also get it for the x– and y-directions: E[Jx] = E[Jy] = E[Jz] = 5/4. Now, at this point it’s probably good to give you a more generalized formula for these quantities. I think you’ll easily agree to the following one:magnitude squared formulaSo now we can apply our classical J·J = JxJyJzformula to these quantities by calculating the expected value of JJ·J, which is equal to:

E[J·J] = E[Jx2] + E[Jy2] + E[Jz2] = 3·E[Jx2] = 3·E[Jy2] = 3·E[Jz2]

You should note we’re making use of the E[X Y] = E[X]+ E[Y] property here: the expected value of the sum of two variables is equal to the sum of the expected values of the variables, and you should also note this is true even if the individual variables would happen to be correlated – which might or might not be the case. [What do you think is the case here?]

For = 3/2, it’s easy to see we get E[J·J] = 3·E[Jx] = 3·5/4 = (3/2)·(3/2+1) = j·(j+1). We should now generalize this formula for other values of j,  which is not so easy… Hmm… It obviously involves some formula for a series, and I am not good at that… So… Well… I just checked if it was true for = 1/2 and = 1 (please check that at least for yourself too!) and then I just believe the authorities on this for all other values of j. 🙂

Now, in a classical situation, we know that J·J product will be the same for whatever direction J would happen to have, and so its expected value will be equal to its constant value J·J. So we can write: E[J·J] = J·J. So… Well… That’s why we write what we wrote above:Magnitude formulas

Makes sense, no? E[J·J] = E[Jx2+Jy2+Jz2] = E[Jx2]+E[Jy2]+E[Jz2] = j·(j+1) = J·J = J2, so = +√[j(j+1)], right?

Hold your horses, man! Think! What are we doing here, really? We didn’t calculate all that much above. We only found that E[Jx2]+E[Jy2]+E[Jz2] = E[Jx2+Jy2+Jz2] =  j·(j+1). So what? Well… That’s not a proof that the J vector actually exists.


Yes. That J vector might just be some theoretical concept. When everything is said and done, all we’ve been doing – or at least, we imagined we did – is those repeated measurements of JxJy and Jz here – or whatever subscript you’d want to use, like Jθ,φ, for example (the example is not random, of course) – and so, of course, it’s only natural that we assume these things are the magnitude of the component (in the direction of measurement) of some real vector that is out there, but then… Well… Who knows? Think of what we wrote about the angular momentum in our previous post on electron orbitals. We imagine – or do like to think – that there’s some angular momentum vector J out there, which we think of as being “cocked” at some angle, so its projection onto the z-axis gives us those discrete values for m which, for = 2, for example, are equal to 0, 1 or 2 (and -1 and -2, of course) – like in the illustration below. 🙂cocked angle 2But… Well… Note those weird angles: we get something close to 24.1° and then another value close to 54.7°. No symmetry here. 😦 The table below gives some more values for larger j. They’re easy to calculate – it’s, once again, just Pythagoras’ Theorem – but… Well… No symmetries here. Just weird values. [I am not saying the formula for these angles is not straightforward. That formula is easy enough: θ = sin-1(m/√[j(j+1)]). It’s just… Well… No symmetry. You’ll see why that matters in a moment.]CaptureI skipped the half-integer values for in the table above so you might think they might make it easier to come up with some kind of sensible explanation for the angles. Well… No. They don’t. For example, for = 1/2 and m = ± 1/2, the angles are ±35.2644° – more or less, that is. 🙂 As you can see, these angles do not nicely cut up our circle in equal pieces, which triggers the obvious question: are these angles really equally likely? Equal angles do not correspond to equal distances on the z-axis (in case you don’t appreciate the point, look at the illustration below).  angles distance

So… Well… Let me summarize the issue on hand as follows: the idea of the angle of the vector being randomly distributed is not compatible with the idea of those Jz values being equally spaced and equally likely. The latter idea – equally spaced and equally likely Jz values – relates to different possible states of the system being equally likely, so… Well… It’s just a different idea. 😦

Now there is another thing which we should mention here. The maximum value of the z-component of our J vector is always smaller than that quantum-mechanical magnitude, and quite significantly so for small j, as shown in the table below. It is only for larger values of that the ratio of the two starts to converge to 1. For example, for = 25, it is about 1.02, so that’s only 2% off. convergenceThat’s why physicists tell us that, in quantum mechanics, the angular momentum is never “completely along the z-direction.” It is obvious that this actually challenges the idea of a very precise direction in quantum mechanics, but then that shouldn’t surprise us, does it? After, isn’t this what the Uncertainty Principle is all about?

Different states, rather than different directions… And then Uncertainty because… Well… Because of discrete variables that won’t split in the middle. Hmm… 😦

Perhaps. Perhaps I should just accept all of this and go along with it… But… Well… I am really not satisfied here, despite Feynman’s assurance that that’s OK: “Understanding of these matters comes very slowly, if at all. Of course, one does get better able to know what is going to happen in a quantum-mechanical situation—if that is what understanding means—but one never gets a comfortable feeling that these quantum-mechanical rules are ‘natural’.”

I do want to get that comfortable feeling – on some sunny day, at least. 🙂 And so I’ll keep playing with this, until… Well… Until I give up. 🙂 In the meanwhile, if you’d feel you’ve got some better or some more intuitive explanation for all of this, please do let me know. I’d be very grateful to you. 🙂

Post scriptum: Of course, we would all want to believe that J somehow exists because… Well… We want to explain those states somehow, right? I, for one, am not happy with being told to just accept things and shut up. So let me add some remarks here. First, you may think that the narrative above should distinguish between polar and axial vectors. You’ll remember polar vectors are the real vectors, like a radius vector r, or a force F, or velocity or (linear) momentum. Axial vectors (also known as pseudo-vectors) are vectors like the angular momentum vector: we sort of construct them from… Well… From real vectors. The angular momentum L, for example, is the vector cross product of the radius vector r and the linear momentum vector p: we write L = r×p. In that sense, they’re a figment of our imagination. But then… What’s real and unreal? The magnitude of L, for example, does correspond to something real, doesn’t it? And its direction does give us the direction of circulation, right? You’re right. Hence, I think polar and axial vectors are both real – in whatever sense you’d want to define real. Their reality is just different, and that’s reflected in their mathematical behavior: if you change the direction of the axes of your reference frame, polar vectors will change sign too, as opposed to axial vectors: they don’t swap sign. They do something else, which I’ll explain in my next post, where I’ll be talking symmetries.

But let us, for the sake of argument, assume whatever I wrote about those angles applies to axial vectors only. Let’s be even more specific, and say it applies to the angular momentum vector only. If that’s the case, we may want to think of a classical equivalent for the mentioned lack of a precise direction: free nutation. It’s a complicated thing – even more complicated than the phenomenon of precession, which we should be familiar with by now. Look at the illustration below (which I took from an article of a physics professor from Saint Petersburg), which shows both precession as well as nutation. Think of the movement of a spinning top when you release it: its axis will, at first, nutate around the axis of precession, before it settles in a more steady precession.nutationThe nutation is caused by the gravitational force field, and the nutation movement usually dies out quickly because of dampening forces (read: friction). Now, we don’t think of gravitational fields when analyzing angular momentum in quantum mechanics, and we shouldn’t. But there is something else we may want to think of. There is also a phenomenon which is referred to as free nutation, i.e. a nutation that is not caused by an external force field. The Earth, for example, nutates slowly because of a gravitational pull from the Sun and the other planets – so that’s not a free nutation – but, in addition to this, there’s an even smaller wobble – which is an example of free nutation – because the Earth is not exactly spherical. In fact, the Great Mathematician, Leonhard Euler, had already predicted this, back in 1765, but it took another 125 years or so before an astronomist, Seth Chandler, could finally experimentally confirm and measure it. So they named this wobble the Chandler wobble (Euler already has too many things named after him). 🙂

Now I don’t have much backup here – none, actually 🙂 – but why wouldn’t we imagine our electron would also sort of nutate freely because of… Well… Some symmetric asymmetry – something like the slightly elliptical shape of our Earth. 🙂 We may then effectively imagine the angular momentum vector as continually changing direction between a minimum and a maximum angle – something like what’s shown below, perhaps, between 0 and 40 degrees. Think of it as a rotation within a rotation, or an oscillation within an oscillation – or a standing wave within a standing wave. 🙂wobblingI am not sure if this approach would solve the problem of our angles and distances – the issue of whether we should think in equally likely angles or equally likely distances along the z-axis, really – but… Well… I’ll let you play with this. Please do send me some feedback if you think you’ve found something. 🙂

Whatever your solution is, it is likely to involve the equipartition theorem and harmonics, right? Perhaps we can, indeed, imagine standing waves within standing waves, and then standing waves within standing waves. How far can we go? 🙂

Post scriptum 2: When re-reading this post, I was thinking I should probably do something with the following idea. If we’ve got a sphere, and we’re thinking of some vector pointing to some point on the surface of that sphere, then we’re doing something which is referred to as point picking on the surface of a sphere, and the probability distributions – as a function of the polar and azimuthal angles θ and φ – are quite particular. See the article on the Wolfram site on this, for example. I am not sure if it’s going to lead to some easy explanation of the ‘angle problem’ we’ve laid out here but… Well… It’s surely an element in the explanation. The key idea here is shown in the illustration below: if the direction of our momentum in three-dimensional space is really random, there may still be more of a chance of an orientation towards the equator, rather than towards the pole. So… Well… We need to study the math of this. 🙂 But that’s for later.density

Quantum math: the rules – all of them! :-)

In my previous post, I made no compromise, and used all of the rules one needs to calculate quantum-mechanical stuff:


However, I didn’t explain them. These rules look simple enough, but let’s analyze them now. They’re simple and not at the same time, indeed.

[I] The first equation uses the Kronecker delta, which sounds fancy but it’s just a simple shorthand: δij = δji is equal to 1 if i = j, and zero if i ≠ j, with and j representing base states. Equation (I) basically says that base states are all different. For example, the angular momentum in the x-direction of a spin-1/2 particle – think of an electron or a proton – is either +ħ/2 or −ħ/2, not something in-between, or some mixture. So 〈 +x | +x 〉 = 〈 −x | −x 〉 = 1 and 〈 +x | −x 〉 = 〈 −x | +x 〉 = 0.

We’re talking base states here, of course. Base states are like a coordinate system: we settle on an x-, y- and z-axis, and a unit, and any point is defined in terms of an x-, y– and z-number. It’s the same here, except we’re talking ‘points’ in four-dimensional spacetime. To be precise, we’re talking constructs evolving in spacetime. To be even more precise, we’re talking amplitudes with a temporal as well as a spatial frequency, which we’ll often represent as:

ei·θ ei·(ω·t − k ∙x) = a·e(i/ħ)·(E·t − px)

The coefficient in front (a) is just a normalization constant, ensuring all probabilities add up to one. It may not be a constant, actually: perhaps it just ensure our amplitude stays within some kind of envelope, as illustrated below.

Photon wave

As for the ω = E/ħ and k = p/ħ identities, these are the de Broglie equations for a matter-wave, which the young Comte jotted down as part of his 1924 PhD thesis. He was inspired by the fact that the E·t − px factor is an invariant four-vector product (E·t − px = pμxμ) in relativity theory, and noted the striking similarity with the argument of any wave function in space and time (ω·t − k ∙x) and, hence, couldn’t resist equating both. Louis de Broglie was inspired, of course, by the solution to the blackbody radiation problem, which Max Planck and Einstein had convincingly solved by accepting that the ω = E/ħ equation holds for photons. As he wrote it:

“When I conceived the first basic ideas of wave mechanics in 1923–24, I was guided by the aim to perform a real physical synthesis, valid for all particles, of the coexistence of the wave and of the corpuscular aspects that Einstein had introduced for photons in his theory of light quanta in 1905.” (Louis de Broglie, quoted in Wikipedia)

Looking back, you’d of course want the phase of a wavefunction to be some invariant quantity, and the examples we gave our previous post illustrate how one would expect energy and momentum to impact its temporal and spatial frequency. But I am digressing. Let’s look at the second equation. However, before we move on, note that minus sign in the exponent of our wavefunction: a·ei·θ. The phase turns counter-clockwise. That’s just the way it is. I’ll come back to this.

[II] The φ and χ symbols do not necessarily represent base states. In fact, Feynman illustrates this law using a variety of examples including both polarized as well as unpolarized beams, or ‘filtered’ as well as ‘unfiltered’ states, as he calls it in the context of the Stern-Gerlach apparatuses he uses to explain what’s going on. Let me summarize his argument here.

I discussed the Stern-Gerlach experiment in my post on spin and angular momentum, but the Wikipedia article on it is very good too. The principle is illustrated below: a inhomogeneous magnetic field – note the direction of the gradient ∇B = (∂B/∂x, ∂B/∂y, ∂B/∂z) – will split a beam of spin-one particles into three beams. [Matter-particles with spin one are rather rare (Lithium-6 is an example), but three states (rather than two only, as we’d have when analyzing spin-1/2 particles, such as electrons or protons) allow for more play in the analysis. 🙂 In any case, the analysis is easily generalized.]

stern-gerlach simple

The splitting of the beam is based, of course, on the quantized angular momentum in the z-direction (i.e. the direction of the gradient): its value is either ħ, 0, or −ħ. We’ll denote these base states as +, 0 or −, and we should note they are defined in regard to an apparatus with a specific orientation. If we call this apparatus S, then we can denote these base states as +S, 0S and −S respectively.

The interesting thing in Feynman’s analysis is the imagined modified Stern-Gerlach apparatus, which – I am using Feynman‘s words here 🙂 –  “puts Humpty Dumpty back together.” It looks a bit monstruous, but it’s easy enough to understand. Quoting Feynman once more: “It consists of a sequence of three high-gradient magnets. The first one (on the left) is just the usual Stern-Gerlach magnet and splits the incoming beam of spin-one particles into three separate beams. The second magnet has the same cross section as the first, but is twice as long and the polarity of its magnetic field is opposite the field in magnet 1. The second magnet pushes in the opposite direction on the atomic magnets and bends their paths back toward the axis, as shown in the trajectories drawn in the lower part of the figure. The third magnet is just like the first, and brings the three beams back together again, so that leaves the exit hole along the axis.”

stern-gerlach modified

Now, we can use this apparatus as a filter by inserting blocking masks, as illustrated below.


But let’s get back to the lesson. What about the second ‘Law’ of quantum math? Well… You need to be able to imagine all kinds of situations now. The rather simple set-up below is one of them: we’ve got two of these apparatuses in series now, S and T, with T tilted at the angle α with respect to the first.


I know: you’re getting impatient. What about it? Well… We’re finally ready now. Let’s suppose we’ve got three apparatuses in series, with the first and the last one having the very same orientation, and the one in the middle being tilted. We’ll denote them by S, T and S’ respectively. We’ll also use masks: we’ll block the 0 and − state in the S-filter, like in that illustration above. In addition, we’ll block the + and − state in the T apparatus and, finally, the 0 and − state in the S’ apparatus. Now try to imagine what happens: how many particles will get through?


Just try to think about it. Make some drawing or something. Please!  


OK… The answer is shown below. Despite the filtering in S, the +S particles that come out do have an amplitude to go through the 0T-filter, and so the number of atoms that come out will be some fraction (α) of the number of atoms (N) that came out of the +S-filter. Likewise, some other fraction (β) will make it through the +S’-filter, so we end up with βαN particles.

ratio 2

Now, I am sure that, if you’d tried to guess the answer yourself, you’d have said zero rather than βαN but, thinking about it, it makes sense: it’s not because we’ve got some angular momentum in one direction that we have none in the other. When everything is said and done, we’re talking components of the total angular momentum here, don’t we? Well… Yes and no. Let’s remove the masks from T. What do we get?


Come on: what’s your guess? N?

[…] You’re right. It’s N. Perfect. It’s what’s shown below.

ratio 3

Now, that should boost your confidence. Let’s try the next scenario. We block the 0 and − state in the S-filter once again, and the + and − state in the T apparatus, so the first two apparatuses are the same as in our first example. But let’s change the S’ apparatus: let’s close the + and − state there now. Now try to imagine what happens: how many particles will get through?


Come on! You think it’s a trap, isn’t it? It’s not. It’s perfectly similar: we’ve got some other fraction here, which we’ll write as γαN, as shown below.

ratio 1Next scenario: S has the 0 and − gate closed once more, and T is fully open, so it has no masks. But, this time, we set S’ so it filters the 0-state with respect to it. What do we get? Come on! Think! Please!


The answer is zero, as shown below.

ratio 4

Does that make sense to you? Yes? Great! Because many think it’s weird: they think the T apparatus must ‘re-orient’ the angular momentum of the particles. It doesn’t: if the filter is wide open, then “no information is lost”, as Feynman puts it. Still… Have a look at it. It looks like we’re opening ‘more channels’ in the last example: the S and S’ filter are the same, indeed, and T is fully open, while it selected for 0-state particles before. But no particles come through now, while with the 0-channel, we had γαN.

Hmm… It actually is kinda weird, won’t you agree? Sorry I had to talk about this, but it will make you appreciate that second ‘Law’ now: we can always insert a ‘wide-open’ filter and, hence, split the beams into a complete set of base states − with respect to the filter, that is − and bring them back together provided our filter does not produce any unequal disturbances on the three beams. In short, the passage through the wide-open filter should not result in a change of the amplitudes. Again, as Feynman puts it: the wide-open filter should really put Humpty-Dumpty back together again. If it does, we can effectively apply our ‘Law’:

second law

For an example, I’ll refer you to my previous post. This brings me to the third and final ‘Law’.

[III] The amplitude to go from state φ to state χ is the complex conjugate of the amplitude to to go from state χ to state φ:

〈 χ | φ 〉 = 〈 φ | χ 〉*

This is probably the weirdest ‘Law’ of all, even if I should say, straight from the start, we can actually derive it from the second ‘Law’, and the fact that all probabilities have to add up to one. Indeed, a probability is the absolute square of an amplitude and, as we know, the absolute square of a complex number is also equal to the product of itself and its complex conjugate:

|z|= |z|·|z| = z·z*

[You should go through the trouble of reviewing the difference between the square and the absolute square of a complex number. Just write z as a + ib and calculate (a + ib)= a2 + 2ab+ b2 , as opposed to |z|= a2 + b2. Also check what it means when writing z as r·eiθ = r·(cosθ + i·sinθ).]

Let’s applying the probability rule to a two-filter set-up, i.e. the situation with the S and the tilted T filter which we described above, and let’s assume we’ve got a pure beam of +S particles entering the wide-open T filter, so our particles can come out in either of the three base states with respect to T. We can then write:

〈 +T | +S 〉+ 〈 0T | +S 〉+ 〈 −T | +S 〉= 1

⇔ 〈 +T | +S 〉〈 +T | +S 〉* + 〈 0T | +S 〉〈 0T | +S 〉* + 〈 −T | +S 〉〈 −T | +S 〉* = 1

Of course, we’ve got two other such equations if we start with a 0S or a −S state. Now, we take the 〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉 ‘Law’, and substitute χ and φ for +S, and all states for the base states with regard to T. We get:

〈 +S | +S 〉 = 1 = 〈 +S | +T 〉〈 +T | +S 〉 + 〈 +S | 0T 〉〈 0T | +S 〉 + 〈 +S | –T 〉〈 −T | +S 〉

These equations are consistent only if:

〈 +S | +T 〉 = 〈 +T | +S 〉*,

〈 +S | 0T 〉 = 〈 0T | +S 〉*,

〈 +S | −T 〉 = 〈 −T | +S 〉*,

which is what we wanted to prove. One can then generalize to any state φ and χ. However, proving the result is one thing. Understanding it is something else. One can write down a number of strange consequences, which all point to Feynman‘s rather enigmatic comment on this ‘Law’: “If this Law were not true, probability would not be ‘conserved’, and particles would get ‘lost’.” So what does that mean? Well… You may want to think about the following, perhaps. It’s obvious that we can write:

|〈 φ | χ 〉|= 〈 φ | χ 〉〈 φ | χ 〉* = 〈 χ | φ 〉*〈 χ | φ 〉 = |〈 χ | φ 〉|2

This says that the probability to go from the φ-state to the χ-state  is the same as the probability to go from the χ-state to the φ-state.

Now, when we’re talking base states, that’s rather obvious, because the probabilities involved are either 0 or 1. However, if we substitute for +S and −T, or some more complicated states, then it’s a different thing. My guts instinct tells me this third ‘Law’ – which, as mentioned, can be derived from the other ‘Laws’ – reflects the principle of reversibility in spacetime, which you may also interpret as a causality principle, in the sense that, in theory at least (i.e. not thinking about entropy and/or statistical mechanics), we can reverse what’s happening: we can go back in spacetime.

In this regard, we should also remember that the complex conjugate of a complex number in polar form, i.e. a complex number written as r·eiθ, is equal to r·eiθ, so the argument in the exponent gets a minus sign. Think about what this means for our a·ei·θ ei·(ω·t − k ∙x) = a·e(i/ħ)·(E·t − pxfunction. Taking the complex conjugate of this function amounts to reversing the direction of t and x which, once again, evokes that idea of going back in spacetime.

I feel there’s some more fundamental principle here at work, on which I’ll try to reflect a bit more. Perhaps we can also do something with that relationship between the multiplicative inverse of a complex number and its complex conjugate, i.e. z−1 = z*/|z|2. I’ll check it out. As for now, however, I’ll leave you to do that, and please let me know if you’ve got any inspirational ideas on this. 🙂

So… Well… Goodbye as for now. I’ll probably talk about the Hamiltonian in my next post. I think we really did a good job in laying the groundwork for the really hardcore stuff, so let’s go for that now. 🙂

Post Scriptum: On the Uncertainty Principle and other rules

After writing all of the above, I realized I should add some remarks to make this post somewhat more readable. First thing: not all of the rules are there—obviously! Most notably, I didn’t say anything about the rules for adding or multiplying amplitudes, but that’s because I wrote extensively about that already, and so I assume you’re familiar with that. [If not, see my page on the essentials.]

Second, I didn’t talk about the Uncertainty Principle. That’s because I didn’t have to. In fact, we don’t need it here. In general, all popular accounts of quantum mechanics have an excessive focus on the position and momentum of a particle, while the approach in this and my previous post is quite different. Of course, it’s Feynman’s approach to QM really. Not ‘mine’. 🙂 All of the examples and all of the theory he presents in his introductory chapters in the Third Volume of Lectures, i.e. the volume on QM, are related to things like:

  • What is the amplitude for a particle to go from spin state +S to spin state −T?
  • What is the amplitude for a particle to be scattered, by a crystal, or from some collision with another particle, in the θ direction?
  • What is the amplitude for two identical particles to be scattered in the same direction?
  • What is the amplitude for an atom to absorb or emit a photon? [See, for example, Feynman’s approach to the blackbody radiation problem.]
  • What is the amplitude to go from one place to another?

In short, you read Feynman, and it’s only at the very end of his exposé, that he starts talking about the things popular books start with, such as the amplitude of a particle to be at point (x, t) in spacetime, or the Schrödinger equation, which describes the orbital of an electron in an atom. That’s where the Uncertainty Principle comes in and, hence, one can really avoid it for quite a while. In fact, one should avoid it for quite a while, because it’s now become clear to me that simply presenting the Uncertainty Principle doesn’t help all that much to truly understand quantum mechanics.

Truly understanding quantum mechanics involves understanding all of these weird rules above. To some extent, that involves dissociating the idea of the wavefunction with our conventional ideas of time and position. From the questions above, it should be obvious that ‘the’ wavefunction does actually not exist: we’ve got a wavefunction for anything we can and possibly want to measure. That brings us to the question of the base states: what are they?

Feynman addresses this question in a rather verbose section of his Lectures titled: What are the base states of the world? I won’t copy it here, but I strongly recommend you have a look at it. 🙂

I’ll end here with a final equation that we’ll need frequently: the amplitude for a particle to go from one place (r1) to another (r2). It’s referred to as a propagator function, for obvious reasons—one of them being that physicists like fancy terminology!—and it looks like this:


The shape of the e(i/ħ)·(pr12function is now familiar to you. Note the r12 in the argument, i.e. the vector pointing from r1 to r2. The pr12 dot product equals |p|∙|r12|·cosθ = p∙r12·cosθ, with θ the angle between p and r12. If the angle is the same, then cosθ is equal to 1. If the angle is π/2, then it’s 0, and the function reduces to 1/r12. So the angle θ, through the cosθ factor, sort of scales the spatial frequency. Let me try to give you some idea of how this looks like by assuming the angle between p and r12 is the same, so we’re looking at the space in the direction of the momentum only and |p|∙|r12|·cosθ = p∙r12. Now, we can look at the p/ħ factor as a scaling factor, and measure the distance x in units defined by that scale, so we write: x = p∙r12/ħ. The function then reduces to (ħ/p)·eix/x = (ħ/p)·cos(x)/x + i·(ħ/p)·sin(x)/x, and we just need to square this to get the probability. All of the graphs are drawn hereunder: I’ll let you analyze them. [Note that the graphs do not include the ħ/p factor, which you may look at as yet another scaling factor.] You’ll see – I hope! – that it all makes perfect sense: the probability quickly drops off with distance, both in the positive as well as in the negative x-direction, while it’s going to infinity when very near. [Note that the absolute square, using cos(x)/x and sin(x)/x yields the same graph as squaring 1/x—obviously!]


Working with amplitudes

Don’t worry: I am not going to introduce the Hamiltonian matrix—not yet, that is. But this post is going a step further than my previous ones, in the sense that it will be more abstract. At the same time, I do want to stick to real physical examples so as to illustrate what we’re doing when working with those amplitudes. The example that I am going to use involves spin. So let’s talk about that first.

Spin, angular momentum and the magnetic moment

You know spin: it allows experienced pool players to do the most amazing tricks with billiard balls, making a joke of what a so-called elastic collision is actually supposed to look like. So it should not come as a surprise that spin complicates the analysis in quantum mechanics too. We dedicated several posts to that (see, for example, my post on spin and angular momentum in quantum physics) and I won’t repeat these here. Let me just repeat the basics:

1. Classical and quantum-mechanical spin do share similarities: the basic idea driving the quantum-mechanical spin model is that of a electric charge – positive or negative – spinning about its own axis (this is often referred to as intrinsic spin) as well as having some orbital motion (presumably around some other charge, like an electron in orbit with a nucleus at the center). This intrinsic spin, and the orbital motion, give our charge some angular momentum (J) and, because it’s an electric charge in motion, there is a magnetic moment (μ). To put things simply: the classical and quantum-mechanical view of things converge in their analysis of atoms or elementary particles as tiny little magnets. Hence, when placed in an external magnetic field, there is some interaction – a force – and their potential and/or kinetic energy changes. The whole system, in fact, acquires extra energy when placed in an external magnetic field.

Note: The formula for that magnetic energy is quite straightforward, both in classical as well as in quantum physics, so I’ll quickly jot it down: U = −μB = −|μ|·|B|·cosθ = −μ·B·cosθ. So it’s just the scalar product of the magnetic moment and the magnetic field vector, with a minus sign in front so as to get the direction right. [θ is the angle between the μ and B vectors and determines whether U as a whole is positive or negative.

2. The classical and quantum-mechanical view also diverge, however. They diverge, first, because of the quantum nature of spin in quantum mechanics. Indeed, while the angular momentum can take on any value in classical mechanics, that’s not the case in quantum mechanics: in whatever direction we measure, we get a discrete set of values only. For example, the angular momentum of a proton or an electron is either −ħ/2 or +ħ/2, in whatever direction we measure it. Therefore, they are referred to as spin-1/2 particles. All elementary fermions, i.e. the particles that constitute matter (as opposed to force-carrying particles, like photons), have spin 1/2.

Note: Spin-1/2 particles include, remarkably enough, neutrons too, which has the same kind of magnetic moment that a rotating negative charge would have. The neutron, in other words, is not exactly ‘neutral’ in the magnetic sense. One can explain this by noting that a neutron is not ‘elementary’, really: it consists of three quarks, just like a proton, and, therefore, it may help you to imagine that the electric charges inside are, somehow, distributed unevenly—although physicists hate such simplifications. I am noting this because the famous Stern-Gerlach experiment, which established the quantum nature of particle spin, used silver atoms, rather than protons or electrons. More in general, we’ll tend to forget about the electric charge of the particles we’re describing, assuming, most of the time, or tacitly, that they’re neutral—which helps us to sort of forget about classical theory when doing quantum-mechanical calculations!

3. The quantum nature of spin is related to another crucial difference between the classical and quantum-mechanical view of the angular momentum and the magnetic moment of a particle. Classically, the angular momentum and the magnetic moment can have any direction.

Note: I should probably briefly remind you that J is a so-called axial vector, i.e. a vector product (as opposed to a scalar product) of the radius vector r and the (linear) momentum vector p = m·v, with v the velocity vector, which points in the direction of motion. So we write: J = r×p = r×m·v = |r|·|p|·sinθ·n. The n vector is the unit vector perpendicular to the plane containing r and (and, hence, v, of course) given by the right-hand rule. I am saying this to remind you that the direction of the magnetic moment and the direction of motion are not the same: the simple illustration below may help to see what I am talking about.]

atomic magnet

Back to quantum mechanics: the image above doesn’t work in quantum mechanics. We do not have an unambiguous direction of the angular momentum and, hence, of the magnetic moment. That’s where all of the weirdness of the quantum-mechanical concept of spin comes out, really. I’ll talk about that when discussing Feynman’s ‘filters’ – which I’ll do in a moment – but here I just want to remind you of the mathematical argument that I presented in the above-mentioned post. Just like in classical mechanics, we’ll have a maximum (and, hence, also a minimum) value for J, like +ħ, 0 and +ħ for a Lithium-6 nucleus. [I am just giving this rather special example of a spin-1 article so you’re reminded we can have particles with an integer spin number too!] So, when we measure its angular momentum in any direction really, it will take on one of these three values: +ħ, 0 or +ħ. So it’s either/or—nothing in-between. Now that leads to a funny mathematical situation: one would usually equate the maximum value of a quantity like this to the magnitude of the vector, which is equal to the (positive) square root of J2 = J= Jx2 + Jy2 + Jz2, with Jx, Jy and Jz the components of J in the x-, y- and z-direction respectively. But we don’t have continuity in quantum mechanics, and so the concept of a component of a vector needs to be carefully interpreted. There’s nothing definite there, like in classical mechanics: all we have is amplitudes, and all we can do is calculate probabilities, or expected values based on those amplitudes.

Huh? Yes. In fact, the concept of the magnitude of a vector itself becomes rather fuzzy: all we can do really is calculate its expected value. Think of it: in the classical world, we have a J2 = Jproduct that’s independent of the direction of J. For example, if J is all in the x-direction, then Jand Jwill be zero, and J2 = Jx2. If it’s all in the y-direction, then Jand Jwill be zero and all of the magnitude of J will be in the y-direction only, so we write: J2 = Jy2. Likewise, if J does not have any z-component, then our JJ product will only include the x- and y-components: JJ = Jx2 + Jy2. You get the idea: the J2 = Jproduct is independent of the direction of J exactly because, in classical mechanics, J actually has a precise and unambiguous magnitude and direction and, therefore, actually has a precise and unambiguous component in each direction. So we’d measure Jx, Jy, and Jand, regardless of the actual direction of J, we’d find its magnitude |J| = J = +√J2 = +(Jx2 + Jy2 + Jz2)1/2.

In quantum mechanics, we just don’t have quantities like that. We say that Jx, Jand Jhave an amplitude to take on a value that’s equal to +ħ, 0 or +ħ (or whatever other value is allowed by the spin number of the system). Now that we’re talking spin numbers, please note that this characteristic number is usually denoted by j, which is a bit confusing, but so be it. So can be 0, 1/2, 1, 3/2, etcetera, and the number of ‘permitted values’ is 2j + 1 values, with each value being separated by an amount equal to ħ. So we have 1, 2, 3, 4, 5 etcetera possible values for Jx, Jand Jrespectively. But let me get back to the lesson. We just can’t do the same thing in quantum mechanics. For starters, we can’t measure Jx, Jy, and Jsimultaneously: our Stern-Gerlach apparatus has a certain orientation and, hence, measures one component of J only. So what can we do?

Frankly, we can only do some math here. The wave-mechanical approach does allow to think of the expected value of J2 = J= Jx2 + Jy2 + Jz2 value, so we write:

E[J2] = E[JJ] = E[Jx2 + Jy2 + Jz2] = ?

[Feynman’s use of the 〈 and 〉 brackets to denote an expected value is hugely confusing, because these brackets are also used to denote an amplitude. So I’d rather use the more commonly used E[X] notation.] Now, it is a rather remarkable property, but the expected value of the sum of two or more random variables is equal to the sum of the expected values of the variables, even if those variables may not be independent. So we can confidently use the linearity property of the expected value operator and write:

E[Jx+ Jy2 + Jz2] = E[Jx2] + E[Jx2] + E[Jx2]

Now we need something else. It’s also just part of the quantum-mechanical approach to things and so you’ll just have to accept it. It sounds rather obvious but it’s actually quite deep: if we measure the x-, y- or z-component of the angular momentum of a random particle, then each of the possible values is equally likely to occur. So that means, in our case, that the +ħ, 0 or +ħ values are equally likely, so their likelihood is one into three, i.e. 1/3. Again, that sounds obvious but it’s not. Indeed, please note, once again, that we can’t measure Jx, Jy, and Jsimultaneously, so the ‘or’ in x-, y- or z-component is an exclusive ‘or’. Of course, I must add this equipartition of likelihoods is valid only because we do not have a preferred direction for J: the particles in our beam have random ‘orientations’. Let me give you the lingo for this: we’re looking at an unpolarized beam. You’ll say: so what? Well… Again, think about what we’re doing here: we may of may not assume that the Jx, Jy, and Jvariables are related. In fact, in classical mechanics, they surely are: they’re determined by the magnitude and direction of J. Hence, they are not random at all ! But let me continue, so you see what comes out.

Because the +ħ, 0 and +ħ values are equally, we can write: E[Jx2] = ħ2/3 + 0/3 + (−ħ)2/3 = [ħ2 + 0 + (−ħ)2]/3 = 2ħ2/3. In case you wonder, that’s just the definition of the expected value operator: E[X] = p1x+ p2x+ … = ∑pixi, with pi the likelihood of the possible value x. So we take a weighted average with the respective probabilities as the weights. However, in this case, with an unpolarized beam, the weighted average becomes a simple average.

Now, E[Jy2] and E[Jz2] are – rather unsurprisingly – also equal to 2ħ2/3, so we find that E[J2] = E[Jx2] + E[Jx2] + E[Jx2] = 3·(2ħ2/3) = 2ħand, therefore, we’d say that the magnitude of the angular momentum is equal to |J| = J = +√2·ħ ≈ 1.414·ħ. Now that value is not equal to the maximum value of our x-, y-, z-component of J, or the component of J in whatever direction we’d want to measure it. That maximum value is ħ, without the √2 factor, so that’s some 40% less than the magnitude we’ve just calculated!

Now, you’ve probably fallen asleep by now but, what this actually says, is that the angular momentum, in quantum mechanics, is never completely in any direction. We can state this in another way: it implies that, in quantum mechanics, there’s no such thing really as a ‘definite’ direction of the angular momentum.


OK. Enough on this. Let’s move on to a more ‘real’ example. Before I continue though, let me generalize the results above:

[I] A particle, or a system, will have a characteristic spin number: j. That number is always an integer or a half-integer, and it determines a discrete set of possible values for the component of the angular momentum J in any direction.

[II] The number of values is equal to 2j + 1, and these values are separated by ħ, which is why they are usually measured in units of ħ, i.e. Planck’s reduced constant: ħ ≈ 1×10−34 J·s, so that’s tiny but real. 🙂 [It’s always good to remind oneself that we’re actually trying to describe reality.] For example, the permitted values for a spin-3/2 particle are +3ħ/2, +ħ/2, −ħ/2 and −3ħ/2 or, measured in units of ħ, +3/2, +1/2, −1/2 and −3/2. When discussing spin-1/2 particles, we’ll often refer to the two possible states as the ‘up’ and the ‘down’ state respectively. For example, we may write the amplitude for an electron or a proton to have a angular momentum in the x-direction equal to +ħ/2 or −ħ/2 as 〈+x〉 and 〈−x〉 respectively. [Don’t worry too much about it right now: you’ll get used to the notation quickly.]

[III] The classical concepts of angular momentum, and the related magnetic moment, have their limits in quantum mechanics. The magnitude of a vector quantity like angular momentum is generally not equal to the maximum value of the component of that quantity in any direction. The general rule is:

 J= j·(j+1)ħ2 > j2·ħ2

So the maximum value of any component of J in whatever direction (i.e. j·ħ) is smaller than the magnitude of J (i.e. √[ j·(j+1)]·ħ). This implies we cannot associate any precise and unambiguous direction with quantities like the angular momentum J or the magnetic moment μ. As Feynman puts it:

“That the energy of an atom [or a particle] in a magnetic field can have only certain discrete energies is really not more surprising than the fact that atoms in general have only certain discrete energy levels—something we mentioned often in Volume I. Why should the same thing not hold for atoms in a magnetic field? It does. But it is the attempt to correlate this with the idea of an oriented magnetic moment that brings out some of the strange implications of quantum mechanics.”

A real example: the disintegration of a muon in a magnetic field

I talked about muon integration before, when writing a much more philosophical piece on symmetries in Nature and time reversal in particular. I used the illustration below. We’ve got an incoming muon that’s being brought to rest in a block of material, and then, as muons do, it disintegrates, emitting an electron and two neutrinos. As you can see, the decay direction is (mostly) in the direction of the axial vector that’s associated with the spin direction, i.e. the direction of the grey dashed line. However, there’s some angular distribution of the decay direction, as illustrated by the blue arrows, that are supposed to visualize the decay products, i.e. the electron and the neutrinos.

Muon decay

This disintegration process is very interesting from a more philosophical side. The axial vector isn’t ‘real’: it’s a mathematical concept—a pseudovector. A pseudo- or axial vector is the product of two so-called true vectors, aka as polar vectors. Just look back at what I wrote about the angular momentum: the J in the J = r×p = r×m·v formula is such vector, and its direction depends on the spin direction, which is clockwise or counter-clockwise, depending from what side you’re looking at it. Having said that, who’s to judge if the product of two ‘true’ vectors is any less ‘true’ than the vectors themselves? 🙂

The point is: the disintegration process does not respect what is referred to as P-symmetry. That’s because our mathematical conventions (like all of these right-hand rules that we’ve introduced) are unambiguous, and they tell us that the pseudovector in the mirror image of what’s going on, has the opposite direction. It has to, as per our definition of a vector product. Hence, our fictitious muon in the mirror should send its decay products in the opposite direction too! So… Well… The mirror image of our muon decay process is actually something that’s not going to happen: it’s physically impossible. So we’ve got a process in Nature here that doesn’t respect ‘mirror’ symmetry. Physicists prefer to call it ‘P-symmetry’, for parity symmetry, because it involves a flip of sign of all space coordinates, so there’s a parity inversion indeed. So there’s processes in Nature that don’t respect it but, while that’s all very interesting, it’s not what I want to write about. [Just check that post of mine if you’d want to read more.] Let me, therefore, use another illustration—one that’s more to the point in terms of what we do want to talk about here:

muon decay Feynman

So we’ve got the same muon here – well… A different one, of course! 🙂 – entering that block (A) and coming to a grinding halt somewhere in the middle, and then it disintegrates in a few micro-seconds, which is an eternity at the atomic or sub-atomic scale. It disintegrates into an electron and two neutrinos, as mentioned above, with some spread in the decay direction. [In case you wonder where we can find muons… Well… I’ll let you look it up yourself.] So we have:


Now it turns out that the presence of a magnetic field (represented by the B arrows in the illustration above) can drastically change the angular distribution of decay directions. That shouldn’t surprise us, of course, but how does it work, exactly? Well… To simplify the analysis, we’ve got a polarized beam here: the spin direction of all muons before they enter the block and/or the magnetic field, i.e. at time t = 0, is in the +x-direction. So we filtered them just, before they entered the block. [I will come back to this ‘filtering’ process.] Now, if the muon’s spin would stay that way, then the decay products – and the electron in particular – would just go straight, because all of the angular momentum is in that direction. However, we’re in the quantum-mechanical world here, and so things don’t stay the same. In fact, as we explained, there’s no such things as a definite angular momentum: there’s just an amplitude to be in the +x state, and that amplitude changes in time and in space.

How exactly? Well… We don’t know, but we can apply some clever tricks here. The first thing to note is that our magnetic field will add to the energy of our muon. So, as I explained in my previous post, the magnetic field adds to the E in the exponent of our complex-valued wavefunction a·e(i/ħ)(E·t − px). In our example, we’ve got a magnetic field in the z-direction only, so that U = −μB reduces to U = −μz·B, and we can re-write our wavefunction as:

a·e(i/ħ)[(E+U)·t − px] = a·e(i/ħ)(E·t − px)·e(i/ħ)(μz·B·t)

Of course, the magnetic field only acts from t = 0 to when the muon disintegrates, which we’ll denote by the point t = τ. So what we get is that the probability amplitude of a particle that’s been in a uniform magnetic field changes by a factor e(i/ħ)(μz·B·τ). Note that it’s a factor indeed: we use it to multiply. You should also note that this is a complex exponential, so it’s a periodic function, with its real and imaginary part oscillating between zero and one. Finally, we know that μz can take on only certain values: for a spin-1/2 particle, they are plus or minus some number, which we’ll simply denote as μ, so that’s without the subscript, so our factor becomes:


[The plus or minus sign needs to be explained here, so let’s do that quickly: we have two possible states for a spin-1/2 particle, one ‘up’, and the other ‘down’. But then we also know that the phase of our complex-valued wave function turns clockwise, which is why we have a minus sign in the exponent of our eiθ expression. In short, for the ‘up’ state, we should take the positive value, i.e. +μ, but the minus sign in the exponent of our eiθ function makes it negative again, so our factor is e−(i/ħ)(μ·B·t) for the ‘up’ state, and e+(i/ħ)(μ·B·t) for the ‘down’ state.]

OK. We get that, but that doesn’t get us anywhere—yet. We need another trick first. One of the most fundamental rules in quantum-mechanics is that we can always calculate the amplitude to go from one state, say φ (read: ‘phi’), to another, say χ (read: ‘khi’), if we have a complete set of so-called base states, which we’ll denote by the index i or j (which you shouldn’t confuse with the imaginary unit, of course), using the following formula:

〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

I know this is a lot to swallow, so let me start with the notation. You should read 〈 χ | φ 〉 from right to left: it’s the amplitude to go from state φ to state χ. This notation is referred to as the bra-ket notation, or the Dirac notation. [Dirac notation sounds more scientific, doesn’t it?] The right part, i.e. | φ 〉, is the bra, and the left part, i.e. 〈 χ | is the ket. In our example, we wonder what the amplitude is for our muon staying in the +x state. Because that amplitude is time-dependent, we can write it as A+(τ) = 〈 +at time t = τ | +at time t = 0 〉 = 〈 +at t = τ | +at t = 0 〉or, using a very misleading shorthand, 〈 +x | +x 〉. [The shorthand is misleading because the +in the ket obviously means something else than the +in the bra.]

But let’s apply the rule. We’ve got two states with respect to each coordinate axis only here. For example, in respect to the z-axis, the spin values are +z and −z respectively. [As mentioned above, we actually mean that the angular momentum in this direction is either +ħ/2 or −ħ/2, aka as ‘up’ or ‘down’ respectively, but then quantum theorists seem to like all kinds of symbols better, so we’ll use the +z and −z notations for these two base states here. So now we can use our rule and write:

A+(t) = 〈 +x | +x 〉 = 〈 +x | +z 〉〈 +z | +x 〉 + 〈 +x | −z 〉〈 −z | +x 〉

You’ll say this doesn’t help us any further, but it does, because there is another set of rules, which are referred to as transformation rules, which gives us those 〈 +z | +x 〉 and 〈 −z | +x 〉 amplitudes. They’re real numbers, and it’s the same number for both amplitudes.

〈 +z | +x 〉 = 〈 −z | +x 〉 = 1/√2

This shouldn’t surprise you too much: the square root disappears when squaring, so we get two equal probabilities – 1/2, to be precise – that add up to one which – you guess it – they have to add up to because of the normalization rule: the sum of all probabilities has to add up to one, always. [I can feel your impatience, but just hang in here for a while, as I guide you through what is likely to be your very first quantum-mechanical calculation.] Now, the 〈 +z | +x 〉 = 〈 −z | +x 〉 = 1/√2 amplitudes are the amplitudes at time t = 0, so let’s be somewhat less sloppy with our notation and write 〈 +z | +x 〉 as C+(0) and 〈 −z | +x 〉 as C(0), so we write:

〈 +z | +x 〉 = C+(0) = 1/√2

〈 −z | +x 〉 = C(0) = 1/√2

Now we know what happens with those amplitudes over time: that e(i/ħ)(±μ·B·t) factor kicks in, and so we have:

C+(t) = C+(0)·e−(i/ħ)(μ·B·t) = e−(i/ħ)(μ·B·t)/√2

C(t) = C(0)·e+(i/ħ)(μ·B·t) = e+(i/ħ)(μ·B·t)/√2

As for the plus and minus signs, see my remark on the tricky ± business in regard to μ. To make a long story somewhat shorter :-), our expression for A+(t) = 〈 +x at t | +x 〉 now becomes:

A+(t) = 〈 +x | +z 〉·C+(t) + 〈 +x | −z 〉·C(t)

Now, you wouldn’t be too surprised if I’d just tell you that the 〈 +x | +z 〉 and 〈 +x | −z 〉 amplitudes are also real-valued and equal to 1/√2, but you can actually use yet another rule we’ll generalize shortly: the amplitude to go from state φ to state χ is the complex conjugate of the amplitude to to go from state χ to state φ, so we write 〈 χ | φ 〉 = 〈 φ | χ 〉*, and therefore:

〈 +x | +z 〉 = 〈 +z | +x 〉* = (1/√2)* = (1/√2)

〈 +x | −z 〉 = 〈 −z | +x 〉* = (1/√2)* = (1/√2)

So our expression for A+(t) = 〈 +x at t | +x 〉 now becomes:

A+(t) = e−(i/ħ)(μ·B·t)/2 + e(i/ħ)(μ·B·t)/2

That’s the sum of a complex-valued function and its complex conjugate, and we’ve shown more than once (see my page on the essentials, for example) that such sum reduces to the sum of the real parts of the complex exponentials. [You should not expect any explanation of Euler’s eiθ = cosθ + i·sinθ rule at this level of understanding.] In short, we get the following grand result:

muon decay result

The big question, of course: what does this actually mean? 🙂 Well… Just square this thing and you get the probabilities shown below. [Note that the period of a squared cosine function is π, instead of 2π, which you can easily verify using an online graphing tool.]


Because you’re tired of this business, you probably don’t realize what we’ve just done. It’s spectacular and mundane at the same time. Let me quote Feynman to summarize the results:

“We find that the chance of catching the decay electron in the electron counter varies periodically with the length of time the muon has been sitting in the magnetic field. The frequency depends on the magnetic moment μ. The magnetic moment of the muon has, in fact, been measured in just this way.”

As far as I am concerned, the key result is that we’ve learned how to work with those mysterious amplitudes, and the wavefunction, in a practical way, thereby using all of the theoretical rules of the quantum-mechanical approach to real-life physical situations. I think that’s a great leap forward, and we’ll re-visit those rules in a more theoretical and philosophical démarche in the next post. As for the example itself, Feynman takes it much further, but I’ll just copy the Grand Master here:


Huh? Well… I am afraid I have to leave it at this, as I discussed the precession of ‘atomic’ magnets elsewhere (see my post on precession and diamagnetism), which gives you the same formula: ω= μ·B/J (just substitute J for ±ħ/2). However, the derivation above approaches it from an entirely different angle, which is interesting. Of course, all fits. 🙂 However, I’ll let you do your own homework now. I hope to see you tomorrow for the mentioned theoretical discussion. Have a nice evening, or weekend – or whatever ! 🙂

The Liénard–Wiechert potentials and the solution for Maxwell’s equations

In my post on gauges and gauge transformations in electromagnetics, I mentioned the full and complete solution for Maxwell’s equations, using the electric and magnetic (vector) potential Φ and A. Feynman frames it nicely, so I should print it and put it on the kitchen door, so I can look at it everyday. 🙂


I should print the wave equation we derived in our previous post too. Hmm… Stupid question, perhaps, but why is there no wave equation above? I mean: in the previous post, we said the wave equation was the solution for Maxwell’s equation, didn’t we? The answer is simple, of course: the wave equation is a solution for waves originating from some source and traveling through free space, so that’s a special case. Here we have everything. Those integrals ‘sweep’ all over space, and so that’s real space, which is full of moving charges and so there’s waves everywhere. So the solution above is far more general and captures it all: it’s the potential at every point in space, and at every point in time, taking into account whatever else is there, moving or not moving. In fact, it is the general solution of Maxwell’s equations.

How do we find it? Well… I could copy Feynman’s 21st Lecture but I won’t do that. The solution is based on the formula for Φ and A for a small blob of charge, and then the formulas above just integrate over all of space. That solution for a small blob of charge, i.e. a point charge really, was first deduced in 1898, by a French engineer: Alfred-Marie Liénard. However, his equations did not get much attention, apparently, because a German physicist, Emil Johann Wiechert, worked on the same thing and found the very same equations just two years later. That’s why they are referred to as the Liénard-Wiechert potentials, so they both get credit for it, even if both of them worked it out independently. These are the equations:

electric potential

magnetic potential

Now, you may wonder why I am mentioning them, and you may also wonder how we get those integrals above, i.e. our general solution for Maxwell’s equations, from them. You can find the answer to your second question in Feynman’s 21st Lecture. 🙂 As for the first question, I mention them because one can derive two other formulas for E and B from them. It’s the formulas that Feynman uses in his first Volume, when studying light: E


Now you’ll probably wonder how we can get these two equations from the Liénard-Wiechert potentials. They don’t look very similar, do they? No, they don’t. Frankly, I would like to give you the same answer as above, i.e. check it in Feynman’s 21st Lecture, but the truth is that the derivation is so long and tedious that even Feynman says one needs “a lot of paper and a lot of time” for that. So… Well… I’d suggest we just use all of those formulas and not worry too much about where they come from. If we can agree on that, we’re actually sort of finished with electromagnetism. All the chapters that follow Feynman’s 21st Lecture are applications indeed, so they do not add all that much to the core of the classical theory of electromagnetism.

So why did I write this post? Well… I am not sure. I guess I just wanted to sum things up for myself, so I can print it all out and put it on the kitchen door indeed. 🙂 Oh, and now that I think of it, I should add one more formula, and that’s the formula for spherical waves (as opposed to the plane waves we discussed in my previous post). It’s a very simple formula, and entirely what you’d expect to see:

spherical wave

The S function is the source function, and you can see that the formula is a Coulomb-like potential, but with the retarded argument. You’ll wonder: what is ψ? Is it E or B or what? Well… You can just substitute: ψ can be anything. Indeed, Feynman gives a very general solution for any type of spherical wave here. 🙂

So… That’s it, folks. That’s all there is to it. I hope you enjoyed it. 🙂

Addendum: Feynman’s equation for electromagnetic radiation

I talked about Feynman’s formula for electromagnetic radiation before, but it’s probably good to quickly re-explain it here. Note that it talks about the electric field only, as the magnetic field is so tiny and, in any case, if we have E then we can find B. So the formula is:


The geometry of the situation is depicted below. We have some charge q that, we assume, is moving through space, and so it creates some field E at point P. The er‘ vector is the unit vector from P to Q, so it points at the charge. Well… It points to where the charge was at the time just a little while ago, i.e. at the time t – r‘/c. Why? Well… We don’t know where q is right now, because the field needs some time travel, we don’t know q right now, i.e. q at time t. It might be anywhere. Perhaps it followed some weird trajectory during the time r‘/c, like the trajectory below.

radiation formula

So our er‘ vector moves as the charge moves, and so it will also have velocity and, likely, some acceleration, but what we measure for its velocity and acceleration, i.e. the d(er)/dt and d2(er)/dt2 in that Feynman equation, is also the retarded velocity and the retarded acceleration. But look at the terms in the equation. The first two terms have a 1/r’2 in them, so these two effects diminish with the square of the distance. The first term is just Coulomb’s Law (note that the minus sign in front takes care of the fact that like charges repel and so the E vector will point in the other way). Well… It is and it isn’t, because of the retarded time argument, of course. And so we have the second term, which sort of compensates for that. Indeed, the d(er)/dt is the time rate of change of er and, hence, if r‘/c = Δt, then (r‘/cd(er)/dt is a first-order approximation of Δer.

As Feynman puts it: “The second term is as though nature were trying to allow for the fact that the Coulomb effect is retarded, if we might put it very crudely. It suggests that we should calculate the delayed Coulomb field but add a correction to it, which is its rate of change times the time delay that we use. Nature seems to be attempting to guess what the field at the present time is going to be, by taking the rate of change and multiplying by the time that is delayed.” In short, the first two terms can be written as E = −(q/4πε0)/r2·[er + Δer] and, hence, it’s a sort of modified Coulomb Law that sort of tries to guess what the electrostatic field at P should be based on (a) what it is right now, and (b) how q’s direction and velocity, as measured now, would change it.

Now, the third term has a 1/c2 factor in front but, unlike the other two terms, this effect does not fall off with distance. So the formula below fully describes electromagnetic radiation, indeed, because it’s the only important term when we get ‘far enough away’, with ‘far enough’ meaning that the parts that go as the square of the distance have fallen off so much that they’re no longer significant.

radiation formula 2Of course, you’re smart, and so you’ll immediately note that, as r increases, that unit vector keeps wiggling but that effect will also diminish. You’re right. It does, but in a fairly complicated way. The acceleration of er has two components indeed. One is the transverse or tangential piece, because the end of er goes up and down, and the other is a radial piece because it stays on a sphere and so it changes direction. The radial piece is the smallest bit, and actually also varies as the inverse square of r when r is fairly large. The tangential piece, however, varies only inversely as the distance, so as 1/r. So, yes, the wigglings of er look smaller and smaller, inversely as the distance, but the tangential piece is and remains significant, because it does not vary as 1/r2 but as 1/r only.  That’s why you’ll usually see the law of radiation written in an even simpler way:

final law of radiation

This law reduces the whole effect to the component of the acceleration that is perpendicular to the line of sight only. It assumes the distance is huge as compared to the distance over which the charge is moving and, therefore, that r‘ and r can be equated for all practical purposes. It also notes that the tangential piece is all that matters, and so it equates d2(er)/dtwith ax/r. The whole thing is probably best illustrated as below: we have a generator driving charges up and down in G – so it’s an antenna really – and so we’ll measure a strong signal when putting the radiation detector D in position 1, but we’ll measure nothing in position 3. [The detector is, of course, another antenna, but with an amplifier for the signal.] But so here I am starting to talk about electromagnetic radiation once more, which was not what I wanted to do here, if only because Feynman does a much better job at that than I could ever do. 🙂radiator

Understanding gyroscopes

You know a gyroscope: it’s a spinning wheel or disk mounted in a frame that itself is free to alter in direction, so the axis of rotation is not affected as the mounting tilts or moves about. Therefore, gyroscopes are used to provide stability or maintain a reference direction in navigation systems. Understanding a gyroscope itself is simple enough: it only involves a good understanding of the so-called moment of inertia. Indeed, in the previous post, we introduced a lot of concepts related to rotational motion, notably the concepts of torque and angular momentum but, because that post was getting too long, I did not talk about the moment of inertia and gyroscopes. Let me do that now. However, I should warn you: you will not be able to understand this post if you haven’t read or didn’t understand the previous post. So, if you can’t follow, please go back: it’s probably because you didn’t get the other post.

The moment of inertia and angular momentum are related but not quite the same. Let’s first recapitulate angular momentum. Angular momentum is the equivalent of linear momentum for rotational motion:

  1. If we want to change the linear motion of an object, as measured by its momentum p = mv, we’ll need to apply a force. Changing the linear motion means changing either (a) the speed (v), i.e. the magnitude of the velocity vector v, (b) the direction, or (c) both. This is expressed in Newton’s Law, F = m(dv/dt), and so we note that the mass is just a factor of proportionality measuring the inertia to change.
  2. The same goes for angular momentum (denoted by L): if we want to change it, we’ll need to apply a force, or a torque as it’s referred to when talking rotational motion, and such torque can change either (a) L’s magnitude (L), (b) L’s direction or (c) both.

Just like linear momentum, angular momentum is also a product of two factors: the first factor is the angular velocity ω, and the second factor is the moment of inertia. The moment of inertia is denoted by I so we write L = Iω. But what is I? If we’re analyzing a rigid body (which is what we usually do), then it will be calculated as follows:

formula 1

This is easy enough to understand: the inertia for turning will depend not just on the masses of all of the particles that make up the object, but also on their distance from the axis of rotation–and note that we need to square these distances. The L = Iω formula, combined with the formula for I above, explains why a spinning skater doing a ‘scratch spin’ speeds up tremendously when drawing in his or her arms and legs. Indeed, the total angular momentum has to remain the same, but I becomes much smaller as a result of that r2 factor in the formula. Hence, if I becomes smaller, then ω has to go up significantly in order to conserve angular momentum.

Finally, we note that angular momentum and linear momentum can be easily related through the following equation:

Formula 2

That’s all kids stuff. To understand gyroscopes, we’ll have to go beyond that and do some vector analysis. In the previous post, we explained that rotational motion is usually analyzed in terms of torques than forces, and we detailed the relations between force and torque. More in particular, we introduced a torque vector τ with the following components:

τ = (τyz, τzx, τxy) = (τx, τy, τz) with

τx = τyz = yFz – zFy

τy = τzx = zFx – xFz

τz = τxy = xFy – yFx.

We also noted that this torque vector could be written as a cross product of a radius vector and the force: τ = F. Finally, we also pointed out the relation between the x-, y- and z-components of the torque vector and the plane of rotation:

(1) τx = τyz is rotational motion about the x-axis (i.e. motion in the yz-plane)

(2) τy = τzx is rotational motion about the y-axis (i.e. motion in the zx plane)

(3) τz = τxy is rotational motion about the z-axis (i.e. motion in the xy-plane)

The angular momentum vector L will have the same direction as the torque vector, but it’s the cross product of the radius vector and the momentum vector: L = p. For clarity, I reproduce the animation I used in my previous post once again.


How do we get that cross vector product for L? We noted that τ (i.e. the Greek tau) = dL/dt. So we need to take the time derivative of all three components of L. What are the components of L? They look very similar to those of τ:

L = (Lyz, Lzx, Lxy) = (Lx, Ly, Lz) with

Lx = Lyz = ypz – zpy

Ly = Lzx = zpx – xpz

Lz = Lxy = xpy – ypx.

Now, just check the time derivatives of Lx, Ly, and Lz and you’ll find the components of the torque vector τ. Together with the formulas above, that should be sufficient to convince you that L is, indeed, a vector cross product of r and p: L = p.

Again, if you feel this is too difficult, please read or re-read my previous post. But if you do understand everything, then you are ready for a much more difficult analysis, and that’s an explanation of why a spinning top does not fall as it rotates about.

In order to understand that explanation, we’ll first analyze the situation below. It resembles the experiment with the swivel chair that’s often described on ‘easy physics’ websites: the man below holds a spinning wheel with its axis horizontal, and then turns this axis into the vertical. As a result, the man starts to turn himself in the opposite direction.

Rotating angular momentum

Let’s now look at the forces and torques involved. These are shown below.

Angular vectors in gyroscope

This looks very complicated–you’ll say! You’re right: it is quite complicated–but not impossible to understand. First note the vectors involved in the starting position: we have an angular momentum vector L0 and an angular velocity vector ω0. These are both axial vectors, as I explained in my previous post: their direction is perpendicular to the plane of motion, i.e. they are arrows along the axis of rotation. This is in line with what we wrote above: if an object is rotating in the zx-plane (which is the case here), then the angular momentum vector will have a y-component only, and so it will be directed along the y-axis. Which side? That’s determined by the right-hand screw rule. [Again, please do read my previous post for more details if you’d need them.]

So now we have explained L0 and ω0. What about all the other vectors? First note that there would be no torque if the man would not try to turn the axis. In that case, the angular momentum would just remain what it is, i.e. dL/dt = 0, and there would be no torque. Indeed, remember that τ = dL/dt, just like F = dp/dt, so dL/dt = 0, then τ = 0. But so the man is turning the axis of rotation and, hence, τ = dL/dt ≠ 0. What’s changing here is not the magnitude of the angular momentum but its direction. As usual, the analysis is in terms of differentials.

As the man turns the spinning wheel, the directional change of the angular momentum is defined by the angle Δθ, and we get a new angular momentum vector L1. The difference between L1 and L0 is given by the vector ΔL. This ΔL vector is a tiny vector in the L0L1 plane and, because we’re looking at a differential displacement only, we can say that, for all practical purposes, this ΔL is orthogonal to L0 (as we move from L0 to L1, we’re actually moving along an arc and, hence, ΔL is a tangential vector). Therefore, simple trigonometry allows us to say that its magnitude ΔL will be equal to L0Δθ. [We should actually write sin(Δθ) but, because we’re talking differentials and measuring angles in radians (so the value reflects arc lengths), we can equate sin(Δθ) with Δθ).]

Now, the torque vector τ has the same direction as the ΔL vector (that’s obvious from their definitions), but what is its magnitude? That’s an easy question to answer: τ = ΔL/Δt = L0Δθ/Δt = L0 (Δθ/Δt). Now, this result induces us to define another axial vector which we’ll denote using the same Greek letter omega, but written as a capital letter instead of in lowercase: Ω. The direction of Ω is determined by using that right-hand screw rule which we’ve always been using, and Ω‘s magnitude is equal to Ω = Δθ/Δt. So, in short, Ω is an angular velocity vector just like ω: its magnitude is the speed with which the man is turning the axis of rotation of the spinning wheel, and its direction is determined using the same rules. If we do that, we get the rather remarkable result that we can write the torque vector τ as the cross product of Ω and L0:

τ = Ω×L0

Now, this is not an obvious result, so you should check it yourself. When doing that, you’ll note that the two vectors are orthogonal and so we have τ = Ω×L0 = Ω×L0 =|Ω||L0|sin(π/2)n = ΩL0n with n the normal unit vector given, once again, by the right-hand screw rule. [Note how the order of the two factors in a cross product matters: b = –a.]

You’re probably tired of this already, and so you’ll say: so what?

Well… We have a torque. A torque is produced by forces, and a torque vector along the z-axis is associated with rotation about the z-axis, i.e. rotation in the xy-plane. Such rotation is caused by the forces F and –F that produce the torque, as shown in the illustration. [Again, their direction is determined by the right-hand screw rule – but I’ll stop repeating that from now on.] But… Wait a minute. First, the direction is wrong, isn’t it? The man turns the other way in reality. And, second, where do these forces come from? Well… The man produces them, and the direction of the forces is not wrong: as the man applies these forces, with his hands, as he holds the spinning wheel and turns it into the vertical direction, equal and opposite forces act on him (cf. the action-reaction principle), and so he starts to turn in the opposite direction.

So there we are: we have explained this complex situation fully in terms of torques and forces now. So that’s good. [If you don’t believe the thing about those forces, just get one of your wheels out of your mountainbike, let it spin, and try to change the plane in which it is spinning: you’ll see you’ll need a bit of force. Not much, but enough, and it’s exactly the kind of force that the man in the illustration is experiencing.]

Now, what if we would not be holding the spinning wheel? What if we would let it pivot, for example? Well… It would just pivot, as shown below.


But… Why doesn’t it fall? Hah! There we are! Now we are finally ready for the analysis we really want to do, i.e. explaining why these spinning tops (or gyros as they’re referred to in physics) don’t fall.

Such spinning top is shown in the illustration below. It’s similar to the spinning wheel: there’s a rotational axis, and we have the force of gravity trying to change the direction of that axis, so it’s like the man turning that spinning wheel indeed, but so now it’s gravity exerting the force that’s needed to change the angular momentum. Let’s associate the vertical direction with the z-axis, and the horizontal place with the xy-axis, and let’s go step-by-step:

  1. The gravitational force wants to pull that spinning top down. So the ΔL vector points downward this time, not upward. Hence, the torque vector will point downward too. But so it’s a torque pointing along the z-axis.
  2. Such torque along the z-axis is associated with a rotation in the xy-plane, so that’s why the spinning top will slowly revolve about the z-axis, parallel to the xy-plane. This process is referred to as precession, and so there’s a precession torque and a precession angular velocity.

spinning top

So that explains precession and so that’s all there is to it. Now you’ll complain, and rightly so: what I write above, does not explain why the spinning top does not actually fall. I only explained that precession movement. So what’s going on? That spinning top should fall as it precesses, shouldn’t it?

It actually does fall. The point to note, however, is that the precession movement itself changes the direction of the angular momentum vector as well. So we have a new ΔL vector pointing sideways, i.e. a vector in the horizontal plane–so not along the z axis. Hence, we should have a torque in the horizontal plane, and so that implies that we should have two equal and opposite forces acting along the z-axis.

In fact, the right-hand screw rule gives us the direction of those forces: if these forces were effectively applied to the spinning top, it would fall even faster! However, the point to note is that there are no such forces. Indeed, it is not like the man with the spinning wheel: no one (or nothing) is pushing or applying the forces that should produce the torque associated with this change in angular momentum. Hence, because these forces are absent, the spinning top begins to ‘fall’ in the opposite direction of the lacking force, thereby counteracting the gravitational force in such a way that the spinning top just spins about the z-axis without actually falling.

Now, this is, most probably, very difficult to understand in the way you would like to understand it, so just let it sink in and think about it for a while. In this regard, and to help the understanding, it’s probably worth noting that the actual process of reaching equilibrium is somewhat messy. It is illustrated below: if we hold a spinning gyro for a while and then, suddenly, we let it fall (yes, just let it go), it will actually fall. However, as it’s falling, it also starts turning and then, because it starts turning, it also starts ‘falling’ upwards, as explained in that story of the ‘missing force’ above. Initially, the upward movement will overshoot the equilibrium position, thereby slowing the gyro’s speed in the horizontal plane. And so then, because its horizontal speed becomes smaller, it stops ‘falling upward’, and so that means it’s falling down again. But then it starts turning again, and so on and so on. I hope you grasp this–more or less at least. Note that frictional effects will cause the up-and-down movement to dampen out, and so we get a so-called cycloidal motion dampening down to the steady motion we associate with spinning tops and gyros.

Actual gyroscope motion

That, then, is the ‘miracle’ of a spinning top explained. Is it less of a ‘miracle’ now that we have explained it in terms of torques and missing forces? That’s an appreciation which each of us has to make for him- or herself. I actually find it all even more wonderful now that I can explain it more or less using the kind of math I used above–but then you may have a different opinion.

In any case, let us – to wrap it all up – ask some simple questions about some other spinning objects. What about the Earth for example? It has an axis of rotation too, and it revolves around the Sun. Is there anything like precession going on?

The first answer is: no, not really. The axis of rotation of the Earth changes little with respect to the stars. Indeed, why would it change? Changing it would require a torque, and where would the required force for such torque come from? The Earth is not like a gyro on a pivot being pulled down by some force we cannot see. The Sun attracts the Earth as a whole indeed. It does not change its axis of rotation. That’s why we have a fairly regular day and night cycle.

The more precise answer is: yes, there actually is a very slow axial precession. The whole precessional cycle takes approximately 26,000 years, and it causes the position of stars – as perceived by us, earthlings, that is – to slowly change. Over this cycle, the Earth’s north axial pole moves from where it is now, in a circle with an angular radius of about 23.5 degrees, as illustrated below.


What is this precession caused by? There must be some torque. There is. The Earth is not perfectly spherical: it bulges outward at the equator, and the gravitational tidal forces of the Moon and Sun apply some torque here, attempting to pull the equatorial bulge into the plane of the ecliptic, but instead causing it to precess. So it’s a quite subtle motion, but it’s there, and it’s got also something to do with the gravitational force. However, it’s got nothing to do with the way gravitation makes a spinning top do what it does. [The most amazing thing about this, in my opinion, is that, despite the fact that the precessional movement is so tiny, the Greeks had already discovered it: indeed, the Greek astronomer and mathematician Hipparchus of Nicaea gave a pretty precise figure for this so-called ‘precession of the equinoxes’ in 127 BC.]

What about electrons? Are they like gyros rotating around some pivot? Here the answer is very simple and very straightforward: No, not at all! First, there are no pivots in an atom. Second, the current understanding of an electron – i.e. the quantum-mechanical understanding of a electron – is not compatible with the classical notion of spin. Let me just copy an explanation from Georgia State University’s HyperPhyics website. It basically says it all:

“Experimental evidence like the hydrogen fine structure and the Stern-Gerlach experiment suggest that an electron has an intrinsic angular momentum, independent of its orbital angular momentum. These experiments suggest just two possible states for this angular momentum, and following the pattern of quantized angular momentum, this requires an angular momentum quantum number of 1/2. With this evidence, we say that the electron has spin 1/2. An angular momentum and a magnetic moment could indeed arise from a spinning sphere of charge, but this classical picture cannot fit the size or quantized nature of the electron spin. The property called electron spin must be considered to be a quantum concept without detailed classical analogy.

So… I guess this should conclude my exposé on rotational motion. I am not sure what I am going to write about next, but I’ll see. 🙂

Post scriptum:

The above treatment is largely based on Feynman’s Lectures.(Vol. I, Chapter 18, 19 and 20). The subject could also be discussed using the concept of a force couple, aka pure moment. A force couple is a system of forces with a resultant moment but no resultant force. Hence, it causes rotation without translation or, more generally, without any acceleration of the centre of mass. In such analysis, we can say that gravity produces a force couple on the spinning top. The two forces of this couple are equal and opposite, and they pull at opposite ends. However, because one end of the top is fixed (friction forces keep the tip fixed to the ground), the force at the other end makes the top go about the vertical axis.

The situation we have is that gravity causes such force couple to appear, just like the man tilting the spinning wheel causes such force couple to appear. Now, the analysis above shows that the direction of the new force is perpendicular to the plane in which the axis of rotation changes, or wants to change in the case of our spinning top. So gravity wants to pull the top down and causes it to move sideways. This horizontal movement will, in turn, create another force couple. The direction of the resultant force, at the free end of the axis of rotation of the top, will, once again, be vertical, but it will oppose the gravity force. So, in a very simplified explanation of things, we could say:

  1. Gravity pulls the top downwards, and causes a force that will make the top move sideways. So the new force, which causes the precession movement, is orthogonal to the gravitation force, i.e. it’s a horizontal force.
  2. That horizontal force will, in turn, cause another force to appear. That force will also be orthogonal to the horizontal force. As we made two 90 degrees turns, so to say, i.e. 180 degrees in total, it means that this third force will be opposite to the gravitational force.
  3. In equilibrium, we have three forces: gravity, the force causing the precession and, finally, a force neutralizing gravity as the spinning top precesses about the vertical axis.

This approach allows for a treatment that is somewhat more intuitive than Feynman’s concept of the ‘missing force.’

Spinning: the essentials

When introducing mirror symmetry (P-symmetry) in one of my older posts (time reversal and CPT-symmetry), I also introduced the concept of axial and polar vectors in physics. Axial vectors have to do with rotations, or spinning objects. Because spin – i.e. turning motion – is such an important concept in physics, I’d suggest we re-visit the topic here.

Of course, I should be clear from the outset that the discussion below is entirely classical. Indeed, as Wikipedia puts it: “The intrinsic spin of elementary particles (such as electrons) is quantum-mechanical phenomenon that does not have a counterpart in classical mechanics, despite the term spin being reminiscent of classical phenomena such as a planet spinning on its axis.” Nevertheless, if we don’t understand what spin is in the classical world – i.e. our world for all practical purposes – then we won’t get even near to appreciating what it might be in the quantum-mechanical world. Besides, it’s just plain fun: I am sure you have played, as a kid of as an adult even, with one of those magical spinning tops or toy gyroscopes and so you probably wonder how it really works in physics. So that’s what this post is all about.

The essential concept is the concept of torque. For rotations in space (i.e. rotational motion), the torque is what the force is for linear motion:

  • It’s the torque (τ) that makes an object spin faster or slower, just like the force would accelerate or decelerate that very same object when it would be moving along some curve (as opposed to spinning around some axis).
  • There’s also a similar ‘law of Newton’ for torque: you’ll remember that the force equals the time rate-of-change of a vector quantity referred to as (linear) momentum: F = dp/dt = d(mv)/dt = ma (the mass times the acceleration). Likewise, we have a vector quantity that is referred to as angular momentum (L), and we can write: τ (i.e. the Greek tau) = dL/dt.
  • Finally, instead of linear velocity, we’ll have an angular velocity ω (omega), which is the time rate-of-change of the angle θ defining how far the object has gone around (as opposed to the distance in linear dynamics, describing how far the object has gone along). So we have ω = dθ/dt. This is actually easy to visualize because we know that θ is the length of the corresponding arc on the unit circle. Hence, the equivalence with the linear distance traveled is easily ascertained.

There are numerous other equivalences. For example, we also have an angular acceleration: α = dω/dt = d2θ/dt2; and we should also note that, just like the force, the torque is doing work – in its conventional definition as used in physics – as it turns an object:

ΔW = τ·Δθ

However, we also need to point out the differences. The animation below does that very well, as it relates the ‘new’ concepts – i.e. torque and angular momentum – to the ‘old’ concepts – i.e. force and linear momentum.


So what do we have here? We have vector quantities once again, denoted by symbols in bold-face. However, τ, L and ω are special vectors: axial vectors indeed, as opposed to the polar vectors F, p and v. Axial vectors are directed along the axis of spin – so that is, strangely enough, at right angles to the direction of spin, or perpendicular to the ‘plane of the twist’ as Feynman calls it – and the direction of the axial vector is determined by the direction of spin through one of two conventions: the ‘right-hand screw rule’ or the ‘left-hand screw rule’. Physicists have settled on the former.

If you feel very confused now (I did when I first looked at it), just step back and go through the full argument as I develop it here. It helps to think of torque (also known, for some obscure reason, as the moment of the force) as a twist on an object or a plane indeed: the torque’s magnitude is equal to the tangential component of the force, i.e. F·sin(Δθ), times the distance between the object and the axis of rotation (we’ll denote this distance by r). This quantity is also equal to the product of the magnitude of the force itself and the length of the so-called lever arm, i.e. the perpendicular distance from the axis to the line of action of the force (this lever arm length is denoted by r0). So we can write τ as:

  1. The product of the tangential component of the force times the distance r: τ = r·Ft = r·F·sin(Δθ)
  2. The product of the length of the lever arm times the force: τ = r0·F
  3. The torque is the work done per unit of distance traveled: τ = ΔW/Δθ or τ = dW/dθ in the limit.

So… These are actually only the basics, which you should remember from your high-school physics course. If not, have another look at it. We now need to go from scalar quantities to vector quantities to understand that animation above. Torque is not a vector like force or velocity, not a priori at least. However, we can associate torque with a vector of a special type, an axial vector. Feynman calls vectors such as force or (linear) velocity ‘honest’ or ‘real’ vectors. The mathematically correct term for such ‘honest’ or ‘real’ vectors is polar vector. Hence, axial vectors are not ‘honest’ or ‘real’ in some sense: we derive them from the polar vectors. They are, in effect, a so-called cross product of two ‘honest’ vectors. Here we need to explain the difference between a dot and a cross product between two vectors once again:

(1) A dot product, which we denoted by a little dot (·), yields a scalar quantity: b = |a||b|cosα = a·b·cosα with α the angle between the two vectors a and b. Note that the dot product of two orthogonal vectors is equal to zero, so take care:  τ = r·Ft = r·F·sin(Δθ) is not a dot product of two vectors. It’s a simple product of two scalar quantities: we only use the dot as a mark of separation, which may be quite confusing. In fact, some authors use ∗ for a product of scalars to avoid confusion: that’s not a bad idea, but it’s not a convention as yet. Omitting the dot when multiplying scalars (as I do when I write |a||b|cosα) is also possible, but it makes it a bit difficult to read formulas I find. Also note, once again, how important the difference between bold-face and normal type is in formulas like this: it distinguishes vectors from scalars – and these are two very different things indeed.

(2) A cross product, which we denote by using a cross (×), yields another vector: τ = r×F =|r|·|F|·sinα·n = r·F·sinα·n with n the normal unit vector given by the right-hand rule. Note how a cross product involves a sine, not a cosine – as opposed to a dot product. Hence, if r and F are orthogonal vectors (which is not unlikely), then this sine term will be equal to 1. If the two vectors are not perpendicular to each other, then the sine function will assure that we use the tangential component of the force.

But, again, how do we go from torque as a scalar quantity (τ = r·Ft) to the vector τ = r×F? Well… Let’s suppose, first, that, in our (inertial) frame of reference, we have some object spinning around the z-axis only. In other words, it spins in the xy-plane only. So we have a torque around (or about) the z-axis, i.e. in the xy-plane. The work that will be done by this torque can be written as:

ΔW = FxΔx + FyΔy = (xFy – yFx)Δθ

Huh? Yes. This results from a simple two-dimensional analysis of what’s going on in the xy-plane: the force has an x- and a y-component, and the distance traveled in the x- and y-direction is Δx = –yΔθ and Δy = xΔθ respectively. I won’t go into the details of this (you can easily find these elsewhere) but just note the minus sign for Δx and the way the x and y get switched in the expressions.

So the torque in the xy-plane is given by τxy = ΔW/Δθ = xFy – yFx. Likewise, if the object would be spinning about the x-axis – or, what amounts to the same, in the yz-plane – we’d get τyz = yFz – zFy. Finally, for some object spinning about the y-axis (i.e. in the zx-plane – and please note I write zx, not xz, so as to be consistent as we switch the order of the x, y and z coordinates in the formulas), then we’d get τzx = zFx – xFz. Now we can appreciate the fact that a torque in some other plane, at some angle with our Cartesian planes, would be some combination of these three torques, so we’d write:

(1)    τxy = xFy – yFx

(2)    τyz = yFz – zFy and

(3)    τzx = zFx – xFz.

Another observer with his Cartesian x’, y’ and z’ axes in some other direction (we’re not talking some observer moving away from us but, quite simply, a reference frame that’s being rotated itself around some axis not necessarily coinciding with any of the x-, y- z- or x’-, y’- and z’-axes mentioned above) would find other values as he calculates these torques, but the formulas would look the same:

(1’) τx’y’ = x’Fy’ – y’Fx’

(2’) τy’z’ = y’Fz’ – z’Fy’ and

(3’) τz’x’ = z’Fx’ – x’Fz’.

Now, of course, there must be some ‘nice’ relationship that expresses the τx’y’, τy’z’ and τz’x’ values in terms of τxy, τyz, just like there was some ‘nice’ relationship between the x’, y’ and z’ components of a vector in one coordination system (the x’, y’ and z’ coordinate system) and the x, y, z components of that same vector in the x, y and z coordinate system. Now, I won’t go into the details but that ‘nice’ relationship is, in fact, given by transformation expressions involving a rotation matrix. I won’t write that one down here, because it looks pretty formidable, but just google ‘axis-angle representation of a rotation’ and you’ll get all the details you want.

The point to note is that, in both sets of equations above, we have an x-, y- and z-component of some mathematical vector that transform just like a ‘real’ vector. Now, if it behaves like a vector, we’ll just call it a vector, and that’s how, in essence, we define torque, angular momentum (and angular velocity too) as axial vectors. We should note how it works exactly though:

(1) τxy and τx’y’ will transform like the z-component of a vector (note that we were talking rotational motion about the z-axis when introducing this quantity);

(2) τyz and τy’z’ will transform like the x-component of a vector (note that we were talking rotational motion about the x-axis when introducing this quantity);

(3) τzx and τz’x’ will transform like the y-component of a vector (note that we were talking rotation motion when introducing this quantity). So we have

τ = (τyz, τzx, τxy) = (τx, τy, τz) with

τx = τyz = yFz – zFy

τy = τzx = zFx – xFz

τz = τxy = xFy – yFx.

[This may look very difficult to remember but just look at the order: all we do is respect the clockwise order x, y, z, x, y, z, x, etc. when jotting down the x, y and z subscripts.]

Now we are, finally, well equipped to once again look at that vector representation of rotation. I reproduce it once again below so you don’t have to scroll back to that animation:


We have rotation in the zx-plane here (i.e. rotation about the y-axis) driven by an oscillating force F, and so, yes, we can see that the torque vector oscillates along the y-axis only: its x- and z-components are zero. We also have L here, the angular momentum. That’s a vector quantity as well. We can write it as

L = (Lyz, Lzx, Lxy) = (Lx, Ly, Lz) with

Lx = Lyz = ypz – zpy (i.e. the angular momentum about the x-axis)

Ly = Lzx = zpx – xpz (i.e. the angular momentum about the y-axis)

Lz = Lxy = xpy – ypx (i.e. the angular momentum about the z-axis),

And we note, once again, that only the y-component is non-zero in this case, because the rotation is about the y-axis.

We should now remember the rules for a cross product. Above, we wrote that τ = r´F =|r|×|F|×sina×n = = r×F×sina×n with n the normal unit vector given by the right-hand rule. However, a vector product can also be written in terms of its components: c = a´b if and only

cx = aybz – azby,

cy = azbx – axbz, and

cz = axby – aybx.

Again, if this looks difficult, remember the trick above: respect the clockwise order when jotting down the x, y and z subscripts. I’ll leave it to you to work out r´F and r´p in terms of components but, when you write it all out, you’ll see it corresponds to the formulas above. In addition, I will also leave it to you to show that the velocity of some particle in a rotating body can be given by a similar vector product: v = ω´r, with ω being defined as another axial vector (aka pseudovector) pointing along the direction of the axis of rotation, i.e. not in the direction of motion. [Is that strange? No. As it’s rotational motion, there is no ‘direction of motion’ really: the object, or any particle in that object, goes round and round and round indeed and, hence, defining some normal vector using the right-hand rule to denote angular velocity makes a lot of sense.]

I could continue to write and write and write, but I need to stop here. Indeed, I actually wanted to tell you how gyroscopes work, but I notice that this introduction has already taken several pages. Hence, I’ll leave the gyroscope for a separate post. So, be warned, you’ll need to read and understand this one before reading my next one.