The reality of the wavefunction

If you haven’t read any of my previous posts on the geometry of the wavefunction (this link goes to the most recent one of them), then don’t attempt to read this one. It brings too much stuff together to be comprehensible. In fact, I am not even sure if I am going to understand what I write myself. 🙂 [OK. Poor joke. Acknowledged.]

Just to recap the essentials, I part ways with mainstream physicists in regard to the interpretation of the wavefunction. For mainstream physicists, the wavefunction is just some mathematical construct. Nothing real. Of course, I acknowledge mainstream physicists have very good reasons for that, but… Well… I believe that, if there is interference, or diffraction, then something must be interfering, or something must be diffracting. I won’t dwell on this because… Well… I have done that too many times already. My hypothesis is that the wavefunction is, in effect, a rotating field vector, so it’s just like the electric field vector of a (circularly polarized) electromagnetic wave (illustrated below).

Of course, it must be different, and it is. First, the (physical) dimension of the field vector of the matter-wave must be different. So what is it? Well… I am tempted to associate the real and imaginary component of the wavefunction with a force per unit mass (as opposed to the force per unit charge dimension of the electric field vector). Of course, the newton/kg dimension reduces to the dimension of acceleration (m/s²), so that’s the dimension of a gravitational field.

Second, I also am tempted to think that this gravitational disturbance causes an electron (or any matter-particle) to move about some center, and I believe it does so at the speed of light. In contrast, electromagnetic waves do not involve any mass: they’re just an oscillating field. Nothing more. Nothing less. Why would I believe there must still be some pointlike particle involved? Well… As Feynman puts it: “When you do find the electron some place, the entire charge is there.” (Feynman’s Lectures, III-21-4) So… Well… That’s why.

The third difference is one that I thought of only recently: the plane of the oscillation cannot be perpendicular to the direction of motion of our electron, because then we can’t explain the direction of its magnetic moment, which is either up or down when traveling through a Stern-Gerlach apparatus. I am more explicit on that in the mentioned post, so you may want to check there. 🙂

I wish I mastered the software to make animations such as the one above (for which I have to credit Wikipedia), but so I don’t. You’ll just have to imagine it. That’s great mental exercise, so… Well… Just try it. 🙂

Let’s now think about rotating reference frames and transformations. If the z-direction is the direction along which we measure the angular momentum (or the magnetic moment), then the up-direction will be the positive z-direction. We’ll also assume the y-direction is the direction of travel of our elementary particle—and let’s just consider an electron here so we’re more real. 🙂 So we’re in the reference frame that Feynman used to derive the transformation matrices for spin-1/2 particles (or for two-state systems in general). His ‘improved’ Stern-Gerlach apparatus—which I’ll refer to as a beam splitter—illustrates this geometry.

Modified Stern-Gerlach

So I think the magnetic moment (or the angular momentum, really) comes from an oscillatory motion in the x- and y-directions. One is the real component (the cosine function) and the other is the imaginary component (the sine function), as illustrated below.

Circle_cos_sin

So the crucial difference with the animations above (which illustrate left- and right-handed polarization respectively) is that we, somehow, need to imagine the circular motion is not in the xz-plane, but in the yz-plane. Now what happens if we change the reference frame?

Well… That depends on what you mean by changing the reference frame. Suppose we’re looking in the positive y-direction (so that’s the direction in which our particle is moving), then we might imagine what it would look like when we would make a 180° turn and look at the situation from the other side, so to speak. Now, I did a post on that earlier this year, which you may want to re-read. When we’re looking at the same thing from the other side (from the back side, so to speak), we will want to use our familiar reference frame. So we will want to keep the z-axis as it is (pointing upwards), and we will also want to define the x- and y-axis using the familiar right-hand rule for defining a coordinate frame. So our new x-axis and our new y-axis will be the same as the old x- and y-axes but with the sign reversed. In short, we’ll have the following mini-transformation: (1) z’ = z, (2) x’ = −x, and (3) y’ = −y.

So… Well… If we’re effectively looking at something real that was moving along the y-axis, then it will now still be moving along the y’-axis, but in the negative direction. Hence, our elementary wavefunction $e^{i\theta} = \cos\theta + i\sin\theta$ will transform into $e^{-i\theta} = \cos(-\theta) + i\sin(-\theta) = \cos\theta - i\sin\theta$. It’s the same wavefunction. We just… Well… We just changed our reference frame. We didn’t change reality.

Now you’ll cry wolf, of course, because we just went through all that transformational stuff in our last post. To be specific, we presented the following transformation matrix for a rotation about the z-axis:

$$R_z(\phi) = \begin{pmatrix} e^{i\phi/2} & 0 \\ 0 & e^{-i\phi/2} \end{pmatrix}$$

Now, if φ is equal to 180° (so that’s π in radians), then these $e^{i\phi/2}$ and $e^{-i\phi/2}$ factors are equal to $e^{i\pi/2} = +i$ and $e^{-i\pi/2} = -i$ respectively. Hence, our $e^{i\theta} = \cos\theta + i\sin\theta$ becomes…

Hey! Wait a minute! We’re talking about two very different things here, right? The $e^{i\theta} = \cos\theta + i\sin\theta$ is an elementary wavefunction which, we presume, describes some real-life particle (we talked about an electron with its spin in the up-direction) while these transformation matrices are to be applied to amplitudes describing… Well… Either an up- or a down-state, right?

Right. But… Well… Is it so different, really? Suppose our $e^{i\theta} = \cos\theta + i\sin\theta$ wavefunction describes an up-electron, then we still have to apply that $e^{i\phi/2} = e^{i\pi/2} = +i$ factor, right? So we get a new wavefunction that will be equal to $e^{i\phi/2}\cdot e^{i\theta} = e^{i\pi/2}\cdot e^{i\theta} = i\cdot e^{i\theta} = i\cos\theta + i^2\sin\theta = -\sin\theta + i\cos\theta$, right? So how can we reconcile that with the $\cos\theta - i\sin\theta$ function we thought we’d find?

We can’t. So… Well… Either my theory is wrong or… Well… Feynman can’t be wrong, can he? I mean… It’s not only Feynman here. We’re talking all mainstream physicists here, right?

Right. But think of it. Our electron in that thought experiment does, effectively, make a turn of 180°, so it is going in the other direction now! That’s more than just… Well… Going around the apparatus and looking at stuff from the other side.

Hmm… Interesting. Let’s think about the difference between the $-\sin\theta + i\cos\theta$ and $\cos\theta - i\sin\theta$ functions. First, note that they will give us the same probabilities: the square of the absolute value of both complex numbers is the same. [It’s equal to 1 because we didn’t bother to put a coefficient in front.] Secondly, we should note that the sine and cosine functions are essentially the same. They just differ by a phase shift: cosθ = sin(θ + π/2) and −sinθ = cos(θ + π/2). Let’s see what we can do with that. We can write the following, for example:

$$-\sin\theta + i\cos\theta = \cos(\theta + \pi/2) + i\sin(\theta + \pi/2) = e^{i(\theta + \pi/2)}$$

Well… I guess that’s something at least! The $e^{i\theta}$ and $e^{i(\theta + \pi/2)}$ functions differ by a phase shift of π/2 so… Well… That’s what it takes to reverse the direction of an electron. 🙂 Let us mull over that in the coming days. As I mentioned, these more philosophical topics are not easily exhausted. 🙂
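For those who prefer to check this kind of thing numerically, here is a minimal sketch of the comparison above. It multiplies the elementary wavefunction by the up-state factor for a 180° rotation and compares the result with the mirrored function cosθ − i·sinθ. The function name and the value of θ are my own choices for illustration, nothing more.

```python
import cmath
import math

def rotate_up_amplitude(theta, phi):
    """Apply the up-state factor e^{i*phi/2} to the wavefunction e^{i*theta}."""
    return cmath.exp(1j * phi / 2) * cmath.exp(1j * theta)

theta = 0.3  # arbitrary phase in radians, chosen only for illustration

rotated = rotate_up_amplitude(theta, math.pi)     # apparatus turned by 180 degrees
mirrored = cmath.exp(-1j * theta)                 # cos(theta) - i*sin(theta)
shifted = cmath.exp(1j * (theta + math.pi / 2))   # original phase advanced by pi/2

# The rotated amplitude coincides with the phase-shifted one ...
print(abs(rotated - shifted) < 1e-12)
# ... and has the same absolute square (probability) as the mirrored one.
print(abs(abs(rotated) ** 2 - abs(mirrored) ** 2) < 1e-12)
```

So the two candidate functions are different complex numbers with the same magnitude, which is all the probabilities care about.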

Transforming amplitudes for spin-1/2 particles

Some say it is not possible to fully understand quantum-mechanical spin. Now, I do agree it is difficult, but I do not believe it is impossible. That’s why I wrote so many posts on it. Most of these focused on elaborating how the classical view of how a rotating charge precesses in a magnetic field might translate into the weird world of quantum mechanics. Others were more focused on the corollary of the quantization of the angular momentum, which is that, in the quantum-mechanical world, the angular momentum is never quite all in one direction only—so that explains some of the seemingly inexplicable randomness in particle behavior.

Frankly, I think those explanations help us quite a bit already but… Well… We need to go the extra mile, right? In fact, that’s what drives my search for a geometric (or physical) interpretation of the wavefunction: the extra mile. 🙂

Now, in one of these many posts on spin and angular momentum, I advise my readers – you, that is – to try to work yourself through Feynman’s 6th Lecture on quantum mechanics, which is highly abstract and, therefore, usually skipped. [Feynman himself told his students to skip it, so I am sure that’s what they did.] However, if we believe the physical (or geometric) interpretation of the wavefunction that we presented in previous posts is, somehow, true, then we need to relate it to the abstract math of these so-called transformations between representations. That’s what we’re going to try to do here. It’s going to be just a start, and I will probably end up doing several posts on this but… Well… We do have to start somewhere, right? So let’s see where we get today. 🙂

The thought experiment that Feynman uses throughout his Lecture makes use of what Feynman refers to as modified or improved Stern-Gerlach apparatuses. They allow us to prepare a pure state or, alternatively, as Feynman puts it, to analyze a state. In theory, that is. The illustration below presents a side and top view of such an apparatus. We may already note that the apparatus itself (or, to be precise, our perspective of it) gives us two directions: (1) the up direction, so that’s the positive direction of the z-axis, and (2) the direction of travel of our particle, which coincides with the positive direction of the y-axis. [This is obvious and, at the same time, not so obvious, but I’ll talk about that in my next post. In this one, we basically need to work ourselves through the math, so we don’t want to think too much about philosophical stuff.]

Modified Stern-Gerlach

The kind of questions we want to answer in this post are variants of the following basic one: if a spin-1/2 particle (let’s think of an electron here, even if the Stern-Gerlach experiment is usually done with an atomic beam) was prepared in a given condition by one apparatus S, say the +S state, what is the probability (or the amplitude) that it will get through a second apparatus T if that was set to filter out the +T state?

The result will, of course, depend on the angles between the two apparatuses S and T, as illustrated below. [Just to respect copyright, I should explicitly note here that all illustrations are taken from the mentioned Lecture, and that the line of reasoning sticks close to Feynman’s treatment of the matter too.]

basic set-up

We should make a few remarks here. First, this thought experiment assumes our particle doesn’t get lost. That’s obvious but… Well… If you haven’t thought about this possibility, I suspect you will at some point in time. So we do assume that, somehow, this particle makes a turn. It’s an important point because… Well… Feynman’s argument—who, remember, represents mainstream physics—somehow assumes that doesn’t really matter. It’s the same particle, right? It just took a turn, so it’s going in some other direction. That’s all, right? Hmm… That’s where I part ways with mainstream physics: the transformation matrices for the amplitudes that we’ll find here describe something real, I think. It’s not just perspective: something happened to the electron. That something does not only change the amplitudes but… Well… It describes a different electron. It describes an electron that goes in a different direction now. But… Well… As said, these are reflections I will further develop in my next post. 🙂 Let’s focus on the math here. The philosophy will follow later. 🙂 Next remark.

Second, we assume the (a) and (b) illustrations above represent the same physical reality because the relative orientation between the two apparatuses, as measured by the angle α, is the same. Now that is obvious, you’ll say, but, as Feynman notes, we can only make that assumption because experiments confirm that spacetime is, effectively, isotropic. In other words, there is no aether allowing us to establish some sense of absolute direction. Directions are relative: relative to the observer, that is… But… Well… Again, in my next post, I’ll argue that it’s not because directions are relative that they are, somehow, not real. Indeed, in my humble opinion, it does matter whether an electron goes here or, alternatively, there. These two different directions are not just two different coordinate frames. But… Well… Again. The philosophy will follow later. We need to stay focused on the math here.

Third and final remark. This one is actually very tricky. In his argument, Feynman also assumes the two set-ups below are, somehow, equivalent.

equivalent set-up

You’ll say: Huh? If not, say it! Huh? 🙂 Yes. Good. Huh? Feynman writes equivalent, not the same, because… Well… They’re not the same, obviously:

  1. In the first set-up (a), T is wide open, so the apparatus is not supposed to do anything with the beam: it just splits and re-combines it.
  2. In set-up (b) the T apparatus is, quite simply, not there, so… Well… Again. Nothing is supposed to happen with our particles as they come out of S and travel to U.

The fundamental idea here is that our spin-1/2 particle (again, think of an electron here) enters apparatus U in the same state as it left apparatus S. In both set-ups, that is! Now that is a very tricky assumption, because… Well… While the net turn of our electron is the same, it is quite obvious it has to take two turns to get to U in (a), while it only takes one turn in (b). And so… Well… You can probably think of other differences too. So… Yes. And no. Same-same but different, right? 🙂

Right. That is why Feynman goes out of his way to explain the nitty-gritty behind it: he actually devotes a full page in small print to this, which I’ll try to summarize in just a few paragraphs here. [And, yes, you should check my summary against Feynman’s actual writing on this.] It’s like this. While traveling through apparatus T in set-up (a), time goes by and, therefore, the amplitude would be different by some phase factor δ. [Feynman doesn’t say anything about this, but… Well… In the particle’s own frame of reference, this phase factor depends on the energy, the momentum and the time and distance traveled. Think of the argument of the elementary wavefunction here: θ = (E·t − p·x)/ħ.] Now, if we believe that the amplitude is just some mathematical construct (so that’s what mainstream physicists, not me, believe), then we could effectively say that the physics of (a) and (b) are the same, as Feynman does. In fact, let me quote him here:

“The physics of set-up (a) and (b) should be the same but the amplitudes could be different by some phase factor without changing the result of any calculation about the real world.”

Hmm… It’s one of those mysterious short passages where we’d all like geniuses like Feynman (or Einstein, or whomever) to be more explicit on their world view: if the amplitudes are different, can the physics really be the same? I mean… Exactly the same? It all boils down to that unfathomable belief that, somehow, the particle is real but the wavefunction that describes it, is not. Of course, I admit that it’s true that choosing another zero point for the time variable would also change all amplitudes by a common phase factor and… Well… That’s something that I consider to be not real. But… Well… The time and distance traveled in the apparatus is the time and distance traveled in the apparatus, right?

Bon… I have to stay away from these questions for now (we need to move on with the math here) but I will come back to it later. But… Well… Talking math, I should note a very interesting mathematical point here. We have these transformation matrices for amplitudes, right? Well… Not yet. In fact, the coefficients of these matrices are exactly what we’re going to try to derive in this post, but… Well… Let’s assume we know them already. 🙂 So we have a 2-by-2 matrix to go from S to T, from T to U, and then one to go from S to U without going through T, which we can write as $R^{ST}$, $R^{TU}$, and $R^{SU}$ respectively. Adding the subscripts for the base states in each representation, the equivalence between the (a) and (b) situations can then be captured by the following formula:

phase factor

So we have that phase factor here: the left- and right-hand side of this equation is, effectively, same-same but different, as they would say in Asia. 🙂 Now, Feynman develops a beautiful mathematical argument to show that the $e^{i\delta}$ factor effectively disappears if we convert our rotation matrices to some rather special form that is defined as follows:


I won’t copy his argument here, but I’d recommend you go over it because it is wonderfully easy to follow and very intriguing at the same time. [Yes. Simple things can be very intriguing.] Indeed, the calculation below shows that the determinant of these special rotation matrices will be equal to 1.

det is one

So… Well… So what? You’re right. I am being sidetracked here. The point is that, if we put all of our rotation matrices in this special form, the $e^{i\delta}$ factor vanishes and the formula above reduces to:

reduced formula
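To make the special form a bit more tangible, here is a small sketch. It assumes, as I read Feynman, that the special form is obtained by dividing a rotation matrix by the square root of its determinant. The matrix `r` below is a made-up example that carries an extra common phase, and the helper names are mine.

```python
import cmath

def det2(m):
    """Determinant a*d - b*c of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    return a * d - b * c

def to_special_form(m):
    """Divide every coefficient by sqrt(det m), so the determinant becomes 1.

    The square root has two branches, so the result is fixed up to an overall sign.
    """
    root = cmath.sqrt(det2(m))
    return tuple(tuple(x / root for x in row) for row in m)

delta = 0.7  # hypothetical common phase picked up along the way
phi = 1.1    # hypothetical rotation angle in radians

# A diagonal rotation matrix multiplied by the common phase factor e^{i*delta}
r = ((cmath.exp(1j * (delta + phi / 2)), 0j),
     (0j, cmath.exp(1j * (delta - phi / 2))))

special = to_special_form(r)
print(det2(r))        # a pure phase, e^{2i*delta}
print(det2(special))  # numerically 1
```

The common phase ends up in the determinant, which is why normalizing the determinant to 1 removes it.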

So… Yes. End of excursion. Let us remind ourselves of what it is that we are trying to do here. As mentioned above, the kind of questions we want to answer will be variants of the following basic one: if a spin-1/2 particle was prepared in a given condition by one apparatus (S), say the +S state, what is the probability (or the amplitude) that it will get through a second apparatus (T) if that was set to filter out the +T state?

We said the result would depend on the angles between the two apparatuses S and T. I wrote: angles—plural. Why? Because a rotation will generally be described by the three so-called Euler angles:  α, β and γ. Now, it is easy to make a mistake here, because there is a sequence to these so-called elemental rotations—and right-hand rules, of course—but I will let you figure that out. 🙂

The basic idea is the following: if we can work out the transformation matrices for each of these elemental rotations, then we can combine them and find the transformation matrix for any rotation. So… Well… That fills most of Feynman’s Lecture on this, so we don’t want to copy all that. We’ll limit ourselves to the logic for a rotation about the z-axis, and then… Well… You’ll see. 🙂

So… The z-axis… We take that to be the direction along which we are measuring the angular momentum of our electron, so that’s the direction of the (magnetic) field gradient, so that’s the up-axis of the apparatus. In the illustration below, that direction points out of the page, so to speak, because it is perpendicular to the direction of the x- and the y-axis that are shown. Note that the y-axis is the initial direction of our beam.

rotation about z

Now, because the (physical) orientation of the fields and the field gradients of S and T is the same, Feynman says that, despite the angle, the probability for a particle to be up or down with regard to S and T respectively should be the same. Well… Let’s be fair. He does not only say that: experiment shows it to be true. [Again, I am tempted to interject here that it is not because the probabilities for (a) and (b) are the same, that the reality of (a) and (b) is the same, but… Well… You get me. That’s for the next post. Let’s get back to the lesson here.] The probability is, of course, the square of the absolute value of the amplitude, and we will denote the amplitudes as $C_+$, $C_-$, $C'_+$, and $C'_-$ respectively. Hence, we can write the following:

$$|C'_+| = |C_+| \quad\text{and}\quad |C'_-| = |C_-|$$

Now, the absolute values (or the magnitudes) are the same, but the amplitudes may differ. In fact, they must be different by some phase factor because, otherwise, we would not be able to distinguish the two situations, which are obviously different. As Feynman, finally, admits himself, jokingly or seriously: “There must be some way for a particle to know that it has turned the corner at P1.” [P1 is the midway point between S and T in the illustration, of course, not some probability.]

So… Well… We write:

$C'_+ = e^{i\lambda}\cdot C_+$ and $C'_- = e^{i\mu}\cdot C_-$

Now, Feynman notes that an equal phase change in all amplitudes has no physical consequence (think of re-defining our t₀ = 0 point), so we can add some arbitrary amount to both λ and μ without changing any of the physics. So then we can choose this amount as −(λ + μ)/2. We write:

$$\lambda' = \lambda - \frac{\lambda + \mu}{2}, \qquad \mu' = \mu - \frac{\lambda + \mu}{2}$$

Now, it shouldn’t take you too long to figure out that λ’ is equal to λ’ = λ/2 − μ/2 = −μ’. So… Well… Then we can just adopt the convention that λ = −μ. So our $C'_+ = e^{i\lambda}\cdot C_+$ and $C'_- = e^{i\mu}\cdot C_-$ equations can now be written as:

$C'_+ = e^{i\lambda}\cdot C_+$ and $C'_- = e^{-i\lambda}\cdot C_-$
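The phase bookkeeping above is easy to get wrong, so here is a tiny sketch of it. It adds the common amount −(λ + μ)/2 to both phases and checks what that does to the magnitudes and to the relative phase. The numbers and the function name are purely illustrative assumptions of mine.

```python
import cmath

def symmetrize_phases(lam, mu):
    """Add the common amount -(lam + mu)/2 to both phases."""
    shift = -(lam + mu) / 2
    return lam + shift, mu + shift

lam, mu = 0.9, 0.2        # illustrative phases in radians
lam_p, mu_p = symmetrize_phases(lam, mu)

c_up, c_down = 0.6, 0.8   # illustrative amplitudes (0.36 + 0.64 = 1)
before = (cmath.exp(1j * lam) * c_up, cmath.exp(1j * mu) * c_down)
after = (cmath.exp(1j * lam_p) * c_up, cmath.exp(1j * mu_p) * c_down)

print(lam_p, mu_p)                     # equal and opposite
print(abs(before[0]), abs(after[0]))   # magnitudes are untouched
```

The shifted phases are each other’s opposite, while the difference between them (the only thing that can matter physically) stays what it was.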

The absolute values are the same, but the phases are different. Right. OK. Good move. What’s next?

Well… The next assumption is that the phase shift λ is proportional to the angle (α) between the two apparatuses. Hence, λ is equal to λ = m·α, and we can re-write the above as:

$C'_+ = e^{im\alpha}\cdot C_+$ and $C'_- = e^{-im\alpha}\cdot C_-$

Now, this assumption may or may not seem reasonable. Feynman justifies it with a continuity argument, arguing any rotation can be built up as a sequence of infinitesimal rotations and… Well… Let’s not get into the nitty-gritty here. [If you want it, check Feynman’s Lecture itself.] Back to the main line of reasoning. So we’ll assume we can write λ as λ = m·α. The next question then is: what is the value for m? Now, we obviously do get exactly the same physics if we rotate by 360°, or 2π radians. So we might conclude that the amplitudes should be the same and, therefore, that $e^{im\alpha} = e^{im\cdot 2\pi}$ has to be equal to one, so $C'_+ = C_+$ and $C'_- = C_-$. That’s the case if m is equal to 1. But… Well… No. It’s the same thing again: the probabilities (or the magnitudes) have to be the same, but the amplitudes may be different because of some phase factor. In fact, they should be different. If m = 1/2, then we also get the same physics, even if the amplitudes are not the same. They will be each other’s opposite:

$$C'_+ = -C_+ \quad\text{and}\quad C'_- = -C_- \qquad (360^\circ \text{ about the } z\text{-axis})$$

Huh? Yes. Think of it. The coefficient of proportionality (m) cannot be equal to 1. If it were equal to 1, and we’d rotate by 180° only, then we’d also get those $C'_+ = -C_+$ and $C'_- = -C_-$ equations, and so these coefficients would, therefore, also describe the same physical situation. Now, you will understand, intuitively, that a rotation of the apparatus by 180° will not give us the same physical situation… So… Well… In case you’d want a more formal argument proving a rotation by 180° does not give us the same physical situation, Feynman has one for you. 🙂
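The argument for m = 1/2 is easier to see with the numbers in front of you. The sketch below simply tabulates the two phase factors for m = 1 and m = 1/2 at 180°, 360° and 720°. The function name is my own, and the whole thing is a plain illustration, not a derivation.

```python
import cmath
import math

def z_rotation_factors(alpha, m):
    """Return the phase factors (e^{i*m*alpha}, e^{-i*m*alpha}) for the up and down amplitudes."""
    return cmath.exp(1j * m * alpha), cmath.exp(-1j * m * alpha)

for degrees in (180, 360, 720):
    for m in (1.0, 0.5):
        up, down = z_rotation_factors(math.radians(degrees), m)
        print(f"alpha={degrees:3d} deg, m={m}: up={up:.3f}, down={down:.3f}")
```

With m = 1, a half-turn already gives a common sign change (so it would be indistinguishable from doing nothing). With m = 1/2, the common sign change only appears after a full turn, and the original amplitudes come back after two full turns.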

I know that, by now, you’re totally tired and bored, and so you only want the grand conclusion at this point. Well… All of what I wrote above should, hopefully, help you to understand that conclusion, which – I quote Feynman here – is the following:

If we know the amplitudes $C_+$ and $C_-$ of spin one-half particles with respect to a reference frame S, and we then use new base states, defined with respect to a reference frame T which is obtained from S by a rotation α around the z-axis, the new amplitudes are given in terms of the old by the following formulas:

$$C'_+ = e^{i\phi/2}\, C_+, \qquad C'_- = e^{-i\phi/2}\, C_-$$

[Feynman denotes our angle α by phi (φ) because… He uses the Euler angles a bit differently. But don’t worry: it’s the same angle.]

What about the amplitude to go from the $C_-$ to the $C'_+$ state, and from the $C_+$ to the $C'_-$ state? Well… That amplitude is zero. So the transformation matrix is this one:

$$R_z(\phi) = \begin{pmatrix} e^{i\phi/2} & 0 \\ 0 & e^{-i\phi/2} \end{pmatrix}$$

Let’s take a moment and think about this. Feynman notes the following, among other things: “It is very curious to say that if you turn the apparatus 360° you get new amplitudes. [They aren’t really new, though, because the common change of sign doesn’t give any different physics.] But if something has been rotated by a sequence of small rotations whose net result is to return it to the original orientation, then it is possible to define the idea that it has been rotated 360°—as distinct from zero net rotation—if you have kept track of the whole history.”

This is very deep. It connects space and time into one single geometric space, so to speak. But… Well… I’ll try to explain this rather sweeping statement later. Feynman also notes that a net rotation of 720° does give us the same amplitudes and, therefore, cannot be distinguished from the original orientation. Feynman finds that intriguing but… Well… I am not sure if it’s very significant. I do note some symmetries in quantum physics involve 720° rotations but… Well… I’ll let you think about this. 🙂

Note that the determinant of our matrix is equal to $a\cdot d - b\cdot c = e^{i\phi/2}\cdot e^{-i\phi/2} - 0 = 1$. So… Well… Our rotation matrix is, effectively, in that special form! How come? Well… When equating λ = −μ, we are effectively putting the transformation into that special form. Let us also, just for fun, quickly check the normalization condition. It requires that the probabilities, in any given representation, add up to one. So… Well… Do they? When they come out of S, our electrons are equally likely to be in the up or down state. So the amplitudes are 1/√2. [To be precise, they are ±1/√2 but… Well… It’s the phase factor story once again.] That’s normalized: $|1/\sqrt{2}|^2 + |1/\sqrt{2}|^2 = 1$. The amplitudes to come out of the apparatus in the up or down state are $e^{i\phi/2}/\sqrt{2}$ and $e^{-i\phi/2}/\sqrt{2}$ respectively, so the probabilities add up to $|e^{i\phi/2}/\sqrt{2}|^2 + |e^{-i\phi/2}/\sqrt{2}|^2$ = … Well… It’s 1. Check it. 🙂
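If you would rather let a computer do the checking, the sketch below does both checks for the z-rotation matrix: the determinant and the normalization. It uses the diagonal matrix with $e^{i\phi/2}$ and $e^{-i\phi/2}$ on the diagonal and amplitudes of 1/√2 as the starting point. The helper names are mine.

```python
import cmath
import math

def rz(phi):
    """Transformation matrix for a rotation by phi about the z-axis (spin-1/2)."""
    return ((cmath.exp(1j * phi / 2), 0j),
            (0j, cmath.exp(-1j * phi / 2)))

def det2(m):
    """Determinant a*d - b*c of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def apply(m, amps):
    """Multiply a 2x2 matrix with a pair of amplitudes (C+, C-)."""
    return tuple(sum(m[i][j] * amps[j] for j in range(2)) for i in range(2))

phi = math.radians(180)
old = (1 / math.sqrt(2), 1 / math.sqrt(2))   # equally likely up or down
new = apply(rz(phi), old)
total_probability = sum(abs(c) ** 2 for c in new)

print(det2(rz(phi)))        # numerically 1
print(new)                  # roughly (+i/sqrt(2), -i/sqrt(2))
print(total_probability)    # numerically 1
```

Any other angle works just as well: the two factors are pure phases, so they can never change the magnitudes.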

Let me add an extra remark here. The normalization condition will result in matrices whose determinant will be equal to some pure imaginary exponential, like $e^{i\alpha}$. So is that what we have here? Yes. We can re-write 1 as $1 = e^{i\cdot 0} = e^0$, so α = 0. 🙂 Capito? Probably not, but… Well… Don’t worry about it. Just think about the grand results. As Feynman puts it, this Lecture is really “a sort of cultural excursion.” 🙂

Let’s do a practical calculation here. Let’s suppose the angle is, effectively, 180°. So the $e^{i\phi/2}$ and $e^{-i\phi/2}$ factors are equal to $e^{i\pi/2} = +i$ and $e^{-i\pi/2} = -i$, so… Well… What does that mean, in terms of the geometry of the wavefunction? Hmm… We need to do some more thinking about the implications of all this transformation business for our geometric interpretation of the wavefunction, so we’ll do that in our next post. Let us first work our way out of this rather hellish transformation logic. 🙂 [See? I do admit it is all quite difficult and abstruse, but… Well… We can do this, right?]

So what’s next? Well… Feynman develops a similar argument (I should say same-same but different once more) to derive the coefficients for a rotation of ±90° around the y-axis. Why 90° only? Well… Let me quote Feynman here, as I can’t sum it up more succinctly than he does: “With just two transformations—90° about the y-axis, and an arbitrary angle about the z-axis [which we described above]—we can generate any rotation at all.”

So how does that work? Check the illustration below. In Feynman’s words again: “Suppose that we want the angle α around x. We know how to deal with the angle α around z, but now we want it around x. How do we get it? First, we turn the axis z down onto x, which is a rotation of +90°. Then we turn through the angle α around z′. Then we rotate −90° about y″. The net result of the three rotations is the same as turning around x by the angle α. It is a property of space.”

full rotation
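This three-step recipe can be checked with a few lines of code. The sketch below assumes the usual spin-1/2 forms for the elemental matrices (diagonal phases for z, real cosines and sines of the half-angle for y, and cosines with i·sine off the diagonal for x), which is how I read Feynman’s table. It then multiplies out +90° about y, α about z, and −90° about y, and compares with the direct x-rotation. All function names are mine.

```python
import cmath
import math

def rz(phi):
    """Rotation by phi about z: diagonal phase factors."""
    return ((cmath.exp(1j * phi / 2), 0j), (0j, cmath.exp(-1j * phi / 2)))

def ry(theta):
    """Rotation by theta about y (assumed standard spin-1/2 form)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, s), (-s, c))

def rx(theta):
    """Rotation by theta about x (assumed standard spin-1/2 form)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, 1j * s), (1j * s, c))

def matmul(a, b):
    """Product of two 2x2 matrices; b is applied first, then a."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

alpha = 0.8               # arbitrary angle in radians, for illustration
quarter = math.pi / 2     # 90 degrees

# First +90 deg about y, then alpha about z, then -90 deg about y
composed = matmul(ry(-quarter), matmul(rz(alpha), ry(quarter)))
direct = rx(alpha)

max_diff = max(abs(composed[i][j] - direct[i][j]) for i in range(2) for j in range(2))
print(max_diff)           # numerically zero
```

The rightmost matrix in the product is the first rotation, which is the usual convention when matrices act on a column of amplitudes.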

Besides helping us greatly to derive the transformation matrix for any rotation, the mentioned property of space is rather mysterious and deep. It sort of reduces the degrees of freedom, so to speak. Feynman writes the following about this:

“These facts of the combinations of rotations, and what they produce, are hard to grasp intuitively. It is rather strange, because we live in three dimensions, but it is hard for us to appreciate what happens if we turn this way and then that way. Perhaps, if we were fish or birds and had a real appreciation of what happens when we turn somersaults in space, we could more easily appreciate such things.”

In any case, I should limit the number of philosophical interjections. If you go through the motions, then you’ll find the following elemental rotation matrices:

full set of rotation matrices

What about the determinants of the Rx(φ) and Ry(φ) matrices? They’re also equal to one, so… Yes. A pure imaginary exponential, right? $1 = e^{i\cdot 0} = e^0$. 🙂

What’s next? Well… We’re done. We can now combine the elemental transformations above in a more general format, using the standardized Euler angles. Again, just go through the motions. The Grand Result is:

euler transformatoin

Does it give us normalized amplitudes? It should, but it looks like our determinant is going to be a much more complicated complex exponential. 🙂 Hmm… Let’s take some time to mull over this. As promised, I’ll be back with more reflections in my next post.
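While mulling, a quick numerical experiment may help. The sketch below builds a general rotation as a product of elemental matrices and checks its determinant and whether it preserves normalization. I am assuming the composition order z, then x, then z for the three Euler angles, and the usual spin-1/2 forms for the elemental matrices. The angles are arbitrary and the function names are mine.

```python
import cmath
import math

def rz(phi):
    """Rotation by phi about z: diagonal phase factors."""
    return ((cmath.exp(1j * phi / 2), 0j), (0j, cmath.exp(-1j * phi / 2)))

def rx(theta):
    """Rotation by theta about x (assumed standard spin-1/2 form)."""
    c, s = complex(math.cos(theta / 2)), 1j * math.sin(theta / 2)
    return ((c, s), (s, c))

def matmul(a, b):
    """Product of two 2x2 matrices; b is applied first, then a."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def det2(m):
    """Determinant a*d - b*c of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return tuple(tuple(m[j][i].conjugate() for j in range(2)) for i in range(2))

def euler(beta, alpha, gamma):
    """Assumed order: beta about z first, then alpha about x, then gamma about z."""
    return matmul(rz(gamma), matmul(rx(alpha), rz(beta)))

r = euler(0.4, 1.3, -2.1)   # arbitrary angles in radians
product = matmul(r, dagger(r))
identity_error = max(abs(product[i][j] - (1 if i == j else 0))
                     for i in range(2) for j in range(2))

print(det2(r))          # numerically 1
print(identity_error)   # numerically zero, so probabilities are preserved
```

Because determinants multiply, a product of matrices that each have determinant 1 cannot end up with anything else, however complicated the individual coefficients look.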

Quantum math: the rules – all of them! :-)

In my previous post, I made no compromise, and used all of the rules one needs to calculate quantum-mechanical stuff:


However, I didn’t explain them. These rules look simple enough, but let’s analyze them now. They’re simple and not at the same time, indeed.

[I] The first equation uses the Kronecker delta, which sounds fancy but it’s just a simple shorthand: $\delta_{ij} = \delta_{ji}$ is equal to 1 if i = j, and zero if i ≠ j, with i and j representing base states. Equation (I) basically says that base states are all different. For example, the angular momentum in the x-direction of a spin-1/2 particle – think of an electron or a proton – is either +ħ/2 or −ħ/2, not something in-between, or some mixture. So 〈 +x | +x 〉 = 〈 −x | −x 〉 = 1 and 〈 +x | −x 〉 = 〈 −x | +x 〉 = 0.
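Here is what rule (I) looks like with actual numbers. The sketch assumes one common way of writing the +x and −x base states as pairs of components in the z-representation. That choice, and the helper names, are my own assumptions for the sake of illustration.

```python
import math

def braket(chi, phi):
    """Inner product <chi|phi>: conjugate the components of chi, multiply, and sum."""
    return sum(complex(c).conjugate() * complex(p) for c, p in zip(chi, phi))

def kronecker(i, j):
    """Kronecker delta: 1 if the labels are equal, 0 otherwise."""
    return 1 if i == j else 0

s = 1 / math.sqrt(2)
base = {
    "+x": (s, s),    # assumed components of the +x state in the z-representation
    "-x": (s, -s),   # assumed components of the -x state in the z-representation
}

for i in base:
    for j in base:
        print(i, j, round(abs(braket(base[i], base[j])), 12), kronecker(i, j))
```

The last two columns agree for every pair of labels, which is all that rule (I) says.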

We’re talking base states here, of course. Base states are like a coordinate system: we settle on an x-, y- and z-axis, and a unit, and any point is defined in terms of an x-, y- and z-number. It’s the same here, except we’re talking ‘points’ in four-dimensional spacetime. To be precise, we’re talking constructs evolving in spacetime. To be even more precise, we’re talking amplitudes with a temporal as well as a spatial frequency, which we’ll often represent as:

$$a\cdot e^{-i\theta} = a\cdot e^{-i(\omega t - \mathbf{k}\cdot\mathbf{x})} = a\cdot e^{-(i/\hbar)(E t - \mathbf{p}\cdot\mathbf{x})}$$

The coefficient in front (a) is just a normalization constant, ensuring all probabilities add up to one. It may not be a constant, actually: perhaps it just ensures our amplitude stays within some kind of envelope, as illustrated below.

Photon wave

As for the ω = E/ħ and k = p/ħ identities, these are the de Broglie equations for a matter-wave, which the young Comte jotted down as part of his 1924 PhD thesis. He was inspired by the fact that the E·t − p∙x factor is an invariant four-vector product ($E\cdot t - \mathbf{p}\cdot\mathbf{x} = p_\mu x^\mu$) in relativity theory, and noted the striking similarity with the argument of any wave function in space and time (ω·t − k∙x) and, hence, couldn’t resist equating both. Louis de Broglie was inspired, of course, by the solution to the blackbody radiation problem, which Max Planck and Einstein had convincingly solved by accepting that the ω = E/ħ equation holds for photons. As he wrote it:

“When I conceived the first basic ideas of wave mechanics in 1923–24, I was guided by the aim to perform a real physical synthesis, valid for all particles, of the coexistence of the wave and of the corpuscular aspects that Einstein had introduced for photons in his theory of light quanta in 1905.” (Louis de Broglie, quoted in Wikipedia)

Looking back, you’d of course want the phase of a wavefunction to be some invariant quantity, and the examples we gave in our previous post illustrate how one would expect energy and momentum to impact its temporal and spatial frequency. But I am digressing. Let’s look at the second equation. However, before we move on, note that minus sign in the exponent of our wavefunction: $a\cdot e^{-i\theta}$. The phase turns clockwise as θ increases. That’s just the way it is. I’ll come back to this.
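The invariance of the phase is something you can verify for yourself. The sketch below works in one spatial dimension, in natural units (c = 1 and ħ = 1, which is my assumption here), and boosts both the energy-momentum pair and the time-position pair to a frame moving at velocity v. All the numbers are made up for illustration.

```python
import math

def boost(time_like, space_like, v):
    """Lorentz-boost a (time-like, space-like) pair along x with velocity v (c = 1)."""
    gamma = 1 / math.sqrt(1 - v * v)
    return gamma * (time_like - v * space_like), gamma * (space_like - v * time_like)

def phase(E, p, t, x, hbar=1.0):
    """Argument of the elementary wavefunction: (E*t - p*x)/hbar."""
    return (E * t - p * x) / hbar

m, p = 1.0, 0.75                 # illustrative rest mass and momentum
E = math.sqrt(m * m + p * p)     # relativistic energy for c = 1
t, x = 2.0, 0.4                  # illustrative event
v = 0.6                          # velocity of the second observer

E2, p2 = boost(E, p, v)          # energy and momentum seen by the second observer
t2, x2 = boost(t, x, v)          # time and position seen by the second observer

print(phase(E, p, t, x), phase(E2, p2, t2, x2))   # the same number, up to rounding
```

Energy, momentum, time and position all change from one frame to the other, but the combination E·t − p·x does not.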

[II] The φ and χ symbols do not necessarily represent base states. In fact, Feynman illustrates this law using a variety of examples including both polarized as well as unpolarized beams, or ‘filtered’ as well as ‘unfiltered’ states, as he calls it in the context of the Stern-Gerlach apparatuses he uses to explain what’s going on. Let me summarize his argument here.

I discussed the Stern-Gerlach experiment in my post on spin and angular momentum, but the Wikipedia article on it is very good too. The principle is illustrated below: an inhomogeneous magnetic field – note the direction of the gradient ∇B = (∂B/∂x, ∂B/∂y, ∂B/∂z) – will split a beam of spin-one particles into three beams. [Matter-particles with spin one are rather rare (Lithium-6 is an example), but three states (rather than two only, as we’d have when analyzing spin-1/2 particles, such as electrons or protons) allow for more play in the analysis. 🙂 In any case, the analysis is easily generalized.]

stern-gerlach simple

The splitting of the beam is based, of course, on the quantized angular momentum in the z-direction (i.e. the direction of the gradient): its value is either ħ, 0, or −ħ. We’ll denote these base states as +, 0 or −, and we should note they are defined in regard to an apparatus with a specific orientation. If we call this apparatus S, then we can denote these base states as +S, 0S and −S respectively.

The interesting thing in Feynman’s analysis is the imagined modified Stern-Gerlach apparatus, which – I am using Feynman’s words here 🙂 – “puts Humpty Dumpty back together.” It looks a bit monstrous, but it’s easy enough to understand. Quoting Feynman once more: “It consists of a sequence of three high-gradient magnets. The first one (on the left) is just the usual Stern-Gerlach magnet and splits the incoming beam of spin-one particles into three separate beams. The second magnet has the same cross section as the first, but is twice as long and the polarity of its magnetic field is opposite the field in magnet 1. The second magnet pushes in the opposite direction on the atomic magnets and bends their paths back toward the axis, as shown in the trajectories drawn in the lower part of the figure. The third magnet is just like the first, and brings the three beams back together again, so that they leave the exit hole along the axis.”

stern-gerlach modified

Now, we can use this apparatus as a filter by inserting blocking masks, as illustrated below.


But let’s get back to the lesson. What about the second ‘Law’ of quantum math? Well… You need to be able to imagine all kinds of situations now. The rather simple set-up below is one of them: we’ve got two of these apparatuses in series now, S and T, with T tilted at the angle α with respect to the first.


I know: you’re getting impatient. What about it? Well… We’re finally ready now. Let’s suppose we’ve got three apparatuses in series, with the first and the last one having the very same orientation, and the one in the middle being tilted. We’ll denote them by S, T and S’ respectively. We’ll also use masks: we’ll block the 0 and − state in the S-filter, like in that illustration above. In addition, we’ll block the + and − state in the T apparatus and, finally, the 0 and − state in the S’ apparatus. Now try to imagine what happens: how many particles will get through?


Just try to think about it. Make some drawing or something. Please!  


OK… The answer is shown below. Despite the filtering in S, the +S particles that come out do have an amplitude to go through the 0T-filter, and so the number of atoms that come out will be some fraction (α) of the number of atoms (N) that came out of the +S-filter. Likewise, some other fraction (β) will make it through the +S’-filter, so we end up with βαN particles.

ratio 2

Now, I am sure that, if you’d tried to guess the answer yourself, you’d have said zero rather than βαN but, thinking about it, it makes sense: it’s not because we’ve got some angular momentum in one direction that we have none in the other. When everything is said and done, we’re talking components of the total angular momentum here, aren’t we? Well… Yes and no. Let’s remove the masks from T. What do we get?


Come on: what’s your guess? N?

[…] You’re right. It’s N. Perfect. It’s what’s shown below.

ratio 3

Now, that should boost your confidence. Let’s try the next scenario. We block the 0 and − state in the S-filter once again, and the + and − state in the T apparatus, so the first two apparatuses are the same as in our first example. But let’s change the S’ apparatus: let’s close the + and − state there now. Now try to imagine what happens: how many particles will get through?


Come on! You think it’s a trap, don’t you? It’s not. It’s perfectly similar: we’ve got some other fraction here, which we’ll write as γαN, as shown below.

ratio 1

Next scenario: S has the 0 and − gate closed once more, and T is fully open, so it has no masks. But, this time, we set S’ so it filters the 0-state with respect to it. What do we get? Come on! Think! Please!


The answer is zero, as shown below.

ratio 4

Does that make sense to you? Yes? Great! Because many think it’s weird: they think the T apparatus must ‘re-orient’ the angular momentum of the particles. It doesn’t: if the filter is wide open, then “no information is lost”, as Feynman puts it. Still… Have a look at it. It looks like we’re opening ‘more channels’ in the last example: the S and S’ filters are the same, indeed, and T is fully open, while it selected for 0-state particles before. But no particles come through now, while with the 0-channel, we had γαN.
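If you want to play with these scenarios yourself, the sketch below reproduces all four of them numerically. It rests on assumptions of mine that are not in the text above: I take T to be tilted about the y-axis, I use the standard spin-one rotation matrix (in one common sign convention) for the amplitudes 〈 jT | iS 〉, and the names `overlaps`, `transmitted_fraction` and `tilt` are made up for the occasion. The result is the fraction of the N particles that came out of the +S-filter.

```python
import math

CHANNELS = "+0-"  # the three base states +, 0 and − of a spin-one particle

def overlaps(tilt):
    """Matrix U[j][i] = <jT|iS> for T tilted by `tilt` radians about the y-axis.

    Assumption: the standard spin-1 rotation matrix, in one common sign convention.
    """
    c, s = math.cos(tilt), math.sin(tilt)
    r = s / math.sqrt(2)
    return [[(1 + c) / 2, -r, (1 - c) / 2],
            [r, c, -r],
            [(1 - c) / 2, r, (1 + c) / 2]]

def transmitted_fraction(tilt, open_T, open_S2):
    """Fraction of a pure +S beam that gets through T and then S'.

    open_T and open_S2 list the channels left open, e.g. "0" or "+0-".
    """
    U = overlaps(tilt)
    # Amplitudes <jT|+S>, with the blocked T channels set to zero.
    amp_T = [U[j][0] if CHANNELS[j] in open_T else 0.0 for j in range(3)]
    # Back to the S' basis: <kS|jT> = <jT|kS>* (the matrix is real here).
    amp_S2 = [sum(U[j][k] * amp_T[j] for j in range(3)) for k in range(3)]
    # Probability = sum of absolute squares over the open S' channels.
    return sum(amp_S2[k] ** 2 for k in range(3) if CHANNELS[k] in open_S2)

tilt = math.pi / 3  # an arbitrary tilt angle
print(transmitted_fraction(tilt, "0", "+"))    # T selects 0T, S' selects +S: βα
print(transmitted_fraction(tilt, "+0-", "+"))  # T wide open, S' selects +S: all of them
print(transmitted_fraction(tilt, "0", "0"))    # T selects 0T, S' selects 0S: γα
print(transmitted_fraction(tilt, "+0-", "0"))  # T wide open, S' selects 0S: nothing
```

The last two calls are the ‘weird’ pair: opening more channels in T takes the count from γα down to zero (up to rounding), because the three amplitudes through the wide-open T add up before we square.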

Hmm… It actually is kinda weird, won’t you agree? Sorry I had to talk about this, but it will make you appreciate that second ‘Law’ now: we can always insert a ‘wide-open’ filter and, hence, split the beams into a complete set of base states − with respect to the filter, that is − and bring them back together provided our filter does not produce any unequal disturbances on the three beams. In short, the passage through the wide-open filter should not result in a change of the amplitudes. Again, as Feynman puts it: the wide-open filter should really put Humpty-Dumpty back together again. If it does, we can effectively apply our ‘Law’:

〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

For an example, I’ll refer you to my previous post. This brings me to the third and final ‘Law’.

[III] The amplitude to go from state φ to state χ is the complex conjugate of the amplitude to go from state χ to state φ:

〈 χ | φ 〉 = 〈 φ | χ 〉*

This is probably the weirdest ‘Law’ of all, even if I should say, straight from the start, we can actually derive it from the second ‘Law’, and the fact that all probabilities have to add up to one. Indeed, a probability is the absolute square of an amplitude and, as we know, the absolute square of a complex number is also equal to the product of itself and its complex conjugate:

|z|² = |z|·|z| = z·z*

[You should go through the trouble of reviewing the difference between the square and the absolute square of a complex number. Just write z as a + ib and calculate (a + ib)² = a² − b² + 2ab·i, as opposed to |z|² = a² + b². Also check what it means when writing z as $r \cdot e^{i\theta}$ = r·(cosθ + i·sinθ).]
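If you’d rather let the computer do that little exercise, here is a short Python check. The number z = 3 + 4i is just an example I picked because it gives round numbers.

```python
import cmath

z = complex(3, 4)                      # a = 3, b = 4, chosen for illustration

square = z * z                         # (a + ib)² = a² − b² + 2ab·i
abs_square = (z * z.conjugate()).real  # |z|² = z·z* = a² + b²

print(square)       # (-7+24j)
print(abs_square)   # 25.0

# Polar form: z = r·e^{iθ}, so z² = r²·e^{2iθ} (a complex number) while |z|² = r² (a real one).
r, theta = cmath.polar(z)
print(r ** 2, cmath.rect(r ** 2, 2 * theta))
```

The square is a complex number whose phase is twice that of z, while the absolute square is a real, non-negative number, which is what a probability needs to be.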

Let’s apply the probability rule to a two-filter set-up, i.e. the situation with the S and the tilted T filter which we described above, and let’s assume we’ve got a pure beam of +S particles entering the wide-open T filter, so our particles can come out in any of the three base states with respect to T. We can then write:

|〈 +T | +S 〉|² + |〈 0T | +S 〉|² + |〈 −T | +S 〉|² = 1

⇔ 〈 +T | +S 〉〈 +T | +S 〉* + 〈 0T | +S 〉〈 0T | +S 〉* + 〈 −T | +S 〉〈 −T | +S 〉* = 1

Of course, we’ve got two other such equations if we start with a 0S or a −S state. Now, we take the 〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉 ‘Law’, substitute +S for both χ and φ, and use the base states with regard to T as the states i. We get:

〈 +S | +S 〉 = 1 = 〈 +S | +T 〉〈 +T | +S 〉 + 〈 +S | 0T 〉〈 0T | +S 〉 + 〈 +S | −T 〉〈 −T | +S 〉

These equations are consistent only if:

〈 +S | +T 〉 = 〈 +T | +S 〉*,

〈 +S | 0T 〉 = 〈 0T | +S 〉*,

〈 +S | −T 〉 = 〈 −T | +S 〉*,

which is what we wanted to prove. One can then generalize to any state φ and χ. However, proving the result is one thing. Understanding it is something else. One can write down a number of strange consequences, which all point to Feynman’s rather enigmatic comment on this ‘Law’: “If this Law were not true, probability would not be ‘conserved’, and particles would get ‘lost’.” So what does that mean? Well… You may want to think about the following, perhaps. It’s obvious that we can write:

|〈 φ | χ 〉|² = 〈 φ | χ 〉〈 φ | χ 〉* = 〈 χ | φ 〉*〈 χ | φ 〉 = |〈 χ | φ 〉|²

This says that the probability to go from the φ-state to the χ-state  is the same as the probability to go from the χ-state to the φ-state.
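A quick numerical check may help here. The sketch below assumes the usual representation of an amplitude as an inner product of component lists, with the bra components conjugated, so the third ‘Law’ is built into it: think of it as a consistency check, not as a proof. The two states are made up, and the function name `amplitude` is mine.

```python
def amplitude(chi, phi):
    """<chi|phi> for two states given by their components in one common set of base states."""
    return sum(c.conjugate() * p for c, p in zip(chi, phi))

# Two arbitrary (made-up) normalized states of a three-state system.
phi = [complex(0.6, 0.0), complex(0.0, 0.8), complex(0.0, 0.0)]
chi = [complex(0.0, 0.5), complex(0.5, 0.0), complex(0.5, 0.5)]

forward = amplitude(chi, phi)    # <chi|phi>
backward = amplitude(phi, chi)   # <phi|chi>

print(forward, backward.conjugate())          # the same number: the third 'Law'
print(abs(forward) ** 2, abs(backward) ** 2)  # the same probability in both directions
```

The two amplitudes differ (one is the mirror image of the other across the real axis), but the probabilities are identical.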

Now, when we’re talking base states, that’s rather obvious, because the probabilities involved are either 0 or 1. However, if we substitute for +S and −T, or some more complicated states, then it’s a different thing. My gut instinct tells me this third ‘Law’ – which, as mentioned, can be derived from the other ‘Laws’ – reflects the principle of reversibility in spacetime, which you may also interpret as a causality principle, in the sense that, in theory at least (i.e. not thinking about entropy and/or statistical mechanics), we can reverse what’s happening: we can go back in spacetime.

In this regard, we should also remember that the complex conjugate of a complex number in polar form, i.e. a complex number written as $r \cdot e^{i\theta}$, is equal to $r \cdot e^{-i\theta}$, so the argument in the exponent gets a minus sign. Think about what this means for our $a \cdot e^{-i\theta} = a \cdot e^{-i(\omega t - \mathbf{k} \cdot \mathbf{x})} = a \cdot e^{-(i/\hbar)(E t - \mathbf{p} \cdot \mathbf{x})}$ function. Taking the complex conjugate of this function amounts to reversing the direction of t and x which, once again, evokes that idea of going back in spacetime.

I feel there’s some more fundamental principle here at work, on which I’ll try to reflect a bit more. Perhaps we can also do something with that relationship between the multiplicative inverse of a complex number and its complex conjugate, i.e. z⁻¹ = z*/|z|². I’ll check it out. As for now, however, I’ll leave you to do that, and please let me know if you’ve got any inspirational ideas on this. 🙂

So… Well… Goodbye as for now. I’ll probably talk about the Hamiltonian in my next post. I think we really did a good job in laying the groundwork for the really hardcore stuff, so let’s go for that now. 🙂

Post Scriptum: On the Uncertainty Principle and other rules

After writing all of the above, I realized I should add some remarks to make this post somewhat more readable. First thing: not all of the rules are there—obviously! Most notably, I didn’t say anything about the rules for adding or multiplying amplitudes, but that’s because I wrote extensively about that already, and so I assume you’re familiar with that. [If not, see my page on the essentials.]

Second, I didn’t talk about the Uncertainty Principle. That’s because I didn’t have to. In fact, we don’t need it here. In general, all popular accounts of quantum mechanics have an excessive focus on the position and momentum of a particle, while the approach in this and my previous post is quite different. Of course, it’s Feynman’s approach to QM really. Not ‘mine’. 🙂 All of the examples and all of the theory he presents in his introductory chapters in the Third Volume of Lectures, i.e. the volume on QM, are related to things like:

  • What is the amplitude for a particle to go from spin state +S to spin state −T?
  • What is the amplitude for a particle to be scattered, by a crystal, or from some collision with another particle, in the θ direction?
  • What is the amplitude for two identical particles to be scattered in the same direction?
  • What is the amplitude for an atom to absorb or emit a photon? [See, for example, Feynman’s approach to the blackbody radiation problem.]
  • What is the amplitude to go from one place to another?

In short, you read Feynman, and it’s only at the very end of his exposé, that he starts talking about the things popular books start with, such as the amplitude of a particle to be at point (x, t) in spacetime, or the Schrödinger equation, which describes the orbital of an electron in an atom. That’s where the Uncertainty Principle comes in and, hence, one can really avoid it for quite a while. In fact, one should avoid it for quite a while, because it’s now become clear to me that simply presenting the Uncertainty Principle doesn’t help all that much to truly understand quantum mechanics.

Truly understanding quantum mechanics involves understanding all of these weird rules above. To some extent, that involves dissociating the idea of the wavefunction from our conventional ideas of time and position. From the questions above, it should be obvious that ‘the’ wavefunction does actually not exist: we’ve got a wavefunction for anything we can and possibly want to measure. That brings us to the question of the base states: what are they?

Feynman addresses this question in a rather verbose section of his Lectures titled: What are the base states of the world? I won’t copy it here, but I strongly recommend you have a look at it. 🙂

I’ll end here with a final equation that we’ll need frequently: the amplitude for a particle to go from one place (r1) to another (r2). It’s referred to as a propagator function, for obvious reasons – one of them being that physicists like fancy terminology! – and it looks like this:

$$\langle \mathbf{r}_2 | \mathbf{r}_1 \rangle = \frac{e^{(i/\hbar)\,\mathbf{p} \cdot \mathbf{r}_{12}}}{r_{12}}$$

The shape of the $e^{(i/\hbar)\,\mathbf{p} \cdot \mathbf{r}_{12}}$ function is now familiar to you. Note the r12 in the argument, i.e. the vector pointing from r1 to r2. The p∙r12 dot product equals |p|∙|r12|·cosθ = p∙r12·cosθ, with θ the angle between p and r12. If the two vectors point in the same direction, i.e. if θ = 0, then cosθ is equal to 1. If the angle is π/2, then it’s 0, and the function reduces to 1/r12. So the angle θ, through the cosθ factor, sort of scales the spatial frequency. Let me try to give you some idea of what this looks like by assuming p and r12 point in the same direction, so we’re looking at the space in the direction of the momentum only and |p|∙|r12|·cosθ = p∙r12. Now, we can look at the p/ħ factor as a scaling factor, and measure the distance x in units defined by that scale, so we write: x = p∙r12/ħ. Because 1/r12 = (p/ħ)/x, the function then reduces to (p/ħ)·$e^{ix}$/x = (p/ħ)·cos(x)/x + i·(p/ħ)·sin(x)/x, and we just need to take the absolute square of this to get the probability. All of the graphs are drawn hereunder: I’ll let you analyze them. [Note that the graphs do not include the p/ħ factor, which you may look at as yet another scaling factor.] You’ll see – I hope! – that it all makes perfect sense: the probability quickly drops off with distance, both in the positive as well as in the negative x-direction, while it’s going to infinity when very near. [Note that the absolute square, using cos(x)/x and sin(x)/x, yields the same graph as squaring 1/x – obviously!]
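In case you want to reproduce those numbers, the sketch below tabulates the rescaled function. It makes the same simplifications as the paragraph above (momentum and displacement along the same direction, overall scale factor left out), and the function name `propagator` is just my label for it.

```python
import cmath

def propagator(x):
    """The rescaled propagator e^{ix}/x, with x = p·r12/ħ and the scale factor left out."""
    return cmath.exp(1j * x) / x

for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    amp = propagator(x)
    prob = abs(amp) ** 2   # absolute square: (cos²x + sin²x)/x² = 1/x²
    print(f"x = {x:3.1f}  Re = {amp.real:+.3f}  Im = {amp.imag:+.3f}  |amp|^2 = {prob:.4f}")
```

The real and imaginary parts oscillate inside a 1/x envelope, while the absolute square falls off smoothly as 1/x².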


Working with amplitudes

Don’t worry: I am not going to introduce the Hamiltonian matrix—not yet, that is. But this post is going a step further than my previous ones, in the sense that it will be more abstract. At the same time, I do want to stick to real physical examples so as to illustrate what we’re doing when working with those amplitudes. The example that I am going to use involves spin. So let’s talk about that first.

Spin, angular momentum and the magnetic moment

You know spin: it allows experienced pool players to do the most amazing tricks with billiard balls, making a joke of what a so-called elastic collision is actually supposed to look like. So it should not come as a surprise that spin complicates the analysis in quantum mechanics too. We dedicated several posts to that (see, for example, my post on spin and angular momentum in quantum physics) and I won’t repeat these here. Let me just repeat the basics:

1. Classical and quantum-mechanical spin do share similarities: the basic idea driving the quantum-mechanical spin model is that of an electric charge – positive or negative – spinning about its own axis (this is often referred to as intrinsic spin) as well as having some orbital motion (presumably around some other charge, like an electron in orbit with a nucleus at the center). This intrinsic spin, and the orbital motion, give our charge some angular momentum (J) and, because it’s an electric charge in motion, there is a magnetic moment (μ). To put things simply: the classical and quantum-mechanical view of things converge in their analysis of atoms or elementary particles as tiny little magnets. Hence, when placed in an external magnetic field, there is some interaction – a force – and their potential and/or kinetic energy changes. The whole system, in fact, acquires extra energy when placed in an external magnetic field.

Note: The formula for that magnetic energy is quite straightforward, both in classical as well as in quantum physics, so I’ll quickly jot it down: U = −μ∙B = −|μ|·|B|·cosθ = −μ·B·cosθ. So it’s just the scalar product of the magnetic moment and the magnetic field vector, with a minus sign in front so as to get the direction right. [θ is the angle between the μ and B vectors and determines whether U as a whole is positive or negative.]
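A tiny numerical illustration of that formula, in arbitrary units and with a function name (`magnetic_energy`) of my own choosing:

```python
import math

def magnetic_energy(mu, B):
    """U = −μ∙B for two 3-component vectors given as tuples."""
    return -sum(m * b for m, b in zip(mu, B))

B = (0.0, 0.0, 1.0)   # field along z, magnitude 1 (arbitrary units)
for deg in (0, 60, 90, 120, 180):
    th = math.radians(deg)
    mu = (math.sin(th), 0.0, math.cos(th))   # unit moment at angle θ to the field
    print(f"theta = {deg:3d} deg  U = {magnetic_energy(mu, B):+.3f}")
```

The energy is lowest when the moment is lined up with the field, zero when it is perpendicular to it, and highest when it points the opposite way.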

2. The classical and quantum-mechanical view also diverge, however. They diverge, first, because of the quantum nature of spin in quantum mechanics. Indeed, while the angular momentum can take on any value in classical mechanics, that’s not the case in quantum mechanics: in whatever direction we measure, we get a discrete set of values only. For example, the angular momentum of a proton or an electron is either −ħ/2 or +ħ/2, in whatever direction we measure it. Therefore, they are referred to as spin-1/2 particles. All elementary fermions, i.e. the particles that constitute matter (as opposed to force-carrying particles, like photons), have spin 1/2.

Note: Spin-1/2 particles include, remarkably enough, neutrons too, which have the same kind of magnetic moment that a rotating negative charge would have. The neutron, in other words, is not exactly ‘neutral’ in the magnetic sense. One can explain this by noting that a neutron is not ‘elementary’, really: it consists of three quarks, just like a proton, and, therefore, it may help you to imagine that the electric charges inside are, somehow, distributed unevenly – although physicists hate such simplifications. I am noting this because the famous Stern-Gerlach experiment, which established the quantum nature of particle spin, used silver atoms, rather than protons or electrons. More generally, we’ll tend to forget about the electric charge of the particles we’re describing, assuming, most of the time, or tacitly, that they’re neutral – which helps us to sort of forget about classical theory when doing quantum-mechanical calculations!

3. The quantum nature of spin is related to another crucial difference between the classical and quantum-mechanical view of the angular momentum and the magnetic moment of a particle. Classically, the angular momentum and the magnetic moment can have any direction.

Note: I should probably briefly remind you that J is a so-called axial vector, i.e. a vector product (as opposed to a scalar product) of the radius vector r and the (linear) momentum vector p = m·v, with v the velocity vector, which points in the direction of motion. So we write: J = r×p = r×m·v = |r|·|p|·sinθ·n. The n vector is the unit vector perpendicular to the plane containing r and p (and, hence, v, of course) given by the right-hand rule. I am saying this to remind you that the direction of the magnetic moment and the direction of motion are not the same: the simple illustration below may help to see what I am talking about.

atomic magnet

Back to quantum mechanics: the image above doesn’t work in quantum mechanics. We do not have an unambiguous direction of the angular momentum and, hence, of the magnetic moment. That’s where all of the weirdness of the quantum-mechanical concept of spin comes out, really. I’ll talk about that when discussing Feynman’s ‘filters’ – which I’ll do in a moment – but here I just want to remind you of the mathematical argument that I presented in the above-mentioned post. Just like in classical mechanics, we’ll have a maximum (and, hence, also a minimum) value for J, like +ħ, 0 and −ħ for a Lithium-6 nucleus. [I am just giving this rather special example of a spin-1 particle so you’re reminded we can have particles with an integer spin number too!] So, when we measure its angular momentum in any direction really, it will take on one of these three values: +ħ, 0 or −ħ. So it’s either/or – nothing in-between. Now that leads to a funny mathematical situation: one would usually equate the maximum value of a quantity like this to the magnitude of the vector, which is equal to the (positive) square root of J² = J∙J = Jx² + Jy² + Jz², with Jx, Jy and Jz the components of J in the x-, y- and z-direction respectively. But we don’t have continuity in quantum mechanics, and so the concept of a component of a vector needs to be carefully interpreted. There’s nothing definite there, like in classical mechanics: all we have is amplitudes, and all we can do is calculate probabilities, or expected values based on those amplitudes.

Huh? Yes. In fact, the concept of the magnitude of a vector itself becomes rather fuzzy: all we can do really is calculate its expected value. Think of it: in the classical world, we have a J² = J∙J product that’s independent of the direction of J. For example, if J is all in the x-direction, then Jy and Jz will be zero, and J² = Jx². If it’s all in the y-direction, then Jx and Jz will be zero and all of the magnitude of J will be in the y-direction only, so we write: J² = Jy². Likewise, if J does not have any z-component, then our J∙J product will only include the x- and y-components: J∙J = Jx² + Jy². You get the idea: the J² = J∙J product is independent of the direction of J exactly because, in classical mechanics, J actually has a precise and unambiguous magnitude and direction and, therefore, actually has a precise and unambiguous component in each direction. So we’d measure Jx, Jy and Jz and, regardless of the actual direction of J, we’d find its magnitude |J| = J = +√J² = +√(Jx² + Jy² + Jz²).

In quantum mechanics, we just don’t have quantities like that. We say that Jx, Jy and Jz have an amplitude to take on a value that’s equal to +ħ, 0 or −ħ (or whatever other value is allowed by the spin number of the system). Now that we’re talking spin numbers, please note that this characteristic number is usually denoted by j, which is a bit confusing, but so be it. So j can be 0, 1/2, 1, 3/2, etcetera, and the number of ‘permitted values’ is 2j + 1, with each value being separated from the next by an amount equal to ħ. So we have 1, 2, 3, 4, 5 etcetera possible values for Jx, Jy and Jz respectively. But let me get back to the lesson. We just can’t do the same thing in quantum mechanics. For starters, we can’t measure Jx, Jy and Jz simultaneously: our Stern-Gerlach apparatus has a certain orientation and, hence, measures one component of J only. So what can we do?

Frankly, we can only do some math here. The wave-mechanical approach does allow us to think of the expected value of J² = J∙J = Jx² + Jy² + Jz², so we write:

E[J²] = E[J∙J] = E[Jx² + Jy² + Jz²] = ?

[Feynman’s use of the 〈 and 〉 brackets to denote an expected value is hugely confusing, because these brackets are also used to denote an amplitude. So I’d rather use the more commonly used E[X] notation.] Now, it is a rather remarkable property, but the expected value of the sum of two or more random variables is equal to the sum of the expected values of the variables, even if those variables may not be independent. So we can confidently use the linearity property of the expected value operator and write:

E[Jx² + Jy² + Jz²] = E[Jx²] + E[Jy²] + E[Jz²]

Now we need something else. It’s also just part of the quantum-mechanical approach to things and so you’ll just have to accept it. It sounds rather obvious but it’s actually quite deep: if we measure the x-, y- or z-component of the angular momentum of a random particle, then each of the possible values is equally likely to occur. So that means, in our case, that the +ħ, 0 or −ħ values are equally likely, so their likelihood is one in three, i.e. 1/3. Again, that sounds obvious but it’s not. Indeed, please note, once again, that we can’t measure Jx, Jy and Jz simultaneously, so the ‘or’ in x-, y- or z-component is an exclusive ‘or’. Of course, I must add this equipartition of likelihoods is valid only because we do not have a preferred direction for J: the particles in our beam have random ‘orientations’. Let me give you the lingo for this: we’re looking at an unpolarized beam. You’ll say: so what? Well… Again, think about what we’re doing here: we may or may not assume that the Jx, Jy and Jz variables are related. In fact, in classical mechanics, they surely are: they’re determined by the magnitude and direction of J. Hence, they are not random at all! But let me continue, so you see what comes out.

Because the +ħ, 0 and −ħ values are equally likely, we can write: E[Jx²] = ħ²/3 + 0/3 + (−ħ)²/3 = [ħ² + 0 + (−ħ)²]/3 = 2ħ²/3. In case you wonder, that’s just the definition of the expected value operator: E[X] = p1·x1 + p2·x2 + … = ∑ pi·xi, with pi the likelihood of the possible value xi. So we take a weighted average with the respective probabilities as the weights. However, in this case, with an unpolarized beam, the weighted average becomes a simple average.

Now, E[Jy²] and E[Jz²] are – rather unsurprisingly – also equal to 2ħ²/3, so we find that E[J²] = E[Jx²] + E[Jy²] + E[Jz²] = 3·(2ħ²/3) = 2ħ² and, therefore, we’d say that the magnitude of the angular momentum is equal to |J| = J = +√2·ħ ≈ 1.414·ħ. Now that value is not equal to the maximum value of our x-, y- or z-component of J, or the component of J in whatever direction we’d want to measure it. That maximum value is ħ, without the √2 factor, so the magnitude we’ve just calculated is some 40% larger than that!
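The same calculation can be repeated for any spin number. The sketch below does it in exact fractions, in units of ħ². It assumes, as above, an unpolarized beam in which all 2j + 1 values are equally likely. The function name `expected_J_squared` is mine, and the last column printed is j·(j+1), which I added for comparison.

```python
from fractions import Fraction

def expected_J_squared(j):
    """E[J²] in units of ħ² for an unpolarized beam of spin-j particles."""
    j = Fraction(j)
    n = int(2 * j + 1)                              # number of permitted values
    values = [j - k for k in range(n)]              # j, j−1, …, −j (in units of ħ)
    e_component = sum(m * m for m in values) / n    # E[Jx²] = E[Jy²] = E[Jz²]
    return 3 * e_component                          # linearity of the expected value

for j in (Fraction(1, 2), 1, Fraction(3, 2), 2):
    print(j, expected_J_squared(j), Fraction(j) * (Fraction(j) + 1))
```

For j = 1 this gives 2, i.e. E[J²] = 2ħ², as calculated above. For the other spin numbers the second and third column agree as well.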

Now, you’ve probably fallen asleep by now but, what this actually says, is that the angular momentum, in quantum mechanics, is never completely in any direction. We can state this in another way: it implies that, in quantum mechanics, there’s no such thing really as a ‘definite’ direction of the angular momentum.


OK. Enough on this. Let’s move on to a more ‘real’ example. Before I continue though, let me generalize the results above:

[I] A particle, or a system, will have a characteristic spin number: j. That number is always an integer or a half-integer, and it determines a discrete set of possible values for the component of the angular momentum J in any direction.

[II] The number of values is equal to 2j + 1, and these values are separated by ħ, which is why they are usually measured in units of ħ, i.e. Planck’s reduced constant: ħ ≈ 1.05×10⁻³⁴ J·s, so that’s tiny but real. 🙂 [It’s always good to remind oneself that we’re actually trying to describe reality.] For example, the permitted values for a spin-3/2 particle are +3ħ/2, +ħ/2, −ħ/2 and −3ħ/2 or, measured in units of ħ, +3/2, +1/2, −1/2 and −3/2. When discussing spin-1/2 particles, we’ll often refer to the two possible states as the ‘up’ and the ‘down’ state respectively. For example, we may write the amplitude for an electron or a proton to have an angular momentum in the x-direction equal to +ħ/2 or −ħ/2 as 〈+x〉 and 〈−x〉 respectively. [Don’t worry too much about it right now: you’ll get used to the notation quickly.]

[III] The classical concepts of angular momentum, and the related magnetic moment, have their limits in quantum mechanics. The magnitude of a vector quantity like angular momentum is generally not equal to the maximum value of the component of that quantity in any direction. The general rule is:

J² = j·(j+1)·ħ² > j²·ħ²

So the maximum value of any component of J in whatever direction (i.e. j·ħ) is smaller than the magnitude of J (i.e. √[ j·(j+1)]·ħ). This implies we cannot associate any precise and unambiguous direction with quantities like the angular momentum J or the magnetic moment μ. As Feynman puts it:

“That the energy of an atom [or a particle] in a magnetic field can have only certain discrete energies is really not more surprising than the fact that atoms in general have only certain discrete energy levels—something we mentioned often in Volume I. Why should the same thing not hold for atoms in a magnetic field? It does. But it is the attempt to correlate this with the idea of an oriented magnetic moment that brings out some of the strange implications of quantum mechanics.”
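Coming back to the general rule for a moment: it follows from the same two ingredients we used for the spin-one case, i.e. 2j + 1 equally likely values m·ħ (with m = −j, −j+1, …, +j) for each component, and the linearity of the expected value. Here is a short worked version. The only thing I am adding is the standard sum-of-squares identity, which holds for integer as well as half-integer j.

```latex
\begin{align*}
E[J_z^2] &= \frac{1}{2j+1} \sum_{m=-j}^{+j} (m\hbar)^2
          = \frac{\hbar^2}{2j+1} \cdot \frac{j(j+1)(2j+1)}{3}
          = \frac{j(j+1)}{3}\,\hbar^2 \\
E[J^2]   &= E[J_x^2] + E[J_y^2] + E[J_z^2]
          = 3 \cdot \frac{j(j+1)}{3}\,\hbar^2
          = j(j+1)\,\hbar^2
\end{align*}
```

For j = 1 this gives 2ħ² and for j = 1/2 it gives 3ħ²/4, so the magnitude √[j·(j+1)]·ħ is always larger than the maximum component j·ħ as soon as j is not zero.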

A real example: the disintegration of a muon in a magnetic field

I talked about muon disintegration before, when writing a much more philosophical piece on symmetries in Nature and time reversal in particular. I used the illustration below. We’ve got an incoming muon that’s being brought to rest in a block of material, and then, as muons do, it disintegrates, emitting an electron and two neutrinos. As you can see, the decay direction is (mostly) in the direction of the axial vector that’s associated with the spin direction, i.e. the direction of the grey dashed line. However, there’s some angular distribution of the decay direction, as illustrated by the blue arrows, that are supposed to visualize the decay products, i.e. the electron and the neutrinos.

Muon decay

This disintegration process is very interesting from a more philosophical side. The axial vector isn’t ‘real’: it’s a mathematical concept – a pseudovector. A pseudo- or axial vector is the vector product of two so-called true vectors, aka polar vectors. Just look back at what I wrote about the angular momentum: the J in the J = r×p = r×m·v formula is such a vector, and its direction depends on the spin direction, which is clockwise or counter-clockwise, depending on what side you’re looking at it from. Having said that, who’s to judge if the product of two ‘true’ vectors is any less ‘true’ than the vectors themselves? 🙂

The point is: the disintegration process does not respect what is referred to as P-symmetry. That’s because our mathematical conventions (like all of these right-hand rules that we’ve introduced) are unambiguous, and they tell us that the pseudovector in the mirror image of what’s going on, has the opposite direction. It has to, as per our definition of a vector product. Hence, our fictitious muon in the mirror should send its decay products in the opposite direction too! So… Well… The mirror image of our muon decay process is actually something that’s not going to happen: it’s physically impossible. So we’ve got a process in Nature here that doesn’t respect ‘mirror’ symmetry. Physicists prefer to call it ‘P-symmetry’, for parity symmetry, because it involves a flip of sign of all space coordinates, so there’s a parity inversion indeed. So there are processes in Nature that don’t respect it but, while that’s all very interesting, it’s not what I want to write about. [Just check that post of mine if you’d want to read more.] Let me, therefore, use another illustration – one that’s more to the point in terms of what we do want to talk about here:

muon decay Feynman

So we’ve got the same muon here – well… A different one, of course! 🙂 – entering that block (A) and coming to a grinding halt somewhere in the middle, and then it disintegrates in a few microseconds, which is an eternity at the atomic or sub-atomic scale. It disintegrates into an electron and two neutrinos, as mentioned above, with some spread in the decay direction. [In case you wonder where we can find muons… Well… I’ll let you look it up yourself.] So we have:

μ → e + ν + ν̄

Now it turns out that the presence of a magnetic field (represented by the B arrows in the illustration above) can drastically change the angular distribution of decay directions. That shouldn’t surprise us, of course, but how does it work, exactly? Well… To simplify the analysis, we’ve got a polarized beam here: the spin direction of all muons before they enter the block and/or the magnetic field, i.e. at time t = 0, is in the +x-direction. So we filtered them just, before they entered the block. [I will come back to this ‘filtering’ process.] Now, if the muon’s spin would stay that way, then the decay products – and the electron in particular – would just go straight, because all of the angular momentum is in that direction. However, we’re in the quantum-mechanical world here, and so things don’t stay the same. In fact, as we explained, there’s no such things as a definite angular momentum: there’s just an amplitude to be in the +x state, and that amplitude changes in time and in space.

How exactly? Well… We don’t know, but we can apply some clever tricks here. The first thing to note is that our magnetic field will add to the energy of our muon. So, as I explained in my previous post, the magnetic field adds to the E in the exponent of our complex-valued wavefunction $a\,e^{-(i/\hbar)(E t - p x)}$. In our example, we’ve got a magnetic field in the z-direction only, so that $U = -\boldsymbol{\mu}\cdot\mathbf{B}$ reduces to $U = -\mu_z B$, and we can re-write our wavefunction as:

$a\,e^{-(i/\hbar)[(E+U) t - p x]} = a\,e^{-(i/\hbar)(E t - p x)}\,e^{(i/\hbar)\mu_z B t}$

Of course, the magnetic field only acts from t = 0 to when the muon disintegrates, which we’ll denote by the point t = τ. So what we get is that the probability amplitude of a particle that’s been in a uniform magnetic field changes by a factor $e^{(i/\hbar)\mu_z B \tau}$. Note that it’s a factor indeed: we use it to multiply. You should also note that this is a complex exponential, so it’s a periodic function, with its real and imaginary part oscillating between −1 and +1. Finally, we know that μz can take on only certain values: for a spin-1/2 particle, they are plus or minus some number, which we’ll simply denote as μ, so that’s without the subscript, so our factor becomes:

$e^{(i/\hbar)(\pm\mu B \tau)}$


[The plus or minus sign needs to be explained here, so let’s do that quickly: we have two possible states for a spin-1/2 particle, one ‘up’, and the other ‘down’. But then we also know that the phase of our complex-valued wave function turns clockwise, which is why we have a minus sign in the exponent of our $e^{i\theta}$ expression. In short, for the ‘up’ state, we should take the positive value, i.e. +μ, but the minus sign in the exponent of our $e^{i\theta}$ function makes it negative again, so our factor is $e^{-(i/\hbar)\mu B t}$ for the ‘up’ state, and $e^{+(i/\hbar)\mu B t}$ for the ‘down’ state.]

OK. We get that, but that doesn’t get us anywhere—yet. We need another trick first. One of the most fundamental rules in quantum-mechanics is that we can always calculate the amplitude to go from one state, say φ (read: ‘phi’), to another, say χ (read: ‘khi’), if we have a complete set of so-called base states, which we’ll denote by the index i or j (which you shouldn’t confuse with the imaginary unit, of course), using the following formula:

〈 χ | φ 〉 = ∑ 〈 χ | i 〉〈 i | φ 〉

I know this is a lot to swallow, so let me start with the notation. You should read 〈 χ | φ 〉 from right to left: it’s the amplitude to go from state φ to state χ. This notation is referred to as the bra-ket notation, or the Dirac notation. [Dirac notation sounds more scientific, doesn’t it?] The right part, i.e. | φ 〉, is the ket, and the left part, i.e. 〈 χ |, is the bra. In our example, we wonder what the amplitude is for our muon staying in the +x state. Because that amplitude is time-dependent, we can write it as A+(τ) = 〈 +x at time t = τ | +x at time t = 0 〉 = 〈 +x at t = τ | +x at t = 0 〉 or, using a very misleading shorthand, 〈 +x | +x 〉. [The shorthand is misleading because the +x in the bra obviously means something else than the +x in the ket.]

But let’s apply the rule. We’ve got only two states with respect to each coordinate axis here. For example, with respect to the z-axis, the spin values are +z and −z respectively. [As mentioned above, we actually mean that the angular momentum in this direction is either +ħ/2 or −ħ/2, aka ‘up’ or ‘down’ respectively, but then quantum theorists seem to like all kinds of symbols better, so we’ll use the +z and −z notations for these two base states here.] So now we can use our rule and write:

A+(t) = 〈 +x | +x 〉 = 〈 +x | +z 〉〈 +z | +x 〉 + 〈 +x | −z 〉〈 −z | +x 〉

You’ll say this doesn’t help us any further, but it does, because there is another set of rules, which are referred to as transformation rules, which gives us those 〈 +z | +x 〉 and 〈 −z | +x 〉 amplitudes. They’re real numbers, and it’s the same number for both amplitudes.

〈 +z | +x 〉 = 〈 −z | +x 〉 = 1/√2

This shouldn’t surprise you too much: the square root disappears when squaring, so we get two equal probabilities – 1/2, to be precise – that add up to one, as they must because of the normalization rule: the sum of all probabilities always has to be one. [I can feel your impatience, but just hang in here for a while, as I guide you through what is likely to be your very first quantum-mechanical calculation.] Now, the 〈 +z | +x 〉 = 〈 −z | +x 〉 = 1/√2 amplitudes are the amplitudes at time t = 0, so let’s be somewhat less sloppy with our notation and write 〈 +z | +x 〉 as $C_+(0)$ and 〈 −z | +x 〉 as $C_-(0)$, so we write:

$\langle +z \mid +x \rangle = C_+(0) = 1/\sqrt{2}$

$\langle -z \mid +x \rangle = C_-(0) = 1/\sqrt{2}$

Now we know what happens with those amplitudes over time: that $e^{(i/\hbar)(\pm\mu B t)}$ factor kicks in, and so we have:

$C_+(t) = C_+(0)\,e^{-(i/\hbar)\mu B t} = e^{-(i/\hbar)\mu B t}/\sqrt{2}$

$C_-(t) = C_-(0)\,e^{+(i/\hbar)\mu B t} = e^{+(i/\hbar)\mu B t}/\sqrt{2}$

As for the plus and minus signs, see my remark on the tricky ± business in regard to μ. To make a long story somewhat shorter :-), our expression for $A_+(t)$ = 〈 +x at t | +x 〉 now becomes:

$A_+(t) = \langle +x \mid +z \rangle\,C_+(t) + \langle +x \mid -z \rangle\,C_-(t)$

Now, you wouldn’t be too surprised if I’d just tell you that the 〈 +x | +z 〉 and 〈 +x | −z 〉 amplitudes are also real-valued and equal to 1/√2, but you can actually use yet another rule we’ll generalize shortly: the amplitude to go from state φ to state χ is the complex conjugate of the amplitude to go from state χ to state φ, so we write 〈 χ | φ 〉 = 〈 φ | χ 〉*, and therefore:

〈 +x | +z 〉 = 〈 +z | +x 〉* = (1/√2)* = (1/√2)

〈 +x | −z 〉 = 〈 −z | +x 〉* = (1/√2)* = (1/√2)
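If you’d like to check these two rules with actual numbers, here is a minimal Python sketch. It assumes we represent a spin-1/2 state by its two amplitudes in the +z/−z representation. Only the +x amplitudes come from the text above: the state chi and the helper names are made up for the illustration.

```python
import math

# A spin-1/2 state is stored as (amplitude for +z, amplitude for -z).
s = 1 / math.sqrt(2)
plus_z = (1 + 0j, 0j)
minus_z = (0j, 1 + 0j)
plus_x = (s + 0j, s + 0j)   # <+z|+x> = <-z|+x> = 1/sqrt(2), as in the text
chi = (0.6 + 0j, 0.8j)      # hypothetical normalized state, for illustration only


def amplitude(final, initial):
    """Return <final|initial>: conjugate the bra, then sum over the components."""
    return sum(f.conjugate() * i for f, i in zip(final, initial))


def via_base_states(final, initial, base_states):
    """Return the sum over all base states i of <final|i><i|initial>."""
    return sum(amplitude(final, b) * amplitude(b, initial) for b in base_states)


direct = amplitude(chi, plus_x)
expanded = via_base_states(chi, plus_x, [plus_z, minus_z])
reversed_conjugate = amplitude(plus_x, chi).conjugate()

print("direct <chi|+x>        :", direct)
print("sum over base states   :", expanded)
print("<+x|chi>* (conjugate)  :", reversed_conjugate)
print("<+x|+z> and <+x|-z>    :", amplitude(plus_x, plus_z), amplitude(plus_x, minus_z))
```

The three first numbers should coincide: that’s the base-state rule and the complex-conjugate rule at work.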

So our expression for A+(t) = 〈 +x at t | +x 〉 now becomes:

$A_+(t) = e^{-(i/\hbar)\mu B t}/2 + e^{+(i/\hbar)\mu B t}/2$

That’s the sum of a complex-valued function and its complex conjugate, and we’ve shown more than once (see my page on the essentials, for example) that such a sum reduces to the sum of the real parts of the complex exponentials, because the imaginary parts cancel. [You should not expect any explanation of Euler’s $e^{i\theta} = \cos\theta + i\sin\theta$ rule at this level of understanding.] In short, we get the following grand result:

$A_+(t) = \cos(\mu B t/\hbar)$

The big question, of course: what does this actually mean? 🙂 Well… Just square this thing and you get the probabilities shown below. [Note that the period of a squared cosine function is π, instead of 2π, which you can easily verify using an online graphing tool.]
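For those who prefer numbers over graphs, the whole calculation fits in a few lines of Python. This is a sketch only: the function name is mine, and the values for μ and B are illustrative assumptions (μ is a rough order of magnitude for the muon’s moment, not a measured input from this post).

```python
import cmath
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s


def amplitude_plus_x(mu, b_field, t):
    """A+(t) = <+x|+z>*C+(t) + <+x|-z>*C-(t), with the sign choice used in the text."""
    s = 1 / math.sqrt(2)
    phase = mu * b_field * t / HBAR
    c_plus = s * cmath.exp(-1j * phase)    # 'up' amplitude at time t
    c_minus = s * cmath.exp(+1j * phase)   # 'down' amplitude at time t
    return s * c_plus + s * c_minus


mu = 4.5e-26     # J/T, assumed order of magnitude only
b_field = 0.01   # T, assumed field strength
period = math.pi * HBAR / (mu * b_field)   # period of the cos^2 probability

for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    t = fraction * period
    probability = abs(amplitude_plus_x(mu, b_field, t)) ** 2
    cos_squared = math.cos(mu * b_field * t / HBAR) ** 2
    print(f"t = {fraction:4.2f} period  P(+x) = {probability:.6f}  cos^2 = {cos_squared:.6f}")
```

The two columns should agree, and the probability should be back at one after a phase of π, not 2π. Swapping the signs of the two exponents gives the same probabilities, so the tricky ± business does not affect the end result.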


Because you’re tired of this business, you probably don’t realize what we’ve just done. It’s spectacular and mundane at the same time. Let me quote Feynman to summarize the results:

“We find that the chance of catching the decay electron in the electron counter varies periodically with the length of time the muon has been sitting in the magnetic field. The frequency depends on the magnetic moment μ. The magnetic moment of the muon has, in fact, been measured in just this way.”

As far as I am concerned, the key result is that we’ve learned how to work with those mysterious amplitudes, and the wavefunction, in a practical way, thereby using all of the theoretical rules of the quantum-mechanical approach to real-life physical situations. I think that’s a great leap forward, and we’ll re-visit those rules in a more theoretical and philosophical démarche in the next post. As for the example itself, Feynman takes it much further, but I’ll just copy the Grand Master here:


Huh? Well… I am afraid I have to leave it at this, as I discussed the precession of ‘atomic’ magnets elsewhere (see my post on precession and diamagnetism), which gives you the same formula: ωp = μ·B/J (just substitute ±ħ/2 for J). However, the derivation above approaches it from an entirely different angle, which is interesting. Of course, all fits. 🙂 However, I’ll let you do your own homework now. I hope to see you tomorrow for the mentioned theoretical discussion. Have a nice evening, or weekend – or whatever ! 🙂

Atomic magnets: precession and diamagnetism

This and the next posts will further build on the concepts introduced in my previous post on particle spin. This post in particular will focus on some of the math we’ll need to understand what quantum mechanics is all about. The first topic is about the quantum-mechanical equivalent of the phenomenon of precession. The other topics are… Well… You’ll see… 🙂

The Larmor frequency

The motion of a spinning object in a force field is quite complicated. In our post on gyroscopes, we introduced the concepts of precession and nutation. The concept of precession is illustrated below for the Earth as well as for a spinning top. In both cases, the external force is just gravity.


Nutation is an additional movement: on top of the precessional movement, a spinning object may wobble, as illustrated below.

[Illustration: precession and nutation]

There seems to be no analog for nutation in quantum mechanics. In fact, the terms nutation and precession seem to be used interchangeably in quantum physics, although they are very different in classical physics. But let’s not complicate things and, hence, talk about the phenomenon of precession only.

We will not re-explain the phenomenon of precession here but just remind you that the phenomenon can be described in terms of (a) the angle between the symmetry axis and the momentum vector, which we’ll denote by θ, and (b) the angular velocity of the precession, which we’ll denote by ωp = dφ/dt, as shown below. The J in the illustration below is the angular momentum of the object. Hence, if we’d imagine it to be an electron, then J would be the spin angular momentum only, not its orbital angular momentum – although the analysis would obviously be valid for the orbital and/or total angular momentum as well.


OK. Let’s look at what’s going on. The angular displacement – which is also, rather confusingly, referred to as the angle of precession – in the time interval Δt is, obviously, equal to Δφ = ωp·Δt. Now, looking at the geometry of the situation, and using the small-angle approximation for the sine, one can also see that ΔJ ≈ (J·sinθ)·(ωp·Δt). In fact, going to the limit (i.e. for infinitesimally small Δφ and ΔJ), we can write:

dJ/dt = ωp·J·sinθ

But the angular momentum cannot change if there’s no torque. In fact, the time rate of change of the angular momentum is equal to the torque. [You should look this up but, if you don’t want to do that, note that this is just the equivalent, for rotational motion, of the F = dp/dt law for linear motion.] Now, in my post on magnetic dipoles, I showed that the torque τ on a loop of current with magnetic moment μ in an external magnetic field B  is equal to τ = μ×B. So the magnitude of the torque is equal to |τ| = |μ|·|B|·sinθ = μ·B·sinθ. Therefore, ωp·J·sinθ = μ·B·sinθ and, hence,

ωp = μ·B/J
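You don’t have to take the geometry on faith: a small numerical experiment gives the same precession rate. The sketch below integrates dJ/dt = μ×B step by step, assuming μ is parallel to J, and then measures how fast the tip of J swings around the field. The numbers are made up and in arbitrary units, and the midpoint integrator is just a convenient choice of mine.

```python
import math


def cross(a, b):
    """Vector product a x b for three-component sequences."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def precess(j, mu_over_j, b, total_time, steps):
    """Integrate dJ/dt = mu x B with the midpoint method, taking mu = mu_over_j * J."""
    dt = total_time / steps
    for _ in range(steps):
        k1 = [mu_over_j * c for c in cross(j, b)]
        mid = [jc + 0.5 * dt * kc for jc, kc in zip(j, k1)]
        k2 = [mu_over_j * c for c in cross(mid, b)]
        j = [jc + dt * kc for jc, kc in zip(j, k2)]
    return j


# Made-up magnitudes (arbitrary units): mu = 3, J = 1.5, B = 2 along the z-axis.
mu, j_mag, b_mag = 3.0, 1.5, 2.0
theta = math.radians(30)   # angle between J and B
j0 = [j_mag * math.sin(theta), 0.0, j_mag * math.cos(theta)]
b = (0.0, 0.0, b_mag)

t = 0.05
j1 = precess(j0, mu / j_mag, b, t, 1000)
phi = math.atan2(j1[1], j1[0])   # azimuth swept by J in the time t
omega_measured = abs(phi) / t
omega_formula = mu * b_mag / j_mag

print("measured precession rate:", omega_measured)
print("mu*B/J                  :", omega_formula)
```

Changing theta should not change the measured rate: the sinθ drops out, just like in the derivation.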

However, from the general μ/J = –g·(qe/2m) equation we derived in our previous post, we know that the magnitude of μ/J – for an atomic magnet, that is – must be equal to g·qe/2m. So we get the formula we wanted to get here:

ωp = g·(qe/2m)·B

This equation says that the angular velocity of the precession is proportional to the magnitude of the external magnetic field, and that the constant of proportionality is equal to g·(qe/2m). It’s good to do the math and actually calculate the precession frequency fp = ωp/2π. It’s easy. We had calculated qe/2m already: it was equal to 1.6×10⁻¹⁹ C divided by 2·9.1×10⁻³¹ kg, so that’s 0.0879×10¹² C/kg or 0.0879×10¹² (C·m)/(N·s²), more or less. 🙂 Now, g is dimensionless, and B is expressed in tesla: 1 T = (N·s)/(C·m), so we get the s⁻¹ dimension we want for a frequency. For g = 2 (so we look at the spin of the electron itself only), we get:

fp = ωp/2π = 2·0.0879×10¹²/2π ≈ 28×10⁹ = 28 gigacycles per tesla = 28 GHz/T

This is a number expressed per unit of the magnetic field strength B. Note that you’ll often see this number expressed as 2.8 megacycles per gauss (or 1.4 megacycles per gauss for g = 1, i.e. for orbital motion), using the older gauss unit for magnetic field strength: 1 tesla = 10,000 gauss. For a nucleus, we get a somewhat less impressive number because the proton (or neutron) mass is so much bigger: it’s a number expressed in megacycles per tesla, indeed, and for a proton (i.e. a hydrogen nucleus), it’s about 42.58 MHz/T.
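If you want to redo these numbers yourself, the sketch below does the arithmetic in Python. The constants are the usual rounded reference values, and the proton g-factor of about 5.586 is a measured number that I simply plug in as an assumption: it is not derived anywhere in this post.

```python
import math

Q_E = 1.602176634e-19     # elementary charge, C
M_E = 9.1093837015e-31    # electron mass, kg
M_P = 1.67262192369e-27   # proton mass, kg


def precession_frequency_per_tesla(g, mass):
    """Return f_p/B = g*(q_e/2m)/(2*pi), in Hz per tesla."""
    return g * Q_E / (2 * mass) / (2 * math.pi)


f_spin = precession_frequency_per_tesla(2.0, M_E)      # electron spin, g = 2
f_orbit = precession_frequency_per_tesla(1.0, M_E)     # electron orbit, g = 1
f_proton = precession_frequency_per_tesla(5.586, M_P)  # assumed proton g-factor

GAUSS_PER_TESLA = 1e4
print(f"electron spin : {f_spin / 1e9:.1f} GHz/T = {f_spin / GAUSS_PER_TESLA / 1e6:.1f} MHz/gauss")
print(f"electron orbit: {f_orbit / 1e9:.1f} GHz/T = {f_orbit / GAUSS_PER_TESLA / 1e6:.1f} MHz/gauss")
print(f"proton        : {f_proton / 1e6:.2f} MHz/T")
```

So the g = 2 case gives the 28 GHz/T quoted above, and the g = 1 case gives exactly half of that.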

Now, you may wonder about the numbers here. Are they astronomical? Maybe. Maybe not. It’s probably good to note that the strength of the magnetic field in medical MRI systems (magnetic resonance imaging systems) is only 1.5 to 3 tesla, so it’s a rather large unit. You should also note that the clock speed of the CPU in your laptop – so that’s the speed at which it executes instructions – is measured in GHz too, so perhaps it’s not so astronomic. I’ll let you judge. 🙂

So… Well… That’s all nice. The key question, of course, is whether or not this classical view of the electron spinning around a proton is accurate, quantum-mechanically, that is. I’ll let Feynman answer that question provisionally:

“According to the classical theory, then, the electron orbits—and spins—in an atom should precess in a magnetic field. Is it also true quantum-mechanically? It is essentially true, but the meaning of the “precession” is different. In quantum mechanics one cannot talk about the direction of the angular momentum in the same sense as one does classically; nevertheless, there is a very close analogy—so close that we continue to call it precession.”

To distinguish classical and quantum-mechanical precession, quantum-mechanical precession is usually referred to as Larmor precession, and the frequencies above are often referred to as Larmor frequencies. However, I should note that, technically speaking, the term Larmor frequency is actually reserved for the frequency I’ll describe in the next section. I should also note that the ωp = g·(qe/2m)·B formula is usually written, quite simply, as ωp = γ·B. Of course, the gamma is not the Lorentz factor here, but the so-called gyromagnetic ratio (aka the magnetogyric ratio): γ = g·(qe/2m). Oh – just so you know: Sir Joseph Larmor was a British physicist and, yes, he developed all of the stuff we’re talking about here. 🙂

At this point, you may wonder if and why all of the above is relevant. Well… There’s more than one answer to this question, but I’d recommend you start with reading the Wikipedia article on NMR spectroscopy. 🙂 And then you should also read Feynman’s exposé on the Rabi atomic or molecular beam method for determining the precession frequency. It’s really fascinating stuff, but you are sufficiently armed now to read those things for yourself, and so I’ll just move on. Indeed, there’s something else I need to talk about here, and that’s Larmor’s Theorem.

Larmor’s Theorem

We’ve been talking single electrons only so far. Now, you may fear that things become quite complicated when many electrons are involved and… Well… That’s true, of course. And then you may also think that things become even more complicated when external fields are involved, like that external magnetic field we introduced above, and that led our electrons to precess at extraordinary frequencies. Well… That’s not true. Here we get some help: Larmor proved a theorem that basically says that, if we can work out the motions of the electrons without the external field, the solution for the motions with the external field is the no-field solution with an added rotation about the axis of the field. More specifically, for an external magnetic field, the added rotation will have an angular frequency equal to:

ωL = (qe/2m)·B

So that’s the same formula as we found for the angular velocity of the precession if g = 1, so that’s very easy to remember. The ωL  frequency, which is the precession frequency for g = 1, is referred to as the Larmor frequency. The proof of the above is remarkably easy, but… Well… I don’t want to copy Feynman here, so I’ll just refer you to the relevant Lecture on it. 🙂


I guess it’s about time we relate all of what we learned so far to properties of matter we can relate to, and so that’s what I’ll do here. We’re not going to talk about ferromagnetism here, i.e. the mechanism through which iron, nickel and cobalt and most of their alloys become permanent magnets. That’s quite peculiar and so we will not discuss it here. Here we’ll talk about the very weak quantum-mechanical magnetic effect – a thousand to a million times less than the effects in ferromagnetic materials – that occurs in all materials when placed in an external magnetic field.

While the effect is there in all materials, it’s stronger for some than for others. In fact, it’s usually so weak it is hard to detect, and so it’s usually demonstrated using elements for which the diamagnetic effect is somewhat stronger, like bismuth or antimony. The effect is demonstrated by suspending a piece of material in a non-uniform field, as illustrated below. The diamagnetic effect will cause a small displacement of the material, away from the high-field region, i.e. away from the pointed pole.


I should immediately add that some materials, like aluminium, will actually be attracted to the pointed pole, but that’s because of yet another effect that not all materials share: paramagnetism. I’ll talk about that in another post, together with ferromagnetism. So… Diamagnetism: what is it?

The illustration below shows our spinning electron (q) once again. It also shows a magnetic field B but, unlike our analysis above, or the analysis in our previous post, we assume the external magnetic field is not just there. We assume it changes, because it’s been turned on or off—hopefully slowly: if not, we’d have eddy-current forces causing potentially strong impulses.

But so we’ve got some change in the magnetic flux, and so we know, because of Faraday or Maxwell – you choose 🙂 – that we’ll have some circulation of E, i.e. the electric field. The magnetic flux is B times the surface area, and the circulation is the average tangential component E times the length of the path. Because our model of the orbiting electron is so nice and symmetric, we can write Faraday’s Law here as:

E·2π·r = −d(B·π·r²)/dt ⇔ E = −(r/2)·dB/dt

A field implies a force and, therefore, a torque on the electron. The torque is equal to the force times the lever arm, so it’s equal to (−qe·E)·r = −qe·E·r. Of course, the torque is also equal to the rate of the change of the angular momentum, so dJ/dt must equal:

dJ/dt = −qe·E·r = qe·(r/2)·(dB/dt)·r = (qe·r²/2)·(dB/dt)

Now, the assumption is that the field goes from zero to B, so ΔB = B. Therefore, ΔJ must be equal to:

ΔJ = (qe·r²/2)·B

You should, in fact, derive this more formally, by integrating – but let’s keep things as simple as we can. 🙂 What does this formula say, really? It’s the extra angular momentum from the ‘twist’ that’s given to the electrons as the field is turned on. Now, this added angular momentum makes an extra magnetic moment which, because it is an orbital motion, is just qe/2m times the added angular momentum (in magnitude). In other words, a change in angular momentum means a change in the magnetic moment, according to the μ = (qe/2m)·J formula we derived in our previous post, so we have:

Δμ = –(qe/2m)·ΔJ

The minus sign is there because of Lenz’ law: the added moment is opposite to the magnetic field – and, yes, I know: it’s hard to keep track of all of the conventions involved here. :-/ In any case, we get the following grand equation:

Δμ = −(qe/2m)·(qe·r²/2)·B = −(qe²·r²/4m)·B


So we found that the induced magnetic moment is directly proportional to the magnetic field B, and opposing it. Now that is what explains why our piece of bismuth does what it does in that non-uniform magnetic field. Of course, you’ll say: why is it stronger for bismuth than for other materials? And what about aluminium, or paramagnetism in general? Well… Good questions, but we’ll tackle them in the next posts. 🙂
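To get a feel for how weak the effect is, we can put numbers in. Combining ΔJ = (qe·r²/2)·B with Δμ = –(qe/2m)·ΔJ gives Δμ = −(qe²·r²/4m)·B, and the Python sketch below evaluates that for one electron. The orbit radius (the Bohr radius) and the 1 tesla field are assumptions of mine, chosen only to get an order of magnitude.

```python
Q_E = 1.602176634e-19    # elementary charge, C
M_E = 9.1093837015e-31   # electron mass, kg
HBAR = 1.054571817e-34   # reduced Planck constant, J*s


def induced_moment(radius, b_field):
    """Delta mu = -(q_e^2 * r^2 / 4m) * B for one orbiting electron."""
    return -(Q_E ** 2) * radius ** 2 / (4 * M_E) * b_field


def induced_moment_two_steps(radius, b_field):
    """Same thing via Delta J = (q_e*r^2/2)*B, then Delta mu = -(q_e/2m)*Delta J."""
    delta_j = Q_E * radius ** 2 / 2 * b_field
    return -(Q_E / (2 * M_E)) * delta_j


radius = 0.529e-10   # assumption: orbit radius equal to the Bohr radius, in m
b_field = 1.0        # assumption: field switched on from zero to 1 T

delta_mu = induced_moment(radius, b_field)
bohr_magneton = Q_E * HBAR / (2 * M_E)   # moment for one unit (hbar) of orbital angular momentum

print(f"induced moment: {delta_mu:.3e} J/T")
print(f"relative size : {abs(delta_mu) / bohr_magneton:.1e} Bohr magnetons")
```

Under these assumptions the induced moment comes out at the order of a millionth of the moment an electron orbit has anyway, which fits with diamagnetism being so hard to detect.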

Let me conclude this post by copying Feynman’s little exposé on why the phenomenon of diamagnetism is so particular. In fact, he notes that, because we’re talking a piece of material here that can’t spin – so it’s held in place, so to say – we should have “no magnetic effects whatsoever”. The reasoning is as follows:


This is very interesting indeed. This classical theorem basically says that the energy of a system should not be affected by the presence of a magnetic field. However, we know magnetic effects, such as the diamagnetic effect, are there, so these effects are referred to as ‘quantum-mechanical’ effects indeed: they cannot be explained using classical theory only, even if all of what we wrote above used classical theory only.

I should also note another point: why do we need a non-homogeneous field? Well… The situation is comparable to what we wrote on the Stern-Gerlach experiment. If we would have a homogeneous magnetic field, then we would only have a torque on all of the atomic magnets, but no net force in one or the other direction. There’s something else here too: you may think that the forces pointing towards and away from the pointed tip should cancel each other out, so there should actually be no net movement of the material at all! Feynman’s analysis works for one atom, indeed, but does it still make sense if we look at the whole piece of material? It does, because we’re talking an induced magnetic moment that’s opposing the field, regardless of the orientation of the magnetic moment of the individual atoms in the piece of material. So, even if the individual atoms have opposite momenta, the extra induced magnetic moment will point in the same direction for all. So that solves that issue. However, it does not address Feynman’s own critical remark in regard to the supposed ‘impossibility’ of diamagnetism in classical mechanics.

But I’ll let you think about this, and sign off for today. 🙂 I hope you enjoyed this post.

Spin and angular momentum in quantum mechanics

Feynman starts his Volume of Lectures on quantum mechanics (so that’s Volume III of the whole series) with the rules we already know, so that’s the ‘special’ math involving probability amplitudes, rather than probabilities. However, these introductory chapters assume theoretical zero-spin particles, which means they don’t have any angular momentum. While that makes it much easier to understand the basics of quantum math, real elementary particles do have angular momentum, which makes the analysis much more complicated. Therefore, Feynman makes it very clear, after his introductory chapters, that he expects all prospective readers of his third volume to first work their way through chapter 34 and 35 of the second volume, which discusses the angular momentum of elementary particles from both a classical as well as a quantum-mechanical perspective. So that’s what we will do here. I have to warn you, though: while the mentioned two chapters are more generous with text than other textbooks on quantum mechanics I’ve looked at, the matter is still quite hard to digest. By way of introduction, Feynman writes the following:

“The behavior of matter on a small scale—as we have remarked many times—is different from anything that you are used to and is very strange indeed. Understanding of these matters comes very slowly, if at all. One never gets a comfortable feeling that these quantum-mechanical rules are ‘natural’. Of course they are, but they are not natural to our own experience at an ordinary level. The attitude that we are going to take with regard to this rule about angular momentum is quite different from many of the other things we have talked about. We are not going to try to ‘explain’ it but tell you what happens.”

I personally feel it’s not all as mysterious as Feynman claims it to be, but I’ll let you judge for yourself. So let’s just go for it and see what comes out. 🙂

Atomic magnets and the g-factor

When discussing electromagnetic radiation, we introduced the concept of atomic oscillators. It was a very useful model to help us understand what’s supposed to be going on. Now we’re going to introduce atomic magnets. It is based on the classical idea of an electron orbiting around a proton. Of course, we know this classical idea is wrong: we don’t have nice circular electron orbitals, and our discussion on the radius of the electron in our previous post makes it clear that the idea of the electron itself is rather fuzzy. Nevertheless, the classical concepts used to analyze rotation are also used, mutatis mutandis, in quantum mechanics. Mutatis mutandis means: with necessary alterations. So… Well… Let’s go for it. 🙂 The basic idea is the following: an electron in a circular orbit is a circular current and, hence, it causes a magnetic field, i.e. a magnetic flux through the area of the loop – as illustrated below.


As such, we’ll have a magnetic (dipole) moment, and you may want to review my post(s) on that topic so as to ensure you understand what follows. The magnetic moment (μ) is the product of the current (I) and the area of the loop (π·r²), and its conventional direction is given by the μ vector in the illustration below, which also shows the other relevant scalar and/or vector quantities, such as the velocity v and the orbital angular momentum J. The orbital angular momentum is to be distinguished from the spin angular momentum, which results from the spin around its own axis. So the spin angular momentum – which is often referred to as the spin tout court – is not depicted below, and will only be discussed in a few minutes.

[Illustration: atomic magnet]

Let me interject something on notation here. Feynman always uses J, for whatever angular momentum. That’s not so usual. Indeed, if you’d google a bit, you’ll see the convention is to use S and L to distinguish spin and orbital angular momentum respectively. If we’d use S and L, we can write the total angular momentum as J = S + L, and the illustration below shows how the S and L vectors are to be added. It looks a bit complicated, so you can skip this for now and come back to it later. But just try to visualize things:

  1. The L vector is moving around, so that assumes the orbital plane is moving around too. That happens when we’d put our atomic system in a magnetic field. We’ll come back to that. In what follows, we’ll assume the orbital plane is not moving.
  2. The S vector here is also moving, which also assumes the axis of rotation is not steady. What’s going on here is referred to as precession, and we discussed it when presenting the math one needs to understand gyroscopes.
  3. Adding S and L yields J, the total angular momentum. Unsurprisingly, this vector wiggles around too. Don’t worry about the magnitudes of the vectors here. Also, in case you’d wonder why the axis of symmetry for the movement of the J vector happens to be the Jz axis, the answer is simple: we chose the coordinate system so as to ensure that was the case.


But I am digressing. I just inserted the illustration above to give you an inkling of where we’re going with this. Indeed, what’s shown above will make it easier for you to see how we can generalize the analysis that we’ll do now, which is an analysis of the orbital angular momentum and the related magnetic moment only. Let me copy the illustration we started with once more, so you don’t have to scroll up to see what we’re talking about.

[Illustration: atomic magnet]

So we have a charge orbiting around some center. It’s a classical analysis, and so it’s really like a planet around the Sun, except that we should remember that likes repel, and opposites attract, so we’ve got a minus sign in the force law here.

Let’s go through the math. The magnetic moment is the current times the area of the loop. As the velocity is constant, the current is just the charge q times the frequency of rotation. The frequency of rotation is, of course, the velocity (i.e. the distance traveled per second) divided by the circumference of the orbit (i.e. 2π·r). Hence, we write: I = (qe·v)/(2π·r) and, therefore: μ = (qe·v)·(π·r²)/(2π·r) = qe·v·r/2. Note that, as per the convention, current is defined as a flow of positive charges, so the illustration above actually assumes we’re talking some proton in orbit, so q = qe would be the elementary charge +1. If we’d be talking an electron, then its charge is to be denoted as –qe (minus qe, i.e. −1), and we’d need to reverse the direction of μ, which we’ll do in a moment. However, to simplify the discussion, you should just think of some positive charge orbiting the way it does in the illustration above.

OK. That’s all there’s to say about the magnetic moment—for the time being, that is. Let’s think about the angular momentum now. It’s orbital angular momentum here, and so that’s the type of angular momentum we discussed in our post on gyroscopes. We denoted it as L indeed – i.e. not as J, but that’s just a matter of conventions – and we noted that L could be calculated as the vector cross product of the position vector r and the momentum vector p, as shown in the animation below, which also shows the torque vector τ.

[Animation: torque and angular momentum]

The angular momentum L changes in the animation above. In our J case above, it doesn’t. Also, unlike what’s happening with the angular momentum of that swinging ball above, the magnitude of our J doesn’t change. It remains constant, and it’s equal to |J| = J = |r×p| = |r|·|p|·sinθ = r·p = r·m·v. One should note this is a non-relativistic formula, but as the relative velocity of an electron v/c is equal to the fine-structure constant, so that’s α ≈ 0.0073 (see my post on the fine-structure constant if you wonder where this formula comes from), it’s OK to not include the Lorentz factor in our formulas as for now.
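How much do we neglect by leaving out the Lorentz factor? A quick check in Python, taking v/c = α as stated above (the value of α is the usual reference value):

```python
import math

ALPHA = 0.0072973525693   # fine-structure constant, used here as v/c

lorentz_factor = 1 / math.sqrt(1 - ALPHA ** 2)
correction = lorentz_factor - 1   # relative error of the non-relativistic J = r*m*v

print(f"Lorentz factor     : {lorentz_factor:.10f}")
print(f"relative correction: {correction:.2e}")
print(f"alpha^2 / 2        : {ALPHA ** 2 / 2:.2e}")
```

The correction is of the order of α²/2, so a few parts in a hundred thousand, which is why we can safely ignore it at this level.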

Now, as I mentioned already, the illustration we’re using to explain μ and J is somewhat unreal because it assumes a positive charge q, and so μ and J point in the same direction in this case, which is not the case if we’d be talking an actual atomic system with an electron orbiting around a proton. But let’s go along with it as for now and so we’ll put the required minus sign in later. We can combine the J = r·m·v and μ = q·v·r/2 formulas to write:

μ = (q/2m)·J or μ/J = q/2m (electron orbit)

In other words, the ratio of the magnetic moment and the angular momentum depends on (1) the charge (q) and (2) the mass of the charge, and on those two variables only. So the ratio does not depend on the velocity v nor on the radius r. It can be noted that the q/2m factor is often referred to as the gyromagnetic factor (not to be confused with the g-factor, which we’ll introduce shortly). It’s good to do a quick dimensional check of this relation: the magnetic moment is expressed in ampère (i.e. coulomb per second) times the loop area, so that’s (C/s)·m². On the right-hand side, we have the dimension of the gyromagnetic factor, which is C/kg, times the dimension of the angular momentum, which is m·kg·m/s, so we have the same units on both sides: C·m²/s, which is often written as joule per tesla (J/T): the joule is the energy unit (1 J = 1 N·m), and the tesla measures the strength of the magnetic field (1 T = 1 (N·s)/(C·m)). OK. So that works out.
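That the ratio does not depend on v or r is easy to verify numerically. The sketch below computes μ from the current-times-area definition and J from r·m·v for a few orbits, and then takes the ratio. The (v, r) pairs are arbitrary values I picked, not physical orbits, and the function names are mine.

```python
import math

Q_E = 1.602176634e-19    # elementary charge, C
M_E = 9.1093837015e-31   # electron mass, kg


def orbital_magnetic_moment(q, v, r):
    """mu = I * area, with the current I = q*v/(2*pi*r) and the area pi*r^2."""
    current = q * v / (2 * math.pi * r)
    return current * math.pi * r ** 2


def orbital_angular_momentum(m, v, r):
    """J = r*m*v for a circular orbit (non-relativistic)."""
    return r * m * v


# Arbitrary (speed in m/s, radius in m) pairs, for illustration only.
orbits = [(2.2e6, 0.53e-10), (1.0e6, 2.0e-10), (5.0e5, 1.0e-9)]
ratios = [orbital_magnetic_moment(Q_E, v, r) / orbital_angular_momentum(M_E, v, r)
          for v, r in orbits]

for (v, r), ratio in zip(orbits, ratios):
    print(f"v = {v:.1e} m/s, r = {r:.2e} m  ->  mu/J = {ratio:.4e} C/kg")
print(f"q/2m = {Q_E / (2 * M_E):.4e} C/kg")
```

All of the ratios should equal q/2m, whatever speed and radius you plug in.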

So far, so good. The story is a little bit different for the spin angular momentum and the spin magnetic moment. The formula is the following:

μ = (q/m)·J (electron spin)

This formula says that the μ/J ratio is twice what it is for the orbital motion of the electron. Why is that? Feynman says “the reasons are pure quantum-mechanical—there is no classical explanation.” So I’d suggest we just leave that question open for the moment and see if we’re any wiser once we’ve worked ourselves through all of his Lectures on quantum physics. 🙂 Let’s just go along with it as for now.

Now, we can write both formulas – i.e. the formula for the spin and the orbital angular momentum – in a more general way using the format below:

μ = –g·(qe/2me)·J

Why the minus sign? Well… I wanted to get the sign right this time. Our model assumed some positive charge in orbit, but so we want a formula for an atomic system, and so our circling charge should be an electron. So the formula above is the formula for an electron, and the direction of the magnetic moment and of the angular momentum will be opposite for electrons: it just replaces q by –qe. The format above also applies to any atomic system: as Feynman writes, “For an isolated atom, the direction of the magnetic moment will always be exactly opposite to the direction of the angular momentum.” So the g-factor will be characteristic of the state of the atom. It will be 1 for a pure orbital moment, 2 for a pure spin moment, or some other number in-between for a complicated system like an atom, indeed.

You may have one last question: why qe/2m instead of qe/m in the middle? Well… If we’d take qe/m, then g would be 1/2 for the orbital angular momentum, and the initial idea with g was that it would be some integer (we’ll quickly see that’s an idea only). So… Well… It’s just one more convention. Of course, conventions are not always respected so sometimes you’ll see the expression above written without the minus sign, so you may see it as μ = g·(qe/2me)·J. In that case, the g-factor for our example involving the spin angular momentum and the spin magnetic moment, will obviously have to be written as minus 2.

Of course, it’s easy to see that the formula for the spin of a proton will look the same, except that we should take the mass of the proton in the formula, so that’s mp instead of me. Having said that, the elementary charge remains what it is, but we write it without the minus sign here. To make a long story short, the formula for the proton is:

μ = g·(qe/2mp)·J

OK. That’s clear enough. For electrons, the g-factor is referred to as the Landé g-factor, while the g-factor for protons or, more generally, for any spinning nucleus, is referred to as the nuclear g-factor. Now, you may or may not be surprised, but there’s a g-factor for neutrons too, despite the fact that they do not carry a net charge: the explanation for it must have something to do with the quarks that make up the neutron but that’s a complicated matter which we will not get into here. Finally, there is a g-factor for a whole atom, or a whole atomic system, and that’s referred to as… Well… Just the g-factor. 🙂 It’s, obviously, a number that’s characteristic of the state of the atom.

So… This was a big step forward. We’ve got all of the basics on that ‘magical’ spin number here, and so I hope it’s somewhat less ‘magical’ now. 🙂 Let me just copy the values of the g-factor for some elementary particles. It also shows how hard physicists have been trying to narrow down the uncertainty in the measurement. Quite impressive! The table comes from the Wikipedia article on it. I hope the explanations above will now enable you to read and understand that. 🙂


Let’s now move on to the next topic.

Spin numbers and states

Of course, we’re talking quantum mechanics and, therefore, J can only take on a finite number of values. While that’s weird – as weird as other quantum-mechanical things, such as the boson-fermion math, for example – it should not surprise us. As we will see in a moment, the values of J will determine the magnetic energy our system will acquire when we put in some magnetic field and, as Feynman writes: “That the energy of an atom in the magnetic field can have only certain discrete energies is really not more surprising than the fact that atoms in general have only certain discrete energy levels. Why should the same thing not hold for atoms in a magnetic field? It does. It is just correlation of this with the idea of an oriented magnetic moment that brings out some of the strange implications of quantum mechanics.” Yep. That’s true. We’ll talk about that later.

Of course, you’ll probably want some ‘easier’ explanation. I am afraid I can’t give you that. All I can say is that, perhaps, you should think of our discussion on the fine-structure constant, which made it clear that the various radii of the electron, its velocity and its mass and/or energy are all related one to another and, hence, that they can only take on certain values. Indeed, of all the relations we discussed, there’s two you should always remember. The first relationship is U = e²/r = α/r. So that links the energy (which we can express in equivalent mass units), the electron charge and its radius. The second thing you should remember is that the Bohr radius r and the classical electron radius re are also related through α: re/r = α². So you may want to think of the different values for J as being associated with different ‘orbitals’, so to speak. But that’s a very crude way of thinking about it, so I’d say: just accept the fact and see where it leads us. You’ll see, in a few moments from now, that the whole thing is not unlike the quantum-mechanical explanation of the blackbody radiation problem, which assumes that the permitted energy levels (or states) are equally spaced and h·f apart, with f the frequency of the light that’s being absorbed and/or emitted. So the atom takes up energies only h·f at a time. Here we’ve got something similar: the energy levels that we’ll associate with the discrete values of J – or J‘s components, I should say – will also be equally spaced. Let me show you how it works, as that will make things somewhat more clear.

If we have an object with a given total angular momentum J in classical mechanics, then any of its components x, y or z, could take on any value from +J to −J. That’s not the case here. The rule is that the ‘system’ – the atom, the nucleus, or anything really – will have a characteristic number, which is referred to as the ‘spin’ of the system and, somewhat confusingly, it’s denoted by j (as you can see, it’s extremely important, indeed, to distinguish capital letters (like J) from small letters (like j) if you want to keep track of what we’re explaining here). Now, if we have that characteristic spin number j, then any component of J (think of the z-direction, for example) can take on only (one of) the following values:

Jz = j·ħ, (j − 1)·ħ, (j − 2)·ħ, …, −(j − 2)·ħ, −(j − 1)·ħ, −j·ħ

Note that we will always have 2j + 1 values. For example, if j = 3/2, we’ll have 2·(3/2) + 1 = 4 permitted values, and in the extreme case where j is zero, we’ll still have 2·0 + 1 = 1 permitted value: zero itself. So that basically says we have no angular momentum. […] OK. That should be clear enough, but let’s pause here for a moment and analyze this – just to make sure we ‘get’ this indeed. What’s being written here? What are those numbers? Let’s do a quick dimensional analysis first. Because j, j − 1, j − 2, etcetera are pure numbers, it’s only the dimension of ħ that we need to look at. We know ħ: it’s the Planck constant h, which is expressed in joule·second, i.e. J·s = N·m·s, divided by 2π. That makes sense, because we get the same dimension for the angular momentum. Indeed, the L or J = r·m·v formula also gives us the dimension of physical action, i.e. N·m·s. Just check it: [r]·[m]·[v] = m·kg·m/s = m·(N·s²/m)·m/s = N·m·s. Done!

So we’ve got some kind of unit of action once more here, even if it’s not h but ħ = h/2π. That makes it a quantum of action expressed for a radian, so that’s a unit of length, rather than for a full cycle. Just so you know, ħ = h/2π is about 1.05×10⁻³⁴ J·s ≈ 6.6×10⁻¹⁶ eV·s, and we could choose to express the components of J in terms of h by multiplying the whole thing with 2π. That would boil down to saying that our unit length is not unity but the unit circle, which is 2π times unity. Huh? Just think about it: h is a fundamental unit linked to one full cycle of something, so it all makes sense. Before we move on, you may want to compare the value of h or ħ with the energy of a photon, which is 1.6 to 3.2 eV in the visible light spectrum, but you should note that energy does not have the time dimension, and a second is an eternity in quantum physics, so the comparison is a bit tricky. So… […] Well… Let’s just move on. What about those coefficients? What constraints are there?
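If you want to check those values of ħ yourself, the short sketch below does the conversion. The constants are the exact SI values of h and of the elementary charge; the variable names are mine.

```python
import math

H = 6.62607015e-34     # Planck constant in J·s (exact in the SI)
EV = 1.602176634e-19   # one electronvolt in joule (exact in the SI)

hbar = H / (2 * math.pi)   # quantum of action per radian, in J·s
hbar_ev = hbar / EV        # the same quantity in eV·s

print(f"hbar = {hbar:.4e} J·s = {hbar_ev:.4e} eV·s")
```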

Well… The constraint is that the difference between +j and −j must be some integer, so +j−(−j) = 2j must be an integer. That implies that the spin number j is always an integer or a half-integer, depending on whether 2j is even or odd. Let’s do a few examples:

  1. A lithium (Li-7) nucleus has spin j = 3/2 and, therefore, the permitted values for the angular momentum around any axis (the z-axis, for example) are: 3/2, 3/2−1=1/2, 3/2−2=−1/2, and −3/2—all times ħ of course! Note that the difference between +j and –j is 3, and each ‘step’ between those two levels is ħ, as we’d like it to be.
  2. The nucleus of the much rarer Lithium-6 isotope is one of the few stable nuclei that has spin j = 1, so the permitted values are 1, 0 and −1. Again, all needs to be multiplied with ħ to get the actual value for the J-component that we’re looking at. So each step is ‘one’ again, and the total difference (between +j and –j) is 2.
  3. An electron is a spin-1/2 particle, and so there are only two permitted values: +ħ/2 and −ħ/2. So there is just one ‘step’ and it’s equal to the whole difference between +j and –j. In fact, this is the most common situation, because we’ll be talking elementary fermions most of the time.
  4. Photons are an example of spin-1 ‘particles’, and ‘particles’ with integer spin are referred to as bosons. In this regard, you may have heard of superfluid Helium-4, which is caused by Bose-Einstein condensation near the zero temperature point, and demonstrates the integer spin number of Helium-4, so it resembles Lithium-6 in this regard.
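The rule behind these examples is easy to put in a few lines of code. The sketch below lists the permitted values of a component of J, in units of ħ, for a given spin number j. The function name is hypothetical, and exact fractions are used so the half-integers stay readable.

```python
from fractions import Fraction

def permitted_values(j):
    """Permitted values of a component of J, in units of hbar, for spin number j."""
    j = Fraction(j)
    if j < 0 or (2 * j).denominator != 1:
        raise ValueError("2j must be a non-negative integer")
    count = int(2 * j) + 1                   # there are always 2j + 1 values
    return [j - k for k in range(count)]     # j, j - 1, ..., -j

for label, spin in [("Li-7 nucleus", "3/2"), ("Li-6 nucleus", 1), ("electron", "1/2")]:
    values = permitted_values(spin)
    print(label, [str(v) for v in values])
```

Each step between two neighbouring values is one unit of ħ, and the list always runs from +j down to −j.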

The four ‘typical’ examples make it clear that the actual situations that we’ll be analyzing will usually be quite simple: we’ll have 2, 3 or 4 permitted values only. As mentioned, there is this fundamental dichotomy between fermions and bosons. Fermions have half-integer spin, and all elementary fermions, such as protons, neutrons, electrons, neutrinos and quarks are spin-1/2 particles. [Note that a proton and a neutron are, strictly speaking, not elementary, as their constituent parts are quarks.] Bosons have integer spin, and the elementary bosons we know of are spin-one particles (except for the newly discovered Higgs boson, which is an actual spin-zero particle). The photon is an example. The helium nucleus (He-4) also has integer spin (spin zero, in fact), which – as mentioned above – gives rise to superfluidity when it’s cooled near the absolute zero point.

In any case, to make a long story short, in practice, we’ll be dealing almost exclusively with spin-1, spin-1/2 particles and, occasionally, with spin-3/2 particles. In addition, to analyze simple stuff, we’ll often pretend particles do not have any spin, so our ‘theoretical’ particles will often be spin zero. That’s just to simplify stuff.

We now need to learn how to do a bit of math with all of this. Before we do so, let me make some additional remarks on these permitted values. Regardless of whether or not J is ‘wobbling’ or moving – let me be clear: J is not moving in the analysis above, but we’ll discuss the phenomenon of precession in the next post, and that will involve J circling around the z-axis, so I am just preparing the terrain here – J‘s magnitude will always be some constant, which we denote by |J| = J.

Now there’s something really interesting here, which again distinguishes classical mechanics from quantum mechanics. As mentioned, in classical mechanics, any of J‘s components Jx, Jy or Jz, could take on any value from +J to −J and, therefore, the maximum value of any component of J – say Jz – would be equal to J. To be precise, J would be the value of the component of J in the direction of J itself. So, in classical mechanics, we’d write: |J| = +√(J·J) = +√J², and it would be the maximum value of any component of J. But we said that, if the spin number of J is j, then the maximum value of any component of J is equal to j·ħ. So, naturally, one would think that J = |J| = +√(J·J) = +√J² = j·ħ.

However, that’s not the case in quantum mechanics: the maximum value of any component of J is still j·ħ, but the magnitude of J is not j·ħ: it is the square root of j·(j+1), times ħ.

Huh? Yes. Let me spell it out: |J| = +√(J·J) = +√J² ≠ jħ. Indeed, quantum math has many particularities, and this is one of them. The magnitude of J is not equal to the largest possible value of any component of J:

J‘s magnitude is not jħ but √(j(j+1))·ħ.

As for the proof of this, let me simplify my life and just copy Feynman here:


The formula can be easily generalized for j ≠ 3/2. Also note that we used a fact that we didn’t mention as yet: all possible values of the z-component (or of whatever component) of J are equally likely.
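The generalization can be checked numerically. As I understand the argument, it goes like this: J·J = Jx² + Jy² + Jz², the three components have the same average square by symmetry, and the average of Jz² is just the plain mean of the squares of the 2j + 1 equally likely permitted values. The sketch below works in units of ħ (so the result is J²/ħ²), and the function name is mine.

```python
import math
from fractions import Fraction

def j_squared(j):
    """Return <J.J> in units of hbar squared, assuming equally likely Jz values."""
    j = Fraction(j)
    m_values = [j - k for k in range(int(2 * j) + 1)]         # j, j - 1, ..., -j
    mean_jz2 = sum(m * m for m in m_values) / len(m_values)   # average of Jz squared
    return 3 * mean_jz2                                       # same average for x, y and z

for spin in ("1/2", 1, "3/2", 2, "5/2"):
    j = Fraction(spin)
    print(f"j = {j}: 3<Jz^2> = {j_squared(j)}, j(j+1) = {j * (j + 1)}")

# Magnitude of J relative to its largest component, for j = 3/2.
ratio = math.sqrt(j_squared("3/2")) / 1.5
print(f"|J| / (j·hbar) = {ratio:.4f}")
```

For every j we try, three times the average of Jz² equals j(j+1), which is the rule quoted above.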

Now, the result is fascinating, but the implications are even better. Let me paraphrase Feynman as he relates them:

  1. From what we have so far, we can get another interesting and somewhat surprising conclusion. In certain classical calculations the quantity that appears in the final result is the square of the magnitude of the angular momentum J – in other words, J·J = J². It turns out that it is often possible to guess at the correct quantum-mechanical formula by using the classical calculation and the following simple rule: Replace J² = J·J by j(j+1)ħ². This rule is commonly used, and usually gives the correct result.
  2. The second implication is the one we announced already: although we would think classically that the largest possible value of any component of J is just the magnitude of J, quantum-mechanically the maximum of any component of J is always less than that, because jħ is always less than √(j(j+1))·ħ. For example, for j = 3/2 = 1.5, we have j(j+1) = (3/2)·(5/2) = 15/4 = 3.75. Now, the square root of this value is √3.75 ≈ 1.9365, so the magnitude of J is about 30% larger than the maximum value of any of J‘s components. That’s a pretty spectacular difference, obviously!

The second point is quite deep: it implies that the angular momentum is ‘never completely along any direction’. Why? Well… Think of it: “any of J‘s components” also includes the component in the direction of J itself! But if the maximum value of that component is more than 20% less than the magnitude of J (1.5 versus 1.9365 for j = 3/2), what does that mean really? All we can say is that it implies that the concept of the direction of J itself is actually quite fuzzy in quantum mechanics! Of course, that’s got to do with the Uncertainty Principle, and so we’ll come back to this later.

In fact, if you look at the math, you may think: what’s that business with those average or expected values? A magnitude is a magnitude, isn’t it? It’s supposed to be calculated from the actual values of Jx, Jy and Jz, not from some average that’s based on the (equal) likelihoods of the permitted values. You’re right. Feynman’s derivation here is quantum-mechanical from the start and, therefore, we get a quantum-mechanical result indeed: the magnitude of J is calculated as the magnitude of a quantum-mechanical variable in the derivation above, not as the magnitude of a classical variable.

[…] OK. On to the next.

The magnetic energy of atoms

Before we start talking about this topic, we should, perhaps, relate the angular momentum to the magnetic moment once again. We can do that using the μ = (q/2m)·J and/or μ = (q/m)·J formula (so those are the simple formulas for the orbital and spin angular momentum respectively) or, else, by using the more general μ = – g·(q/2m)·J formula.

Let’s use the simpler μ = (qe/2m)·J formula, which is the one for the orbital angular momentum. What’s qe/2m? It should be equal to 1.6×10⁻¹⁹ C divided by 2·9.1×10⁻³¹ kg, so that’s about 0.0879×10¹² C/kg, or 0.0879×10¹² (C·m)/(N·s²). Now we multiply by ħ/2 ≈ 0.527×10⁻³⁴ J·s. We get something like 0.0463×10⁻²² m²·C/s or J/T. These numbers are ridiculously small, so they’re usually measured in terms of a so-called natural unit: the Bohr magneton, which I’ll explain in a moment. Here we’re interested in its value only, which is μB = 9.274×10⁻²⁴ J/T. Hence, μ/μB = 0.5 = 1/2. What a nice number!

Hmm… This cannot be a coincidence… […] You’re right. It isn’t. To get the full picture, we need to include the spin angular momentum, so we also need to see what the μ = (q/m)·J will yield. That’s easy, of course, as it’s twice the value of (q/2m)·J, so μ/μB = 1, and so the total is equal to 3/2. So the magnetic moment of an electron has the same value (when expressed in terms of the Bohr magneton) as the spin (when expressed in terms of ħ). Now that’s just sweet!

Yes, it is. All our definitions and formulas were formulated so as to make it sweet. Having said that, we do have a tiny little problem. If we use the general μ = −g·(q/2m)·J to write the result we found for the spin of the electron only (so we’re not looking at the orbital momentum here), then we’d write: μ = 2·(q/2m)·J = (q/m)·J and, hence, the g-factor here is −2. Yes. We know that. You told me so already. What’s the issue? Well… The problem is: experiments reveal the actual value of g is not exactly −2: it’s −2.00231930436182(52) instead, with the last two digits (in brackets) the uncertainty in the current measurements. Just check it for yourself on the NIST website. 🙂 [Please do check it: it brings some realness to this discussion.]

Hmm…. The accuracy of the measurement suggests we should take it seriously, even if we’re talking a difference of 0.1% only. We should. It can be explained, of course: it’s something quantum-mechanical. However, we’ll talk about this later. As for now, just try to understand the basics here. It’s complicated enough already, and so we’ll stay away from the nitty-gritty as long as we can.

Let’s now get back to the magnetic energy of our atoms. From our discussion on the torque on a magnetic dipole in an external magnetic field, we know that our magnetic atoms will have some extra magnetic energy when placed in an external field. So now we have an external magnetic field B, and the formula we derived for the energy is:

Umag = −μ·B·cosθ = −μ·B

I won’t explain the whole thing once again, but it might help to visualize the situation, which we do below. The loop here is not circular but square, and it’s a current-carrying wire instead of an electron in orbit, but I hope you get the point.

Geometry 2

We need to choose some coordinate system to calculate stuff and so we’ll just choose our z-axis along the direction of the external magnetic field B so as to simplify those calculations. If we do that, we can just take the z-component of μ and then combine the interim result with our general μ = – g·(q/2m)·J formula, so we write:

Umag = −μz·B = g·(q/2m)·Jz·B

Now, we know that the maximum value of Jz is equal to j·ħ, and so the maximum value of Umag will be equal to g·(q/2m)·j·ħ·B. Let’s now simplify this expression by choosing some natural unit, and that’s the unit we introduced already above: the Bohr magneton. It’s equal to (qeħ)/(2me) and its value is μB ≈ 9.274×10⁻²⁴ J/T. So we get the result we wanted, and that is:

Umag = g·μB·B·(Jz/ħ)

Let me make a few remarks here. First on that magneton: you should note there’s also something which is known as the nuclear magneton which, you guessed it, is calculated using the proton charge and the proton mass: μN = (qpħ)/(2mp) ≈ 5.05×10−27 J/T. My second remark is a question: what does that formula mean, really? Well… Let me quote Feynman on that. The formula basically says the following:

“The energy of an atomic system is changed when it is put in a magnetic field by an amount that is proportional to the field, and proportional to Jz. We say that the energy of an atomic system is ‘split’ into 2j + 1 ‘levels’ by a magnetic field. For instance, an atom whose energy is U0 outside a magnetic field and whose j is 3/2, will have four possible energies when placed in a field. We can show these energies by an energy-level diagram like that drawn below. Any particular atom can have only one of the four possible energies in any given field B. That is what quantum mechanics says about the behavior of an atomic system in a magnetic field.”

diagram 1

Of course, the simplest ‘atomic’ system is a single electron, which has spin 1/2 only (like most fermions really: the example in the diagram above, with spin 3/2, would be that Li-7 system or something similar). If the spin is 1/2, then there are only two energy levels, with Jz = ±ħ/2 and, as we mentioned already, the g-factor for an electron is −2 (again, the use of minus signs (or not) is quite confusing: I am sorry for that), and so our formula above becomes very simple:

Umag = ± μB·B

The graph above becomes the graph below, and we can now speak more loosely and say that the electron either has its spin ‘up’ (so that’s along the field), or ‘down’ (so that’s opposite the field).

diagram 2

By now, you’re probably tired of the math and you’ll wonder: how can we prove all of this permitted value business? Well… That question leads me to the last topic of my post: the Stern-Gerlach experiment.

The Stern-Gerlach experiment 

Here again, I can just copy straight out of Feynman, and so I hope you’ll forgive me if I just do that, as I don’t think there’s any advantage to me trying to summarize what he writes on it:

“The fact that the angular momentum is quantized is such a surprising thing that we will talk a little bit about it historically. It was a shock from the moment it was discovered (although it was expected theoretically). It was first observed in an experiment done in 1922 by Stern and Gerlach. If you wish, you can consider the experiment of Stern-Gerlach as a direct justification for a belief in the quantization of angular momentum. Stern and Gerlach devised an experiment for measuring the magnetic moment of individual silver atoms. They produced a beam of silver atoms by evaporating silver in a hot oven and letting some of them come out through a series of small holes. This beam was directed between the pole tips of a special magnet, as shown in the illustration below. Their idea was the following. If the silver atom has a magnetic moment μ, then in a magnetic field B it has an energy −μzB, where z is the direction of the magnetic field. In the classical theory, μz would be equal to the magnetic moment times the cosine of the angle between the moment and the magnetic field, so the extra energy in the field would be

ΔU = −μ·B·cosθ

Of course, as the atoms come out of the oven, their magnetic moments would point in every possible direction, so there would be all values of θ. Now if the magnetic field varies very rapidly with z—if there is a strong field gradient—then the magnetic energy will also vary with position, and there will be a force on the magnetic moments whose direction will depend on whether cosine θ is positive or negative. The atoms will be pulled up or down by a force proportional to the derivative of the magnetic energy; from the principle of virtual work,

Fz = −∂U/∂z = μ·cosθ·∂B/∂z

Stern and Gerlach made their magnet with a very sharp edge on one of the pole tips in order to produce a very rapid variation of the magnetic field. The beam of silver atoms was directed right along this sharp edge, so that the atoms would feel a vertical force in the inhomogeneous field. A silver atom with its magnetic moment directed horizontally would have no force on it and would go straight past the magnet. An atom whose magnetic moment was exactly vertical would have a force pulling it up toward the sharp edge of the magnet. An atom whose magnetic moment was pointed downward would feel a downward push. Thus, as they left the magnet, the atoms would be spread out according to their vertical components of magnetic moment. In the classical theory all angles are possible, so that when the silver atoms are collected by deposition on a glass plate, one should expect a smear of silver along a vertical line. The height of the line would be proportional to the magnitude of the magnetic moment. The abject failure of classical ideas was completely revealed when Stern and Gerlach saw what actually happened. They found on the glass plate two distinct spots. The silver atoms had formed two beams.

That a beam of atoms whose spins would apparently be randomly oriented gets split up into two separate beams is most miraculous. How does the magnetic moment know that it is only allowed to take on certain components in the direction of the magnetic field? Well, that was really the beginning of the discovery of the quantization of angular momentum, and instead of trying to give you a theoretical explanation, we will just say that you are stuck with the result of this experiment just as the physicists of that day had to accept the result when the experiment was done. It is an experimental fact that the energy of an atom in a magnetic field takes on a series of individual values. For each of these values the energy is proportional to the field strength. So in a region where the field varies, the principle of virtual work tells us that the possible magnetic force on the atoms will have a set of separate values; the force is different for each state, so the beam of atoms is split into a small number of separate beams. From a measurement of the deflection of the beams, one can find the strength of the magnetic moment.”
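The contrast between the classical expectation and the actual result can be mimicked with a small simulation. It is only a sketch under stated assumptions: forces are in units of μ·∂B/∂z, the classical moments point in random directions (so cosθ is uniform between −1 and +1), and the quantum atoms are treated as two-state systems whose z-component is either +μ or −μ. The function names and the seed are mine.

```python
import random

def classical_forces(n, rng):
    """Vertical forces (in units of mu·dB/dz) for n randomly oriented classical moments."""
    # For random directions in space, cos(theta) is uniformly spread over [-1, 1].
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def quantum_forces(n, rng):
    """Vertical forces for n two-state atoms: the z-component is +mu or -mu only."""
    return [rng.choice((1.0, -1.0)) for _ in range(n)]

rng = random.Random(0)   # fixed seed so the run is repeatable
classical = classical_forces(1000, rng)
quantum = quantum_forces(1000, rng)

print("distinct classical forces:", len(set(classical)))   # a continuous smear
print("distinct quantum forces:  ", sorted(set(quantum)))  # just two values
```

The classical list fills the whole range between −1 and +1, which would give the vertical smear on the glass plate, while the quantum list contains two values only, which gives the two spots.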

I should note one point which Feynman hardly addresses in the analysis above: why do we need a non-homogeneous field? Well… Think of it. The individual silver atoms are not like electrons in some electric field. They are tiny little magnets, and magnets do not behave like electrons. Remember we said there’s no such thing as a magnetic charge? So that applies here. If the silver atoms are tiny magnets, with a magnetic dipole moment, then the only thing they will do is turn, so as to minimize their energy U = −μBcosθ.

That energy is minimized when μ and B are aligned, so as to make the cosθ factor equal to one, which happens when θ = 0. Hence, in a homogeneous magnetic field, we will have a torque on the loop of current – think of our electron(s) in orbit here – but no net force pulling it in this or that direction as a whole. So the atoms would just rotate but not move in our classical analysis here.

To make the atoms themselves move towards or away from one of the poles (with or without a sharp tip), the magnetic field must be non-homogeneous, so as to ensure that the force that’s pulling on one side of the loop of current is slightly different from the force that’s pulling (in the opposite direction) on the other side of the loop of current. So that’s why the field has to be non-homogeneous (or inhomogeneous as Feynman calls it), and so that’s why one pole needs to have a sharply pointed tip.

As for the force formula, it’s crucial to remember that energy (or work) is force times distance. To be precise, it’s the ∫F∙ds integral. This integral will have a minus sign in front when we’re doing work against the force, so that’s when we’re increasing the potential energy of an object. Conversely, we’ll just take the positive value when we’re converting potential energy into kinetic energy. So that explains the F = −∂U/∂z formula above. In fact, in the analysis above, Feynman assumes the magnetic moment doesn’t turn at all. That’s pretty obvious from the Fz = −∂U/∂z = μ∙cosθ∙∂B/∂z formula (the minus sign of −∂U/∂z cancels against the minus sign in U = −μ∙B∙cosθ), in which μ is clearly being treated as a constant. So the Fz in this formula is a net force in the z-direction, and it’s crucially dependent on the variation of the magnetic field in the z-direction. If the field would not be varying, ∂B/∂z would be zero and, therefore, we would not have any net force in the z-direction. As mentioned above, we would only have a torque.

Well… This sort of covers all of what we wanted to cover today. 🙂 I hope you enjoyed it.


In the previous posts, I showed how the ‘real-world’ properties of photons and electrons emerge out of very simple mathematical notions and shapes. The basic notions are time and space. The shape is the wavefunction.

Let’s recall the story once again. Space is an infinite number of three-dimensional points (x, y, z), and time is a stopwatch hand going round and round: a cyclical thing. All points in space are connected by an infinite number of paths – straight or crooked, whatever – of which we measure the length. And then we have ‘photons’ that move from A to B, but we don’t know what is actually moving in space here. We just associate each and every possible path (in spacetime) between A and B with an amplitude: an ‘arrow‘ whose length and direction depend on (1) the length of the path l (i.e. the ‘distance’ in space measured along the path, be it straight or crooked), and (2) the difference in time between the departure (at point A) and the arrival (at point B) of our photon (i.e. the ‘distance in time’ as measured by that stopwatch hand).

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes all come out alright: the arrows associated with the odd paths and strange timings cancel each other out. Hence, what remains, are the nearby paths in spacetime only – the ‘light-like’ intervals only: a small core of space which our photon effectively uses as it travels through empty space. And when it encounters an obstacle, like a sheet of glass, it may or may not interact with the other elementary particle – the electron. And then we multiply and add the arrows – or amplitudes as we call them – to arrive at a final arrow, whose square is what physicists want to find, i.e. the likelihood of the event that we are analyzing (such as a photon going from point A to B, in empty space, through two slits, or through a sheet of glass, for example) effectively happening.

The combining of arrows leads to diffraction, refraction or – to use the more general description of what’s going on – interference patterns:

  1. Adding two identical arrows that are ‘lined up’ yields a final arrow with twice the length of either arrow alone and, hence, a square (i.e. a probability) that is four times as large. This is referred to as ‘positive’ or ‘constructive’ interference.
  2. Two arrows of the same length but with opposite direction cancel each other out and, hence, yield zero: that’s ‘negative’ or ‘destructive’ interference.

Both photons and electrons are represented by wavefunctions, whose argument is the position in space (x, y, z) and time (t), and whose value is an amplitude or ‘arrow’ indeed, with a specific direction and length. But here we get a bifurcation. When photons interact with each other, their wavefunctions interact just like amplitudes: we simply add them. However, when electrons interact with each other, we have to apply a different rule: we’ll take a difference. Indeed, anything is possible in quantum mechanics and so we combine arrows (or amplitudes, or wavefunctions) in two different ways: we can either add them or, as shown below, subtract one from the other.

vector addition

There are actually four distinct logical possibilities, because we may also change the order of A and B in the operation, but when calculating probabilities, all we need is the square of the final arrow, so we’re interested in its final length only, not in its direction (unless we want to use that arrow in yet another calculation). And so… Well… The fundamental duality in Nature between light and matter is based on this dichotomy only: identical (elementary) particles behave in one of two ways: their wavefunctions interfere either constructively or destructively, and that’s what distinguishes bosons (i.e. force-carrying particles, such as photons) from fermions (i.e. matter-particles, such as electrons). The mathematical description is complete and respects Occam’s Razor. There is no redundancy. One cannot further simplify: every logical possibility in the mathematical description reflects a physical possibility in the real world.

Having said that, there is more to an electron than just Fermi-Dirac statistics, of course. What about its charge, and this weird number, its spin?

Well… That’s what this post is about. As Feynman puts it: “So far we have been considering only spin-zero electrons and photons, fake electrons and fake photons.”

I wouldn’t call them ‘fake’, because they do behave like real photons and electrons already but… Yes. We can make them more ‘real’ by including charge and spin in the discussion. Let’s go for it.

Charge and spin

From what I wrote above, it’s clear that the dichotomy between bosons and fermions (i.e. between ‘matter-particles’ and ‘force-carriers’ or, to put it simply, between light and matter) is not based on the (electric) charge. It’s true we cannot pile atoms or molecules on top of each other because of the repulsive forces between the electron clouds—but it’s not impossible, as nuclear fusion proves: nuclear fusion is possible because the electrostatic repulsive force can be overcome, and then the nuclear force is much stronger (and, remember, no quarks are being destroyed or created: all nuclear energy that’s being released or used is nuclear binding energy).

It’s also true that the force-carriers we know best, notably photons and gluons, do not carry any (electric) charge, as shown in the table below. So that’s another reason why we might, mistakenly, think that charge somehow defines matter-particles. However, we can see that matter-particles carry very different charges (positive or negative, and with very different values: 1/3, 2/3 or 1), and can even be neutral, like the neutrinos. So, if there’s a relation, it’s very complex. In addition, one of the two force-carriers for the weak force, the W boson, can have positive or negative charge too, so that doesn’t make sense, does it? [I admit the weak force is a bit of a ‘special’ case, and so I should leave it out of the analysis.] The point is: the electric charge is what it is, but it’s not what defines matter. It’s just one of the possible charges that matter-particles can carry. [The other charge, as you know, is the color charge but, to confuse the picture once again, that’s a charge that can also be carried by gluons, i.e. the carriers of the strong force.]

Standard Model of Elementary Particles

So what is it, then? Well… From the table above, you can see that the property of ‘spin’ (i.e. the third number in the top left-hand corner) matches the above-mentioned dichotomy in behavior, i.e. the two different types of interference (bosons versus fermions or, to use a heavier term, Bose-Einstein statistics versus Fermi-Dirac statistics): all matter-particles are so-called spin-1/2 particles, while all force-carriers (gauge bosons) have spin one. [Never mind the Higgs particle: that’s ‘just’ a mechanism to give (most) elementary particles some mass.]

So why is that? Why are matter-particles spin-1/2 particles and force-carriers spin-1 particles? To answer that question, we need to answer the question: what’s this spin number? And to answer that question, we first need to answer the question: what’s spin?

Spin in the classical world

In the classical world, it’s, quite simply, the momentum associated with a spinning or rotating object, which is referred to as the angular momentum. We’ve analyzed the math involved in another post, and so I won’t dwell on that here, but you should note that, in classical mechanics, we distinguish two types of angular momentum:

  1. Orbital angular momentum: that’s the angular momentum an object gets from circling in an orbit, like the Earth around the Sun.
  2. Spin angular momentum: that’s the angular momentum an object gets from spinning around its own axis, just like the Earth which, in addition to rotating around the Sun, is rotating around its own axis (which is what causes day and night, as you know).

The math involved in both is pretty similar, but it’s still useful to distinguish the two, if only because we’ll distinguish them in quantum mechanics too! Indeed, when I analyzed the math in the above-mentioned post, I showed how we represent angular momentum by a vector that’s perpendicular to the direction of rotation, with its direction given by the ubiquitous right-hand rule—as in the illustration below, which shows both the angular momentum (L) as well as the torque (τ) that’s produced by a rotating mass. The formulas are given too: the angular momentum L is the vector cross product of the position vector r and the linear momentum p, while the magnitude of the torque τ is given by the product of the length of the lever arm and the applied force. An alternative approach is to define the angular velocity ω and the moment of inertia I, and we get the same result: L = Iω. 
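To make the two routes to L concrete, here is a minimal numerical sketch. The mass, radius and angular velocity are made-up values for a point mass on a circular orbit in the xy-plane; they are not taken from the illustration.

```python
import numpy as np

# Hypothetical numbers: a 2 kg point mass on a circle of radius 3 m in the
# xy-plane, moving counterclockwise at an angular velocity of 4 rad/s.
m, radius, omega = 2.0, 3.0, 4.0

r = np.array([radius, 0.0, 0.0])          # position vector (m)
v = np.array([0.0, omega * radius, 0.0])  # velocity, tangent to the circle (m/s)
p = m * v                                 # linear momentum (kg·m/s)

L_cross = np.cross(r, p)                  # first route: L = r × p

I = m * radius**2                         # moment of inertia of a point mass
L_inertia = I * np.array([0.0, 0.0, omega])  # second route: L = Iω, ω along z

print(L_cross)    # points along +z, as the right-hand rule says
print(L_inertia)  # same vector
```

Both routes give the same vector, perpendicular to the plane of rotation.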


Of course, the illustration above shows orbital angular momentum only and, as you know, we no longer have a ‘planetary model’ (aka the Rutherford model) of an atom. So should we be looking at spin angular momentum only?

Well… Yes and no. More yes than no, actually. But it’s ambiguous. In addition, the analogy between the concept of spin in quantum mechanics, and the concept of spin in classical mechanics, is somewhat less than straightforward. Well… It’s not straightforward at all actually. But let’s get on with it and use more precise language. Let’s first explore it for light, not because it’s easier (it isn’t) but… Well… Just because. 🙂

The spin of a photon

I talked about the polarization of light in previous posts (see, for example, my post on vector analysis): when we analyze light as a traveling electromagnetic wave (so we’re still in the classical analysis here, not talking about photons as ‘light particles’), we know that the electric field vector oscillates up and down and is, in fact, likely to rotate in the xy-plane (with z being the direction of propagation). The illustration below shows the idealized (aka limiting) case of perfectly circular polarization: if there’s polarization, it is more likely to be elliptical. The other limiting case is plane polarization: in that case, the electric field vector just goes up and down in one direction only. [In case you wonder whether ‘real’ light is polarized, it often is: there’s an easy article on that on the Physics Classroom site.]

spin angular momentum

The illustration above uses Dirac’s bra-ket notation |L〉 and |R〉 to distinguish the two possible ‘states’, which are left- or right-handed polarization respectively. In case you forgot about bra-ket notations, let me quickly remind you: an amplitude is usually denoted by 〈x|s〉, in which 〈x| is the so-called ‘bra’, i.e. the final condition, and |s〉 is the so-called ‘ket’, i.e. the starting condition, so 〈x|s〉 could mean: a photon leaves at s (from source) and arrives at x. It doesn’t matter much here. We could have used any notation, as we’re just describing some state, which is either |L〉 (left-handed polarization) or |R〉 (right-handed polarization). The more intriguing extras in the illustration above, besides the formulas, are the values: ±ħ = ±h/2π. So that’s plus or minus the (reduced) Planck constant which, as you know, is a very tiny constant. I’ll come back to that. So what exactly is being represented here?

At first, you’ll agree it looks very much like the momentum of light (p) which, in a previous post, we calculated from the (average) energy (E) as p = E/c. Now, we know that E is related to the (angular) frequency of the light through the Planck-Einstein relation E = hν = ħω. Now, ω is the speed of light (c) times the wave number (k), so we can write: p = E/c = ħω/c = ħck/c = ħk. The wave number is the ‘spatial frequency’, expressed either in cycles per unit distance (1/λ) or, more usually, in radians per unit distance (k = 2π/λ), so we can also write p = ħk = h/λ. Whatever way we write it, we find that this momentum (p) depends on the energy and/or, what amounts to saying the same, the frequency and/or the wavelength of the light.
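A quick check that the three ways of writing the momentum agree. The 600 nm wavelength is just an example value, chosen because sodium light is used further down.

```python
import math

h = 6.62607015e-34      # Planck constant (J·s)
c = 299_792_458.0       # speed of light (m/s)
hbar = h / (2 * math.pi)

wavelength = 600e-9     # assumed example: 600 nm light

k = 2 * math.pi / wavelength   # wave number (rad/m)
omega = c * k                  # angular frequency (rad/s)
E = hbar * omega               # Planck-Einstein relation E = ħω

p_from_energy = E / c          # p = E/c
p_from_k = hbar * k            # p = ħk
p_from_lambda = h / wavelength # p = h/λ

print(p_from_energy, p_from_k, p_from_lambda)  # all in kg·m/s
```

The three numbers coincide (up to floating-point rounding), and they change as soon as the wavelength changes.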

So… Well… The momentum of light is not just h or ħ, i.e. what’s written in that illustration above. So it must be something different. In addition, I should remind you this momentum was calculated from the magnetic field vector, as shown below (for more details, see my post on vector calculus once again), so it had nothing to do with polarization really.

radiation pressure

Finally, last but not least, the dimensions of ħ and p = h/λ are also different (when one is confused, it’s always good to do a dimensional analysis in physics):

  1. The dimension of Planck’s constant (both h as well as ħ = h/2π) is energy multiplied by time (J·s or eV·s) or, equivalently, momentum multiplied by distance. It’s referred to as the dimension of action in physics, and h is, effectively, the so-called quantum of action.
  2. The dimension of (linear) momentum is… Well… Let me think… Mass times velocity (mv)… But what’s the mass in this case? Light doesn’t have any mass. However, we can use the mass-energy equivalence: 1 eV/c² = 1.7826×10⁻³⁶ kg. [10⁻³⁶? Well… Yes. An electronvolt is a very tiny measure of energy.] So we can express p in eV·m/s units.

Hmm… We can check: momentum times distance gives us the dimension of Planck’s constant again – (eV·m/s)·m = eV·s. OK. That’s good… […] But… Well… All of this nonsense doesn’t make us much smarter, does it? 🙂 Well… It may or may not be more useful to note that the dimension of action is, effectively, the same as the dimension of angular momentum. Huh? Why? Well… From our classical L = r×p formula, we find L should be expressed in m·(eV·m/s) = eV·m²/s units, so that’s… What? Well… Here we need to use a little trick and re-express energy in mass units. We can then write L in kg·m²/s units and, because 1 Newton (N) is 1 kg·m/s², the kg·m²/s unit is equivalent to the N·m·s = J·s unit. Done!

Having said that, all of this still doesn’t answer the question: are the linear momentum of light, i.e. our p, and those two angular momentum ‘states’, |L〉 and |R〉, related? Can we relate |L〉 and |R〉 to that L = r×p formula?

The answer is simple: no. The |L〉 and |R〉 states represent spin angular momentum indeed, while the angular momentum we would derive from the linear momentum of light using that L = r×p is orbital angular momentum. Let’s introduce the proper symbols: orbital angular momentum is denoted by L, while spin angular momentum is denoted by S. And then the total angular momentum is, quite simply, J = L + S.

L and S can both be calculated using either a vector cross product r × p (but using different values for r and p, of course) or, alternatively, using the moment of inertia tensor I and the angular velocity ω. The illustrations below (which I took from Wikipedia) show how, and also show how L and S are added to yield J = L + S.



So what? Well… Nothing much. The illustration above shows that the analysis – which is entirely classical, so far – is pretty complicated. [You should note, for example, that in the S = Iω and L = Iω formulas, we don’t use the simple (scalar) moment of inertia but the moment of inertia tensor (so that’s a matrix denoted by I, instead of the scalar I), because S (or L) and ω are not necessarily pointing in the same direction.]

By now, you’re probably very confused and wondering what’s wiggling really. The answer for the orbital angular momentum is: it’s the linear momentum vector p. Now…

Hey! Stop! Why would that vector wiggle?

You’re right. Perhaps it doesn’t. The linear momentum p is supposed to be directed in the direction of travel of the wave, isn’t it? It is. In vector notation, we have p = ħk, and that k vector (i.e. the wavevector) points in the direction of travel of the wave indeed and so… Well… No. It’s not that simple. The wave vector is perpendicular to the surfaces of constant phase, i.e. the so-called wave fronts, as shown in the illustration below (see the direction of ek, which is a unit vector in the direction of k).

wave vector

So, yes, if we’re analyzing light moving in a straight one-dimensional line only, or we’re talking about a plane wave, as illustrated below, then the orbital angular momentum vanishes.

plane wave

But the orbital angular momentum L does not vanish when we’re looking at a real light beam, like the ones below. Real waves? Well… OK… The ones below are idealized wave shapes as well, but let’s say they are somewhat more real than a plane wave. 🙂


So what do we have here? We have wavefronts that are shaped as helices, except for the one in the middle (marked by m = 0) which is, once again, an example of a plane wave, so for that one (m = 0), we have zero orbital angular momentum indeed. But look, very carefully, at the m = ± 1 and m = ± 2 situations. For m = ± 1, we have one helical surface with a step length equal to the wavelength λ. For m = ± 2, we have two intertwined helical surfaces with the step length of each helix surface equal to 2λ. [Don’t worry too much about the second and third column: they show a beam cross-section (so that’s not a wave front but a so-called phase front) and the (averaged) light intensity, again of a beam cross-section.] Now, we can further generalize and analyze waves composed of |m| helices with the step length of each helix surface equal to |m|λ. The Wikipedia article on OAM (orbital angular momentum of light), from which I got this illustration, gives the following formula to calculate the OAM:

Formula OAM

The same article also notes that the quantum-mechanical equivalent of this formula, i.e. the orbital angular momentum of the photons one would associate with the not-cylindrically-symmetric waves above (i.e. all those for which m ≠ 0), is equal to:

Lz = mħ
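One way to see where the integer m comes from is to assume the helical wavefront carries an azimuthal phase factor exp(i·m·φ). That phase factor is my assumption here (it is the usual way such beams are written), not something read off the illustration. Applying the operator −i·∂/∂φ, which gives Lz in units of ħ, then returns m. The function name below is hypothetical and the derivative is a plain finite difference:

```python
import numpy as np

def oam_per_photon(m, n=2000):
    """Estimate Lz in units of ħ for the helical phase factor exp(i·m·φ)."""
    phi = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    psi = np.exp(1j * m * phi)
    dphi = phi[1] - phi[0]
    # periodic central difference approximating d(psi)/d(phi)
    dpsi = (np.roll(psi, -1) - np.roll(psi, 1)) / (2 * dphi)
    # expectation value of -i·d/dφ, normalised by the norm of psi
    value = np.sum(np.conj(psi) * (-1j) * dpsi) / np.sum(np.abs(psi) ** 2)
    return float(np.real(value))

for m in (-2, -1, 0, 1, 2):
    print(m, round(oam_per_photon(m), 3))
```

Note that neither the frequency nor the wavelength enters the calculation: only the number of phase windings around the beam axis does.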

So what? Well… I guess we should just accept that as a very interesting result. For example, I duly note that Lz is along the direction of propagation of the wave (as indicated by the z subscript), and I also note the very interesting fact that, apparently, Lz can be either positive or negative. Now, I am not quite sure how such a result is consistent with the idea of radiation pressure, but I am sure there must be some logical explanation for that. The other point you should note is that, once again, any reference to the energy (or to the frequency or wavelength) of our photon has disappeared. Hmm… I’ll come back to this, as I promised above already.

The thing is that this rather long digression on orbital angular momentum doesn’t help us much in trying to understand what that spin angular momentum (SAM) is all about. So, let me just copy the final conclusion of the Wikipedia article on the orbital angular momentum of light: the OAM is the component of angular momentum of light that is dependent on the field spatial distribution, not on the polarization of light.

So, again, what’s the spin angular momentum? Well… The only guidance we have is that same little drawing again and, perhaps, another illustration that’s supposed to compare SAM with OAM (underneath).

spin angular momentum

SAM-OAM interaction

Now, the Wikipedia article on SAM (spin angular momentum), from which I took the illustrations above, gives a similar-looking formula for it:

Formula SAM

When I say ‘similar-looking’, I don’t mean it’s the same. [Of course not! Spin and orbital angular momentum are two different things!] So what’s different in the two formulas? Well… We don’t have any del operator (∇) in the SAM formula, and we also don’t have any position vector (r) in the integral kernel (or integrand, if you prefer that term). However, we do find both the electric field vector (E) as well as the (magnetic) vector potential (A) in the equation again. Hence, the SAM (also) takes both the electric as well as the magnetic field into account, just like the OAM. [According to the author of the article, the expression also shows that the SAM is nonzero when the light polarization is elliptical or circular, and that it vanishes if the light polarization is linear, but I think that’s much more obvious from the illustration than from the formula… However, I realize I really need to move on here, because this post is, once again, becoming way too long. So…]
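For reference, here are the two expressions written out side by side. This is my reconstruction from memory of the two Wikipedia articles, so treat the exact notation as an assumption; it does match the description above (r and ∇ appear only in the OAM integrand, E and A appear in both).

```latex
\[
\mathbf{L} = \epsilon_0 \sum_{i=x,y,z} \int E^{i}\,(\mathbf{r}\times\nabla)\,A^{i}\; d^3\mathbf{r}
\qquad\qquad
\mathbf{S} = \epsilon_0 \int \mathbf{E}\times\mathbf{A}\; d^3\mathbf{r}
\]
```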

OK. What’s the equivalent of that formula in quantum mechanics?

Well… In quantum mechanics, the SAM becomes a ‘quantum observable’, described by a corresponding operator which has only two eigenvalues:

Sz = ± ħ

So that corresponds to the two possible values for Jz, as mentioned in the illustration, and we can understand, intuitively, that these two values correspond to two ‘idealized’ photons which describe a left- and right-handed circularly polarized wave respectively.

So… Well… There we are. That’s basically all there is to say about it. So… OK. So far, so good.

But… Yes? Why do we call a photon a spin-one particle?

That has to do with convention. A so-called spin-zero particle has no degrees of freedom in regard to polarization. The implied ‘geometry’ is that a spin-zero particle is completely symmetric: no matter in what direction you turn it, it will always look the same. In short, it really behaves like a (zero-dimensional) mathematical point. As you can see from the overview of all elementary particles, it is only the Higgs boson which has spin zero. That’s why the Higgs field is referred to as a scalar field: it has no direction. In contrast, spin-one particles, like photons, are also ‘point particles’, but they do come with one or the other kind of polarization, as evident from all that I wrote above. To be specific, they are polarized in the xy-plane, and can have one of two directions. So, when rotating them, you need a full rotation of 360° if you want them to look the same again.

Now that I am here, let me exhaust the topic (to a limited extent only, of course, as I don’t want to write a book here) and mention that, in theory, we could also imagine spin-2 particles, which would look the same after half a rotation (180°). However, as you can see from the overview, none of the elementary particles has spin-2. A spin-2 particle could be like some straight stick, as that looks the same even after it is rotated 180 degrees. I am mentioning the theoretical possibility because the graviton, if it would effectively exist, is expected to be a massless spin-2 boson. [Now why do I mention this? Not sure. I guess I am just noting this to remind you of the fact that the Higgs boson is definitely not the (theoretical) graviton, and/or that we have no quantum theory for gravity.]

Oh… That’s great, you’ll say. But what about all those spin-1/2 particles in the table? You said that all matter-particles are spin 1/2 particles, and that it’s this particular property that actually makes them matter-particles. So what’s the geometry here? What kind of ‘symmetries’ do they respect?

Well… As strange as it sounds, a spin-1/2 particle needs two full rotations (2×360°=720°) until it is again in the same state. Now, in regard to that particularity, you’ll often read something like: “There is nothing in our macroscopic world which has a symmetry like that.” Or, worse, “Common sense tells us that something like that cannot exist, that it simply is impossible.” [I won’t quote the site from which I took these quotes, because it is, in fact, the site of a very respectable research center!] Bollocks! The Wikipedia article on spin has this wonderful animation: look at how the spirals flip between clockwise and counterclockwise orientations, and note that it’s only after spinning a full 720 degrees that this ‘point’ returns to its original configuration.
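The rotation angles quoted for the various spins can be checked with one line of complex arithmetic. The sketch below uses the standard result that a state whose angular momentum component along z is s·ħ picks up a phase factor exp(−i·s·θ) when it is rotated by an angle θ about the z-axis; the function name is just an illustrative choice.

```python
import cmath
import math

def rotation_phase(spin, angle_deg):
    """Phase factor picked up by the state with Jz = spin·ħ when it is
    rotated by angle_deg about the z-axis: exp(-i·spin·angle)."""
    return cmath.exp(-1j * spin * math.radians(angle_deg))

for spin, angle in [(0.5, 360), (0.5, 720), (1, 360), (2, 180)]:
    phase = rotation_phase(spin, angle)
    print(f"spin {spin}, {angle}°: phase ≈ {phase.real:+.0f}")
```

A phase of +1 means the state looks the same again. The spin-1/2 state comes back with a minus sign after 360° and needs the full 720° to return to +1, while spin-1 is back after 360° and spin-2 after 180°.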


So, yes, we can actually imagine spin-1/2 particles, and with not all that much imagination, I’d say. But… OK… This is all great fun, but we have to move on. So what’s the ‘spin’ of these spin-1/2 particles and, more in particular, what’s the concept of ‘spin’ of an electron?

The spin of an electron

When starting to read about it, I thought that the angular momentum of an electron would be easier to analyze than that of a photon. Indeed, while a photon has no mass and no electric charge, that analysis with those E and B vectors is damn complicated, even when sticking to a strictly classical analysis. For an electron, the classical picture seems to be much more straightforward—but only at first indeed. It quickly becomes equally weird, if not more.

We can look at an electron in orbit as a rotating electrically charged ‘cloud’ indeed. Now, from Maxwell’s equations (or from your high school classes even), you know that a rotating electrically charged body creates a magnetic dipole. So an electron should behave just like a tiny bar magnet. Of course, we have to make certain assumptions about the distribution of the charge in space but, in general, we can write that the magnetic dipole moment μ is equal to:

formula magnetic dipole moment

In case you want to know, in detail, where this formula comes from, let me refer you to Feynman once again, but trust me – for once 🙂 – it’s quite straightforward indeed: the L in this formula is the angular momentum, which may be the spin angular momentum, the orbital angular momentum, or the total angular momentum. The e and m are, of course, the charge and mass of the electron respectively.

So that’s a good and nice-looking formula, and it’s actually even correct except for the spin angular momentum as measured in experiments. [You’ll wonder how we can measure orbital and spin angular momentum respectively, but I’ll talk about a 1921 experiment in a few minutes, and so that will give you some clue as to that mystery. :-)] To be precise, it turns out that one has to multiply the above formula for μ by a factor which is referred to as the g-factor. [And, no, it’s got nothing to do with the gravitational constant or… Well… Nothing. :-)] So, for the spin angular momentum, the formula should be:

formula spin angular momentum
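In their standard form, the two relations read μ = −(e/2m)·L and, with the g-factor, μ = −g·(e/2m)·S. Here is a small numerical sketch of what they give. The inputs are my choices for illustration: one unit ħ of orbital angular momentum, a spin component of ħ/2 (the value that comes out of the Stern-Gerlach discussion further down), and g rounded to 2.0023.

```python
e = 1.602176634e-19      # elementary charge (C)
m_e = 9.1093837015e-31   # electron mass (kg)
hbar = 1.054571817e-34   # reduced Planck constant (J·s)
g = 2.0023               # approximate spin g-factor (rounded)

def magnetic_moment(angular_momentum, g_factor=1.0):
    """z-component of μ = -g·(e/2m)·(angular momentum), in J/T."""
    return -g_factor * e / (2 * m_e) * angular_momentum

mu_orbital = magnetic_moment(hbar)      # one unit ħ of orbital angular momentum
mu_spin = magnetic_moment(hbar / 2, g)  # spin component ħ/2, with the g-factor

print(mu_orbital)  # minus one Bohr magneton, about -9.27e-24 J/T
print(mu_spin)     # very nearly the same, because g/2 ≈ 1
```

So half the angular momentum yields, thanks to g ≈ 2, almost exactly the same magnetic moment.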

Experimental physicists are constantly checking that value and they now measure it to be something like g = 2.00231930419922 ± 1.5×10⁻¹². So what’s the explanation for that g? Where does it come from? There is, in fact, a classical explanation for it, which I’ll copy hereunder (yes, from Wikipedia). This classical explanation is based on assuming that the distribution of the electron’s electric charge and the distribution of its mass do not coincide:

classical theory

Why do I mention this classical explanation? Well… Because, in most popular books on quantum mechanics (including Feynman’s delightful QED), you’ll read that (a) the value for g can be derived from a quantum-theoretical equation known as Dirac’s equation (or ‘Dirac theory’, as it’s referred to above) and, more importantly, that (b) physicists call the “accurate prediction of the electron g-factor” from quantum theory (i.e. ‘Dirac’s theory’ in this case) “one of the greatest triumphs” of the theory.

So what about it? Well… Whatever the merits of both explanations (classical or quantum-mechanical), they are surely not the reason why physicists abandoned the classical theory. So what was the reason then? What a stupid question! You know that already! The Rutherford model was, quite simply, not consistent: according to classical theory, electrons should just radiate their energy away and spiral into the nucleus. More in particular, there was yet another experiment that wasn’t consistent with classical theory, and it’s one that’s very relevant for the discussion at hand: it’s the so-called Stern-Gerlach experiment.

It was just as ‘revolutionary’ as the Michelson-Morley experiment (which failed to detect any motion of the Earth relative to the hypothetical ether), or the discovery of the positron in 1932. The Stern-Gerlach experiment was conceived in 1921 and first carried out in early 1922, so that’s some years before quantum theory replaced classical theory and, hence, it’s not one of those experiments confirming quantum theory. No. Quite the contrary. It was, in fact, one of the experiments that triggered the so-called quantum revolution. Let me insert the experimental set-up and summarize it (below).


  • The German scientists Otto Stern and Walther Gerlach produced a beam of electrically-neutral silver atoms and let it pass through a (non-uniform) magnetic field. Why silver atoms? Well… Silver atoms are easy to handle (in a lab, that is) and easy to detect with a photoplate.
  • These atoms came out of an oven (literally), in which the silver was being evaporated (yes, one can evaporate silver), so they had no special orientation in space and, so Stern and Gerlach thought, the magnetic moment (or spin) of the outer electrons in these atoms would point into all possible directions in space.
  • As expected, the magnetic field did deflect the silver atoms, just like it would deflect little dipole magnets if you would shoot them through the field. However, the pattern of deflection was not the one which they expected. Instead of hitting the plate all over the place, within some contour, of course, only the contour itself was hit by the atoms. There was nothing in the middle!
  • And… Well… It’s a long story, but I’ll make it short. There was only one possible explanation for that behavior, and that’s that the magnetic moments – and, therefore the spins – had only two orientations in space, and two possible values only which – Surprise, surprise! – are ±ħ/2 (so that’s half the value of the spin angular momentum of photons, which explains the spin-1/2 terminology).
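The contrast between what Stern and Gerlach expected and what they saw can be mimicked with a toy simulation. This is only a sketch under simple assumptions: the deflection is taken to be proportional to the component of the magnetic moment along the field, it is expressed in units of the maximum deflection, and the function names are made up for the illustration.

```python
import random

def classical_deflections(n, rng):
    """Randomly oriented classical dipoles: the deflection is proportional
    to cos(θ), which is uniform on [-1, 1] for isotropic orientations."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def quantum_deflections(n, rng):
    """Spin-1/2: only two outcomes, 'up' (+1) or 'down' (-1)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

rng = random.Random(0)   # fixed seed, purely for reproducibility
classical = classical_deflections(10_000, rng)
quantum = quantum_deflections(10_000, rng)

near_zero = sum(abs(d) < 0.2 for d in classical)
print("classical hits near the centre:", near_zero)
print("quantum hits near the centre:  ", sum(abs(d) < 0.2 for d in quantum))
print("distinct quantum outcomes:     ", sorted(set(quantum)))
```

The classical model fills the middle of the plate; the two-valued model leaves it empty, which is what the photoplate showed.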

The spin angular momentum of an electron is more popularly known as ‘up’ or ‘down’.

So… What about it? Well… It explains why an atomic orbital can have two electrons, rather than one only and, as such, the behavior of the electron here is the basis of the so-called periodic table, which explains all properties of the chemical elements. So… Yes. Quantum theory is relevant, I’d say. 🙂


This has been a terribly long post, and you may no longer remember what I promised to do. What I promised to do, is to write some more about the difference between a photon and an electron and, more in particular, I said I’d write more about their charge, and that “weird number”: their spin. I think I lived up to that promise. The summary is simple:

  1. Photons have no (electric) charge, but they do have spin. Their spin is linked to their polarization in the xy-plane (if z is the direction of propagation) and, because of the strangeness of quantum mechanics (i.e. the quantization of (quantum) observables), the value for this spin is either +ħ or −ħ, which explains why they are referred to as spin-one particles (because either value is one unit of the reduced Planck constant).
  2. Electrons have both electric charge as well as spin. Their spin is different and is, in fact, related to their electric charge. It can be interpreted in terms of the magnetic dipole moment, which results from the fact we have a spinning charge here. However, again, because of the strangeness of quantum mechanics, their spin (and, with it, their dipole moment) is quantized: its component along a given direction can take only one of two values, ±ħ/2, which is why they are referred to as spin-1/2 particles.

So now you know everything you need to know about photons and electrons, and then I mean real photons and electrons, including their properties of charge and spin. So they’re no longer ‘fake’ spin-zero photons and electrons now. Isn’t that great? You’ve just discovered the real world! 🙂

So… I am done—for the moment, that is… 🙂 If anything, I hope this post shows that even those ‘weird’ quantum numbers are rooted in ‘physical reality’ (or in physical ‘geometry’ at least), and that quantum theory may be ‘crazy’ indeed, but that it ‘explains’ experimental results. Again, as Feynman says:

“We understand how Nature works, but not why Nature works that way. Nobody understands that. I can’t explain why Nature behaves in this particular way. You may not like quantum theory and, hence, you may not accept it. But physicists have learned to realize that whether they like a theory or not is not the essential question. Rather, it is whether or not the theory gives predictions that agree with experiment. The theory of quantum electrodynamics describes Nature as absurd from the point of view of common sense. But it agrees fully with experiment. So I hope you can accept Nature as She is—absurd.”

Frankly speaking, I am not quite prepared to accept Nature as absurd: I hope that some more familiarization with the underlying mathematical forms and shapes will make it look somewhat less absurd. More, I hope that such familiarization will, in the end, make everything look just as ‘logical’, or ‘natural’ as the two ways in which amplitudes can ‘interfere’.

Post scriptum: I said I would come back to the fact that, in the analysis of orbital and spin angular momentum of a photon (OAM and SAM), the frequency or energy variable sort of ‘disappears’. So why’s that? Let’s look at those expressions for |L〉 and |R〉 once again:

Formula L spin

Formula R spin

What’s written here really? If |L〉 and |R〉 are supposed to be equal to either +ħ or −ħ, then that product of e^(i(kz–ωt)) with the 3×1 matrix (which is a ‘column vector’ in this case) does not seem to make much sense, does it? Indeed, you’ll remember that e^(i(kz–ωt)) is just a regular wave function. To be precise, its phase is φ = kz–ωt (with z the direction of propagation of the wave), and, in terms of its real and imaginary part, it can be written as e^(iφ) = cos(φ) + i·sin(φ) = a + bi. Multiplying it with that 3×1 column vector (1, i, 0) or (1, –i, 0) just yields another 3×1 column vector. To be specific, we get:

  1. The 3×1 ‘vector’ (a + bi, –b+ai, 0) for |L〉, and
  2. The 3×1 ‘vector’ (a + bi, b–ai, 0) for |R〉.

So we have two new ‘vectors’ whose components are complex numbers. Furthermore, we can note that their ‘x’-component is the same, their ‘y’-component is each other’s opposite –b+ai = –(b–ai), and their ‘z’-component is 0.
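The multiplication is easy to verify numerically. In this sketch the phase φ = 0.7 is an arbitrary value picked for illustration, and the 1/√2 factor in front is left out, as it is in the two component lists above.

```python
import cmath

phi = 0.7                       # arbitrary phase kz − ωt, chosen for illustration
wave = cmath.exp(1j * phi)      # e^(iφ) = a + bi
a, b = wave.real, wave.imag

L_vec = [wave * c for c in (1, 1j, 0)]    # (1, i, 0)·e^(iφ)
R_vec = [wave * c for c in (1, -1j, 0)]   # (1, −i, 0)·e^(iφ)

print(L_vec)                    # (a + bi, −b + ai, 0)
print(R_vec)                    # (a + bi, b − ai, 0)
print(L_vec[1] + R_vec[1])      # the y-components cancel: 0
```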

So… Well… In regard to their ‘y’-component, I should note that’s just the result of the multiplication with i and/or –i: multiplying a complex number with i amounts to a 90° counterclockwise rotation, while multiplication with –i amounts to the same but clockwise. Hence, we must arrive at two complex numbers that are each other’s opposite. [Indeed, in complex analysis, the value –1 = e^(iπ) = e^(–iπ) is a 180° rotation, either clockwise (φ = –π) or counterclockwise (φ = +π), of course!]

Hmm… Still… What does it all mean really? The truth is that it takes some more advanced math to interpret the result. To be precise, pure quantum states, such as |L〉 and |R〉 here, are represented by so-called ‘state vectors’ in a Hilbert space over complex numbers. So that’s what we’ve got here. So… Well… I can’t say much more about this right now: we’ll just need to study some more before we’ll ‘understand’ those expressions for |L〉 and |R〉. So let’s not worry about it right now. We’ll get there.

Just for the record, I should note that, initially, I thought the 1/√2 factor in front gave some clue as to what’s going on here: 1/√2 ≈ 0.707 is a factor that’s used to calculate the root mean square (RMS) value for a sine wave. It’s illustrated below. The RMS value is a ‘special average’ one can use to calculate the energy or power (i.e. energy per time unit) of a wave. [Using the term ‘average’ is misleading, because the plain average of a sine wave (with unit amplitude) is 2/π ≈ 0.64 over half a cycle, and 0 over a full cycle, as you can easily see from the shape of the function. But I guess you know what I mean.]

V-rms

Indeed, you’ll remember that the energy (E) of a wave is proportional to the square of its amplitude (A): E ∼ A². For example, when we have a constant current I, the power P will be proportional to its square: P ∼ I². With a varying current (I) and voltage (V), the formula is more complicated but we can simplify it using the RMS values: P_avg = V_RMS·I_RMS.
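The 1/√2 factor and the power formula can both be checked by sampling one full cycle. The peak values are made-up numbers, and the sketch assumes voltage and current are in phase (a purely resistive load), which is the case in which the simple product of RMS values gives the average power.

```python
import math

n = 100_000
V0, I0 = 10.0, 2.0   # hypothetical peak voltage (V) and peak current (A)

phases = [2 * math.pi * i / n for i in range(n)]   # one full cycle
v = [V0 * math.sin(p) for p in phases]
i_t = [I0 * math.sin(p) for p in phases]           # in phase with the voltage

mean_v = sum(v) / n                                        # plain average over a cycle
v_rms = math.sqrt(sum(x * x for x in v) / n)               # RMS voltage
i_rms = math.sqrt(sum(x * x for x in i_t) / n)             # RMS current
p_avg = sum(x * y for x, y in zip(v, i_t)) / n             # average of V(t)·I(t)

print(mean_v)            # essentially zero
print(v_rms / V0)        # ≈ 0.707 = 1/√2
print(p_avg, v_rms * i_rms)
```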

So… Looking at that formula, should we think of h and/or ħ as some kind of ‘average’ energy, like the energy of a photon per cycle or per radian? That’s an interesting idea so let’s explore it. If the energy of a photon is equal to E = hν = hω/2π = ħω, then we can also write:

h = E/ν and/or ħ = E/ω

So, yes: h is the energy of a photon per cycle, obviously, and, because the phase covers 2π radians during each cycle, ħ must be the energy of the photon per radian! That’s a great result, isn’t it? It also gives a wonderfully simple interpretation to Planck’s quantum of action!

Well… No. We made at least two mistakes here. The first mistake is that if we think of a photon as a wave train being radiated by an atom – which, as we calculated in another post, lasts about 3.2×10⁻⁸ seconds – the graph for its energy is going to resemble the graph of its amplitude, so it’s going to die out and each oscillation will carry less and less energy. Indeed, the decay time given here (τ = 3.2×10⁻⁸ seconds) was the time it takes for the radiation (we assumed sodium light with a wavelength of 600 nanometer) to die out by a factor 1/e. To be precise, the shape of the energy curve is E = E₀·e^(−t/τ), and so it’s an envelope resembling the A(t) curve below.

decay time

Indeed, remember, the energy of a wave is determined not only by its frequency (or wavelength) but also by its amplitude, and so we cannot assume the amplitude of a ‘photon wave’ is going to be the same everywhere. Just for the record: note that the energy of a wave is proportional to the frequency (doubling the frequency doubles the energy) but, when linking it to the amplitude, we should remember that the energy is proportional to the square of the amplitude, so we write E ∼ A2.

The second mistake is that both ν and ω are frequencies of the light (expressed in cycles or radians respectively) per second, i.e. per time unit. So that’s not the number of cycles or radians that we should associate with the wavetrain! We should use the number of cycles (or radians) packed into one photon. We can calculate that easily from the value for the decay time τ. Indeed, for sodium light, which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), we said the radiation lasts about 3.2×10⁻⁸ seconds (that’s actually the time it takes for the radiation’s energy to die out by a factor 1/e, so the wavetrain will actually last (much) longer, but the amplitude becomes quite small after that time), and so that makes for some 16 million oscillations, and a ‘length’ of the wavetrain of about 9.6 meter! Now, the energy of a sodium light photon is about 2 eV (h·ν ≈ 4×10⁻¹⁵ electronvolt·second times 0.5×10¹⁵ cycles/sec = 2 eV) and so we could say the average energy of each of those 16 million oscillations would be 2/(16×10⁶) eV = 0.125×10⁻⁶ eV. But, from all that I wrote above, it’s obvious that this number doesn’t mean all that much, because the wavetrain is not likely to be shaped very regularly.
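If you want to re-do the arithmetic in that paragraph yourself, here is a small sketch. It only uses the rounded figures quoted above for sodium light; the variable names are mine.

```python
# Rounded figures quoted in the text for sodium light.
FREQ_HZ = 500e12        # 500 THz
WAVELENGTH_M = 600e-9   # 600 nm
DECAY_TIME_S = 3.2e-8   # time for the energy to drop by a factor 1/e
H_EV_S = 4e-15          # Planck's constant, rounded as in the text (eV·s)

oscillations = FREQ_HZ * DECAY_TIME_S                   # cycles within the decay time
train_length_m = oscillations * WAVELENGTH_M            # cycles times wavelength
photon_energy_ev = H_EV_S * FREQ_HZ                     # E = h·ν
energy_per_cycle_ev = photon_energy_ev / oscillations   # naive average per cycle

print(f"{oscillations:.3g} oscillations")
print(f"{train_length_m:.3g} m")
print(f"{photon_energy_ev:.3g} eV per photon")
print(f"{energy_per_cycle_ev:.3g} eV per oscillation (naive average)")
```

The last number is just the photon energy divided by the number of cycles: it treats every oscillation as if it carried the same energy, which is exactly the assumption the decaying envelope contradicts.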

So, in short, we cannot say that h is the photon energy per cycle or that ħ is the photon energy per radian! That’s not only simplistic but, worse, false. Planck’s constant is what it is: a factor of proportionality for which there is no simple ‘arithmetic’ and/or ‘geometric’ explanation. It’s just there, and we’ll need to study some more math to truly understand the meaning of those two expressions for |L〉 and |R〉.

Having said that, and having thought about it all some more, I find there’s, perhaps, a more interesting way to re-write E = h·ν:

h = E/ν = (λ/c)E = T·E

T? Yes. T is the period, so that’s the time needed for one oscillation: T is just the reciprocal of the frequency (T = 1/ν = λ/c). It’s a very tiny number, because we divide (1) a very small number (the wavelength of light measured in meters) by (2) a very large number (the distance, in meters, traveled by light in one second). For sodium light, T is equal to 2×10⁻¹⁵ seconds, so that’s two femtoseconds, i.e. two quadrillionths of a second.
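A quick check of those numbers, as a sketch: I use a rounded value for the speed of light and the rounded 2 eV photon energy from above, so the h that comes out is only the rounded value we started from.

```python
C_M_PER_S = 3e8          # speed of light, rounded
WAVELENGTH_M = 600e-9    # sodium light, as in the text
PHOTON_ENERGY_EV = 2.0   # rounded photon energy from the text

period_s = WAVELENGTH_M / C_M_PER_S      # T = λ/c
h_ev_s = period_s * PHOTON_ENERGY_EV     # h = T·E

print(f"T = {period_s:.3g} s")
print(f"h = {h_ev_s:.3g} eV·s")
```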

Now, we can think of the period as a fraction of a second, and smaller fractions are, obviously, associated with higher frequencies and, what amounts to the same, shorter wavelengths (and, hence, higher energies). However, when writing T = λ/c, we can also think of T as another kind of fraction: λ/c can also be read as the ratio of the wavelength to the distance traveled by light in one second, i.e. a light-second (remember that light-seconds are measures of distance, not of time). The two fractions are the same when we express time and distance in equivalent units (i.e. distance in light-seconds and time in seconds).

So that links h to both time and distance, and we may even look at h as some kind of fraction or energy ‘density’ (although the term ‘density’ in this context is not quite accurate). In the same vein, I should note that, if there’s anything that should make you think about h, it is the fact that its value depends on how we measure time and distance. For example, if we’d measure time in other units (for example, the more ‘natural’ unit defined by the time light needs to travel one meter), then we get a different value (and unit) for h. And, of course, you also know we can relate energy to distance (1 J = 1 N·m). But that’s something that’s obvious from h’s dimension (J·s), and so I shouldn’t dwell on that.

Hmm… Interesting thoughts. I think I’ll develop these things a bit further in one of my next posts. As for now, however, I’ll leave you with your own thoughts on it.

Note 1: As you’re trying to follow what I am writing above, you may have wondered whether or not the duration of the wavetrain that’s emitted by an atom is a constant, or whether or not it packs some constant number of oscillations. I’ve thought about that myself, as I wrote down the following formula at some point in time:

h = (the duration of the wave)·(the energy of the photon)/(the number of oscillations in the wave)

As mentioned above, interpreting h as some kind of average energy per oscillation is not a great idea but, having said that, it would be a good exercise for you to try to answer that question in regard to the duration of these wavetrains, and/or the number of oscillations packed into them, yourself. There are various formulas for the Q of an atomic oscillator, but the simplest one is the one expressed in terms of the so-called classical electron radius r0:

Q = 3λ/(4πr0)

As you can see, the Q depends on λ: longer wavelengths (so lower energy) are associated with higher Q. In fact, the relationship is directly proportional: twice the wavelength will give you twice the Q. Now, the formula for the decay time τ is also dependent on the wavelength. Indeed, τ = 2Q/ω = Qλ/(πc). Combining the two formulas yields (if I am not mistaken):

τ = 3λ²/(4π²r0c).
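To check that I am, indeed, not mistaken, here is a small sketch that computes τ both ways. The value I plug in for the classical electron radius (about 2.82×10⁻¹⁵ m) and the rounded speed of light are my own inputs, not figures from the text above.

```python
import math

R0_M = 2.82e-15     # classical electron radius, approximate value (my input)
C_M_PER_S = 3e8     # speed of light, rounded

def q_factor(wavelength_m):
    """Q = 3λ/(4π·r0)."""
    return 3 * wavelength_m / (4 * math.pi * R0_M)

def decay_time(wavelength_m):
    """τ = 2Q/ω, with ω = 2πc/λ."""
    omega = 2 * math.pi * C_M_PER_S / wavelength_m
    return 2 * q_factor(wavelength_m) / omega

def decay_time_combined(wavelength_m):
    """τ = 3λ²/(4π²·r0·c), the combined formula."""
    return 3 * wavelength_m**2 / (4 * math.pi**2 * R0_M * C_M_PER_S)

tau_sodium = decay_time(600e-9)
print(f"Q = {q_factor(600e-9):.3g}")
print(f"tau = {tau_sodium:.3g} s, combined formula: {decay_time_combined(600e-9):.3g} s")
```

For 600 nm light, both routes should land close to the 3.2×10⁻⁸ seconds we used earlier, and doubling the wavelength should quadruple τ.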

Hence, the decay time is proportional to the square of the wavelength. Hmm… That’s an interesting result. But I really need to learn how to be a bit shorter, and so I’ll really let you think now about what all this means or could mean.

Note 2: If that 1/√2 factor has nothing to do with some kind of rms calculation, where does it come from? I am not sure. It’s related to state vector math, it seems, and I haven’t started that as yet. I just copy a formula from Wikipedia here, which shows the same factor in front:

state vector

The formula above is said to represent the “superposition of joint spin states for two particles”. My gut instinct tells me the 1/√2 factor has to do with the normalization condition and/or with the fact that we have to take the (absolute) square of the (complex-valued) amplitudes to get the probability.
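For what it’s worth, here is how that normalization argument works out for a generic superposition of two states with equal weights. This is a sketch only: I assume the two base states |a〉 and |b〉 are orthonormal, and these labels are mine, not those of the Wikipedia formula.

```latex
\[
  |\psi\rangle = \frac{1}{\sqrt{2}}\bigl(|a\rangle \pm |b\rangle\bigr),
  \qquad
  \langle\psi|\psi\rangle
  = \left|\frac{1}{\sqrt{2}}\right|^{2} + \left|\frac{1}{\sqrt{2}}\right|^{2}
  = \frac{1}{2} + \frac{1}{2} = 1 .
\]
```

So each of the two states is found with probability 1/2, and the probabilities add up to one, which is what the 1/√2 in front takes care of.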

Understanding gyroscopes

You know a gyroscope: it’s a spinning wheel or disk mounted in a frame that itself is free to alter in direction, so the axis of rotation is not affected as the mounting tilts or moves about. Therefore, gyroscopes are used to provide stability or maintain a reference direction in navigation systems. Understanding a gyroscope itself is simple enough: it only involves a good understanding of the so-called moment of inertia. Indeed, in the previous post, we introduced a lot of concepts related to rotational motion, notably the concepts of torque and angular momentum but, because that post was getting too long, I did not talk about the moment of inertia and gyroscopes. Let me do that now. However, I should warn you: you will not be able to understand this post if you haven’t read or didn’t understand the previous post. So, if you can’t follow, please go back: it’s probably because you didn’t get the other post.

The moment of inertia and angular momentum are related but not quite the same. Let’s first recapitulate angular momentum. Angular momentum is the equivalent of linear momentum for rotational motion:

  1. If we want to change the linear motion of an object, as measured by its momentum p = mv, we’ll need to apply a force. Changing the linear motion means changing either (a) the speed (v), i.e. the magnitude of the velocity vector v, (b) the direction, or (c) both. This is expressed in Newton’s Law, F = m(dv/dt), and so we note that the mass is just a factor of proportionality measuring the inertia to change.
  2. The same goes for angular momentum (denoted by L): if we want to change it, we’ll need to apply a force, or a torque as it’s referred to when talking rotational motion, and such torque can change either (a) L’s magnitude (L), (b) L’s direction or (c) both.

Just like linear momentum, angular momentum is also a product of two factors: the first factor is the angular velocity ω, and the second factor is the moment of inertia. The moment of inertia is denoted by I so we write L = Iω. But what is I? If we’re analyzing a rigid body (which is what we usually do), then it will be calculated as follows:

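I = Σ mᵢ·rᵢ² (the sum runs over all particles i of the body, with mᵢ the mass of particle i and rᵢ its distance from the axis of rotation)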

This is easy enough to understand: the inertia for turning will depend not just on the masses of all of the particles that make up the object, but also on their distance from the axis of rotation–and note that we need to square these distances. The L = Iω formula, combined with the formula for I above, explains why a spinning skater doing a ‘scratch spin’ speeds up tremendously when drawing in his or her arms and legs. Indeed, the total angular momentum has to remain the same, but I becomes much smaller as a result of that r² factor in the formula. Hence, if I becomes smaller, then ω has to go up significantly in order to conserve angular momentum.
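A toy calculation shows how strong that effect is. The ‘skater’ below is entirely hypothetical: a 50 kg core close to the axis plus two 5 kg ‘arms’, treated as point masses, with distances I made up.

```python
def moment_of_inertia(particles):
    """I = sum of m_i * r_i**2 over (mass, distance-from-axis) pairs."""
    return sum(m * r**2 for m, r in particles)

# Hypothetical skater as point masses: (mass in kg, distance from axis in m).
arms_out = [(50.0, 0.1), (5.0, 0.8), (5.0, 0.8)]
arms_in = [(50.0, 0.1), (5.0, 0.2), (5.0, 0.2)]

omega_out = 2.0                                                # rad/s, arms extended
angular_momentum = moment_of_inertia(arms_out) * omega_out     # L = I·ω, conserved
omega_in = angular_momentum / moment_of_inertia(arms_in)       # arms drawn in

print(f"I (arms out) = {moment_of_inertia(arms_out):.2f} kg·m²")
print(f"I (arms in)  = {moment_of_inertia(arms_in):.2f} kg·m²")
print(f"omega goes from {omega_out:.2f} to {omega_in:.2f} rad/s")
```

The arms are only a small part of the total mass but, because of the squared distance, they dominate the moment of inertia when they are extended.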

Finally, we note that angular momentum and linear momentum can be easily related through the following equation:

Formula 2

That’s all kid’s stuff. To understand gyroscopes, we’ll have to go beyond that and do some vector analysis. In the previous post, we explained that rotational motion is usually analyzed in terms of torques rather than forces, and we detailed the relations between force and torque. More in particular, we introduced a torque vector τ with the following components:

τ = (τyz, τzx, τxy) = (τx, τy, τz) with

τx = τyz = yFz – zFy

τy = τzx = zFx – xFz

τz = τxy = xFy – yFx.

We also noted that this torque vector could be written as a cross product of a radius vector and the force: τ = r×F. Finally, we also pointed out the relation between the x-, y- and z-components of the torque vector and the plane of rotation:

(1) τx = τyz is rotational motion about the x-axis (i.e. motion in the yz-plane)

(2) τy = τzx is rotational motion about the y-axis (i.e. motion in the zx plane)

(3) τz = τxy is rotational motion about the z-axis (i.e. motion in the xy-plane)

In the simple case shown in the animation, the angular momentum vector L has the same direction as the torque vector, but it’s the cross product of the radius vector and the momentum vector: L = r×p. For clarity, I reproduce the animation I used in my previous post once again.


How do we get that vector cross product for L? We noted that τ (i.e. the Greek tau) = dL/dt. So we need to take the time derivative of all three components of L. What are the components of L? They look very similar to those of τ:

L = (Lyz, Lzx, Lxy) = (Lx, Ly, Lz) with

Lx = Lyz = ypz – zpy

Ly = Lzx = zpx – xpz

Lz = Lxy = xpy – ypx.

Now, just check the time derivatives of Lx, Ly, and Lz and you’ll find the components of the torque vector τ. Together with the formulas above, that should be sufficient to convince you that L is, indeed, a vector cross product of r and p: L = r×p.
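If you’d rather let a computer do that check, here is a numerical sketch. The example is hypothetical (a 2 kg mass thrown in a uniform gravitational field, with made-up starting values): we compute L = r×p at two nearby instants, take the difference quotient, and compare it with r×F.

```python
def cross(a, b):
    """Cross product a×b, written out component by component."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,   # x-component (yz-plane)
            az * bx - ax * bz,   # y-component (zx-plane)
            ax * by - ay * bx)   # z-component (xy-plane)

# Hypothetical example: a 2 kg mass in a uniform gravitational field.
M, G = 2.0, 9.81
R_START = (1.0, 2.0, 3.0)     # initial position (m)
V_START = (4.0, -1.0, 5.0)    # initial velocity (m/s)
FORCE = (0.0, 0.0, -M * G)    # constant force (N)

def position(t):
    x0, y0, z0 = R_START
    vx, vy, vz = V_START
    return (x0 + vx * t, y0 + vy * t, z0 + vz * t - 0.5 * G * t * t)

def momentum(t):
    vx, vy, vz = V_START
    return (M * vx, M * vy, M * (vz - G * t))

def angular_momentum(t):
    return cross(position(t), momentum(t))   # L = r×p

def torque(t):
    return cross(position(t), FORCE)         # τ = r×F

# Central difference quotient as a stand-in for dL/dt
t, h = 0.7, 1e-4
dL_dt = tuple((lp - lm) / (2 * h)
              for lp, lm in zip(angular_momentum(t + h), angular_momentum(t - h)))

print(dL_dt)
print(torque(t))
```

The two printed vectors should agree up to rounding, component by component.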

Again, if you feel this is too difficult, please read or re-read my previous post. But if you do understand everything, then you are ready for a much more difficult analysis, and that’s an explanation of why a spinning top does not fall as it rotates about.

In order to understand that explanation, we’ll first analyze the situation below. It resembles the experiment with the swivel chair that’s often described on ‘easy physics’ websites: the man below holds a spinning wheel with its axis horizontal, and then turns this axis into the vertical. As a result, the man starts to turn himself in the opposite direction.

Rotating angular momentum

Let’s now look at the forces and torques involved. These are shown below.

Angular vectors in gyroscope

This looks very complicated–you’ll say! You’re right: it is quite complicated–but not impossible to understand. First note the vectors involved in the starting position: we have an angular momentum vector L0 and an angular velocity vector ω0. These are both axial vectors, as I explained in my previous post: their direction is perpendicular to the plane of motion, i.e. they are arrows along the axis of rotation. This is in line with what we wrote above: if an object is rotating in the zx-plane (which is the case here), then the angular momentum vector will have a y-component only, and so it will be directed along the y-axis. Which side? That’s determined by the right-hand screw rule. [Again, please do read my previous post for more details if you’d need them.]

So now we have explained L0 and ω0. What about all the other vectors? First note that there would be no torque if the man did not try to turn the axis. In that case, the angular momentum would just remain what it is, i.e. dL/dt = 0. Indeed, remember that τ = dL/dt, just like F = dp/dt, so if dL/dt = 0, then τ = 0. But the man is turning the axis of rotation and, hence, τ = dL/dt ≠ 0. What’s changing here is not the magnitude of the angular momentum but its direction. As usual, the analysis is in terms of differentials.

As the man turns the spinning wheel, the directional change of the angular momentum is defined by the angle Δθ, and we get a new angular momentum vector L1. The difference between L1 and L0 is given by the vector ΔL. This ΔL vector is a tiny vector in the L0L1 plane and, because we’re looking at a differential displacement only, we can say that, for all practical purposes, this ΔL is orthogonal to L0 (as we move from L0 to L1, we’re actually moving along an arc and, hence, ΔL is a tangential vector). Therefore, simple trigonometry allows us to say that its magnitude ΔL will be equal to L0Δθ. [We should actually write L0·sin(Δθ) but, because we’re talking differentials and measuring angles in radians (so the value reflects arc lengths), we can equate sin(Δθ) with Δθ.]

Now, the torque vector τ has the same direction as the ΔL vector (that’s obvious from their definitions), but what is its magnitude? That’s an easy question to answer: τ = ΔL/Δt = L0Δθ/Δt = L0 (Δθ/Δt). Now, this result leads us to define another axial vector which we’ll denote using the same Greek letter omega, but written as a capital letter instead of in lowercase: Ω. The direction of Ω is determined by using that right-hand screw rule which we’ve always been using, and Ω’s magnitude is equal to Ω = Δθ/Δt. So, in short, Ω is an angular velocity vector just like ω: its magnitude is the speed with which the man is turning the axis of rotation of the spinning wheel, and its direction is determined using the same rules. If we do that, we get the rather remarkable result that we can write the torque vector τ as the cross product of Ω and L0:

τ = Ω×L0

Now, this is not an obvious result, so you should check it yourself. When doing that, you’ll note that the two vectors are orthogonal and so we have τ = Ω×L0 = |Ω||L0|sin(π/2)·n = ΩL0·n with n the normal unit vector given, once again, by the right-hand screw rule. [Note how the order of the two factors in a cross product matters: a×b = –b×a.]
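Here is a small sketch of such a check. The axes and numbers are hypothetical (they are not read off the illustration): I put L0 along the y-axis and let the axis be turned about the x-axis, so Ω and L0 are orthogonal.

```python
import math

def cross(a, b):
    """Cross product a×b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

# Hypothetical values, chosen only for illustration.
L0 = (0.0, 3.0, 0.0)       # spin angular momentum along y (kg·m²/s)
OMEGA = (0.5, 0.0, 0.0)    # axis being turned about x (rad/s)

tau = cross(OMEGA, L0)          # τ = Ω×L0
tau_swapped = cross(L0, OMEGA)  # swapping the factors flips the sign

print("tau         =", tau)
print("|tau|       =", norm(tau), " Omega*L0 =", norm(OMEGA) * norm(L0))
print("L0 x Omega  =", tau_swapped)
```

With these choices the torque comes out along the z-axis, perpendicular to both Ω and L0, and its magnitude is just the product ΩL0 because the sine of a right angle is one.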

You’re probably tired of this already, and so you’ll say: so what?

Well… We have a torque. A torque is produced by forces, and a torque vector along the z-axis is associated with rotation about the z-axis, i.e. rotation in the xy-plane. Such rotation is caused by the forces F and –F that produce the torque, as shown in the illustration. [Again, their direction is determined by the right-hand screw rule – but I’ll stop repeating that from now on.] But… Wait a minute. First, the direction is wrong, isn’t it? The man turns the other way in reality. And, second, where do these forces come from? Well… The man produces them, and the direction of the forces is not wrong: as the man applies these forces, with his hands, as he holds the spinning wheel and turns it into the vertical direction, equal and opposite forces act on him (cf. the action-reaction principle), and so he starts to turn in the opposite direction.

So there we are: we have explained this complex situation fully in terms of torques and forces now. So that’s good. [If you don’t believe the thing about those forces, just take one of the wheels off your mountain bike, let it spin, and try to change the plane in which it is spinning: you’ll see you’ll need a bit of force. Not much, but enough, and it’s exactly the kind of force that the man in the illustration is experiencing.]

Now, what if we would not be holding the spinning wheel? What if we would let it pivot, for example? Well… It would just pivot, as shown below.


But… Why doesn’t it fall? Hah! There we are! Now we are finally ready for the analysis we really want to do, i.e. explaining why these spinning tops (or gyros as they’re referred to in physics) don’t fall.

Such a spinning top is shown in the illustration below. It’s similar to the spinning wheel: there’s a rotational axis, and we have the force of gravity trying to change the direction of that axis, so it’s like the man turning that spinning wheel indeed, but now it’s gravity exerting the force that’s needed to change the angular momentum. Let’s associate the vertical direction with the z-axis, and the horizontal plane with the xy-plane, and let’s go step-by-step:

  1. The gravitational force wants to pull that spinning top down. So the ΔL vector points downward this time, not upward. Hence, the torque vector will point downward too. But so it’s a torque pointing along the z-axis.
  2. Such torque along the z-axis is associated with a rotation in the xy-plane, so that’s why the spinning top will slowly revolve about the z-axis, parallel to the xy-plane. This process is referred to as precession, and so there’s a precession torque and a precession angular velocity.

spinning top

So that explains precession and so that’s all there is to it. Now you’ll complain, and rightly so: what I write above, does not explain why the spinning top does not actually fall. I only explained that precession movement. So what’s going on? That spinning top should fall as it precesses, shouldn’t it?

It actually does fall. The point to note, however, is that the precession movement itself changes the direction of the angular momentum vector as well. So we have a new ΔL vector pointing sideways, i.e. a vector in the horizontal plane–so not along the z axis. Hence, we should have a torque in the horizontal plane, and so that implies that we should have two equal and opposite forces acting along the z-axis.

In fact, the right-hand screw rule gives us the direction of those forces: if these forces were effectively applied to the spinning top, it would fall even faster! However, the point to note is that there are no such forces. Indeed, it is not like the man with the spinning wheel: no one (or nothing) is pushing or applying the forces that should produce the torque associated with this change in angular momentum. Hence, because these forces are absent, the spinning top begins to ‘fall’ in the opposite direction of the lacking force, thereby counteracting the gravitational force in such a way that the spinning top just spins about the z-axis without actually falling.

Now, this is, most probably, very difficult to understand in the way you would like to understand it, so just let it sink in and think about it for a while. In this regard, and to help the understanding, it’s probably worth noting that the actual process of reaching equilibrium is somewhat messy. It is illustrated below: if we hold a spinning gyro for a while and then, suddenly, we let it fall (yes, just let it go), it will actually fall. However, as it’s falling, it also starts turning and then, because it starts turning, it also starts ‘falling’ upwards, as explained in that story of the ‘missing force’ above. Initially, the upward movement will overshoot the equilibrium position, thereby slowing the gyro’s speed in the horizontal plane. And so then, because its horizontal speed becomes smaller, it stops ‘falling upward’, and so that means it’s falling down again. But then it starts turning again, and so on and so on. I hope you grasp this–more or less at least. Note that frictional effects will cause the up-and-down movement to dampen out, and so we get a so-called cycloidal motion dampening down to the steady motion we associate with spinning tops and gyros.

Actual gyroscope motion

That, then, is the ‘miracle’ of a spinning top explained. Is it less of a ‘miracle’ now that we have explained it in terms of torques and missing forces? That’s an appreciation which each of us has to make for him- or herself. I actually find it all even more wonderful now that I can explain it more or less using the kind of math I used above–but then you may have a different opinion.

In any case, let us – to wrap it all up – ask some simple questions about some other spinning objects. What about the Earth for example? It has an axis of rotation too, and it revolves around the Sun. Is there anything like precession going on?

The first answer is: no, not really. The axis of rotation of the Earth changes little with respect to the stars. Indeed, why would it change? Changing it would require a torque, and where would the required force for such torque come from? The Earth is not like a gyro on a pivot being pulled down by some force we cannot see. The Sun attracts the Earth as a whole indeed. It does not change its axis of rotation. That’s why we have a fairly regular day and night cycle.

The more precise answer is: yes, there actually is a very slow axial precession. The whole precessional cycle takes approximately 26,000 years, and it causes the position of stars – as perceived by us, earthlings, that is – to slowly change. Over this cycle, the Earth’s north axial pole moves from where it is now, in a circle with an angular radius of about 23.5 degrees, as illustrated below.


What is this precession caused by? There must be some torque. There is. The Earth is not perfectly spherical: it bulges outward at the equator, and the gravitational tidal forces of the Moon and Sun apply some torque here, attempting to pull the equatorial bulge into the plane of the ecliptic, but instead causing it to precess. So it’s a quite subtle motion, but it’s there, and it’s also got something to do with the gravitational force. However, it’s got nothing to do with the way gravitation makes a spinning top do what it does. [The most amazing thing about this, in my opinion, is that, despite the fact that the precessional movement is so tiny, the Greeks had already discovered it: indeed, the Greek astronomer and mathematician Hipparchus of Nicaea gave a pretty precise figure for this so-called ‘precession of the equinoxes’ in 127 BC.]
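To get a feel for how tiny that movement is, we can convert the (approximate) 26,000-year cycle quoted above into an angle per year. This is plain arithmetic on that rounded figure, nothing more.

```python
PRECESSION_PERIOD_YEARS = 26_000   # approximate figure quoted in the text

degrees_per_year = 360 / PRECESSION_PERIOD_YEARS
arcsec_per_year = degrees_per_year * 3600          # 1 degree = 3600 arcseconds
years_per_degree = PRECESSION_PERIOD_YEARS / 360

print(f"{arcsec_per_year:.1f} arcseconds per year")
print(f"one degree every {years_per_degree:.0f} years or so")
```

So the shift is of the order of fifty arcseconds per year, i.e. about one degree in a human lifetime.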

What about electrons? Are they like gyros rotating around some pivot? Here the answer is very simple and very straightforward: No, not at all! First, there are no pivots in an atom. Second, the current understanding of an electron – i.e. the quantum-mechanical understanding of an electron – is not compatible with the classical notion of spin. Let me just copy an explanation from Georgia State University’s HyperPhysics website. It basically says it all:

“Experimental evidence like the hydrogen fine structure and the Stern-Gerlach experiment suggest that an electron has an intrinsic angular momentum, independent of its orbital angular momentum. These experiments suggest just two possible states for this angular momentum, and following the pattern of quantized angular momentum, this requires an angular momentum quantum number of 1/2. With this evidence, we say that the electron has spin 1/2. An angular momentum and a magnetic moment could indeed arise from a spinning sphere of charge, but this classical picture cannot fit the size or quantized nature of the electron spin. The property called electron spin must be considered to be a quantum concept without detailed classical analogy.”

So… I guess this should conclude my exposé on rotational motion. I am not sure what I am going to write about next, but I’ll see. 🙂

Post scriptum:

The above treatment is largely based on Feynman’s Lectures (Vol. I, Chapters 18, 19 and 20). The subject could also be discussed using the concept of a force couple, aka pure moment. A force couple is a system of forces with a resultant moment but no resultant force. Hence, it causes rotation without translation or, more generally, without any acceleration of the centre of mass. In such analysis, we can say that gravity produces a force couple on the spinning top. The two forces of this couple are equal and opposite, and they pull at opposite ends. However, because one end of the top is fixed (friction forces keep the tip fixed to the ground), the force at the other end makes the top go about the vertical axis.

The situation we have is that gravity causes such a force couple to appear, just like the man tilting the spinning wheel causes such a force couple to appear. Now, the analysis above shows that the direction of the new force is perpendicular to the plane in which the axis of rotation changes, or wants to change in the case of our spinning top. So gravity wants to pull the top down and causes it to move sideways. This horizontal movement will, in turn, create another force couple. The direction of the resultant force, at the free end of the axis of rotation of the top, will, once again, be vertical, but it will oppose the gravity force. So, in a very simplified explanation of things, we could say:

  1. Gravity pulls the top downwards, and causes a force that will make the top move sideways. So the new force, which causes the precession movement, is orthogonal to the gravitation force, i.e. it’s a horizontal force.
  2. That horizontal force will, in turn, cause another force to appear. That force will also be orthogonal to the horizontal force. As we made two 90 degrees turns, so to say, i.e. 180 degrees in total, it means that this third force will be opposite to the gravitational force.
  3. In equilibrium, we have three forces: gravity, the force causing the precession and, finally, a force neutralizing gravity as the spinning top precesses about the vertical axis.

This approach allows for a treatment that is somewhat more intuitive than Feynman’s concept of the ‘missing force.’

Spinning: the essentials

When introducing mirror symmetry (P-symmetry) in one of my older posts (time reversal and CPT-symmetry), I also introduced the concept of axial and polar vectors in physics. Axial vectors have to do with rotations, or spinning objects. Because spin – i.e. turning motion – is such an important concept in physics, I’d suggest we re-visit the topic here.

Of course, I should be clear from the outset that the discussion below is entirely classical. Indeed, as Wikipedia puts it: “The intrinsic spin of elementary particles (such as electrons) is a quantum-mechanical phenomenon that does not have a counterpart in classical mechanics, despite the term spin being reminiscent of classical phenomena such as a planet spinning on its axis.” Nevertheless, if we don’t understand what spin is in the classical world – i.e. our world for all practical purposes – then we won’t get even near to appreciating what it might be in the quantum-mechanical world. Besides, it’s just plain fun: I am sure you have played, as a kid or even as an adult, with one of those magical spinning tops or toy gyroscopes and so you probably wonder how it really works in physics. So that’s what this post is all about.

The essential concept is the concept of torque. For rotations in space (i.e. rotational motion), the torque is what the force is for linear motion:

  • It’s the torque (τ) that makes an object spin faster or slower, just like the force would accelerate or decelerate that very same object when it would be moving along some curve (as opposed to spinning around some axis).
  • There’s also a similar ‘law of Newton’ for torque: you’ll remember that the force equals the time rate-of-change of a vector quantity referred to as (linear) momentum: F = dp/dt = d(mv)/dt = ma (the mass times the acceleration). Likewise, we have a vector quantity that is referred to as angular momentum (L), and we can write: τ (i.e. the Greek tau) = dL/dt.
  • Finally, instead of linear velocity, we’ll have an angular velocity ω (omega), which is the time rate-of-change of the angle θ defining how far the object has gone around (as opposed to the distance in linear dynamics, describing how far the object has gone along). So we have ω = dθ/dt. This is actually easy to visualize because we know that θ is the length of the corresponding arc on the unit circle. Hence, the equivalence with the linear distance traveled is easily ascertained.

There are numerous other equivalences. For example, we also have an angular acceleration: α = dω/dt = d²θ/dt²; and we should also note that, just like the force, the torque is doing work – in its conventional definition as used in physics – as it turns an object:

ΔW = τ·Δθ

However, we also need to point out the differences. The animation below does that very well, as it relates the ‘new’ concepts – i.e. torque and angular momentum – to the ‘old’ concepts – i.e. force and linear momentum.


So what do we have here? We have vector quantities once again, denoted by symbols in bold-face. However, τ, L and ω are special vectors: axial vectors indeed, as opposed to the polar vectors F, p and v. Axial vectors are directed along the axis of spin – so that is, strangely enough, at right angles to the direction of spin, or perpendicular to the ‘plane of the twist’ as Feynman calls it – and the direction of the axial vector is determined by the direction of spin through one of two conventions: the ‘right-hand screw rule’ or the ‘left-hand screw rule’. Physicists have settled on the former.

If you feel very confused now (I did when I first looked at it), just step back and go through the full argument as I develop it here. It helps to think of torque (also known, for some obscure reason, as the moment of the force) as a twist on an object or a plane indeed: the torque’s magnitude is equal to the tangential component of the force, i.e. F·sin(Δθ), times the distance between the object and the axis of rotation (we’ll denote this distance by r). This quantity is also equal to the product of the magnitude of the force itself and the length of the so-called lever arm, i.e. the perpendicular distance from the axis to the line of action of the force (this lever arm length is denoted by r0). So we can write τ as:

  1. The product of the tangential component of the force times the distance r: τ = r·Ft = r·F·sin(Δθ)
  2. The product of the length of the lever arm times the force: τ = r0·F
  3. The torque is the work done per unit of distance traveled: τ = ΔW/Δθ or τ = dW/dθ in the limit.
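The first two expressions in the list above are easy to compare numerically. In the sketch below, the numbers are made up, and `angle` stands for the angle between the radius vector and the force; the third expression is then used the other way around, to get the work done over a small turn.

```python
import math

def torque_tangential(r, force, angle):
    """tau = r * F_t, with F_t = F*sin(angle) the tangential component."""
    return r * (force * math.sin(angle))

def torque_lever_arm(r, force, angle):
    """tau = r0 * F, with r0 = r*sin(angle) the lever arm."""
    lever_arm = r * math.sin(angle)
    return lever_arm * force

# Hypothetical values: 0.4 m from the axis, 25 N, at 30 degrees to the radius.
R, F, ANGLE = 0.4, 25.0, math.radians(30)

tau_1 = torque_tangential(R, F, ANGLE)
tau_2 = torque_lever_arm(R, F, ANGLE)

d_theta = 1e-3            # a small turn, in radians
work = tau_1 * d_theta    # dW = tau * d(theta)

print(tau_1, tau_2, work)
```

Both functions multiply the same three factors, only grouped differently, which is all the equivalence between the two definitions amounts to.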

So… These are actually only the basics, which you should remember from your high-school physics course. If not, have another look at it. We now need to go from scalar quantities to vector quantities to understand that animation above. Torque is not a vector like force or velocity, not a priori at least. However, we can associate torque with a vector of a special type, an axial vector. Feynman calls vectors such as force or (linear) velocity ‘honest’ or ‘real’ vectors. The mathematically correct term for such ‘honest’ or ‘real’ vectors is polar vector. Hence, axial vectors are not ‘honest’ or ‘real’ in some sense: we derive them from the polar vectors. They are, in effect, a so-called cross product of two ‘honest’ vectors. Here we need to explain the difference between a dot and a cross product between two vectors once again:

(1) A dot product, which we denote by a little dot (·), yields a scalar quantity: a·b = |a||b|cosα = a·b·cosα with α the angle between the two vectors a and b. Note that the dot product of two orthogonal vectors is equal to zero, so take care: τ = r·Ft = r·F·sin(Δθ) is not a dot product of two vectors. It’s a simple product of two scalar quantities: we only use the dot as a mark of separation, which may be quite confusing. In fact, some authors use ∗ for a product of scalars to avoid confusion: that’s not a bad idea, but it’s not a convention as yet. Omitting the dot when multiplying scalars (as I do when I write |a||b|cosα) is also possible, but it makes formulas a bit difficult to read, I find. Also note, once again, how important the difference between bold-face and normal type is in formulas like this: it distinguishes vectors from scalars – and these are two very different things indeed.

(2) A cross product, which we denote by using a cross (×), yields another vector: τ = r×F =|r|·|F|·sinα·n = r·F·sinα·n with n the normal unit vector given by the right-hand rule. Note how a cross product involves a sine, not a cosine – as opposed to a dot product. Hence, if r and F are orthogonal vectors (which is not unlikely), then this sine term will be equal to 1. If the two vectors are not perpendicular to each other, then the sine function will assure that we use the tangential component of the force.
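To make the difference between the two products a bit more tangible, here is a minimal sketch in plain Python. The helper names (dot, cross, norm) and the numbers for r and F are just my own illustration, not something from a library: a lever arm of 2 units along the x-axis, and a force of 3 units at an angle of 30 degrees with it. The cross function uses the standard component formula for a vector product.

```python
import math

def dot(a, b):
    """Dot product of two vectors: yields a scalar."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors: yields another vector."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

def norm(a):
    """Magnitude (length) of a vector."""
    return math.sqrt(dot(a, a))

# Hypothetical example values: lever arm along x, force at 30 degrees to it
alpha = math.radians(30)
r = (2.0, 0.0, 0.0)
F = (3.0 * math.cos(alpha), 3.0 * math.sin(alpha), 0.0)

tau = cross(r, F)      # a vector along the z-axis (right-hand rule)
r_dot_F = dot(r, F)    # a scalar

print("tau     =", tau)
print("r.F     =", r_dot_F)
print("|r||F|sin(alpha) =", norm(r) * norm(F) * math.sin(alpha))
print("|r||F|cos(alpha) =", norm(r) * norm(F) * math.cos(alpha))
```

The torque vector has only a z-component here, and its magnitude matches |r|·|F|·sinα, while the dot product matches |r|·|F|·cosα.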

But, again, how do we go from torque as a scalar quantity (τ = r·Ft) to the vector τ = r×F? Well… Let’s suppose, first, that, in our (inertial) frame of reference, we have some object spinning around the z-axis only. In other words, it spins in the xy-plane only. So we have a torque around (or about) the z-axis, i.e. in the xy-plane. The work that will be done by this torque can be written as:

ΔW = FxΔx + FyΔy = (xFy – yFx)Δθ

Huh? Yes. This results from a simple two-dimensional analysis of what’s going on in the xy-plane: the force has an x- and a y-component, and the distance traveled in the x- and y-direction is Δx = –yΔθ and Δy = xΔθ respectively. I won’t go into the details of this (you can easily find these elsewhere) but just note the minus sign for Δx and the way the x and y get switched in the expressions.
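If you’d rather check this numerically than take my word for it, the following sketch rotates a point over a tiny angle and compares the work FxΔx + FyΔy with (xFy – yFx)Δθ. The function name and all of the numbers are assumptions I picked for the illustration; the only thing that matters is that Δθ is small.

```python
import math

def work_small_rotation(x, y, Fx, Fy, dtheta):
    """Work done by a constant force (Fx, Fy) while the point (x, y)
    rotates about the origin over the angle dtheta (in radians)."""
    x_new = x * math.cos(dtheta) - y * math.sin(dtheta)
    y_new = x * math.sin(dtheta) + y * math.cos(dtheta)
    return Fx * (x_new - x) + Fy * (y_new - y)

# Hypothetical example values
x, y = 1.5, 0.8
Fx, Fy = -2.0, 3.0
dtheta = 1e-6  # a small rotation angle

exact = work_small_rotation(x, y, Fx, Fy, dtheta)
approx = (x * Fy - y * Fx) * dtheta  # the formula from the text

print("Fx*dx + Fy*dy       =", exact)
print("(x*Fy - y*Fx)*dtheta =", approx)
```

The two numbers agree up to terms of second order in Δθ, which is exactly what the approximations Δx = –yΔθ and Δy = xΔθ throw away.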

So the torque in the xy-plane is given by τxy = ΔW/Δθ = xFy – yFx. Likewise, if the object were spinning about the x-axis – or, what amounts to the same, in the yz-plane – we’d get τyz = yFz – zFy. Finally, for some object spinning about the y-axis (i.e. in the zx-plane – and please note I write zx, not xz, so as to be consistent as we switch the order of the x, y and z coordinates in the formulas), we’d get τzx = zFx – xFz. Now we can appreciate the fact that a torque in some other plane, at some angle with our Cartesian planes, would be some combination of these three torques, so we’d write:

(1)    τxy = xFy – yFx

(2)    τyz = yFz – zFy and

(3)    τzx = zFx – xFz.

Another observer with his Cartesian x’, y’ and z’ axes in some other direction (we’re not talking some observer moving away from us but, quite simply, a reference frame that’s being rotated itself around some axis not necessarily coinciding with any of the x-, y-, z- or x’-, y’- and z’-axes mentioned above) would find other values as he calculates these torques, but the formulas would look the same:

(1’) τx’y’ = x’Fy’ – y’Fx’

(2’) τy’z’ = y’Fz’ – z’Fy’ and

(3’) τz’x’ = z’Fx’ – x’Fz’.

Now, of course, there must be some ‘nice’ relationship that expresses the τx’y’, τy’z’ and τz’x’ values in terms of τxy, τyz and τzx, just like there was some ‘nice’ relationship between the x’, y’ and z’ components of a vector in one coordinate system (the x’, y’ and z’ coordinate system) and the x, y, z components of that same vector in the x, y and z coordinate system. Now, I won’t go into the details but that ‘nice’ relationship is, in fact, given by transformation expressions involving a rotation matrix. I won’t write that one down here, because it looks pretty formidable, but just google ‘axis-angle representation of a rotation’ and you’ll get all the details you want.

The point to note is that, in both sets of equations above, we have an x-, y- and z-component of some mathematical vector that transform just like a ‘real’ vector. Now, if it behaves like a vector, we’ll just call it a vector, and that’s how, in essence, we define torque, angular momentum (and angular velocity too) as axial vectors. We should note how it works exactly though:

(1) τxy and τx’y’ will transform like the z-component of a vector (note that we were talking rotational motion about the z-axis when introducing this quantity);

(2) τyz and τy’z’ will transform like the x-component of a vector (note that we were talking rotational motion about the x-axis when introducing this quantity);

(3) τzx and τz’x’ will transform like the y-component of a vector (note that we were talking rotational motion about the y-axis when introducing this quantity). So we have

τ = (τyz, τzx, τxy) = (τx, τy, τz) with

τx = τyz = yFz – zFy

τy = τzx = zFx – xFz

τz = τxy = xFy – yFx.

[This may look very difficult to remember but just look at the order: all we do is respect the clockwise order x, y, z, x, y, z, x, etc. when jotting down the x, y and z subscripts.]
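The claim that these three numbers transform like the components of a ‘real’ vector can be checked with a few lines of Python. This is only a sketch under my own assumptions: I build a rotated frame by combining a rotation about the z-axis with one about the x-axis (the angles 1.1 and 0.4 radians are arbitrary), and the helper names (rot_z, rot_x, matmul, matvec, torque) are hypothetical. We then calculate the torque twice: once from the rotated r and F, and once by rotating the torque vector itself.

```python
import math

def torque(r, F):
    """(tau_yz, tau_zx, tau_xy) = (tau_x, tau_y, tau_z), as in the text."""
    x, y, z = r
    Fx, Fy, Fz = F
    return (y * Fz - z * Fy, z * Fx - x * Fz, x * Fy - y * Fx)

def matvec(R, v):
    """Apply the 3x3 matrix R to the vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(phi):
    """Components in a frame rotated by phi about the z-axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(phi):
    """Components in a frame rotated by phi about the x-axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

# Hypothetical example values
r = (1.0, 2.0, -0.5)
F = (0.3, -1.2, 2.0)
R = matmul(rot_x(0.4), rot_z(1.1))  # some rotated frame x', y', z'

tau = torque(r, F)
tau_from_primed = torque(matvec(R, r), matvec(R, F))  # x'Fy' - y'Fx', etc.
tau_rotated = matvec(R, tau)                          # rotate tau like a vector

print("torque from r' and F':", tau_from_primed)
print("rotated torque vector:", tau_rotated)
```

The two results coincide (up to rounding), which is what we mean when we say torque behaves like a vector under rotations of the reference frame.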

Now we are, finally, well equipped to once again look at that vector representation of rotation. I reproduce it once again below so you don’t have to scroll back to that animation:


We have rotation in the zx-plane here (i.e. rotation about the y-axis) driven by an oscillating force F, and so, yes, we can see that the torque vector oscillates along the y-axis only: its x- and z-components are zero. We also have L here, the angular momentum. That’s a vector quantity as well. We can write it as

L = (Lyz, Lzx, Lxy) = (Lx, Ly, Lz) with

Lx = Lyz = ypz – zpy (i.e. the angular momentum about the x-axis)

Ly = Lzx = zpx – xpz (i.e. the angular momentum about the y-axis)

Lz = Lxy = xpy – ypx (i.e. the angular momentum about the z-axis),

And we note, once again, that only the y-component is non-zero in this case, because the rotation is about the y-axis.
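Here is a quick numerical illustration of that, assuming a point mass m going around a circle of radius R in the zx-plane with angular velocity ω. The values for m, R, ω and the angle θ are made up, and the function name is hypothetical; the formulas for the components are the ones written above.

```python
import math

def angular_momentum(r, p):
    """(L_yz, L_zx, L_xy) = (Lx, Ly, Lz), as in the text."""
    x, y, z = r
    px, py, pz = p
    return (y * pz - z * py, z * px - x * pz, x * py - y * px)

# Hypothetical example values
m, R, omega = 0.5, 2.0, 3.0   # mass, radius, angular velocity
theta = 0.7                   # current angle, measured from the z-axis toward x

# Position and momentum for circular motion in the zx-plane (y = 0 throughout)
r = (R * math.sin(theta), 0.0, R * math.cos(theta))
p = (m * R * omega * math.cos(theta), 0.0, -m * R * omega * math.sin(theta))

L = angular_momentum(r, p)
print("L =", L)
print("m*R^2*omega =", m * R**2 * omega)
```

Only the y-component survives, and it equals m·R²·ω, whatever the value of θ.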

We should now remember the rules for a cross product. Above, we wrote that τ = r×F = |r|·|F|·sinα·n = r·F·sinα·n with n the normal unit vector given by the right-hand rule. However, a vector product can also be written in terms of its components: c = a×b if and only if

cx = aybz – azby,

cy = azbx – axbz, and

cz = axby – aybx.

Again, if this looks difficult, remember the trick above: respect the clockwise order when jotting down the x, y and z subscripts. I’ll leave it to you to work out r×F and r×p in terms of components but, when you write it all out, you’ll see it corresponds to the formulas above. In addition, I will also leave it to you to show that the velocity of some particle in a rotating body can be given by a similar vector product: v = ω×r, with ω being defined as another axial vector (aka pseudovector) pointing along the direction of the axis of rotation, i.e. not in the direction of motion. [Is that strange? No. As it’s rotational motion, there is no ‘direction of motion’ really: the object, or any particle in that object, goes round and round and round indeed and, hence, defining some normal vector using the right-hand rule to denote angular velocity makes a lot of sense.]
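For those who’d like to see the v = ω×r formula at work before doing the algebra, here is a small sketch. The numbers are, once again, just an assumption for the sake of the example: a body rotating about the z-axis at 2 rad/s, and a particle sitting 3 units out on the x-axis.

```python
def cross(a, b):
    """Cross product in components: c = a x b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,   # cx
            az * bx - ax * bz,   # cy
            ax * by - ay * bx)   # cz

def dot(a, b):
    """Dot product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical example values
omega = (0.0, 0.0, 2.0)   # angular velocity: along the axis of rotation (z)
r = (3.0, 0.0, 0.0)       # position of the particle

v = cross(omega, r)
print("v =", v)
print("v.omega =", dot(v, omega), " v.r =", dot(v, r))
```

The velocity comes out along the y-axis, with magnitude ω·r = 6: it is perpendicular to both the axis of rotation and the position vector, as it should be for a particle going around in a circle.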

I could continue to write and write and write, but I need to stop here. Indeed, I actually wanted to tell you how gyroscopes work, but I notice that this introduction has already taken several pages. Hence, I’ll leave the gyroscope for a separate post. So, be warned, you’ll need to read and understand this one before reading my next one.