Re-visiting electron orbitals (II)

I’ve talked about electron orbitals in a couple of posts already – including a fairly recent one, which is why I put the (II) after the title. However, I just wanted to tie up some loose ends here – and do some more thinking about the concept of a definite energy state. What is it really? We know the wavefunction for a definite energy state can always be written as:

ψ(x, t) = e^(−i·(E/ħ)·t)·ψ(x)

Well… In fact, we should probably formally prove that but… Well… Let us just explore this formula in a more intuitive way – for the time being, that is – using those electron orbitals we’ve derived.

First, let me note that ψ(x, t) and ψ(x) are very different functions and, therefore, the choice of the same symbol for both (the Greek psi) is – in my humble opinion – not very fortunate, but then… Well… It is the choice of physicists – as copied in textbooks all over – and so we’ll just have to live with it. Of course, we can appreciate why they choose to use the same symbol – ψ(x) is like a time-independent wavefunction now, so that’s nice – but… Well… You should note that it is not so obvious to write some function as the product of two other functions. To be complete, I’ll be a bit more explicit here: if some function in two variables – say F(x, y) – can be written as the product of two functions in one variable – say f(x) and g(y), so we can write F as F(x, y) = f(x)·g(y) – then we say F is a separable function. For a full overview of what that means, click on this link. And note that mathematicians do choose different symbols for the functions F, f and g. It would probably be interesting to explore what the conditions for separability actually imply in terms of properties of… Well… The wavefunction and its argument, i.e. the space and time variables. But… Well… That’s stuff for another post. 🙂

Secondly, note that the momentum variable (p) – i.e. the p in our elementary wavefunction a·e^(i·(p·x−E·t)/ħ) – has sort of vanished: ψ(x) is a function of the position only. Now, you may think it should be somewhere there – that, perhaps, we can write something like ψ(x) = ψ[x, p(x)]. But… No. The momentum variable has effectively vanished. Look at Feynman’s solutions for the electron orbitals of a hydrogen atom – his ‘Grand Equation’. The Yl,m(θ, φ) and Fn,l(ρ) functions there are functions of the (polar) coordinates ρ, θ, φ only. So that’s the position only (these coordinates are polar or spherical coordinates, so ρ is the radial distance, θ is the polar angle, and φ is the azimuthal angle). There’s no idea whatsoever of any momentum in one or the other spatial direction here. I find that rather remarkable. Let’s see how it all works with a simple example.

The functions below are the Yl,m(θ, φ) functions for l = 1. Note the symmetry: if we swap θ and φ for −θ and −φ respectively, we get the other function: −2^(−1/2)·sin(−θ)·e^(i·(−φ)) = 2^(−1/2)·sinθ·e^(−i·φ).
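If you want to check that symmetry yourself, here’s a quick numerical sketch in Python – standard cmath and math modules only, and assuming Feynman’s (unnormalized) forms Y(1,±1) = ∓2^(−1/2)·sinθ·e^(±i·φ); the values of θ and φ are arbitrary:

```python
import cmath
import math

def Y_1_plus1(theta, phi):
    # Y(1,+1) = -2^(-1/2)·sin(theta)·e^(i·phi)
    return -2**-0.5 * math.sin(theta) * cmath.exp(1j * phi)

def Y_1_minus1(theta, phi):
    # Y(1,-1) = +2^(-1/2)·sin(theta)·e^(-i·phi)
    return 2**-0.5 * math.sin(theta) * cmath.exp(-1j * phi)

# Swapping (theta, phi) for (-theta, -phi) in Y(1,+1) should give Y(1,-1):
theta, phi = 0.7, 1.3
swapped = Y_1_plus1(-theta, -phi)
print(abs(swapped - Y_1_minus1(theta, phi)))  # ~0
```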


To get the probabilities, we need to take the absolute square of the whole thing, including the e^(−i·(E/ħ)·t) factor, but we know |e^(i·δ)|² = 1 for any value of δ. Why? Because the absolute square of any complex number is the product of the number with its complex conjugate, so |e^(i·δ)|² = e^(i·δ)·e^(−i·δ) = e^(i·0) = 1. So we only have to look at the absolute square of the Yl,m(θ, φ) and Fn,l(ρ) functions here. The Fn,l(ρ) function is a real-valued function, so its absolute square is just what it is: some real number (I gave you the formula for the ak coefficients in my post on it, and you shouldn’t worry about them: they’re real too). In contrast, the Yl,m(θ, φ) functions are complex-valued – most of them are, at least. Unsurprisingly, we find the probabilities are also symmetric:

P = |−2^(−1/2)·sinθ·e^(i·φ)|² = (−2^(−1/2)·sinθ·e^(i·φ))·(−2^(−1/2)·sinθ·e^(−i·φ))

= (2^(−1/2)·sinθ·e^(−i·φ))·(2^(−1/2)·sinθ·e^(i·φ)) = |2^(−1/2)·sinθ·e^(−i·φ)|² = (1/2)·sin²θ

Of course, for m = 0, the probability is just cos²θ. The graphs below are the polar graphs for the cos²θ and (1/2)·sin²θ functions respectively.
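Just to make sure we trust that |e^(i·δ)|² = 1 business, here’s a minimal check in Python – standard cmath and math modules only; the values of δ, θ and φ are arbitrary:

```python
import cmath
import math

# |e^(i·δ)|² = e^(i·δ)·e^(-i·δ) = 1, for any δ:
for delta in (0.0, 1.0, -2.5, 3.14):
    z = cmath.exp(1j * delta)
    print(abs(z)**2)  # 1.0, up to rounding, every time

# Probability for the l = 1, m = +1 orbital: |−2^(-1/2)·sinθ·e^(i·φ)|² = (1/2)·sin²θ
theta, phi = 0.8, 2.1
psi = -2**-0.5 * math.sin(theta) * cmath.exp(1j * phi)
prob = (psi * psi.conjugate()).real
print(abs(prob - 0.5 * math.sin(theta)**2))  # ~0
```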

These polar graphs are not so easy to interpret, so let me say a few words about them. The points that are plotted combine (a) some radial distance from the center – which I wrote as P because this distance is, effectively, a probability – with (b) the polar angle θ (so that’s one of the three coordinates). To be precise, the plot gives us, for a given ρ, all of the (θ, P) combinations. It works as follows. To calculate the probability for some ρ and θ (note that φ can be any angle), we must take the absolute square of that ψn,l,m = Yl,m(θ, φ)·Fn,l(ρ) product. Hence, we must calculate |Yl,m(θ, φ)·Fn,l(ρ)|² = |Fn,l(ρ)|²·cos²θ for m = 0, and (1/2)·|Fn,l(ρ)|²·sin²θ for m = ±1. Hence, the value of ρ determines the value of Fn,l(ρ), and that Fn,l(ρ) value then determines the shape of the polar graph. The three graphs below – P = cos²θ, P = (1/2)·cos²θ and P = (1/4)·cos²θ – illustrate the idea. Note that we’re measuring θ from the z-axis here, as we should. So that gives us the right orientation of this volume, as opposed to the other polar graphs above, which measured θ from the x-axis. So… Well… We’re getting there, aren’t we? 🙂

Now you’ll have two or three – or even more – obvious questions. The first one is: where is the third lobe? That’s a good question. Most illustrations will represent the p-orbitals as follows: three lobes. Well… Frankly, I am not quite sure here, but the equations speak for themselves: the probabilities only depend on ρ and θ. Hence, the azimuthal angle φ can be anything. So you just need to rotate those P = (1/2)·sin²θ and P = cos²θ curves about the z-axis. In case you wonder how to do that, the illustration below may inspire you. The second obvious question is about the size of those lobes. That 1/2 factor must surely matter, right? Well… We still have that Fn,l(ρ) factor, of course, but you’re right: that factor does not depend on the value for m: it’s the same for m = 0 or m = ±1. So… Well… Those representations above – with the three lobes, all of the same volume – may not be accurate. I found an interesting site – Atom in a Box – with an app that visualizes the atomic orbitals in a fun and exciting way. Unfortunately, it’s for Mac and iPhone only – but this YouTube video shows how it works. I encourage you to explore it. In fact, I need to explore it – but what I’ve seen on that YouTube video (I don’t have a Mac nor an iPhone) suggests the three-lobe illustrations may effectively be wrong: there’s some asymmetry here – which we’d expect, because those p-orbitals are actually supposed to be asymmetric! In fact, the most accurate pictures may well be the ones below. I took them from Wikimedia Commons. The author explains the use of the color codes as follows: “The depicted rigid body is where the probability density exceeds a certain value. The color shows the complex phase of the wavefunction, where blue means real positive, red means imaginary positive, yellow means real negative and green means imaginary negative.” I must assume he refers to the sign of a and b when writing a complex number as a + i·b.
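A quick side check on that 1/2 factor (this is my own little calculation, not something I claim settles the lobe-size question): if you add the angular probability factors for m = −1, 0 and +1, the θ-dependence drops out altogether – so the three p-orbitals taken together are spherically symmetric:

```python
import math

# Angular probability factors for l = 1 (from the Y(1,m) functions in the text):
#   m = 0:  cos²θ,   m = ±1: (1/2)·sin²θ each.
# Their sum is cos²θ + sin²θ = 1 for every θ: no angular dependence left.
for theta in (0.0, 0.5, 1.0, 2.0, 3.0):
    total = math.cos(theta)**2 + 0.5*math.sin(theta)**2 + 0.5*math.sin(theta)**2
    print(round(total, 12))  # 1.0
```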

The third obvious question is related to the one above: we should get some cloud, right? Not some rigid body or some surface. Well… I think you can answer that question yourself now, based on what the author of the illustration above wrote: if we change the cut-off value for the probability, then we’ll get a different shape. So you can play with that and, yes, it’s some cloud, and that’s what the mentioned app visualizes. 🙂

The fourth question is the most obvious of all. It’s the question I started this post with: what are those definite energy states? We have uncertainty, right? So how does that play out? Now that is a question I’ll try to tackle in my next post. Stay tuned! 🙂

Post scriptum: Let me add a few remarks here so as to – hopefully – contribute to an even better interpretation of what’s going on here. As mentioned, the key to understanding is, obviously, the following basic functional form:

ψ(r, t) = e^(−i·(E/ħ)·t)·ψ(r)

Wikipedia refers to the e^(−i·(E/ħ)·t) factor as a time-dependent phase factor which, as you can see, we can separate out because we are looking at a definite energy state here. Note the minus sign in the exponent – which reminds us of the minus sign in the exponent of the elementary wavefunction, which we wrote as:

a·e^(−i·θ) = a·e^(−i·[(E/ħ)·t − (p/ħ)·x]) = a·e^(i·[(p/ħ)·x − (E/ħ)·t]) = a·e^(−i·(E/ħ)·t)·e^(i·(p/ħ)·x)

We know this elementary wavefunction is problematic in terms of interpretation because its absolute square gives us some constant probability P(x, t) = |a·e^(−i·[(E/ħ)·t − (p/ħ)·x])|² = a². In other words, at any point in time, our electron is equally likely to be anywhere in space. That is not consistent with the idea of our electron being somewhere at some point in time.
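You can verify that constant-probability business numerically. The sketch below (Python, with ħ, E, p and a set to arbitrary values in natural units) shows |ψ|² = a² regardless of x and t:

```python
import cmath

hbar = 1.0           # natural units, just for illustration
a, E, p = 0.5, 1.0, 2.0

def psi(x, t):
    # elementary wavefunction a·e^(-i·[(E/ħ)·t - (p/ħ)·x])
    return a * cmath.exp(-1j * (E*t - p*x) / hbar)

# |ψ|² = a² = 0.25 everywhere, at every time:
for (x, t) in [(0, 0), (1, 0), (0, 1), (3.7, -2.2)]:
    print(abs(psi(x, t))**2)  # 0.25, up to rounding, every time
```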

The other question is: what reference frame do we use to measure E and p? Indeed, the value of E and p = (px, py, pz) depends on our reference frame: from the electron’s own point of view, it has no momentum whatsoever: p = 0. Fortunately, we do have a point of reference here: the nucleus of our hydrogen atom. And our own position, of course, because you should note, indeed, that both the subject and the object of the observation are necessary to define the Cartesian x, y, z – or, more relevant in this context, the polar r = ρ, θ, φ – coordinates.

This, then, defines some finite or infinite box in space in which the (linear) momentum (p) of our electron vanishes, and then we just need to solve Schrödinger’s diffusion equation to find the solutions for ψ(r). These solutions are more conveniently written in terms of the radial distance ρ, the polar angle θ, and the azimuthal angle φ – see the ‘Grand Equation’ above.

The functions below are the Yl,m(θ, φ) functions for l = 1.


The interesting thing about these Yl,m(θ, φ) functions is the e^(i·φ) and/or e^(−i·φ) factor. Indeed, note the following:

  1. Because the sinθ and cosθ factors are real-valued, they only define some envelope for the ψ(r) function.
  2. In contrast, the e^(i·φ) and/or e^(−i·φ) factor defines some phase shift.

Let’s have a look at the physicality of the situation, which is depicted below.


The nucleus of our hydrogen atom is at the center. The polar angle is measured from the z-axis, and we know we only have an amplitude there for m = 0, so let’s look at what that cosθ factor does. If θ = 0°, the amplitude is just what it is, but when θ > 0°, then |cosθ| < 1 and, therefore, the probability P = |Fn,l(ρ)|²·cos²θ will diminish. Hence, for the same radial distance (ρ), we are less likely to find the electron at some angle θ > 0° than on the z-axis itself. Now that makes sense, obviously. You can work out the argument for m = ±1 yourself, I hope. [The axis of symmetry will be different, obviously!]

In contrast, the e^(i·φ) and/or e^(−i·φ) factors work very differently. These just give us a phase shift, as illustrated below. A re-set of our zero point for measuring time, so to speak, and the e^(i·φ) and/or e^(−i·φ) factor effectively disappears when we’re calculating probabilities, which is consistent with the fact that this angle clearly doesn’t influence the magnitude of the amplitude fluctuations. So… Well… That’s it, really. I hope you enjoyed this! 🙂
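To see that the e^(i·φ) factor is, effectively, nothing but a re-set of our zero point for measuring time, try this little Python sketch (the values of E/ħ and φ are arbitrary):

```python
import cmath

E_over_hbar = 1.0
phi = 0.9  # azimuthal angle

def psi(t, phase=0.0):
    # time-dependent factor e^(-i·(E/ħ)·t), possibly multiplied by e^(i·phase)
    return cmath.exp(1j * phase) * cmath.exp(-1j * E_over_hbar * t)

t = 2.0
# The e^(i·φ) factor drops out of the probability...
print(abs(psi(t, phi))**2, abs(psi(t))**2)            # both 1.0, up to rounding
# ...because it amounts to a shift of our zero point of time, by φ/(E/ħ):
print(abs(psi(t, phi) - psi(t - phi / E_over_hbar)))  # ~0
```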

Some more on symmetries…

In our previous post, we talked a lot about symmetries in space – in a rather playful way. Let’s try to take it further here by doing some more thinking on symmetries in spacetime. This post will pick up some older stuff – from my posts on states and the related quantum math in November 2015, for example – but that shouldn’t trouble you too much. On the contrary, I actually hope to tie up some loose ends here.

Let’s first review some obvious ideas. Think about the direction of time. On a time axis, time goes from left to right. It will usually be measured from some zero point – like when we started our experiment or something 🙂 – to some +t point, but we may also think of some point in time before our zero point, so the minus (−t) points – the left side of the axis – make sense as well. So the direction of time is clear and intuitive. Now, what does it mean to reverse the direction of time? We need to distinguish two things here: the convention, and… Well… Reality. If we would suddenly decide to reverse the direction in which we measure time, then that’s just another convention. We don’t change reality: trees and kids would still grow the way they always did. 🙂 We would just have to change the numbers on our clocks or, alternatively, the direction of rotation of the hand(s) of our clock, as shown below. [I only showed the hour hand because… Well… I don’t want to complicate things by introducing two time units. But adding the minute hand doesn’t make any difference.]

Now, imagine you’re the dictator who decided to change our time measuring convention. How would you go about it? Would you change the numbers on the clock or the direction of rotation? Personally, I’d be in favor of changing the direction of rotation. Why? Well… First, we wouldn’t have to change expressions such as: “If you are looking north right now, then west is in the 9 o’clock direction, so go there.” 🙂 More importantly, it would align our clocks with the way we’re measuring angles. On the other hand, it would not align our clocks with the way the argument (θ) of our elementary wavefunction ψ = a·e^(−i·θ) = a·e^(−i·(E·t − p·x)/ħ) is measured, because that’s… Well… Clockwise.

So… What are the implications here? We would need to change t for −t in our wavefunction as well, right? Yep. Good point. So that’s another convention that would change: we should write our elementary wavefunction now as ψ = a·e^(i·(E·t − p·x)/ħ). So we would have to re-define θ as θ = −E·t + p·x = p·x − E·t. So… Well… Done!

So… Well… What’s next? Nothing. Note that we’re not changing reality here. We’re just adapting our formulas to a new dictatorial convention according to which we should count time from positive to negative – like 2, 1, 0, −1, −2 etcetera, as shown below. Fortunately, we can fix all of our laws and formulas in physics by swapping t for −t. So that’s great. No sweat.
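Note that the new convention amounts to taking the complex conjugate of our wavefunction (for real a) and so, sure enough, all probabilities come out the same. A quick Python check, with arbitrary values for E, p, x and t:

```python
import cmath

hbar, a, E, p = 1.0, 1.0, 1.5, 0.8

def psi_old(x, t):
    # old convention: a·e^(-i·(E·t - p·x)/ħ)
    return a * cmath.exp(-1j * (E*t - p*x) / hbar)

def psi_new(x, t):
    # the dictator's convention: a·e^(i·(E·t - p·x)/ħ)
    return a * cmath.exp(1j * (E*t - p*x) / hbar)

x, t = 1.2, 3.4
# The two are complex conjugates of each other...
print(abs(psi_new(x, t) - psi_old(x, t).conjugate()))  # ~0
# ...so every probability comes out the same:
print(abs(psi_old(x, t))**2 == abs(psi_new(x, t))**2)  # True
```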

Is that all? Yes. We don’t need to do anything else. We’ll still measure the argument of our wavefunction as an angle, so that’s… Well… After changing our convention, it’s now clockwise. 🙂 Whatever you want to call it: it’s still the same direction. Our dictator can’t change physical reality 🙂

Hmm… But so we are obviously interested in changing physical reality. I mean… Anyone can become a dictator, right? In contrast, we – enlightened scientists – want to really change the world, don’t we? 🙂 So what’s a time reversal in reality? Well… I don’t know… You tell me. 🙂 We may imagine some movie being played backwards, or trees and kids shrinking instead of growing, or some bird flying backwards – and I am not talking about the hummingbird here. 🙂

Hey! The latter illustration – that bird flying backwards – is probably the better one: if we reverse the direction of time – in reality, that is – then we should also reverse all directions in space. But… Well… What does that mean, really? We need to think in terms of force fields here. A stone that’d be falling must now go back up. Two opposite charges that were going towards each other should now move away from each other. But… My God! Such a world cannot exist, can it?

No. It cannot. And we don’t need to invoke the second law of thermodynamics for that. 🙂 None of what happens in a movie that’s played backwards makes sense: a heavy stone does not suddenly fly up and decelerate upwards. So it is not like the anti-matter world we described in our previous post. No. We can effectively imagine some world in which all charges have been replaced by their opposite: we’d have positive electrons (positrons) around negatively charged nuclei consisting of antiprotons and antineutrons and, somehow, negative masses. But Coulomb’s law would still tell us two opposite charges – q1 and −q2, for example – don’t repel but attract each other, with a force that’s proportional to the product of their charges, i.e. q1·(−q2) = −q1·q2. Likewise, Newton’s law of gravitation would still tell us that two masses m1 and m2 – negative or positive – will attract each other with a force that’s proportional to the product of their masses, i.e. m1·m2 = (−m1)·(−m2). If you’d make a movie in the antimatter world, it would look just like any other movie. It would definitely not look like a movie being played backwards.

In fact, the latter formula – m1·m2 = (−m1)·(−m2) – tells us why: we’re not changing anything by putting a minus sign in front of all of our variables, which are time (t), position (x), mass (m) and charge (q). [Did I forget one? I don’t think so.] Hence, the famous CPT Theorem – which tells us that a world in which (1) time is reversed, (2) all charges have been conjugated (i.e. all particles have been replaced by their antiparticles), and (3) all spatial coordinates now have the opposite sign, is entirely possible (because it would obey the same Laws of Nature that we, in our world, have discovered over the past few hundred years) – is actually nothing but a tautology. Now, I mean that literally: a tautology is a statement that is true by necessity or by virtue of its logical form. Well… That’s the case here: if we flip the signs of all of our variables, we basically just agreed to count or measure everything from positive to negative. That’s it. Full stop. Such exotic convention is… Well… Exotic, but it cannot change the real world. Full stop.
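The tautology is easy to see in numbers. A two-line Python check of those products, with arbitrary values for the masses and charges:

```python
# Flip the sign of every variable: masses, charges, positions, time.
m1, m2, q1, q2 = 2.0, 3.0, 1.0, -1.0

# Newton: the product of the masses is unchanged...
print(m1 * m2 == (-m1) * (-m2))   # True
# ...and Coulomb: so is the product of the charges:
print(q1 * q2 == (-q1) * (-q2))   # True
```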

Of course, this leaves the more intriguing questions entirely open. Partial symmetries. Like time reversal only. 🙂 Or charge conjugation only. 🙂 So let’s think about that.

We know that the world that we see in a mirror must be made of anti-matter but, apart from that particularity, that world makes sense: if we drop a stone in front of the mirror, the stone in the mirror will drop down too. Two like charges will be seen as repelling each other in the mirror too, and concepts such as kinetic or potential energy look just the same. So time just seems to tick away in both worlds – no time reversal here! – and… Well… We’ve got two CP-symmetrical worlds here, don’t we? We only flipped the sign of the coordinate frame and of the charges. Both are possible, right? And what’s possible must exist, right? Well… Maybe. That’s the next step. Let’s first see if both are possible. 🙂

Now, when you’ve read my previous post, you’ll note that I did not flip the z-coordinate when reflecting my world in the mirror. That’s true. But… Well… That’s entirely beside the point. We could flip the z-axis too, and then we’d have a full parity inversion. [Or parity transformation – sounds more serious, doesn’t it? But it’s only a simple inversion, really.] It really doesn’t matter. The point is: axial vectors have the opposite sign in the mirror world, and so it’s not only about whether or not an antimatter world is possible (it should be, right?): it’s about whether or not the sign reversal of all of those axial vectors makes sense in each and every situation. The illustration below, for example, shows how a left-handed neutrino should be a right-handed antineutrino in the mirror world. I hope you understand the left- versus right-handed thing. Think, for example, of how the left-circularly polarized wavefunction below would look in the mirror. Just apply the customary right-hand rule to determine the direction of the angular momentum vector. You’ll agree it will be right-circularly polarized in the mirror, right? That’s why we need the charge conjugation: think of the magnetic moment of a circulating charge! So… Well… I can’t dwell on this too much but – if Maxwell’s equations are to hold – then that world in the mirror must be made of antimatter.

Now, we know that some processes – in our world – are not entirely CP-symmetrical. I wrote about this at length in previous posts, so I won’t dwell on these experiments here. The point is: these experiments – which are not easy to understand – lead physicists, philosophers, bloggers and what have you to solemnly state that the world in the mirror cannot really exist. And… Well… They’re right. However, I think their observations are beside the point. Literally.

So… Well… I would just like to make a very fundamental philosophical remark about all those discussions. My point is quite simple:

We should realize that the mirror world and our world are effectively separated by the mirror. So we should not be looking at stuff in the mirror from our perspective, because that perspective is well… Outside of the mirror. A different world. 🙂 In my humble opinion, the valid point of reference would be the observer in the mirror, like the photographer in the image below. Now note the following: if the real photographer, on this side of the mirror, would have a left-circularly polarized beam in front of him, then the imaginary photographer, on the other side of the mirror, would see the mirror image of this left-circularly polarized beam as a left-circularly polarized beam too. 🙂 I know that sounds complicated but re-read it a couple of times and – I hope – you’ll see the point. If you don’t… Well… Let me try to rephrase it: the point is that the observer in the mirror would be seeing our world – just the same laws and what have you, all makes sense! – but he would see our world in his world, so he’d see it in the mirror world. 🙂


Capito? If you would actually be living in the mirror world, then all the things you would see in the mirror world would make perfect sense. But you would be living in the mirror world. You would not look at it from outside, i.e. from the other side of the mirror. In short, I actually think the mirror world does exist – but in the mirror only. 🙂 […] I am, obviously, joking here. Let me be explicit: our world is our world, and I think those CP violations in Nature are telling us that it’s the only real world. The other worlds exist in our mind only – or in some mirror. 🙂

Post scriptum: I know the Die Hard philosophers among you will now have an immediate rapid-backfire question. [Hey – I just invented a new word, didn’t I? A rapid-backfire question. Neat.] How would the photographer in the mirror look at our world? The answer to that question is simple: symmetry! He (or she) would think it’s a mirror world only. His world and our world would be separated by the same mirror. So… What are the implications here?

Well… That mirror is only a piece of glass with a coating. We made it. Or… Well… Some man-made company made it. 🙂 So… Well… If you think that observer in the mirror – I am talking about that image of the photographer in that picture above now – would actually exist, then… Well… Then you need to be aware of the consequences: the corollary of his existence is that you do not exist. 🙂 And… Well… No. I won’t say more. If you’re reading stuff like this, then you’re smart enough to figure it out for yourself. We live in one world. Quantum mechanics tells us the perspective on that world matters very much – amplitudes are different in different reference frames – but… Well… Quantum mechanics – or physics in general – does not give us many degrees of freedom. None, really. It basically tells us the world we live in is the only world that’s possible, really. But… Then… Well… That’s just because physics… Well… When everything is said and done, it’s just mankind’s drive to ensure our perception of the Universe lines up with… Well… What we perceive it to be. 😦 or 🙂 Whatever your appreciation of it. Those Great Minds did an incredible job. 🙂

Symmetries and transformations

In my previous post, I promised to do something on symmetries. Something simple but then… Well… You know how it goes: one question always triggers another one. 🙂

Look at the situation in the illustration on the left below. We suppose we have something real going on there: something is moving from left to right (so that’s in the 3 o’clock direction), and then something else is going around clockwise (so that’s not the direction in which we measure angles – including the argument θ of our wavefunction – because that’s always counter-clockwise, as I note at the bottom of the illustration). To be precise, we should note that the angular momentum here is all about the y-axis, so the angular momentum vector L points in the (positive) y-direction. We get that direction from the familiar right-hand rule, which is illustrated in the top right corner.

Now, suppose someone else is looking at this from the other side – or just think of yourself going around a full 180° to look at the same thing from the back side. You’ll agree you’ll see the same thing going from right to left (so that’s in the 9 o’clock direction now – or, if our clock is transparent, the 3 o’clock direction of our reversed clock). Likewise, the thing that’s turning around will now go counter-clockwise.

Note that both observers – so that’s me and that other person (or myself after my walk around this whole thing) – use a regular coordinate system, which implies the following:

  1. We’ve got regular 90° angles between our coordinate axes.
  2. Our x-axis goes from negative to positive from left to right, and our y-axis does the same going away from us.
  3. We also both define our z-axis using, once again, the ubiquitous right-hand rule, so our z-axis points upwards.

So we have two observers looking at the same reality – some linear as well as some angular momentum – but from opposite sides. And so we’ve got a reversal of both the linear as well as the angular momentum. Not in reality, of course, because we’re looking at the same thing. But we measure it differently. Indeed, if we use the subscripts 1 and 2 to denote the measurements in the two coordinate systems, we find that p2 = –p1. Likewise, we also find that L2 = –L1.

Now, when you see these two equations, you will probably not worry about that p2 = –p1 equation – although you should, because it’s actually only valid for this rather particular orientation of the linear momentum (I’ll come back to that in a moment). It’s the L2 = –L1 equation which should surprise you most. Why? Because you’ve always been told there is a big difference between (1) real vectors (aka polar vectors), like the momentum p, or the velocity v, or the force F, and (2) pseudo-vectors (aka axial vectors), like the angular momentum L. You may also remember how to distinguish between the two: if you change the direction of the axes of your reference frame, polar vectors will change sign too, as opposed to axial vectors: axial vectors do not swap sign if we swap the coordinate signs.

So… Well… How does that work here? In fact, what we should ask ourselves is: why does that not work here? Well… It’s simple, really. We’re not changing the direction of the axes here. Or… Well… Let me be more precise: we’re only swapping the sign of the x- and y-axes. We did not flip the z-axis. So we turned things around, but we didn’t turn them upside down. It makes a huge difference. Note, for example, that if all of the linear momentum would have been in the z-direction only (so our p vector would have been pointing in the z-direction, and in the z-direction only), it would not swap sign. The illustration below shows what really happens with the coordinates of some vector when we’re doing a rotation. It’s, effectively, only the x- and y-coordinates that flip sign.

It’s easy to see that this rotation about the z-axis here preserves our deep sense of ‘up’ versus ‘down’, but that it swaps ‘left’ for ‘right’, and vice versa. Note that this is not a reflection. We are not looking at some mirror world here. The difference between a reflection (a mirror world) and a rotation (the real world seen from another angle) is illustrated below. It’s quite confusing but, unlike what you might think, a reflection does not swap left for right. It does turn things inside out, but that’s what a rotation does as well: near becomes far, and far becomes near.
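Here’s a little Python sketch of that difference – a 180° rotation about the z-axis versus a reflection in the xz-plane – applied to a polar vector p and the axial vector L = r × p (the numbers are arbitrary):

```python
def cross(a, b):
    # angular momentum L = r × p
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

r = (1.0, 2.0, 0.5)
p = (3.0, -1.0, 0.0)
L = cross(r, p)

# 180° rotation about the z-axis: (x, y, z) -> (-x, -y, z), applied to both r and p.
rot = lambda v: (-v[0], -v[1], v[2])
L_rot = cross(rot(r), rot(p))
print(L, L_rot)
# The x- and y-components of L flip sign too, and the z-component does not:
# L_rot == (-L[0], -L[1], L[2]) - so far, L behaves just like r and p.

# A reflection (mirror in the xz-plane): (x, y, z) -> (x, -y, z).
ref = lambda v: (v[0], -v[1], v[2])
L_ref = cross(ref(r), ref(p))
print(L_ref)
# Now L comes out as (-L[0], L[1], -L[2]): the axial vector does NOT transform
# like the polar vectors r and p do - that's the polar/axial distinction.
```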

Before we move on, let me say a few things about the mirror world and, more in particular, about the obvious question: could it possibly exist? Well… What do you think? Your first reaction might well be: “Of course! What nonsense question! We just walk around whatever it is that we’re seeing – or, what amounts to the same, we just turn it around – and there it is: that’s the mirror world, right? So of course it exists!” Well… No. That’s not the mirror world. That’s just the real world seen from the opposite direction, and that world… Well… That’s just the real world. 🙂 The mirror world is, literally, the world in the mirror – like the photographer in the illustration below. We don’t swap left for right here: some object going from left to right in the real world is still going from left to right in the mirror world!

Of course, you may now involve the photographer in the picture above and observe – note that you’re now an observer of the observer of the mirror 🙂 – that, if he would move his left arm in the real world, the photographer in the mirror world would be moving his right arm. But… Well… No. You’re saying that because you’re now imagining that you’re the photographer in the mirror world yourself, who’s looking at the real world from inside, so to speak. So you’ve rotated the perspective in your mind and you’re saying it’s his right arm because you imagine yourself to be the photographer in the mirror. We usually do that because… Well… Because we look in a mirror every day, right? So we’re used to seeing ourselves that way and we always think it’s us we’re seeing. 🙂 However, the illustration above is correct: the mirror world only swaps near for far, and far for near, so it only swaps the sign of the y-axis.

So the question is relevant: could the mirror world actually exist? What we’re really asking here is the following: can we swap the sign of one coordinate axis only in all of our physical laws and equations and… Well… Do we then still get the same laws and equations? Do we get the same Universe – because that’s what those laws and equations describe? If so, our mirror world can exist. If not, then not.

Now, I’ve done a post on that, in which I explain that the mirror world can only exist if it would consist of anti-matter. So if our real world and the mirror world would actually meet, they would annihilate each other. 🙂 But that post is quite technical. Here I want to keep it very simple: I basically only want to show what the rotation operation implies for the wavefunction. There is no doubt whatsoever that the rotated world exists. In fact, the rotated world is just our world. We walk around some object, or we turn it around, but so we’re still watching the same object. So we’re not thinking about the mirror world here. We just want to know what things look like when we adopt some other perspective.

So, back to the starting point: we just have two observers here, who look at the same thing but from opposite directions. Mathematically, this corresponds to a rotation of our reference frame about the z-axis of 180°. Let me spell out – somewhat more precisely – what happens to the linear and angular momentum here:

  1. The direction of the linear momentum in the xy-plane swaps direction.
  2. The angular momentum about the y-axis, as well as about the x-axis, swaps direction too.

Note that the illustration only shows angular momentum about the y-axis, but you can easily verify the statement about the angular momentum about the x-axis. In fact, the angular momentum about any line in the xy-plane will swap direction.

Of course, the x-, y-, z-axes in the other reference frame are different than mine, and so I should give them a subscript, right? Or, at the very least, write something like x’, y’, z’, so we have a primed reference frame here, right? Well… Maybe. Maybe not. Think about it. 🙂 A coordinate system is just a mathematical thing… Only the momentum is real… Linear or angular… Equally real… And then Nature doesn’t care about our position, does it? So… Well… No subscript needed, right? Or… Well… What do you think? 🙂

It’s just funny, isn’t it? It looks like we can’t really separate reality and perception here. Indeed, note how our p2 = −p1 and L2 = −L1 equations already mix reality with how we perceive it. It’s the same thing in reality but the coordinates of p1 and L1 are positive, while the coordinates of p2 and L2 are negative. To be precise, these coordinates will look like this:

  1. p1 = (p, 0, 0) and L1 = (0, L, 0)
  2. p2 = (−p, 0, 0) and L2 = (0, −L, 0)

So are they two different things or are they not? 🙂 Think about it. I’ll move on in the meanwhile. 🙂
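To make those sign flips a bit more tangible, here is a minimal numerical sketch – the numbers for p1 and L1 are made up, just for illustration – of what a 180° rotation of the reference frame about the z-axis does to the components:

```python
import math

# A minimal sketch, with made-up numbers: what a 180° rotation of the
# reference frame about the z-axis does to the components of p and L.

def rotate_about_z(v, theta):
    # Components of the same vector expressed in axes rotated by theta about z.
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * y, -s * x + c * y, z)

p1 = (1.0, 0.0, 0.0)   # linear momentum along x: p1 = (p, 0, 0)
L1 = (0.0, 2.0, 0.0)   # angular momentum along y: L1 = (0, L, 0)

p2 = rotate_about_z(p1, math.pi)
L2 = rotate_about_z(L1, math.pi)
print(p2)  # ≈ (−1, 0, 0): p2 = (−p, 0, 0)
print(L2)  # ≈ (0, −2, 0): L2 = (0, −L, 0)
```

So the x- and y-components swap sign, and the z-component stays put – exactly the p2 = −p1 and L2 = −L1 story above.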

Now, you probably know a thing or two about parity symmetry, or P-symmetry: if we flip the sign of all coordinates, then we’ll still find the same physical laws, like F = m·a and what have you. [It works for all physical laws, including quantum-mechanical laws – except those involving the weak force (read: radioactive decay processes).] But here we are talking rotational symmetry. That’s not the same as P-symmetry. If we flip the signs of all coordinates, we’re also swapping ‘up’ for ‘down’, so we’re not only turning around, but we’re also getting upside down. The difference between rotational symmetry and P-symmetry is shown below.

[Illustration: up and down swap]

As mentioned, we’ve talked about P-symmetry at length in other posts, and you can easily google a lot more on that. The question we want to examine here – just as a fun exercise – is the following:

How does that rotational symmetry work for a wavefunction?

The very first illustration in this post gave you the functional form of the elementary wavefunction e^(i·θ) = e^(−i·(E·t − p·x)/ħ). We should actually use a bold type x = (x, y, z) in this formula but we’ll assume we’re talking something similar to that p vector: something moving in the x-direction only – or in the xy-plane only. The z-component doesn’t change. Now, you know that we can reduce all actual wavefunctions to some linear combination of such elementary wavefunctions by doing a Fourier decomposition, so it’s fine to look at the elementary wavefunction only – so we don’t make it too complicated here. Now think of the following.

The energy E in the e^(i·θ) = e^(−i·(E·t − p·x)/ħ) function is a scalar, so it doesn’t have any direction and we’ll measure it the same from both sides – as kinetic or potential energy or, more likely, by adding both. But… Well… Writing e^(−i·(E·t − p·x)/ħ) or e^(−i·(E·t + p·x)/ħ) is not the same, right? No, it’s not. However, think of it as follows: we won’t be changing the direction of time, right? So it’s OK to not change the sign of E. In fact, we can re-write the two expressions as follows:

  1. e^(−i·(E·t − p·x)/ħ) = e^(−i·(E/ħ)·t)·e^(i·(p/ħ)·x)
  2. e^(−i·(E·t + p·x)/ħ) = e^(−i·(E/ħ)·t)·e^(−i·(p/ħ)·x)

The first wavefunction describes some particle going in the positive x-direction, while the second wavefunction describes some particle going in the negative x-direction, so… Well… That’s exactly what we see in those two reference frames, so there is no issue whatsoever. 🙂 It’s just… Well… I just wanted to show the wavefunction does look different too when looking at something from another angle.
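Just to convince ourselves that the re-write is right, here is a quick numerical check – a sketch with arbitrary values for E, p, x and t, in natural units (ħ = 1):

```python
import cmath

# Numerical check of the factorization: e^(−i·(E·t − p·x)/ħ) should equal
# e^(−i·(E/ħ)·t) times e^(i·(p/ħ)·x). The values below are arbitrary.
E, p, x, t, hbar = 1.3, 0.7, 0.4, 2.1, 1.0

lhs = cmath.exp(-1j * (E * t - p * x) / hbar)
rhs = cmath.exp(-1j * (E / hbar) * t) * cmath.exp(1j * (p / hbar) * x)
print(abs(lhs - rhs))  # ≈ 0: the time factor and the space factor separate cleanly
```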

So why am I writing about this? Why am I being fussy? Well… It’s just to show you that those transformations are actually quite natural – just as natural as it is to see some particle go in one direction in one reference frame and see it go in the other in the other. 🙂 It also illustrates another point that I’ve been trying to make: the wavefunction is something real. It’s not just a figment of our imagination. The real and imaginary part of our wavefunction have a precise geometrical meaning – and I explained what that might be in my more speculative posts, which I’ve brought together in the Deep Blue page of this blog. But… Well… I can’t dwell on that here because… Well… You should read that page. 🙂

The point to note is the following: we do have different wavefunctions in different reference frames, but these wavefunctions describe the same physical reality, and they also do respect the symmetries we’d expect them to respect, except… Well… The laws describing the weak force don’t, but I wrote about that a very long time ago, and it was not in the context of trying to explain the relatively simple basic laws of quantum mechanics. 🙂 If you’re interested, you should check out my post(s) on that or, else, just google a bit. It’s really exciting stuff, but not something that will help you much to understand the basics, which is what we’re trying to do here. 🙂

The second point to note is that those transformations of the wavefunction – or of quantum-mechanical states – which we go through when rotating our reference frame, for example – are really quite natural. There’s nothing special about them. We had such transformations in classical mechanics too! But… Well… Yes, I admit they do look complicated. But then that’s why you’re so fascinated and why you’re reading this blog, isn’t it? 🙂

Post scriptum: It’s probably useful to be somewhat more precise on all of this. You’ll remember we visualized the wavefunction in some of our posts using the animation below. It uses a left-handed coordinate system, which is rather unusual, but then it may have been made with software that uses a left-handed coordinate system (like RenderMan, for example). Now the rotating arrow at the center moves with time and gives us the polarization of our wave. Applying our customary right-hand rule, you can see this beam is left-circularly polarized. [I know… It’s quite confusing, but just go through the motions here and be consistent.]

[Illustration: animation]

Now, you know that e^(i·(p/ħ)·x) and e^(−i·(p/ħ)·x) are each other’s complex conjugate:

  1. e^(i·k·x) = cos(k·x) + i·sin(k·x)
  2. e^(−i·k·x) = cos(−k·x) + i·sin(−k·x) = cos(k·x) − i·sin(k·x)

Their real part – the cosine function – is the same, but the imaginary part – the sine function – has the opposite sign. So, assuming the direction of propagation is, effectively, the x-direction, then what’s the polarization of the mirror image? Well… The wave will now go from right to left, and its polarization… Hmm… Well… What? 

Well… If you can’t figure it out, then just forget about those signs and just imagine you’re effectively looking at the same thing from the backside. In fact, if you have a laptop, you can push the screen down and go around your computer. 🙂 There’s no shame in that. In fact, I did that just to make sure I am not talking nonsense here. 🙂 If you look at this beam from the backside, you’ll effectively see it go from right to left – instead of from what you see on this side, which is a left-to-right direction. And as for its polarization… Well… The angular momentum vector swaps direction too but the beam is still left-circularly polarized. So… Well… That’s consistent with what we wrote above. 🙂 The real world is real, and axial vectors are as real as polar vectors. This real beam will only appear to be right-circularly polarized in a mirror. Now, as mentioned above, that mirror world is not our world. If it would exist – in some other Universe – then it would be made up of anti-matter. 🙂

So… Well… Might it actually exist? Is there some other world made of anti-matter out there? I don’t know. We need to think about that reversal of ‘near’ and ‘far’ too: as mentioned, a mirror turns things inside out, so to speak. So what’s the implication of that? When we walk around something – or do a rotation – then the reversal between ‘near’ and ‘far’ is something physical: we go near to what was far, and we go away from what was near. But so how would we get into our mirror world, so to speak? We may say that this anti-matter world in the mirror is entirely possible, but then how would we get there? We’d need to turn ourselves, literally, inside out – sort of shrink to a point and then come back out of it to do that parity inversion along our line of sight. So… Well… I don’t see that happen, which is why I am a fan of the One World hypothesis. 🙂 So I think the mirror world is just what it is: the mirror world. Nothing real. But… Then… Well… What do you think? 🙂

Quantum-mechanical magnitudes

As I was writing about those rotations in my previous post (on electron orbitals), I suddenly felt I should do some more thinking on (1) symmetries and (2) the concept of quantum-mechanical magnitudes of vectors. I’ll write about the first topic (symmetries) in some other post. Let’s first tackle the latter concept. Oh… And for those I frightened with my last post… Well… This should really be an easy read. More of a short philosophical reflection about quantum mechanics. Not a technical thing. Something intuitive. At least I hope it will come out that way. 🙂

First, you should note that the fundamental idea that quantities like energy, or momentum, may be quantized is a very natural one. In fact, it’s what the early Greek philosophers thought about Nature. Of course, while the idea of quantization comes naturally to us (I think it’s easier to understand than, say, the idea of infinity), it is, perhaps, not so easy to deal with it mathematically. Indeed, most mathematical ideas – like functions and derivatives – are based on what I’ll loosely refer to as continuum theory. So… Yes, quantization does yield some surprising results, like that formula for the magnitude of some vector J:

[Illustration: magnitude formulas]

The J·J in the classical formula above is, of course, the equally classical vector dot product, and the formula itself is nothing but Pythagoras’ Theorem in three dimensions. Easy. I just put a + sign in front of the square roots so as to remind you we actually always have two square roots and that we should take the positive one. 🙂

I will now show you how we get that quantum-mechanical formula. The logic behind it is fairly straightforward but, at the same time… Well… You’ll see. 🙂 We know that a quantum-mechanical variable – like the spin of an electron, or the angular momentum of an atom – is not continuous but discrete: it will have some value m = j, j−1, j−2, …, −(j−2), −(j−1), −j. Our j here is the maximum value of the magnitude of the component of our vector (J) in the direction of measurement, which – as you know – is usually written as Jz. Why? Because we will usually choose our coordinate system such that our z-axis is aligned accordingly. 🙂 Those values j, j−1, j−2, …, −(j−2), −(j−1), −j are separated by one unit. That unit would be Planck’s quantum of action ħ ≈ 1.0545718×10^(−34) N·m·s – by the way, isn’t it amazing we can actually measure such tiny stuff in some experiment? 🙂 – if J would happen to be the angular momentum, but the approach here is more general – action can express itself in various ways 🙂 – so the unit doesn’t matter: it’s just the unit, so that’s just one. 🙂 It’s easy to see that this separation implies j must be some integer or half-integer. [Of course, now you might think the values of a series like 2.4, 1.4, 0.4, −0.6, −1.6 are also separated by one unit, but… Well… That would violate the most basic symmetry requirement so… Well… No. Our j has to be an integer or a half-integer. Please also note that the number of possible values for m is equal to 2j+1, as we’ll use that in a moment.]

OK. You’re familiar with this by now and so I should not repeat the obvious. To make things somewhat more real, let’s assume j = 3/2, so m = +3/2, +1/2, −1/2 or −3/2. Now, we don’t know anything about the system and, therefore, these four values are all equally likely. Now, you may not agree with this assumption but… Well… You’ll have to agree that, at this point, you can’t come up with anything else that would make sense, right? It’s just like a classical situation: J might point in any direction, so we have to give all angles an equal probability. [In fact, I’ll show you – in a minute or so – that you actually have a point here: we should think some more about this assumption – but that’s for later. I am asking you to just go along with this story for now.]

So the expected value of Jz is equal to E[Jz] = (1/4)·(3/2)+(1/4)·(1/2)+(1/4)·(−1/2)+(1/4)·(−3/2) = 0. Nothing new here. We just multiply probabilities with all of the possible values to get an expected value. So we get zero here because our values are distributed symmetrically around the zero point. No surprise. Now, to calculate a magnitude, we don’t need Jz but Jz². In case you wonder, that’s what this squaring business is all about: we’re abstracting away from the direction and so we’re going to square both positive as well as negative values to then add it all up and take a square root. Now, the expected value of Jz² is equal to E[Jz²] = (1/4)·(3/2)²+(1/4)·(1/2)²+(1/4)·(−1/2)²+(1/4)·(−3/2)² = 5/4 = 1.25. Some positive value.
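As a sanity check on the arithmetic, here is a small sketch using exact fractions:

```python
from fractions import Fraction

# The 2j+1 equally likely Jz values for j = 3/2, in units of ħ.
j = Fraction(3, 2)
values = [j - k for k in range(int(2 * j) + 1)]   # 3/2, 1/2, −1/2, −3/2

E_Jz = sum(values) / len(values)                  # expected value of Jz
E_Jz2 = sum(v * v for v in values) / len(values)  # expected value of Jz²
print(E_Jz)   # 0
print(E_Jz2)  # 5/4
```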

You may note that it’s a bit larger than the average of the absolute value of our variable, which is equal to (|3/2|+|1/2|+|−1/2|+|−3/2|)/4 = 1, but that’s just because the squaring favors larger values. 🙂 Also note that, of course, we’d also get some positive value if Jz would be a continuous variable over the [−3/2, +3/2] interval, but I’ll let you think about what positive value we’d get for Jz² assuming Jz is uniformly distributed over the [−3/2, +3/2] interval, because that calculation is actually not as straightforward as it may seem at first. In any case, these considerations are not very relevant to our story here, so let’s move on.

Of course, our z-direction was random, and so we get the same thing for whatever direction. More in particular, we’ll also get it for the x- and y-directions: E[Jx²] = E[Jy²] = E[Jz²] = 5/4. Now, at this point it’s probably good to give you a more generalized formula for these quantities. I think you’ll easily agree to the following one:

[Illustration: magnitude squared formula]

So now we can apply our classical J·J = Jx² + Jy² + Jz² formula to these quantities by calculating the expected value of J² = J·J, which is equal to:

E[J·J] = E[Jx²] + E[Jy²] + E[Jz²] = 3·E[Jx²] = 3·E[Jy²] = 3·E[Jz²]

You should note we’re making use of the E[X + Y] = E[X] + E[Y] property here: the expected value of the sum of two variables is equal to the sum of the expected values of the variables, and you should also note this is true even if the individual variables would happen to be correlated – which might or might not be the case. [What do you think is the case here?]

For j = 3/2, it’s easy to see we get E[J·J] = 3·E[Jz²] = 3·5/4 = 15/4 = (3/2)·(3/2+1) = j·(j+1). We should now generalize this formula for other values of j, which is not so easy… Hmm… It obviously involves some formula for a series, and I am not good at that… So… Well… I just checked if it was true for j = 1/2 and j = 1 (please check that at least for yourself too!) and then I just believe the authorities on this for all other values of j. 🙂
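For those who – like me – don’t trust series formulas, a little brute-force check (exact fractions, so no rounding worries) will do:

```python
from fractions import Fraction

def three_E_Jz2(j):
    # 3 times the expected value of Jz² over the 2j+1 equally likely
    # values j, j−1, …, −j (in units of ħ).
    values = [j - k for k in range(int(2 * j) + 1)]
    return 3 * sum(v * v for v in values) / len(values)

# Check 3·E[Jz²] = j·(j+1) for integer and half-integer j.
for j in [Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2)]:
    assert three_E_Jz2(j) == j * (j + 1)
print("3·E[Jz²] = j·(j+1) holds for all values tested")
```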

Now, in a classical situation, we know that the J·J product will be the same for whatever direction J would happen to have, and so its expected value will be equal to its constant value J·J. So we can write: E[J·J] = J·J. So… Well… That’s why we write what we wrote above:

[Illustration: magnitude formulas]

Makes sense, no? E[J·J] = E[Jx²+Jy²+Jz²] = E[Jx²]+E[Jy²]+E[Jz²] = j·(j+1) = J·J = J², so J = +√[j(j+1)], right?

Hold your horses, man! Think! What are we doing here, really? We didn’t calculate all that much above. We only found that E[Jx²]+E[Jy²]+E[Jz²] = E[Jx²+Jy²+Jz²] = j·(j+1). So what? Well… That’s not a proof that the J vector actually exists.


Yes. That J vector might just be some theoretical concept. When everything is said and done, all we’ve been doing – or at least, we imagined we did – is those repeated measurements of Jx, Jy and Jz here – or whatever subscript you’d want to use, like Jθ,φ, for example (the example is not random, of course) – and so, of course, it’s only natural that we assume these things are the magnitude of the component (in the direction of measurement) of some real vector that is out there, but then… Well… Who knows? Think of what we wrote about the angular momentum in our previous post on electron orbitals. We imagine – or do like to think – that there’s some angular momentum vector J out there, which we think of as being “cocked” at some angle, so its projection onto the z-axis gives us those discrete values for m which, for j = 2, for example, are equal to 0, 1 or 2 (and −1 and −2, of course) – like in the illustration below. 🙂

[Illustration: cocked angle 2]

But… Well… Note those weird angles: we get something close to 24.1° and then another value close to 54.7°. No symmetry here. 😦 The table below gives some more values for larger j. They’re easy to calculate – it’s, once again, just Pythagoras’ Theorem – but… Well… No symmetries here. Just weird values. [I am not saying the formula for these angles is not straightforward. That formula is easy enough: θ = sin⁻¹(m/√[j(j+1)]). It’s just… Well… No symmetry. You’ll see why that matters in a moment.]

[Illustration: table of angles]

I skipped the half-integer values for j in the table above so you might think they might make it easier to come up with some kind of sensible explanation for the angles. Well… No. They don’t. For example, for j = 1/2 and m = ±1/2, the angles are ±35.2644° – more or less, that is. 🙂 As you can see, these angles do not nicely cut up our circle in equal pieces, which triggers the obvious question: are these angles really equally likely?
Equal angles do not correspond to equal distances on the z-axis (in case you don’t appreciate the point, look at the illustration below).

[Illustration: angles versus distances on the z-axis]
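The θ = sin⁻¹(m/√[j(j+1)]) formula is easy enough to play with. Here is a quick sketch that reproduces those weird angles for j = 2:

```python
import math

# The "cocked" angles θ = arcsin(m/√(j·(j+1))) for j = 2.
j = 2
magnitude = math.sqrt(j * (j + 1))   # √6 ≈ 2.4495, in units of ħ
for m in range(j, -j - 1, -1):
    theta = math.degrees(math.asin(m / magnitude))
    print(m, round(theta, 1))
# 54.7°, 24.1°, 0°, −24.1°, −54.7°: no equal pieces of the circle here.
```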

So… Well… Let me summarize the issue on hand as follows: the idea of the angle of the vector being randomly distributed is not compatible with the idea of those Jz values being equally spaced and equally likely. The latter idea – equally spaced and equally likely Jz values – relates to different possible states of the system being equally likely, so… Well… It’s just a different idea. 😦

Now there is another thing which we should mention here. The maximum value of the z-component of our J vector is always smaller than that quantum-mechanical magnitude, and quite significantly so for small j, as shown in the table below. It is only for larger values of j that the ratio of the two starts to converge to 1. For example, for j = 25, it is about 1.02, so that’s only 2% off.

[Illustration: convergence table]

That’s why physicists tell us that, in quantum mechanics, the angular momentum is never “completely along the z-direction.” It is obvious that this actually challenges the idea of a very precise direction in quantum mechanics, but then that shouldn’t surprise us, should it? After all, isn’t this what the Uncertainty Principle is all about?
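The convergence is easy to tabulate – the ratio is √[j(j+1)]/j:

```python
import math

# Ratio of the quantum-mechanical magnitude √(j·(j+1)) to the maximum
# z-component j: it creeps toward 1 only slowly as j grows.
for j in [1, 2, 5, 25, 1000]:
    print(j, round(math.sqrt(j * (j + 1)) / j, 4))
# j = 1 gives √2 ≈ 1.4142, j = 25 gives ≈ 1.0198 (the 2% mentioned above).
```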

Different states, rather than different directions… And then Uncertainty because… Well… Because of discrete variables that won’t split in the middle. Hmm… 😦

Perhaps. Perhaps I should just accept all of this and go along with it… But… Well… I am really not satisfied here, despite Feynman’s assurance that that’s OK: “Understanding of these matters comes very slowly, if at all. Of course, one does get better able to know what is going to happen in a quantum-mechanical situation—if that is what understanding means—but one never gets a comfortable feeling that these quantum-mechanical rules are ‘natural’.”

I do want to get that comfortable feeling – on some sunny day, at least. 🙂 And so I’ll keep playing with this, until… Well… Until I give up. 🙂 In the meanwhile, if you’d feel you’ve got some better or some more intuitive explanation for all of this, please do let me know. I’d be very grateful to you. 🙂

Post scriptum: Of course, we would all want to believe that J somehow exists because… Well… We want to explain those states somehow, right? I, for one, am not happy with being told to just accept things and shut up. So let me add some remarks here. First, you may think that the narrative above should distinguish between polar and axial vectors. You’ll remember polar vectors are the real vectors, like a radius vector r, or a force F, or velocity or (linear) momentum. Axial vectors (also known as pseudo-vectors) are vectors like the angular momentum vector: we sort of construct them from… Well… From real vectors. The angular momentum L, for example, is the vector cross product of the radius vector r and the linear momentum vector p: we write L = r×p. In that sense, they’re a figment of our imagination. But then… What’s real and unreal? The magnitude of L, for example, does correspond to something real, doesn’t it? And its direction does give us the direction of circulation, right? You’re right. Hence, I think polar and axial vectors are both real – in whatever sense you’d want to define real. Their reality is just different, and that’s reflected in their mathematical behavior: if you change the direction of the axes of your reference frame, polar vectors will change sign too, as opposed to axial vectors: they don’t swap sign. They do something else, which I’ll explain in my next post, where I’ll be talking symmetries.

But let us, for the sake of argument, assume whatever I wrote about those angles applies to axial vectors only. Let’s be even more specific, and say it applies to the angular momentum vector only. If that’s the case, we may want to think of a classical equivalent for the mentioned lack of a precise direction: free nutation. It’s a complicated thing – even more complicated than the phenomenon of precession, which we should be familiar with by now. Look at the illustration below (which I took from an article of a physics professor from Saint Petersburg), which shows both precession as well as nutation. Think of the movement of a spinning top when you release it: its axis will, at first, nutate around the axis of precession, before it settles into a more steady precession.

[Illustration: nutation]

The nutation is caused by the gravitational force field, and the nutation movement usually dies out quickly because of dampening forces (read: friction). Now, we don’t think of gravitational fields when analyzing angular momentum in quantum mechanics, and we shouldn’t. But there is something else we may want to think of. There is also a phenomenon which is referred to as free nutation, i.e. a nutation that is not caused by an external force field. The Earth, for example, nutates slowly because of a gravitational pull from the Sun and the other planets – so that’s not a free nutation – but, in addition to this, there’s an even smaller wobble – which is an example of free nutation – because the Earth is not exactly spherical. In fact, the Great Mathematician, Leonhard Euler, had already predicted this, back in 1765, but it took another 125 years or so before an astronomer, Seth Chandler, could finally experimentally confirm and measure it. So they named this wobble the Chandler wobble (Euler already has too many things named after him). 🙂

Now I don’t have much backup here – none, actually 🙂 – but why wouldn’t we imagine our electron would also sort of nutate freely because of… Well… Some symmetric asymmetry – something like the slightly elliptical shape of our Earth. 🙂 We may then effectively imagine the angular momentum vector as continually changing direction between a minimum and a maximum angle – something like what’s shown below, perhaps, between 0 and 40 degrees. Think of it as a rotation within a rotation, or an oscillation within an oscillation – or a standing wave within a standing wave. 🙂

[Illustration: wobbling]

I am not sure if this approach would solve the problem of our angles and distances – the issue of whether we should think in equally likely angles or equally likely distances along the z-axis, really – but… Well… I’ll let you play with this. Please do send me some feedback if you think you’ve found something. 🙂

Whatever your solution is, it is likely to involve the equipartition theorem and harmonics, right? Perhaps we can, indeed, imagine standing waves within standing waves, and then standing waves within standing waves. How far can we go? 🙂

Post scriptum 2: When re-reading this post, I was thinking I should probably do something with the following idea. If we’ve got a sphere, and we’re thinking of some vector pointing to some point on the surface of that sphere, then we’re doing something which is referred to as point picking on the surface of a sphere, and the probability distributions – as a function of the polar and azimuthal angles θ and φ – are quite particular. See the article on the Wolfram site on this, for example. I am not sure if it’s going to lead to some easy explanation of the ‘angle problem’ we’ve laid out here but… Well… It’s surely an element in the explanation. The key idea here is shown in the illustration below: if the direction of our momentum in three-dimensional space is really random, there may still be more of a chance of an orientation towards the equator, rather than towards the pole. So… Well… We need to study the math of this. 🙂 But that’s for later.

[Illustration: density of random points on a sphere]
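A quick Monte Carlo sketch illustrates the point – the band widths below are made up, but the key fact is standard: for a uniformly random direction, cos(θ) is uniformly distributed, so directions pile up near the equator rather than near the poles:

```python
import math
import random

# Uniform "point picking" on a sphere: sample cos(θ) uniformly on [−1, 1],
# which corresponds to a uniformly random direction in 3D space.
random.seed(42)            # fixed seed, just for reproducibility
N = 100_000
near_equator = near_pole = 0
for _ in range(N):
    cos_theta = random.uniform(-1, 1)
    theta = math.degrees(math.acos(cos_theta))
    if 80 <= theta <= 100:               # a 20°-wide band around the equator
        near_equator += 1
    elif theta <= 10 or theta >= 170:    # two 10°-wide caps around the poles
        near_pole += 1
print(near_equator, near_pole)  # the equatorial band wins by a wide margin
```

Roughly 17% of the directions land in that equatorial band, against about 1.5% in the two polar caps combined – even though both cover 20° of polar angle in total.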

Re-visiting electron orbitals

One of the pieces I barely gave a glance when reading Feynman’s Lectures over the past few years, was the derivation of the non-spherical electron orbitals for the hydrogen atom. It just looked like a boring piece of math – and I thought the derivation of the s-orbitals – the spherically symmetrical ones – was interesting enough already. To some extent, it is – but there is so much more to it. When I read it now, the derivation of those p-, d-, f– etc. orbitals brings all of the weirdness of quantum mechanics together and, while doing so, also provides for a deeper understanding of all of the ideas and concepts we’re trying to get used to. In addition, Feynman’s treatment of the matter is actually much shorter than what you’ll find in other textbooks, because… Well… As he puts it, he takes a shortcut. So let’s try to follow the bright mind of our Master as he walks us through it.

You’ll remember – if not, check it out again – that we found the spherically symmetric solutions for Schrödinger’s equation for our hydrogen atom. Just to make sure, Schrödinger’s equation is a differential equation – a condition we impose on the wavefunction for our electron – and so we need to find the functional form for the wavefunctions that describe the electron orbitals. [Quantum math is so confusing that it’s often good to regularly think of what it is that we’re actually trying to do. :-)] In fact, that functional form gives us a whole bunch of solutions – or wavefunctions – which are defined by three quantum numbers: n, l, and m. The parameter n corresponds to an energy level (En), l is the orbital (quantum) number, and m is the z-component of the angular momentum. But that doesn’t say much. Let’s go step by step.

First, we derived those spherically symmetric solutions – which are referred to as s-states – assuming this was a state with zero (orbital) angular momentum, which we write as l = 0. [As you know, Feynman does not incorporate the spin of the electron in his analysis, which is, therefore, approximative only.] Now what exactly is a state with zero angular momentum? When everything is said and done, we are effectively trying to describe some electron orbital here, right? So that’s an amplitude for the electron to be somewhere, but then we also know it always moves. So, when everything is said and done, the electron is some circulating negative charge, right? So there is always some angular momentum and, therefore, some magnetic moment, right?

Well… If you google this question on Physics Stack Exchange, you’ll get a lot of mumbo jumbo telling you that you shouldn’t think of the electron actually orbiting around. But… Then… Well… A lot of that mumbo jumbo is contradictory. For example, one of the academics writing there does note that, while we shouldn’t think of an electron as some particle, the orbital is still a distribution which gives you the probability of actually finding the electron at some point (x, y, z). So… Well… It is some kind of circulating charge – as a point, as a cloud or as whatever. The only reasonable answer – in my humble opinion – is that l = 0 probably means there is no net circulating charge, so the movement in this or that direction must balance the movement in the other. One may note, in this regard, that the phenomenon of electron capture in nuclear reactions suggests electrons do travel through the nucleus for at least part of the time, which is entirely coherent with the wavefunctions for s-states – shown below – which tell us that the most probable (x, y, z) position for the electron is right at the center – so that’s where the nucleus is. There is also a non-zero probability for the electron to be at the center for the other orbitals (p, d, etcetera).

[Illustration: s-states]

In fact, now that I’ve shown this graph, I should quickly explain it. The three graphs are the spherically symmetric wavefunctions for the first three energy levels. For the first energy level – which is conventionally written as n = 1, not as n = 0 – the amplitude approaches zero rather quickly. For n = 2 and n = 3, there are zero-crossings: the curve passes the r-axis. Feynman calls these zero-crossings radial nodes. To be precise, the number of zero-crossings for these s-states is n − 1, so there’s none for n = 1, one for n = 2, two for n = 3, etcetera.
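That n − 1 rule is easy to verify numerically. Below is a small sketch using the standard (unnormalized) hydrogen radial functions for the first three s-states, with r in Bohr radii – these functional forms are the textbook ones, so treat the code as an illustration rather than as part of Feynman’s shortcut:

```python
import math

# A numerical check on the "n − 1 radial nodes" rule for s-states.
# Standard (unnormalized) hydrogen radial functions, r in Bohr radii.
s_states = {
    1: lambda r: math.exp(-r),
    2: lambda r: (1 - r / 2) * math.exp(-r / 2),
    3: lambda r: (1 - 2 * r / 3 + 2 * r * r / 27) * math.exp(-r / 3),
}

def count_nodes(f, r_max=30.0, steps=3000):
    # Count the sign changes of f on (0, r_max].
    nodes, prev = 0, f(1e-6)
    for i in range(1, steps + 1):
        val = f(i * r_max / steps)
        if val == 0.0:
            continue  # landed exactly on a zero; the flip is caught next step
        if prev * val < 0:
            nodes += 1
        prev = val
    return nodes

for n, f in s_states.items():
    print(n, count_nodes(f))  # n = 1 → 0 nodes, n = 2 → 1 node, n = 3 → 2 nodes
```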

Now, why is the amplitude – apparently – some real-valued function here? That’s because we’re actually not looking at ψ(r, t) here but at the ψ(r) function which appears in the following break-up of the actual wavefunction ψ(r, t):

ψ(r, t) = e^(−i·(E/ħ)·t)·ψ(r)

So ψ(r) is more of an envelope function for the actual wavefunction, which varies both in space as well as in time. It’s good to remember that: I would have used another symbol, because ψ(r, t) and ψ(r) are two different beasts, really – but then physicists want you to think, right? And Mr. Feynman would surely want you to do that, so why not inject some confusing notation from time to time? 🙂 So for n = 3, for example, ψ(r) goes from positive to negative and then to positive, and these areas are separated by radial nodes. Feynman put it on the blackboard like this:

[Illustration: radial nodes]

I am just inserting it to compare this concept of radial nodes with the concept of a nodal plane, which we’ll encounter when discussing p-states in a moment, but I can already tell you what they are now: those p-states are symmetrical in one direction only, as shown below, and so we have a nodal plane instead of a radial node. But so I am getting ahead of myself here… 🙂

[Illustration: nodal planes]

Before going back to where I was, I just need to add one more thing. 🙂 Of course, you know that we’ll take the square of the absolute value of our amplitude to calculate a probability (or the absolute square – as we abbreviate it), so you may wonder why the sign is relevant at all. Well… I am not quite sure either but there’s this concept of orbital parity which you may have heard of. The orbital parity tells us what will happen to the sign if we calculate the value for ψ for −r rather than for r. If ψ(−r) = ψ(r), then we have an even function – or even orbital parity. Likewise, if ψ(−r) = −ψ(r), then we call the function odd – and so we’ll have an odd orbital parity. The orbital parity is always equal to (−1)^l = ±1. The exponent is that angular quantum number, and +1, or + tout court, means even, and −1 or just − means odd. The angular quantum number for those p-states is l = 1, so that works with the illustration of the nodal plane. 🙂 As said, it’s not hugely important but I might as well mention it in passing – especially because we’ll re-visit the topic of symmetries a few posts from now. 🙂

OK. I said I would talk about states with some angular momentum (so l ≠ 0) and so it’s about time I start doing that. As you know, our orbital angular momentum l is measured in units of ħ (just like the total angular momentum J, which we’ve discussed ad nauseam already). We also know that if we’d measure its component along any direction – any direction really, but physicists will usually make sure that the z-axis of their reference frame coincides with it, so we call it the z-axis 🙂 – then we will find that it can only have one of a discrete set of values m·ħ = l·ħ, (l−1)·ħ, …, −(l−1)·ħ, −l·ħ. Hence, l just takes the role of our good old quantum number j here, and m is just Jz. Likewise, I’d like to introduce l as the equivalent of J, so we can easily talk about the angular momentum vector. And now that we’re here, why not write m in bold type too, and say that m is the z-component itself – i.e. the whole vector quantity, so that’s the direction and the magnitude.

Now, we do need to note one crucial difference between j and l, or between J and l: our j could be an integer or a half-integer. In contrast, l must be some integer. Why? Well… If m can be zero, and the values of m must be separated by a full unit, then l must be 0, 1, 2, 3 etcetera. 🙂 If this simple answer doesn't satisfy you, I'll refer you to Feynman's argument, which is also short but more elegant than mine. 🙂 Now, you may or may not remember that the quantum-mechanical equivalent of the magnitude of a vector quantity such as l is to be calculated as √[l·(l+1)]·ħ, so if l = 1, that magnitude will be √2·ħ ≈ 1.4142·ħ, so that's – as expected – larger than the maximum value for m, which is +1. As you know, that leads us to think of that z-component m as a projection of l. Paraphrasing Feynman, the limited set of values for m implies that the angular momentum is always "cocked" at some angle. For l = 1, that angle is either +45° or −45° (for m = ±1), or zero (for m = 0), as shown below.

cocked angle

What if l = 2? The magnitude of l is then equal to √[2·(2+1)]·ħ = √6·ħ ≈ 2.4495·ħ. How do we relate that to those "cocked" angles? The values of m now range from −2 to +2, with a unit distance in-between. The illustration below shows the angles. [I didn't mention ħ any more in that illustration because, by now, we should know it's our unit of measurement – always.]

cocked angle 2Note we’ve got a bigger circle here (the radius is about 2.45 here, as opposed to a bit more than 1.4 for m = 0). Also note that it’s not a nice cake with perfectly equal pieces. From the graph, it’s obvious that the formula for the angle is the following:angle formulaIt’s simple but intriguing. Needless to say, the sin −1 function is the inverse sine, also known as the arcsine. I’ve calculated the values for all for l = 1, 2, 3, 4 and 5 below. The most interesting values are the angles for = 1 and l. As the graphs underneath show, for = 1, the values start approaching the zero angle for very large l, so there’s not much difference any more between = ±1 and = 1 for large values of l. What about the l case? Well… Believe it or not, if becomes really large, then these angles do approach 90°. If you don’t remember how to calculate limits, then just calculate θ for some huge value for and m. For = 1,000,000, for example, you should find that θ = 89.9427…°. 🙂angles
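If you want to check those numbers yourself, the θ = sin⁻¹[m/√(l·(l+1))] formula is easy to put in a few lines of code. This is just my own sketch, of course – not Feynman's:

```python
import math

def cocked_angle(l, m):
    """Angle (in degrees) between the angular momentum vector and the
    xy-plane, for a given l and m: theta = arcsin(m / sqrt(l*(l+1)))."""
    return math.degrees(math.asin(m / math.sqrt(l * (l + 1))))

# For l = 1, the angles are -45, 0 and +45 degrees:
print([round(cocked_angle(1, m), 4) for m in (-1, 0, 1)])  # [-45.0, 0.0, 45.0]

# For l = m = 1,000,000, the angle creeps up towards 90 degrees:
print(round(cocked_angle(10**6, 10**6), 4))  # 89.9427
```

So the limit really does behave as claimed: the maximum projection m = l never quite reaches the magnitude √[l·(l+1)], but the angle approaches 90° as l grows.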

graphIsn’t this fascinating? I’ve actually never seen this in a textbook – so it might be an original contribution. 🙂 OK. I need to get back to the grind: Feynman’s derivation of non-symmetrical electron orbitals. Look carefully at the illustration below. If m is really the projection of some angular momentum that’s “cocked”, either at a zero-degree or, alternatively, at ±45º (for the = 1 situation we show here) – a projection on the z-axis, that is – then the value of m (+1, 0 or -1) does actually correspond to some idea of the orientation of the space in which our electron is circulating. For = 0, that space – think of some torus or whatever other space in which our electron might circulate – would have some alignment with the z-axis. For = ±1, there is no such alignment. m = 0

The interpretation is tricky, however, and the illustration on the right-hand side above is surely too much of a simplification: an orbital is definitely not like a planetary orbit. It doesn't even look like a torus. In fact, the illustration in the bottom right corner, which shows the probability density, i.e. the space in which we are actually likely to find the electron, is a picture that is much more accurate – and it surely does not resemble a planetary orbit or some torus. However, despite that, the idea that, for m = 0, we'd have some alignment of the space in which our electron moves with the z-axis is not wrong. Feynman expresses it as follows:

“Suppose m is zero, then there can be some non-zero amplitude to find the electron on the z-axis at some distance r. We’ll call this amplitude Fl(r).”

You’ll say: so what? And you’ll also say that illustration in the bottom right corner suggests the electron is actually circulating around the z-axis, rather than through it. Well… No. That illustration does not show any circulation. It only shows a probability density. No suggestion of any actual movement or circulation. So the idea is valid: if = 0, then the implication is that, somehow, the space of circulation of current around the direction of the angular momentum vector (J), as per the well-known right-hand rule, will include the z-axis. So the idea of that electron orbiting through the z-axis for = 0 is essentially correct, and the corollary is… Well… I’ll talk about that in a moment.

But… Well… So what? What’s so special about that Fl(r) amplitude? What can we do with that? Well… If we would find a way to calculate Fl(r), then we know everything. Huh? Everything? Yes. The reasoning here is quite complicated, so please bear with me as we go through it.

The first thing you need to accept is rather weird. The thing we said about the non-zero amplitudes to find the electron somewhere on the z-axis for the m = 0 state – which, using Dirac's bra-ket notation, we'll write as |lm = 0〉 – has a very categorical corollary:

The amplitude to find an electron whose state m is not equal to zero on the z-axis (at some non-zero distance r) is zero. We cannot find an electron on the z-axis unless the z-component of its angular momentum (m·ħ) is zero. 

Now, I know this is hard to swallow, especially when looking at those 45° angles for J in our illustrations, because these suggest the actual circulation of current may also include at least part of the z-axis. But… Well… No. Why not? Well… I have no good answer here except for the usual one which, I admit, is quite unsatisfactory: it's quantum mechanics, not classical mechanics. So we have to look at the m = +1 and m = −1 vectors, which are pointed along the z-axis itself and, hence, the circulation we'd associate with those momentum vectors (even if they're the z-component only) is around the z-axis. Not through or on it. I know it's a really poor argument, but it's consistent with our picture of the actual electron orbitals – that picture in terms of probability densities, which I copy below. For m = −1, we have the yz-plane as the nodal plane between the two lobes of our distribution, so no amplitude to find the electron on the z-axis (nor would we find it on the y-axis, as you can see). Likewise, for m = +1, we have the xz-plane as the nodal plane. Both nodal planes include the z-axis and, therefore, there's zero probability on that axis.

p orbitals

In addition, you may also want to note the 45° angle we associate with m = ±1 does sort of demarcate the lobes of the distribution by defining a three-dimensional cone and… Well… I know these arguments are rather intuitive, and so you may refuse to accept them. In fact, to some extent, I refuse to accept them myself. 🙂 Indeed, let me say this loud and clear: I really want to understand this in a better way! 

But… Then… Well… Such better understanding may never come. Feynman’s warning, just before he starts explaining the Stern-Gerlach experiment and the quantization of angular momentum, rings very true here: “Understanding of these matters comes very slowly, if at all. Of course, one does get better able to know what is going to happen in a quantum-mechanical situation—if that is what understanding means—but one never gets a comfortable feeling that these quantum-mechanical rules are “natural.” Of course they are, but they are not natural to our own experience at an ordinary level.” So… Well… What can I say?

It is now time to pull the rabbit out of the hat. To understand what we’re going to do next, you need to remember that our amplitudes – or wavefunctions – are always expressed with regard to a specific frame of reference, i.e. some specific choice of an x-, y– and z-axis. If we change the reference frame – say, to some new set of x’-, y’– and z’-axes – then we need to re-write our amplitudes (or wavefunctions) in terms of the new reference frame. In order to do so, one should use a set of transformation rules. I’ve written several posts on that – including a very basic one, which you may want to re-read (just click the link here).

Look at the illustration below. We want to calculate the amplitude to find the electron at some point in space. Our reference frame is the x, y, z frame and the polar coordinates (or spherical coordinates, I should say) of our point are the radial distance r, the polar angle θ (theta), and the azimuthal angle φ (phi). [The illustration below – which I copied from Feynman’s exposé – uses a capital letter for phi, but I stick to the more usual or more modern convention here.]

change of reference frame

In case you wonder why we’d use polar coordinates rather than Cartesian coordinates… Well… I need to refer you to my other post on the topic of electron orbitals, i.e. the one in which I explain how we get the spherically symmetric solutions: if you have radial (central) fields, then it’s easier to solve stuff using polar coordinates – although you wouldn’t think so if you think of that monster equation that we’re actually trying to solve here:

new de
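In case the image doesn't render: in standard notation (SI units – Feynman's own version uses his choice of units, but the structure is the same), that monster equation is Schrödinger's equation for the hydrogen atom written out in spherical coordinates:

```latex
-\frac{\hbar^2}{2m}\left[
\frac{1}{r^2}\frac{\partial}{\partial r}\!\left(r^2 \frac{\partial \psi}{\partial r}\right)
+ \frac{1}{r^2 \sin\theta}\frac{\partial}{\partial \theta}\!\left(\sin\theta\,\frac{\partial \psi}{\partial \theta}\right)
+ \frac{1}{r^2 \sin^2\theta}\frac{\partial^2 \psi}{\partial \varphi^2}
\right] - \frac{e^2}{4\pi\epsilon_0 r}\,\psi = E\,\psi
```

For the spherically symmetric solutions, the θ- and φ-derivatives vanish and only the radial part of the Laplacian survives – which is exactly the simplification we're about to drop.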

It’s really Schrödinger’s equation for the situation on hand (i.e. a hydrogen atom, with a radial or central Coulomb field because of its positively charged nucleus), but re-written in terms of polar coordinates. For the detail, see the mentioned post. Here, you should just remember we got the spherically symmetric solutions assuming the derivatives of the wavefunction with respect to θ and φ – so that’s the ∂ψ/∂θ and ∂ψ/∂φ in the equation abovewere zero. So now we don’t assume these partial derivatives to be zero: we’re looking for states with an angular dependence, as Feynman puts it somewhat enigmatically. […] Yes. I know. This post is becoming very long, and so you are getting impatient. Look at the illustration with the (r, θ, φ) point, and let me quote Feynman on the line of reasoning now:

“Suppose we have the atom in some |lm〉 state, what is the amplitude to find the electron at the angles θ and φ and the distance r from the origin? Put a new z-axis, say z’, at that angle (see the illustration above), and ask: what is the amplitude that the electron will be at the distance r along the new z’-axis? We know that it cannot be found along z’ unless its z’-component of angular momentum, say m’, is zero. When m’ is zero, however, the amplitude to find the electron along z’ is Fl(r). Therefore, the result is the product of two factors. The first is the amplitude that an atom in the state |lm〉 along the z-axis will be in the state |lm’ = 0〉 with respect to the z’-axis. Multiply that amplitude by Fl(r) and you have the amplitude ψl,m(r) to find the electron at (r, θ, φ) with respect to the original axes.”

So what is he telling us here? Well… He's going a bit fast here. 🙂 Worse, I think he may actually not have chosen the right words here, so let me try to rephrase it. We've introduced the Fl(r) function above: it was the amplitude, for m = 0, to find the electron on the z-axis at some distance r. But here we're obviously in the x’, y’, z’ frame, so Fl(r) is the amplitude for m’ = 0: it's the amplitude to find the electron at some distance r along the z’-axis. Of course, for this amplitude to be non-zero, we must be in the |lm’ = 0〉 state, but are we? Well… |lm’ = 0〉 actually gives us the amplitude for that. So we're going to multiply two amplitudes here:

Fl(r)·|lm’ = 0〉

So this amplitude is the product of two amplitudes as measured in the x’, y’, z’ frame. Note it's symmetric: we may also write it as |lm’ = 0〉·Fl(r). We now need to sort of translate that into an amplitude as measured in the x, y, z frame. To go from x, y, z to x’, y’, z’, we first rotated around the z-axis by the angle φ, and then rotated around the new y’-axis by the angle θ. Now, the order of rotation matters: you can easily check that by taking a non-symmetrical object in your hand and doing those rotations in the two different sequences: check what happens to the orientation of your object. Hence, to go back we should first rotate about the y’-axis by the angle −θ, so our z’-axis folds into the old z-axis, and then rotate about the z-axis by the angle −φ.
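You can also check that the order of rotation matters numerically. The little sketch below (my own – and note it uses the ordinary 3×3 coordinate rotation matrices, not the quantum-mechanical transformation matrices we actually need here) shows that rotating about z and then about y gives a different result than doing it the other way around:

```python
import math

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rot_z(phi):
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

theta, phi = 0.5, 1.2  # two arbitrary angles (radians)
yz = mat_mul(rot_y(theta), rot_z(phi))
zy = mat_mul(rot_z(phi), rot_y(theta))

# The two products differ, so rotations do not commute:
diff = max(abs(yz[i][j] - zy[i][j]) for i in range(3) for j in range(3))
print(diff > 1e-6)  # True
```

That non-commutativity is exactly why the order of the Ry and Rz factors matters in what follows.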

Now, we will denote the transformation matrices that correspond to these rotations as Ry’(−θ) and Rz(−φ) respectively. These transformation matrices are complicated beasts. They are surely not the easy rotation matrices that you can use for the coordinates themselves. You can click this link to see what they look like for l = 1. For larger l, there are other formulas, which Feynman derives in another chapter of his Lectures on quantum mechanics. But let's move on. Here's the grand result:

The amplitude for our wavefunction ψl,m(r) – which denotes the amplitude for (1) the atom to be in the state that's characterized by the quantum numbers l and m and – let's not forget – (2) to find the electron at r – note the bold type: r = (x, y, z) – would be equal to:

ψl,m(r) = 〈l, m|Rz(−φ) Ry’(−θ)|lm’ = 0〉·Fl(r)

Well… Hmm… Maybe. […] That’s not how Feynman writes it. He writes it as follows:

ψl,m(r) = 〈l, 0|Ry(θ) Rz(φ)|lm〉·Fl(r)

I am not quite sure what I did wrong. Perhaps the two expressions are equivalent. Or perhaps – is it possible at all? – Feynman made a mistake? I’ll find out. [P.S: I re-visited this point in the meanwhile: see the P.S. to this post. :-)] The point to note is that we have some combined rotation matrix Ry(θ) Rz(φ). The elements of this matrix are algebraic functions of θ and φ, which we will write as Yl,m(θ, φ), so we write:

a·Yl,m(θ, φ) = 〈l, 0|Ry(θ) Rz(φ)|lm〉

Or a·Yl,m(θ, φ) = 〈l, m|Rz(−φ) Ry’(−θ)|lm’ = 0〉, if Feynman had it wrong and my line of reasoning above were correct – which is obviously not so likely. Hence, the ψl,m(r) function is now written as:

ψl,m(r) = a·Yl,m(θ, φ)·Fl(r)

The coefficient a is, as usual, a normalization coefficient so as to make sure the surface under the probability density function is 1. As mentioned above, we get these Yl,m(θ, φ) functions from combining those rotation matrices. For l = 1, and m = −1, 0, +1, they are:

spherical harmonics

A more complete table is given below:

spherical harmonics 2

So, yes, we're done. Those equations above give us those wonderful shapes for the electron orbitals, as illustrated below (credit for the illustration goes to an interesting site of the UC Davis school).

electron orbitals

But… Hey! Wait a moment! We only have these Yl,m(θ, φ) functions here. What about Fl(r)?

You’re right. We’re not quite there yet, because we don’t have a functional form for Fl(r). Not yet, that is. Unfortunately, that derivation is another lengthy development – and that derivation actually is just tedious math only. Hence, I will refer you to Feynman for that. 🙂 Let me just insert one more thing before giving you The Grand Equation, and that’s a explanation of how we get those nice graphs. They are so-called polar graphs. There is a nice and easy article on them on the website of the University of Illinois, but I’ll summarize it for you. Polar graphs use a polar coordinate grid, as opposed to the Cartesian (or rectangular) coordinate grid that we’re used to. It’s shown below. 

The origin is now referred to as the pole – like in North or South Pole indeed. 🙂 The straight lines from the pole (like the diagonals, for example, or the axes themselves, or any line in-between) measure the distance from the pole which, in this case, goes from 0 to 10, and we can connect the equidistant points by a series of circles – as shown in the illustration also. These lines from the pole are defined by some angle – which we'll write as θ to make things easy 🙂 – which just goes from 0 to 2π = 0 and then round and round and round again. The rest is simple: you're just going to graph a function, or an equation – just like you'd graph y = ax + b in the Cartesian plane – but it's going to be a polar equation. Referring back to our p-orbitals, we'll want to graph the cos²θ = ρ equation, for example, because that's going to show us the shape of that probability density function for l = 1 and m = 0. So our graph is going to connect the (θ, ρ) points for which the angle (θ) and the distance from the pole (ρ) satisfy the cos²θ = ρ equation. There is a really nice widget on the WolframAlpha site that produces those graphs for you. I used it to produce the graph below, which shows the 1.1547·cos²θ = ρ graph (the 1.1547 coefficient is the normalization coefficient a). Now, you'll wonder why this is a curve, or a curved line. That widget even calculates its length: it's about 6.374743 units long. So why don't we have a surface or a volume here? We didn't specify any value for ρ, did we? No, we didn't. The widget calculates those values from the equation. So… Yes. It's a valid question: where's the distribution? We were talking about some electron cloud or something, right?
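Before answering that, a quick aside: you can check the widget's arc-length number yourself. The sketch below (mine, not the widget's) numerically integrates the standard polar arc-length formula L = ∫√(ρ² + (dρ/dθ)²)·dθ for ρ = 1.1547·cos²θ:

```python
import math

a = 1.1547  # the normalization coefficient

def rho(theta):
    return a * math.cos(theta) ** 2

def drho(theta):
    # d(rho)/d(theta) = -a*sin(2*theta)
    return -a * math.sin(2 * theta)

# Numerical arc length of the polar curve, theta from 0 to 2*pi:
n = 100000
h = 2 * math.pi / n
length = sum(
    math.sqrt(rho(t) ** 2 + drho(t) ** 2) * h
    for t in (k * h for k in range(n))
)
print(length)  # ~6.3748, close to the widget's 6.374743
```

The tiny difference with the widget's 6.374743 comes from the rounding of the coefficient a (which is really 2/√3 = 1.15470…).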

Right. To get that cloud – those probability densities really – we need that Fl(r) function. Our cos²θ = ρ is, once again, just some kind of envelope function: it marks a space but doesn't fill it, so to speak. 🙂 In fact, I should now give you the complete description, which has all of the possible states of the hydrogen atom – everything! No separate pieces anymore. Here it is. It also includes n. It's The Grand Equation:

The ak coefficients in the formula for ρ·Fn,l(ρ) are the solutions to the equation below, which I copied from Feynman's text on it all. I'll also refer you to the same text to see how you actually get solutions out of it, and what they then actually represent. 🙂

We're done. Finally!

I hope you enjoyed this. Look at what we’ve achieved. We had this differential equation (a simple diffusion equation, really, albeit in the complex space), and then we have a central Coulomb field and the rather simple concept of quantized (i.e. non-continuous or discrete) angular momentum. Now see what magic comes out of it! We literally constructed the atomic structure out of it, and it’s all wonderfully elegant and beautiful.
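One very concrete number that falls out of that construction – although I didn't compute it in this post – is the hydrogen energy spectrum En = −ER/n², with ER ≈ 13.6 eV the Rydberg energy. As a quick sanity check, here's my own sketch, using standard values for the constants:

```python
import math

# Standard physical constants (SI units)
m_e = 9.1093837015e-31      # electron mass (kg)
hbar = 1.054571817e-34      # reduced Planck constant (J*s)
q_e = 1.602176634e-19       # elementary charge (C)
eps0 = 8.8541878128e-12     # vacuum permittivity (F/m)

# Rydberg energy: E_R = (m_e/2) * (e^2 / (4*pi*eps0*hbar))^2
e2 = q_e**2 / (4 * math.pi * eps0)   # Coulomb constant times e^2 (J*m)
E_R = 0.5 * m_e * (e2 / hbar) ** 2   # in joules
E_R_eV = E_R / q_e                   # convert to electronvolt

# Energy levels E_n = -E_R / n^2:
for n in (1, 2, 3):
    print(n, round(-E_R_eV / n**2, 3))  # -13.606, -3.401, -1.512
```

So the same machinery that gives us those beautiful orbital shapes also reproduces the Balmer-type spectral structure.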

Now think that’s amazing, and if you’re reading this, then I am sure you’ll find it as amazing as I do.

Note: I did a better job in explaining the intricacies of actually representing those orbitals in a later post. I recommend you have a look at it by clicking the link here.

Post scriptum on the transformation matrices:

You must find the explanation for that 〈l, 0|Ry(θ) Rz(φ)|lm〉·Fl(r) product highly unsatisfactory, and it is. 🙂 I just wanted to make you think – rather than just superficially read through it. First note that Fl(r)·|lm’ = 0〉 is not a product of two amplitudes: it is the product of an amplitude with a state. A state is a vector in a rather special vector space – a Hilbert space (just a nice word to throw around, isn’t it?). The point is: a state vector is written as some linear combination of base states. Something inside of me tells me we may look at the three p-states as base states, but I need to look into that.

Let’s first calculate the Ry(θ) Rmatrix to see if we get those formulas for the angular dependence of the amplitudes. It’s the product of the Ry(θ) and Rmatrices, which I reproduce below.

Note that this product is non-commutative because… Well… Matrix products generally are non-commutative. 🙂 So… Well… There they are: the second row gives us those functions, so I am wrong, obviously, and Dr. Feynman is right. Of course, he is. He is always right – especially because his Lectures have gone through so many revised editions that all errors must be out by now. 🙂

However, let me – just for fun – also calculate my Rz(−φ)·Ry’(−θ) product. I can do so in two steps: first I calculate Rz(φ)·Ry’(θ), and then I substitute the angles φ and θ for −φ and −θ, remembering that cos(−α) = cos(α) and sin(−α) = −sin(α). I might have made a mistake, but I got this:

The functions look the same but… Well… No. The e^(iφ) and e^(−iφ) factors are in the wrong place (it's just one minus sign in the exponent – but it's crucially different). And then these functions should not be in a column. That doesn't make sense when you write it all out. So Feynman's expression is, of course, fully correct. But so how do we interpret that 〈l, 0|Ry(θ) Rz(φ)|lm〉 expression then? This amplitude probably answers the following question:

Given that our atom is in the |lm〉 state, what is the amplitude for it to be in the 〈l, 0| state in the x’, y’, z’ frame?

That makes sense – because we did start out with the assumption that our atom was in the |lm〉 state, so… Yes. Think about it some more and you'll see it all makes sense: we can – and should – multiply this amplitude with the Fl(r) amplitude.

OK. Now we’re really done with this. 🙂

Note: As for the 〈 | and | 〉 symbols to denote a state, note that there's not much difference: both are state vectors, but a state vector that's written as an end state – so that's like 〈 Φ | – is a 1×3 vector (so that's a row vector), while a vector written as | Φ 〉 is a 3×1 vector (so that's a column vector). So that's why 〈l, 0|Ry(θ) Rz(φ)|lm〉 does give us some number. We've got a (1×3)·(3×3)·(3×1) matrix product here – but so it gives us what we want: a 1×1 amplitude. 🙂
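For what it's worth, here is a numerical sketch of that (1×3)·(3×3)·(3×1) product for l = 1. I am using the textbook spin-one rotation matrices here – the Wigner convention, whose signs and phases may differ from Feynman's – so take the phases with a grain of salt; but the 〈1, 0| row of Ry(θ)·Rz(φ) does reproduce the cosθ and sinθ/√2 magnitudes of the Y1,m functions:

```python
import cmath, math

def R_z(phi):
    """Spin-1 rotation about z: diagonal phases exp(i*m*phi), m = +1, 0, -1."""
    return [[cmath.exp(1j * phi), 0, 0],
            [0, 1, 0],
            [0, 0, cmath.exp(-1j * phi)]]

def R_y(theta):
    """Spin-1 rotation about y (the Wigner small-d matrix for j = 1)."""
    c, s = math.cos(theta), math.sin(theta)
    r2 = math.sqrt(2)
    return [[(1 + c) / 2, -s / r2, (1 - c) / 2],
            [s / r2,      c,       -s / r2],
            [(1 - c) / 2, s / r2,  (1 + c) / 2]]

def amplitude(theta, phi, m):
    """<1,0| R_y(theta) R_z(phi) |1,m> -- a (1x3)*(3x3)*(3x1) product."""
    col = {1: 0, 0: 1, -1: 2}[m]  # column index for the |1,m> basis ket
    Ry, Rz = R_y(theta), R_z(phi)
    product_col = [sum(Ry[i][k] * Rz[k][col] for k in range(3)) for i in range(3)]
    return product_col[1]  # the <1, 0| bra picks out the middle entry

theta, phi = 0.7, 0.3
m0 = amplitude(theta, phi, 0)
m1 = amplitude(theta, phi, 1)
print(abs(m0 - math.cos(theta)) < 1e-12)                      # True: cos(theta), like Y(1,0)
print(abs(abs(m1) - math.sin(theta) / math.sqrt(2)) < 1e-12)  # True: sin(theta)/sqrt(2), like Y(1,±1)
```

So the bra picks a row, the ket picks a column, and the sandwiched matrix does all the work – which is exactly the mechanism behind Feynman's 〈l, 0|Ry(θ) Rz(φ)|lm〉 formula.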

The state(s) of a photon

While hurrying to try to understand the things I wanted to understand most – like Schrödinger’s equation and, equally important, its solutions explaining the weird shapes of electron orbitals – I skipped some interesting bits and pieces. Worse, I skipped two or three of Feynman’s Lectures on quantum mechanics entirely. These include Chapter 17 – on symmetry and conservation laws – and Chapter 18 – on angular momentum. With the benefit of hindsight, that was not the right thing to do. If anything, doing all of the Lectures would, at the very least, ensure I would have more than an ephemeral grasp of it all. So… In this and the next post, I want to tidy up and go over everything I skipped so far. 🙂

We’ve written a lot on how quantum mechanics applies to both bosons as well as fermions. For example, we pointed out – in very much detail – that the mathematical structure of the electromagnetic wave – light! 🙂 – is quite similar to that of the ubiquitous wavefunction. Equally fundamental – if not more – is the fact that light also arrives in lumps – little light-particles which we call photons. It’s the photoelectric effect, which Einstein explained in 1905 by… Well… By telling us that light consists of quanta – photons – whose energy must be high enough so as to be able to dislodge an electron. It’s what got him his Nobel Prize. [Einstein never got a Nobel Prize for his relativity theory, which is – arguably – at least as important. There’s a lot of controversy around that but, in any case, that’s history.]

So it shouldn’t surprise you that there’s an equivalent to the spin of an electron. With spin, we refer to the angular momentum of a quantum-mechanical system – an atom, a nucleus, an electron, whatever – which, as you know, can only be one of a set of discrete values when measured along some direction, which we usually refer to as the z-direction. More formally, we write that the z-component of the angular moment J is equal to

Jz = j·ħ, (j-1)·ħ, (j-2)·ħ, …, -(j-2)·ħ, -(j-1)·ħ, –j·ħ

The j in this expression is the so-called spin of the system. For an electron, it's equal to 1/2, so Jz = ±ħ/2, and we refer to these two states as "up" and "down" respectively because of obvious reasons: one state points upwards – more or less, that is (we know the angular momentum will actually precess around the direction of the magnetic field) – while the other points downwards.

We also know that the magnetic energy of an electron in a (weak) magnetic field – which, as you know, we conveniently assume to be pointing in the same z-direction, so Bz = B – will be equal to:

Umag = g·μz·B·j = ±2·μz·B·(1/2) = ±μz·B = ±B·(qe·ħ)/(2m)

In short, the magnetic energy is proportional to the magnetic field, and the constant of proportionality is the so-called Bohr magneton qe·ħ/2m. So far, so good. What’s the analog for a photon?
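Before we get to that, a quick numerical aside – my own sketch, using standard SI values for the constants – to put a number on that Bohr magneton and on the magnetic energy in, say, a 1 tesla field:

```python
q_e = 1.602176634e-19    # elementary charge (C)
hbar = 1.054571817e-34   # reduced Planck constant (J*s)
m_e = 9.1093837015e-31   # electron mass (kg)

mu_B = q_e * hbar / (2 * m_e)   # Bohr magneton (J/T)
print(mu_B)                      # ~9.274e-24 J/T

B = 1.0                          # magnetic field strength (T)
U_mag = mu_B * B                 # magnitude of the magnetic energy (J)
print(U_mag / q_e)               # ~5.8e-5 eV: a tiny energy
```

So even in a strong laboratory field, the magnetic energy splitting is tiny as compared to the electronvolt-scale energies of the orbitals themselves.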

Well… Let’s first discuss the equivalent of a Stern-Gerlach apparatus for photons. That would be a polarizing material, like a piece of calcite, for example. Now, it is, unfortunately, much more difficult to explain how a polarizing material works than to explain how a Stern-Gerlach apparatus works. [If you thought the workings of that (hypothetical) Stern-Gerlach filter were difficult to understand, think again.] We actually have different types of polarizers – some complicated, some easy. We’ll take the easy ones: linear ones. In addition, the phenomenon of polarization itself is a bit intricate. The phenomenon is well described in Chapter 33 of Feynman’s first Volume of Lectures, out of which I copied the two illustrations below the next paragraph.

Of course, to make sure you think about whatever it is that you're reading, Feynman now chooses the z-direction such that it coincides with the direction of propagation of the electromagnetic radiation. So it's now the x– and y-direction that we're looking at. Not the z-direction any more. As usual, we forget about the magnetic field vector B and so we think of the oscillating electric field vector E only. Why can we forget about B? Well… If we have E, we know B. Full stop. As you know, I think B is pretty essential in the analysis too but… Well… You'll see all textbooks on physics quickly forget about B when describing light. I don't want to do that, but… Well… I need to move on. [I'll come back to the matter – sideways – at the end of this post. :-)]

So we know the electric field vector E may oscillate in a plane (so that's up and down and back again) but – interestingly enough – its direction may also rotate around the z-axis (again, remember the z-axis is the direction of propagation). Why? Well… Because E has an x– and a y-component (no z-component!), and these two components may oscillate in phase or out of phase, and so all of the combinations below are possible.

Linear polarization

Elliptical polarization

To make a long story short, light comes in two varieties: linearly polarized and elliptically polarized. Of course, elliptically may be circularly – if you're lucky! 🙂
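You can see how the phase difference between the two components determines the polarization by tracing the tip of E over one cycle. In this little sketch of mine, a zero phase difference gives a straight line, while a 90° phase difference (with equal amplitudes) gives a circle:

```python
import math

def trace(delta, steps=8):
    """Points (Ex, Ey) traced by the E vector over one cycle,
    for equal amplitudes and a phase difference delta."""
    return [(math.cos(2 * math.pi * k / steps),
             math.cos(2 * math.pi * k / steps + delta))
            for k in range(steps)]

# In phase (delta = 0): Ey = Ex at every instant -> a straight line.
print(all(abs(x - y) < 1e-12 for x, y in trace(0.0)))                    # True

# 90 degrees out of phase: Ex^2 + Ey^2 = 1 at every instant -> a circle.
print(all(abs(x**2 + y**2 - 1) < 1e-12 for x, y in trace(math.pi / 2)))  # True
```

Any phase difference in-between (or unequal amplitudes) traces out an ellipse – which is where the "elliptical" in elliptical polarization comes from.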

Now, a (linear) polarizer has an optical axis, and only light whose E vector is oscillating along that axis will go through. […] OK. That’s not true: the component along the optical axis of some E pointing in some other direction will go through too! I’ll show how that works in a moment. But so all the rest is absorbed, and the absorbed energy just heats up the polarizer (which, of course, then radiates heat back out).

In any case, if the optical axis happens to be our x-axis, then we know that the light that comes through will be x-polarized, so that corresponds to the rather peculiar Ex = 1 and Ey = 0 notation. [This notation refers to coefficients we’ll use later to resolve states into base states – but don’t worry about it now.] Needless to say, you shouldn’t confuse the electric field vector E with the energy of our photon, which we denote as E. No bold letter here. No subscript. 🙂

Pfff… This introduction is becoming way too long. What about our photon? We want to talk about one photon only and we’ve already written over a page and haven’t started yet. 🙂

Well… First, we must note that we’ll assume the light is perfectly monochromatic, so all photons will have an energy that’s equal to E = h·f, so the energy is proportional to the frequency of our light, and the constant of proportionality is Planck’s constant. That’s Einstein’s relation, not a de Broglie relation. Just remember: we’re talking definite energy states here.

Second – and much more importantly – we may define two base states for our photon, |x〉 and |y〉 respectively, which correspond to the classical linear x– and y-polarization. So a photon can be in state |x〉 or |y〉 but, as usual, it is much more likely to be in some state that is some linear combination of these two base states.

OK. Now we can start playing with these ideas. Imagine a polarizer – or polaroid, as Feynman calls it – whose optical axis is tilted – say, it’s at an angle θ from the x-axis, as shown below. Classically, the light that comes through will be polarized in the x’-direction, which we associate with that angle θ. So we say the photons will be in the |x‘〉 state. linear combinationSo far, so good. But what happens if we have two polarizers, set up as shown below, with the optical axis of the first one at an angle θ, which is, say, equal to 30°? Will any light get through?two polarizers

Well? No answer? […] Think about it. What happens classically? […] No answer? Let me tell you. In a classical analysis, we'd say that only the x-component of the light that comes through the first polarizer would get through the second one. Huh? Yes. It is not all or nothing in a classical analysis. This is where the magnitude of E comes in, which we'll write as E0, so as to not confuse it with the energy E. [I know you'll confuse it anyway but… Well… I need to move on or I won't get anywhere with this story.] So if E0 is the (maximum) magnitude (or amplitude – in the classical sense of the word, that is) of E as the light leaves the first polarizer, then its x-component will be equal to E0·cosθ. [I don't need to make a drawing here, do I?] Of course, you know that the intensity of the light will be proportional to the square of the (maximum) field, which is equal to E0²·cos²θ = 0.75·E0² for θ = 30°.

So our classical theory says that only 3/4 of the energy that we were sending in will get through. The rest (1/4) will be absorbed. So how do we model that quantum-mechanically? It’s amazingly simple. We’ve already associated the |x‘〉 state with the photons coming out of the first polaroid, and so now we’ll just say that this |x‘〉 state is equal to the following linear combination of the |x〉 and |y〉 base states:

|x’〉 = cosθ·|x〉 + sinθ·|y〉

Huh? Yes. As Feynman puts it, we should think our |x‘〉 beam of photons can, somehow, be resolved into |x〉 and |y〉 beams. Of course, we’re talking amplitudes here, so we’re talking 〈x|x‘〉 and 〈y|x‘〉 amplitudes here, and the absolute square of those amplitudes will give us the probability that a photon in the |x‘〉 state gets into the |x〉 and |y〉 state respectively. So how do we calculate that? Well… If |x‘〉 = cosθ·|x〉 + sinθ·|y〉, then we can obviously write the following:

〈x|x’〉 = cosθ·〈x|x〉 + sinθ·〈x|y〉

Now, we know that 〈x|y〉 = 0, because |x〉 and |y〉 are base states. Because of the same reason, 〈x|x〉 = 1. That’s just an implication of the definition of base states: 〈i|j〉 = δij. So we get:

〈x|x’〉 = cosθ

Lo and behold! The absolute square of that is equal to cos²θ, so each of these photons has an (average) probability of 3/4 to get through. So if we were to have like 10 billion photons, then some 7.5 billion of them would get through. As these photons are all associated with a definite energy – and they go through as one whole, of course (no such thing as a 3/4 photon!) – we find that 3/4 of all of the energy goes through. The quantum-mechanical theory gives the same result as the classical theory – as it should, in this case at least!
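We can actually simulate that all-or-nothing behaviour in a few lines. In this sketch (mine, obviously), each photon goes through as a whole with probability cos²θ, and the fraction that makes it indeed approaches 3/4 for θ = 30°:

```python
import math, random

theta = math.radians(30)
p = math.cos(theta) ** 2          # probability for one photon to get through
print(round(p, 4))                 # 0.75

# Each photon either passes entirely or is absorbed entirely:
random.seed(42)                    # fixed seed, so the run is reproducible
n = 100000
passed = sum(1 for _ in range(n) if random.random() < p)
print(passed / n)                  # close to 0.75
```

No 3/4 photons anywhere: only whole photons, with a 3/4 probability each – and the classical intensity rule emerges as the average.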

Now that’s all good for linear polarization. What about elliptical or circular polarization? Hmm… That’s a bit more complicated, but equally feasible. If we denote the state of a photon with a right-hand circular polarization (RHC) as |R〉 and, likewise, the state of a photon with a left-hand circular polarization (LHC) as |L〉, then we can write these as the following linear combinations of our base states |x〉 and |y〉:linear combination RHC and LHCThat’s where those coefficients under illustrations (c) and (g) come in, although I think they’ve got the sign of i (the imaginary unit) wrong. 🙂 So how does it work? Well… That 1/√2 factor is – obviously – just there to make sure everything’s normalized, so all probabilities over all states add up to 1. So that is taken care of and now we just need to explain how and why we’re adding |x〉 and |y〉. For |R〉, the amplitudes must be the same but with a phase difference of 90°. That corresponds to the sine and cosine function, which are the same except for a phase difference of π/2 (90°), indeed: sin(φ + π/2) = cosφ. Now, a phase shift of 90° corresponds to a multiplication with the imaginary unit i. Indeed, ei·π/2 and, therefore, it is obvious that ei·π/2·ei·φ = ei·(φ + π/2).

Of course, if we can write RHC and LHC states as a linear combination of the base states |x〉 and |y〉, then you’ll believe me if I say that we can write any polarization state – including non-circular elliptical ones – as a linear combination of these base states. Now, there are two or three other things I’d like to point out here:

1. The RHC and LHC states can be used as base states themselves – so they satisfy all of the conditions for a set of base states. Indeed, it’s easy to add and then subtract the two equations above to get the following:new base setAs an exercise, you should verify that the right and left polarization states effectively satisfy the conditions for a set of base states.

2. We can also rotate the xy-plane around the z-axis (as mentioned, that’s the direction of propagation of our beam) and use the resulting |x′〉 and |y′〉 states as base states. In short, as Feynman puts it: “You can resolve light into x– and y– polarizations, or into x’– and y’-polarizations, or into right and left polarizations as a basis.” These pairs are always orthogonal and also satisfy the other conditions we’d impose on a set of base states.

3. The last point I want to make here is much more enigmatic but, as far as I am concerned – by far – the most interesting of all of Feynman’s Lecture on this topic. It’s actually just a footnote, but I am very excited about it. So… Well… What is it?

Well… Feynman does the calculations to show what a circularly polarized photon looks like when we rotate the coordinates around the z-axis, and shows that the phase of the right and left polarized states effectively keeps track of the x– and y-axes, so all of our “right-hand” rules don’t get lost somehow. He compares this analysis to an analysis he actually did – in a much earlier Lecture (in Chapter 5) – for spin-one particles. But, of course, here we’ve been analyzing the photon as a two-state system, right?

So… Well… Don’t we have a contradiction here? If photons are spin-one particles, then they’re supposed to be analyzed in terms of three base states, right? Well… I guess so… But then Feynman adds a footnote – with a very important remark:

“The photon is a spin-one particle which has, however, no ‘zero’-state.”

Why am I noting that? Because it confirms my theory about photons – force-particles – being different from matter-particles not only because of the different rules for adding amplitudes, but also because we get two wavefunctions for the price of one and, therefore, twice the energy for every oscillation! And so we’ll also have a distance of two Planck units between the equivalent of the “up” and “down” states of the photon, rather than one Planck unit, like what we have for the angular momentum for an electron. 

I described the gist of my argument in my e-book, which you’ll find under another tab of this blog, and so I’ll refer you there. However, in case you’re interested, the summary of the summary is as follows:

  1. We can think of a photon having some energy that’s equal to E = p = m (assuming we choose our time and distance units such that c = 1), but that energy would be split up in an electric and a magnetic wavefunction respectively: ψE and ψB.
  2. Now, Schrödinger’s equation would then apply to both wavefunctions, but the E, p and m in those two wavefunctions are the same and not the same: their numerical values are the same (pE = EE = mE = pB = EB = mB), but they’re conceptually different. [They must be: I showed that, if they aren’t, then we get a phase and group velocity for the wave that doesn’t make sense.]

It is then easy to show that – using the B = i·E relation between the magnetic and the electric field vectors – we find a composite wavefunction for our photon which we can write as:

E + B = ψE + ψB = E + i·E = √2·ei(p·x/2 − E·t/2 + π/4) = √2·ei(π/4)·ei(p·x/2 − E·t/2) = √2·ei(π/4)·E

The whole thing then becomes:

ψ = ψE + ψB = √2·ei(p·x/2 − E·t/2 + π/4) = √2·ei(π/4)·ei(p·x/2 − E·t/2)

So we’ve got a √2 factor here in front of our combined wavefunction for our photon which – knowing that the energy is proportional to the square of the amplitude – gives us twice the energy we’d associate with a regular amplitude… [With “regular”, I mean the wavefunction for matter-particles – fermions, that is.] So… Well… That little footnote of Feynman seems to confirm I really am on to something. Nice! Very nice, actually! 🙂
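The √2 factor and the π/4 phase are just the polar form of 1 + i, which is easy to verify numerically:

```python
import cmath
import math

z = 1 + 1j                 # E + i·E with E = 1
r, phase = cmath.polar(z)  # polar form: z = r·e^{i·phase}

assert abs(r - math.sqrt(2)) < 1e-12      # magnitude is sqrt(2)
assert abs(phase - math.pi / 4) < 1e-12   # phase is pi/4 (45°)
# |z|² = 2: twice the energy of a unit-amplitude wave
assert abs(abs(z) ** 2 - 2) < 1e-12
```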

Davidson’s function

This post has got nothing to do with quantum mechanics. It’s just… Well… My son – who’s preparing for his entrance examinations for engineering studies – sent me a message yesterday asking me to quickly explain Davidson’s function – as he has to do some presentation on it as part of a class assignment. As I am an economist – and Davidson’s function is used in transport economics – he thought I would be able to help him out quickly, and I was. So I just thought it might be interesting to quickly jot down my explanation as a post in this blog. It won’t help you with quantum mechanics but, if anything, it may help you think about functional forms and some related topics.

In his message, he sent me the function – copied below – and some definitions of the variables which he got from some software package he had seen or used – at least that’s what he told me. 🙂Davidson functionSo… This function tells us that the dependent variable is the travel time t, and that it is seen as a function of some independent variable x and some parameters t0, c and ε. My son defined the variable x as the flow (of vehicles) on the road, and c as the capacity of the road. To be precise, he wrote the formula that was to be used for c as follows:capacityWhat about a formula for x? Well… He said that was the actual flow of vehicles, but he had no formula for it. As for t0, that was the travel time “at free speed.” Finally, he said ε was a “paramètre de sensibilité de congestion.” Sorry for the French, but that’s the language of his school, which is located in some town in southern Belgium. In English, we might translate it as a congestion sensitivity coefficient. And so that’s what he struggled most with – or so he said.

So that got us started. I immediately told him that, if you write something like c − x, then you’d better make sure x and c have the same physical dimension. The formula above tells us that c is the number of vehicles that you can park on that road. Bumper to bumper. So I told him that’s a rather weird definition of capacity. It’s definitely not the dimension of flow: the flow should be some number per second or – much more likely in transport economics – per minute or per hour. So I told him that he should double-check those definitions of x and c, and that I’d get back to him to explain the formula itself after I had googled and read some articles on it. So I did that, and so here’s the full explanation I gave him.

While there’s some pretty awesome theory behind it (queuing theory and all that), which transportation gurus take very seriously – see, for example, the papers written by Rahmi Akçelik – a quick look at it all reveals that Davidson’s function is, essentially, just a specific functional form that we impose on some real-life problem. So I’d call it an empirical function: there’s some theory behind it, but it’s more based on experience than pure theory. Of course, sound logic is – or should be – applied to both empirical as well as to purely theoretical functions, but… Well… It’s a different approach than, say, modeling the dynamics of quantum-mechanical state changes. 🙂 Just note, for example, that we might just as well have tried something else – some exponential function. Something like this, for example:function alternativeDavidson’s function is, quite simply, just nicer and easier than the one above, because the function above is not linear. It could be quadratic (β = 2), or whatever, but surely not linear. In contrast, Davidson’s function is linear and, therefore, easy to fit onto actual traffic data using the simplest of simple linear regression models – and, speaking from experience, most engineers and economists in a real-life job can barely handle even that! 🙂

So just look at that x/(c−x) factor as measuring the congestion or saturation, somehow. We’ll denote it by s. If you can sort of accept that, then you’ll agree that Davidson’s function tells us that the extra time that’s needed to drive from some place a to some place b along our road will be directly proportional to:

  1. That congestion factor x/(c−x), about which I’ll write more in a moment;
  2. The free-speed or free-flow travel time t0 – which I’ll call the free-flow travel time from now on, rather than the free-speed travel time, because there’s no such thing as free speed in reality: we have speed limits – or safety limits, or scared moms in the car, whatever – and, a more authoritative argument, the literature on Davidson’s function also talks about free flow rather than free speed;
  3. That epsilon factor (ε), which – of all the stuff I presented so far – mystified my son most.

So the formula for the extra travel time that’s needed is, obviously, equal to:function 1So we have a very simple linear functional form for the extra travel time, and we can easily estimate the actual value of our ε parameter using actual traffic data in a simple linear regression. The data analysis toolkit of MS Excel will do stuff like this – if you have the data, of course – so you don’t need a sophisticated statistical software package here.
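A minimal sketch of that regression – the values of t0, c and the “true” ε below are purely illustrative assumptions, and the data are noise-free, so the estimate comes out exact:

```python
# Estimate epsilon by regressing y = (t - t0)/t0 on s = x/(c - x), through the origin.
t0, c, eps_true = 2.0, 10.0, 0.8  # hypothetical values, for illustration only

xs = [1.0, 2.0, 4.0, 6.0, 8.0]                       # observed flows
ts = [t0 + t0 * eps_true * x / (c - x) for x in xs]  # observed travel times (noise-free)

s = [x / (c - x) for x in xs]
y = [(t - t0) / t0 for t in ts]

# least-squares slope through the origin: eps = sum(s·y) / sum(s·s)
eps_hat = sum(si * yi for si, yi in zip(s, y)) / sum(si * si for si in s)
assert abs(eps_hat - eps_true) < 1e-9
```

With real (noisy) traffic data, the same one-line slope formula – or the regression tool in MS Excel – gives the estimate of ε.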

So that’s it, really: Davidson’s function is, effectively, just nice and easy to work with. […] Well… […] Of course, we still need to define what x and c actually are. And what’s that so-called free-flow (or free-speed?) travel time? Well… The free-flow travel time is, obviously, the time you need to go from a to b at the free-flow speed. But what’s the free-flow speed? My friend’s Maserati is faster than my little Santro. 🙂 And are we allowed to go faster than the maximum authorized speed? Interesting questions.

So that’s where the analysis becomes interesting, and why we need better definitions of x and c. If c is some density – which is what my son’s rather non-sensical formula seems to imply – we may want to express it per unit distance. Per kilometer, for example. So we should probably re-define c more simply: as the number of lanes divided by the average length of the vehicles that are using it. We get that by dividing the c above by the length of the road – so we divide the length of the road by the length of the road, which gives 1. 🙂 You may think that’s weird, because we get something like 3/5 = 0.6… So… What? Well… Yes. 0.6 vehicles per meter, so that’s 600 vehicles per kilometer! Does that sound OK? I think it does. So let’s express that capacity (c) as a maximum density – for the time being, at least.

Now, none of those cars can move, of course: they are all standing still. Bumper to bumper. It’s only when we decrease the density that they’re able to move. In fact, you can – and should – visualize the process: the first car moves and opens a space of, say, one or two meters, and then the second one, and so on and so on – till all cars are moving with a few meters in-between them. So the density will obviously decrease and, as a result, we’re getting some flow of vehicles here. If there’s three meters between them, for example, then the density goes down to 3/8 vehicles per meter, so that’s 375 vehicles per kilometer. Still a lot, and you’ll have to agree that – with only 3 meters between them – they’ll probably only move very slowly!
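A quick check of those two densities – the 5-meter average vehicle length and the three lanes are the assumptions here:

```python
lanes = 3
vehicle_m = 5.0  # assumed average vehicle length, in meters

def static_density_per_km(gap_m):
    # vehicles per km of road: each vehicle occupies its own length plus a fixed gap
    return lanes * 1000 / (vehicle_m + gap_m)

assert round(static_density_per_km(0.0)) == 600  # bumper to bumper: 0.6 per meter
assert round(static_density_per_km(3.0)) == 375  # three meters between vehicles
```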

You get the idea. We can now define x as a density too – some density that is smaller than the maximum density c. Then that x/(c−x) factor – measuring the saturation – obviously makes a lot of sense. The graph below shows what it looks like for c = 5. [The value of 5 is just random, and its order of magnitude doesn’t matter either: we can always re-scale from m to km, or from seconds to minutes and what have you. So don’t worry about it.] Look at this example: when x is small – like 1 or 2 only – then x/(5−x) doesn’t increase all that much. So that means we add little to the travel time. Conversely, when x approaches c = 5 – so that’s the limit (as you can see, the x = 5 line is a (vertical) asymptote of the function) – then the travel time becomes huge and starts approaching infinity. So… Well… Yes. That’s when all cars are standing still – bumper to bumper. asymptoteBut so what’s the free-flow speed? Is it the maximum speed of my friend’s Maserati – which is like 275 km/h? Well… I don’t think my friend ever drove that fast, so probably not. What else? Think about it. What should we choose here? The obvious choice is the speed limit: 120 km/h, or 90 km/h, or 60 km/h – or whatever. Why? Because you don’t want a ticket, I guess… In any case, let’s analyze that question later. Let’s first look at something else.

Of course, you’ll want to keep some distance between you and the car in front of you when driving at relatively high speeds, and that’s the crux of the analysis really. You may or, more likely, you may not remember that your driving instructor told you to always measure the safety distance between you and the car(s) in front in seconds rather than in meters. In Belgium, we’re told to stay two seconds away from the car in front of us. So when it passes a light pole, we’ll count “twenty-one, twenty-two” and… Well… If we pass that same light pole while we’re still counting those two seconds, then we’d better keep some more distance. It’s got to do with reaction time: when the car in front of you slams the brakes, you need some time to react, and then that car might also have better brakes than yours, so you want to build in some extra safety margin in case you don’t slow down as fast as the car in front of you. So that two-seconds rule is not about the braking distance really – or not about the braking distance alone. No. It’s more about the reaction time. In any case, the point is that you’ll want to measure the safety distance in time rather than in meters. Capito? OK… Onwards…

Now, 120 km/h amounts to 120,000/3,600 = 33.333 meters per second. So the safety distance here is almost 67 meters! If the maximum authorized velocity is only 90 km/h, then the safety distance shrinks to 2 × (90,000/3,600) = 50 meters. For a maximum authorized velocity of 60 km/h, the safety distance would be equal to 33.333 meters. These are all much larger distances than the average length of the vehicles and, hence, it’s basically the safety distance – not the length of the vehicle – that we need to consider! Let’s quickly calculate the related densities:

  • For a three-lane highway, with all vehicles traveling at 120 km/h and keeping their safety distance, the density will be equal to 3·1,000/66.666… = 45 vehicles per kilometer of highway, so that’s 15 vehicles per lane.
  • If the travel speed is 90 km/h, then the density will be equal to 60 vehicles per km (20 vehicles per lane).
  • Finally, at 60 km/h, the density will be 90 vehicles per km (30 vehicles per lane).
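The three bullet points above can be bundled in a little function – the three lanes and the two-second rule are the assumptions:

```python
def density_per_km(v_kmh, lanes=3, headway_s=2.0):
    """Vehicles per km of highway when everyone keeps the time-based safety distance."""
    gap_m = v_kmh / 3.6 * headway_s   # safety distance in meters
    return lanes * 1000 / gap_m

assert round(density_per_km(120)) == 45  # 15 vehicles per lane per km
assert round(density_per_km(90)) == 60   # 20 per lane
assert round(density_per_km(60)) == 90   # 30 per lane
```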

Note that our two-seconds rule implies a linear relation between the safety distance and the maximum authorized speed. You can also see that the relation between the density and the maximum authorized speed is inversely proportional: if we halve the speed, the density doubles.

Now, you can easily come up with some more formulas, and play around a bit. For example, if we denote the safety distance by d, and the mentioned two seconds as td – so that’s the time (t) that defines the safety distance d – then d is, logically, equal to: d = td∙vmax. But rather than trying to find more formulas and play with them, let’s think about that concept of flow now. If we would want to define the capacity – or the actual flow – in terms of the number of vehicles that are passing along any point along this highway, how should we calculate that?

Well… The flow is the number of vehicles that will pass us in one hour, right? So if vmax is 120 km/h, then – assuming full capacity – all the vehicles on the next 120 km of highway will all pass us, right? So that makes 45 vehicles per km times 120 km = 5,400 vehicles – per hour, of course. Hence, the flow is just the product of the density times the speed.

Now, look at this: if vmax is equal to 90 km/h, then we’ll have 60 vehicles per km times 90 km = … Well… It’s – interestingly enough – the same number: 5,400 vehicles per hour. Let’s calculate for vmax = 60 km/h… The safety distance is 33.333 meters, so we can have 90 vehicles on each km of highway which means that, over one hour, 90 times 60 = 5,400 vehicles will pass us! It’s, once again, the same number: 5,400! Now that’s a very interesting conclusion. Let me highlight it:

If we assume the vehicles will keep some fixed time distance between them (e.g. two seconds), then the capacity of our highway – expressed as some number of vehicles passing along it per time unit – does not depend on the velocity.

So the capacity – expressed as a flow rather than as a density – is just a fixed number: vehicles per hour. The density affects only the (average) speed of all those vehicles. Hence, increasing densities are associated with lower speeds, and higher travel times, but they don’t change the capacity.
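You can verify the invariance directly: the flow is the density times the speed, and the speed cancels out – what’s left is lanes·3,600/td = 3·3,600/2 = 5,400 vehicles per hour, whatever the speed:

```python
def flow_per_hour(v_kmh, lanes=3, headway_s=2.0):
    gap_m = v_kmh / 3.6 * headway_s    # safety distance in meters
    density = lanes * 1000 / gap_m     # vehicles per km
    return density * v_kmh             # flow = density × speed, in vehicles per hour

for v in (60, 90, 120, 200):           # any speed gives the very same flow
    assert abs(flow_per_hour(v) - 5400) < 1e-9
```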

It’s really a rather remarkable conclusion, even if the relation between the density and the flow is easily understood – both mathematically and, more importantly, intuitively. For example, if the density goes down to 60 vehicles per km of highway, then they will only be able to move at a speed of 90 km/h, but we’ll still have that flow of 5,400 vehicles per hour – which we can look at as the capacity but expressed as some flow rather than as a density. Lower densities allow for even higher speeds: we calculated above that a density of 45 vehicles per km would allow them to drive at a maximum speed of 120 km/h, so travel time would be reduced even more, but we’d still have 5,400 vehicles per hour! So… Well… Yes. It all makes sense.

Now what happens if the density is even lower, so we could – theoretically – drive safely, or not so safely, at some speed that’s way above the speed limit? If we have enough cars – say 30 vehicles per km, but all driving more than 120 km/h, while respecting the two-seconds rule – we’d still have the same flow: 5,400 vehicles per hour. And travel time would go down. But so we can think of lower densities and higher speeds but, again, there’s got to be some limit here – a speed limit, safety considerations, a limit to what our engine or our car can do, and, finally, there’s the speed of light too. 🙂 I am just joking, of course, but I hope you see the point. At some point, it doesn’t matter whether or not the density goes down even further: the travel time should hit some minimum. And it’s that minimum – the lowest possible travel time – that you’d probably like to define as t0.

As mentioned, the minimum travel time is associated with some maximum speed, and – after some consideration of the possible candidates for the maximum speed – you’ll agree the speed limit is a better candidate than the 275 km/h limit of my friend’s Maserati Quattroporte. Likewise, you would probably also like to define x0 as the (maximum) density at the speed limit.

What we’re saying here is that – in theory at least – our t = t(x) function should start with a linear section, between x = 0 and x = x0. That linear section defines a density 0 < x < x0 which is compatible with us driving at the speed limit – say, 120 km/h – and, hence, with us only needing the time t = t0 to arrive at our destination. Only when x becomes larger than x0, we’ve got to reduce speed – below the speed limit (say, 120 km/h) – to keep the flow going while keeping an appropriate safety distance. A reduction of speed implies an increase in travel time, of course. So that’s what’s illustrated in the graph below.

graphTo be specific, if the speed limit is 120 km/h, then – assuming you don’t want to be caught speeding – the minimum travel time will always be equal to 30 seconds per km, even if you’re alone on the highway. Now, as long as the density is less than 45 vehicles per km, you can keep that travel time the same, because you can do your 120 km/h while keeping the safety distance. But if the density increases, above 45 vehicles per km, then stuff starts slowing down because everyone is uncomfortable with the shorter distance between them and the car in front. As the density goes up even more – say, to 60 vehicles per km – we can only do 90 km/h, and so the travel time will then be equal to 40 seconds per km. And then it goes to 90 vehicles per km, so speed slows down to 60 km/h, and so that’s a travel time of 60 seconds per km. Of course, you’re smart – very smart – and so you’ll immediately say this implies that the second section of our graph should be linear too, like this:

graph 2You’re right. But then… Well… That doesn’t work with our limit for x, which is c. As I pointed out, c is an absolute maximum density: you just can’t park any more cars on that highway – unless you fold them up or so. 🙂 So what’s the conclusion? Well… We may think of the Davidson function as a primitive combination of both shapes, as shown below.

graph 3

I call it a primitive approximation, because that Davidson function (so that’s the green smooth curve above) is not a precise (linear or non-linear) combination of the two functions we presented (I am talking about the blue broken line and the smooth red curve here). It’s just… Well… Some primitive approximation. 🙂 Now you can write some very complicated papers – as other authors do – to sort of try to explain this shape, but you’ll find yourself fiddling with variable time distance rules and other hypotheses that may or may not make sense. In short, you’re likely to introduce other logical inconsistencies when trying to refine the model. So my advice is to just accept Davidson’s function as some easy empirical fit to some real-life situation, and think of what the parameters actually do – mathematically speaking, that is. How do they change the shape of our graph?

So we’re now ready to explain that epsilon factor (ε) by looking at what it does, indeed. Please try an online graphing tool with sliders – just type something like a + b∙x in the function box, and you’ll see the sliders appear – so you can see how the function changes for different parameter values. The two graphs below, for example, which I made using such a graphing tool, show you the function t = 2 + 2∙ε∙x/(10−x) for ε = 0.5 and ε = 10 respectively. As you can see, both functions start at t = 2 and have the same asymptote at x = c = 10. However, you’ll agree that they look very different – and that’s because of the value of the ε parameter. For ε = 0.5, the travel time does not increase all that much – initially at least. Indeed, as you can see, t is equal to 3 if the density is half of the capacity (t = 3 for x = 5 = c/2). In contrast, for ε = 10, we have immediate saturation, so to speak: travel time goes through the roof almost immediately! For example, for x = 3, t ≈ 10.6, so while the density is less than a third of the capacity, the associated travel time is already more than five times the free-flow travel time!
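The two data points quoted above are easy to reproduce with a one-line version of Davidson’s function:

```python
def davidson(x, t0=2.0, c=10.0, eps=1.0):
    # t = t0 + t0·eps·x/(c - x), with the t0 = 2 and c = 10 of the example
    return t0 + t0 * eps * x / (c - x)

assert abs(davidson(5, eps=0.5) - 3.0) < 1e-9          # t = 3 at half capacity
assert abs(davidson(3, eps=10) - (2 + 60 / 7)) < 1e-9  # t ≈ 10.6 at x = 3
```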

Now I have a tricky question for you: does it make sense to allow ε to take on values larger than one? Think about it. 🙂 In any case, now you’ve seen what the ε factor does from a math point of view. So… Well… I’ll conclude here by just noting that it does, indeed, make sense to refer to ε as a “paramètre de sensibilité de congestion”, because that’s what it is: a congestion sensitivity coefficient. Indeed, it’s not the congestion or saturation parameter itself (that’s a term we should reserve for the x/(c−x) factor), but a congestion sensitivity coefficient alright!

Of course, you will still want some theoretical interpretation. Well… To be honest, I can’t give you one. I don’t want to get lost in all of those theoretical excursions on Davidson’s function, because… Well… It’s no use. That ε is just what it is: it’s a proportionality coefficient that we are imposing upon our functional form for our travel-time function. You can sum it up as follows:

If x/(c−x) is the congestion parameter (or variable, I should say), then it goes from 0 to ∞ (infinity) when the traffic flow (x) goes from x = 0 to x = c (full capacity). So, yes, we can call the x/(c−x) factor the congestion or saturation variable and write it as s = x/(c−x). And then we can refer to ε as the “paramètre de sensibilité de congestion”, because it is a measure not of the congestion itself, but of the sensitivity of the travel time to the congestion.

If you’d absolutely want some mathematical formula for it, then you could use this one, which you get from re-writing Δt = t0·ε·s as Δt/t0 = ε·s:

∂(Δt/t0)/∂s = ε

But… Frankly. You can stare at this formula for a long while – it’s a derivative alright, and you know what derivatives stand for – but you’ll probably learn nothing much from it. [Of course, please write me if you don’t agree, Vincent!] Just look at those two graphs again, and note how their form changes as a function of ε. Perhaps you have some brighter idea about it!

So… Well… I am done. You should now fully understand Davidson’s function. Let me write it down once more:

formula 3

Again, as mentioned, its main advantage is its linearity. Because of its linearity, it is easy to actually estimate the parameters: it’s just a simple linear regression – using actual travel times and actual congestion measurements – and so then we can estimate the value of ε and see if it works. Huh? How do we see if it works? Well… I told you already: when everything is said and done, Davidson’s function is just one of the many models for the actual reality, so it tries to explain how travel time increases because of congestion. There are other models, which come with other functions – but they are more complicated, and so are the functions that come with them (check out that paper from Rahmi Akçelik, for example). Only reality can tell us what model is the best fit to whatever it is that we’re trying to model. So that’s why I call Davidson’s function an empirical function, and so you should check it against reality. That’s when a statistical software package might be handy: it allows you to test the fit of various functional – linear and non-linear – forms against a real-life data set.

So that’s it. I tasked my son to go through this post and correct any errors – only typos, I hope! – I may have made. I hope he’ll enjoy this little exercise. 🙂

Comments on the MIT’s Stern-Gerlach lab experiment

In my previous post, I noted that I’d go through the MIT’s documentation on the Stern-Gerlach experiment that their undergrad students have to do, because we should now – after 175 posts on quantum physics 🙂 – be ready to fully understand what is said in there. So this post is just going to be a list of comments. I’ll organize it section by section.

Theory of atomic beam experiments

The theory is known – and then it isn’t, of course. The key idea is that individual atoms behave like little magnets. Why? In the simplest and most naive of models, it’s because the electrons somehow circle around the nucleus. You’ve seen the illustration below before. Note that current is, by convention, the flow of positive charge, which is, of course, opposite to the electron flow. You can check the direction by applying the right-hand rule: if you curl the fingers of your right hand in the direction of the current in the loop (so that’s opposite to v), your thumb will point in the direction of the magnetic moment (μ).orbital-angular-momentumSo the electron orbit – in whatever way we’d want to visualize it – gives us L, which we refer to as the orbital angular momentum. We know the electron is also supposed to spin about its own axis – even if we know this planetary model of an electron isn’t quite correct. So that gives us a spin angular momentum S. In the so-called vector model of the atom, we simply add the two to get the total angular momentum J = L + S.

Of course, now you’ll say: only hydrogen has one electron, so how does it work with multiple electrons? Well… Then we have multiple orbital angular momenta li which are to be added to give a total orbital angular momentum L. Likewise, the electron spins si can also be added to give some total spin angular momentum S. So we write:

J = L + S with L = Σi li and S = Σi si

Really? Well… If you’d google this to double-check – check the Wikipedia article on it, for example – then you’ll find this additivity property is valid only for relatively light atoms (Z ≤ 30) and only if any external magnetic field is weak enough. The way individual orbital and spin angular momenta have to be combined so as to arrive at some total L, S and J is referred to as a coupling scheme: the additivity rule above is referred to as LS coupling, but one may also encounter LK coupling, or jj coupling, or other stuff. The US National Institute of Standards and Technology (NIST) has a nice article on all these models – but we need to move on here. Just note that we do assume the LS coupling scheme applies to our potassium beam – because its atomic number (Z) is 19, and the external magnetic field is assumed to be weak enough.

The vector model of the atom describes the atom using angular momentum vectors. Of course, we know that a magnetic field will cause our atomic magnet to precess – rather than line up. At this point, the classical analogy between a spinning top – or a gyroscope – and our atomic magnet becomes quite problematic. First, think about the implications for L and S when assuming, as we usually do, that J precesses nicely about an axis that is parallel to the magnetic field – as shown in the illustration below, which I took from Feynman’s treatment of the matterprecessionIf J is the sum of two other vectors L and S, then this has rather weird implications for the precession of L and S, as shown in the illustration below – which I took from the Wikipedia article on LS coupling. Think about it: if L and S are independent, then the axis of precession for these two vectors should be just the same as for J, right? So their axis of precession should also be parallel to the magnetic field (B), so that’s the direction of the Jz component, which is just the z-axis of our reference frame here.375px-ls_couplingMore importantly, our classical model also gets into trouble when actually measuring the magnitude of Jz: repeated measurements will not yield some randomly distributed continuous variable, as one would classically expect. No. In fact, that’s what this experiment is all about: it shows that Jz will take only certain quantized values. That is what is shown in the illustration below (which once again assumes the magnetic field (B) is along the z-axis). vector-modelI copied the illustration above from the HyperPhysics site, because I found it to be enlightening and intriguing at the same time. First, it also shows this rather weird implication of the vector model: if J continually changes direction because of its precession in a weak magnetic field, then L and S must, obviously, also continually change direction. However, this illustration is even more intriguing than the Wikipedia illustration because it assumes the axis of precession of L and S is actually the same as that of J!

So what’s going on here? To better understand what’s going on, I started to read the whole HyperPhysics article on the vector model, which also includes the illustration below, with the following comments: “When orbital angular momentum L and electron spin S are combined to produce the total angular momentum of an atomic electron, the combination process can be visualized in terms of a vector model. Both the orbital and spin angular momenta are seen as precessing about the direction of the total angular momentum J. This diagram can be seen as describing a single electron, or multiple electrons for which the spin and orbital angular momenta have been combined to produce composite angular momenta S and L respectively. In so doing, one has made assumptions about the coupling of the angular momenta which are described by the LS coupling scheme which is appropriate for light atoms with relatively small external magnetic fields.”vector-model-2Hmm… What about those illustrations on the right-hand side – with the vector sums and those values for j and mj? I guess the idea may also be illustrated by the table below: combining different values for l (±1) and s (±1/2) gives four possible values, ranging from +3/2 to −3/2, for l + s.tableHaving said that, the illustration raises a very fundamental question: the length of the sum of two vectors is definitely not the same as the sum of the lengths of the two vectors! So… Well… Hmm… Something doesn’t make sense here! However, I can’t dwell any longer on this. I just wanted to note you should not take all that’s published on those oft-used sites on quantum mechanics for granted. But so I need to move on. Back to the other illustration – copied once more below.vector-modelWe have that very special formula for the magnitude (J) of the angular momentum J:

│J│ = J = √(J·J) = √[j·(j+1)·ħ²] = √[j·(j+1)]·ħ

So if j = 3/2, then J is equal to √[(3/2)·(3/2+1)]·ħ = √(15/4)·ħ ≈ 1.9365·ħ, so that’s almost 2ħ. 🙂 At the same time, we know that for j = 3/2, the possible values of Jz can only be +3ħ/2, +ħ/2, −ħ/2, and −3ħ/2. So that’s what’s shown in that half-circular diagram: the magnitude of J is larger than its z-component – always!
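We can make that concrete with a few lines of code – a quick sketch in units of ħ = 1 (the helper names are mine):

```python
import math

hbar = 1.0  # work in units where ħ = 1

def J_magnitude(j):
    """Quantum-mechanical magnitude |J| = sqrt(j·(j+1))·ħ."""
    return math.sqrt(j * (j + 1)) * hbar

def Jz_values(j):
    """Allowed z-components: -j·ħ, -(j-1)·ħ, ..., +(j-1)·ħ, +j·ħ."""
    n = int(round(2 * j)) + 1
    return [(-j + k) * hbar for k in range(n)]

j = 1.5  # j = 3/2
print(J_magnitude(j))   # ≈ 1.9365, i.e. √(15/4)
print(Jz_values(j))     # the four allowed z-components, from -3/2 to +3/2
print(max(Jz_values(j)) < J_magnitude(j))  # True: |J| always exceeds Jz
```

The last line is the whole point of the half-circular diagram: the largest allowed z-component is always strictly smaller than the magnitude.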

OK. Next. What’s that 3p3/2 notation? Excellent question! Don’t think this 3p denotes an electron orbital, like 1s or 3d – i.e. the orbitals we got from solving Schrödinger’s equation. No. In fact, the illustration above is somewhat misleading because the correct notation is not 3p3/2 but 3P3/2. So we have a capital P which is preceded by a superscript 3. This is the notation for the so-called term symbol for a nuclear, atomic or molecular (ground) state which – assuming our LS coupling model is valid, because we’ve got other term symbols for other coupling models – we can write, more generally, as:

2S+1LJ
The J, L and S in this thing are the following:

1. The J is the total angular momentum quantum number, so it is – the notation gets even more confusing now – the j in the │J│ = J = √(J·J) = √[j·(j+1)·ħ²] = √[j·(j+1)]·ħ expression. We know that number is 1/2 for electrons, but it may take on other values for nuclei, atoms or molecules. For example, it is 3/2 for nitrogen, and 2 for oxygen, for which the corresponding terms are 4S3/2 and 3P2 respectively.

2. The S in the term symbol is the total spin quantum number, and 2S+1 itself is referred to as the fine-structure multiplicity. It is not an easy concept. Just remember that the fine structure describes the splitting of the spectral lines of atoms due to electron spin. In contrast, the gross structure energy levels are those we effectively get from solving Schrödinger’s equation assuming our electrons have no spin. We also have a hyperfine structure, which is due to the existence of a (small) nuclear magnetic moment, which we do not take into consideration here, which is why the 4S3/2 and 3P2 terms are sometimes referred to as describing electronic ground states. In fact, the MIT lab document, which we are studying here, refers to the ground state of the potassium atoms in the beam as an electronic ground state, which is written up as 2S1/2. So S is, effectively, equal to 1/2. [Are you still there? If so, just write it down: 2S+1 = 2 ⇒ S = 1/2. That means the following: our potassium atom behaves like an electron: its spin is either ‘up’ or, else, it is ‘down’. There is no in-between.]

3. Finally, the L in the term symbol is the total orbital angular momentum quantum number but, rather than using a number, the values of L are often represented as S, P, D, F, etcetera. This notation is very confusing because – as mentioned above – one would think it represents those s, p, d, f, g,… orbitals. However, that is not the case. The difference may easily be illustrated by observing that a carbon atom, for example, has six electrons, which are distributed over the 1s, 2s and 2p orbitals (one pair each). However, its ground state only gets one letter: L = P. Hence, its value is 1. Of course, now you will wonder how we get that number.

Well… I wish I could give you an easy answer, but I can’t. For two electrons – think of our carbon atom once again – we can have L = 0, 1 or 2, or S, P and D. They effectively correspond to different energy levels, which are related to the way these two electrons interact with each other. The phenomenon is referred to as angular momentum coupling. In fact, all of the numbers we discussed so far – J, S and L – are numbers resulting from angular momentum coupling. As Wikipedia puts it: “Angular momentum coupling refers to the construction of eigenstates of total angular momentum out of eigenstates of separate angular momentum.” [As you know, each eigenstate corresponds to an energy level, of course.]

Now that should clear some of the confusion on the 2S+1LJ notation: the capital letters J, S and L refer to some total, as opposed to the quantum numbers you are used to, i.e. n, l, m and s, i.e. the so-called principal, orbital, magnetic and spin quantum number respectively. The lowercase letters are quantum numbers that describe an electron in an atom, while those capital letters denote quantum numbers describing the atom – or a molecule – itself.

OK. Onwards. But where were we? 🙂 Oh… Yes. That J = L + S formula gives us some total electronic angular momentum, but we’ll also have some nuclear angular momentum, which our MIT paper denotes as I. Our vector model of our potassium atom allows us, once again, to simply add the two to get the total angular momentum, which is written as F = J + I = L + S + I. This, then, explains why the MIT experiment writes the magnitude of the total angular momentum as:

│F│ = F = √(F·F) = √[f·(f+1)]·ħ
Of course, here I don’t need to explain – or so I hope – why this quantum-mechanical formula for the calculation of the magnitude is what it is (or, equivalently, why the usual Euclidean metric – i.e. √(x² + y² + z²) – is not to be used here). If you do need an explanation, you’ll need to go through the basics once again.

Now, the whole point, of course, is that the z-component of F can have only the discrete values that are specified by the Fz = mf·ħ equation, with mf – i.e. the (total) magnetic quantum number – having an equally discrete value equal to mf = −f, −(f−1), …, +(f−1), +f.
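That rule is easy to sketch in code – a quick helper (the function name is mine) that just enumerates the 2f+1 allowed values of mf:

```python
from fractions import Fraction

def mf_values(f):
    """All magnetic quantum numbers m_f = -f, -(f-1), ..., +(f-1), +f."""
    f = Fraction(f)
    return [-f + k for k in range(int(2 * f) + 1)]

print(len(mf_values(1)))               # 3 sub-beams for f = 1
print(len(mf_values(2)))               # 5 sub-beams for f = 2
print(len(mf_values(Fraction(1, 2))))  # 2 — the two beams we actually expect
```

Note how half-integer values of f work just as well as integer ones, which is the relevant case here.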

For the rest, I probably shouldn’t describe the experiment itself: you know it. But let me just copy the set-up below, so it’s clear what it is that we’re expecting to happen. In addition, you’ll also need the illustration because I’ll refer to those d1 and d2 distances shown in what follows.

Note the MIT documentation does spell out some additional assumptions. Most notably, it says that the potassium atoms that emerge from the oven (at a temperature of 200°) will be:

(1) almost exclusively in the ground electronic state,

(2) nearly equally distributed among the two (magnetic) sub-states characterized by f, and, finally,

(3) very nearly equally distributed among the hyperfine states, i.e. the states with the same f but with different mf.

I am just noting these assumptions because it is interesting to note that – according to the man or woman who wrote this paper – we would actually have states within states here. The paper states that the hyperfine splitting of the two sub-beams we expect to come out of the magnet can only be resolved by very advanced atomic beam techniques, so… Well… That’s not the apparatus that’s being used for this experiment.

However, it’s all a bit weird, because the paper notes that the rules for combining the electronic and nuclear angular momentum – using that F = J + I = L + S + I formula – imply that our quantum number f = i ± j can be either 1 or 2. These two values would be associated with the following mf and, hence, Fz values:

f = 1 ⇒ Fz = mf·ħ = −ħ, 0 or +ħ (so we’d have three beams here)

f = 2 ⇒ Fz = mf·ħ = −2ħ, −ħ, 0, +ħ or +2ħ (so we’d have five beams here)

Neither of the two possibilities relates to the situation at hand – which assumes two beams only. In short, I think the man or woman who wrote the theoretical introduction – an assistant professor, most likely (no disrespect here: that’s about as far as I progressed in economics – nothing more, nothing less) – might have made a mistake. Or perhaps he or she may have wanted to confuse us.

I’ll look into it over the coming days. As for now, all you need to know – please jot it down! – is that our potassium atom is fully described by 2S1/2. That shorthand notation has all the quantum numbers we need to know. Most importantly, it tells us S is, effectively, equal to 1/2. So… Well… That 2S1/2 notation tells us our potassium atom should behave like an electron: its spin is either ‘up’ or ‘down’. No in-between. 🙂 So we should have two beams. Not three or five. No fine or hyperfine sub-structures! 🙂 In any case, the rest of the paper makes it clear the assumption is, effectively, that the angular momentum number j is equal to 1/2. So… Two beams only. 🙂

How to calculate the expected deflection

We know that the inhomogeneous magnetic field (B), whose direction is the z-axis, will result in a force, which we have calculated a couple of times already as being equal to Fz = μz·(∂B/∂z). In case you’d want to check this, you can check one of my posts on this. I just need to make one horrifying remark on notation here: while the same symbol is used, the force Fz is, obviously, not to be confused with the z-component of the angular momentum F = J + I = L + S + I that we described above. Frankly, I hope that the MIT guys have corrected that in the meanwhile, because it’s really terribly confusing notation! In any case… Let’s move on.

Now, we assume the deflecting force is constant because of the rather particular design of the magnet pole pieces (see Appendix I of the paper). We can then use Newton’s Second Law (F = m·a) to calculate the velocity in the z-direction, which is denoted by Vz (I am not sure why a capital letter is used here, but that’s not important, of course). That velocity is assumed to go from 0 to its final value Vz while our potassium atom travels between the two magnet poles but – to be clear – at any point in time, Vz will increase linearly – not exponentially – so we can write: Vz = a·t1, with t1 the time that is needed to travel through the magnet. Now, the relevant mass is the mass of the atom, of course, which is denoted by M. Hence, it is easy to see that a = Fz/M = Vz/t1. Hence, we find that Vz = Fz·t1/M.

Now, the vertical distance traveled (z) can be calculated by solving the usual integral: z = ∫0t1 v(t)·dt = ∫0t1 a·t·dt = a·t1²/2 = (Vz/t1)·t1²/2 = Vz·t1/2. Of course, once our potassium atom comes out of the magnetic field, it will continue to travel upward or downward with the same velocity Vz, which adds Vz·t2 to the total distance traveled along the z-direction. Hence, the formula for the deflection is, effectively, the one that you’ll find in the paper:

z = Vz·t1/2 + Vz·t2 = Vz·(t1/2 + t2)

Now, the travel times depend on the velocity of our potassium atom along the y-axis, which is approximated by equating it with │V│ = V, because the y-component of the velocity is easily the largest – by far! Hence, t1 = d1/V and t2 = d2/V. Some more manipulation will then give you the expression we need, which is a formula for the deflection in terms of variables that we actually know:

z = Fz·d1·(d1/2 + d2)/(M·V²)
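The chain of substitutions can be sanity-checked numerically – a sketch with made-up values for Fz, M, V, d1 and d2, just to show that the step-by-step calculation and the closed-form expression agree:

```python
def deflection(Fz, M, V, d1, d2):
    """Step by step: z = Vz·(t1/2 + t2), with Vz = Fz·t1/M, t1 = d1/V, t2 = d2/V."""
    t1, t2 = d1 / V, d2 / V
    Vz = Fz * t1 / M
    return Vz * (t1 / 2 + t2)

def deflection_closed_form(Fz, M, V, d1, d2):
    """The same thing after eliminating t1 and t2: z = Fz·d1·(d1/2 + d2)/(M·V²)."""
    return Fz * d1 * (d1 / 2 + d2) / (M * V**2)

# Made-up illustrative values: Fz in newton, M in kg, V in m/s, d1 and d2 in meter.
args = (1e-22, 6.5e-26, 500.0, 0.1, 0.3)
print(deflection(*args))
print(deflection_closed_form(*args))  # same number
```

The two functions print the same deflection, which is all the algebra above claims.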


Statistical mechanics

We now need to combine this with the Maxwell-Boltzmann distribution for the velocities we gave you in our previous post. The next step is to use this formula so as to be able to calculate a distribution which would describe the intensity of the beam. Now, it’s easy to understand such intensity will be related to the flux of potassium atoms, and it’s equally easy to get that a flux is defined as the rate of flow per unit area. Hmm… So how does this get us the formula below?

The tricky thing – of course – is the use of those normalized velocities because… Well… It’s easy to see that the right-hand side of the equation above – just forget about the d(V/V0) bit for a second, as we have it on both sides of the equation and so it cancels out anyway – is just density times velocity. We do have a product of the density of particles and the velocity with which they emerge here – albeit a normalized velocity. But then… Who cares? The normalization is just a division by V0 – or a multiplication by 1/V0, which is some constant. From a math point of view, it doesn’t make any difference: our variable is V/V0 instead of V. It’s just like using some other unit. No worries here – as long as you use the new variable consistently everywhere. 🙂

Alright. […] What’s next? Well… Nothing much. The only thing that we still need to explain now is that factor 2. It’s easy to see that’s just a normalization factor – just like that 4/√π factor in the first formula. So we get it from imposing the usual normalization condition.

So… What’s next… Well… We’re almost there. 🙂 As the MIT paper notes, the f(V) and I(V/V0) functions can be mapped to each other: the related transformation maps a velocity distribution to an intensity distribution – i.e. a distribution of the deflection – and vice versa.
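Assuming the normalized intensity distribution is I(V/V0) = 2·(V/V0)³·e^(−(V/V0)²) – i.e. the density-times-velocity product mentioned above, with the factor 2 as normalization constant – we can check numerically that it does integrate to 1:

```python
import math

def intensity(u):
    """Beam intensity in the normalized velocity u = V/V0:
    density (∝ u²·e^(−u²)) times velocity (∝ u), normalized by the factor 2."""
    return 2 * u**3 * math.exp(-u**2)

# Crude left-Riemann integration from 0 to 10 — far enough into the tail:
du = 1e-4
total = sum(intensity(k * du) * du for k in range(1, 100_000))
print(total)  # very close to 1: the factor 2 is indeed the right normalization
```

Analytically, ∫0∞ u³·e^(−u²)·du = 1/2, which is exactly why the normalization factor must be 2.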

Now, the rest of the paper is just a lot of algebraic manipulations – distinguishing the case of a quantized Fz versus a continuous Fz. Here again, I must admit I am a bit shocked by the mix-up of concepts and symbols. The paper talks about a quantized deflecting force – while it’s obvious we should be talking about a quantized angular momentum. The two concepts – and their units – are fundamentally different: the unit in which angular momentum is measured is the action unit: newton·meter·second (N·m·s). Force is just force: so many newton.

Having said that, the mix-up does trigger an interesting philosophical question: what is quantized really? Force (expressed in N)? Energy (expressed in N·m)? Momentum (expressed in N·s)? Action (expressed in N·m·s, i.e. the unit of angular momentum)? Space? Time? Or space-time – related through the absolute speed of light (c)? Three factors (force, distance and time), six possibilities. What’s your guess?


What’s my guess? Well… The formulas tell us the only thing that’s quantized is action: Nature itself tells us we have to express it in terms of Planck units. However, because action is a product involving all of these factors, with different dimensions, the quantum-mechanical quantization of action can, obviously, express itself in various ways. 🙂

Statistical mechanics re-visited

Quite a while ago – in June and July 2015, to be precise – I wrote a series of posts on statistical mechanics, which included digressions on thermodynamics, Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics (probability distributions used in quantum mechanics), and so forth. I actually thought I had sort of exhausted the topic. However, when going through the documentation on that Stern-Gerlach experiment that MIT undergrad students need to analyze as part of their courses, I realized I had actually not presented some very basic formulas that you’ll definitely need in order to actually understand that experiment.

One of those basic formulas is the one for the distribution of velocities of particles in some volume (like an oven, for instance), or in a particle beam – like the beam of potassium atoms that is used to demonstrate the quantization of the magnetic moment in the Stern-Gerlach experiment. In fact, we’ve got two formulas here, which are subtly – as subtle as the difference between v (boldface, so it’s a vector) and v (lightface, so it’s a scalar) 🙂 – but fundamentally different:

f(v) = [m/(2π·kT)]^(3/2)·e^(−m·v²/2kT)

f(v) = 4π·v²·[m/(2π·kT)]^(3/2)·e^(−m·v²/2kT)
Both functions are referred to as the Maxwell-Boltzmann density distribution, but the first distribution gives us the density for some v in the velocity space, while the second gives us the distribution density of the absolute value (or modulus) of the velocity, so that is the distribution density of the speed, which is just a scalar – without any direction. As you can see, the second formula includes a 4π·v² factor.

The question is: how are these formulas related to Boltzmann’s f(E) = C·e^(−E/kT) Law? The answer is: we can derive all of these formulas – for the distribution of velocities, or of momenta – by clever substitutions. However, as evidenced by the two formulas above, these substitutions are not always straightforward. So let me quickly show you a few things here.

First note the two formulas above already include the e^(−E/kT) function if we equate the energy E with the kinetic energy: E = K.E. = m·v²/2. Of course, if you’ve read those June-July 2015 posts, you’ll note that we derived Boltzmann’s Law in the context of a force field, like gravity, or an electric potential. For example, we wrote the law for the density (n = N/V) of gas in a gravitational field (like the Earth’s atmosphere) as n = n0·e^(−P.E./kT). In this formula, we only see the potential energy: P.E. = m·g·h, i.e. the product of the mass (m), the gravitational acceleration (g), and the height (h). However, when we’re talking the distribution of velocities – or of momenta – then the kinetic energy comes into play.

So that’s a first thing to note: Boltzmann’s Law is actually a whole set of laws. For example, the frequency distribution of particles in a system over various possible states also involves the same exponential function: F(state) ∝ e^(−E/kT). E is just the total energy of the state here (which varies from state to state, of course), so we don’t distinguish between potential and kinetic energy here.

So what energy concept should we use in that Stern-Gerlach experiment? Because these potassium atoms in that oven – or when they come out of it in a beam – have kinetic energy only, our E = m·v²/2 substitution does the trick: we can say that the potential energy is taken to be zero, so that all energy is in the form of kinetic energy. So now we understand the e^(−m·v²/2kT) function in those f(v) and f(v) formulas. Now we only need to explain those complicated coefficients. How do we get these?
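Before going through those substitutions, we can at least verify numerically that the coefficient in front of the speed distribution is exactly what is needed for the density to integrate to 1. The mass and temperature below are assumed, illustrative values – roughly a potassium-39 atom in a hot oven:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K
m   = 6.5e-26        # kg — roughly the mass of a potassium-39 atom (assumed)
T   = 450.0          # K — an assumed oven temperature

def f_speed(v):
    """Maxwell–Boltzmann speed density: 4π·v²·[m/(2π·kT)]^(3/2)·e^(−m·v²/2kT)."""
    a = m / (2 * math.pi * k_B * T)
    return 4 * math.pi * v**2 * a**1.5 * math.exp(-m * v**2 / (2 * k_B * T))

# Left-Riemann integration over 0–5000 m/s, which covers virtually all of the distribution:
dv = 0.5
total = sum(f_speed(i * dv) * dv for i in range(1, 10_000))
print(total)  # close to 1
```

Change m or T and the coefficient changes with them, but the integral stays at 1 – that is what “normalization constant” means.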

We get them through clever substitutions using equations such as:

fv(v)·dv  = fp(p)·dp

What are we writing here? We’re basically combining two normalization conditions: if fv(v) and fp(p) are proper probability density functions, then they must give us 1 when integrating over their domain. The domain of these two functions is, obviously, the velocity (v) and momentum (p) space. The velocity and momentum space are the same mathematical space, but they are obviously not the same physical space. But the two physical spaces are closely related: p = m·v, and so it’s easy to do the required transformation of variables. For example, it’s easy to see that, if E = m·v²/2, then E is also equal to E = p²/2m.
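The one-dimensional version of that transformation of variables is easy to sketch: if p = m·v, then fp(p) = fv(p/m)·|dv/dp| = fv(p/m)/m, and the products fv(v)·dv and fp(p)·dp assign the same probability to corresponding intervals. The values of m and kT below are arbitrary:

```python
import math

m, kT = 2.0, 1.5  # arbitrary illustrative values

def f_v(v):
    """1-D Maxwell–Boltzmann density of one velocity component: normal, variance kT/m."""
    s2 = kT / m
    return math.exp(-v**2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)

def f_p(p):
    """The same distribution in the momentum variable: f_p(p) = f_v(p/m)/m."""
    return f_v(p / m) / m

# Corresponding intervals: [v, v+dv] in velocity space maps to [p, p+dp] = [m·v, m·v+m·dv].
v, dv = 0.8, 1e-3
p, dp = m * v, m * dv
print(f_v(v) * dv)
print(f_p(p) * dp)  # same number
```

The 1/m factor in fp is the Jacobian of the substitution – exactly the kind of factor that makes the three-dimensional substitutions below “tricky”.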

However, when doing these substitutions, things get tricky. We already noted that p and v (boldface) are vectors, unlike E, or p and v (lightface) – which are scalars, or magnitudes. So we write: p = (px, py, pz) and |p| = p, and v = (vx, vy, vz) and |v| = v. Of course, you also know how we calculate those magnitudes:

p = √(p·p) = √(px² + py² + pz²) and v = √(v·v) = √(vx² + vy² + vz²)
Note that this also implies the following: p·p = p² = px² + py² + pz². Trivial, right? Yes. But have a look now at the following differentials:

  • d³p
  • dp
  • dp = d(px, py, pz)
  • dpx·dpy·dpz

Are these the same or not? Now you need to think, right? That d³p and dp are different beasts is obvious: d³p is, obviously, some infinitesimal volume, as opposed to dp, which is, equally obviously, an (infinitesimal) interval. But what volume exactly? Is it the same as that dp = d(px, py, pz) volume, and is that the same as the dpx·dpy·dpz volume?

Fortunately, the volume differentials are, in fact, the same – so you can start breathing again. 🙂 Let’s get going with that d³p notation for the time being, as you will find that’s the notation which is used in the Wikipedia article on the Maxwell-Boltzmann distribution – which I warmly recommend, because – for a change – it is a much easier read than other Wikipedia articles on stuff like this. Among other things, the mentioned article writes the following:

fE(E)·dE = fp(p)·d³p

What is this? Well… It’s just like that fv(v)·dv = fp(p)·dp equation: it combines the normalization condition for both distributions. However, it’s much more interesting, because, on the left-hand side, we multiply a density with an (infinitesimal) interval (dE), while on the right-hand side we multiply with an (infinitesimal) volume (d³p). Now, the (infinitesimal) energy interval dE must, obviously, correspond with the (infinitesimal) momentum volume d³p. So how does that work?

Well… The mentioned Wikipedia article talks about the “spherical symmetry of the energy-momentum dispersion relation” (that dispersion relation is just E = |p|²/2m, of course), but that doesn’t make us all that wiser, so let’s try a more heuristic approach. You might remember the formula for the volume of a spherical shell, which is simply the difference between the volume of the outer sphere minus the volume of the inner sphere: V = (4π/3)·R³ − (4π/3)·r³ = (4π/3)·(R³ − r³). Now, for a very thin shell of thickness Δr, we can use the following first-order approximation: V ≈ 4π·r²·Δr. In case you wonder, I hereby copy a nice explanation from the Physics Stack Exchange site:


Perfect. That’s all we need to know. We’ll use that first-order approximation to re-write d³p as:

d³p = 4π·|p|²·d|p| = 4π·p²·dp
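That first-order shell approximation is easy to check numerically – a quick sketch comparing the exact shell volume with 4π·r²·Δr for ever thinner shells:

```python
import math

def shell_exact(r, dr):
    """Exact volume between radii r and r + dr: (4π/3)·[(r+dr)³ − r³]."""
    return (4 * math.pi / 3) * ((r + dr)**3 - r**3)

def shell_approx(r, dr):
    """First-order approximation: 4π·r²·dr."""
    return 4 * math.pi * r**2 * dr

# The ratio tends to 1 as the shell gets thinner:
for dr in (0.1, 0.01, 0.001):
    print(dr, shell_exact(1.0, dr) / shell_approx(1.0, dr))
```

Expanding (r+dr)³ − r³ = 3r²·dr + 3r·dr² + dr³ shows the ratio is 1 + dr/r + dr²/(3r²), so the error dies off linearly with the thickness – which is all we need for a differential.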

Note that we’ll have the same formula for d³v, of course: d³v = 4π·|v|²·d|v| = 4π·v²·dv, and also note that we get that same 4π·v² factor which we mentioned when discussing the f(v) and f(v) formulas. That is not a coincidence, of course, but – as I’ll explain in a moment – it is not so easy to immediately relate the formulas. In any case, we’re now ready to relate dE and dp so we can re-write that d³p formula in terms of m, E and dE:

d³p = 4π·p²·dp = 2π·(2m)^(3/2)·√E·dE
We are now – finally! – sufficiently armed to derive all of the formulas we want – or need. Let me just copy them from the mentioned Wikipedia article:




As said, you’ll encounter these formulas regularly – and so it’s good that you know how you can derive them. Indeed, the derivation is very straightforward and is done in the same article: the tips I gave you should allow you to read it in a couple of minutes only. Only the density function for velocities might cause you a bit of trouble – but only for a very short moment: just use the p = m·v equation to write d³p as d³p = 4π·p²·dp = 4π·m²·v²·m·dv = 4π·m³·v²·dv = m³·d³v, and you’re all set. 🙂

Of course, you will recognize the formula for the distribution of velocities: it’s the f(v) we mentioned in the introduction. However, you’re more likely to need the f(v) formula (i.e. the probability density function for the speed) than the f(v) function. So how can we derive the f(v) formula – i.e. that formula for the distribution of speeds, with the 4π·v² factor – from the f(v) formula?

Well… I wish I could give you an easy answer. In fact, the same Wikipedia article suggests it’s easy – but it’s not. It involves a transformation from Cartesian to polar coordinates: the volume element dvx·dvy·dvz is to be written as v²·sinθ·dv·dθ·dφ. And then… Well… Have a look at this link. 🙂 It involves a so-called Jacobian transformation matrix. If you want to know more about it, then I recommend you read some article on how to transform distribution functions: here’s a link to one of those, but you can easily google others. Frankly, as for now, I’d suggest you just accept the formula for f(v). 🙂 Let me copy it from the same article in a slightly different form:

f(v) = [m/(2π·kT)]^(3/2)·4π·v²·e^(−m·v²/2kT)

Now, the final thing to note is that you’ll often want to use so-called normalized velocities, i.e. velocities that are defined as a v/v0 ratio, with v0 the most probable speed, which is equal to √(2kT/m). You get that value by calculating the df(v)/dv derivative, and then finding the value v = v0 for which df(v)/dv = 0. You should now be able to verify the formula that is used in the mentioned MIT version of the Stern-Gerlach experiment. Indeed, when you write it all out – note that π/π^(3/2) = 1/√π 🙂 – you’ll see the two formulas are effectively equivalent.

Of course, by now you are completely formula-ed out, and so you probably don’t even wonder what that f(v)·dv product actually stands for. What does it mean, really? Now you’ll sigh: why would I even want to know that? Well… I want you to understand that MIT experiment. 🙂 And you won’t if you don’t know what f(v)·dv actually represents. So think about it. […]
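We can check that v0 = √(2kT/m) is, effectively, where the speed distribution peaks – here by brute force, on a grid, with the constant factors in front of the distribution dropped (they don’t move the maximum):

```python
import math

m, kT = 1.0, 1.0  # arbitrary units

def f_speed(v):
    """The speed density with its constant factors dropped — they don't shift the peak."""
    return v**2 * math.exp(-m * v**2 / (2 * kT))

# Brute-force search for the maximum on a fine grid:
grid = [i * 1e-4 for i in range(1, 50_000)]
v_max = max(grid, key=f_speed)
print(v_max)                  # ≈ √2 ≈ 1.4142
print(math.sqrt(2 * kT / m))  # the v0 = √(2kT/m) prediction
```

Setting df/dv = 0 by hand gives 2v·e^(−mv²/2kT) = v²·(mv/kT)·e^(−mv²/2kT), i.e. v² = 2kT/m – the grid search just confirms it.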

[…] OK. Let me help you once more. Remember the normalization condition once again: the integral of the whole thing – over the whole range of possible velocities – needs to add up to 1, so f(v)·dv is really the fraction of (potassium) atoms (inside the oven) with a velocity in the (infinitesimally small) dv interval. It’s going to be a tiny fraction, of course: just a tiny bit larger than zero. Surely not larger than 1, obviously. 🙂 Think of integrating the function between two values – say v1 and v2 – that are pretty close to each other.

So… Well… We’re done as for now. So where are we now in terms of understanding the calculations in that description of that MIT experiment? Well… We’ve got the meat. But we need a lot of other ingredients now. We’ll want formulas for the intensity of the beam at some point along the axis measuring its deflection from its main direction. That axis is the z-axis. So we’ll want a formula for some I(z) function.

Deflection? Yes. There are a lot of steps to go through now. Here’s the set-up once more. First, we’ll need some formula measuring the flux of (potassium) atoms coming out of the oven. And then… Well… Just have a look and try to make your way through the whole thing now – which is just what I want to do in the coming days, so I’ll give you some more feedback soon. 🙂 Here I only wanted to introduce those formulas for the distribution of velocities and momenta, because you’ll need them in other contexts too.

So I hope you found this useful. Stuff like this all makes it somewhat more real, doesn’t it? 🙂 Frankly, I think the math is at least as fascinating as the physics. We could have a closer look at those distributions, for example, by noting the following:

1. The probability density function for the momenta is the product of three normal distributions. Which ones? Well… The distributions of px, py and pz respectively: three normal distributions whose variance is equal to mkT. 🙂

2. The fE(E) function is a chi-squared (χ²) distribution with 3 degrees of freedom. Now, we have the equipartition theorem (which you should know – if you don’t, see my post on it), which tells us that this energy is evenly distributed among all three degrees of freedom. It is then relatively easy to show – if you know something about χ² distributions at least 🙂 – that the energy per degree of freedom (which we’ll write as ε below) will also be distributed as a chi-squared distribution with one degree of freedom. This holds true for any number of degrees of freedom. For example, a diatomic molecule will have extra degrees of freedom, which are related to its rotational and vibrational motion (I explained that in my June-July 2015 posts too, so please go there if you’d want to know more). So we can really use this stuff in, for example, the theory of the specific heat of gases. 🙂

3. The function for the distribution of the velocities is also a product of three independent normally distributed variables – just like the density function for momenta. In this case, we have the vx, vy and vz variables that are normally distributed, with variance kT/m.
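Points 1 to 3 can be illustrated with a small Monte-Carlo sketch: draw the three velocity components from normal distributions with variance kT/m, and check that the average kinetic energy comes out at (3/2)·kT – i.e. (1/2)·kT per degree of freedom, as equipartition demands:

```python
import math
import random

random.seed(42)
m, kT = 1.0, 1.0           # arbitrary units
sigma = math.sqrt(kT / m)  # standard deviation of each velocity component
N = 200_000

E_total = 0.0
for _ in range(N):
    vx, vy, vz = (random.gauss(0, sigma) for _ in range(3))
    E_total += 0.5 * m * (vx**2 + vy**2 + vz**2)

E_mean = E_total / N
print(E_mean)  # close to 1.5, i.e. (3/2)·kT — (1/2)·kT per degree of freedom
```

Note that E here is, up to the factor kT/2, exactly a χ² variable with 3 degrees of freedom – which is point 2 above in disguise.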

So… Well… I’m done – for the time being, that is. 🙂 Isn’t it a privilege to be alive and to be able to savor all these little wonderful intellectual excursions? I wish you a very nice day and hope you enjoy stuff like this as much as I do. 🙂