Schrödinger’s equation as an energy conservation law

Post scriptum note added on 11 July 2016: This is one of the more speculative posts which led to my e-publication analyzing the wavefunction as an energy propagation. With the benefit of hindsight, I would recommend you to immediately the more recent exposé on the matter that is being presented here, which you can find by clicking on the provided link.

Original post:

In the movie about Stephen Hawking’s life, The Theory of Everything, there is talk about a single unifying equation that would explain everything in the universe. I must assume the real Stephen Hawking is familiar with Feynman’s unworldliness equation: U = 0, which – as Feynman convincingly demonstrates – effectively integrates all known equations in physics. It’s one of Feynman’s many jokes, of course, but an exceptionally clever one, as the argument convincingly shows there’s no such thing as one equation that explains all. Or, to be precise, one can, effectively, ‘hide‘ all the equations you want in a single equation, but it’s just a trick. As Feynman puts it: “When you unwrap the whole thing, you get back where you were before.”

Having said that, some equations in physics are obviously more fundamental than others. You can readily think of obvious candidates: Einstein’s mass-energy equivalence (m = E/c2); the wavefunction (ψ = ei(ω·t − k·x)) and the two de Broglie relations that come with it (ω = E/ħ and k = p/ħ); and, of course, Schrödinger’s equation, which we wrote as:

Schrodinger's equation

In my post on quantum-mechanical operators, I drew your attention to the fact that this equation is structurally similar to the heat diffusion equation. Indeed, assuming the heat per unit volume (q) is proportional to the temperature (T) – which is the case when expressing T in degrees Kelvin (K), so we can write q as q = k·T  – we can write the heat diffusion equation as:

heat diffusion 2

Moreover, I noted the similarity is not only structural. There is more to it: both equations model energy flows and/or densities. Look at it: the dimension of the left- and right-hand side of Schrödinger’s equation is the energy dimension: both quantities are expressed in joule. [Remember: a time derivative is a quantity expressed per second, and the dimension of Planck’s constant is the joule·second. You can figure out the dimension of the right-hand side yourself.] Now, the time derivative on the left-hand side is expressed in K/s. The constant in front (k) is just the (volume) heat capacity of the substance, which is expressed in J/(m3·K). So the dimension of k·(∂T/∂t) is J/(m3·s). On the right-hand side we have that Laplacian, whose dimension is K/m2, multiplied by the thermal conductivity, whose dimension is W/(m·K) = J/(m·s·K). Hence, the dimension of the product is  the same as the left-hand side: J/(m3·s).

We can present the thing in various ways: if we bring k to the other side, then we’ve got something expressed per second on the left-hand side, and something expressed per square meter on the right-hand side—but the k/κ factor makes it alright. The point is: both Schrödinger’s equation as well as the diffusion equation are actually an expression of the energy conservation law. They’re both expressions of Gauss’ flux theorem (but in differential form, rather than in integral form) which, as you know, pops up everywhere when talking energy conservation.


Yep. I’ll give another example. Let me first remind you that the k·(∂T/∂t) = ∂q/∂t = κ·∇2T equation can also be written as:

heat diffusion 3

The h in this equation is, obviously, not Planck’s constant, but the heat flow vector, i.e. the heat that flows through a unit area in a unit time, and h is, obviously, equal to h = −κ∇T. And, of course, you should remember your vector calculus here: ∇· is the divergence operator. In fact, we used the equation above, with ∇·h rather than ∇2T to illustrate the energy conservation principle. Now, you may or may not remember that we gave you a similar equation when talking about the energy of fields and the Poynting vector:

Poynting vector

This immediately triggers the following reflection: if there’s a ‘Poynting vector’ for heat flow (h), and for the energy of fields (S), then there must be some kind of ‘Poynting vector’ for amplitudes too! I don’t know which one, but it must exist! And it’s going to be some complex vector, no doubt! But it should be out there.

It also makes me think of a point I’ve made a couple of times already—about the similarity between the E and B vectors that characterize the traveling electromagnetic field, and the real and imaginary part of the traveling amplitude. Indeed, the similarity between the two illustrations below cannot be a coincidence. In both cases, we’ve got two oscillating magnitudes that are orthogonal to each other, always. As such, they’re not independent: one follows the other, or vice versa.







The only difference is the phase shift. Euler’s formula incorporates a phase shift—remember: sinθ = cos(θ − π/2)—and so you don’t have that with the E and B vectors. But – Hey! – isn’t that why bosons and fermions are different? 🙂


This is great fun, and I’ll surely come back to it. As for now, however, I’ll let you ponder the matter for yourself. 🙂

Post scriptum: I am sure that all that the questions I raise here will be answered at the Masters’ level, most probably in some course dealing with quantum field theory, of course. 🙂 In any case, I am happy I can already anticipate such questions. 🙂

Oh – and, as for those two illustrations above, the animation below is one that should help you to think things through. It’s the electric field vector of a traveling circularly polarized electromagnetic wave, as opposed to the linearly polarized light that was illustrated above.


Quantum-mechanical operators

We climbed a mountain—step by step, post by post. 🙂 We have reached the top now, and the view is gorgeous. We understand Schrödinger’s equation, which describes how amplitudes propagate through space-time. It’s the quintessential quantum-mechanical expression. Let’s enjoy now, and deepen our understanding by introducing the concept of (quantum-mechanical) operators.

The operator concept

We’ll introduce the operator concept using Schrödinger’s equation itself and, in the process, deepen our understanding of Schrödinger’s equation a bit. You’ll remember we wrote it as:

schrodinger 5

However, you’ve probably seen it like it’s written on his bust, or on his grave, or wherever, which is as follows:



It’s the same thing, of course. The ‘over-dot’ is Newton’s notation for the time derivative. In fact, if you click on the picture above (and zoom in a bit), then you’ll see that the craftsman who made the stone grave marker, mistakenly, also carved a dot above the psi (ψ) on the right-hand side of the equation—but then someone pointed out his mistake and so the dot on the right-hand side isn’t painted. 🙂 The thing I want to talk about here, however, is the H in that expression above, which is, obviously, the following operator:


That’s a pretty monstrous operator, isn’t it? It is what it is, however: an algebraic operator (it operates on a number—albeit a complex number—unlike a matrix operator, which operates on a vector or another matrix). As you can see, it actually consists of two other (algebraic) operators:

  1. The ∇operator, which you know: it’s a differential operator. To be specific, it’s the Laplace operator, which is the divergence (·) of the gradient () of a function: ∇= · = (∂/∂x, ∂/∂y , ∂/∂z)·(∂/∂x, ∂/∂y , ∂/∂z) = ∂2/∂x2  + ∂2/∂y+ ∂2/∂z2. This too operates on our complex-valued function wavefunction ψ, and yields some other complex-valued function, which we then multiply by −ħ2/2m to get the first term.
  2. The V(x, y, z) ‘operator’, which—in this particular context—just means: “multiply with V”. Needless to say, V is the potential here, and so it captures the presence of external force fields. Also note that V is a real number, just like −ħ2/2m.

Let me say something about the dimensions here. On the left-hand side of Schrödinger’s equation, we have the product of ħ and a time derivative (is just the imaginary unit, so that’s just a (complex) number). Hence, the dimension there is [J·s]/[s] (the dimension of a time derivative is something expressed per second). So the dimension of the left-hand side is joule. On the right-hand side, we’ve got two terms. The dimension of that second-order derivative (∇2ψ) is something expressed per square meter, but then we multiply it with −ħ2/2m, whose dimension is [J2·s2]/[J/(m2/s2)]. [Remember: m = E/c2.] So that reduces to [J·m2]. Hence, the dimension of (−ħ2/2m)∇2ψ is joule. And the dimension of V is joule too, of course. So it all works out. In fact, now that we’re here, it may or may not be useful to remind you of that heat diffusion equation we discussed when introducing the basic concepts involved in vector analysis:

diffusion equation

That equation illustrated the physical significance of the Laplacian. We were talking about the flow of heat in, say, a block of metal, as illustrated below. The in the equation above is the heat per unit volume, and the h in the illustration below was the heat flow vector (so it’s got nothing to do with Planck’s constant), which depended on the material, and which we wrote as = –κT, with T the temperature, and κ (kappa) the thermal conductivity. In any case, the point is the following: the equation below illustrates the physical significance of the Laplacian. We let it operate on the temperature (i.e. a scalar function) and its product with some constant (just think of replacing κ by −ħ2/2m gives us the time derivative of q, i.e. the heat per unit volume.

heat flow

In fact, we know that is proportional to T, so if we’d choose an appropriate temperature scale – i.e. choose the zero point such that T (your physics teacher in high school would refer to as the (volume) specific heat capacity) – then we could simple write:

∂T/∂t = (κ/k)∇2T

From a mathematical point of view, that equation is just the same as ∂ψ/∂t = –(i·ħ/2m)·∇2ψ, which is Schrödinger’s equation for V = 0. In other words, you can – and actually should – also think of Schrödinger’s equation as describing the flow of… Well… What?

Well… Not sure. I am tempted to think of something like a probability density in space, but ψ represents a (complex-valued) amplitude. Having said that, you get the idea—I hope! 🙂 If not, let me paraphrase Feynman on this:

“We can think of Schrödinger’s equation as describing the diffusion of a probability amplitude from one point to another. In fact, the equation looks something like the diffusion equation we introduced when discussing heat flow, or the spreading of a gas. But there is one main difference: the imaginary coefficient in front of the time derivative makes the behavior completely different from the ordinary diffusion such as you would have for a gas spreading out. Ordinary diffusion gives rise to real exponential solutions, whereas the solutions of Schrödinger’s equation are complex waves.”

That says it all, right? 🙂 In fact, Schrödinger’s equation – as discussed here – was actually being derived when describing the motion of an electron along a line of atoms, i.e. for motion in one direction only, but you can visualize what it represents in three-dimensional space. The real exponential functions Feynman refer to exponential decay function: as the energy is spread over an ever-increasing volume, the amplitude of the wave becomes smaller and smaller. That may be the case for complex-valued exponentials as well. The key difference between a real- and complex-valued exponential decay function is that a complex exponential is a cyclical function. Now, I quickly googled to see how we could visualize that, and I like the following illustration:


The dimensional analysis of Schrödinger’s equation is also quite interesting because… Well… Think of it: that heat diffusion equation incorporates the same dimensions: temperature is a measure of the average energy of the molecules. That’s really something to think about. These differential equations are not only structurally similar but, in addition, they all seem to describe some flow of energy. That’s pretty deep stuff: it relates amplitudes to energies, so we should think in terms of Poynting vectors and all that. But… Well… I need to move on, and so I will move on—so you can re-visit this later. 🙂

Now that we’ve introduced the concept of an operator, let me say something about notations, because that’s quite confusing.

Some remarks on notation

Because it’s an operator, we should actually use the hat symbol—in line with what we did when we were discussing matrix operators: we’d distinguish the matrix (e.g. A) from its use as an operator (Â). You may or may not remember we do the same in statistics: the hat symbol is supposed to distinguish the estimator (â) – i.e. some function we use to estimate a parameter (which we usually denoted by some Greek symbol, like α) – from a specific estimate of the parameter, i.e. the value (a) we get when applying â to a specific sample or observation. However, if you remember the difference, you’ll also remember that hat symbol was quickly forgotten, because the context made it clear what was what, and so we’d just write a(x) instead of â(x). So… Well… I’ll be sloppy as well here, if only because the WordPress editor only offers very few symbols with a hat! 🙂

In any case, this discussion on the use (or not) of that hat is irrelevant. In contrast, what is relevant is to realize this algebraic operator H here is very different from that other quantum-mechanical Hamiltonian operator we discussed when dealing with a finite set of base states: that H was the Hamiltonian matrix, but used in an ‘operation’ on some state. So we have the matrix operator H, and the algebraic operator H.


Yes and no. First, we’ve got the context again, and so you always know whether you’re looking at continuous or discrete stuff:

  1. If your ‘space’ is continuous (i.e. if states are to defined with reference to an infinite set of base states), then it’s the algebraic operator.
  2. If, on the other hand, your states are defined by some finite set of discrete base states, then it’s the Hamiltonian matrix.

There’s another, more fundamental, reason why there should be no confusion. In fact, it’s the reason why physicists use the same symbol H in the first place: despite the fact that they look so different, these two operators (i.e. H the algebraic operator and H the matrix operator) are actually equivalent. Their interpretation is similar, as evidenced from the fact that both are being referred to as the energy operator in quantum physics. The only difference is that one operates on a (state) vector, while the other operates on a continuous function. It’s just the difference between matrix mechanics as opposed to wave mechanics really.

But… Well… I am sure I’ve confused you by now—and probably very much so—and so let’s start from the start. 🙂

Matrix mechanics

Let’s start with the easy thing indeed: matrix mechanics. The matrix-mechanical approach is summarized in that set of Hamiltonian equations which, by now, you know so well:


If we have base states, then we have equations like this: one for each = 1, 2,… n. As for the introduction of the Hamiltonian, and the other subscript (j), just think of the description of a state:


So… Well… Because we had used already, we had to introduce j. 🙂

Let’s think about |ψ〉. It is the state of a system, like the ground state of a hydrogen atom, or one of its many excited states. But… Well… It’s a bit of a weird term, really. It all depends on what you want to measure: when we’re thinking of the ground state, or an excited state, we’re thinking energy. That’s something else than thinking its position in space, for example. Always remember: a state is defined by a set of base states, and so those base states come with a certain perspective: when talking states, we’re only looking at some aspect of reality, really. Let’s continue with our example of energy states, however.

You know that the lifetime of a system in an excited state is usually short: some spontaneous or induced emission of a quantum of energy (i.e. a photon) will ensure that the system quickly returns to a less excited state, or to the ground state itself. However, you shouldn’t think of that here: we’re looking at stable systems here. To be clear: we’re looking at systems that have some definite energy—or so we think: it’s just because of the quantum-mechanical uncertainty that we’ll always measure some other different value. Does that make sense?

If it doesn’t… Well… Stop reading, because it’s only going to get even more confusing. Not my fault, however!


The ubiquity of that ψ symbol (i.e. the Greek letter psi) is really something psi-chological 🙂 and, hence, very confusing, really. In matrix mechanics, our ψ would just denote a state of a system, like the energy of an electron (or, when there’s only one electron, our hydrogen atom). If it’s an electron, then we’d describe it by its orbital. In this regard, I found the following illustration from Wikipedia particularly helpful: the green orbitals show excitations of copper (Cu) orbitals on a CuOplane. [The two big arrows just illustrate the principle of X-ray spectroscopy, so it’s an X-ray probing the structure of the material.]


So… Well… We’d write ψ as |ψ〉 just to remind ourselves we’re talking of some state of the system indeed. However, quantum physicists always want to confuse you, and so they will also use the psi symbol to denote something else: they’ll use it to denote a very particular Ci amplitude (or coefficient) in that |ψ〉 = ∑|iCi formula above. To be specific, they’d replace the base states |i〉 by the continuous position variable x, and they would write the following:

Ci = ψ(i = x) = ψ(x) = Cψ(x) = C(x) = 〈x|ψ〉

In fact, that’s just like writing:

φ(p) = 〈 mom p | ψ 〉 = 〈p|ψ〉 = Cφ(p) = C(p)

What they’re doing here, is (1) reduce the ‘system‘ to a ‘particle‘ once more (which is OK, as long as you know what you’re doing) and (2) they basically state the following:

If a particle is in some state |ψ〉, then we can associate some wavefunction ψ(x) or φ(p)—with it, and that wavefunction will represent the amplitude for the system (i.e. our particle) to be at x, or to have a momentum that’s equal to p.

So what’s wrong with that? Well… Nothing. It’s just that… Well… Why don’t they use χ(x) instead of ψ(x)? That would avoid a lot of confusion, I feel: one should not use the same symbol (psi) for the |ψ〉 state and the ψ(x) wavefunction.

Huh? Yes. Think about it. The point is: the position or the momentum, or even the energy, are properties of the system, so to speak and, therefore, it’s really confusing to use the same symbol psi (ψ) to describe (1) the state of the system, in general, versus (2) the position wavefunction, which describes… Well… Some very particular aspect (or ‘state’, if you want) of the same system (in this case: its position). There’s no such problem with φ(p), so… Well… Why don’t they use χ(x) instead of ψ(x) indeed? I have only one answer: psi-chology. 🙂

In any case, there’s nothing we can do about it and… Well… In fact, that’s what this post is about: it’s about how to describe certain properties of the system. Of course, we’re talking quantum mechanics here and, hence, uncertainty, and, therefore, we’re going to talk about the average position, energy, momentum, etcetera that’s associated with a particular state of a system, or—as we’ll keep things very simple—the properties of a ‘particle’, really. Think of an electron in some orbital, indeed! 🙂

So let’s now look at that set of Hamiltonian equations once again:


Looking at it carefully – so just look at it once again! 🙂 – and thinking about what we did when going from the discrete to the continuous setting, we can now understand we should write the following for the continuous case:


Of course, combining Schrödinger’s equation with the expression above implies the following:


Now how can we relate that integral to the expression on the right-hand side? I’ll have to disappoint you here, as it requires a lot of math to transform that integral. It requires writing H(x, x’) in terms of rather complicated functions, including – you guessed it, didn’t you? – Dirac’s delta function. Hence, I assume you’ll believe me if I say that the matrix- and wave-mechanical approaches are actually equivalent. In any case, if you’d want to check it, you can always read Feynman yourself. 🙂

Now, I wrote this post to talk about quantum-mechanical operators, so let me do that now.

Quantum-mechanical operators

You know the concept of an operator. As mentioned above, we should put a little hat (^) on top of our Hamiltonian operator, so as to distinguish it from the matrix itself. However, as mentioned above, the difference is usually quite clear from the context. Our operators were all matrices so far, and we’d write the matrix elements of, say, some operator A, as:

Aij ≡ 〈 i | A | j 〉

The whole matrix itself, however, would usually not act on a base state but… Well… Just on some more general state ψ, to produce some new state φ, and so we’d write:

| φ 〉 = A | ψ 〉

Of course, we’d have to describe | φ 〉 in terms of the (same) set of base states and, therefore, we’d expand this expression into something like this:

operator 2

You get the idea. I should just add one more thing. You know this important property of amplitudes: the 〈 ψ | φ 〉 amplitude is the complex conjugate of the 〈 φ | ψ 〉 amplitude. It’s got to do with time reversibility, because the complex conjugate of eiθ = ei(ω·t−k·x) is equal to eiθ = ei(ω·t−k·x), so we’re just reversing the x- and tdirection. We write:

 〈 ψ | φ 〉 = 〈 φ | ψ 〉*

Now what happens if we want to take the complex conjugate when we insert a matrix, so when writing 〈 φ | A | ψ 〉 instead of 〈 φ | ψ 〉, this rules becomes:

〈 φ | A | ψ 〉* = 〈 ψ | A† | φ 〉

The dagger symbol denotes the conjugate transpose, so A† is an operator whose matrix elements are equal to Aij† = Aji*. Now, it may or may not happen that the A† matrix is actually equal to the original A matrix. In that case – and only in that case – we can write:

〈 ψ | A | φ 〉 = 〈 φ | A | ψ 〉*

We then say that A is a ‘self-adjoint’ or ‘Hermitian’ operator. That’s just a definition of a property, which the operator may or may not have—but many quantum-mechanical operators are actually Hermitian. In any case, we’re well armed now to discuss some actual operators, and we’ll start with that energy operator.

The energy operator (H)

We know the state of a system is described in terms of a set of base states. Now, our analysis of N-state systems showed we can always describe it in terms of a special set of base states, which are referred to as the states of definite energy because… Well… Because they’re associated with some definite energy. In that post, we referred to these energy levels as En (n = I, II,… N). We used boldface for the subscript n (so we wrote n instead of n) because of these Roman numerals. With each energy level, we could associate a base state, of definite energy indeed, that we wrote as |n〉. To make a long story short, we summarized our results as follows:

  1. The energies EI, EII,…, En,…, EN are the eigenvalues of the Hamiltonian matrix H.
  2. The state vectors |n〉 that are associated with each energy En, i.e. the set of vectors |n〉, are the corresponding eigenstates.

We’ll be working with some more subscripts in what follows, and these Roman numerals and the boldface notation are somewhat confusing (if only because I don’t want you to think of these subscripts as vectors), we’ll just denote EI, EII,…, En,…, EN as E1, E2,…, Ei,…, EN, and we’ll number the states of definite energy accordingly, also using some Greek letter so as to clearly distinguish them from all our Latin letter symbols: we’ll write these states as: |η1〉, |η1〉,… |ηN〉. [If I say, ‘we’, I mean Feynman of course. You may wonder why he doesn’t write |Ei〉, or |εi〉. The answer is: writing |En〉 would cause confusion, because this state will appear in expressions like: |Ei〉Ei, so that’s the ‘product’ of a state (|Ei〉) and the associated scalar (Ei). Too confusing. As for using η (eta) instead of ε (epsilon) to denote something that’s got to do with energy… Well… I guess he wanted to keep the resemblance with the n, and then the Ancient Greek apparently did use this η letter  for a sound like ‘e‘ so… Well… Why not? Let’s get back to the lesson.]

Using these base states of definite energy, we can write the state of the system as:

|ψ〉 = ∑ |ηi〉 C = ∑ |ηi〉〈ηi|ψ〉    over all (i = 1, 2,… , N)

Now, we didn’t talk all that much about what these base states actually mean in terms of measuring something but you’ll believe if I say that, when measuring the energy of the system, we’ll always measure one or the other E1, E2,…, Ei,…, EN value. We’ll never measure something in-between: it’s eitheror. Now, as you know, measuring something in quantum physics is supposed to be destructive but… Well… Let us imagine we could make a thousand measurements to try to determine the average energy of the system. We’d do so by counting the number of times we measure E1 (and of course we’d denote that number as N1), E2E3, etcetera. You’ll agree that we’d measure the average energy as:

E average

However, measurement is destructive, and we actually know what the expected value of this ‘average’ energy will be, because we know the probabilities of finding the system in a particular base state. That probability is equal to the absolute square of that Ccoefficient above, so we can use the P= |Ci|2 formula to write:

Eav〉 = ∑ Pi Ei over all (i = 1, 2,… , N)

Note that this is a rather general formula. It’s got nothing to do with quantum mechanics: if Ai represents the possible values of some quantity A, and Pi is the probability of getting that value, then (the expected value of) the average A will also be equal to 〈Aav〉 = ∑ Pi Ai. No rocket science here! 🙂 But let’s now apply our quantum-mechanical formulas to that 〈Eav〉 = ∑ Pi Ei formula. [Oh—and I apologize for using the same angle brackets 〈 and 〉 to denote an expected value here—sorry for that! But it’s what Feynman does—and other physicists! You see: they don’t really want you to understand stuff, and so they often use very confusing symbols.] Remembering that the absolute square of a complex number equals the product of that number and its complex conjugate, we can re-write the 〈Eav〉 = ∑ Pi Ei formula as:

Eav〉 = ∑ Pi Ei = ∑ |Ci|Ei = ∑ Ci*CEi = ∑ C*CEi = ∑ 〈ψ|ηi〉〈ηi|ψ〉E= ∑ 〈ψ|ηiEi〈ηi|ψ〉 over all i

Now, you know that Dirac’s bra-ket notation allows numerous manipulations. For example, what we could do is take out that ‘common factor’ 〈ψ|, and so we may re-write that monster above as:

Eav〉 = 〈ψ| ∑ ηiEi〈ηi|ψ〉 = 〈ψ|φ〉, with |φ〉 = ∑ |ηiEi〈ηi|ψ〉 over all i

Huh? Yes. Note the difference between |ψ〉 = ∑ |ηi〉 C = ∑ |ηi〉〈ηi|ψ〉 and |φ〉 = ∑ |ηiEi〈ηi|ψ〉. As Feynman puts it: φ is just some ‘cooked-up‘ state which you get by taking each of the base states |ηi〉 in the amount Ei〈ηi|ψ〉 (as opposed to the 〈ηi|ψ〉 amounts we took for ψ).

I know: you’re getting tired and you wonder why we need all this stuff. Just hang in there. We’re almost done. I just need to do a few more unpleasant things, one of which is to remind you that this business of the energy states being eigenstates (and the energy levels being eigenvalues) of our Hamiltonian matrix (see my post on N-state systems) comes with a number of interesting properties, including this one:

H |ηi〉 = Eii〉 = |ηiEi

Just think about what’s written here: on the left-hand side, we’re multiplying a matrix with a (base) state vector, and on the left-hand side we’re multiplying it with a scalar. So our |φ〉 = ∑ |ηiEi〈ηi|ψ〉 sum now becomes:

|φ〉 = ∑ H |ηi〉〈ηi|ψ〉 over all (i = 1, 2,… , N)

Now we can manipulate that expression some more so as to get the following:

|φ〉 = H ∑|ηi〉〈ηi|ψ〉 = H|ψ〉

Finally, we can re-combine this now with the 〈Eav〉 = 〈ψ|φ〉 equation above, and so we get the fantastic result we wanted:

Eav〉 = 〈 ψ | φ 〉 = 〈 ψ | H ψ 〉

Huh? Yes! To get the average energy, you operate on |ψ with H, and then you multiply the result with ψ|. It’s a beautiful formula. On top of that, the new formula for the average energy is not only pretty but also useful, because now we don’t need to say anything about any particular set of base states. We don’t even have to know all of the possible energy levels. When we have to calculate the average energy of some system, we only need to be able to describe the state of that system in terms of some set of base states, and we also need to know the Hamiltonian matrix for that set, of course. But if we know that, we can calculate its average energy.

You’ll say that’s not a big deal because… Well… If you know the Hamiltonian, you know everything, so… Well… Yes. You’re right: it’s less of a big deal than it seems. Having said that, the whole development above is very interesting because of something else: we can easily generalize it for other physical measurements. I call it the ‘average value’ operator idea, but you won’t find that term in any textbook. 🙂 Let me explain the idea.

The average value operator (A)

The development above illustrates how we can relate a physical observable, like the (average) energy (E), to a quantum-mechanical operator (H). Now, the development above can easily be generalized to any observable that would be proportional to the energy. It’s perfectly reasonable, for example, to assume the angular momentum – as measured in some direction, of course, which we usually refer to as the z-direction – would be proportional to the energy, and so then it would be easy to define a new operator Lz, which we’d define as the operator of the z-component of the angular momentum L. [I know… That’s a bit of a long name but… Well… You get the idea.] So we can write:

Lzav = 〈 ψ | Lψ 〉

In fact, further generalization yields the following grand result:

If a physical observable A is related to a suitable quantum-mechanical operator Â, then the average value of A for the state | ψ 〉 is given by:

Aav = 〈 ψ |  ψ 〉 = 〈 ψ | φ 〉 with | φ 〉 =  ψ 〉

At this point, you may have second thoughts, and wonder: what state | ψ 〉? The answer is: it doesn’t matter. It can be any state, as long as we’re able to describe in terms of a chosen set of base states. 🙂

OK. So far, so good. The next step is to look at how this works for the continuity case.

The energy operator for wavefunctions (H)

We can start thinking about the continuous equivalent of the 〈Eav〉 = 〈ψ|H|ψ〉 expression by first expanding it. We write:

e average continuous function

You know the continuous equivalent of a sum like this is an integral, i.e. an infinite sum. Now, because we’ve got two subscripts here (i and j), we get the following double integral:

double integral

Now, I did take my time to walk you through Feynman’s derivation of the energy operator for the discrete case, i.e. the operator when we’re dealing with matrix mechanics, but I think I can simplify my life here by just copying Feynman’s succinct development:


Done! Given a wavefunction ψ(x), we get the average energy by doing that integral above. Now, the quantity in the braces of that integral can be written as that operator we introduced when we started this post:


So now we can write that integral much more elegantly. It becomes:

Eav = ∫ ψ*(xH ψ(x) dx

You’ll say that doesn’t look like 〈Eav〉 = 〈 ψ | H ψ 〉! It does. Remember that 〈 ψ | = ψ 〉*. 🙂 Done!

I should add one qualifier though: the formula above assumes our wavefunction has been normalized, so all probabilities add up to one. But that’s a minor thing. The only thing left to do now is to generalize to three dimensions. That’s easy enough. Our expression becomes a volume integral:

Eav = ∫ ψ*(rH ψ(r) dV

Of course, dV stands for dVolume here, not for any potential energy, and, of course, once again we assume all probabilities over the volume add up to 1, so all is normalized. Done! 🙂

We’re almost done with this post. What’s left is the position and momentum operator. You may think this is going to another lengthy development but… Well… It turns out the analysis is remarkably simple. Just stay with me a few more minutes and you’ll have earned your degree. 🙂

The position operator (x)

The thing we need to solve here is really easy. Look at the illustration below as representing the probability density of some particle being at x. Think about it: what’s the average position?

average position

Well? What? The (expected value of the) average position is just this simple integral: 〈xav = ∫ P(x) dx, over all the whole range of possible values for x. 🙂 That’s all. Of course, because P(x) = |ψ(x)|2 =ψ*(x)·ψ(x), this integral now becomes:

xav = ∫ ψ*(x) x ψ(x) dx

That looks exactly the same as 〈Eav = ∫ ψ*(xH ψ(x) dx, and so we can look at as an operator too!

Huh? Yes. It’s an extremely simple operator: it just means “multiply by x“. 🙂

I know you’re shaking your head now: is it that easy? It is. Moreover, the ‘matrix-mechanical equivalent’ is equally simple but, as it’s getting late here, I’ll refer you to Feynman for that. 🙂

The momentum operator (px)

Now we want to calculate the average momentum of, say, some electron. What integral would you use for that? […] Well… What? […] It’s easy: it’s the same thing as for x. We can just substitute replace for in that 〈xav = ∫ P(x) dformula, so we get:

pav = ∫ P(p) dp, over all the whole range of possible values for p

Now, you might think the rest is equally simple, and… Well… It actually is simple but there’s one additional thing in regard to the need to normalize stuff here. You’ll remember we defined a momentum wavefunction (see my post on the Uncertainty Principle), which we wrote as:

φ(p) = 〈 mom p | ψ 〉

Now, in the mentioned post, we related this momentum wavefunction to the particle’s ψ(x) = 〈x|ψ〉 wavefunction—which we should actually refer to as the position wavefunction, but everyone just calls it the particle’s wavefunction, which is a bit of a misnomer, as you can see now: a wavefunction describes some property of the system, and so we can associate several wavefunctions with the same system, really! In any case, we noted the following there:

  • The two probability density functions, φ(p) and ψ(x), look pretty much the same, but the half-width (or standard deviation) of one was inversely proportional to the half-width of the other. To be precise, we found that the constant of proportionality was equal to ħ/2, and wrote that relation as follows: σp = (ħ/2)/σx.
  • We also found that, when using a regular normal distribution function for ψ(x), we’d have to normalize the probability density function by inserting a (2πσx2)−1/2 in front of the exponential.

Now, it’s a bit of a complicated argument, but the upshot is that we cannot just write what we usually write, i.e. Pi = |Ci|2 or P(x) = |ψ(x)|2. No. We need to put a normalization factor in front, which combines the two factors I mentioned above. To be precise, we have to write:

P(p) = |〈p|ψ〉|2/(2πħ)

So… Well… Our 〈pav = ∫ P(p) dp integral can now be written as:

pav = ∫ 〈ψ|ppp|ψ〉 dp/(2πħ)

So that integral is totally like what we found for 〈xav and so… We could just leave it at that, and say we’ve solved the problem. In that sense, it is easy. However, having said that, it’s obvious we’d want some solution that’s written in terms of ψ(x), rather than in terms of φ(p), and that requires some more manipulation. I’ll refer you, once more, to Feynman for that, and I’ll just give you the result:

momentum operator

So… Well… I turns out that the momentum operator – which I tentatively denoted as px above – is not so simple as our position operator (x). Still… It’s not hugely complicated either, as we can write it as:

px ≡ (ħ/i)·(∂/∂x)

Of course, the purists amongst you will, once again, say that I should be more careful and put a hat wherever I’d need to put one so… Well… You’re right. I’ll wrap this all up by copying Feynman’s overview of the operators we just explained, and so he does use the fancy symbols. 🙂


Well, folks—that’s it! Off we go! You know all about quantum physics now! We just need to work ourselves through the exercises that come with Feynman’s Lectures, and then you’re ready to go and bag a degree in physics somewhere. So… Yes… That’s what I want to do now, so I’ll be silent for quite a while now. Have fun! 🙂

Dirac’s delta function and Schrödinger’s equation in three dimensions

Feynman’s rather informal derivation of Schrödinger’s equation – following Schrödinger’s own logic when he published his famous paper on it back in 1926 – is wonderfully simple but, as I mentioned in my post on it, does lack some mathematical rigor here and there. Hence, Feynman hastens to dot all of the i‘s and cross all of the t‘s in the subsequent Lectures. We’ll look at two things here:

  1. Dirac’s delta function, which ensures proper ‘normalization’. In fact, as you’ll see in a moment, it’s more about ‘orthogonalization’ than normalization. 🙂
  2. The generalization of Schrödinger’s equation to three dimensions (in space) and also including the presence of external force fields (as opposed to the usual ‘free space’ assumption).

The second topic is the most interesting, of course, and also the easiest, really. However, let’s first use our energy to grind through the first topic. 🙂

Dirac’s delta function

When working with a finite set of discrete states, a fundamental condition is that the base states be ‘orthogonal’, i.e. they must satisfy the following equation:

ij 〉 = δij, with δij = 1 if i = j and δij = 0 if ij

Needless to say, the base states and j are rather special vectors in a rather special mathematical space (a so-called Hilbert space) and so it’s rather tricky to interpret their ‘orthogonality’ in any geometric way, although such geometric interpretation is often actually possible in simple quantum-mechanical systems: you’ll just notice a ‘right’ angle may actually be 45°, or 180° angles, or whatever. 🙂 In any case, that’s not the point here. The question is: if we move an infinite number of base states – like we did when we introduced the ψ(x) and φ(p) wavefunctions – what happens to that condition?

Your first reaction is going to be: nothing. Because… Well… Remember that, for a two-state system, in which we have two base states only, we’d fully describe some state | φ 〉 as a linear combination of the base states, so we’d write:

| φ 〉 =| I 〉 CI + | II 〉 CII 

Now, while saying we were talking a Hilbert space here, I did add we could use the same expression to define the base states themselves, so I wrote the following triviality:

M1Trivial but sensible. So we’d associate the base state | I 〉 with the base vector (1, 0) and, likewise, base state | II 〉 with the base vector (0, 1). When explaining this, I added that we could easily extend to an N-state system and so there’s a perfect analogy between the 〈 i | j 〉 bra-ket expression in quantum math and the ei·ej product in the run-of-the-mill coordinate spaces that you’re used to. So why can’t we just extend the concept to an infinite-state system and move to base vectors with an infinite number of elements, which we could write as ei =(…, 0, ei = 1, 0, 0,,…) and ej =(…, 0, 0, ej = 1, 0,…), thereby ensuring 〈 i | j 〉 = ei·ej = δijalways! The ‘orthogonality’ condition looks simple enough indeed, and so we could re-write it as:

xx’ 〉 = δxx’, with δxx’ = 1 if x = x’ and δxx’ = 0 if if x ≠ x’

However, when moving from a space with a finite number of dimensions to a space with an infinite number of dimensions, there are some issues. They pop up, for example, when we insert that 〈 xx’ 〉 = δxx’ function (note that we’re talking some function here of x and x’, indeed, so we’ll write it as f(x, x’) in the next step) in that 〈φ|ψ〉 = ∫〈φ|x〉〈x|ψ〉dx integral.

Huh? What integral? Relax: that 〈φ|ψ〉 = ∫〈φ|x〉〈x|ψ〉dx integral just generalizes our 〈φ|ψ〉 = ∑〈φ|x〉〈x|ψ〉 expression for discrete settings for the continuous case. Just look at it. When substituting φ for x’, we get:

x’|ψ〉 = ψ(x’) = ∫ 〈x’|x〉 〈x|ψ〉 dx ⇔ ψ(x’) = ∫ 〈x’|x〉 ψ(x) dx

You’ll say: what’s the problem? Well… From a mathematical point of view, it’s a bit difficult to find a function 〈x’|x〉 = f(x, x’) which, when multiplied with a wavefunction ψ(x), and integrated over all x, will just give us ψ(x’). A bit difficult? Well… It’s worse than that: it’s actually impossible!

Huh? Yes. Feynman illustrates the difficulty for x’ = 0, but he could have picked whatever value, really. In any case, if x’ = 0, we can write f(x, 0) = f(x), and our integral now reduces to:

ψ(0) = ∫ f(x) ψ(0) dx

This is a weird expression: the value of the integral (i.e. the right-hand side of the expression) does not depend on x: it is just some non-zero value ψ(0). However, we know that the f(x) in the integrand is zero for all x ≠ 0. Hence, this integral will be zero. So we have an impossible situation: we wish a function to be zero everywhere but for one point, and, at the same time, we also want it to give us a finite integral when using it in that integral above.

You’re likely to shake your head now and say: what the hell? Does it matter? It does: it is an actual problem in quantum math. Well… I should say: it was an actual problem in quantum math. Dirac solved it. He invented a new function which looks a bit less simple than our suggested generalization of Kronecker’s delta for the continuous case (i.e. that 〈 xx’ 〉 = δxx’ conjecture above). Dirac’s function is – quite logically – referred to as the Dirac delta function, and it’s actually defined by that integral above, in the sense that we impose the following two conditions on it:

  • δ(x‘) = 0 if x ≠ x’ (so that’s just like the first of our two conditions for that 〈 xx’ 〉 = δxx’ function)
  • δ(x)ψ(x) dx = ψ(x’) (so that’s not like the second of our two condition for that 〈 xx’ 〉 = δxx’ function)

Indeed, that second condition is much more sophisticated than our 〈 xx’ 〉 = 1 if x = x’ condition. In fact, one can show that the second condition amounts to finding some function satisfying this condition:

δ(x)dx = 1

We get this by equating x’ to zero once more and, additionally, by equating ψ(x) to 1. [Please do double-check yourself.] Of course, this ‘normalization’ (or ‘orthogonalization’) problem all sounds like a lot of hocus-pocus and, in many ways, it is. In fact, we’re actually talking a mathematical problem here which had been lying around for centuries (for a brief overview, see the Wikipedia article on it). So… Well… Without further ado, I’ll just give you the mathematical expression now—and please don’t stop reading now, as I’ll explain it in a moment:


I will also credit Wikipedia with the following animation, which shows that the expression above is just the normal distribution function, and which shows what happens when that a, i.e. its standard deviation, goes to zero: Dirac’s delta function is just the limit of a sequence of (zero-centered) normal distributions. That’s all. Nothing more, nothing less.


But how do we interpret it? Well… I can’t do better than Feynman as he describes what’s going on really:

“Dirac’s δ(xfunction has the property that it is zero everywhere except at x = 0 but, at the same time, it has a finite integral equal to unity. [See the δ(x)dx = 1 equation.] One should imagine that the δ(x) function has such fantastic infinity at one point that the total area comes out equal to one.”

Well… That says it all, I guess. 🙂 Don’t you love the way he puts it? It’s not an ‘ordinary’ infinity. No. It’s fantastic. Frankly, I think these guys were all fantastic. 🙂 The point is: that special function, Dirac’s delta function, solves our problem. The equivalent expression for the 〈 ij 〉 = δij condition for a finite and discrete set of base states is the following one for the continuous case:

xx’ 〉 = δ(x − x’)

The only thing left now is to generalize this result to three dimensions. Now that’s fairly straightforward. The ‘normalization’ condition above is all that’s needed in terms of modifying the equations for dealing with the continuum of base states corresponding to the points along a line. Extending the analysis to three dimensions goes as follows:

  • First, we replace the x coordinate by the vector r = (x, y, z)
  • As a result, integrals over x, become integrals over x, y and z. In other words, they become volume integrals.
  • Finally, the one-dimensional δ-function must be replaced by the product of three δ-functions: one in x, one in y and one in z. We write:

r | r 〉 = δ(x − x’) δ(y − y’)δ(z − z’)

Feynman summarizes it all together as follows:


What if we have two particles, or more? Well… Once again, I won’t bother to try to re-phrase the Grand Master as he explains it. I’ll just italicize or boldface the key points:

Suppose there are two particles, which we can call particle 1 and particle 2. What shall we use for the base states? One perfectly good set can be described by saying that particle 1 is at xand particle 2 is at x2, which we can write as | xx〉. Notice that describing the position of only one particle does not define a base state. Each base state must define the condition of the entire system, so you must not think that each particle moves independently as a wave in three dimensions. Any physical state | ψ 〉 can be defined by giving all of the amplitudes 〈 xx| ψ 〉 to find the two particles at x1 and x2. This generalized amplitude is therefore a function of the two sets of coordinates x1 and x1. You see that such a function is not a wave in the sense of an oscillation that moves along in three dimensions. Neither is it generally simply a product of two individual waves, one for each particle. It is, in general, some kind of a wave in the six dimensions defined by x1 and x1Hence, if there are two particles in Nature which are interacting, there is no way of describing what happens to one of the particles by trying to write down a wave function for it alone. The famous paradoxes that we considered in earlier chapters—where the measurements made on one particle were claimed to be able to tell what was going to happen to another particle, or were able to destroy an interference—have caused people all sorts of trouble because they have tried to think of the wave function of one particle alone, rather than the correct wave function in the coordinates of both particles. The complete description can be given correctly only in terms of functions of the coordinates of both particles.

Now we really know it all, don’t we? 🙂

Well… Almost. I promised to tackle another topic as well. So here it is:

Schrödinger’s equation in three dimensions

Let me start by jotting down what we had found already, i.e. Schrödinger’s equation when only one coordinate in space is involved. It’s written as:

schrodinger 3

Now, the extension to three dimensions is remarkably simple: we just substitute the ∂/∂xoperator by the ∇operator, i.e. ∇= ∂/∂x2  + ∂/∂y+ ∂/∂z2. We get:

schrodinger 4

Finally, we can also put forces on the particle, so now we are not looking at a particle moving in free space: we’ve got some force field working on it. It turns out the required modification is equally simple. The grand result is Schrödinger’s original equation in three dimensions:

schrodinger 5

V = V(x, y, z) is, of course, just the potential here. Remarkably simple equations but… How do we get these? Well… Sorry. The math is not too difficult, but you’re well equipped now to look at Feynman’s Lecture on it yourself now. You really are. Trust me. I really dealt with all of the ‘serious’ stuff you need to understand how he’s going about it in my previous posts so, yes, now I’ll just sit back and relax. Or go biking. Or whatever. 🙂

The Uncertainty Principle

In my previous post, I showed how Feynman derives Schrödinger’s equation using a historical and, therefore, quite intuitive approach. The approach was intuitive because the argument used a discrete model, so that’s stuff we are well acquainted with—like a crystal lattice, for example. However, now we’re now going to think continuity from the start. Let’s first see what changes in terms of notation.

New notations

Our C(xn, t) = 〈xn|ψ〉 now becomes C(x) = 〈x|ψ〉. This notation does not explicitly show the time dependence but then you know amplitudes like this do vary in space as well as in time. Having said that, the analysis below focuses mainly on their behavior in space, so it does make sense to not explicitly mention the time variable. It’s the usual trick: we look at how stuff behaves in space or, alternatively, in time. So we temporarily ‘forget’ about the other variable. That’s just how we work: it’s hard for our mind to think about these wavefunctions in both dimensions simultaneously although, ideally, we should do that.

Now, you also know that quantum physicists prefer to denote the wavefunction C(x) with some Greek letter: ψ (psi) or φ (phi). Feynman think it’s somewhat confusing because we use the same to denote a state itself, but I don’t agree. I think it’s pretty straightforward. In any case, we write:

ψ(x) = Cψ(x) = C(x) = 〈x|ψ〉

The next thing is the associated probabilities. From your high school math course, you’ll surely remember that we have two types of probability distributions: they are either discrete or, else, continuous. If they’re continuous, then our probability distribution becomes a probability density function (PDF) and, strictly speaking, we should no longer say that the probability of finding our particle at any particular point x at some time t is this or that. That probability is, strictly speaking, zero: if our variable is continuous, then our probability is defined for an interval only, and the P[x] value itself is referred to as a probability density. So we’ll look at little intervals Δx, and we can write the associated probability as:

prob (x, Δx) = |〈x|ψ〉|2Δx = |ψ(x)|2Δx

The idea is illustrated below. We just re-divide our continuous scale in little intervals and calculate the surface of some tiny elongated rectangle now. 🙂


It is also easy to see that, when moving to an infinite set of states, our 〈φ|ψ〉 = ∑〈φ|x〉〈x|ψ〉 (over all x) formula for calculating the amplitude for a particle to go from state ψ to state φ should now be written as an infinite sum, i.e. as the following integral:

amplitude continuous

Now, we know that 〈φ|x〉 = 〈x|φ〉* and, therefore, this integral can also be written as:


For example, if φ(x) =  〈x|φ〉 is equal to a simple exponential, so we can write φ(x) = a·eiθ, then φ*(x) =  〈φ|x〉 = a·e+iθ.

With that, we’re ready for the plat de résistance, except for one thing, perhaps: we don’t look at spin here. If we’d do that, we’d have to take two sets of base sets: one for up and one for down spin—but we don’t worry about this, for the time being, that is. 🙂

The momentum wavefunction

Our wavefunction 〈x|ψ〉 varies in time as well as in space. That’s obvious. How exactly depends on the energy and the momentum: both are related and, hence, if there’s uncertainty in the momentum, there will be uncertainty in the momentum, and vice versa. Uncertainty in the momentum changes the behavior of the wavefunction in space—through the p = ħk factor in the argument of the wavefunction (θ = ω·t − k·x)—while uncertainty in the energy changes the behavior of the wavefunction in time—through the E = ħω relation. As mentioned above, we focus on the variation in space here. We’ll do so y defining a new state, which is referred to as a state of definite momentum. We’ll write it as mom p, and so now we can use the Dirac notation to write the amplitude for an electron to have a definite momentum equal to p as:

φ(p) = 〈 mom p | ψ 〉

Now, you may think that the 〈x|ψ〉 and 〈mom p|ψ〉 amplitudes should be the same because, surely, we do associate the state with a definite momentum p, don’t we? Well… No! If we want to localize our wave ‘packet’, i.e. localize our particle, then we’re actually not going to associate it with a definite momentum. See my previous posts: we’re going to introduce some uncertainty so our wavefunction is actually a superposition of more elementary waves with slightly different (spatial) frequencies. So we should just go through the motions here and apply our integral formula to ‘unpack’ this amplitude. That goes as follows:

integral 2

So, as usual, when seeing a formula like this, we should remind ourselves of what we need to solve. Here, we assume we somehow know the ψ(x) = 〈x|ψ〉 wavefunction, so the question is: what do we use for 〈 mom p | x 〉? At this point, Feynman wanders off to start a digression on normalization, which really confuses the picture. When everything is said and done, the easiest thing to do is to just jot down the formula for that 〈mom p | x〉 in the integrand and think about it for a while:

〈mom p | x〉 = ei(p/ħ)∙x

I mean… What else could it be? This formula is very fundamental, and I am not going to try to explain it. As mentioned above, Feynman tries to ‘explain’ it by some story about probabilities and normalization, but I think his ‘explanation’ just confuses things even more. Really, what else would it be? The formula above really encapsulates what it means if we say that p and x are conjugate variables. [I can already note, of course, that symmetry implies that we can write something similar for energy and time. Indeed, we can define a state of definite energy as 〈E | ψ〉, and then ‘unpack’ it in the same way, and see that one of the two factors in the integrand would be equal to 〈E | t〉 and, of course, we’d associate a similar formula with it:

E | t〉 = ei(E/ħ)∙t

But let me get back to the lesson here. We’re analyzing stuff in space now, not in time. Feynman gives a simple example here. He suggests a wavefunction which has the following form:

ψ(x) = K·ex2/4σ2

The example is somewhat disingenuous because this is not a complex– but real-valued function. In fact, squaring it, and then calculating applying the normalization condition (all probabilities have to add up to one), yields the normal probability distribution:

prob (x, Δx) = P(x)dx = (2πσ2)−1/2ex2/2σ2dx

So that’s just the normal distribution for μ = 0, as illustrated below.


In any case, the integral we have to solve now is:Integral 3

Now, I hate integrals as much as you do (probably more) and so I assume you’re also only interested in the result (if you want the detail: check it in Feynman), which we can write as:

φ(p) = (2πη2)−1/4·ep2/4η2, with η = ħ/2σ

This formula is totally identical to the ψ(x) = (2πσ2)−1/4·ex2/4σdistribution we started with, except that it’s got another sigma value, which we denoted by η (and that’s not nu but eta), with 

η = ħ/2σ

Just for the record, Feynman refers to η and σ as the ‘half-width’ of the respective distributions. Mathematicians would say they’re the standard deviation. The concept are nearly the same, but not quite. In any case, that’s another thing I’ll let you find our for yourself. 🙂 The point is: η and σ are inversely proportional to each other, and the constant of proportionality is equal to ħ/2.

Now, if we take η and σ as measures of the uncertainty in and respectively – which is what they are, obviously ! – then we can re-write that η = ħ/2σ as ησ = ħ/2 or, better still, as the Uncertainty Principle itself:

ΔpΔx = ħ/2

You’ll say: that’s great, but we usually see the Uncertainty Principle written as:

ΔpΔx ≥ ħ/2

So where does that come from? Well… We choose a normal distribution (or the Gaussian distribution, as physicists call it), and so that yields the ΔpΔx = ħ/2 identity. If we’d chosen another one, we’d find a slightly different relation and so… Well… Let me quote Feynman here: “Interestingly enough, it is possible to prove that for any other form of a distribution in x or p, the product ΔpΔcannot be smaller than the one we have found here, so the Gaussian distribution gives the smallest possible value for the ΔpΔproduct.”

This is great. So what about the even more approximate ΔpΔx ≥ ħ formula? Where does that come from? Well… That’s more like a qualitative version of it: it basically says the minimum value of the same product is of the same order as ħ which, as you know, is pretty tiny: it’s about 0.0000000000000000000000000000000006626 J·s. 🙂 The last thing to note is its dimension: momentum is expressed in newton-second and position in meter, obviously. So the uncertainties in them are expressed in the same unit, and so the dimension of the product is N·m·s = J·s. So this dimension combines force, distance and time. That’s quite appropriate, I’d say. The ΔEΔproduct obviously does the same. But… Well… That’s it, folks! I enjoyed writing this – and I cannot always say the same of other posts! So I hope you enjoyed reading it. 🙂

Schrödinger’s equation: the original approach

Of course, your first question when seeing the title of this post is: what’s original, really? Well… The answer is simple: it’s the historical approach, and it’s original because it’s actually quite intuitive. Indeed, Lecture no. 16 in Feynman’s third Volume of Lectures on Physics is like a trip down memory lane as Feynman himself acknowledges, after presenting Schrödinger’s equation using that very rudimentary model we developed in our previous post:

“We do not intend to have you think we have derived the Schrödinger equation but only wish to show you one way of thinking about it. When Schrödinger first wrote it down, he gave a kind of derivation based on some heuristic arguments and some brilliant intuitive guesses. Some of the arguments he used were even false, but that does not matter; the only important thing is that the ultimate equation gives a correct description of nature.”

So… Well… Let’s have a look at it. 🙂 We were looking at some electron we described in terms of its location at one or the other atom in a linear array (think of it as a line). We did so by defining base states |n〉 = |xn〉, noting that the state of the electron at any point in time could then be written as:

|φ〉 = ∑ |xnCn(t) = ∑ |xn〉〈xn|φ〉 over all n

The Cn(t) = 〈xn|φ〉 coefficient is the amplitude for the electron to be at xat t. Hence, the Cn(t) amplitudes vary with t as well as with x. We’ll re-write them as Cn(t) = C(xn, t) = C(xn). Note that the latter notation does not explicitly show the time dependence. The Hamiltonian equation we derived in our previous post is now written as:

iħ·(∂C(xn)/∂t) = E0C(xn) − AC(xn+b) − AC(xn−b)

Note that, as part of our move from the Cn(t) to the C(xn) notation, we write the time derivative dCn(t)/dt now as ∂C(xn)/∂t, so we use the partial derivative symbol now (∂). Of course, the other partial derivative will be ∂C(x)/∂x) as we move from the count variable xto the continuous variable x, but let’s not get ahead of ourselves here. The solution we found for our C(xn) functions was the following wavefunction:

C(xn) = a·ei(k∙xn−ω·t) ei∙ω·t·ei∙k∙xn ei·(E/ħ)·t·ei·k∙xn

We also found the following relationship between E and k:

E = E0 − 2A·cos(kb)

Now, even Feynman struggles a bit with the definition of E0 and k here, and their relationship with E, which is graphed below.


Indeed, he first writes, as he starts developing the model, that E0 is, physically, the energy the electron would have if it couldn’t leak away from one of the atoms, but then he also adds: “It represents really nothing but our choice of the zero of energy.”

This is all quite enigmatic because we cannot just do whatever we want when discussing the energy of a particle. As I pointed out in one of my previous posts, when discussing the energy of a particle in the context of the wavefunction, we generally consider it to be the sum of three different energy concepts:

  1. The particle’s rest energy m0c2, which de Broglie referred to as internal energy (Eint), and which includes the rest mass of the ‘internal pieces’, as Feynman puts it (now we call those ‘internal pieces’ quarks), as well as their binding energy (i.e. the quarks’ interaction energy).
  2. Any potential energy it may have because of some field (i.e. if it is not traveling in free space), which we usually denote by U. This field can be anything—gravitational, electromagnetic: it’s whatever changes the energy of the particle because of its position in space.
  3. The particle’s kinetic energy, which we write in terms of its momentum p: m·v2/2 = m2·v2/(2m) = (m·v)2/(2m) = p2/(2m).

It’s obvious that we cannot just “choose” the zero point here: the particle’s rest energy is its rest energy, and its velocity is its velocity. So it’s not quite clear what the E0 in our model really is. As far as I am concerned, it represents the average energy of the system really, so it’s just like the E0 for our ammonia molecule, or the E0 for whatever two-state system we’ve seen so far. In fact, when Feynman writes that we can “choose our zero of energy so that E0 − 2A = 0″ (so the minimum of that curve above is at the zero of energy), he actually makes some assumption in regard to the relative magnitude of the various amplitudes involved.

We should probably think about it in this way: −(i/ħ)·E0 is the amplitude for the electron to just stay where it is, while i·A/ħ is the amplitude to go somewhere else—and note we’ve got two possibilities here: the electron can go to |xn+1〉,  or, alternatively, it can go to |xn−1〉. Now, amplitudes can be associated with probabilities by taking the absolute square, so I’d re-write the E0 − 2A = 0 assumption as:

E0 = 2A ⇔ |−(i/ħ)·E0|= |(i/ħ)·2A|2

Hence, in my humble opinion, Feynman’s assumption that E0 − 2A = 0 has nothing to do with ‘choosing the zero of energy’. It’s more like a symmetry assumption: we’re basically saying it’s as likely for the electron to stay where it is as it is to move to the next position. It’s an idea I need to develop somewhat further, as Feynman seems to just gloss over these little things. For example, I am sure it is not a coincidence that the EI, EIIEIII and EIV energy levels we found when discussing the hyperfine splitting of the hydrogen ground state also add up to 0. In fact, you’ll remember we could actually measure those energy levels (E= EII = EIII = A ≈ 9.23×10−6 eV, and EIV = −3A ≈ −27.7×10−6 eV), so saying that we can “choose” some zero energy point is plain nonsense. The question just doesn’t arise. In any case, as I have to continue the development here, I’ll leave this point for further analysis in the future. So… Well… Just note this E0 − 2A = 0 assumption, as we’ll need it in a moment.

The second assumption we’ll need concerns the variation in k. As you know, we can only get a wave packet if we allow for uncertainty in k which, in turn, translates into uncertainty for E. We write:

ΔE = Δ[E0 − 2A·cos(kb)]

Of course, we’d need to interpret the Δ as a variance (σ2) or a standard deviation (σ) so we can apply the usual rules – i.e. var(a) = 0, var(aX) = a2·var(X), and var(aX ± bY) = a2·var(X) + b2·var(Y) ± 2ab·cov(X, Y) – to be a bit more precise about what we’re writing here, but you get the idea. In fact, let me quickly write it out:

var[E0 − 2A·cos(kb)] = var(E0) + 4A2·var[cos(kb)] ⇔ var(E) = 4A2·var[cos(kb)]

Now, you should check my post scriptum to my page on the Essentials, to see how the probability density function of the cosine of a randomly distributed variable looks like, and then you should go online to find a formula for its variance, and then you can work it all out yourself, because… Well… I am not going to do it for you. What I want to do here is just show how Feynman gets Schrödinger’s equation out of all of these simplifications.

So what’s the second assumption? Well… As the graph shows, our k can take any value between −π/b and +π/b, and therefore, the kb argument in our cosine function can take on any value between −π and +π. In other words, kb could be any angle. However, as Feynman puts it—we’ll be assuming that kb is ‘small enough’, so we can use the small-angle approximations whenever we see the cos(kb) and/or sin(kb) functions. So we write: sin(kb) ≈ kb and cos(kb) ≈ 1 − (kb)2/2 = 1 − k2b2/2. Now, that assumption led to another grand result, which we also derived in our previous post. It had to do with the group velocity of our wave packet, which we calculated as:

= dω/dk = (2Ab2/ħ)·k

Of course, we should interpret our k here as “the typical k“. Huh? Yes… That’s how Feynman refers to it, and I have no better term for it. It’s some kind of ‘average’ of the Δk interval, obviously, but… Well… Feynman does not give us any exact definition here. Of course, if you look at the graph once more, you’ll say that, if the typical kb has to be “small enough”, then its expected value should be zero. Well… Yes and no. If the typical kb is zero, or if is zero, then is zero, and then we’ve got a stationary electron, i.e. an electron with zero momentum. However, because we’re doing what we’re doing (that is, we’re studying “stuff that moves”—as I put it unrespectfully in a few of my posts, so as to distinguish from our analyses of “stuff that doesn’t move”, like our two-state systems, for example), our “typical k” should not be zero here. OK… We can now calculate what’s referred to as the effective mass of the electron, i.e. the mass that appears in the classical kinetic energy formula: K.E. = m·v2/2. Now, there are two ways to do that, and both are somewhat tricky in their interpretation:

1. Using both the E0 − 2A = 0 as well as the “small kb” assumption, we find that E = E0 − 2A·(1 − k2b2/2) = A·k2b2. Using that for the K.E. in our formula yields:

meff = 2A·k2b2/v= 2A·k2b2/[(2Ab2/ħ)·k]= ħ2/(2Ab2)

2. We can use the classical momentum formula (p = m·v), and then the 2nd de Broglie equation, which tells us that each wavenumber (k) is to be associated with a value for the momentum (p) using the p = ħk (so p is proportional to k, with ħ as the factor of proportionality). So we can now calculate meff as meff = ħk/v. Substituting again for what we’ve found above, gives us the same:

meff = 2A·k2b2/v = ħ·k/[(2Ab2/ħ)·k] = ħ2/(2Ab2)

Of course, we’re not supposed to know the de Broglie relations at this point in time. 🙂 But, now that you’ve seen them anyway, note how we have two formulas for the momentum:

  • The classical formula (p = m·v) tells us that the momentum is proportional to the classical velocity of our particle, and m is then the factor of proportionality.
  • The quantum-mechanical formula (p = ħk) tells us that the (typical) momentum is proportional to the (typical) wavenumber, with Planck’s constant (ħ) as the factor of proportionality. Combining both combines the classical and quantum-mechanical perspective of a moving particle:

v = ħk

I know… It’s an obvious equation but… Well… Think of it. It’s time to get back to the main story now. Remember we were trying to find Schrödinger’s equation? So let’s get on with it. 🙂

To do so, we need one more assumption. It’s the third major simplification and, just like the others, the assumption is obvious on first, but not on second thought. 😦 So… What is it? Well… It’s easy to see that, in our meff = ħ2/(2Ab2) formula, all depends on the value of 2Ab2. So, just like we should wonder what happens with that kb factor in the argument of our sine or cosine function if b goes to zero—i.e. if we’re letting the lattice spacing go to zero, so we’re moving from a discrete to a continuous analysis now—we should also wonder what happens with that 2Ab2 factor! Well… Think about it. Wouldn’t it be reasonable to assume that the effective mass of our electron is determined by some property of the material, or the medium (so that’s the silicon in our previous post) and, hence, that it’s constant really. Think of it: we’re not changing the fundamentals really—we just have some electron roaming around in some medium and all that we’re doing now is bringing those xcloser together. Much closer. It’s only logical, then, that our amplitude to jump from xn±1 to xwould also increase, no? So what we’re saying is that 2Ab2 is some constant which we write as ħ2/meff or, what amounts to the same, that Ab= ħ2/2·meff.

Of course, you may raise two objections here:

  1. The Ab= ħ2/2·meff assumption establishes a very particular relation between A and b, as we can write A as A = [ħ2/(2meff)]·b−2 now. So we’ve got like an y = 1/x2 relation here. Where the hell does that come from?
  2. We were talking some real stuff here: a crystal lattice with atoms that, in reality, do have some spacing, so that corresponds to some real value for b. So that spacing gives some actual physical significance to those xvalues.

Well… What can I say? I think you should re-read that quote of Feynman when I started this post. We’re going to get Schrödinger’s equation – i.e. the ultimate prize for all of the hard work that we’ve been doing so far – but… Yes. It’s really very heuristic, indeed! 🙂 But let’s get on with it now! We can re-write our Hamiltonian equation as:

iħ·(∂C(xn)/∂t) = E0C(xn) − AC(xn+b) − AC(xn−b)]

= (E0−2A)C(xn) + A[2C(xn) − C(xn+b) − C(xn−b) = A[2C(xn) − C(xn+b) − C(xn−b)]

Now, I know your brain is about to melt down but, fiddling with this equation as we’re doing right now, Schrödinger recognized a formula for the second-order derivative of a function. I’ll just jot it down, and you can google it so as to double-check where it comes from:

second derivative

Just substitute f(x) for C(xn) in the second part of our equation above, and you’ll see we can effectively write that 2C(xn) − C(xn+b) − C(xn−b) factor as:

formula 1

We’re done. We just iħ·(∂C(xn)/∂t) on the left-hand side now and multiply the expression above with A, to get what we wanted to get, and that’s – YES! – Schrödinger’s equation:

Schrodinger 2

Whatever your objections to this ‘derivation’, it is the correct equation. For a particle in free space, we just write m instead of meff, but it’s exactly the same. I’ll now give you Feynman’s full quote, which is quite enlightening:

“We do not intend to have you think we have derived the Schrödinger equation but only wish to show you one way of thinking about it. When Schrödinger first wrote it down, he gave a kind of derivation based on some heuristic arguments and some brilliant intuitive guesses. Some of the arguments he used were even false, but that does not matter; the only important thing is that the ultimate equation gives a correct description of nature. The purpose of our discussion is then simply to show you that the correct fundamental quantum mechanical equation [i.e. Schrödinger’s equation] has the same form you get for the limiting case of an electron moving along a line of atoms. We can think of it as describing the diffusion of a probability amplitude from one point to the next along the line. That is, if an electron has a certain amplitude to be at one point, it will, a little time later, have some amplitude to be at neighboring points. In fact, the equation looks something like the diffusion equations which we have used in Volume I. But there is one main difference: the imaginary coefficient in front of the time derivative makes the behavior completely different from the ordinary diffusion such as you would have for a gas spreading out along a thin tube. Ordinary diffusion gives rise to real exponential solutions, whereas the solutions of Schrödinger’s equation are complex waves.”

So… That says it all, I guess. Isn’t it great to be where we are? We’ve really climbed a mountain here. And I think the view is gorgeous. 🙂

Oh—just in case you’d think I did not give you Schrödinger’s equation, let me write it in the form you’ll usually see it:

schrodinger 3

Done! 🙂

Quantum math in solid-state physics

I’ve said it a couple of times already: so far, we’ve only studied stuff that does not move in space. Hence, till now, time was the only variable in our wavefunctions. So now it’s time to… Well… Study stuff that does move in space. 🙂

Is that compatible with the title of this post? Solid-state physics? Solid-state stuff doesn’t move, does it? Well… No. But what we’re going to look at is how an electron travels through a solid crystal or, more generally, how an atomic excitation can travel through. In fact, what we’re really going to look at is how the wavefunction itself travels through space. However, that’s a rather bold statement, and so you should just read this post and judge for yourself. To be specific, we’re going to look at what happens in semiconductor material, like the silicon that’s used in microelectronic components like transistors and integrated circuits (ICs). You surely know the classical idea of that, which involves imagining an electron can be situated in a kind of ‘pit’ at one particular atom (or an electron hole, as it’s usually referred to), and it just moves from pit to pit. The Wikipedia article on it defines an electron hole as follows: an electron hole is the absence of an electron from a full valence band: the concept is used to conceptualize the interactions of the electrons within a nearly full system, i.e. a system which is missing just a few electrons. But here we’re going to forget about the classical picture. We’ll try to model it using the wavefunction concept. So how does that work? Feynman approaches it as follows.

If we look at a (one-dimensional) line of atoms – we can extend to a two- and three-dimensional analysis later – we may define an infinite number of base states for the extra electron that we think of as moving through the crystal. If the electron is with the n-th atom, then we’ll say it’s in a base state which we shall write as |n〉. Likewise, if it’s at atom n+1 or n−1, then we’ll associate that with base state |n+1〉 and |n−1〉 respectively. That’s what visualized below, and you should just along with the story here: don’t think classically, i.e. in terms of the electron is either here or, else, somewhere else. No. It’s got an amplitude to be anywhere. If you can’t take that… Well… I am sorry but that’s what QM is all about!

electron moving

As usual, we write the amplitude for the electron to be in one of those states |n〉 as Cn(t) = 〈n|φ〉, and so we can the write the electron’s state at any point in time t by superposing all base states, so that’s the weighted sum of all base states, with the weights being equal to the associated amplitudes. So we write:

|φ〉 = ∑ |nCn(t) = ∑ |n〉〈n|φ〉 over all n

Now we add some assumptions. One assumption is that an electron cannot directly jump to its next nearest neighbor: if it goes to the next nearest one, it will first have to go to nearest one. So two steps are needed to go from state |n−1〉 to state |n+1〉. This assumption simplifies the analysis: we can discuss more general cases later. To be specific, we’ll assume the amplitude to go from one base state to another, e.g. from |n〉 to |n+1〉, or |n−1〉 to state |n〉, is equal to i·A/ħ. You may wonder where this comes from, but it’s totally in line with equating our Hamiltonian non-diagonal elements to –A. Let me quickly insert a small digression here—for those who do really wonder where this comes from. 🙂


Just check out those two-state systems we described, or that post of mine in which I explained why the following formulas are actually quite intuitive and easy to understand:

  • U12(t + Δt, t) = − (i/ħ)·H12(t)·Δt = (i/ħ)·A·Δt and
  • U21(t + Δt, t) = − (i/ħ)·H21(t)·Δt = (i/ħ)·A·Δt

More generally, you’ll remember that we wrote Uij(t + Δt, t) as:

Uij(t + Δt, t) = Uij(t, t) + Kij·Δt = δij(t, t) + Kij·Δt = δij(t, t) − (i/ħ)·Hij(t)·Δt

That looks monstrous but, frankly, what we have here is just a formula like this:

 f(x+Δx) = f(x) + [df(x)/dt]·Δx

In case you didn’t notice, the formula is just the definition of the derivative if we write it as Δy/Δx = df(x)/dt for Δx → 0. Hence, the Kij coefficient in this formula is to be interpreted as a time derivative. Now, we re-wrote that Kij coefficient as the amplitude −(i/ħ)·Hij(t) and, therefore, that amplitude – i.e. the i·A/ħ factor (for ij) I introduced above – is to be interpreted as a time derivative. [Now that we’re here, let me quickly add that a time derivative gives the time rate of change of some quantity per unit time. So that i·A/ħ factor is also expressed per unit time.] We’d then just move the − (i/ħ) factor in that −(i/ħ)·Hij(t) coefficient to the other side to get the grand result we got for two-state systems, i.e. the Hamiltonian equations, which we could write in a number of ways, as shown below:

hamiltonian equations

So… Well… That’s all there is to it, basically. Quantum math is not easy but, if anything, it is logical. You just have to get used to that imaginary unit (i) in front of stuff. That makes it always look very mysterious. 🙂 However, it should never scare you. You can just move it in or out of the differential operator, for example: i·df(x)/dt = d[i·f(x)]/dt. [Of course, i·f(x) ≠ f(i·x)!] So just think of as a reminder that the number that follows it points in a different direction. To be precise: its angle with the other number is 90°. It doesn’t matter what we call those two numbers. The convention is to say that one is the real part of the wavefunction, while the other is the imaginary part but, frankly, in quantum math, both numbers are just as real. 🙂


Yes. Let me get back to the lesson here. The assumption is that the Hamiltonian equations for our system here, i.e. the electron traveling from hole to hole, look like the following equation:


It’s really like those iħ·(dC1/dt) = E0C1 − AC2 and iħ·(dC2/dt) = − AC1 + E0C2 equations above, except that we’ve got three terms here:

  1. −(i/ħ)·E0 is the amplitude for the electron to just stay where it is, so we multiply that with the amplitude of the electron to be there at that time, i.e. the amplitude Cn(t), and bingo! That’s the first contribution to the time rate of change of the Cn amplitude (i.e. dCn/dt). [Note that all I brought that iħ factor in front to the other side: 1/(iħ) = −(i/ħ).] Of course, you also need to know what Eis now: that’s just the (average) energy of our electron. So it’s really like the Eof our ammonia molecule—or the average energy of any two-state system, really.
  2. −(i/ħ)·(−A) = i·A/ħ is the amplitude to go from one base state to another, i.e. from |n+1〉 to |n〉, for example. In fact, the second term models exactly that: i·A/ħ times the amplitude to be in state |n+1〉 is the second contribution to to the time rate of change of the Cn amplitude.
  3. Finally, the electron may also be in state |n−1〉 and go to |n〉 from there, so i·A/ħ times the amplitude to be in state |n−1〉 is yet another contribution to to the time rate of change of the Cn amplitude.

Now, we don’t want to think about what happens at the start and the end of our line of atoms, so we’ll just assume we’ve got an infinite number of them. As a result, we get an infinite number of equations, which Feynman summarizes as:

hamiltonian equations - 2

Holy cow! How do we solve that? We know that the general solution for those Cn amplitudes is likely to be some function like this:

Cn(t) = an·e−(i/ħ)·E·t

In case you wonder where this comes from, check my post on the general solution for N-state systems. If we substitute that trial solution in that iħ·(dCn/dt) = E0Cn − ACn+1 − ACn−1, we get:

Ea= E0an − Aan+1 − Aan−1

[Just do that derivative, and you’ll see the iħ can be scrapped. Also, the exponentials on both sides of the equation cancel each other out.] Now, that doesn’t look too bad, and we can also write it as (E − E0a= − A(an+1 + an−1 ), but… Well… What’s the next step? We’ve got an infinite number of coefficients ahere, so we can’t use the usual methods to solve this set of equations. Feynman tries something completely different here. It looks weird but… Well… He gets a sensible result, so… Well… Let’s go for it.

He first writes these coefficients aas a function of a distance, which he defines as xn = xn−1 + b, with the atomic spacing, i.e. the distance between two atoms (see the illustration). So now we write a= f(xn) = a(xn). Note that we don’t write a= fn(xn) = an(xn). No. It’s just one function f = a, not an infinite number of functions f= an. Of course, once you see what comes of it, you’ll say: sure! The (complex) acoefficient in that function is the non-time-varying part of our function, and it’s about time we insert some part that’s varying in space and so… Well… Yes, of course! Our acoefficients don’t vary in time, so they must vary in space. Well… Yes. I guess so. 🙂 Our Ea= E0an − Aan+1 − Aan−1 equation becomes:

a(xn) = E0·a(xn) − a(xn+1) − A·a(xn+1) = E0·a(xn) − a(xn+b) − A·a(xn−b)

We can write this, once again, as (E − E0a(xn) = − A·[a(xn+b) + a(xn−b)]. Feynman notes this equation is like a differential equation, in the sense that it relates the value of some function (i.e. our a function, of course) at some point x to the values of the same function at nearby points, i.e. ± b here. Frankly, I struggle a bit to see how it works exactly but Feynman now offers the following trial solution:

a(xn) = eikxn

Huh? Why? And what’s k? Be patient. Just go along with this for a while. Let’s first do a graph. Think of xas a nearly continuous variable representing position in space. We then know that this parameter k is then equal to the spatial frequency of our wavefunction: larger values for k give the wavefunction a higher density in space, as shown below. 


In fact, I shouldn’t confuse you here, but you’ll surely think of the wavefunction you saw so many times already:

ψ(x, t) = a·ei·[(E/ħ)·t − (p/ħ)∙x] = a·ei·(ω·t − k∙x) = a·ei(k∙x−ω·t) = a·ei∙k∙x·ei∙ω·t

This was the elementary wavefunction we’d associate with any particle, and so would be equal to p/ħ, which is just the second of the two de Broglie relations: E = ħω and p = ħk (or, what amounts to the same: E = hf and λ = h/p). But you shouldn’t get confused. Not at this point. Or… Well… Not yet. 🙂

Let’s just take this proposed solution and plug it in. We get:

eikxn = E0·eikxn − eik(xn+b) − A·eik(xn−b) ⇔ E = E0 − eikb − A·eikb ⇔ E = E0 − 2A·cos(kb)

[In case you wonder what happens here: we just divide both sides by the common factor eikxand then we also know that eiθ+eiθ = 2·cosθ property.] So each is associated with some energy E. In fact, to be precise, that E = E0 − 2A·cos(kb) function is a periodic function: it’s depicted below, and it reaches a maximum at k = ± π/b. [It’s easy to see why: E0 − 2A·cos(kb) reaches a maximum if cos(kb) = −1, i.e. if kb = ± π.]


Of course, we still don’t really know what k or E are really supposed to represent, but think of it: it’s obvious that E can never be larger or smaller than E ± 2A, whatever the value of k. Note that, once again, it doesn’t matter if we used +A or −A in our equations: the energy band remains the same. And… Well… We’ve dropped the term now: the energy band of a semiconductor. That’s what it’s all about. What we’re saying here is that our electron, as it moves about, can have no other energies than the values in this band. Having said, that still doesn’t determine its energy: any energy level within that energy band is possible. So what does that mean? Hmm… Let’s take a break and not bother too much about k for the moment. Let’s look at our Cn(t) equations once more. We can now write them as:

Cn(t) =  eikxn·e−(i/ħ)·E·t = eikxn·e−(i/ħ)·[E0 − 2A·cos(kb)]·t

You have enough experience now to sort of visualize what happens here. We can look at a certain xvalue – read: a certain position in the lattice and watch, as time goes by, how the real and imaginary part of our little Cwavefunction varies sinusoidally. We can also do it the other way around, and take a snapshot of the lattice at a certain point in time, and then we see how the amplitudes vary from point to point. That’s easy enough.

The thing is: we’re interested in probabilities in the end, and our wavefunction does not satisfy us in that regard: if we take the absolute square, its phase vanishes, and so we get the same probability everywhere! [Note that we didn’t normalize our wavefunctions here. It doesn’t matter. We can always do that later.] Now that’s not great. So what can we do about that? Now that’s where that comes back in the game. Let’s have a look.

The effective mass of an electron

We’d like to find a solution which sort of ‘localizes’ our electron in space. Now, we know that we can do, in general, by superposing wavefunctions having different frequencies. There are a number of ways to go about, but the general idea is illustrated below.

Fourier_series_and_transform beats

The first animation (for which credit must go to Wikipedia once more) is, obviously, the most sophisticated one. It shows how a new function – in red, and denoted by s6(x) – is constructed by summing six sine functions of different amplitudes and with harmonically related frequencies. This particular sum is referred to as a Fourier series, and the so-called Fourier transform, i.e. the S(f) function (in blue), depicts the six frequencies and their amplitudes.

We’re more interested in the second animation here (for which credit goes to another nice site), which shows how a pattern of beats is created by just mixing two slightly different cosine waves. We want to do something similar here: we want to get a ‘wave packet‘ like the one below, which shows the real part only—but you can imagine the imaginary part 🙂 of course. [That’s exactly the same but with a phase shift, cf. the sine and cosine bit in Euler’s formula: eiθ = cosθ + i·sinθ.]


As you know, we must know make a distinction between the group velocity of the wave, and its phase velocity. That’s got to do with the dispersion relation, but we’re not going to get into the nitty-gritty here. Just remember that the group velocity corresponds to the classical velocity of our particle – so that must be the classical velocity of our electron here – and, equally important, also remember the following formula for that group velocity:

group velocity

Let’s see how that plays out. The ω in this equation is equal to E/ħ = [E0 − 2A·cos(kb)]/ħ, so dω/dk = d[− (2A/ħ)·cos(kb)]/dk = (2Ab/ħ)·sin(kb). However, we’ll usually assume k is fairly small, so the variation of the amplitude from one xn to the other is fairly small. In that case, kb will be fairly small, and then we can use the so-called small angle approximation formula sin(ε) ≈ ε. [Note the reasoning here is a bit tricky, though, because – theoretically – k may vary between −π/b and +π/b and, hence, kb can take any value between −π and +π.] Using the small angle approximation, we get:

solution velocity

So we’ve got a quantum-mechanical calculation here that yields a classical velocity. Now, we can do something interesting now: we can calculate what is known as the effective mass of the electron, i.e. the mass that appears in the classical kinetic energy formula: K.E. = m·v2/2. Or in the classical momentum formula: p = m·vSo we can now write: K.E. = meff·v2/2 and p = meff·vBut… Well… The second de Broglie equation tells us that p = ħk, so we find that meff = ħk/v. Substituting for what we’ve found above, gives us:

formula for m eff

Unsurprisingly, we find that the value of meff is inversely proportional to A. It’s usually stated in units of the true mass of the electron, i.e. its mass in free space (m≈ 9.11×10−31 kg) and, in these units, it’s usually in the range of 0.01 to 10. You’ll say: 0.01, i.e. one percent of its actual mass? Yes. An electron may travel more freely in matter than it does in free space. 🙂 That’s weird but… Well… Quantum mechanics is weird.

In any case, I’ll wrap this post up now. You’ ve got a nice model here. As Feynman puts it:

“We have now explained a remarkable mystery—how an electron in a crystal (like an extra electron put into germanium) can ride right through the crystal and flow perfectly freely even though it has to hit all the atoms. It does so by having its amplitudes going pip-pip-pip from one atom to the next, working its way through the crystal. That is how a solid can conduct electricity.”

Well… There you go. 🙂

Systems with 2 spin-1/2 particles (II)

In our previous post, we noted the Hamiltonian for a simple system of two spin-1/2 particles—a proton and an electron (i.e. a hydrogen atom, in other words):


After noting that this Hamiltonian is “the only thing that it can be, by the symmetry of space, i.e. so long as there is no external field,” Feynman also notes the constant term (A) depends on the level we choose to measure energies from, so one might just as well take E= 0, in which case the formula reduces to H = Aσe·σp. Feynman analyzes this term as follows:

If there are two magnets near each other with magnetic moments μe and μp, the mutual energy will depend on μe·μp = |μe||μp|cosα = μeμpcosα — among other things. Now, the classical thing that we call μe or μp appears in quantum mechanics as μeσand μpσrespectively (where μis the magnetic moment of the proton, which is about 1000 times smaller than μe, and has the opposite sign). So the H = Aσe·σp equation says that the interaction energy is like the interaction between two magnets—only not quite, because the interaction of the two magnets depends on the radial distance between them. But the equation could be—and, in fact, is—some kind of an average interaction. The electron is moving all around inside the atom, and our Hamiltonian gives only the average interaction energy. All it says is that for a prescribed arrangement in space for the electron and proton there is an energy proportional to the cosine of the angle between the two magnetic moments, speaking classically. Such a classical qualitative picture may help you to understand where the H = Aσe·σequation comes from.

That’s loud and clear, I guess. The next step is to introduce an external field. The formula for the Hamiltonian (we don’t distinguish between the matrix and the operator here) then becomes:

H = Aσe·σp − μeσe·B − μpσp·B

The first term is the term we already had. The second term is the energy the electron would have in the magnetic field if it were there alone. Likewise, the third term is the energy the proton would have in the magnetic field if it were there alone. When reading this, you should remember the following convention: classically, we write the energy U as U = −μ·B, because the energy is lowest when the moment is along the field. Hence, for positive particles, the magnetic moment is parallel to the spin, while for negative particles it’s opposite. In other words, μp is a positive number, while μe is negative. Feynman sums it all up as follows:

Classically, the energy of the electron and the proton together, would be the sum of the two, and that works also quantum mechanically. In a magnetic field, the energy of interaction due to the magnetic field is just the sum of the energy of interaction of the electron with the external field, and of the proton with the field—both expressed in terms of the sigma operators. In quantum mechanics these terms are not really the energies, but thinking of the classical formulas for the energy is a way of remembering the rules for writing down the Hamiltonian.

That’s also loud and clear. So now we need to solve those Hamiltonian equations once again. Feynman does so first assuming B is constant and in the z-direction. I’ll refer you to him for the nitty-gritty. The important thing is the results here:


He visualizes these – as a function of μB/A – as follows:


The illustration shows how the four energy levels have a different B-dependence:

  • EI, EII, EIII start at (0, 1) but EI increases linearly with B—with slope μ, to be precise (cf. the EI = A + μB expression);
  • In contrast, EII decreases linearly with B—again, with slope μ (cf. the EII = A − μB expression);
  • We then have the EIII and EIV curves, which start out horizontally, to then curve and approach straight lines for large B, with slopes equal to μ’.

Oh—I realize I forget to define μ and μ’. Let me do that now: μ = −(μep) and μ’ = −(μe−μp). And remember what we said above: μis about 1000 times smaller than μe, and has opposite sign. OK. The point is: the magnetic field shifts the energy levels of our hydrogen atom. This is referred to as the Zeeman effect. Feynman describes it as follows:

The curves show the Zeeman splitting of the ground state of hydrogen. When there is no magnetic field, we get just one spectral line from the hyperfine structure of hydrogen. The transitions between state IV and any one of the others occurs with the absorption or emission of a photon whose (angular) frequency is 1/ħ times the energy difference 4A. [See my previous post for the calculation.] However, when the atom is in a magnetic field B, there are many more lines, and there can be transitions between any two of the four states. So if we have atoms in all four states, energy can be absorbed—or emitted—in any one of the six transitions shown by the vertical arrows in the illustration above.

The last question is: what makes the transitions go? Let me also quote Feynman’s answer to that:

The transitions will occur if you apply a small disturbing magnetic field that varies with time (in addition to the steady strong field B). It’s just as we saw for a varying electric field on the ammonia molecule. Only here, it is the magnetic field which couples with the magnetic moments and does the trick. But the theory follows through in the same way that we worked it out for the ammonia. The theory is the simplest if you take a perturbing magnetic field that rotates in the xy-plane—although any horizontal oscillating field will do. When you put in this perturbing field as an additional term in the Hamiltonian, you get solutions in which the amplitudes vary with time—as we found for the ammonia molecule. So you can calculate easily and accurately the probability of a transition from one state to another. And you find that it all agrees with experiment.

Alright! All loud and clear. 🙂

The magnetic quantum number

At very low magnetic fields, we still have the Zeeman splitting, but we can now approximate it as follows:

magnetic quantum number

This simplified representation of things explains an older concept you may still see mentioned: the magnetic quantum number, which is usually denoted by m. Feynman’s explanation of it is quite straightforward, and so I’ll just copy it as is:


As he notes: the concept of the magnetic quantum number has nothing to do with new physics. It’s all just a matter of notation. 🙂

Well… This concludes our short study of four-state systems. On to the next! 🙂

Systems with 2 spin-1/2 particles (I)

I agree: this is probably the most boring title of a post ever. However, it should be interesting, as we’re going to apply what we’ve learned so far – i.e. the quantum-mechanical model of two-state systems – to a much more complicated problem—the solution of which can then be generalized to describe even more complicated situations.

Two spin-1/2 particles? Let’s recall the most obvious example. In the ground state of a hydrogen atom (H), we have one electron that’s bound to one proton. The electron occupies the lowest energy state in its ground state, which – as Feynman shows in one of his first quantum-mechanical calculations – is equal to −13.6 eV. More or less, that is. 🙂  You’ll remember the reason for the minus sign: the electron has more energy when it’s unbound, which it releases as radiation when it joins an ionized hydrogen atom or, to put it simply, when a proton and an electron come together. In-between being bound and unbound, there are other discrete energy states – illustrated below – and we’ll learn how to describe the patterns of motion of the electron in each of those states soon enough.


Not in this post, however. 😦 In this post, we want to focus on the ground state only. Why? Just because. That’s today’s topic. 🙂 The proton and the electron can be in either of two spin states. As a result, the so-called ground state is not really a single definite-energy state. The spin states cause the so-called hyperfine structure in the energy levels: it splits them into several nearly equal energy levels, so that’s what referred to as hyperfine splitting.

[…] OK. Let’s go for it. As Feynman points out, the whole model is reduced to a set of four base states:

  1. State 1: |++〉 = |1〉 (the electron and proton are both ‘up’)
  2. State 2: |+−〉 = |2〉  (the electron is ‘up’ and the proton is ‘down’)
  3. State 3: |−+〉 = |3〉  (the electron is ‘down’ and the proton is ‘up’)
  4. State 4: |−−〉 = |4〉  (the electron and proton are both ‘down’)

The simplification is huge. As you know, the spin of electrically charged elementary particles is related to their motion in space, but so we don’t care about exact spatial relationships here: the direction of spin can be in any direction, but all that matters here is the relative orientation, and so all is simplified to some direction as defined by the proton and the electron itself. Full stop.

You know that the whole problem is to find the Hamiltonian coefficients, i.e. the energy matrix. Let me give them to you straight away. The energy levels involved are the following:

  • E= EII = EIII = A ≈ 9.23×10−6 eV
  • EIV = −3A ≈ 27.7×10−6 eV

So the difference in energy levels is measured in ten-millionths of an electron-volt and, hence, the hyperfine splitting is really hyper-fine. The question is: how do we get these values? So that is what this post is about. Let’s start by reminding ourselves of what we learned so far.

The Hamiltonian operator

We know that, in quantum mechanics, we describe any state in terms of the base states. In this particular case, we’d do so as follows:

|ψ〉 = |1〉C1 + |2〉C2 + |3〉C3 +|4〉C4 with Ci = 〈i|ψ〉

We refer to |ψ〉 as the spin state of the system, and so it’s determined by those four Ci amplitudes. Now, we know that those Ci amplitudes are functions of time, and they are, in turn, determined by the Hamiltonian matrix. To be precise, we find them by solving a set of linear differential equations that we referred to as Hamiltonian equations. To be precise, we’d describe the behavior of |ψ〉 in time by the following equation:

hamiltonian operator

In case you forgot, the expression above is a short-hand for the following expression:

hamiltonian operator 2The index would range over all base states and, therefore, this expression gives us everything we want: it really does describe the behavior, in time, of an N-state system. You’ll also remember that, when we’d use the Hamiltonian matrix in the way it’s used above (i.e. as an operator on a state), we’d put a little hat over it, so we defined the Hamiltonian operator as:


So far, so good—but this does not solve our problem: how do we find the Hamiltonian for this four-state system? What is it?

Well… There’s no one-size-fits-all answer to that: the analysis of two different two-state systems, like an ammonia molecule, or one spin-1/2 particle in a magnetic field, was different. Having said that, we did find we could generalize some of the solutions we’d find. For example, we’d write the Hamiltonian for a spin-1/2 particle, with a magnetic moment that’s assumed to be equal to μ, in a magnetic field B = (Bx, By, Bz) as:

sigma matrices

In this equation, we’ve got a set of 4 two-by-two matrices (three so-called sigma matrices (σx, σy, σz), and then the unit matrix δij = 1) which we referred to as the Pauli spin matrices, and which we wrote as:


You’ll remember that expression – which we further abbreviated, even more elegantly, to H = −μσ·B – covered all two-state systems involving a magnetic moment in a magnetic field. In fact, you’ll remember we could actually easily adapt the model to cover two-state systems in electric fields as well.

In short, these sigma matrices made our life very easy—as they covered a whole range of two-state models. So… Well… To make a long story short, what we want to do here is find some similar sigma matrices for four-state problems. So… Well… Let’s do that.

First, you should remind yourself of the fact that we could also use these sigma matrices as little operators themselves. To be specific, we’d let them ‘operate’ on the base states, and we’d find they’d do the following:


You need to read this carefully. What it says that the σz matrix, as an operator, acting on the ‘up’ base state, yields the same base state (i.e. ‘up’), and that the same operator, acting on the ‘down’ state, gives us the same but with a minus sign in front. Likewise, the σy matrix operating on the ‘up’ and ‘down’ states respectively, will give us i·|down〉 and −i·|up〉 respectively.

The trick to solve our problem here (i.e. our four-state system) is to apply those sigma matrices to the electron and the proton separately. Feynman introduces a new notation here by distinguishing the electron and proton sigma operators: the electron sigma operators (σxe, σye, and σze) operate on the electron spin only, while – you guessed it – the proton sigma operator ((σxp, σyp, and σzp) acts on the proton spin only. Applying it to the four states we’re looking at (i.e. |++〉, |+−〉, |−+〉 and |−−〉), we get the following bifurcation for our σx operator:

  1. σxe|++〉 = |−+〉
  2. σxe|+−〉 = |−−〉
  3. σxe|−+〉 = |++〉
  4. σxe|−−〉 = |+−〉
  5. σxp|++〉 = |+−〉
  6. σxp|+−〉 = |++〉
  7. σxp|−+〉 = |−−〉
  8. σxp|−−〉 = |−+〉

You get the idea. We had three operators acting on two states, i.e. 6 possibilities. Now we combine these three operators with two different particles, so we have six operators now, and we let them act on four possible system states, so we have 24 possibilities now. Now, we can, of course, let these operators act one after another. Check the following for example:

 σxeσzp|+−〉 = σxezp|+−〉] = –σxe|+−〉 = –|–−〉

[I now realize that I should have used the ↑ and ↓ symbols for the ‘up’ and ‘down’ states, as the minus sign is used to denote two very different things here, but… Well… So be it.]

Note that we only have nine possible σxeσzp-like combinations, because σxeσz= σzpσxe, and then we have the 2×3 = six σe and σp operators themselves, so that makes for 15 new operators. [Note that the commutativity of these operators (σxeσz= σzpσxe) is not some general property of quantum-mechanical operators.] If we include the unit operator (δij = 1) – i.e. an operator that leaves all unchanged – we’ve got 16 in total. Now, we mentioned that we could write the Hamiltonian for a two-state system – i.e. a two-by-two matrix – as a linear combination of the four Pauli spin matrices. Likewise, one can demonstrate that the Hamiltonian for a four-state system can always be written as some linear combination of those sixteen ‘double-spin’ matrices. To be specific, we can write it as:


We should note a few things here. First, the E0 constant is, of course, to be multiplied by the unit matrix, so we should actually write E0δij instead of E0, but… Well… Quantum physicists always want to confuse you. 🙂 Second, the σeσis like the σ·notation: we can look at the σxe, σye, σze and σxp, σyp, σzp matrices as being the three components of two new (matrix) vectors, which we write as σand σrespectively. Thirdly, and most importantly, you’ll want proof of that equation above. Well… I am sorry but I am going to refer you to Feynman here: he shows that the expression above “is the only thing that the Hamiltonian can be.” The proof is based on the fundamental symmetry of space. He also adds that space is symmetrical only so long as there is no external field. 🙂

Final question: what’s A? Well… Feynman is quite honest here as he says the following: “A can be calculated accurately once you understand the complete quantum theory of the hydrogen atom—which we so far do not. It has, in fact, been calculated to an accuracy of about 30 parts in one million. So, unlike the flip-flop constant A of the ammonia molecule, which couldn’t be calculated at all well by a theory, our constant A for the hydrogen can be calculated from a more detailed theory. But never mind, we will for our present purposes think of the A as a number which could be determined by experiment, and analyze the physics of the situation.”

So… Well… So far so good. We’ve got the Hamiltonian. That’s all we wanted, actually. But, now that we have come so far, let’s write it all out now.

Solving the equations

If that expression above is the Hamiltonian – and we assume it is, of course! – then our system of Hamiltonian equations can be written as:


[Note that we’ve switched to Newton’s ‘over-dot’ notation to denote time derivatives here.] Now, I could walk you through Feynman’s exposé but I guess you’ll trust the result. The equation above is equivalent to the following set of four equations:


We know that, because the Hamiltonian looks like this:


How do we know that? Well… Sorry: just check Feynman. 🙂 He just writes it all out. Now, we want to find those Ci functions. [When studying physics, the most important thing is to remember what it is that you’re trying to do. 🙂 ] Now, from my previous post (i.e. my post on the general solution for N-state systems), you’ll remember that those Ci functions should have the following functional form:

Ci(t) = ai·ei·(E/ħ)·t 

If we substituting Ci(t) for that functional form in our set of Hamiltonian equations, we can cancel the exponentials so we get the following delightfully simple set of new equations:


The trivial solution, of course, is that all of the ai coefficients are zero, but – as mentioned in my previous post – we’re looking for non-trivial solutions here. Well… From what you see above, it’s easy to appreciate that one non-trivial but simple solution is:

a1 = 1 and a2 = a3 = a4 = 0

So we’ve got one set of ai coefficients here, and we’ll associate it with the first eigenvalue, or energy level, really—which we’ll denote as EI. [I am just being consistent here with what I wrote in my previous post, which explained how general solutions to N-state systems look like.] So we find the following:

E= A

[Another thing you learn when studying physics is that the most amazing things are often summarized in super-terse equations, like this one here. 🙂 ]

But – Hey! Look at the symmetry between the first and last equation! 

We immediately get another simple – but non-trivial! – solution:

a4 = 1 and a1 = a2 = a3 = 0

We’ll associate the second energy level with that, so we write:


We’ve got two left. I’ll leave that to Feynman to solve:

feDone! Four energy levels En (n = I, II, III, IV), and four associated energy state vectors – |n〉 – that describe their configuration (and which, as Feynman puts it, have the time dependence “factored out”). Perfect!

Now, we mentioned the experimental values:

  • E= EII = EIII = A ≈ 9.23×10−6 eV
  • EIV = −3A ≈ 27.7×10−6 eV

How can scientists measure these values? The theoretical analysis gives us the A and −3A values, but what about the empirical measurements? Well… We should find those values as the hydrogen atoms in state I, II or III should get rid of the energy by emitting some radiation. Now, the frequency of that radiation will give us the information we need, as illustrated below. The difference between E= EII = EIII = A and EIV = −3A (i.e. 4A) should correspond to the (angular) frequency of the radiation that’s being emitted or absorbed as atoms go from one energy state to the other. Now, hydrogen atoms do absorb and emit microwave radiation with a frequency that’s equal to 1,420,405,751.8 Hz. More or less, that is. 🙂 The standard error in the measurement is about two parts in 100 billion—and I am quoting some measurement done in the early 1960s here!]


Bingo! If = ω/2π = (4A/ħ)/2π = 1,420,405,751.8 Hz, then A = f·2π·ħ/4 ≈ 9.23×10−6 eV.

So… Well… We’re done! I’ll see you tomorrow. 🙂 Tomorrow, we’re going to look at what happens when space is not symmetric, i.e. when we would have some external field! C u ! Cheers !

N-state systems

On the 10th of December, last year, I wrote that my next post would generalize the results we got for two-state systems. That didn’t happen: I didn’t write the ‘next post’—not till now, that is. No. Instead, I started digging—as you can see from all the posts in-between this one and the 10 December piece. And you may also want to take a look at my new Essentials page. 🙂 In any case, it is now time to get back to Feynman’s Lectures on quantum mechanics. Remember where we are: halfway, really. The first half was all about stuff that doesn’t move in space. The second half, i.e. all that we’re going to study now, is about… Well… You guessed it. 🙂 That’s going to be about stuff that does move in space. To see how that works, we first need to generalize the two-state model to an N-state model. Let’s do it.

You’ll remember that, in quantum mechanics, we describe stuff by saying it’s in some state which, as long as we don’t measure in what state exactly, is written as some linear combination of a set of base states. [And please do think about what I highlight here: some state, measureexactly. It all matters. Think about it!] The coefficients in that linear combination are complex-valued functions, which we referred to as wavefunctions, or (probability) amplitudes. To make a long story short, we wrote:


These Ci coefficients are a shorthand for 〈 i | ψ(t) 〉 amplitudes. As such, they give us the amplitude of the system to be in state i as a function of time. Their dynamics (i.e. the way they evolve in time) are governed by the Hamiltonian equations, i.e.:


The Hij coefficients in this set of equations are organized in the Hamiltonian matrix, which Feynman refers to as the energy matrix, because these coefficients do represent energies indeed. So we applied all of this to two-state systems and, hence, things should not be too hard now, because it’s all the same, except that we have N base states now, instead of just two.

So we have a N×N matrix whose diagonal elements Hij are real numbers. The non-diagonal elements may be complex numbers but, if they are, the following rule applies: Hij* = Hji. [In case you wonder: that’s got to do with the fact that we can write any final 〈χ| or 〈φ| state as the conjugate transpose of the initial |χ〉 or |φ〉 state, so we can write: 〈χ| = |χ〉*, or 〈φ| = |φ〉*.]

As usual, the trick is to find those N Ci(t) functions: we do so by solving that set of N equations, assuming we know those Hamiltonian coefficients. [As you may suspect, the real challenge is to determine the Hamiltonian, which we assume to be given here. But… Well… You first need to learn how to model stuff. Once you get your degree, you’ll be paid to actually solve problems using those models. 🙂 ] We know the complex exponential is a functional form that usually does that trick. Hence, generalizing the results from our analysis of two-state systems once more, the following general solution is suggested:

Ci(t) = ai·ei·(E/ħ)·t 

Note that we introduce only one E variable here, but N ai coefficients, which may be real- or complex-valued. Indeed, my examples – see my previous posts – often involved real coefficients, but that’s not necessarily the case. Think of the C2(t) = i·e(i/ħ)·E0·t·sin[(A/ħ)·t] function describing one of the two base state amplitudes for the ammonia molecule—for example. 🙂

Now, that proposed general solution allows us to calculate the derivatives in our Hamiltonian equations (i.e. the d[Ci(t)]/dt functions) as follows:

d[Ci(t)]/dt = −i·(E/ħ)·ai·ei·(E/ħ)·t 

You can now double-check that the set of equations reduces to the following:


Please do write it out: because we have one E only, the ei·(E/ħ)·t factor is common to all terms, and so we can cancel it. The other stuff is plain arithmetic: i·i = i2 = 1, and the ħ constants cancel out too. So there we are: we’ve got a very simple set of N equations here, with N unknowns (i.e. these a1, a2,…, aN coefficients, to be specific). We can re-write this system as:


The δij here is the Kronecker delta, of course (it’s one for i = j and zero for j), and we are now looking at a homogeneous system of equations here, i.e. a set of linear equations in which all the constant terms are zero. You should remember it from your high school math course. To be specific, you’d write it as Ax = 0, with A the coefficient matrix. The trivial solution is the zero solution, of course: all a1, a2,…, aN coefficients are zero. But we don’t want the trivial solution. Now, as Feynman points out – tongue-in-cheek, really – we actually have to be lucky to have a non-trivial solution. Indeed, you may or may not remember that the zero solution was actually the only solution if the determinant of the coefficient matrix was not equal to zero. So we only had a non-trivial solution if the determinant of A was equal to zero, i.e. if Det[A] = 0. So A has to be some so-called singular matrix. You’ll also remember that, in that case, we got an infinite number of solutions, to which we could apply the so-called superposition principle: if x and y are two solutions to the homogeneous set of equations Ax = 0, then any linear combination of x and y is also a solution. I wrote an addendum to this post (just scroll down and you’ll find it), which explains what systems of linear equations are all about, so I’ll refer you to that in case you’d need more detail here. I need to continue our story here. The bottom line is: the [Hij–δijE] matrix needs to be singular for the system to have meaningful solutions, so we will only have a non-trivial solution for those values of E for which

Det[Hij–δijE] = 0

Let’s spell it out. The condition above is the same as writing:


So far, so good. What’s next? Well… The formula for the determinant is the following:

det physicists

That looks like a monster, and it is, but, in essence, what we’ve got here is an expression for the determinant in terms of the permutations of the matrix elements. This is not a math course so I’ll just refer you Wikipedia for a detailed explanation of this formula for the determinant. The bottom line is: if we write it all out, then Det[Hij–δijE] is just an Nth order polynomial in E. In other words: it’s just a sum of products with powers of E up to EN, and so our Det[Hij–δijE] = 0 condition amounts to equating it with zero.

In general, we’ll have N roots, but – sorry you need to remember so much from your high school math classes here – some of them may be multiple roots (i.e. two or more roots may be equal). We’ll call those roots—you guessed it:

EI, EII,…, En,…, EN

Note I am following Feynman’s exposé, and so he uses n, rather than k, to denote the nth Roman numeral (as opposed to Latin numerals). Now, I know your brain is near the melting point… But… Well… We’re not done yet. Just hang on. For each of these values E = EI, EII,…, En,…, EN, we have an associated set of solutions ai. As Feynman puts it: you get a set which belongs to En. In order to not forget that, for each En, we’re talking a set of N coefficients ai (= 1, 2,…, N), we denote that set not by ai(n) but by ai(n). So that’s why we use boldface for our index n: it’s special—and not only because it denotes a Roman numeral! It’s just one of Feynman’s many meaningful conventions.

Now remember that Ci(t) = ai·ei·(E/ħ)·t formula. For each set of ai(n) coefficients, we’ll have a set of Ci(n) functions which, naturally, we can write as:

Ci(n) = ai(nei·(En/ħ)·t

So far, so good. We have N ai(n) coefficients and N Ci(n) functions. That’s easy enough to understand. Now we’ll define also define a set of N new vectors,  which we’ll write as |n〉, and which we’ll refer to as the state vectors that describe the configuration of the definite energy states En (n = I, II,… N). [Just breathe right now: I’ll (try to) explain this in a moment.] Moreover, we’ll write our set of coefficients ai(n) as 〈i|n〉. Again, the boldface n reminds us we’re talking a set of N complex numbers here. So we re-write that set of N Ci(n) functions as follows:

Ci(n) = 〈i|n〉·ei·(En/ħ)·t

We can expand this as follows:

Ci(n) = 〈 i | ψn(t) 〉 = 〈 i | 〉·ei·(En/ħ)·t

which, of course, implies that:

| ψn(t) 〉 = |n〉·ei·(En/ħ)·t

So now you may understand Feynman’s description of those |n〉 vectors somewhat better. As he puts it:

“The |n〉 vectors – of which there are N – are the state vectors that describe the configuration of the definite energy states En (n = I, II,… N), but have the time dependence factored out.”

Hmm… I know. This stuff is hard to swallow, but we’re not done yet: if your brain hasn’t melted yet, it may do so now. You’ll remember we talked about eigenvalues and eigenvectors in our post on the math behind the quantum-mechanical model of our ammonia molecule. Well… We can generalize the results we got there:

  1. The energies EI, EII,…, En,…, EN are the eigenvalues of the Hamiltonian matrix H.
  2. The state vectors |n〉 that are associated with each energy En, i.e. the set of vectors |n〉, are the corresponding eigenstates.

So… Well… That’s it! We’re done! This is all there is to it. I know it’s a lot but… Well… We’ve got a general description of N-state systems here, and so that’s great!

Let me make some concluding remarks though.

First, note the following property: if we let the Hamiltonian matrix act on one of those state vectors |n〉, the result is just En times the same state. We write:


We’re writing nothing new here really: it’s just a consequence of the definition of eigenstates and eigenvalues. The more interesting thing is the following. When describing our two-state systems, we saw we could use the states that we associated with the Eand EII as a new base set. The same is true for N-state systems: the state vectors |n〉 can also be used as a base set. Of course, for that to be the case, all of the states must be orthogonal, meaning that for any two of them, say |n〉 and |m〉, the following equation must hold:

n|m〉 = 0

Feynman shows this will be true automatically if all the energies are different. If they’re not – i.e. if our polynomial in E would accidentally have two (or more) roots with the same energy – then things are more complicated. However, as Feynman points out, this problem can be solved by ‘cooking up’ two new states that do have the same energy but are also orthogonal. I’ll refer you to him for the detail, as well as for the proof of that 〈n|m〉 = 0 equation.

Finally, you should also note that – because of the homogeneity principle – it’s possible to multiply the N ai(n) coefficients by a suitable factor so that all the states are normalized, by which we mean:

n|n〉 = 1

Well… We’re done! For today, at least! 🙂

Addendum on Systems of Linear Equations

It’s probably good to briefly remind you of your high school math class on systems of linear equations. First note the difference between homogeneous and non-homogeneous equations. Non-homogeneous equations have a non-zero constant term. The following three equations are an example of a non-homogeneous set of equations:

  • 3x + 2y − z = 1
  • 2x − 2y + 4z = −2
  • −x + y/2 − z = 0

We have a point solution here: (x, y, z) = (1, −2, −2). The geometry of the situation is something like this:


One of the equations may be a linear combination of the two others. In that case, that equation can be removed without affecting the solution set. For the three-dimensional case, we get a line solution, as illustrated below.  Intersecting_Planes_2

Homogeneous and non-homogeneous sets of linear equations are closely related. If we write a homogeneous set as Ax = 0, then a non-homogeneous set of equations can be written as Ax = b. They are related. More in particular, the solution set for Ax = b is going to be a translation of the solution set for Ax = 0. We can write that more formally as follows:

If p is any specific solution to the linear system Ax = b, then the entire solution set can be described as {p + v|v is any solution to Ax = 0}

The solution set for a homogeneous system is a linear subspace. In the example above, which had three variables and, hence, for which the vector space was three-dimensional, there were three possibilities: a point, line or plane solution. All are (linear) subspaces—although you’d want to drop the term ‘linear’ for the point solution, of course. 🙂 Formally, a subspace is defined as follows: if V is a vector space, then W is a subspace if and only if:

  1. The zero vector (i.e. 0) is in W.
  2. If x is an element of W, then any scalar multiple ax will be an element of W too (this is often referred to as the property of homogeneity).
  3. If x and y are elements of W, then the sum of x and y (i.e. x + y) will be an element of W too (this is referred to as the property of additivity).

As you can see, the superposition principle actually combines the properties of homogeneity and additivity: if x and y are solutions, then any linear combination of them will be a solution too.

The solution set for a non-homogeneous system of equations is referred to as a flat. It’s a subset too, so it’s like a subspace, except that it need not pass through the origin. Again, the flats in two-dimensional space are points and lines, while in three-dimensional space we have points, lines and planes. In general, we’ll have flats, and subspaces, of every dimension from 0 to n−1 in n-dimensional space.

OK. That’s clear enough, but what is all that talk about eigenstates and eigenvalues about? Mathematically, we define eigenvectors, aka as characteristic vectors, as follows:

  • The non-zero vector v is an eigenvector of a square matrix A if Av is a scalar multiple of v, i.e. Av = λv.
  • The associated scalar λ is known as the eigenvalue (or characteristic value) associated with the eigenvector v.

Now, in physics, we talk states, rather than vectors—although our states are vectors, of course. So we’ll call them eigenstates, rather than eigenvectors. But the principle is the same, really. Now, I won’t copy what you can find elsewhere—especially not in an addendum to a post, like this one. So let me just refer you elswhere. Paul’s Online Math Notes, for example, are quite good on this—especially in the context of solving a set of differential equations, which is what we are doing here. And you can also find a more general treatment in the Wikipedia article on eigenvalues and eigenstates which, while being general, highlights their particular use in quantum math.