Bose and Fermi

Pre-scriptum (dated 26 June 2020): This post suffered from the DMCA take-down of material from Feynman’s Lectures, so some graphs are lacking and the layout was altered as a result. In any case, I now think the distinction between bosons and fermions is one of the most harmful scientific myths in physics. I deconstructed quite a few myths in my realist interpretation of QM, but I focus on this myth in particular in this paper: Feynman’s Worst Jokes and the Boson-Fermion Theory.

Original post:

Probability amplitudes: what are they?

Instead of reading Penrose, I’ve started to read Richard Feynman again. Of course, reading the original is always better than whatever others try to make of that, so I’d recommend you read Feynman yourself – instead of this blog. But then you’re doing that already, aren’t you? 🙂

Let’s explore those probability amplitudes somewhat more. They are complex numbers. In a fine little book on quantum mechanics (QED, 1985), Feynman calls them ‘arrows’ – and that’s what they are: two-dimensional vectors, aka complex numbers. So they have a direction and a length (or magnitude). When talking amplitudes, the direction and the length are known as the phase and the modulus (or absolute value) respectively. You also know by now that the modulus squared represents a probability or probability density: the probability of detecting some particle (a photon or an electron) at some location x or in some region Δx, the probability of some particle going from A to B, the probability of a photon being emitted or absorbed by an electron (or a proton), etcetera. I’ve inserted two illustrations below to explain the matter.

The first illustration just shows what a complex number really is: a two-dimensional number (z) with a real part (Re(z) = x) and an imaginary part (Im(z) = y). We can represent it in two ways: one uses the (x, y) coordinate system (z = x + iy), and the other is the so-called polar form: z = r·e^(iφ). The (real) number e in the latter equation is just Euler’s number, so that’s a mathematical constant (just like π). The little i is the imaginary unit, so that’s the thing we introduce to add a second (vertical) dimension to our analysis: i can be written as 0 + 1·i = (0, 1) indeed, and so it’s like a (second) basis vector in the two-dimensional (Cartesian or complex) plane.

[Figure: polar form of a complex number]

I should not say much more about this, but I must list some essential properties and relationships:

  • The coordinate and polar form are related through Euler’s formula: z = x + iy = r·e^(iφ) = r(cosφ + i·sinφ).
  • From this, and the fact that cos(–φ) = cosφ and sin(–φ) = –sinφ, it follows that the (complex) conjugate z* = x – iy of a complex number z = x + iy is equal to z* = r·e^(–iφ). [I use z* as a symbol, instead of z-bar, because I can’t find a z-bar in the character set here.] This equality is illustrated above.
  • The length/modulus/absolute value of a complex number is written as |z| and is equal to |z| = (x^2 + y^2)^(1/2) = |r·e^(iφ)| = r (so r is always a positive (real) number).
  • As you can see from the graph, a complex number z and its conjugate z* have the same absolute value: |z| = |x+iy| = |z*| = |x-iy|.
  • Therefore, we have the following: |z||z| = |z*||z*| = |z||z*| = |z|^2. Because z·z* = (x + iy)(x – iy) = x^2 + y^2 = |z|^2 as well, we can use this result to calculate the (multiplicative) inverse: z^(–1) = 1/z = z*/|z|^2.
  • The absolute value of a product of complex numbers equals the product of the absolute values of those numbers: |z1z2| = |z1||z2|.
  • Last but not least, it is important to be aware of the geometric interpretation of the sum and the product of two complex numbers:
    • The sum of two complex numbers amounts to adding vectors, so that’s the familiar parallelogram law for vector addition: (a+ib) + (c+id) = (a+c) + i(b+d).
    • Multiplying two complex numbers amounts to adding the angles and multiplying their lengths – as evident from writing such a product in its polar form: r·e^(iθ) · s·e^(iΘ) = rs·e^(i(θ+Θ)). The result is, quite obviously, another complex number. So it is not the usual scalar or vector product which you may or may not be familiar with.

[For the sake of completeness: (i) the scalar product (aka dot product) of two vectors (a·b) is equal to the product of the magnitudes of the two vectors and the cosine of the angle between them: a·b = |a||b|cosα; and (ii) the result of a vector product (or cross product) is a vector which is perpendicular to both, so it’s a vector that is not in the same plane as the vectors we are multiplying: a×b = |a||b|sinα·n, with n the unit vector perpendicular to the plane containing a and b in the direction given by the so-called right-hand rule. Just be aware of the difference.]
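If you want to play with these properties yourself, here is a minimal sketch in Python, using only the standard cmath module (the specific numbers are arbitrary – just an illustration, not anything from Feynman):

```python
import cmath

z = 3 + 4j                        # z = x + iy with x = 3, y = 4
r, phi = cmath.polar(z)           # polar form: z = r·e^(iφ)
print(r, phi)                     # 5.0 and 0.927... (radians)

# The conjugate has the same modulus:
assert abs(z) == abs(z.conjugate()) == 5.0

# z·z* = |z|^2, which gives the multiplicative inverse z^(-1) = z*/|z|^2:
assert cmath.isclose(z * z.conjugate(), abs(z) ** 2)
assert cmath.isclose(1 / z, z.conjugate() / abs(z) ** 2)

# Multiplying two complex numbers multiplies lengths and adds angles:
w = cmath.rect(2.0, 0.5)          # w = 2·e^(i·0.5)
assert cmath.isclose(abs(z * w), abs(z) * abs(w))
assert cmath.isclose(cmath.phase(z * w), cmath.phase(z) + cmath.phase(w))
```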

The second illustration (see below) comes from that little book I mentioned above already: Feynman’s exquisite 1985 Alix G. Mautner Memorial Lectures on Quantum Electrodynamics, better known as QED: the Strange Theory of Light and Matter. It shows how these probability amplitudes, or ‘arrows’ as he calls them, really work, without even mentioning that they are ‘probability amplitudes’ or ‘complex numbers’. That being said, these ‘arrows’ are what they are: probability amplitudes.

To be precise, the illustration below shows the probability amplitude of a photon (so that’s a little packet of light) reflecting from the front surface (front reflection arrow) and the back (back reflection arrow) of a thin sheet of glass. If we write these vectors in polar form (r·e^(iφ)), then it is obvious that they have the same length (r = 0.2) but their phase φ is different. That’s because the photon needs to travel a bit longer to reach the back of the glass: the phase varies as a function of time and space, but the length doesn’t. Feynman visualizes that with the stopwatch: as the photon is emitted from a light source and travels through time and space, the stopwatch turns and, hence, the arrow will point in a different direction.

[To be even more precise, the amplitude for a photon traveling from point A to B is a (fairly simple) function (which I won’t write down here though) which depends on the so-called spacetime interval. This spacetime interval (written as I or s^2) is equal to I = [(x–x1)^2 + (y–y1)^2 + (z–z1)^2] – (t–t1)^2. So the first term in this expression is the square of the distance in space, and the second term is the square of the time difference, or the ‘time distance’. Of course, we need to measure time and distance in equivalent units: we do that either by measuring spatial distance in light-seconds (i.e. the distance traveled by light in one second) or by expressing time in units that are equal to the time it takes for light to travel one meter (in the latter case we ‘stretch’ time (by multiplying it with c, i.e. the speed of light) while in the former, we ‘stretch’ our distance units). Because of the minus sign between the two terms, the spacetime interval can be negative, zero, or positive, and we call these intervals time-like (I < 0), light-like (I = 0) or space-like (I > 0). Because nothing travels faster than light, two events separated by a space-like interval cannot have a cause-effect relationship. I won’t go into any more detail here but, at this point, you may want to read the Wikipedia article on the so-called light cone relating past and future events, because that’s what we’re talking about here really.]
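For what it’s worth, here is a toy calculation of that interval in natural units (c = 1, so time is measured in the same units as distance); the events are made up, and the sign convention is the one used above:

```python
def interval(e1, e2):
    """Spacetime interval I = dx^2 + dy^2 + dz^2 - dt^2 between events (x, y, z, t)."""
    dx, dy, dz, dt = (b - a for a, b in zip(e1, e2))
    return dx**2 + dy**2 + dz**2 - dt**2

def classify(i):
    return "time-like" if i < 0 else "light-like" if i == 0 else "space-like"

origin = (0, 0, 0, 0)
print(classify(interval(origin, (1, 0, 0, 2))))  # time-like: causal link possible
print(classify(interval(origin, (2, 0, 0, 2))))  # light-like: on the light cone
print(classify(interval(origin, (3, 0, 0, 2))))  # space-like: no cause-effect link
```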

[Figure: front and back reflection amplitudes (from Feynman’s QED)]

Feynman adds the two arrows, because a photon may be reflected either by the front surface or by the back surface and we can’t know which of the two possibilities was the case. So he adds the amplitudes here, not the probabilities. The probability of the photon bouncing off the front surface is the modulus of the amplitude squared (i.e. |r·e^(iφ)|^2 = r^2), and so that’s 4% here (0.2·0.2 = 0.04). The probability for the back surface is the same: 4% also. However, the combined probability of a photon bouncing back from either the front or the back surface – we cannot know which path was followed – is not 8%, but some value between 0 and 16% (5% only in the top illustration, and 16% (i.e. the maximum) in the bottom illustration). This value depends on the thickness of the sheet of glass, because it’s the thickness of the sheet that determines where the hand of our stopwatch stops. If the glass is just thick enough to make the stopwatch make one extra half turn as the photon travels through the glass from the front to the back, then we reach our maximum value of 16%, and so that’s what’s shown in the bottom half of the illustration above.
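Here is a minimal numerical sketch of that two-arrow sum. It is my own toy parametrization, not Feynman’s: I simply let the thickness of the glass set the phase difference between the two arrows and sweep it over a few values:

```python
import cmath

r = 0.2                                            # length of each reflection arrow
for phase_diff in (0.0, cmath.pi / 3, cmath.pi):   # set by the thickness of the glass
    front = r                                      # front-surface arrow (reference)
    back = r * cmath.exp(1j * phase_diff)          # back-surface arrow, rotated
    p = abs(front + back) ** 2                     # add amplitudes first, then square
    print(f"phase difference {phase_diff:.2f} rad -> P = {p:.3f}")
# P runs from 0.16 (arrows aligned) down to 0.0 (arrows opposed); adding
# probabilities instead would always give 0.04 + 0.04 = 0.08.
```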

For the sake of completeness, I need to note that the full explanation is actually a bit more complex. Just a little bit. 🙂 Indeed, there is no such thing as ‘surface reflection’ really: a photon has an amplitude for scattering by each and every layer of electrons in the glass, and so we actually have many more arrows to add in order to arrive at a ‘final’ arrow. However, Feynman shows how all these arrows can be replaced by two so-called ‘radius arrows’: one for ‘front surface reflection’ and one for ‘back surface reflection’. The argument is relatively easy but I have no intention to fully copy Feynman here because the point is only to illustrate how probabilities are calculated from probability amplitudes. So just remember: probabilities are real numbers between 0 and 1 (or between 0 and 100%), while amplitudes are complex numbers – or ‘arrows’ as Feynman calls them in this popular lecture series.

In order to give somewhat more credit to Feynman – and also to be somewhat more complete on how light really reflects from a sheet of glass (or a film of oil on water or a mud puddle) – I copy one more illustration here, with the text, which speaks for itself: “The phenomenon of colors produced by the partial reflection of white light by two surfaces is called iridescence, and can be found in many places. Perhaps you have wondered how the brilliant colors of hummingbirds and peacocks are produced. Now you know.” The iridescence phenomenon is indeed caused by really small variations in the thickness of the reflecting material, and it is, perhaps, worth noting that Feynman is also known as the father of nanotechnology… 🙂

[Figure: iridescence (from Feynman’s QED)]

Light versus matter

So much for light – or electromagnetic waves in general. They consist of photons. Photons are discrete wave-packets of energy, and their energy (E) is related to the frequency of the light (f) through the Planck relation: E = hf. The factor h in this relation is the Planck constant – or the quantum of action, as this tiny number (6.62606957×10^(−34) J·s) is also referred to in quantum mechanics. Photons have no mass and, hence, they travel at the speed of light indeed. But what about the other wave-like particles, like electrons?

For these, we have probability amplitudes (or, more generally, a wave function) as well, the characteristics of which are given by the de Broglie relations. These de Broglie relations also associate a frequency and a wavelength with the energy and/or the momentum of the ‘wave-particle’ that we are looking at: f = E/h and λ = h/p. In fact, one will usually find those two de Broglie relations in a slightly different but equivalent form: ω = E/ħ and k = p/ħ. The symbol ω stands for the angular frequency, so that’s the frequency expressed in radians per second. In other words, ω is the speed with which the hand of that stopwatch is going round and round and round. Similarly, k is the wave number: the spatial frequency, expressed in radians per unit distance. We use k and ω in wave functions because the argument of these wave functions is the phase of the probability amplitude, and this phase is expressed in radians. For more details on how we go from distance and time units to radians, I refer to my previous post. [Indeed, I need to move on here otherwise this post will become a book of its own! Just check out the following: λ = 2π/k and f = ω/2π.]
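To get a feel for the orders of magnitude, here is a small numerical sketch of the Planck and de Broglie relations (rounded constants; the 532 nm photon and the electron speed of 1% of c are arbitrary choices of mine):

```python
h = 6.62606957e-34     # Planck constant (J·s)
c = 299792458.0        # speed of light (m/s)
m_e = 9.10938e-31      # electron mass (kg)

# A green photon: E = h·f
f = c / 532e-9                                 # frequency of 532 nm light
print("photon energy:", h * f, "J")            # ≈ 3.7×10^(−19) J

# A (non-relativistic) electron at 1% of c: λ = h/p
p = m_e * 0.01 * c
print("de Broglie wavelength:", h / p, "m")    # ≈ 2.4×10^(−10) m, i.e. atomic scale
```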

How should we visualize a de Broglie wave for, let’s say, an electron? Well, I think the following illustration (which I took from Wikipedia) is not too bad.    

[Figure: quantum-mechanical travelling wave functions and their wavelengths (from Wikipedia)]

Let’s first look at the graph on the top of the left-hand side of the illustration above. We have a complex wave function Ψ(x) here but only the real part of it is being graphed. Also note that we only look at how this function varies over space at some fixed point in time, and so we do not have a time variable here. That’s OK. Adding the imaginary part would be nice but it would make the graph even more ‘complex’ :-), and looking at one point in space only and analyzing the amplitude as a function of time only would yield similar graphs. If you want to see an illustration with both the real as well as the imaginary part of a wave function, have a look at my previous post.

We also have the probability – that’s the red graph – as a function of the probability amplitude: P = |Ψ(x)|^2 (so that’s just the modulus squared). What probability? Well, the probability that we can actually find the particle (let’s say an electron) at that location. Probability is obviously always positive (unlike the real (or imaginary) part of the probability amplitude, which oscillates around the x-axis). The probability is also reflected in the opacity of the little red ‘tennis ball’ representing our ‘wavicle’: the opacity varies as a function of the probability. So our electron is smeared out, so to say, over the space denoted as Δx.

Δx is the uncertainty about the position. The question mark next to the λ symbol (we’re still looking at the graph on the top left-hand side of the above illustration only: don’t look at the other three graphs now!) attributes this uncertainty to uncertainty about the wavelength. As mentioned in my previous post, wave packets, or wave trains, do not tend to have an exact wavelength indeed. And so, according to the de Broglie equation λ = h/p, if we cannot associate an exact value with λ, we will not be able to associate an exact value with p. Now that’s what’s shown on the right-hand side. In fact, because we’ve got a relatively good take on the position of this ‘particle’ (or wavicle, we should say) here, we have a much wider interval for its momentum: Δpx. [We’re only considering the horizontal component of the momentum vector p here, so that’s px.] Φ(p) is referred to as the momentum wave function, and |Φ(p)|^2 is the corresponding probability (or probability density, as it’s usually referred to).

The two graphs at the bottom present the reverse situation: fairly precise momentum, but a lot of uncertainty about the wavicle’s position (I know I should stick to the term ‘particle’ – because that’s what physicists prefer – but I think ‘wavicle’ describes better what it’s supposed to be). So the illustration above is not only an illustration of the de Broglie wave function for a particle, but it also illustrates the Uncertainty Principle.

Now, I know I should move on to the thing I really want to write about in this post – i.e. bosons and fermions – but I feel I need to say a few things more about this famous ‘Uncertainty Principle’ – if only because I find it quite confusing. According to Feynman, one should not attach too much importance to it. Indeed, when introducing his simple arithmetic on probability amplitudes, Feynman writes the following about it: “The uncertainty principle needs to be seen in its historical context. When the revolutionary ideas of quantum physics were first coming out, people still tried to understand them in terms of old-fashioned ideas (such as, light goes in straight lines). But at a certain point, the old-fashioned ideas began to fail, so a warning was developed that said, in effect, ‘Your old-fashioned ideas are no damn good when…’ If you get rid of all the old-fashioned ideas and instead use the ideas that I’m explaining in these lectures – adding arrows for all the ways an event can happen – there is no need for the uncertainty principle!” So, according to Feynman, the wave-function math deals with everything and, therefore, we should perhaps indeed forget about this rather mysterious ‘principle’.

However, because it is mentioned so much (especially in the more popular writing), I did try to find some kind of easy derivation of its standard formulation: ΔxΔp ≥ ħ (ħ = h/2π, i.e. the quantum of angular momentum in quantum mechanics). To my surprise, it’s actually not easy to derive the uncertainty principle from other basic ‘principles’. As mentioned above, it follows from the de Broglie equation λ = h/p that momentum (p) and wavelength (λ) are related, but so how do we relate the uncertainty about the wavelength (Δλ) or the momentum (Δp) to the uncertainty about the position of the particle (Δx)? The illustration below, which analyzes a wave packet (aka a wave train), might provide some clue. Before you look at the illustration and start wondering what it’s all about, remember that a wave function with a definite (angular) frequency ω and wave number k (as described in my previous post), which we can write as Ψ = A·e^(i(ωt–kx)), represents the amplitude of a particle with a known momentum p = ħk at some point x and t, and that we had a big problem with such a wave, because the squared modulus of this function is a constant: |Ψ|^2 = |A·e^(i(ωt–kx))|^2 = A^2. So that means that the probability of finding this particle is the same at all points. So it’s everywhere and nowhere really (it’s like the second wave function in the illustration above, but then with Δx infinitely long and the same wave shape all along the x-axis). Surely, we can’t have this, can we? No, we cannot – if only because of the fact that if we add up all of the probabilities, we would not get some finite number. So, in reality, particles are effectively confined to some region Δr or – if we limit our analysis to one dimension only (for the sake of simplicity) – Δx (remember that bold-type symbols represent vectors). So the probability amplitude of a particle is more likely to look like something that we refer to as a wave packet or a wave train. And so that’s what’s explained in more detail below.

Now, I said that localized wave trains do not tend to have an exact wavelength. What do I mean by that? It doesn’t sound very precise, does it? In fact, we can easily sketch a graph of a wave packet with some fixed wavelength (or fixed frequency), so what am I saying here? I am saying that, in quantum physics, we are only looking at a very specific type of wave train: they are a composite of a (potentially infinite) number of waves whose wavelengths are distributed more or less continuously around some average, as shown in the illustration below. The addition of all of these waves – or their superposition, as the process of adding waves is usually referred to – results in a combined ‘wavelength’ for the localized wave train that we cannot, indeed, equate with some exact number. I have not mastered the details of the mathematical process referred to as Fourier analysis (which refers to the decomposition of a combined wave into its sinusoidal components) as yet and, hence, I am not in a position to quickly show you how Δx and Δλ are related exactly, but the point to note is that a wider spread of wavelengths results in a smaller Δx. Now, a wider spread of wavelengths corresponds to a wider spread in p too, and so there we have the Uncertainty Principle: the more we know about Δx, the less we know about Δp, and so that’s what the inequality ΔxΔp ≥ h/2π represents really.

[Figure: explanation of the uncertainty principle (wave packet analysis)]

[Those who like to check things out may wonder why a wider spread in wavelength implies a wider spread in momentum. Indeed, if we just replace λ and p with Δλ and Δp in the de Broglie equation λ = h/p, we get Δλ = h/Δp, and so we have an inversely proportional relationship here, don’t we? No. We can’t just write Δλ = Δ(h/p) = h/Δp: Δ is not some mathematical operator that you can simply move inside the brackets. What is Δλ? Is it a standard deviation? Is it the spread and, if so, what’s the spread? We could, for example, define it as the difference between some maximum value λmax and some minimum value λmin, so Δλ = λmax – λmin. These two values would then correspond to pmax = h/λmin and pmin = h/λmax, and so the corresponding spread in momentum would be equal to Δp = pmax – pmin = h/λmin – h/λmax = h(λmax – λmin)/(λmaxλmin). So a wider spread in wavelength does result in a wider spread in momentum, but the relationship is more subtle than you might think at first. In fact, in a more rigorous approach, we would indeed use the standard deviation (represented by the sigma symbol σ) from some average as the measure of ‘uncertainty’. To be precise, the more precise formulation of the Uncertainty Principle is σxσp ≥ ħ/2, but don’t ask me where that 2 comes from!]
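For those who prefer numbers over algebra, here is a rough sketch of that trade-off (assuming numpy is available; the wave-number spreads and the crude width estimate are arbitrary choices of mine): superpose plane waves with wave numbers spread around some k0 and watch the packet narrow as the spread grows.

```python
import numpy as np

x = np.linspace(-50, 50, 2001)
k0 = 2.0

for dk in (0.1, 0.5):                               # spread in wave number
    ks = np.linspace(k0 - dk, k0 + dk, 200)
    packet = sum(np.exp(1j * k * x) for k in ks) / len(ks)
    prob = np.abs(packet) ** 2
    # crude width estimate: region where |Ψ|^2 exceeds half its peak value
    width = np.ptp(x[prob > 0.5 * prob.max()])
    print(f"Δk = {dk:.1f} -> packet width ≈ {width:.1f}")
# A larger Δk (and hence a larger Δp = ħ·Δk) gives a smaller Δx, and vice versa.
```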

I really need to move on now, because this post is already way too lengthy and, hence, not very readable. So, back to that very first question: what’s that wave function math? Well, that’s obviously too complex a topic to be fully exhausted here. 🙂 I just wanted to present one aspect of it in this post: Bose-Einstein statistics. Huh? Yes.

When we say Bose-Einstein statistics, we should also say its opposite: Fermi-Dirac statistics. Bose-Einstein statistics were ‘discovered’ by the Indian scientist Satyendra Nath Bose (the only thing Einstein did was to give Bose’s work on this wider recognition) and they apply to bosons (so they’re named after Bose only), while Fermi-Dirac statistics apply to fermions (‘Fermi-Diraqions’ doesn’t sound good either obviously). Any particle, or any wavicle I should say, is either a fermion or a boson. There’s a strict dichotomy: you can’t have characteristics of both. No split personalities. Not even for a split second.

The best-known examples of bosons are photons and the recently experimentally confirmed Higgs particle. But, in case you have heard of them, gluons (which mediate the so-called strong interactions between particles), and the W+, W– and Z particles (which mediate the so-called weak interactions) are bosons too. Protons, neutrons and electrons, on the other hand, are fermions.

More complex particles, such as atomic nuclei, are also either bosons or fermions. That depends on the number of protons and neutrons they consist of. But let’s not get ahead of ourselves. Here, I’ll just note that bosons – unlike fermions – can pile on top of one another without limit, all occupying the same ‘quantum state’. This explains superconductivity, superfluidity and Bose-Einstein condensation at low temperatures. Indeed, superfluidity and Bose-Einstein condensation usually involve (bosonic) helium, while superconductivity involves electrons pairing up into boson-like ‘Cooper pairs’. You can’t do it with fermions. Superfluid helium has very weird properties, including zero viscosity – so it flows without dissipating energy and it creeps up the wall of its container, seemingly defying gravity: just Google one of the videos on the Web! It’s amazing stuff! Bose statistics also explain why photons of the same frequency can form coherent and extremely powerful laser beams, with (almost) no limit as to how much energy can be focused in a beam.

Fermions, on the other hand, avoid one another. Electrons, for example, organize themselves in shells around a nucleus. They can never collapse into some kind of condensed cloud, as bosons can. If electrons were not fermions, we would not have such a variety of atoms with such a great range of chemical properties. But, again, let’s not get ahead of ourselves. Back to the math.

Bose versus Fermi particles

When adding two probability amplitudes (instead of probabilities), we are adding complex numbers (or vectors or arrows or whatever you want to call them), and so we need to take their phase into account or – to put it simply – their direction. If their phase is the same, the length of the new vector will be equal to the sum of the lengths of the two original vectors. When their phase is not the same, then the new vector will be shorter than the sum of the lengths of the two amplitudes that we are adding. How much shorter? Well, that obviously depends on the angle between the two vectors, i.e. the difference in phase: if it’s 180 degrees (or π radians), then they will cancel each other out (provided their lengths are equal) and we have zero amplitude! So that’s destructive or negative interference. If it’s less than 90 degrees, then we will have constructive or positive interference.

It’s because of this interference effect that we have to add probability amplitudes first, before we can calculate the probability of an event happening in one or the other (indistinguishable) way (let’s say A or B) – instead of just adding probabilities as we would do in the classical world. It’s not subtle. It makes a big difference: |ΨA + ΨB|^2 is the probability when we cannot distinguish the alternatives (so when we’re in the world of quantum mechanics and, hence, we have to add amplitudes), while |ΨA|^2 + |ΨB|^2 is the probability when we can see what happens (i.e. we can see whether A or B was the case). Now, |ΨA + ΨB|^2 is definitely not the same as |ΨA|^2 + |ΨB|^2 – not for real numbers, and surely not for complex numbers either. But let’s move on with the argument – literally: I mean the argument of the wave function at hand here.
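A quick numerical check makes the difference tangible (a toy example with two arbitrary amplitudes of equal length but different phase):

```python
import cmath

psi_a = 0.5 * cmath.exp(1j * 0.0)
psi_b = 0.5 * cmath.exp(1j * 2.0)    # same length, different phase

print(abs(psi_a + psi_b) ** 2)               # ≈ 0.29: add amplitudes, then square
print(abs(psi_a) ** 2 + abs(psi_b) ** 2)     # 0.50: add probabilities
# The first number depends on the phase difference; the second never does.
```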

That stopwatch business above makes it easier to introduce the thought experiment which Feynman also uses to introduce Bose versus Fermi statistics (Feynman Lectures (1965), Vol. III, Lecture 4). The experimental set-up is shown below. We have two particles, which are being referred to as particle a and particle b respectively (so we can distinguish the two), heading straight for each other and, hence, they are likely to collide and be scattered in some other direction. The experimental set-up is designed to measure where they are likely to end up, i.e. to measure probabilities. [There’s no certainty in the quantum-mechanical world, remember?] So, in this experiment, we have a detector (or counter) at location 1 and a detector/counter at location 2 and, after many, many measurements, we have some value for the (combined) probability that particle a goes to detector 1 and particle b goes to counter 2. The underlying amplitude is a complex number and you may expect it will depend on the angle θ as shown in the illustration below.

[Figure: scattering of identical particles]

So this angle θ will obviously show up somehow in the argument of our wave function. Hence, the wave function, or probability amplitude, describing the amplitude of particle a ending up in counter 1 and particle b ending up in counter 2 will be some (complex) function Ψ1 = f(θ). Please note, once again, that θ is not some (complex) phase but some real number (expressed in radians) between 0 and 2π that characterizes the set-up of the experiment above. It is also worth repeating that f(θ) is not the amplitude of particle a hitting detector 1 only but the combined amplitude of particle a hitting counter 1 and particle b hitting counter 2! It makes a big difference and it’s essential in the interpretation of this argument! So, the combined probability of a going to 1 and of particle b going to 2, which we will write as P1, is equal to |Ψ1|^2 = |f(θ)|^2.

OK. That’s obvious enough. However, we might also find particle a in detector 2 and particle b in detector 1. Surely, the probability amplitude for this should be equal to f(θ+π)? It’s just a matter of switching counters 1 and 2 – i.e. we rotate their position over 180 degrees, or π (in radians) – and then we just insert the new angle of this experimental set-up (so that’s θ+π) into the very same wave function and there we are. Right?

Well… Maybe. The probability of a going to 2 and b going to 1, which we will write as P2, will indeed be equal to |f(θ+π)|^2. However, our probability amplitude, which I’ll write as Ψ2, may not be equal to f(θ+π). It’s just a mathematical possibility. I am not saying anything definite here. Huh? Why not?

Well… Think about the thing we said about the phase and the possibility of a phase shift: f(θ+π) is just one of the many mathematical possibilities for a wave function yielding a probability P2 = |Ψ2|^2 = |f(θ+π)|^2. But any function e^(iδ)·f(θ+π) will yield the same probability. Indeed, |z1z2| = |z1||z2| and so |e^(iδ)·f(θ+π)|^2 = (|e^(iδ)||f(θ+π)|)^2 = |e^(iδ)|^2·|f(θ+π)|^2 = |f(θ+π)|^2 (the square of the modulus of a complex number on the unit circle is always one – because the length of vectors on the unit circle is equal to one). It’s a general thing: if Ψ is some wave function (i.e. it describes some complex amplitude in space and time), then e^(iδ)·Ψ is the same wave function but with a phase shift equal to δ. Huh? Yes. Think about it: we’re multiplying complex numbers here, so that’s adding angles and multiplying lengths. Now the length of e^(iδ) is 1 (because it’s a complex number on the unit circle) but its phase is δ. So multiplying Ψ with e^(iδ) does not change the length of Ψ but it does shift its phase by an amount (in radians) equal to δ. That should be easy enough to understand.
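In code, that claim is a one-liner (arbitrary numbers once again):

```python
import cmath

psi = 0.3 * cmath.exp(1j * 1.2)    # some arbitrary amplitude
delta = 0.7                        # some arbitrary phase shift

shifted = cmath.exp(1j * delta) * psi
assert cmath.isclose(abs(shifted), abs(psi))                          # same length
assert cmath.isclose(cmath.phase(shifted), cmath.phase(psi) + delta)  # rotated by δ
```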

You probably wonder why I am being so fussy, and what that δ could be, or why it would be there. After all, we do have a well-behaved wave function f(θ) here, depending on x, t and θ, and so the only thing we did was to change the angle θ (we added π radians to it). So why would we need to insert a phase shift here? Because that’s what δ really is: some random phase shift. Well… I don’t know. This phase factor is just a mathematical possibility as for now. So we just assume that, for some reason which we don’t understand right now, there might be some ‘arbitrary phase factor’ (that’s what Feynman calls δ) coming into play when we ‘exchange’ the ‘role’ of the particles. So maybe that δ is there, but maybe not. I admit it looks very ugly. In fact, if the story about Bose’s ‘discovery’ of this ‘mathematical possibility’ (in 1924) is correct, then it all started with an obvious ‘mistake’ in a quantum-mechanical calculation – but a ‘mistake’ that, miraculously, gave predictions that agreed with experimental results that could not be explained without introducing this ‘mistake’. So let the argument go full circle – literally – and take your time to appreciate the beauty of argumentation in physics.

Let’s swap detector 1 and detector 2 a second time, so we ‘exchange’ particles a and b once again. So then we need to apply this phase factor δ once again and, because of symmetry in physics, we obviously have to use the same phase factor δ – not some other value γ or something. We’re only rotating our detectors once again. That’s it. So all the rest stays the same. Of course, we also need to add π once more to the argument in our wave function f. In short, the amplitude for this is:

e^(iδ)·[e^(iδ)·f(θ+π+π)] = (e^(iδ))^2·f(θ) = e^(i2δ)·f(θ)

Indeed, the angle θ+2π is the same as θ. But so we have twice that phase shift now: 2δ. As ugly as that ‘thing’ above: e^(iδ)·f(θ+π). However, if we square the amplitude, we get the same probability: P1 = |Ψ1|^2 = |e^(i2δ)·f(θ)|^2 = |f(θ)|^2. So it must be right, right? Yes. But – Hey! Wait a minute! We are obviously back at where we started, aren’t we? We are looking at the combined probability – and amplitude – for particle a going to counter 1 and particle b going to counter 2, and the angle is θ! So it’s the same physical situation, and – What the heck! – reality doesn’t change just because we’re rotating these detectors a couple of times, does it? [In fact, we’re actually doing nothing but a thought experiment here!] Hence, not only the probability but also the amplitude must be the same. So (e^(iδ))^2·f(θ) must equal f(θ) and so… Well… If (e^(iδ))^2·f(θ) = f(θ), then (e^(iδ))^2 must be equal to 1. Now, what does that imply for the value of δ?

Well… While the square of the modulus of all vectors on the unit circle is always equal to 1, there are only two cases for which the square of the vector itself yields 1: (I) e^(iδ) = e^(iπ) = –1 (check it: (e^(iπ))^2 = (–1)^2 = e^(i2π) = e^(i0) = +1), and (II) e^(iδ) = e^(i0) = e^(i2π) = +1 (check it: (e^(i2π))^2 = (+1)^2 = e^(i4π) = e^(i0) = +1). In other words, our phase factor δ is either δ = 0 (or 0 ± 2nπ) or, else, δ = π (or π ± 2nπ). So e^(iδ) = ±1 and Ψ2 is either +f(θ+π) or, else, –f(θ+π). What does this mean? It means that, if we’re going to be adding the amplitudes, then the ‘exchanged case’ may contribute with the same sign or, else, with the opposite sign.
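A quick numerical sanity check of that conclusion – nothing more than the arithmetic above:

```python
import cmath

for delta in (0.0, cmath.pi):               # the only two solutions (modulo 2π)
    factor = cmath.exp(1j * delta)
    print(delta, factor.real)               # +1.0 for δ = 0, −1.0 for δ = π
    assert cmath.isclose((factor ** 2).real, 1.0, abs_tol=1e-12)
```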

But, surely, there is no need to add amplitudes here, is there? Particle a can be distinguished from particle b and so the first case (particle a going into counter 1 and particle b going into counter 2) is not the same as the ‘exchanged case’ (particle a going into counter 2 and b going into counter 1). So we can clearly distinguish or verify which of the two possible paths was followed and, hence, we should be adding probabilities if we want to get the combined probability for both cases, not amplitudes. Now that is where the fun starts. Suppose that we have identical particles here – so not some beam of α-particles (i.e. helium nuclei) bombarding beryllium nuclei for instance but, let’s say, electrons on electrons, or photons on photons indeed – then we do have to add the amplitudes, not the probabilities, in order to calculate the combined probability of a particle going into counter 1 and the other particle going into counter 2, for the simple reason that we don’t know which is which and, hence, which is going where.

Let me immediately throw in an important qualifier: defining ‘identical particles’ is not as easy as it sounds. Our ‘wavicle’ of choice, for example, an electron, can have its spin ‘up’ or ‘down’ – and so that’s two different things. When an electron arrives in a counter, we can measure its spin (in practice or in theory: it doesn’t matter in quantum mechanics) and so we can distinguish it and, hence, an electron that’s ‘up’ is not identical to one that’s ‘down’. [I should resist the temptation but I’ll quickly make the remark: that’s the reason why we have two electrons in one atomic orbital: one is ‘up’ and the other one is ‘down’. Identical particles need to be in the same ‘quantum state’ (that’s the standard expression for it) to end up as ‘identical particles’ in, let’s say, a laser beam or so. As Feynman states it: in this (theoretical) experiment, we are talking polarized beams, with no mixture of different spin states.]

The wonderful thing in quantum mechanics is that mathematical possibility usually corresponds with reality. For example, electrons with positive charge – anti-matter in general – are not only a theoretical possibility: they exist. Likewise, we effectively have particles which interfere with a positive sign – these are called Bose particles – and particles which interfere with a negative sign – Fermi particles.

So that’s reality. The factor e^(iδ) = ±1 is there, and it’s a strict dichotomy: photons, for example, always behave like Bose particles, and protons, neutrons and electrons always behave like Fermi particles. So they don’t change their mind and switch from one to the other category, not for a short while, and not for a long while (or forever) either. In fact, you may or may not be surprised to hear that there are experiments trying to find out if they do – just in case. 🙂 For example, just Google for Budker and English (2010) from the University of California at Berkeley. The experiments confirm the dichotomy: no split personalities here, not even for a nanosecond (10^(−9) s), or a picosecond (10^(−12) s). [A picosecond is the time taken by light to travel 0.3 mm in a vacuum. In a nanosecond, light travels about one foot.]

In any case, does all of this really matter? What’s the difference, in practical terms that is? Between Bose and Fermi, I must assume we prefer the booze.

It’s quite fundamental, however. Hang in there for a while and you’ll see why.

Bose statistics

Suppose we have, once again, two particles, a and b, that (i) come from different directions (but, this time around, not necessarily in the experimental set-up as described above: the two particles may come from any direction really), (ii) are being scattered, at some point in space (but, this time around, not necessarily the same point in space), (iii) end up going in one and the same direction and – hopefully – (iv) arrive together at some other point in space. So they end up in the same state, which means they have the same direction and energy (or momentum) and also whatever other conditions are relevant. Again, if the particles are not identical, we can catch both of them and identify which is which. Now, if it’s two different particles, then they won’t take exactly the same path. Let’s say they travel along two infinitesimally close paths referred to as path 1 and 2, and so we should have two infinitesimally small detectors: one at location 1 and the other at location 2. The illustration below (credit to Feynman once again!) is for n particles, but here we’ll limit ourselves to the calculations for just two.

[Figure: boson scattering (from Feynman’s Lectures)]

Let’s denote the amplitude of a to follow path 1 (and end up in counter 1) as a1, and the amplitude of b to follow path 2 (and end up in counter 2) as b1. Then the amplitude for these two scatterings to occur at the same time is the product of these two amplitudes, and so the probability is equal to |a1b1|^2 = [|a1||b1|]^2 = |a1|^2|b1|^2. Similarly, the combined probability of a following path 2 (and ending up in counter 2) and b following path 1 (etcetera) is |a2|^2|b2|^2. But so we said that the directions 1 and 2 were infinitesimally close and, hence, the values for a1 and a2, and for b1 and b2, should also approach each other, so we can equate them with a and b respectively and, hence, the probability of some kind of combined detector picking up both particles as they hit the counter is equal to P = 2|a|^2|b|^2 (just substitute and add). [Note: For those who would think that separate counters and ‘some kind of combined detector’ radically alter the set-up of this thought experiment (and, hence, that we cannot just do this kind of math), I refer to Feynman (Vol. III, Lecture 4, section 4): he shows how it works using differential calculus.]

Now, if the particles cannot be distinguished – so if we have ‘identical particles’ (like photons, or polarized electrons) – and if we assume they are Bose particles (so they interfere with a positive sign – i.e. like photons, but not like electrons), then we should no longer add the probabilities but the amplitudes, so we get a1b1 + a2b2 = 2ab for the amplitude and – lo and behold! – a probability equal to P = 4|a|^2|b|^2. So what? Well… We’ve got a factor 2 difference here: 4|a|^2|b|^2 is two times 2|a|^2|b|^2.

This is a strange result: it means we’re twice as likely to find two identical Bose particles scattered into the same state as we would be if the particles were different. That’s weird, to say the least. In fact, it gets even weirder, because this experiment can easily be extended to a situation where we have n particles present (which is what the illustration suggests), and that makes it even more interesting (more ‘weird’ that is). I’ll refer to Feynman here for the (fairly easy but somewhat lengthy) calculus in case we have n particles, but the conclusion is rock-solid: if we have n bosons already present in some state, then the probability of getting one extra boson is n+1 times greater than it would be if there were none before.

So the presence of the other particles increases the probability of getting one more: bosons like to crowd. And there’s no limit to it: the more bosons you have in one space, the more likely it is another one will want to occupy the same space. It’s this rather weird phenomenon which explains equally weird things such as superconductivity and superfluidity, or why photons of the same frequency can form such powerful laser beams: they don’t mind being together – literally on the same spot – in huge numbers. In fact, they love it: a laser beam, superfluidity or superconductivity are actually quantum-mechanical phenomena that are visible at a macro-scale.

OK. I won’t go into any more detail here. Let me just conclude by showing how interference works for Fermi particles. Well… That doesn’t work or, let me be more precise, it leads to the so-called (Pauli) Exclusion Principle which, for electrons, states that “no two electrons can be found in exactly the same state (including spin).” Indeed, we get a1b1 – a2b2 = ab – ab = 0 (zero!) if we let the values of a1 and a2, and b1 and b2, come arbitrarily close to each other. So the amplitude becomes zero as the two directions (1 and 2) approach each other. That simply means that it is not possible at all for two electrons to have the same momentum, location or, in general, the same state of motion – unless they are spinning opposite to each other (in which case they are not ‘identical’ particles). So what? Well… Nothing much. It just explains all of the chemical properties of atoms. 🙂
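The whole Bose-versus-Fermi argument fits in a few lines of arithmetic. Here is a toy check, with arbitrary values for the limiting amplitudes a and b (taken to be real, for simplicity):

```python
a, b = 0.6, 0.5              # common limiting values of a1, a2 and b1, b2
a1 = a2 = a
b1 = b2 = b

p_distinguishable = abs(a1 * b1) ** 2 + abs(a2 * b2) ** 2  # add probabilities: 2|a|^2|b|^2
p_bose = abs(a1 * b1 + a2 * b2) ** 2                       # interfere with + sign: 4|a|^2|b|^2
p_fermi = abs(a1 * b1 - a2 * b2) ** 2                      # interfere with − sign: 0

print(p_distinguishable, p_bose, p_fermi)   # 0.18, 0.36 and 0.0 (up to rounding)
```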

In addition, the Pauli exclusion principle also explains the stability of matter on a larger scale: protons and neutrons are fermions as well, and so they just “don’t get close together with one big smear of electrons around them”, as Feynman puts it, adding: “Atoms must keep away from each other, and so the stability of matter on a large scale is really a consequence of the Fermi particle nature of the electrons, protons and neutrons.”

Well… There’s nothing much to add to that, I guess. 🙂

Post scriptum:

I wrote that “more complex particles, such as atomic nuclei, are also either bosons or fermions”, and that this depends on the number of protons and neutrons they consist of. In fact, bosons are, in general, particles with integer spin (0 or 1), while fermions have half-integer spin (1/2). Bosonic Helium-4 (He4) has zero spin. Photons (which mediate electromagnetic interactions), gluons (which mediate the so-called strong interactions between particles), and the W+, W– and Z particles (which mediate the so-called weak interactions) all have spin one (1). Lithium-7 (Li7), on the other hand, has half-integer spin (3/2). The underlying reason for the difference in spin between He4 and Li7 is their composition indeed: He4 consists of two protons and two neutrons, while Li7 consists of three protons and four neutrons.

However, we have to go beyond the protons and neutrons for some better explanation. We now know that protons and neutrons are not ‘fundamental’ any more: they consist of quarks, and quarks have a spin of 1/2. It is probably worth noting that Feynman did not know this when he wrote his Lectures in 1965, although he briefly sketches the findings of Murray Gell-Mann and George Zweig, who published their findings in 1961 and 1964 only, so just a little bit before, and describes them as ‘very interesting’. I guess this is just another example of Feynman’s formidable intellect and intuition… In any case, protons and neutrons are so-called baryons: they consist of three quarks, as opposed to the short-lived (unstable) mesons, which consist of one quark and one anti-quark only (you may not have heard about mesons – they don’t live long – and so I won’t say anything about them). Now, an uneven number of quarks results in half-integer spin, and so that’s why protons and neutrons have half-integer spin. An even number of quarks results in integer spin, and so that’s why mesons have spin 0 or 1. Two protons and two neutrons together, so that’s He4, can condense into a bosonic state with spin zero, because four half-integer spins allow for an integer sum. Seven half-integer spins, however, cannot be combined into some integer spin, and so that’s why Li7 has half-integer spin (3/2). Electrons have half-integer spin (1/2) too. So there you are.

Now, I must admit that this spin business is a topic of which I understand little – if anything at all. And so I won’t go beyond the stuff I paraphrased or quoted above. The ‘explanation’ surely doesn’t ‘explain’ this fundamental dichotomy between bosons and fermions. In that regard, Feynman’s 1965 conclusion still stands: “It appears to be one of the few places in physics where there is a rule which can be stated very simply, but for which no one has found a simple and easy explanation. The explanation is deep down in relativistic quantum mechanics. This probably means that we do not have a complete understanding of the fundamental principle involved. For the moment, you will just have to take it as one of the rules of the world.”

Some content on this page was disabled on June 20, 2020 as a result of a DMCA takedown notice from Michael A. Gottlieb, Rudolf Pfeiffer, and The California Institute of Technology. You can learn more about the DMCA here:

https://wordpress.com/support/copyright-and-the-dmca/

An easy piece: introducing quantum mechanics and the wave function

Pre-scriptum (dated 26 June 2020): A quick glance at this piece – so many years after I have written it – tells me it is basically OK. However, it is quite obvious that, in terms of interpreting the math, I have come a very long way. Still, I would recommend you go through the piece so as to get the basic math, and then you may or may not be ready for the full development of my realist or classical interpretation of QM. My manuscript may also be a fun read for you.

Original post:

After all those boring pieces on math, it is about time I got back to physics. Indeed, what’s all that stuff on differential equations and complex numbers good for? This blog was supposed to be a journey into physics, wasn’t it? Yes. But wave functions – functions describing physical waves (in classical mechanics) or probability amplitudes (in quantum mechanics) – are the solution to some differential equation, and they will usually involve complex-number notation. However, I agree we have had enough of that now. Let’s see how it works. By the way, the title of this post – An Easy Piece – is an obvious reference to (some of) Feynman’s 1965 Lectures on Physics, some of which were re-packaged in 1994 (six years after his death) as ‘Six Easy Pieces’ – but, IMHO, it makes more sense to read all of them as part of the whole series.

Let’s first look at one of the most used mathematical shapes: the sinusoidal wave. The illustration below shows the basic concepts: we have a wave here – some kind of cyclic thing – with a wavelength λ, a (maximum) amplitude (or height) A0, and a so-called phase shift equal to φ. The Wikipedia definition of a wave is the following: “a wave is a disturbance or oscillation that travels through space and matter, accompanied by a transfer of energy.” Indeed, a wave transports energy as it travels (oh – I forgot to mention the speed or velocity of a wave (v) as an important characteristic of a wave), and the energy it carries is directly proportional to the square of the amplitude of the wave: E ∝ A^2 (this is true not only for waves like water waves, but also for electromagnetic waves, like light).

[Figure: cosine wave concepts]

Let’s now look at how these variables get into the argument – literally: into the argument of the wave function. Let’s start with that phase shift. The phase shift is usually defined referring to some other wave or reference point (in this case the origin of the x and y axis). Indeed, the amplitude – or ‘height’ if you want (think of a water wave, or the strength of the electric field) – of the wave above depends on (1) the time t (not shown above) and (2) the location (x), but so we will need to have this phase shift φ in the argument of the wave function because at x = 0 we do not have a zero height for the wave. So, as we can see, we can shift the x-axis left or right with this φ. OK. That’s simple enough. Let’s look at the other independent variables now: time and position.

The height (or amplitude) of the wave will obviously vary both in time as well as in space. On this graph, we fixed time (t = 0) – and so it does not appear as a variable on the graph – and show how the amplitude y = A varies in space (i.e. along the x-axis). We could also have looked at one location only (x = 0 or x1 or whatever other location) and shown how the amplitude varies over time at that location only. The graph would be very similar, except that we would have a ‘time distance’ between two crests (or between two troughs or between any other two points separated by a full cycle of the wave) instead of the wavelength λ (i.e. a distance in space). This ‘time distance’ is the time needed to complete one cycle and is referred to as the period of the wave (usually denoted by the symbol T or T0 – in line with the notation for the maximum amplitude A0). In other words, we will also see time (t) as well as location (x) in the argument of this cosine or sine wave function. By the way, it is worth noting that it does not matter if we use a sine or cosine function, because we can go from one to the other using the basic trigonometric identities cos θ = sin(π/2 – θ) and sin θ = cos(π/2 – θ). So all waves of the shape above are referred to as sinusoidal waves even if, in most cases, the convention is to actually use the cosine function to represent them.

So we will have x, t and φ in the argument of the wave function. Hence, we can write A = A(x, t, φ) = cos(x + t + φ) and there we are, right? Well… No. We’re adding very different units here: time is measured in seconds, distance in meter, and the phase shift is measured in radians (i.e. the unit of choice for angles). So we can’t just add them up. The argument of a trigonometric function (like this cosine function) is an angle and, hence, we need to get everything in radians – because that’s the unit we use to measure angles. So how do we do that? Let’s do it step by step.

First, it is worth noting that waves are usually caused by something. For example, electromagnetic waves are caused by an oscillating point charge somewhere, and radiate out from there. Physical waves – like water waves, or an oscillating string – usually also have some origin. In fact, we can look at a wave as a way of transmitting energy originating elsewhere. In the case at hand here – i.e. the nice regular sinusoidal wave illustrated above – it is obvious that the amplitude at some time t = t1 at some point x = x1 will be the same as the amplitude of that wave at point x = 0 some time ago. How much time ago? Well… The time that was needed for that wave to travel from point x = 0 to point x = x1 is easy to calculate: indeed, if the wave originated at t = 0 and x = 0, then x1 (i.e. the distance traveled by the wave) will be equal to its velocity (v) multiplied by t1, so we have x1 = v·t1 (note that we assume the wave velocity is constant – which is a very reasonable assumption). In other words, inserting x1 and t1 in the argument of our cosine function should yield the same value as inserting zero for x and t. Distance and time can be substituted so to say, and that’s why we will have something like x – vt or vt – x in the argument of that cosine function: we measure both time and distance in units of distance so to say. [Note that x – vt and –(x–vt) = vt – x are equivalent because cos θ = cos(–θ).]

Does this sound fishy? It shouldn’t. Think about it. In the (electric) field equation for electromagnetic radiation (that’s one of the examples of a wave which I mentioned above), you’ll find the so-called retarded acceleration a(t – x/c) in the argument: that’s the acceleration (a) of the charge causing the electric field at point x to change not at time t but at time t – x/c. So that’s the retarded acceleration indeed: x/c is the time it took for the wave to travel from its origin (the oscillating point charge) to x and so we subtract that from t. [When talking electromagnetic radiation (e.g. light), the wave velocity v is obviously equal to c, i.e. the speed of light, or of electromagnetic radiation in general.] Of course, you will now object that t – x/c is not the same as vt – x, and you are right: we need time units in the argument of that acceleration function, not distance. We can get to distance units if we would multiply the time with the wave velocity v but that’s complicated business because the velocity of that moving point charge is not a constant.

[…] I am not sure if I made myself clear here. If not, so be it. The thing to remember is that we need an input expressed in radians for our cosine function, not time, nor distance. Indeed, the argument in a sine or cosine function is an angle, not some distance. We will call that angle the phase of the wave, and it is usually denoted by the symbol θ  – which we also used above. But so far we have been talking about amplitude as a function of distance, and we expressed time in distance units too – by multiplying it with v. How can we go from some distance to some angle? It is simple: we’ll multiply x – vt with 2π/λ.

Huh? Yes. Think about it. The wavelength will be expressed in units of distance – typically 1 m in the SI International System of Units but it could also be angstrom (10^(–10) m = 0.1 nm) or nano-meter (10^(–9) m = 10 Å). A wavelength of two meter (2 m) means that the wave only completes half a cycle per meter of travel. So we need to translate that into radians, which – once again – is the measure used to… well… measure angles, or the phase of the wave as we call it here. So what’s the ‘unit’ here? Well… Remember that we can add or subtract 2π (and any multiple of 2π, i.e. ± 2nπ with n = ±1, ±2, ±3,…) to the argument of all trigonometric functions and we’ll get the same value as for the original argument. In other words, a cycle characterized by a wavelength λ corresponds to the angle θ going around the origin and describing one full circle, i.e. 2π radians. Hence, it is easy: we can go from distance to radians by multiplying our ‘distance argument’ x – vt with 2π/λ. If you’re not convinced, just work it out for the example I gave: if the wavelength is 2 m, then 2π/λ equals 2π/2 = π. So traveling 6 meters along the wave – i.e. we’re letting x go from 0 to 6 m while fixing our time variable – corresponds to our phase θ going from 0 to 6π: both the ‘distance argument’ as well as the change in phase cover three cycles (three times two meter for the distance, and three times 2π for the change in phase) and so we’re fine. [Another way to think about it is to remember that the circumference of the unit circle is also equal to 2π (2π·r = 2π·1 in this case), so the ratio of 2π to λ measures how many times the circumference contains the wavelength.]

In short, if we put time and distance in the (2π/λ)(x-vt) formula, we’ll get everything in radians and that’s what we need for the argument for our cosine function. So our sinusoidal wave above can be represented by the following cosine function:

A = A(x, t) = A0cos[(2π/λ)(x-vt)]

We could also write A = A0cosθ with θ = (2π/λ)(x-vt). […] Both representations look rather ugly, don’t they? They do. And it’s not only ugly: it’s not the standard representation of a sinusoidal wave either. In order to make it look ‘nice’, we have to introduce some more concepts here, notably the angular frequency and the wave number. So let’s do that.

The angular frequency is just like the… well… the frequency you’re used to, i.e. the ‘non-angular’ frequency f,  as measured in cycles per second (i.e. in Hertz). However, instead of measuring change in cycles per second, the angular frequency (usually denoted by the symbol ω) will measure the rate of change of the phase with time, so we can write or define ω as ω = ∂θ/∂t. In this case, we can easily see that ω = –2πv/λ. [Note that we’ll take the absolute value of that derivative because we want to work with positive numbers for such properties of functions.] Does that look complicated? In doubt, just remember that ω is measured in radians per second and then you can probably better imagine what it is really. Another way to understand ω somewhat better is to remember that the product of ω and the period T is equal to 2π, so that’s a full cycle. Indeed, the time needed to complete one cycle multiplied with the phase change per second (i.e. per unit time) is equivalent to going round the full circle: 2π = ω.T. Because f = 1/T, we can also relate ω to f and write ω = 2π.f = 2π/T.

Likewise, we can measure the rate of change of the phase with distance, and that gives us the wave number k = ∂θ/∂x, which is like the spatial frequency of the wave. So it is just like the wavelength but then measured in radians per unit distance. From the function above, it is easy to see that k = 2π/λ. The interpretation of this equality is similar to the ω.T = 2π equality. Indeed, we have a similar equation for k: 2π = k.λ, so the wavelength (λ) is for k what the period (T) is for ω. If you’re still uncomfortable with it, just play a bit with some numerical examples and you’ll be fine.

To make a long story short, this, then, allows us to re-write the sinusoidal wave equation above in its final form (and let me include the phase shift φ again in order to be as complete as possible at this stage):

A(x, t) = A0cos(kx – ωt + φ)

You will agree that this looks much ‘nicer’ – and also more in line with what you’ll find in textbooks or on Wikipedia. 🙂 I should note, however, that we’re not adding any new parameters here. The wave number k and the angular frequency ω are not independent: this is still the same wave (A = A0cos[(2π/λ)(x-vt)]), and so we are not introducing anything more than the frequency and – equally important – the speed with which the wave travels, which is usually referred to as the phase velocity. In fact, it is quite obvious from the ω·T = 2π and k·λ = 2π identities that kλ = ωT and, hence, taking into account that λ is equal to λ = v·T (the wavelength is – by definition – the distance traveled by the wave in one period), we find that the phase (or wave) velocity v is equal to the ratio of ω and k, so we have v = ω/k. So ω and k could be re-scaled, but their ratio cannot change: the velocity of the wave is what it is. In short, I am introducing two new concepts and symbols (ω and k) but there are no new degrees of freedom in the system, so to speak.
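If you like to check such identities numerically, here is a minimal sketch in Python (the values for λ and v are made up for illustration):

```python
import numpy as np

# Check the relations between wavelength, frequency, wave number and
# angular frequency for the wave A0*cos(k*x - w*t). Numbers are made up.
lam = 2.0               # wavelength (m)
v = 3.0                 # phase velocity (m/s)
f = v / lam             # frequency: 1.5 cycles per second
T = 1 / f               # period (s)
k = 2 * np.pi / lam     # wave number (rad/m)
w = 2 * np.pi * f       # angular frequency (rad/s)

print(np.isclose(w * T, 2 * np.pi))    # one full cycle per period: True
print(np.isclose(k * lam, 2 * np.pi))  # one full cycle per wavelength: True
print(np.isclose(w / k, v))            # phase velocity v = w/k: True
```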

[At this point, I should probably say something about the difference between the phase velocity and the so-called group velocity of a wave. Let me do that in as brief a way as I can manage. Most real-life waves travel as a wave packet, aka a wave train. So that’s like a burst, or an “envelope” (I am shamelessly quoting Wikipedia here…), of “localized wave action that travels as a unit.” Such a wave packet has no single wave number or wavelength: it actually consists of a (large) set of waves with phases and amplitudes such that they interfere constructively only over a small region of space, and destructively elsewhere. The famous Fourier analysis (or infamous if you have problems understanding what it really is) decomposes this wave train into simpler pieces. While these ‘simpler’ pieces – which, together, add up to form the wave train – are all ‘nice’ sinusoidal waves (that’s why I call them ‘simple’), the wave packet as such is not. In any case (I can’t be too long on this), the speed with which this wave train itself is traveling through space is referred to as the group velocity. The phase velocity and the group velocity are usually very different: for example, a wave packet may be traveling forward (i.e. its group velocity is positive) but the phase velocity may be negative, i.e. traveling backward. However, I will stop here and refer to the Wikipedia article on group and phase velocity: it has wonderful illustrations which are much, much better than anything I could write here. Just one last point that I’ll use later: regardless of the shape of the wave (sinusoidal, sawtooth or whatever), we have a very obvious relationship relating wavelength and frequency to the (phase) velocity: v = λ.f, or f = v/λ. For example, a wave traveling at 3 meters per second with a wavelength of 1 meter will obviously have a frequency of three cycles per second (i.e. 3 Hz). Let’s go back to the main story line now.]

With the rather lengthy ‘introduction’ to waves above, we are now ready for the thing I really wanted to present here. I will go much faster now that we have covered the basics. Let’s go.

From my previous posts on complex numbers (or from what you know about complex numbers already), you will understand that working with cosine functions is much easier when writing them as the real part of a complex number A0eiθ = A0ei(kx – ωt + φ). Indeed, A0eiθ = A0(cosθ + isinθ), and so the cosine function above is nothing else but the real part of the complex number A0eiθ. Working with complex numbers makes adding waves, calculating interference effects and whatever else we want to do with these wave functions much easier: we just replace the cosine functions by complex numbers in all of the formulae, solve them (algebra with complex numbers is very straightforward), and then we look at the real part of the solution to see what is really happening. We don’t care about the imaginary part, because that has no relationship to the actual physical quantities – for mechanical and electromagnetic waves that is, or for any other problem in classical wave mechanics. Done. So, in classical mechanics, the use of complex numbers is just a mathematical tool.
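Here is a minimal sketch of that ‘complex number trick’ (the amplitudes and phases are arbitrary illustration values):

```python
import numpy as np

# Add two waves as complex exponentials, take the real part at the end.
x = np.linspace(0, 10, 1000)
k, w, t = 2 * np.pi / 2.0, 2 * np.pi * 1.5, 0.3

wave1 = 1.0 * np.exp(1j * (k * x - w * t))        # A0 = 1,   phi = 0
wave2 = 0.5 * np.exp(1j * (k * x - w * t + 1.0))  # A0 = 0.5, phi = 1 rad

total = wave1 + wave2          # algebra on complex numbers...
physical = np.real(total)      # ...then keep the real part only

# Same result as adding the cosines directly:
direct = 1.0 * np.cos(k * x - w * t) + 0.5 * np.cos(k * x - w * t + 1.0)
print(np.allclose(physical, direct))   # True
```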

Now, that is not the case for the wave functions in quantum mechanics: the imaginary part of a wave function – yes, let me write one down here – such as Ψ = Ψ(x, t) = (1/x)ei(kx – ωt) is very much part and parcel of the so-called probability amplitude that describes the state of the system here. In fact, this Ψ function is an example taken from one of Feynman’s first Lectures on Quantum Mechanics (i.e. Volume III of his Lectures) and, in this case, Ψ(x, t) = (1/x)ei(kx – ωt) represents the probability amplitude of a tiny particle (e.g. an electron) moving freely through space – i.e. without any external forces acting upon it – to go from 0 to x and actually be at point x at time t. [Note how the amplitude varies inversely with the distance because of the 1/x factor, so that makes sense.] In fact, when I started writing this post, my objective was to present this example – because it illustrates the concept of the wave function in quantum mechanics in a fairly easy and relatively understandable way. So let’s have a go at it.

First, it is necessary to understand the difference between probabilities and probability amplitudes. We all know what a probability is: it is a real number between 0 and 1 expressing the chance of something happening. It is usually denoted by the symbol P. An example is the probability that monochromatic light (i.e. one or more photons with the same frequency) is reflected from a sheet of glass. [To be precise, this probability is anything between 0 and 16% (i.e. P = 0 to 0.16). In fact, this example comes from another fine publication of Richard Feynman – QED (1985) – in which he explains how we can calculate the exact probability, which depends on the thickness of the sheet.]

A probability amplitude is something different. A probability amplitude is a complex number (3 + 2i, or 2.6ei1.34, for example) and – unlike its equivalent in classical mechanics – both the real and the imaginary part matter. That being said, probabilities and probability amplitudes are obviously related: to be precise, one calculates the probability of an event actually happening by taking the square of the modulus (or the absolute value) of the probability amplitude associated with that event. Huh? Yes. Just let it sink in. So, if we denote the probability amplitude by Φ, then we have the following relationship:

P = |Φ|2

P = probability

Φ = probability amplitude

In addition, where we would add and multiply probabilities in the classical world (for example, to calculate the probability of an event which can happen in two different ways – alternative 1 and alternative 2, let’s say – we would just add the individual probabilities to arrive at the probability of the event happening in one or the other way, so P = P1 + P2), in the quantum-mechanical world we should add and multiply probability amplitudes, and then take the square of the modulus of that combined amplitude to calculate the combined probability. So, formally, the probability that a particle reaches a given state by two possible routes (route 1 or route 2, let’s say) is to be calculated as follows:

Φ = Φ1+ Φ2

and P = |Φ|2 = |Φ1 + Φ2|2

Also, when we have only one route, but that one route consists of two successive stages (for example: to go from A to C, the particle would first have to go from A to B, and then from B to C, each stage with its own amplitude), we will not multiply the probabilities (as we would do in the classical world) but the probability amplitudes. So we have:

Φ = ΦAB·ΦBC

and P = |Φ|2 = |ΦAB·ΦBC|2

In short, it’s the probability amplitudes (and, as mentioned, these are complex numbers, not real numbers) that are to be added and multiplied, and, hence, the probability amplitudes act as the quantum-mechanical equivalent, so to say, of the conventional probabilities in classical mechanics. The difference is not subtle. Not at all. I won’t dwell too much on this. Just re-read any account of the double-slit experiment with electrons which you may have read and you’ll remember how fundamental this is. [By the way, I was surprised to learn that the double-slit experiment with electrons was apparently only done in 2012 in exactly the way Feynman described it. So when Feynman described it in his 1965 Lectures, it was still very much a ‘thought experiment’ – even though a 1961 experiment (not mentioned by Feynman) had already clearly established the reality of electron interference.]
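To see how different the quantum rule is from the classical one, here is a minimal numerical sketch (the amplitude values are arbitrary assumptions, chosen only to make the interference visible):

```python
import numpy as np

# Two routes with equal-magnitude amplitudes but opposite phases.
phi1 = 0.6 * np.exp(1j * 0.0)    # amplitude for route 1
phi2 = 0.6 * np.exp(1j * np.pi)  # amplitude for route 2

# Classical intuition would add probabilities:
p_classical = abs(phi1)**2 + abs(phi2)**2   # 0.72

# The quantum rule adds amplitudes first, then squares the modulus:
p_quantum = abs(phi1 + phi2)**2             # ~0: destructive interference!

# For successive stages, amplitudes multiply: P = |phi_AB * phi_BC|^2
phi_ab = 0.8 * np.exp(1j * 0.4)
phi_bc = 0.5 * np.exp(1j * 1.1)
p_stages = abs(phi_ab * phi_bc)**2          # 0.8**2 * 0.5**2 = 0.16

print(p_classical, p_quantum, p_stages)
```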

OK. Let’s move on. So we have this complex wave function in quantum mechanics and, as Feynman writes, “It is not like a real wave in space; one cannot picture any kind of reality to this wave as one does for a sound wave.” That being said, one can, however, get pretty close to ‘imagining’ what it actually is IMHO. Let’s go by the example which Feynman gives himself – on the very same page where he writes the above actually. The amplitude for a free particle (i.e. with no forces acting on it) with momentum p = mv to go from location r1 to location r2 is equal to

Φ12 = (1/r12)eip·r12/ħ with r12 = r2 – r1

I agree this looks somewhat ugly again, but so what does it say? First, be aware of the difference between bold and normal type: I am writing p and v in bold type above because they are vectors: they have a magnitude (which I will denote by p and v respectively) as well as a direction in space. Likewise, r12 is a vector going from r1 to r2 (and r1 and r2 are position vectors themselves obviously), and so r12 (non-bold) is the magnitude of that vector. Keeping that in mind, we know that the dot product p·r12 is equal to the product of the magnitudes of those vectors multiplied by cosα, with α the angle between those two vectors. Hence, the dot product equals p·r12·cosα (with p and r12 the magnitudes here). Now, if p and r12 have the same direction, the angle α will be zero and so cosα will be equal to one, so the dot product reduces to the product of the magnitudes p·r12 or, if we’re considering a particle going from 0 to some position x, simply px.

Now we also have Planck’s constant there, in its reduced form ħ = h/2π. As you can imagine, this 2π has something to do with the fact that we need radians in the argument. It’s the same as what we did with x in the argument of that cosine function above: if we have to express stuff in radians, then we have to absorb a factor of 2π in that constant. However, here I need to make an additional digression. Planck’s constant is obviously not just any constant: it is the so-called quantum of action. Indeed, it appears in what may well be the most fundamental relations in physics.

The first of these fundamental relations is the so-called Planck relation: E = hf. The Planck relation expresses the wave-particle duality of light (or electromagnetic waves in general): light comes in discrete quanta of energy (photons), and the energy of these ‘wave particles’ is directly proportional to the frequency of the wave, and the factor of proportionality is Planck’s constant.

The second fundamental relation, or relations – in plural – I should say, are the de Broglie relations. Indeed, Louis-Victor-Pierre-Raymond, 7th duc de Broglie, turned the above on its head: if the fundamental nature of light is (also) particle-like, then the fundamental nature of particles must (also) be wave-like. So he boldly associated a frequency f and a wavelength λ with all particles, such as electrons for example – but larger-scale objects, such as billiard balls, or planets, also have a de Broglie wavelength and frequency! The de Broglie relation determining the de Broglie frequency is – quite simply – the re-arranged Planck relation: f = E/h. So this relation relates the de Broglie frequency with energy. However, in the above wave function, we’ve got momentum, not energy. Well… Energy and momentum are obviously related, and so we have a second de Broglie relation relating momentum with wavelength: λ = h/p.

We’re almost there: just hang in there. 🙂 When we presented the sinusoidal wave equation, we introduced the angular frequency (ω) and the wave number (k), instead of working with f and λ. That’s because we want an argument expressed in radians. Here it’s the same. The two de Broglie equations have an equivalent using angular frequency and wave number: ω = E/ħ and k = p/ħ. So we’ll just use the second one (i.e. the relation with the momentum in it) to associate a wave number with the particle (k = p/ħ).

Phew! So, finally, we get that formula which we introduced a while ago already: Ψ(x) = (1/x)eikx, or, including time as a variable as well (we ignored time so far):

Ψ(x, t) = (1/x)ei(kx – ωt)

The formula above obviously makes sense. For example, the 1/x factor makes the probability amplitude decrease as we get farther away from where the particle started: in fact, this 1/x or 1/r variation is what we see with electromagnetic waves as well: the amplitude of the electric field vector E varies as 1/r and, because we’re talking about a real wave here, whose energy is proportional to the square of the field, the energy that the source can deliver varies inversely with the square of the distance. [Another way of saying the same is that the energy we can take out of a wave within a given conical angle is the same, no matter how far away we are: the energy flux is never lost – it just spreads over a greater and greater effective area. But let’s go back to the main story.]

We’ve got the math – I hope. But what does this equation mean really? What’s that de Broglie wavelength or frequency in reality? What wave are we talking about? Well… What’s reality? As mentioned above, the famous de Broglie relations associate a wavelength λ and a frequency f to a particle with momentum p and energy E, but it’s important to mention that the associated de Broglie wave function yields probability amplitudes. So it is, indeed, not a ‘real wave in space’ as Feynman would put it. It is a quantum-mechanical wave function.

Huh? […] It’s obviously about time I add some illustrations here, and so that’s what I’ll do. Look at the two cases below. The case on top is pretty close to the situation I described above: it’s a de Broglie wave – so that’s a complex wave – traveling through space (in one dimension only here). The real part of the complex amplitude is in blue, and the green is the imaginary part. So the probability of finding that particle at some position x is the modulus squared of this complex amplitude. Now, this particular wave function ignores the 1/x variation and, hence, the squared modulus of Aei(kx – ωt) is equal to a constant. To be precise, it’s equal to A2 (check it: the squared modulus of a complex number z equals the product of z and its complex conjugate, and so we get A2 as a result indeed). So what does this mean? It means that the probability of finding that particle (an electron, for example) is the same at all points! In other words, we don’t know where it is! In the illustration below (top part), that’s shown as the (yellow) color opacity: the probability is spread out, just like the wave itself, so there is no definite position of the particle indeed.

[Illustration (Wikipedia): propagation of a de Broglie wave]

[Note that the formula in the illustration above (which I took from Wikipedia once again) uses p instead of k as the factor in front of x. While it does not make a big difference from a mathematical point of view (ħ is just a factor of proportionality: k = p/ħ), it does make a big difference from a conceptual point of view and, hence, I am puzzled as to why the author of this article did this. Also, there is some variation in the opacity of the yellow (i.e. the color of our tennis (or ping pong) ball representing our ‘wavicle’) which shouldn’t be there because the probability associated with this particular wave function is a constant indeed: so there is no variation in the probability (when squaring the absolute value of a complex number, the phase factor does not come into play). Also note that, because all probabilities have to add up to 100% (or to 1), a wave function like this is quite problematic. However, don’t worry about it just now: just try to go with the flow.]

By now, I must assume you shook your head in disbelief a couple of times already. Surely, this particle (let’s stick to the example of an electron) must be somewhere, yes? Of course.

The problem is that we gave an exact value to its momentum and its energy and, as a result, through the de Broglie relations, we also associated an exact frequency and wavelength with the de Broglie wave associated with this electron. Hence, Heisenberg’s Uncertainty Principle comes into play: if we have exact knowledge of the momentum, then we cannot know anything about the position, and so that’s why we get this wave function covering the whole of space, instead of just some region. Sort of. Here we are, of course, talking about that deep mystery about which I cannot say much – if only because so many eminent physicists have already exhausted the topic. I’ll just state Feynman once more: “Things on a very small scale behave like nothing that you have any direct experience with. […] It is very difficult to get used to, and it appears peculiar and mysterious to everyone – both to the novice and to the experienced scientist. Even the experts do not understand it the way they would like to, and it is perfectly reasonable that they should not because all of direct, human experience and of human intuition applies to large objects. We know how large objects will act, but things on a small scale just do not act that way. So we have to learn about them in a sort of abstract or imaginative fashion and not by connection with our direct experience.” And, after describing the double-slit experiment, he highlights the key conclusion: “In quantum mechanics, it is impossible to predict exactly what will happen. We can only predict the odds [i.e. probabilities]. Physics has given up on the problem of trying to predict exactly what will happen. Yes! Physics has given up. We do not know how to predict what will happen in a given circumstance. It is impossible: the only thing that can be predicted is the probability of different events. It must be recognized that this is a retrenchment in our ideal of understanding nature. It may be a backward step, but no one has seen a way to avoid it.”

[…] That’s enough on this I guess, but let me – as a way to conclude this little digression – just quickly state the Uncertainty Principle in a more or less accurate version here, rather than all of the ‘descriptions’ which you may have seen of it: the Uncertainty Principle refers to any of a variety of mathematical inequalities asserting a fundamental limit (fundamental means it’s got nothing to do with observer or measurement effects, or with the limitations of our experimental technologies) to the precision with which certain pairs of physical properties of a particle (these pairs are known as complementary variables), such as, for example, position (x) and momentum (p), can be known simultaneously. More in particular, for position and momentum, we have σxσp ≥ ħ/2 (and, in this formulation, σ is, obviously, the standard symbol for the standard deviation of the measured values of x and p respectively).

OK. Back to the illustration above. A particle that is to be found in some specific region – rather than just ‘somewhere’ in space – will have a probability amplitude resembling the wave equation in the bottom half: it’s a wave train, or a wave packet, and we can decompose it, using Fourier analysis, into a number of sinusoidal waves. But we do not have a unique wavelength for the wave train as a whole, and that means – as per the de Broglie equations – that there’s some uncertainty about its momentum (or its energy).
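To make this tangible, here is a minimal sketch (all the numbers are illustrative assumptions) that builds such a wave packet by superposing plane waves with wave numbers spread around some central value k0:

```python
import numpy as np

# Superpose plane waves e^{ikx} with Gaussian weights around k0.
# The result is localized in x, while a single k would be spread
# over all of space.
x = np.linspace(-50, 50, 2000)
k0, sigma_k = 2.0, 0.2                      # central wave number and spread
ks = np.linspace(k0 - 1, k0 + 1, 200)
weights = np.exp(-((ks - k0)**2) / (2 * sigma_k**2))

psi = sum(w * np.exp(1j * k * x) for w, k in zip(weights, ks))
prob = np.abs(psi)**2                       # probability density (unnormalized)

# The packet is concentrated near x = 0 and dies off elsewhere:
print(prob[np.abs(x) < 5].mean() > 100 * prob[np.abs(x) > 30].mean())  # True
```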

I will let this sink in for now. In my next post, I will write some more about these wave equations. They are usually a solution to some differential equation – and that’s where my next post will connect with my previous ones (on differential equations). Just to say goodbye – as for now that is – I will just copy another beautiful illustration from Wikipedia. See below: it represents the (likely) space in which a single electron on the 5d atomic orbital of a hydrogen atom would be found. The solid body shows the places where the electron’s probability density (so that’s the squared modulus of the probability amplitude) is above a certain value – so it’s basically the area where the likelihood of finding the electron is higher than elsewhere. The hue on the colored surface shows the complex phase of the wave function.

[Illustration (Wikipedia): hydrogen eigenstate n = 5, l = 2, m = 1]

It is a wonderful image, isn’t it? At the very least, it increased my understanding of the mystery surrounding quantum mechanics somewhat. I hope it helps you too. 🙂

Post scriptum 1: On the need to normalize a wave function

In this post, I wrote something about the need for probabilities to add up to 1. In mathematical terms, this condition will resemble something like

∫Rn |ψ0|2 dV = a2

In this integral, we’ve got – once again – the squared modulus of the wave function, and so that’s the probability of finding the particle somewhere. The integral just states that all of the probabilities, summed over all of space (Rn), should add up to some finite number (a2). Hey! But that’s not equal to 1, you’ll say. Well… That’s a minor problem only: we can create a normalized wave function ψ out of ψ0 by simply dividing ψ0 by a, so we have ψ = ψ0/a, and then all is ‘normal’ indeed. 🙂
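Numerically, that recipe looks as follows (a minimal sketch; the Gaussian-shaped ψ0 is a made-up example):

```python
import numpy as np

# psi0 is a made-up, non-normalized one-dimensional amplitude:
# a Gaussian envelope times a plane wave.
x = np.linspace(-20, 20, 4000)
dx = x[1] - x[0]
psi0 = 3.0 * np.exp(-x**2 / 4) * np.exp(1j * 2 * x)

a_squared = np.sum(np.abs(psi0)**2) * dx   # the integral of |psi0|^2
psi = psi0 / np.sqrt(a_squared)            # divide by a to normalize

print(np.sum(np.abs(psi)**2) * dx)         # 1.0 (up to numerical error)
```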

Post scriptum 2: On using colors to represent complex numbers

When inserting that beautiful 3D graph of that 5d atomic orbital (again acknowledging its source: Wikipedia), I wrote that “the hue on the colored surface shows the complex phase of the wave function.” Because this kind of visual representation of complex numbers will pop up in other posts as well (and you’ve surely encountered it a couple of times already), it’s probably useful to be explicit on what it represents exactly. Well… I’ll just copy the Wikipedia explanation, which is clear enough: “Given a complex number z = reiθ, the phase (also known as argument) θ can be represented by a hue, and the modulus r =|z| is represented by either intensity or variations in intensity. The arrangement of hues is arbitrary, but often it follows the color wheel. Sometimes the phase is represented by a specific gradient rather than hue.” So here you go…

[Illustration (Wikipedia): domain coloring over the unit circle]
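If you want to see how such a picture is produced, here is a minimal sketch (the function plotted, f(z) = z itself, is just an assumption chosen to keep things simple):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import hsv_to_rgb

# Domain coloring: hue encodes the phase of a complex number,
# brightness encodes its modulus.
x = np.linspace(-2, 2, 400)
z = x[None, :] + 1j * x[:, None]

hue = (np.angle(z) + np.pi) / (2 * np.pi)   # phase mapped onto [0, 1]
brightness = np.abs(z) / (1 + np.abs(z))    # modulus squashed into [0, 1)
rgb = hsv_to_rgb(np.stack([hue, np.ones_like(hue), brightness], axis=-1))

plt.imshow(rgb, extent=[-2, 2, -2, 2], origin='lower')
plt.title("Hue = phase, brightness = modulus")
plt.show()
```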

Post scriptum 3: On the de Broglie relations

The de Broglie relations are a wonderful pair. They’re obviously equivalent: energy and momentum are related, and wavelength and frequency are obviously related too through the general formula relating frequency, wavelength and wave velocity: fλ = v (the product of the frequency and the wavelength must yield the wave velocity indeed). However, when it comes to the relation between energy and momentum, there is a little catch. What kind of energy are we talking about? We were describing a free particle (e.g. an electron) traveling through space with no (other) charges acting on it (in other words: no potential acting upon it), and so we might be tempted to conclude that we’re talking about the kinetic energy (K.E.) here. So, at relatively low speeds (v), we could be tempted to use the equations p = mv and K.E. = p2/2m = mv2/2 (the one electron in a hydrogen atom travels at less than 1% of the speed of light, so that’s a non-relativistic speed indeed) and try to go from one equation to the other with these simple formulas. Well… Let’s try it.

f = E/h according to de Broglie and, hence, substituting E with p2/2m and f with v/λ, we get v/λ = m2v2/2mh. Some simplification and re-arrangement should then yield the second de Broglie relation: λ = 2h/mv = 2h/p. So there we are. Well… No. The second de Broglie relation is just λ = h/p: there is no factor 2 in it. So what’s wrong? The problem is the energy equation: de Broglie does not use the K.E. formula. [By the way, you should note that the K.E. = mv2/2 equation is itself only an approximation for low speeds – low compared to c, that is.] He takes Einstein’s famous E = mc2 equation (which I am tempted to explain now but I won’t) and just substitutes c, the speed of light, with v, the velocity of the slow-moving particle. This is a very fine but also very deep point which, frankly, I do not yet fully understand. Indeed, Einstein’s E = mc2 is obviously something much ‘deeper’ than the formula for kinetic energy. The latter has to do with forces acting on masses and, hence, obeys Newton’s laws – so it’s rather familiar stuff. As for Einstein’s formula, well… That’s a result from relativity theory and, as such, something that is much more difficult to explain. While the difference between the two energy formulas is just a factor of 1/2 (which is usually not a big problem when you’re just fiddling with formulas like this), it makes a big conceptual difference.
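For the record, here is the algebra of the two cases side by side (just a restatement of the argument above, in LaTeX notation):

```latex
% Kinetic energy, E = mv^2/2, combined with f = E/h and f = v/\lambda:
\frac{v}{\lambda} = \frac{mv^2/2}{h}
\quad\Longrightarrow\quad
\lambda = \frac{2h}{mv} = \frac{2h}{p}

% De Broglie's choice, E = mv^2, instead:
\frac{v}{\lambda} = \frac{mv^2}{h}
\quad\Longrightarrow\quad
\lambda = \frac{h}{mv} = \frac{h}{p}
```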

Hmm… Perhaps we should do some examples. So these de Broglie equations associate a wave with frequency f and wavelength λ with particles with energy E, momentum p and mass m traveling through space with velocity v: E = hf and p = h/λ. [And, if we would want to use some sine or cosine function as an example of such wave function – which is likely – then we need an argument expressed in radians rather than in units of time or distance. In other words, we will need to convert frequency and wavelength to angular frequency and wave number respectively by using the 2π = ωT = ω/f and 2π = kλ relations, with the wavelength (λ), the period (T) and the velocity (v) of the wave being related through the simple equations f = 1/T and λ = vT. So then we can write the de Broglie relations as: E = ħω and p =  ħk, with ħ = h/2π.]

In these equations, the Planck constant (be it h or ħ) appears as a simple factor of proportionality (we will worry about what h actually is in physics in later posts) – but a very tiny one: approximately 6.626×10–34 J·s (the joule is the standard SI unit to measure energy, or work: 1 J = 1 kg·m2/s2), or 4.136×10–15 eV·s when using a more appropriate (i.e. larger) measure of energy for atomic physics: still, 10–15 is only 0.000 000 000 000 001. So how does it work? First note, once again, that we are supposed to use the equivalent for slow-moving particles of Einstein’s famous E = mc2 equation as a measure of the energy of a particle: E = mv2. We know velocity adds mass to a particle – with mass being a measure for inertia. In fact, the mass of so-called massless particles, like photons, is nothing but their energy (divided by c2). In other words, they do not have a rest mass, but they do have a relativistic mass m = E/c2, with E = hf (and with f the frequency of the light wave here). Particles, such as electrons, or protons, do have a rest mass, but then they don’t travel at the speed of light. So how does that work out in that E = mv2 formula which – let me emphasize this point once again – is not the standard formula (for kinetic energy) that we’re used to (i.e. E = mv2/2)? Let’s do the exercise.

For photons, we can re-write E = hf as E = hc/λ. The numerator hc in this expression is 4.136×10–15 eV·s (i.e. the value of the Planck constant h expressed in eV·s) multiplied with 2.998×108 m/s (i.e. the speed of light c) so that’s (more or less) hc ≈ 1.24×10–6 eV·m. For visible light, the denominator will range from 0.38 to 0.75 micrometer (1 μm = 10–6 m), i.e. 380 to 750 nanometer (1 nm = 10–9 m), and, hence, the energy of the photon will be in the range of 3.263 eV to 1.653 eV. So that’s only a few electronvolt (an electronvolt (eV) is, by definition, the amount of energy gained (or lost) by a single electron as it moves across an electric potential difference of one volt). So that’s 2.6×10–19 to 5.2×10–19 joule (1 eV = 1.6×10–19 J) and, hence, the equivalent relativistic mass of these photons is E/c2, or 2.9 to 5.8×10–36 kg. That’s tiny – but not insignificant. Indeed, let’s look at an electron now.

The rest mass of an electron is about 9.1×10−31 kg (so that’s a factor of a few hundred thousand as compared to the values we just found for the relativistic mass of photons). Also, in a hydrogen atom, it is expected to speed around the nucleus with a velocity of about 2.2×106 m/s. That’s less than 1% of the speed of light but still quite fast obviously: at this speed (2,200 km per second), it could travel around the earth in less than 20 seconds (a photon does better: it travels not less than 7.5 times around the earth in one second). In any case, the electron’s energy – according to the formula to be used as input for calculating the de Broglie frequency – is 9.1×10−31 kg multiplied with the square of 2.2×106 m/s, and so that’s about 44×10–19 joule or about 27 eV (1 eV = 1.6×10–19 J). So that’s – roughly – ten times more than the energy associated with a photon of visible light.

The wavelength we should associate with this energy can be calculated from E = hv/λ (we should, once again, use v instead of c), but we can also simplify and calculate directly from the mass: λ = hv/E = hv/mv2 = h/mv (however, make sure you express h in J·s in this case): we get a value for λ equal to 0.33 nanometer, so that’s more than one thousand times shorter than the above-mentioned wavelengths for visible light. So, once again, we have a scale factor of about a thousand here. That’s reasonable, no? [There is a similar scale factor when moving to the next level: the mass of protons and neutrons is about 2000 times the mass of an electron.] Indeed, note that we would get a value of 0.510 MeV if we would apply the E = mc2 equation to the above-mentioned (rest) mass of the electron (in kg): MeV stands for mega-electronvolt, so 0.510 MeV is 510,000 eV. So that’s a few hundred thousand times the energy of a photon and, hence, it is obvious that we are not using the energy equivalent of an electron’s rest mass when using de Broglie’s equations. No. It’s just that simple but rather mysterious E = mv2 formula. So it’s not mc2 nor mv2/2 (kinetic energy). Food for thought, isn’t it? Let’s look at the formulas once again.
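Before we do, here is a quick numerical check of the arithmetic above (a minimal sketch; constants rounded):

```python
h = 6.626e-34      # Planck constant (J·s)
h_eV = 4.136e-15   # Planck constant (eV·s)
c = 2.998e8        # speed of light (m/s)
eV = 1.602e-19     # one electronvolt (J)

# Photons: E = hc/lambda at the edges of the visible range.
for lam in (380e-9, 750e-9):
    E = h_eV * c / lam                 # photon energy in eV
    print(f"photon {lam*1e9:.0f} nm: {E:.3f} eV, mass {E*eV/c**2:.1e} kg")

# Electron in a hydrogen atom: E = m*v**2 and lambda = h/(m*v).
m, v = 9.1e-31, 2.2e6
print(f"electron: E = {m*v**2:.1e} J = {m*v**2/eV:.0f} eV")
print(f"electron: lambda = {h/(m*v)*1e9:.2f} nm")   # about 0.33 nm
```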

They can easily be linked: we can re-write the wavelength formula as λ = hv/E = hv/mv2 = h/mv and then, using the general definition of momentum (p = mv), we get the second de Broglie equation: p = h/λ. In fact, de Broglie‘s rather particular definition of the energy of a particle (E = mv2) makes v a simple factor of proportionality between the energy and the momentum of a particle: v = E/p or E = pv. [We can also get this result in another way: we have h = E/f = pλ and, hence, E/p = fλ = v.]

Again, this is serious food for thought: I have not seen any ‘easy’ explanation of this relation so far. To appreciate its peculiarity, just compare it to the usual relations relating energy and momentum: E = p2/2m or, in its relativistic form, p2c2 = E2 – m02c4. So these two equations are both not to be used when going from one de Broglie relation to another. [Of course, it works for massless photons: using the relativistic form, we get p2c2 = E2 – 0 or E = pc, and the de Broglie relation becomes the Planck relation: E = hf (with f the frequency of the photon, i.e. the light beam it is part of). We also have p = h/λ = hf/c and, hence, E/p = c comes naturally. But that’s not the case for (slower-moving) particles with some rest mass: why should we use mv2 as an energy measure for them, rather than the kinetic energy formula?]

But let’s just accept this weirdness and move on. After all, perhaps there is some mistake here and so, perhaps, we should just accept that factor 2 and replace λ = h/p by λ = 2h/p. Why not? 🙂 In any case, both the λ = h/mv and λ = 2h/p = 2h/mv expressions give the impression that the mass of a particle and its velocity are on a par, so to say, when it comes to determining the numerical value of the de Broglie wavelength: if we double the speed, or the mass, the wavelength gets halved. So one would think that larger masses can only be associated with extremely short de Broglie wavelengths if they move at a fairly considerable speed. But that’s where the extremely small value of h changes the arithmetic we would expect to see. Indeed, things work differently at the quantum scale, and it’s the tiny value of h that is at the core of this. Indeed, it’s often referred to as the ‘smallest constant’ in physics, and so here’s the place where we should probably say a bit more about what h really stands for.

Planck’s constant h describes the tiny discrete packets in which Nature packs energy: one cannot find any smaller ‘boxes’. As such, it’s referred to as the ‘quantum of action’. But, surely, you’ll immediately say that its cousin, ħ = h/2π, is actually smaller. Well… Yes. You’re actually right: ħ = h/2π is actually smaller. It’s the so-called quantum of angular momentum, also (and probably better) known as spin. Angular momentum is a measure of… Well… Let’s call it the ‘amount of rotation’ an object has, taking into account its mass, shape and speed. Just like p, it’s a vector. To be precise, it’s the product of a body’s so-called rotational inertia (so that’s similar to the mass m in p = mv) and its rotational velocity (so that’s like v, but it’s ‘angular’ velocity), so we can write L = Iω but we’ll not go into any more detail here. The point to note is that angular momentum, or spin as it’s known in quantum mechanics, also comes in discrete packets, and these packets are multiples of ħ. [OK. I am simplifying here but the idea or principle that I am explaining here is entirely correct.]

But let’s get back to the de Broglie wavelength now. As mentioned above, one would think that larger masses can only be associated with extremely short de Broglie wavelengths if they move at a fairly considerable speed. Well… It turns out that the extremely small value of h upsets our everyday arithmetic. Indeed, because of the extremely small value of h as compared to the objects we are used to (in one grain of salt alone, we will find about 1.2×1018 atoms – just write a 1 with 18 zeroes behind it and you’ll appreciate this immense number somewhat more), it turns out that speed does not matter all that much – at least not in the range we are used to. For example, the de Broglie wavelength associated with a baseball weighing 145 grams and traveling at 90 mph (i.e. approximately 40 m/s) would be 1.1×10–34 m. That’s immeasurably small indeed – literally immeasurably small: not only technically but also theoretically because, at this scale (i.e. the so-called Planck scale), the concepts of size and distance break down as a result of the Uncertainty Principle. But, surely, you’ll think we can improve on this if we’d just be looking at a baseball traveling much slower. Well… It does not get much better for a baseball traveling at a snail’s pace – let’s say 1 cm per hour, i.e. 2.7×10–6 m/s. Indeed, we get a wavelength of 17×10–28 m, which is still nowhere near the nanometer range we found for electrons. Just to give an idea: the resolving power of the best electron microscope is about 50 picometer (1 pm = 10–12 m) and so that’s the size of a small atom (the size of an atom ranges between 30 and 300 pm). In short, for all practical purposes, the de Broglie wavelength of the objects we are used to does not matter – and then I mean it does not matter at all. And so that’s why quantum-mechanical phenomena are only relevant at the atomic scale.
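For completeness, here is that arithmetic in a few lines (a minimal sketch; values rounded as above):

```python
# The de Broglie wavelength lambda = h/p for the examples above.
h = 6.626e-34   # Planck constant (J·s)

cases = {
    "electron in hydrogen": 9.1e-31 * 2.2e6,          # p = m*v (kg·m/s)
    "baseball, 90 mph": 0.145 * 40,                   # ~40 m/s
    "baseball, 1 cm per hour": 0.145 * 0.01 / 3600,   # a snail's pace
}
for name, p in cases.items():
    print(f"{name}: lambda = h/p = {h/p:.1e} m")
# electron: ~3.3e-10 m; fast baseball: ~1.1e-34 m; slow baseball: ~1.6e-27 m
```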

Ordinary Differential Equations (II)

Pre-scriptum (dated 26 June 2020): In pre-scriptums for my previous posts on math, I wrote that the material in posts like this remains interesting but that one, strictly speaking, does not need it to understand quantum mechanics. This post is a little bit different: one has to understand the basic concept of a differential equation as well as the basic solution methods. So, yes, it is a prerequisite. :-/

Original post:

According to the ‘What’s Physics All About?’ title in the Usborne Children’s Books series, physics is all about ‘discovering why things fall to the ground, how sound travels through walls and how many wonderful inventions exist thanks to physics.’

The Encyclopædia Britannica rephrases that definition of physics somewhat and identifies physics with ‘the science that deals with the structure of matter and the interactions between the fundamental constituents of the observable universe.’

[…]

Now, if I would have to define physics at this very moment, I’d say that physics is all about solving differential equations and complex integration. Let’s be honest: is there any page in any physics textbook that does not have any ∫ or ∂ symbols on it?

When everything is said and done, I guess that’s the Big Lie behind all these popular books, including Penrose’s Road to Reality. You need to learn how to write and speak in the language of physics to appreciate them and, for all practical purposes, the language of physics is math. Period.

I am also painfully aware of the fact that the type of differential equations I had to study as a student in economics (even at the graduate or Master’s level) are just a tiny fraction of what’s out there. The variety of differential equations that can be solved is truly intimidating and, because each and every type comes with its own step-by-step methodology, it’s not easy to remember what needs to be done.

Worse, I actually find it quite difficult to remember what ‘type’ this or that equation actually is. In addition, one often needs to reduce or rationalize the equation or – more complicated – substitute variables to get the equation in a form which can then be used to apply a certain method. To top it all off, there’s also this intimidating fact that – despite all these mathematical acrobatics – the vast majority of differential equations can actually not be solved analytically. Hence, in order to penetrate that area of darkness, one has to resort to numerical approaches, which I have yet to learn (the oldest of such numerical methods was apparently invented by the great Leonhard Euler, an 18th century mathematician and physicist from Switzerland).

So where am I actually in this mathematical Wonderland?

I’ve looked at ordinary differential equations only so far, i.e. equations involving one dependent variable (usually written as y) and one independent variable (usually written as x or t), and at equations of the first order only. So that means that (a) we don’t have any ∂ symbols in these differential equations (let me use the DE abbreviation from now on) but just the differential symbol d (so that’s what makes them ordinary DEs, as opposed to partial DEs), and that (b) the highest-order derivative in them is the first derivative only (i.e. y’ = dy/dx). Hence, the only ‘lower-order derivative’ is the function y itself (remember that there’s this somewhat odd mathematical ‘convention’ identifying a function with the zeroth derivative of itself).

Such first-order DEs will usually not be linear things and, even if they look like linear things, don’t jump to conclusions because the term linear (first-order) differential equation is very specific: it means that the (first) derivative and the function itself appear in a linear combination. To be more specific, the term linear differential equation (for the first-order case) is reserved for DEs of the form

a1(t) y'(t) + a0(t) y(t) = q(t).

So, besides y(t) and y'(t) – whose functional form we don’t know because (don’t forget!) finding y(t) is the objective of solving these DEs 🙂 – we have three other arbitrary functions of the independent variable t here, namely a1(t), a0(t) and q(t). Now, these functions may or may not be linear functions of t (they’re probably not) but that doesn’t matter: the important thing – to qualify as ‘linear’ – is that (1) y(t) and y'(t), i.e. the dependent variable and its derivative, appear in a linear combination and have these ‘coefficients’ a1(t) and a0(t) (which, I repeat, may be constants but, more likely, will be functions of t themselves), and (2) that, on the other side of the equation, we’ve got this q(t) function, which also may or – more likely – may not be a constant.

Are you still with me? [If not, read again. :-)]

This type of equation – of which the example in my previous post was a specimen – can be solved by introducing a so-called integrating factor. Now, I won’t explain that here – not because the explanation is too easy (it’s not), but because it’s pretty standard and, much more importantly, because it’s too lengthy to copy here. [If you’d be looking for an ‘easy’ explanation, I’d recommend Paul’s Online Math Notes once again.]

So I’ll continue with my ‘typology’ of first-order DEs. However, I’ll do so only after noting that, before letting that integrating factor loose (OK, let me say something about it: in essence, the integrating factor is some function λ(x) which we’ll multiply with the whole equation and which, because of a clever choice of λ(x) obviously, helps us to solve the equation), you’ll have to rewrite these linear first-order DEs as y'(t) + (a0(t)/a1(t)) y(t) = q(t)/a1(t) (so just divide both sides by this a1(t) function) or, using the more prevalent notation x for the independent variable (instead of t) and equating a0(x)/a1(x) with F(x) and q(x)/a1(x) with G(x), as:

dy/dx + F(x) y = G(x), or y‘ + F(x) y = G(x)

So, that’s one ‘type’ of first-order differential equations: linear DEs. [We’re only dealing with first-order DEs here but let me note that the general form of a linear DE of the nth order is an(x) y(n) + an-1(x) y(n-1) + … + a1(x) y’ + a0(x) y = q(x), and that most standard texts on higher-order DEs focus on linear DEs only, so they are important – even if they are only a tiny fraction of the DE universe.]
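To make this less abstract, here is a minimal sketch (in Python, using the sympy library; the equation y’ + 2y = x is a made-up example) of what solving such a linear first-order DE amounts to – sympy recognizes it as first-order linear and solves it the same way the integrating-factor method does by hand:

```python
import sympy as sp

# A made-up linear first-order DE: y' + 2*y = x.
# By hand: multiply by the integrating factor e^{2x}, recognize the
# left-hand side as (e^{2x} y)', and integrate both sides.
x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x) + 2 * y(x), x), y(x))
print(sol)   # Eq(y(x), C1*exp(-2*x) + x/2 - 1/4)
```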

The second major ‘exam-type’ of DEs which you’ll encounter is the category of so-called separable DEs. Separable (first-order) differential equations are equations of the form:

P(xdx + Q(ydy = 0, which can also be written as G(y) y‘ = F(x)

or dy/dx = F(x)/G(y)

The notion of ‘separable’ refers to the fact that we can neatly separate out the terms involving y and x respectively, in order to then bring them on the left- and right-hand side of the equation respectively (cf. the G(yy‘ = F(x) form), which is what we’ll need to do to solve the equation.

I’ve been rather vague on that ‘integrating factor’ we use to solve linear equations – for the obvious reason that it’s not all that simple – but, in contrast, solving separable equations is very straightforward. We don’t need to use an integrating factor or substitute something. We actually don’t need any mathematical acrobatics here at all! We can just ‘separate’ the variables indeed and integrate both sides.

Indeed, if we write the equation as G(y)y’ = G(y)[dy/dx] = F(x), we can integrate both sides over x, but use the fact that ∫G(y)[dy/dx]dx = ∫G(y)dy. So the equation becomes ∫G(y)dy = ∫F(x)dx, and so we’re actually integrating a function of y over y on the left-hand side, and the other function (of x), on the right-hand side, over x. We then get an implicit function with y and x as variables and, usually, we can solve that implicit equation for y(x) – which is the solution for our problem. [I do say ‘usually’ here. That means: not always. In fact, for most implicit functions, there’s no formula which defines them explicitly. But that’s OK and I won’t dwell on that.]

So that’s what is meant by ‘separation’ of variables: we put all the things with y on one side, and all the things with x on the other, and then we integrate both sides. Sort of. 🙂
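Here is the recipe at work on a concrete (made-up) separable example, dy/dx = x/y, so G(y) = y and F(x) = x:

```python
import sympy as sp

# Separation of variables on dy/dx = x/y:
# y dy = x dx  ->  y**2/2 = x**2/2 + C  ->  y = ±sqrt(x**2 + C1).
x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x), x / y(x)), y(x))
print(sol)   # [Eq(y(x), -sqrt(C1 + x**2)), Eq(y(x), sqrt(C1 + x**2))]
```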

OK. You’re with me. In fact, you’re ahead of me and you’ll say: Hey! Hold it! P(x)dx + Q(y)dy is a linear combination as well, isn’t it? So we can look at this as a linear DE as well, isn’t it? And so why wouldn’t we use the other method – the one with that factor thing?

Well… No. Go back and read again. We’ve got a linear combination of the differentials dx and dy here, but so that’s obviously not a linear combination of the derivative y’ and the function y. In addition, the coefficient in front of dy is a function in y, i.e. a function of the dependent variable, not a function in x, so it’s not like these an(x) coefficients which we would need to see in order to qualify the DE as a linear one. So it’s not linear. It’s separable. Period.

[…] Oh. I see. But are these non-linear things allowed really?

Of course. Linear differential equations are only a tiny little fraction of the DE universe: first, we can have these ‘coefficients’, which can be – and usually will be – a function of both x and y, and then, secondly, the various terms in the DE do not need to constitute a nice linear combination. In short, most DEs are not linear – in the context-specific definitional sense of the word ‘linear’ that is (sorry for my poor English). 🙂

[…] OK. Got it. Please carry on.

That brings us to the third type of first-order DEs: these are the so-called exact DEs. Exact DEs have the same ‘shape’ as separable equations but the ‘coefficients’ of the dx and dy terms are a function of both x and y indeed. In other words, we can write them as:

P(x, y) dx + Q(x, y) dy = 0, or as A(x, y) dx + B(x, y) dy = 0,

or, as you will also see it, dy/dx = M(x, y)/N(x, y) (use whatever letter you want).

However, in order to solve this type of equation, an additional condition will need to be fulfilled, and that is that ∂P/∂y = ∂Q/∂x (or ∂A/∂y = ∂B/∂x if you use the other representation). Indeed, if that condition is fulfilled – which you have to verify by checking these derivatives for the case at hand – then this equation is a so-called exact equation and, then… Well… Then we can find some function U(x, y) of which P(x, y) and Q(x, y) are the partial derivatives, so we’ll have that ∂U(x, y)/∂x = P(x, y) and ∂U(x, y)/∂y = Q(x, y). [As for that condition we need to impose, that’s quite logical if you write down the second-order cross-partials, ∂P(x, y)/∂y and ∂Q(x, y)/∂x and remember that such cross-partials are equal to each other, i.e. Uxy = Uyx.]

We can then find U(x, y), of course, by integrating P or Q. And then we just write that dU = P(x, y) dx + Q(x, y) dy = Ux dx + Uy dy = 0 and, because we’ve got the functional form of U, we’ll get, once again, an implicit function in y and x, which we may or may not be able to solve for y(x).
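A minimal sketch of the exactness check on a made-up example, (2xy) dx + (x2 + 1) dy = 0:

```python
import sympy as sp

# The made-up equation (2*x*y) dx + (x**2 + 1) dy = 0.
x, y = sp.symbols('x y')
P = 2 * x * y      # coefficient of dx
Q = x**2 + 1       # coefficient of dy

# Exactness condition: dP/dy must equal dQ/dx (both are 2x here).
print(sp.diff(P, y) == sp.diff(Q, x))    # True

# U with U_x = P and U_y = Q is U = x**2*y + y, so the implicit
# solution is x**2*y + y = C, i.e. y = C/(x**2 + 1). Check with dsolve:
yf = sp.Function('y')
print(sp.dsolve(sp.Eq(2 * x * yf(x) + (x**2 + 1) * yf(x).diff(x), 0), yf(x)))
# Eq(y(x), C1/(x**2 + 1))
```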

Are you still with me? [If not, read again. :-)]

So, we’ve got three different types of first-order DEs here: linear, separable, and exact. Are there any other types? Well… Yes.

Yes of course! Just write down any random equation with a first-order derivative in it – don’t think: just do it – and then look at what you’ve jotted down and compare its form with the form of the equations above: the probability that it will not fit into any of the three mentioned categories is ‘rather high’, as the Brits would say – euphemistically. 🙂

That being said, it’s also quite probable that a good substitution of the variable could make it ‘fit’. In addition, we have not exhausted our typology of first-order DEs as yet and, hence, we’ve not exhausted our repertoire of methods to solve them either.

For example, if we would find that the conditions for exactness for the equation P(x, y) dx + Q(x, y) dy = 0 are not fulfilled, we could still solve that equation if another condition would turn out to be true: if the functions P(x, y) and Q(x, y) would happen to be homogeneous, i.e. P(x, y) and Q(x, y) would both happen to satisfy the equality P(ax, ay) = ar P(x, y) and Q(ax, ay) = ar Q(x, y) (i.e. they are both homogeneous functions of degree r), then we can use the substitution v(x) = y/x (i.e. y = vx) and transform the equation into a separable one, which we can then solve for v.

Indeed, the substitution yields dv/dx = [F(v)-v]/x, and so that’s nicely separable. We can then find y, after we’ve solved the equation, by substituting v for y/x again. I’ll refer to the Wikipedia article on homogeneous functions for the proof that, if P(x, y) and Q(x, y) are homogeneous indeed, we can write the differential equation as:

dy/dx = M(x, y)/N(x, y) = F(y/x) or, in short, y’ = F(y/x)

[…]

Hmm… OK. What’s next? That condition of homogeneity which we are imposing here is quite restrictive too, isn’t it?

It is: the vast majority of M(x, y) and N(x, y) functions will not be homogeneous and so then we’re stuck once again. But don’t worry, the mathematician’s repertoire of substitutions is vast, and so there’s plenty of other stuff out there which we can try – if we’d remember it at least 🙂 .

Indeed, another nice example of a type of equation which can be made separable through the use of a substitution are equations of the form y’ = G(ax + by), which can be rewritten as a separable equation by substituting v = ax + by. If we do this substitution, we can then rewrite the equation – after some re-arranging of the terms at least – as dv/dx = a + b G(v), and so that’s, once again, an equation which is separable and, hence, solvable. Tick! 🙂

Finally, we can also solve DEs which come in the form of a so-called Bernoulli equation through another clever substitution. A Bernoulli equation is a non-linear differential equation in the form:

y’ + F(x) y = G(x) yn

The problem here is, obviously, that exponent n in the right-hand side of the equation (i.e. the exponent of y), which makes the equation very non-linear indeed. However, it turns out that, if one makes the substitution v = y1-n, we are back in the linear situation and so we can then use the method for the linear case (i.e. the use of an integrating factor). [If you want to try this without consulting a math textbook, then don’t forget that v’ will be equal to v’ = (1-n)y-ny’ (so y-ny’ = v’/(1-n)), and also that you’ll need to rewrite the equation as y-ny’ + F(x) y1-n = G(x) before doing that substitution. Of course, also remember that, after the substitution, you’ll still have to solve the linear equation, so then you need to know how to use that integrating factor. Good luck! 🙂 For a shortcut, see the sketch below.]
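Here is a minimal sketch on a made-up Bernoulli equation, y’ + y = x·y2 (so n = 2, and v = y–1 turns it into the linear equation v’ – v = –x):

```python
import sympy as sp

# A made-up Bernoulli equation with n = 2: y' + y = x*y**2.
x = sp.symbols('x')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(x).diff(x) + y(x), x * y(x)**2), y(x))
print(sol)
# Doing the v = 1/y substitution by hand gives v = x + 1 + C*e**x,
# i.e. y(x) = 1/(x + 1 + C*e**x) -- up to how sympy writes the constant.
```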

OK. I understand you’ve had enough by now. So what’s next? Well, frankly, this is not so bad as far as first-order differential equations go. I actually covered a lot of terrain here, although Mathews and Walker go much, much further (so don’t worry: I know what to do in the days ahead!).

The thing now is to get good at solving these things, and to understand how to model physical systems using such equations. But so that’s something which is supposed to be fun: it should be all about “discovering why things fall to the ground, how sound travels through walls and how many wonderful inventions exist thanks to physics” indeed.

Too bad that, in order to do that, one has to do quite some detour!

Post Scriptum: The term ‘homogeneous’ is quite confusing: there is also the concept of linear homogeneous differential equations and it’s not the same thing as a homogeneous first-order differential equation. I find it one of the most striking examples of how the same word can mean entirely different things even in mathematics. What’s the difference?

Well… A homogeneous first-order DE is actually not linear. See above: a homogeneous first-order DE is an equation in the form dy/dx = M(x, y)/N(x, y). In addition, there’s another requirement, which is as important as the form of the DE, and that is that M(x, y) and N(x, y) should be homogeneous functions, i.e. they should have that F(ax, ay) = ar F(x, y) property. In contrast, a linear homogeneous DE is, in the first place, a linear DE, so its general form must be L(y) = an(x) y(n) + an-1(x) y(n-1) + … + a1(x) y’ + a0(x) y = q(x) (so L(y) must be a linear combination whose terms have coefficients which may be constants but, more often than not, will be functions of the variable x). In addition, it must be homogeneous, and this means – in this context at least – that q(x) is equal to zero (so q(x) is equal to the constant 0). So we’ve got L(y) = 0 or, if we’d use the y’ + F(x) y = G(x) formulation, we have y’ + F(x) y = 0 (so the G(x) function in the more general form of a linear first-order DE is equal to zero).

So is this yet another type of differential equation? No. A linear homogeneous DE is, in the first place, linear, 🙂 so we can solve it with that method I mentioned above already, i.e. we should introduce an integrating factor. An integrating factor is a new function λ(x), which helps us – after we’ve multiplied the whole equation with this λ(x) – to solve the equation. However, while the procedure is not difficult at all, its explanation is rather lengthy and, hence, I’ll skip that and just refer my imaginary readers here to the Web.

But, now that we’re here, let me quickly complete my typology of first-order DEs and introduce a generalization of the (first) notion of homogeneity, and that’s isobaric differential equations.

An isobaric DE is an equation which has the same general form as the homogeneous (first-order) DE, so an isobaric DE looks like dy/dx = F(x, y), but we have a more general condition than homogeneity applying to F(x, y), namely the property of isobarity (which is another word with multiple meanings but let us not be bothered by that). An isobaric function F(x, y) satisfies the following equality: F(ax, ar y) = ar-1 F(x, y), and it can be shown that the isobaric differential equation dy/dx = F(x, y), i.e. a DE of this form with F(x, y) being isobaric, becomes separable when using the y = vxr substitution.

OK. You’ll say: So what? Well… Nothing much I guess. 🙂

Let me wrap up by noting that we also have the so-called Clairaut equations as yet another type of first-order DEs. Clairaut equations are first-order DEs in the form y – xy’ = F(y’). When we differentiate both sides, we get y”(F'(y’) + x) = 0.

Now, this equation holds if (i) y” = 0 or (ii) F'(y’) + x = 0 (or both obviously). Solving (i), i.e. y” = 0, yields a family of (infinitely many) straight-line functions y = ax + b as the general solution – note that, plugging y = ax + b back into the equation, we get b = F(a), so the lines are really y = ax + F(a) – while solving (ii) yields only one solution, the so-called singular solution, whose graph is the envelope of the graphs of the general solution. The graph below shows these solutions for the square and cube functional forms respectively (so the solutions for y – xy’ = [y’]2 and y – xy’ = [y’]3 respectively).

[Illustrations: Clairaut equation solutions for f(t) = t^2 and f(t) = t^3]

For the F(y’) = [y’]2 functional form, you have a parabola (i.e. the graph of a quadratic function indeed) as the envelope of all of the straight lines. As for the F(y’) = [y’]3 function, well… I am not sure. It reminds me of those plastic French curves we used as little kids to make all kinds of silly drawings. It also reminds me of those drawings we had to make in high school on engineering graph paper using an expensive 0.1 or 0.05 mm pen. 🙂
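And here is a minimal sketch reproducing that picture for the F(y’) = [y’]2 case: the general solution is the family of straight lines y = cx + c2, and the envelope works out to the parabola y = –x2/4:

```python
import numpy as np
import matplotlib.pyplot as plt

# Clairaut equation y - x*y' = (y')**2: general solution y = c*x + c**2,
# singular solution (the envelope of all those lines) y = -x**2/4.
x = np.linspace(-4, 4, 200)
for c in np.linspace(-2, 2, 17):
    plt.plot(x, c * x + c**2, color='steelblue', lw=0.6)
plt.plot(x, -x**2 / 4, color='red', lw=2, label='singular solution')
plt.ylim(-5, 6)
plt.legend()
plt.show()
```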

In any case, we’ve got quite a collection of first-order DEs now – linear, separable, exact, homogeneous, Bernoulli-type, isobaric, Clairaut-type, … – and so I think I should really stop now. Remember I haven’t started talking about higher-order DEs (e.g. second-order DEs) as yet, and I haven’t talked about partial differential equations either, and so you can imagine that the universe of differential equations is much, much larger than what this brief overview here suggests. Expect much more to come as I’ll dig into it!

Post Scriptum 2: There is a second thing I wanted to jot down somewhere, and this post may be the appropriate place. Let me ask you something: have you never wondered why the same long S symbol (i.e. the summation or integration symbol ∫) is used to denote both definite and indefinite integrals? I did. I mean the following: when we write ∫f(x)dx or ∫[a, b] f(x)dx, we refer to two very different things, don’t we? Things that, at first sight, have nothing to do with each other.

Huh? 

Well… Think about it. When we write ∫f(x)dx, then we actually refer to infinitely many functions F1(x), F2(x), F3(x), etcetera (we generally write them as F(x) + c, because they differ by a constant only) which all belong to the same ‘family’ because they all have the same derivative, namely that function f(x) in the integrand. So we have F1‘(x) = F2‘(x) = F3‘(x) = … = F'(x) = f(x). The graphs of these functions cover the whole plane, and we can say all kinds of things about them, but it is not obvious that these functions can be related to some sum, finite or infinite. Indeed, when we look for those functions by solving, for example, an integral such as ∫(xe6x+x5/3+√x)dx, we use a lot of rules and various properties of functions (this one will involve integration by parts for example) but nothing of that reminds us, not even remotely, of doing some kind of finite or infinite sum.

On the other hand, ∫[a, b] f(x)dx, i.e. the definite integral of f(x) over the interval [a, b], yields a real number with a very specific meaning: it’s the area between point a and point b under the graph y = f(x), and the long S symbol (i.e. the summation symbol ∫) is particularly appropriate because the expression ∫[a, b] f(x)dx stands for an infinite sum indeed. That’s why Leibniz chose the symbol back in 1675!

Let me give an example here. Let x be the distance which an object has traveled since we started observing it. Now, that distance is equal to an infinite sum which we can write as ∑v(t)Δt. What we do here amounts to multiplying the speed v at time t, i.e. v(t), with (the length of) the time interval Δt over an infinite number of little time intervals, and then we sum all those products to get the total distance. If we use the differential notation (d) for infinitesimally small quantities (dv, dx, dt etcetera), then this distance x will be equal to the sum of all little distances dx = v(t)dt. So we have an infinite sum indeed which, using the long S (i.e. Leibniz’s summation symbol), we can write as ∑v(t)dt = ∑dx = ∫[0, t]v(t)dt = ∫[0, t]dx = x(t).

The illustration below gives an idea of how this works. The black curve is the v(t) function, so velocity (vertical axis) as a function of time (horizontal axis). Don’t worry about the function going negative: negative velocity would mean that we allow our object to reverse direction. As you can see, the value of v(t) is the (approximate) height of each of these rectangles (note that we take irregular partitions here, but that doesn’t matter), and then just imagine that the time intervals Δt (i.e. the width of the rectangular areas) become smaller and smaller – infinitesimally small in fact.

[Illustration: a Riemann sum, i.e. rectangles of height v(t) and width Δt under the curve, with an irregular partition of the time axis]

I guess I don’t need to be more explicit here. The point is that we have such infinite sum interpretation for the definite integral only, not for an indefinite one. So why would we use the same summation symbol ∫ for the indefinite integral? Why wouldn’t we use some other symbol for it (because it is something else, isn’t it?)? Or, if we wouldn’t want to introduce any new symbols (because we’ve got quite a bunch already here), then why wouldn’t we combine the common inverse function symbol (i.e. f⁻¹) and the differentiation operator D (also written as Dx or d/dx), so we would write D⁻¹f(x) or Dx⁻¹f(x) instead of ∫f(x)dx? If we would do that, we would write the Fundamental Theorem of Calculus, which you obviously know (as you need it to solve definite integrals), as:

∫[a, b] f(x)dx = [D⁻¹f(x)](b) – [D⁻¹f(x)](a) = F(b) – F(a)

You have seen this formula, haven’t you? Except for the D⁻¹f(x) notation of course. This Theorem tells us that, to solve the definite integral on the left-hand side, we should just (i) take an antiderivative of f(x) (and it really doesn’t matter which one because the constant c will appear twice in the F(b) – F(a) expression, as c – c = 0 to be precise, and, hence, this constant just vanishes, regardless of its value), (ii) plug in the values a and b, (iii) subtract one from the other (i.e. F(a) from F(b), not the other way around, otherwise we’ll have the sign of the integral wrong), and there we are: we’ve got the answer, for our definite integral that is.

But so I am not using the standard ∫ symbol for the antiderivative above. I am using… well… a new symbol, D⁻¹, which, in my view, makes it clear what we have to do, and that is to find an antiderivative of f(x) so we can solve that definite integral. [Note that, if we’d want to keep track of what variable we’re integrating over (in case we’d be dealing with partial differential equations for instance, or if it would not be sufficiently clear from the context), we should use the Dx⁻¹ notation, rather than just D.]

OK. You may think this is hairsplitting. What’s in a name after all? Or in a symbol in this case? Well… In math, you need to make sure that your notations make perfect sense and that you don’t write things that may be confusing.

That being said, there’s actually a very good reason to re-use the long S symbol for indefinite integrals also.

Huh? Why? You just said the definite and indefinite integral are two very different things and so that’s why you’d rather see that new D⁻¹f(x) notation instead of ∫f(x)dx!?

Well… Yes and no. You may or may not remember from your high school course in calculus or analysis that, in order to get to that fundamental theorem of calculus, we need the following ‘intermediate’ result: IF we define a function F(x) in some interval [a, b] as F(x) = ∫[a, x] f(t)dt (so a ≤ x ≤ b and a ≤ t ≤ x) – so, in other words, we’ve got a definite integral here with some fixed value a as the lower boundary but with the variable x itself as the upper boundary (so we have x instead of the fixed value b, and b now only serves as the upper limit of the interval over which we’re defining this new function F(x) here) – THEN it’s easy to show that the derivative of this F(x) function will be equal to f(x), so we’ll find that F′(x) = f(x).

In other words, F(x) = ∫[a, x] f(t)dt is, quite obviously, one of the (infinitely many) antiderivatives of f(x), and if you’d wonder which one, well… That obviously depends on the value of a that we’d be picking. So there actually is a pretty straightforward relationship between the definite and indefinite integral: we can find an antiderivative F(x) + c of a function f(x) by evaluating a definite integral from some fixed point a to the variable x itself, as illustrated below.

[Illustration: the relation between the definite and indefinite integral, with x0 as the fixed lower boundary]

Now, remember that we just need one antiderivative to solve a definite integral, not the whole family, and which one we’ll get will depend on that value a (or x0, as that fixed point is referred to in the formula used in the illustration above), so it will depend on what choice we make there for the lower boundary. Indeed, you can work that out for yourself by just solving ∫[x0, x] f(t)dt for two different values of x0 (i.e. a and b in the example below):

∫[a, x] f(t)dt = F(x) – F(a) versus ∫[b, x] f(t)dt = F(x) – F(b): the same family F(x) + c, just a different constant

The point is that we can get all of the antiderivatives of f(x) through that definite integral: it just depends on a judicious choice of x0 but so you’ll get the same family of functions F(x) + c. Hence, it is logical to use the same summation symbol, but with no bounds mentioned, to designate the whole family of antiderivatives. So, writing the Fundamental Theorem of Calculus as

∫[a, b] f(x)dx = F(b) – F(a), with ∫f(x)dx = F(x) + c

instead of that alternative with the D⁻¹f(x) notation does make sense. 🙂

Let me wrap up this conversation by noting that the above-mentioned ‘intermediate’ result (I mean F(x) = ∫[a, x] f(t)dt with F′(x) = f(x) here) is actually not ‘intermediate’ at all: it is equivalent to the fundamental theorem of calculus itself (indeed, the author of the Wikipedia article on the fundamental theorem of calculus presents the expression above as a ‘corollary’ to the F(x) = ∫[a, x] f(t)dt result, which he or she presents as the theorem itself). So, if you’ve been able to prove the ‘intermediate’ result, you’ve also proved the theorem itself. One can easily see that by verifying the identities below:

∫[a, b] f(t)dt = ∫[a, x] f(t)dt + ∫[x, b] f(t)dt = [F(x) – F(a)] + [F(b) – F(x)] = F(b) – F(a)

Huh? Is this legal? It is. Just jot down a graph with some function f(t) and the values a, x and b, and you’ll see it all makes sense. 🙂

An easy piece: Ordinary Differential Equations (I)

Pre-scriptum (dated 26 June 2020): In pre-scriptums for my previous posts on math, I wrote that the material in posts like this remains interesting but that one, strictly speaking, does not need it to understand quantum mechanics. This post is a little bit different: one has to understand the basic concept of a differential equation as well as the basic solution methods. So, yes, it is a prerequisite. :-/

Original post:

Although Richard Feynman’s iconic Lectures on Physics are best read together, as an integrated textbook that is, smart publishers bundled some of the lectures in two separate publications: Six Easy Pieces and Six Not-So-Easy Pieces. Well… Reading Penrose has been quite exhausting so far and, hence, I feel like doing an easy piece here – just for a change. 🙂

In addition, I am half-way through this graduate-level course on Complex Variables and Applications (from McGraw-Hill’s Brown–Churchill series) but I feel that I will gain much more from the remaining chapters (which are focused on applications) if I’d just branch off for a while and first go through another classic graduate-level course dealing with math, but perhaps with some more emphasis on physics. A quick check reveals that Mathematical Methods of Physics, written by Jon Mathews and R.L. Walker, will probably fit the bill. This textbook is used as a graduate course at the University of Chicago and, in addition, Mathews and Walker were colleagues of Feynman and, hence, their course should dovetail nicely with Feynman’s Lectures: that’s why I bought it when I saw this 2004 reprint for the Indian subcontinent in a bookshop in Delhi. [As for Feynman’s Lectures, I wouldn’t recommend these Lectures if you want to know more about quantum mechanics, but for classical mechanics and electromagnetism/electrodynamics they’re still great.]

So here we go: Chapter 1, on Differential Equations.

Of course, I mean ordinary differential equations, so things with one dependent and one independent variable only, as opposed to partial differential equations, which have partial derivatives (i.e. terms with ∂ symbols in them, as opposed to the d used in dy and dx) because there’s more than one independent variable. We’ll need to get into partial differential equations soon enough, if only because wave equations are partial differential equations, but let’s start with the start.

While I thought I knew a thing or two about differential equations from my graduate-level courses in economics, I’ve discovered many new things already. One of them is the concept of a slope field, or a direction field. Below are some examples I took from Paul’s Online Notes in Mathematics (http://tutorial.math.lamar.edu/Classes/DE/DirectionFields.aspx), a source I warmly recommend (the author’s full name is Paul Dawkins, and he developed these notes for Lamar University, Texas):

[Illustrations: three direction fields from Paul’s Online Notes]

These things are great: they helped me to understand what a differential equation actually is. So what is it then? Well, let’s take the example of the first graph. That example models the following situation: we have a falling object with mass m (so the force of gravity acts on it) but its fall gets slowed down because of air resistance. So we have two forces FG and FA acting on the object, as depicted below:

[Illustration: the forces FG and FA acting on the falling object of mass m]

Now, the force of gravity is proportional to the mass m of the falling object, with the factor of proportionality equal to the acceleration of gravity of course. So we have FG = mg with g = 9.8 m/s². [Note that forces are measured in newtons and 1 N = 1 (kg)(m)/(s²).]

The force due to air resistance has a negative sign because it acts like a brake and, hence, it has the opposite direction of the gravity force. The example assumes that it is proportional to the velocity v of the object, which seems reasonable enough: if it goes faster and faster, the air will slow it down more and more, so we have FA = –γv, with v = v(t) the velocity of the object and γ some (positive) constant representing the factor of proportionality for this force. [In fact, the force due to air resistance is usually referred to as the drag, and it is proportional to the square of the velocity, but so let’s keep it simple here.]

Now, when things are being modeled like this, I find that the most difficult thing is to keep track of what depends on what exactly. For example, it is obvious that, in this example, the total force on the object will also depend on the velocity and so we have a force here which we should write as a function of both time and velocity. Newton’s Law of Motion (the Second Law to be precise, i.e. ma = m(dv/dt) = F) thus becomes

m(dv/dt) = F(t, v) = mg – γv(t).

Note the peculiarity of this F(t, v) function: in the end, we will want to write v(t) as an explicit function of t, but so here we write F as a function with two separate arguments t and v. So what depends on what here? What does this equation represent really?

Well… The equation does surely not represent one or the other implicit function: an implicit function, such as x² + y² = 1 for example (i.e. the unit circle), is still a function: it associates one of the variables (usually referred to as the value) to the other variables (the arguments). But, surely, we have that too here? No. If anything, a differential equation represents a family of functions, just like an indefinite integral.

Indeed, you’ll remember that an indefinite integral ∫f(x)dx represents all functions F(x) for which F′(x) = dF(x)/dx = f(x). These functions are, for a very obvious reason, referred to as the anti-derivatives of f(x) and it turns out that all these antiderivatives differ from each other by a constant only, so we can write ∫f(x)dx = F(x) + c, and so the graphs of all the antiderivatives of a given function are, quite simply, vertical translations of each other, i.e. their vertical location depends on the value of c. I don’t want to anticipate too much, but so we’ll have something similar here, except that our ‘constant’ will usually appear in a somewhat more complicated format such as, in this example, as v(t) = 50 + c·e^(–0.196t). So we also have a family of primitive functions v(t) here, which differ from each other by the constant c (and, hence, are ‘indefinite’ so to say), but so when we would graph this particular family of functions, their vertical distance will not only depend on c but also on t. But let us not run before we can walk.

The thing to note – and to always remember when you’re looking at a differential equation – is that the equation itself represents a world of possibilities, or parallel universes if you want :-), but, also, that it’s in only one of them that things are actually happening. That’s why differential equations usually have an infinite number of general (or possible) solutions but only one of these will be the actual solution, and which one that is will depend on the initial conditions, i.e. where we actually start from: is the object at rest when we start looking, is it in equilibrium, or is it somewhere in-between?

What we know for sure is that, at any one point of time t, this object can only have one velocity, and, because it’s also pretty obvious that, in the real world, t is the independent variable and v the dependent one (the velocity of our object does not change time itself), we can thus write v = v(t) = du/dt indeed. [The variable u = u(t) is the vertical position of the object and its velocity is, obviously, the rate of change of this vertical position, i.e. the derivative with regard to time.]

So that’s the first thing you should note about these direction fields: we’re trying to understand what is going on with these graphs and so we identify the dependent variable with the y axis and the independent variable with the x axis, in line with the general convention that such graphs will usually depict a y = y(x) relationship. In this case, we’re interested in the velocity of the object (not its position), and so v = v(t) is the variable on the y axis of that first graph.

Now, there’s a world of possibilities out there indeed, but let’s suppose we start watching when the object is at rest, i.e. we have v(t) = v(0) = 0 and so that’s depicted by the origin point. Let’s also make it all more real by assigning the values m = 2 kg and γ = 0.392 to m and γ in Newton’s formula. [In case you wonder where this value for γ comes from, note that it is 1/25 of the value of g, and so it’s just a number to make sure the solution for v(t) is a ‘nice’ number, i.e. an integer instead of some decimal. In any case, I am taking this example from Paul’s Online Notes and I won’t try to change it.]

So we start at point zero with zero velocity but so now we’ve got the force F with us. 🙂 Hence, the object’s velocity v(t) will not stay zero. As the clock ticks, its movement will respect Newton’s Law, i.e. m(dv/dt) = F(t, v), which is m(dv/dt) = mg – γv(t) in this case. Now, if we plug in the above-mentioned values for m and γ (as well as the 9.8 approximation for g), we get dv(t)/dt = 9.8 – 0.196v(t) (we brought m over to the other side, and so then it becomes 1/m on the right-hand side).

Now, let’s insert some values into this equation. Let’s first take the value v(0) = 0, i.e. our point of departure. We obviously get dv(0)/dt = 9.8 – 0.196·0 = 9.8 (so that’s close to 10 but not quite).

Let’s take another value for v(0). If v(0) would be equal to 30 m/s (this means that the object is already moving at a speed of 30 m/s when we start watching), then we’d get a value for dv/dt of 3.92, which is much less – but so that reflects the fact that, at such speed, air resistance is counteracting gravity.

Let’s take yet another value for v(0). Let’s take 100 now for example: we get dv/dt = – 9.8.

Ooops! What’s that? Minus 9.8? A negative value for dv/dt? Yes. It indicates that, at such high speed, air resistance is actually slowing down the object. [Of course, if that’s the case, then you may wonder how it got to go so fast in the first place but so that’s none of our own business: maybe it’s an object that got launched up into the air instead of something that was dropped out of an airplane. Note that a speed of 100 m/s is 360 km/h so we’re not talking any supersonic launch speeds here.]

OK. Enough of that kids’ stuff now. What’s the point?

Well, it’s these values for dv/dt (so these values of 9.8, 3.92, -9.8 etcetera) that we use for that direction field, or slope field as it’s often referred to. Note that we’re currently considering the world of possibilities, not the actual world so to say, and so we are contemplating any possible combination of v and t really.

Also note that, in this particular example that is, it’s only the value of v that determines the value of dv/dt, not the value of t. So, if, at some other point in time (e.g. t = 3), we’d be imagining the same velocities for our object, i.e. 0 m/s, 30 m/s or 100 m/s, we’d get the same values 9.8, 3.92 and -9.8 for dv/dt. So the little red arrows which represent the direction field all have the same magnitude and the same direction for equal values of v(t). [That’s also the case in the second graph above, but not for the third graph, which presents a far more general case: think of a changing electromagnetic field for instance. A second footnote to be made here concerns the length – or magnitude – of these arrows: they obviously depend on the scale we’re using but so they do reflect the values for dv/dt we calculated.]

So that slope field, or direction field, i.e. all of these little red arrows, represents the fact that the world of possibilities, or all parallel universes which may exist out there, have one thing in common: they all need to respect Newton or, at the very least, his m(dv/dt) = mg – γv(t) equation which, in this case, is dv(t)/dt = 9.8 – 0.196v(t). So, wherever we are in this (v, t) space, we look at the nearest arrow and it will tell us how our speed v will change as a function of t.

As you can see from the graph, the slope of these little arrows (i.e. dv/dt) is negative above the v(t) = 50 m/s line, and positive underneath it, and so we should not be surprised that, when we try to calculate at what speed dv/dt would be equal to zero (we do this by writing 9.8 – 0.196v(t) = 0), we find that this is the case if and only if v(t) = 9.8/0.196 = 50 indeed. So that looks like the stable situation: indeed, you’ll remember that derivatives reflect the rate of change, and so when dv/dt = 0, it means the object won’t change speed.

Now, the dynamics behind the graph are obviously clear: above the v(t) = 50 m/s line, the object will be slowing down, and underneath it, it will be speeding up. At the 50 m/s line itself, the gravity and air resistance forces will balance each other and the object’s speed will be constant – that is until it hits the earth of course :-).

So now we can have a look at these blue lines on the graph. If you understood something of the lengthy story about the red arrows above, then you’ll also understand, intuitively at least, that the blue lines on this graph represent the various solutions to the differential equation. Huh? Well. Yes.

The blue lines show how the velocity of the object will gradually converge to 50 m/s, and that the actual path being followed will depend on our actual starting point, which may be zero, less than 50 m/s, or equal or more than 50 m/s. So these blue lines still represent the world of possibilities, or all of the possible parallel universes, but so one of them – and one of them only – will represent the actual situation. Whatever that actual situation is (i.e. whatever point we start at when t = 0), the dynamics at work will make sure the speed converges to 50 m/s, so that’s the longer-term equilibrium for this situation. [Note that all is relative of course: if the object is being dropped out of a plane at an altitude of two or three km only, then ‘longer-term’ means like a minute or so, after which time the object will hit the ground and so then the equilibrium speed is obviously zero. :-)]

OK. I must assume you’re fine with the intuitive interpretation of these blue curves now. But so what are they really, beyond this ‘intuitive’ interpretation? Well, they are the solutions to the differential equation really and, because these solutions are found through an integration process indeed, they are referred to as the integral curves. I have to refer my imaginary reader here to Paul’s Notes (or any other math course) as to how exactly that integration process works (it’s not as easy as you might think) but the equation for these blue curves is

v(t) = 50 + c·e^(–0.196t)

In this equation, we have Euler’s number e (so that’s the irrational number e = 2.718281… etcetera) and also a constant c which depends on the initial conditions indeed. The graph below shows some of these curves for various values of c. You can calculate some more yourself of course. For example, if we start at the origin indeed, so if we have zero speed at t = 0, then we have v(0) = 50 + c·e^(–0.196·0) = 50 + c·e^0 = 50 + c and, hence, c = –50 will represent that initial condition. [And, yes, please do note the similarity with the graphs of the antiderivatives (i.e. the indefinite integral) of a given function, because the c in that v(t) function is, effectively, the result of an integration process.]

[Graph: the integral curves v(t) = 50 + c·e^(–0.196t) for various values of c]

So that’s it really: the secret behind differential equations has been unlocked. There’s nothing more to it.

Well… OK. Of course we still need to learn how to actually solve these differential equations, and we’ll also have to learn how to solve partial differential equations, including equations with complex numbers as well obviously, and so on and so on. Even those other two ‘simple’ situations depicted above (see the two other graphs) are obviously more intimidating already (the second graph involves three equilibrium solutions – one stable, one unstable and one semi-stable – while the third graph shows not all situations have equilibrium solutions). However, I am sure I’ll get through it: it has been great fun so far, and what I read so far (i.e. this morning) is surely much easier to digest than all the things I wrote about in my other posts. 🙂

In addition, the example did involve two forces, and so it resembles classical electrodynamics, in which we also have two forces, the electric and magnetic force, which generate force fields that influence each other. However, despite all the complexities, it is fair to say that, when push comes to shove, understanding Maxwell’s equations is a matter of understanding a particular set of partial differential equations. However, I won’t dwell on that now. My next post might consist of a brief refresher on all of that but I will probably first want to move on a bit with that course of Mathews and Walker. I’ll keep you posted on progress. 🙂

Post scriptum:

My imaginary reader will obviously note that this direction field looks very much like a vector field. In fact, it obviously is a vector field. Remember that a vector field assigns a vector to each point, and so a vector field in the plane is visualized as a collection of arrows indeed, with a given magnitude and direction attached to a point in the plane. As Wikipedia puts it: ‘vector fields are often used to model the strength and direction of some force, such as the electrostatic, magnetic or gravitational force.’ And so, yes, in the example above, we’re indeed modeling a force obeying Newton’s law: the change in the velocity of the object (i.e. the factor a = dv/dt in the F = ma equation) is proportional to the force (which is a force combining gravity and drag in this example), and the factor of proportionality is the inverse of the object’s mass (a = F/m and, hence, the greater its mass, the less a body accelerates under a given force). [Note that the latter remark just underscores the fact that Newton’s formula shows that mass is nothing but a measure of the object’s inertia, i.e. its resistance to being accelerated or to changing its direction of motion.]

A second post scriptum point to be made, perhaps, is my remark that solving that dv(t)/dt = 9.8 – 0.196v(t) equation is not as easy as it may look. Let me qualify that remark: it actually is an easy differential equation, but don’t make the mistake of just putting an integral sign in front and writing something like ∫(0.196v + v′)dv = ∫9.8 dv, to then solve it as 0.098v² + v = 9.8v + c, which is equivalent to 0.098v² – 8.8v + c = 0. That’s nonsensical because it does not give you v as an implicit or explicit function of t and so it’s a useless approach: it just yields a quadratic function in v which may or may not have any physical interpretation.

So should we, perhaps, use t as the variable of integration on one side and, hence, write something like ∫(0.196v + v′)dv = ∫9.8 dt? We then find 0.098v² + v = 9.8t + c, and so that looks good, doesn’t it? No. It doesn’t. That’s worse than that other quadratic expression in v (I mean the one which didn’t have t in it), and a lot worse, because it’s not only meaningless but wrong, very wrong. Why? Well, you’re using a different variable of integration (v versus t) on both sides of the equation and you can’t do that: you have to apply the same operation to both sides of the equation, whether that’s multiplying it with some factor or bringing one of the terms over to the other side (which actually amounts to subtracting the same term from both sides) or integrating both sides: we have to integrate both sides over the same variable indeed.

But – hey! – you may remember that’s what we do when differential equations are separable, isn’t it? And so that’s the case here, isn’t it? We’ve got all the y’s on one side and all the x’s on the other side of the equation here, don’t we? And so then we surely can integrate one side over y and the other over x, isn’t it? Well… No. And yes. For a differential equation to be separable, all the x‘s and all the y’s must be nicely separated on both sides of the equation indeed, but all the y’s in the differential equation (so not just one of them) must be part of the product with the derivative. Remember, a separable equation is an equation in the form of B(y)(dy/dx) = A(x), with B(y) some function of y indeed, and A(x) some function of x, but so the whole B(y) function is multiplied with dy/dx, not just one part of it. If, and only if, the equation can be written in this form, we can (a) integrate both sides over x but (b) also use the fact that ∫[B(y)dy/dx]dx = ∫B(y)dy. So, it looks like we’re effectively integrating one part (or one side) of the equation over the dependent variable y here, and the other over x, but the condition for being allowed to do so is that the whole B(y) function can be written as a factor in a product involving the dy/dx derivative. Is that clear? I guess not. 😦 But then I need to move on.

The lesson here is that we always have to make sure that we write the differential equation in its ‘proper form’ before we do the integration, and we should note that the ‘proper form’ usually depends on the method we’re going to select to solve the equation: if we can’t write the equation in its proper form, then we can’t apply the method. […] Oh… […] But so how do we solve that equation then? Well… It’s done using a so-called integrating factor but, just as I did in the text above already, I’ll refer you to a standard course on that, such as Paul’s Notes indeed, because otherwise my posts would become even longer than they already are, and I would have even less imaginary readers. 🙂

Compactifying complex spaces

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics. :-/

Original post:

In this post, I’ll try to explain how Riemann surfaces (or topological spaces in general) are transformed into compact spaces. Compact spaces are, in essence, closed and bounded subsets of some larger space. The larger space is unbounded – or ‘infinite’ if you want (the term ‘infinite’ is less precise – from a mathematical point of view at least).

I am sure you have all seen it: the Euclidean or complex plane gets wrapped around a sphere (the so-called Riemann sphere), and the Riemann surface of a square root function becomes a torus (i.e. a donut-like object). And then the donut becomes a coffee cup (yes: just type ‘donut and coffee cup’ and look at the animation). The sphere and the torus (and the coffee cup of course) are compact spaces indeed – as opposed to the infinite plane, or the infinite Riemann surface representing the domain of a (complex) square root function. But what does it all mean?

Let me, for clarity, start with a note on the symbols that I’ll be using in this post. I’ll use a boldface z for the complex number z = (x, y) = r·e^(iθ) in this post (unlike what I did in my previous posts, in which I often used standard letters for complex numbers), or for any other complex number, such as w = u + iv. That’s because I want to reserve the non-boldface letter z for the (real) vertical z coordinate in the three-dimensional (Cartesian or Euclidean) coordinate space, i.e. R³. Likewise, non-boldface letters such as x, y or u and v, denote other real numbers. Note that I will also use a boldface R and a boldface C to denote the set of real numbers and the complex space respectively. That’s just because the WordPress editor has its limits and, among other things, it can’t do blackboard bold (i.e. these double-struck symbols which you usually see as symbols for the set of real and complex numbers respectively). OK. Let’s go for it now.

In my previous post, I introduced the concept of a Riemann surface using the multivalued square root function w = z^(1/2) = √z. The square root function has only two values. If we write z as z = r·e^(iθ), then we can write these two values as w1 = √r·e^(iθ/2) and w2 = √r·e^(i(θ/2 ± π)). Now, √r·e^(i(θ/2 ± π)) is equal to √r·e^(±iπ)·e^(iθ/2) = –√r·e^(iθ/2) and, hence, the second root is just the opposite of the first one, so w2 = –w1.

Introducing the concept of a Riemann surface using a ‘simple’ quadratic function may look easy enough but, in fact, this square root function is actually not the easiest one to start with. First, a simple single-valued function, such as w = 1/z (i.e. the function that is associated with the Riemann sphere) for example, would obviously make for a much easier point of departure. Secondly, the fact that we’re working with a limited number of values, as opposed to an infinite number of values (which is the case for the log z function for example) introduces this particularity of a surface turning back into itself which, as I pointed out in my previous post, makes the visualization of the surface somewhat tricky – to the extent it may actually prevent a good understanding of what’s actually going on.

Indeed, in the previous post I explained how the Riemann surface of the square root function can be visualized in the three-dimensional Euclidean space (i.e. R³). However, such representations only show the real part of z^(1/2), i.e. the vertical distance Re(z^(1/2)) = √r·cos(θ/2 + nπ), with n = 0 or ±1. So these representations, like the one below for example, do not show the imaginary part, i.e. Im(z^(1/2)) = √r·sin(θ/2 + nπ) (n = 0, ±1).

That’s both good and bad. It’s good because, in a graph like this, you want one point to represent one point only, and so you wouldn’t get that if you would superimpose the plot with the imaginary part of w = z^(1/2) on the plot showing the real part only. But it’s also bad, because one often forgets that we’re only seeing some part of the ‘real’ picture here, namely the real part, and so one often forgets to imagine the imaginary part. 🙂

[Illustration: the w plane, and the real part of the Riemann surface of w = z^(1/2)]

The thick black polygonal line in the two diagrams in the illustration above shows how, on this Riemann surface (or at least its real part), the argument θ of z = r·e^(iθ) will go from 0 to 2π (and further), i.e. we’re making (more than) a full turn around the vertical axis, as the argument Θ of w = z^(1/2) = √r·e^(iΘ) makes half a turn only (i.e. Θ goes from 0 to π only). That’s self-evident because Θ = θ/2. [The first diagram in the illustration above represents the (flat) w plane, while the second one is the Riemann surface of the square root function, so the latter has, so to say, two points for every z on the flat z plane: one for each root.]

All these visualizations of Riemann surfaces (and the projections on the z and w plane that come with them) have their limits, however. As mentioned in my previous post, one major drawback is that we cannot distinguish the two distinct roots for all of the complex numbers z on the negative real axis (i.e. all the points z = r·e^(iθ) for which θ is equal to ±π, ±3π,…). Indeed, the real part of w = z^(1/2), i.e. Re(w), is equal to zero for both roots there, and so, when looking at the plot, you may get the impression that we get the same values for w there, so that the two distinct roots of z (i.e. w1 and w2) coincide. They don’t: the imaginary part of w1 and w2 is different there, so we need to look at the imaginary part of w too. Just to be clear on this: on the diagram above, it’s where the two sheets of the Riemann surface cross each other, so it’s like there’s an infinite number of branch points, which is not the case: the only branch point is the origin.

So we need to look at the imaginary part too. However, if we look at the imaginary part separately, we will have a similar problem on the positive real axis: the imaginary part of the two roots coincides there, i.e. Im(w) is zero, for both roots, for all the points z = r·e^(iθ) for which θ = 0, 2π, 4π,… That’s what is represented in the graph below.

[Graph: a cross-section of the Riemann surface of w = z^(1/2) along the real axis, showing the branch point at the origin]

The graph above is a cross-section, so to say, of the Riemann surface w = z^(1/2) that is orthogonal to the z plane. So we’re looking at the x axis from -∞ to +∞ along the y axis so to say. The point at the center of this graph is the origin obviously, which is the branch point of our function w = z^(1/2), and so the y axis goes through it but we can’t see it because we’re looking along that axis (so the y-axis is perpendicular to the cross-section).

This graph is one I made as I tried to get some better understanding of what a ‘branch point’ actually is. Indeed, the graph makes it perfectly clear – I hope 🙂 – that we really have to choose between one of the two branches of the function when we’re at the origin, i.e. the branch point. Indeed, we can pick either the n = 0 branch or the n = ±1 branch of the function, and then we can go in any direction we want as we’re traveling on that Riemann surface, but so our initial choice has consequences: as Dr. Teleman (whom I’ll introduce later) puts it, “any choice of w, followed continuously around the origin, leads, automatically, to the opposite choice as we turn around it.” For example, if we take the w1 branch (or the ‘positive’ root as I call it – even if complex numbers cannot be grouped into ‘positive’ or ‘negative’ numbers), then we’ll encounter the negative root w2 after one loop around the origin. Well… Let me immediately qualify that statement: we will still be traveling on the w1 branch but so the value of w1 will be the opposite or negative value of our original w1 as we add 2π to arg z = θ. Mutatis mutandis, we’re in a similar situation if we’d take the w2 branch. Does that make sense?

Perhaps not, but I can’t explain it any better. In any case, the gist of the matter is that we can switch from the w1 branch to the w2 branch at the origin, and also note that we can only switch like that there, at the branch point itself: we can’t switch anywhere else. So there, at the branch point, we have some kind of ‘discontinuity’, in the sense that we have a genuine choice between two alternatives.

That’s, of course, linked to the fact that one cannot define the value of our function at the origin: 0 is not part of the domain of the (complex) square root function, or of the (complex) logarithmic function in general (remember that our square root function is just a special case of the log function) and, hence, the function is effectively not analytic there. So it’s like what I said about the Riemann surface for the log z function: at the origin, we can ‘take the elevator’ to any other level, so to say, instead of having to walk up and down that spiral ramp to get there. So we can add or subtract ± 2nπ to θ without any sweat.

So here it’s the same. However, because it’s the square root function, we’ll only see two buttons to choose from in that elevator, and our choice will determine whether we get out at level Θ = α (i.e. the wbranch) or at level Θ = α ± π (i.e. the wbranch). Of course, you can try to push both buttons at the same time but then I assume that the elevator will make some kind of random choice for you. 🙂 Also note that the elevator in the log z parking tower will probably have a numpad instead of buttons, because there’s infinitely many levels to choose from. 🙂

OK. Let’s stop joking. The idea I want to convey is that there’s a choice here. The choice made determines whether you’re going to be looking at the ‘positive’ roots of z, i.e. √r(cosΘ + i·sinΘ), or at the ‘negative’ roots of z, i.e. √r(cos(Θ±π) + i·sin(Θ±π)), or, equivalently (because Θ = θ/2), if you’re going to be looking at the values of w for θ going from 0 to 2π, or the values of w for θ going from 2π to 4π.

Let’s try to imagine the full picture and think about how we could superimpose the graphs of both the real and imaginary part of w. The illustration below should help us to do so: the blue and red image should be shifted over and across each other until they overlap completely. [I am not doing it here because I’d have to make one surface transparent so you can see the other one behind – and that’s too much trouble now. In addition, it’s good mental exercise for you to imagine the full picture in your head.]  

[Illustration: the real and imaginary parts of w = z^(1/2), shown as two separate surfaces that should be superimposed]

It is important to remember here that the origin of the complex z plane, in both images, is at the center of these cuboids (or ‘rectangular prisms’ if you prefer that term). So that’s what the little red arrow is pointing at in both images and, hence, the final graph, consisting of the two superimposed surfaces (the imaginary and the real one), should also have one branch point only, i.e. at the origin.

[…]

I guess I am really boring my imaginary reader here by being so lengthy but so there’s a reason: when I first tried to imagine that ‘full picture’, I kept thinking there was some kind of problem along the whole x axis, instead of at the branch point only. Indeed, these two plots suggest that we have two or even four separate sheets here that are ‘joined at the hip’ so to say (or glued or welded or stitched together – whatever you want to call it) along the real axis (i.e. the x axis of the z plane). In such (erroneous) view, we’d have two sheets above the complex z plane (one representing the imaginary values of √z and one the real part) and two below it (again one with the values of the imaginary part of √z and one representing the values of the real part). All of these ‘sheets’ have a sharp fold on the x axis indeed (everywhere else they are smooth), and that’s where they join in this (erroneous) view of things.

Indeed, such thinking is stupid and leads nowhere: the real and imaginary parts should always be considered together, and so there’s no such thing as two or four sheets really: there is only one Riemann surface with two (overlapping) branches. You should also note that where these branches start or end is quite arbitrary, because we can pick any angle α to define the starting point of a branch. There is also only one branch point. So there is no ‘line’ separating the Riemann surface into two separate pieces. There is only that branch point at the origin, and there we decide what branch of the function we’re going to look at: the n = 0 branch (i.e. we consider arg w = Θ to be equal to θ/2) or the n = ±1 branch (i.e. we take the Θ = θ/2 ± π equation to calculate the values for w = z^(1/2)).

OK. Enough of these visualizations which, as I told you above already, are helpful only to some extent. Is there any other way of approaching the topic?

Of course there is. When trying to understand these Riemann surfaces (which is not easy when you read Penrose because he immediately jumps to Riemann surfaces involving three or more branch points, which makes things a lot more complicated), I found it useful to look for a more formal mathematical definition of a Riemann surface. I found such more formal definition in a series of lectures of a certain Dr. C. Teleman (Berkeley, Lectures on Riemann surfaces, 2003). He defines them as graphs too, or surfaces indeed, just like Penrose and others, but, in contrast, he makes it very clear, right from the outset, that it’s really the association (i.e. the relation) between z and w which counts, not these rather silly attempts to plot all these surfaces in three-dimensional space.

Indeed, according to Dr. Teleman’s introduction to the topic, a Riemann surface S is, quite simply, a set of ‘points’ (z, w) in the two-dimensional complex space C² = C × C (so they’re not your typical points in the complex plane but points with two complex dimensions), such that w and z are related with each other by a holomorphic function w = f(z), which itself defines the Riemann surface. The same author also usefully points out that this holomorphic function is usually written in its implicit form, i.e. as P(z, w) = 0 (in case of a polynomial function) or, more generally, as F(z, w) = 0.

There are two things you should note here. The first one is that this eminent professor suggests that we should not waste too much time by trying to visualize things in the three-dimensional R³ = R × R × R space: Riemann surfaces are complex manifolds and so we should tackle them in their own space, i.e. the complex C² space. The second thing is linked to the first: we should get away from these visualizations, because these Riemann surfaces are usually much, much more complicated than a simple (complex) square root function and, hence, are usually not easy to deal with. That’s quite evident when we consider the general form of the complex-analytical (polynomial) P(z, w) function above, which is P(z, w) = w^n + pn–1(z)·w^(n–1) + … + p1(z)·w + p0(z), with the pk(z) coefficients here being polynomials in z themselves.

That being said, Dr. Teleman immediately gives a ‘very simple’ example of such function himself, namely w = [(z² – 1)(z² – k²)]^(1/2). Huh? If that’s regarded as ‘very simple’, you may wonder what follows. Well, just look him up I’d say: I only read the first lecture and so there are fourteen more. 🙂

But he’s actually right: this function is not very difficult. In essence, we’ve got our square root function here again (because of the 1/2 exponent), but with four branch points this time, namely ±1 and ±k (i.e. the positive and negative square roots of 1 and k² respectively, cf. the (z² – 1) and (z² – k²) factors in the argument of this function), instead of only one (the origin).

Despite the ‘simplicity’ of this function, Dr. Teleman notes that “we cannot identify this shape by projection (or in any other way) with the z-plane or the w-plane”, which confirms the above: Riemann surfaces are usually not simple and, hence, these ‘visualizations’ don’t help all that much. However, while not ‘identifying the shape’ of this particular square root function, Dr. Teleman does make the following drawing of the branch points:

[Dr. Teleman’s sketch of the four branch points ±1 and ±k, with dotted lines for the imaginary part of w and solid lines for the real part]

This is also some kind of cross-section of the Riemann surface, just like the one I made above for the ‘super-simple’ w = √z function: the dotted lines represent the imaginary part of w = [(z² – 1)(z² – k²)]^(1/2), and the non-dotted lines are the real part of the (double-valued) w function. So that’s like ‘my’ graph indeed, except that we’ve got four branch points here, so we can make a choice between the two branches at each of them.

[Note that one obvious difficulty in the interpretation of Dr. Teleman’s little graph above is that we should not assume that the complex numbers k and –k are actually lying on the same line as 1 and –1 (i.e. the real line). Indeed, k and –k are just standard complex numbers and most complex numbers do not lie on the real line. While that makes the interpretation of that simple graph of Dr. Teleman somewhat tricky, it’s probably less misleading than all these fancy 3D graphs. In order to proceed, we can either assume that this z axis is some polygonal line really, representing line segments between these four branch points or, even better, I think we should just accept the fact we’re looking at the z plane here along the z plane itself, so we can only see it as a line and we shouldn’t bother about where these points k and –k are located. In fact, their absolute value may actually be smaller than 1, in which case we’d probably want to change the order of the branch points in Dr. Teleman’s little graph.]

Dr. Teleman doesn’t dwell too long on this graph and, just like Penrose, immediately proceeds to what’s referred to as the compactification of the Riemann space, so that’s this ‘transformation’ of this complex surface into a donut (or a torus as it’s called in mathematics). So how does one actually go about that?

Well… Dr. Teleman doesn’t waste too many words on that. In fact, he’s quite cryptic, although he actually does provide much more of an explanation than Penrose does (Penrose’s treatment of the matter is really hocus-pocus I feel). So let me start with a small introduction of my own once again.

I guess it all starts with the ‘compactification’ of the real line, which is visualized below: we reduce the notion of infinity to a ‘point’ (this ‘point’ is represented by the symbol ∞ without a plus or minus sign) that bridges the two ‘ends’ of the real line (i.e. the positive and negative real half-line). Like that, we can roll up single lines and, by extension, the whole complex plane (just imagine rolling up the infinite number of lines that make up the plane I’d say :-)). So then we’ve got an infinitely long cylinder.

[Illustration: the real projective line, i.e. the real line bent into a circle, with one single point ∞ joining the two half-lines]

But why would we want to roll up a line, or the whole plane for that matter? Well… I don’t know, but I assume there are some good reasons out there: perhaps we actually do have some thing going round and round, and so then it’s probably better to transform our ‘real line’ domain into a ‘real circle’ domain. The illustration below shows how it works for a finite sheet, and I’d also recommend my imaginary reader to have a look at the Riemann Project website (http://science.larouchepac.com/riemann/page/23), where you’ll find some nice animations (but do download Wolfram’s browser plugin if your Internet connection is slow: downloading the video takes time). One of the animations shows how a torus is, indeed, ideally suited as a space for a phenomenon characterized by two “independent types of periodicity”, not unlike the cylinder, which is the ‘natural space’ for “motion marked by a single periodicity”.

[Illustration: wrapping a finite sheet up into a torus]

However, as I explain in a note below this post, the more natural way to roll or wrap up a sheet or a plane is to wrap it around a sphere, rather than trying to create a donut. Indeed, if we’d roll the infinite plane up in a donut, we’ll still have a line representing infinity (see below) and so it looks quite ugly: if you’re tying ends up, it’s better you tie all of them up, and so that’s what you do when you’d wrap the plane up around a sphere, instead of a torus.

[Illustration: rolling the infinite plane up into a torus still leaves a line representing infinity]

OK. Enough on planes. Back to our Riemann surface. Because the square root function w has two values for each z – and, more to the point, because we have four branch points here – we cannot make a simple sphere: we have to make a torus. Note that a sphere has one complex dimension only, just like the plane: the Riemann sphere is nothing but the plane plus one single point at infinity. A double-valued function with four branch points, however, gives us two sheets that have to be glued together along two cuts, and the surface which accommodates that is a torus (or a coffee cup :-)). In topological jargon, a torus has genus one, while the complex plane (and the Riemann sphere) has genus zero.

[Please do note that this does not hold regardless of the number of branch points: the genus of the compactified surface actually goes up as we add branch points. It does work out for Penrose’s example, though. He takes the function w = (1 – z³)^(1/2), which has three branch points, namely the three roots of the 1 – z³ expression (i.e. the three cube roots of unity), plus a fourth one at infinity and, hence, his Riemann surface – which is also the Riemann surface of a square root function (albeit one with a more complicated form than the ‘core’ w = z^(1/2) example) – also wraps up as a donut indeed, instead of a sphere or something else.]

I guess that you, my imaginary reader, have stopped reading all of this nonsense. If you haven’t, you’re probably thinking: why don’t we just do it? How does it work? What’s the secret?

Frankly, the illustration in Penrose’s Road to Reality (i.e. Fig. 8.2 on p. 137) is totally useless in terms of understanding how it’s being done really. In contrast, Dr. Teleman is somewhat more explicit and so I’ll follow him here as much as I can while I try to make sense of it (which is not as easy as you might think). 

The short story is the following: Dr. Teleman first makes two ‘cuts’ (or ‘slits’) in the z plane, using the four branch points as the points where these ‘cuts’ start and end. He then uses these cuts to form two cylinders, and then he joins the ends of these cylinders to form that torus. That’s it. The drawings below illustrate the proceedings. 

[Fig. 1.4: the z plane with two cuts, one joining 1 with k and one joining –1 with –k, and the two sheets of the graph]

[Fig. 1.5: (a) two planes joined by two tubes; (b) the definitive form of the Riemann surface: a torus with two missing points]

Huh? OK. You’re right: the short story is not correct. Let’s go for the full story. In order to be fair to Dr. Teleman, I will literally copy all what he writes on what is illustrated above, and add my personal comments and interpretations in square brackets (so when you see those square brackets, that’s [me] :-)). So this is what Dr. Teleman has to say about it:

The function w = [(z² – 1)(z² – k²)]^(1/2) behaves like the [simple] square root [function] near ±1 and ±k. The important thing is that there is no continuous single-valued choice of w near these points [shouldn’t he say ‘on’ these points, instead of ‘near’?]: any choice of w, followed continuously round any of the four points, leads to the opposite choice upon return.

[The formulation may sound a bit weird, but it’s the same as what happens on the simple z^(1/2) surface: when we’re on one of the two branches, the argument of w changes only gradually and, going around the origin, starting from one root of z (let’s say the ‘positive’ root w1), we arrive, after one full loop around the origin on the z plane (i.e. we add 2π to arg z = θ), at the opposite value, i.e. the ‘negative’ root w2 = –w1.]

Defining a continuous branch for the function necessitates some cuts. The simplest way is to remove the open line segments joining 1 with k and –1 with –k. On the complement of these segments [read: everywhere else on the z plane], we can make a continuous choice of w, which gives an analytic function (for z ≠ ±1, ±k). The other branch of the graph is obtained by a global change of sign. [Yes. That’s obvious: the two roots are each other’s opposite (w2 = –w1) and so, yes, the two branches are, quite simply, just each other’s opposite.]

Thus, ignoring the cut intervals for a moment, the graph of w breaks up into two pieces, each of which can be identified, via projection, with the z-plane minus two intervals (see Fig. 1.4 above). [These ‘projections’ are part and parcel of this transformation business it seems. I’ve encountered more of that stuff and so, yes, I am following you, Dr. Teleman!]

Now, over the said intervals [i.e. between the branch points], the function also takes two values, except at the endpoints where those coincide. [That’s true: even if the real parts of the two roots are the same (like on the negative real axis for our z^(1/2) example), the imaginary parts are different and, hence, the roots are different for points between the various branch points, and vice versa of course. This is actually one of the reasons why I don’t like Penrose’s illustration on this matter: his illustration suggests that this is not the case.]

To understand how to assemble the two branches of the graph, recall that the value of w jumps to its negative as we cross the cuts. [At first, I did not get this, but so it’s the consequence of Dr. Teleman ‘breaking up the graph into two pieces’. So he separates the two branches indeed, and he does so at the ‘slits’ he made, so that’s between the branch points. It follows that the value of w will indeed jump to its opposite value as we cross them, because we’re jumping onto the other branch there.]

Thus, if we start on the upper sheet and travel that route, we find ourselves exiting on the lower sheet. [That’s the little arrows on these cuts.] Thus, (a) the far edges of the cuts on the top sheet must be identified with the near edges of the cuts on the lower sheet; (b) the near edges of the cuts on the top sheet must be identified with the far edges on the lower sheet; (c) matching endpoints are identified; (d) there are no other identifications. [Point (d) seems to be somewhat silly but I get it: here he’s just saying that we can’t do whatever we want: if we glue or stitch or weld all of these patches of space together (or should I say copies of patches of space?), we need to make sure that the points on the edges of these patches are the same indeed.]

A moment’s thought will convince us that we cannot do all this in R³, with the sheets positioned as depicted, without introducing spurious crossings. [That’s why Brown and Churchill say it’s ‘physically impossible.’] To rescue something, we flip the bottom sheet about the real axis.

[Wow! So that’s the trick! That’s the secret – or at least one of them! Flipping the bottom sheet about the real axis amounts to replacing every point w on it by its complex conjugate w*, so the edges of the cuts that have to be glued together end up right above and below each other. Now that’s a smart move!]

The matching edges of the cuts are now aligned, and we can perform the identifications by stretching each of the surfaces around the cut to pull out a tube. We obtain a picture representing two planes (ignore the boundaries) joined by two tubes (see Fig. 1.5.a above).

[Hey! That’s like the donut-to-coffee-cup animation, isn’t it? Pulling out a tube? Does that preserve angles and all that? Remember it should!]

For another look at this surface, recall that the function z → R²/z identifies the exterior of the circle ¦z¦ = R (i.e. the region ¦z¦ > R) with the punctured disc 0 < ¦z¦ < R (it’s a punctured disc so its center is not part of the disc). Using that, we can pull the exteriors of the discs, missing from the picture above, into the picture as punctured discs, and obtain a torus with two missing points as the definitive form of our Riemann surface (see Fig. 1.5.b).

[Dr. Teleman is doing another hocus-pocus thing here. So we have those tubes with an infinite plane hanging on them, and so it’s obvious we just can’t glue these two infinite planes together because it wouldn’t look like a donut 🙂. So we first need to transform them into something more manageable, and so that’s the punctured discs he’s describing. I must admit I don’t quite follow him here, but I can sort of sense – a little bit at least – what’s going on.] 

[…]

Phew! Yeah, I know. My imaginary reader will surely feel that I don't have a clue of what's going on, and that I am actually not quite ready for all of this high-brow stuff – or not yet at least. He or she is right: my understanding of it all is rather superficial at the moment and, frankly, I wish either Penrose or Teleman would explain this compactification business somewhat better. I would also like them to explain why we actually need it, and why it's relevant for the real world.

Well… I guess I can only try to move forward as well as I can. I'll keep you/myself posted.

Note: As mentioned above, there is more than one way to roll or wrap up the complex plane, and the most natural way of doing this is to do it around a sphere, i.e. the so-called Riemann sphere, which is illustrated below. This particular 'compactification' exercise is equivalent to a so-called stereographic projection: it establishes a one-to-one relationship between all points on the sphere and all points of the so-called extended complex plane, which is the complex plane plus the 'point' at infinity (see my explanation on the 'compactification' of the real line above).

[Figures: the Riemann sphere, and the stereographic projection in 3D]

But so Riemann surfaces are associated with complex-analytic functions, right? So what's the function? Well… The function with which the Riemann sphere is associated is w = 1/z. [1/z is equal to z*/¦z¦2, with z* = x – iy, i.e. the complex conjugate of z = x + iy, and ¦z¦ the modulus or absolute value of z, and so you'll recognize the formulas for the stereographic projection here indeed.]

OK. So what? Well… Nothing much. This mapping from the complex z plane to the complex w plane is conformal indeed, i.e. it preserves angles (but not areas) and whatever else comes with complex analyticity. However, it's not as straightforward as Penrose suggests. The image below (taken from Brown and Churchill) shows what happens to lines parallel to the x and y axis in the z plane respectively: they become circles in the w plane. So this particular function actually does map circles to circles (a property of Möbius transformations, of which w = 1/z is an example), but only if we think of straight lines as being particular cases of circles, namely circles "of infinite radius", as Penrose puts it.
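Here, too, a little numpy experiment (mine, not Brown and Churchill's) makes the claim concrete: the image of the vertical line Re(z) = c under w = 1/z lies on the circle with center 1/(2c) and radius 1/(2¦c¦).

```python
import numpy as np

c = 1.0                              # the vertical line Re(z) = c, c != 0
y = np.linspace(-50, 50, 10001)
z = c + 1j * y
w = 1 / z

# The image should lie on the circle |w - 1/(2c)| = 1/(2|c|).
center, radius = 1 / (2 * c), 1 / (2 * abs(c))
print(np.abs(np.abs(w - center) - radius).max())   # ~0, up to rounding
```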

[Figure: what the w = 1/z mapping does to lines parallel to the x and y axes: they become circles (from Brown and Churchill)]

Frankly, it is quite amazing what Penrose expects in terms of mental ‘agility’ of the reader. Brown and Churchill are much more formal in their approach (lots of symbols and equations I mean, and lots of mathematical proofs) but, to be honest, I find their stuff easier to read, even if their textbook is a full-blown graduate level course in complex analysis.

I’ll conclude this post here with two more graphs: they give an idea of how the Cartesian and polar coordinate spaces can be mapped to the Riemann sphere. In both cases, the grid on the plane appears distorted on the sphere: the grid lines are still perpendicular, but the areas of the grid squares shrink as they approach the ‘north pole’.

[Figure: a Cartesian grid on the plane mapped to the Riemann sphere]

[Figure: a polar grid on the plane mapped to the Riemann sphere]

The mathematical equations for the stereographic projection, and the illustrations above, suggest that the w = 1/z function is basically just another way to transform one coordinate system into another. But then I must admit there is a lot of fine print that I don't understand – as yet, that is. It's sad that Penrose doesn't help out very much here.
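If you want to play with this yourself, the standard formulas for the stereographic projection are easy to code up. The sketch below is my own (using the common convention of a unit sphere centered at the origin, projected from the north pole – Penrose's figure may use a slightly different convention): it maps any complex number to a point on the Riemann sphere; the origin goes to the south pole, and large ¦z¦ approaches the north pole, i.e. the image of the 'point' at infinity.

```python
import numpy as np

def to_sphere(z):
    """Stereographic projection of a complex number z onto the unit
    (Riemann) sphere, projecting from the north pole (0, 0, 1)."""
    x, y, r2 = z.real, z.imag, abs(z) ** 2
    return np.array([2 * x, 2 * y, r2 - 1]) / (r2 + 1)

print(to_sphere(0j))          # [0, 0, -1]: the south pole
print(to_sphere(1e9 + 0j))    # ~[0, 0, 1]: approaching the north pole
```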

Riemann surfaces (II)

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics. :-/

Original post:

This is my second post on Riemann surfaces, so they must be important. [At least I hope so, because it takes quite some time to understand them. :-)]

From my first post on this topic, you may or may not remember that a Riemann surface is supposed to solve the problem of multivalued complex functions such as, for instance, the complex logarithmic function (log z = ln r + i(θ + 2nπ)) or the complex power function (zc = ec log z). [Note that the problem of multivaluedness for the (complex) power function is a direct consequence of its definition in terms of the (complex) logarithmic function.]

In that same post, I also wrote that it all looked somewhat fishy to me: we first use the function causing the problem of multivaluedness to construct a Riemann surface, and then we use that very same surface as a domain for the function itself to solve the problem (i.e. to reduce the function to a single-valued (analytic) one). Penrose does not have any issues with that though. In Chapter 8 (yes, that’s where I am right now: I am moving very slowly on his Road to Reality, as it’s been three months of reading now, and there are 34 chapters!), he writes that  “Complex (analytic) functions have a mind of their own, and decide themselves what their domain should be, irrespective of the region of the complex plane which we ourselves may initially have allotted to it. While we may regard the function’s domain to be represented by the Riemann surface associated with the function, the domain is not given ahead of time: it is the explicit form of the function itself that tells us which Riemann surface the domain actually is.” 

Let me retrieve the graph of the Riemannian domain for the log z function once more:

[Figure: the Riemann surface for the log z function (the 'spiral ramp')]

For each point z in the complex plane (and we can represent z both with rectangular as well as polar coordinates: z = x + iy = reiθ), we have an infinite number of log z values: one for each value of n in the log z = ln r + i(θ + 2nπ) expression (n = 0, ±1, ±2, ±3,…). So what we do when we promote this Riemann surface as a domain for the log z function is equivalent to saying that point z is actually not one single point z with modulus r and argument θ + 2nπ, but an infinite collection of points: these points all have the same modulus ¦z¦ = r but we distinguish the various 'representations' of z by treating θ, θ ± 2π, θ ± 4π, θ ± 6π, etcetera, as separate argument values as we go up or down on that spiral ramp. So that is what is represented by that infinite number of sheets, which are separated from each other by a vertical distance of 2π. These sheets are all connected at or through the origin (at which the log z function is undefined: therefore, the origin is not part of the domain), which is the branch point for this function. Let me copy some formal language on the construction of that surface here:

"We treat the z plane, with the origin deleted, as a thin sheet R0 which is cut along the positive half of the real axis. On that sheet, let θ range from 0 to 2π. Let a second sheet R1 be cut in the same way and placed in front of the sheet R0. The lower edge of the slit in R0 is then joined to the upper edge of the slit in R1. On R1, the angle θ ranges from 2π to 4π; so, when z is represented by a point on R1, the imaginary component of log z ranges from 2π to 4π." And then we repeat the whole thing, of course: "A sheet R2 is then cut in the same way and placed in front of R1. The lower edge of the slit in R1 is joined to the upper edge of the slit in this new sheet, and similarly for sheets R3, R4,… Sheets R-1, R-2, R-3,… are constructed in like manner." (Brown and Churchill, Complex Variables and Applications, 7th edition, p. 335-336)

The key phrase above for me is this "when z is represented by a point on R1", because that's what it is really: we have an infinite number of representations of z here, namely one representation of z for each branch of the log z function. So, as n = 0, ±1, ±2, ±3 etcetera, we have an infinite number of them indeed. You'll also remember that each branch covers a range from some arbitrary angle α to α + 2π. Imagine a continuous curve around the origin on this Riemann surface: as we move around, the angle of z changes from 0 to 2π on sheet R0, and then from 2π to 4π on sheet R1, and so on and so on.

The illustration above also shows the meaning of a branch point. Imagine yourself walking on that surface and approaching the origin, from any direction really. At the origin itself, you can choose what to do: either you take the elevator up or down to some other level or, else, the elevator doesn't work and so then you have to walk up or down that ramp to get to another level. If you choose to walk along the ramp, the angle θ changes gradually or, to put it in mathematical terms, in a continuous way. However, if you took the elevator and got out at some other level, you'll find that you've literally 'jumped' one or more levels. Indeed, remember that log z = ln r + i(θ + 2nπ) and so ln r, the horizontal distance from the origin, didn't change, but you did add some multiple of 2π to the vertical distance, i.e. the imaginary part of the log z value.
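By the way, the difference between walking the ramp and taking the elevator is easy to mimic numerically (a sketch of my own): the principal-value angle always jumps back into the interval (-π, π], while a continuously tracked angle keeps growing by 2π per loop. Numpy's unwrap function does the continuous tracking:

```python
import numpy as np

# Walk twice around the origin at radius r = 1 (so ln r = 0).
t = np.linspace(0, 2, 1001)              # two full turns
z = np.exp(2j * np.pi * t)
theta_jumpy = np.angle(z)                # principal value: the 'elevator'
theta_ramp = np.unwrap(theta_jumpy)      # continuous angle: the 'ramp'

print(theta_ramp[-1] / np.pi)   # ~4.0: we have added 2 x 2*pi to theta
print(theta_jumpy[-1])          # ~0: the principal value forgot the loops
```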

Let us now construct a Riemann surface for some other multiple-valued functions. Let's keep it simple and start with the square root of z, so c = 1/2, which is nothing else than a specific example of the complex power function zc = ec log z: we just take a real number for c here. In fact, we're taking a very simple rational value for c: 1/2 = 0.5. Taking the square, cube, fourth or nth root of a complex number is indeed nothing but a special case of the complex power function. The illustration below (taken from Wikipedia) shows us the Riemann surface for the square root function.

[Figure: the Riemann surface for the square root function (source: Wikipedia)]

As you can see, the spiraling surface turns back into itself after two turns. So what's going on here? Well… Our multivalued function here does not have an infinite number of different values for each z: it has only two, namely √r ei(θ/2) and √r ei(θ/2 + π). But what's that? We just said that the log function – in terms of which this function is defined – had an infinite number of values? Well… To be somewhat more precise: z1/2 actually does have an infinite number of values for each z (just like any other complex power), but it has only two values that are different from each other. All the others coincide with one of the two principal ones. Indeed, we can write the following:

w = √z = z1/2 = e(1/2) log z = e(1/2)[ln r + i(θ + 2nπ)] = r1/2 ei(θ/2 + nπ) = √r ei(θ/2 + nπ)

(n = 0, ±1,  ±2,  ±3,…)

For n = 0, this expression reduces to z1/2 = √r eiθ/2. For n = ±1, we have z1/2 = √r ei(θ/2 + π), which is different from the value we had for n = 0. In fact, it's easy to see that this second root is the exact opposite of the first root: √r ei(θ/2 + π) = √r eiθ/2eiπ = –√r eiθ/2. However, for n = 2, we have z1/2 = √r ei(θ/2 + 2π), and so that's the same value (z1/2 = √r eiθ/2) as for n = 0. Indeed, taking n = 2 amounts to adding 2π to the argument of w, and so we get the same point as the one we found for n = 0. [As for the plus or minus sign, note that, for n = -1, we have z1/2 = √r ei(θ/2 - π) = √r ei(θ/2 - π + 2π) = √r ei(θ/2 + π) and, hence, the sign of n does not make any difference indeed.]

In short, as mentioned above, we have only two different values for w = √z = z1/2 and so we have to construct two sheets only, instead of an infinite number of them, like we had to do for the log z function. To be more precise, because the sheet for n = ±2 would be the same sheet as for n = 0, we need to construct one sheet for n = 0 and one sheet for n = ±1, and that's what is shown above: the surface has two sheets (one for each branch of the function), and so if we make two turns around the origin (one on each sheet), we're back at the same point. In other words, while we have a one-to-two relationship between each point z on the complex plane and the two values z1/2 for that point, we've got a one-to-one relationship between every value of z1/2 and each point on this surface.
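A quick sanity check in Python (my own illustration, nothing more): however many values of n you try, only two distinct roots survive, and both square back to z.

```python
import numpy as np

z = 3 + 4j
r, theta = abs(z), np.angle(z)

# w = sqrt(r) * exp(i(theta/2 + n*pi)) for n = 0, ±1, ±2, ...
roots = {complex(np.round(np.sqrt(r) * np.exp(1j * (theta / 2 + n * np.pi)), 12))
         for n in range(-5, 6)}
print(roots)                     # only two values: w1 and w2 = -w1
print([w ** 2 for w in roots])   # both give back (3+4j), up to rounding
```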

For ease of reference in future discussions, I will introduce a personal nonsensical convention here: I will refer to (i) the n = 0 case as the 'positive' root, or as w1, i.e. the 'first' root, and to (ii) the n = ±1 case as the 'negative' root, or w2, i.e. the 'second' root. The convention is nonsensical because there is no such thing as positive or negative complex numbers: only their real and imaginary parts (i.e. real numbers) have a sign. These roots do not have any particular order either: there are just two of them, and neither of the two is the 'principal' one or so. However, you can see where it comes from: the two roots are each other's exact opposite, w2 = u2 + iv2 = –w1 = –u1 – iv1. [Note that, of course, we have w1w1 = w12 = w2w2 = w22 = z, but that the product of the two distinct roots is equal to –z. Indeed, w1w2 = w2w1 = √r ei(θ/2)√r ei(θ/2 + π) = r ei(θ+π) = r eiθeiπ = –r eiθ = –z.]

What's the upshot? Well… As I mentioned above already, what's happening here is that we treat z = rei(θ+2π) as a different 'point' than z = reiθ. Why? Well… Because of that square root function. Indeed, we have θ going from 0 to 2π on the first 'sheet', and then from 2π to 4π on the second 'sheet'. Then this second sheet turns back into the first sheet, and so then we're back to normal: while θ going from 0 to 2π is not the same as θ going from 2π to 4π, θ going from 4π to 6π is the same as θ going from 0 to 2π (in the sense that it does not affect the value of w = z1/2). That's quite logical indeed because, if we denote w as w = √r eiΘ (with Θ = θ/2 + nπ, and n = 0 or ±1), then it's clear that arg w = Θ will range from 0 to 2π if (and only if) arg z = θ ranges from 0 to 4π. So as the argument of w makes one loop around the origin – which is what 'normal' complex numbers do – the argument of z makes two loops. However, once we're back at Θ = 2π, we've got the same complex number w again and so then it's business as usual.

So that will help you to understand why this Riemann surface is said to be a double cover of the complex plane: we need two copies (sheets) of the plane, as opposed to just one, to accommodate the two values of z1/2.

OK. That should be clear enough. Perhaps one question remains: how do you construct a nice graph like the one above?

Well, look carefully at the shape of it. The vertical distance reflects the real part of √z for n = 0, i.e. √r cos(θ/2). Indeed, the horizontal plane is the complex z plane and so the horizontal axes are x and y respectively (i.e. the x and y coordinates of z = x + iy). So this vertical distance equals 1 when x = 1 and y = 0, and that's the highest point on the upper half of the top sheet on this plot (i.e. the 'high-water mark' on the right-hand (back-)side of the cuboid (or rectangular prism) in which this graph is being plotted). So the argument of z is zero there (θ = 0). The value on the vertical axis then falls from one to zero as we turn counterclockwise on the surface of this first sheet, and that's consistent with a value for θ being equal to π there (θ = π), because then we have cos(π/2) = 0. Then we go underneath the z plane and make another half turn, so we add another π radians to the value of θ and we arrive at the lowest point on the lower half of the bottom sheet on this plot, right under the point where we started, where θ = 2π and, hence, Re(√z) = √r cos(θ/2) (for n = 0 and r = 1) = cos(2π/2) = cos(π) = -1.

We can then move up again, counterclockwise on the bottom sheet, to arrive once again at the spot where the bottom sheet passes through the top sheet: the value of θ there should be equal to θ = 3π, as we have now made three half turns around the origin from our original point of departure (i.e. we added three times π to our original angle of departure, which was θ = 0) and, hence, we have Re(√z) = √r cos(3π/2) = 0 again. Finally, another half turn brings us back to our point of departure, i.e. the positive half of the real axis, where θ has now reached the value of θ = 4π, i.e. zero plus two times 2π. At that point, the argument of w (i.e. Θ) will have reached the value of 2π, i.e. 4π/2, and so we're talking the same w = z1/2 as when we started indeed, where we had Θ = θ/2 = 0.

What about the imaginary part? Well… Nothing special really (as for now at least): a graph of the imaginary part of √z would be equally easy to establish: Im(√z) = √r sin(θ/2) and, hence, rotating this plot 180 degrees around the vertical axis will do the trick.
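In case you'd like to reproduce a graph like the one above yourself, the recipe in the previous paragraphs translates almost literally into a few lines of matplotlib (a rough sketch of mine, not a polished plot: I just let θ run from 0 to 4π and plot √r·cos(θ/2) as the height above the z plane):

```python
import numpy as np
import matplotlib.pyplot as plt

r = np.linspace(0.01, 1, 60)
theta = np.linspace(0, 4 * np.pi, 240)       # two full turns = two sheets
R, T = np.meshgrid(r, theta)
X, Y = R * np.cos(T), R * np.sin(T)          # the z plane underneath
height = np.sqrt(R) * np.cos(T / 2)          # Re(sqrt(z)) for n = 0

ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(X, Y, height, cmap='viridis')
ax.set_xlabel('x'); ax.set_ylabel('y'); ax.set_zlabel('Re(sqrt(z))')
plt.show()
```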

Hmm… OK. What's next? Well… The graphs below show the Riemann surfaces for the third and fourth root of z, i.e. z1/3 and z1/4. It's easy to see that we now have three and four sheets respectively (instead of two only), and that we have to take three and four full turns respectively to get back to our starting point, where we should find the same values for z1/3 and z1/4 as where we started. That sounds logical, because we always have three cube roots of any (complex) number, and four fourth roots, so we'd expect to need the same number of sheets to differentiate between these three or four values respectively.

[Figures: the Riemann surfaces for the cube root and the fourth root of z]

In fact, the table below may help to interpret what's going on for the cube root function. We have three cube roots of z: w1, w2 and w3. These three values are symmetrical though, as indicated by the red, green and yellow colors in the table below: for example, the value of w for θ ranging from 4π to 6π for the n = 0 case (i.e. w1) is the same as the value of w for θ ranging from 0 to 2π for the n = 2 case (or the n = -1 case, which is equivalent to the n = 2 case).

[Table: the values of the three cube roots w1, w2 and w3 as θ sweeps through successive intervals of 2π]
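A quick script (mine, not the table's source) reproduces the gist of that periodicity: evaluating the n = 0 branch after two extra turns (i.e. at θ + 4π) gives the same value as the n = 2 (or, equivalently, n = -1) branch at θ itself.

```python
import numpy as np

r, theta0 = 1.0, 0.3        # some arbitrary z = r*exp(i*theta0)

def w(theta, n):
    """One of the three cube roots: r^(1/3) * exp(i*(theta + 2n*pi)/3)."""
    return r ** (1 / 3) * np.exp(1j * (theta + 2 * n * np.pi) / 3)

print(np.allclose(w(theta0 + 4 * np.pi, 0), w(theta0, 2)))    # True
print(np.allclose(w(theta0 + 4 * np.pi, 0), w(theta0, -1)))   # True
```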

So the origin (i.e. the point zero) for all of the above surfaces is referred to as the branch point, and the number of turns one has to make to get back to the same point determines the so-called order of the branch point. So, for w = z1/2, we have a branch point of order 2; for w = z1/3, we have a branch point of order 3; etcetera. In fact, for the log z function, the branch point does not have a finite order: it is said to have infinite order.

After a very brief discussion of all of this, Penrose then proceeds and transforms a ‘square root Riemann surface’ into a torus (i.e. a donut shape). The correspondence between a ‘square root Riemann surface’ and a torus does not depend on the number of branch points: it depends on the number of sheets, i.e. the order of the branch point. Indeed, Penrose’s example of a square root function is w = (1 – z3)1/2, and so that’s a square root function with three branch points (the three roots of unity), but so these branch points are all of order two and, hence, there are two sheets only and, therefore, the torus is the appropriate shape for this kind of ‘transformation’. I will come back to that in the next post.

OK… But I still don't quite get why these Riemann surfaces are so important. I must assume it has something to do with the mystery of rolled-up dimensions and all that (so that's string theory), but I guess I'll be able to shed some more light on that question only once I've gotten through that whole chapter on them (and the chapters following that one). I'll keep you posted. 🙂

Post scriptum: On page 138 (Fig. 8.3), Penrose shows us how to construct the spiral ramp for the log z function. He insists on doing this by taking overlapping patches of space, such as the L(z) and Log z branches of the log z function, with θ going from 0 to 2π for the L(z) branch and from -π to +π for the Log z branch (so we have an overlap here from 0 to +π). Indeed, one cannot glue or staple patches together if the patch surfaces don't overlap to some extent… unless you use sellotape of course. 🙂 However, continuity requires some overlap and, hence, just joining the edges of patches of space with sellotape, instead of gluing overlapping areas together, is not allowed. 🙂

So, constructing a model of that spiral ramp is not an extraordinary intellectual challenge. However, constructing a model of the Riemann surfaces described above (i.e. z1/2, z1/3, z1/4 or, more in general, a Riemann surface for any rational power of z, i.e. any function w = zn/m) is not all that easy: Brown and Churchill, for example, state that it is actually 'physically impossible' to model that (see Brown and Churchill, Complex Variables and Applications (7th ed.), p. 337).

Huh? But so we just did that for z1/2, z1/3 and z1/4, didn't we? Well… Look at that plot for w = z1/2 once again. The problem is that the two sheets cut through each other. They have to do that, of course, because, unlike the sheets of the log z function, they have to join back together again, instead of just spiraling endlessly up or down. So we just let these sheets cross each other. However, at that spot (i.e. the line where the sheets cross each other), we would actually need two representations of z. Indeed, as the top sheet cuts through the bottom sheet (so as we're moving down on that surface), the value of θ will be equal to π, and so that corresponds to a value for w equal to w = z1/2 = √r ei(π/2) (I am looking at the n = 0 case here). However, when the bottom sheet cuts through the top sheet (so if we're moving up instead of down on that surface), θ's value will be equal to 3π (because we've made three half-turns now, instead of just one) and, hence, that corresponds to a value for w equal to w = z1/2 = √r ei(3π/2), which is obviously different from √r ei(π/2). I could do the same calculation for the n = ±1 case: just add ±π to the argument of w.

Huh? You'll probably wonder what I am trying to say here. Well, what I am saying here is that the plot of the surface gives us the impression that we do not have two separate roots w1 and w2 on the (negative) real axis. But that's not the case: we do have two roots there, but we can't distinguish them with that plot of the surface because we're only looking at the real part of w.

So what?

Well… I'd say that shouldn't worry us all that much. When building a model, we just need to be aware that it's a model only and, hence, we need to be aware of the limitations of what we're doing. I actually built a paper model of that surface by taking two paper disks: one for the top sheet, and one for the bottom sheet. Then I cut those two disks along the radius and folded and glued both of them like a Chinese hat (yes, like the one the girl below is wearing). And then I took those two little paper Chinese hats, put one of them upside down, and 'connected' them (or should I say 'stitched' or 'welded' perhaps? :-)) with the other one along the radius where I had cut into these disks. [I could go through the trouble of taking a digital picture of it but it's better you try it yourself.]

[Photo: a girl wearing a conical 'Chinese hat']

Wow! I did not expect to be used as an illustration in a blog on math and physics! 🙂

🙂 OK. Let's get somewhat more serious again. The point to note is that, while these models (both the plot as well as the two paper Chinese hats :-)) look nice enough, Brown and Churchill are right when they note that 'the points where two of the edges are joined are distinct from the points where the two other edges are joined'. However, I don't agree with their conclusion in the next sentence, which states that it is 'thus physically impossible to build a model of that Riemann surface.' Again, the plot above and my little paper Chinese hats are OK as a model – as long as we're aware of how we should interpret that line where the sheets cross each other: that line represents two different sets of points.

Let me go one step further (in an attempt to fully exhaust the topic) and insert a table here with the values of both the real and imaginary parts of √z for both roots (i.e. the n = 0 and n = ±1 case). The table shows what is to be expected: the values for the n = ±1 case are the same as for n = 0 but with the opposite sign. That reflects the fact that the two roots are each other's opposite indeed, so when you're plotting the two square roots of a complex number z = reiθ, you'll see they are on opposite sides of a circle with radius √r. Indeed, √r ei(θ/2 + π) = √r ei(θ/2)eiπ = –√r ei(θ/2). [If the illustration below is too small to read, then just click on it and it should expand.]

[Table: the real and imaginary parts of √z for the two roots (n = 0 and n = ±1)]

The grey and green colors in the table have the same role as the red, green and yellow colors I used to illustrate how the cube roots of z come back periodically. We have the same thing here indeed: the values we get for the n = 0 case are exactly the same as for the n = ±1 case but with a difference in 'phase', I'd say, of one turn around the origin, i.e. a 'phase' difference of 2π. In other words, the value of √z in the n = 0 case for θ going from 0 to 2π is equal to the value of √z in the n = ±1 case for θ going from 2π to 4π and, vice versa, the value of √z in the n = ±1 case for θ going from 0 to 2π is equal to the value of √z in the n = 0 case for θ going from 2π to 4π. Now what's the meaning of that?

It’s quite simple really. The two different values of n mark the different branches of the w function, but branches of functions always overlap of course. Indeed, look at the value of the argument of w, i.e. Θ: for the n = 0 case, we have 0 < Θ < 2π, while for the n = ± 1 case, we have -π < Θ < +π. So we’ve got two different branches here indeed, but they overlap for all values Θ between 0 and π and, for these values, where Θ1 = Θ2, we will obviously get the same value for w, even if we’re looking at two different branches (Θ1 is the argument of w1, and Θ2 is the argument of w2). 

OK. I guess that's all very self-evident and so I should really stop here. However, let me conclude by noting the following: to understand the 'full story' behind the graph, we should actually plot both the surface of the imaginary part of √z as well as the surface of the real part of √z, and superimpose both. We'd obviously get something much more complicated than the 'two Chinese hats' picture. I haven't learned how to master math software (such as Maple for instance) as yet, and so I'll just copy a plot which I found on the web: it's a plot of both the real and imaginary part of the function w = z2. That's obviously not the same as the w = z1/2 function, because w = z2 is a single-valued function and so we don't have all these complications. However, the graph is illustrative because it shows how two surfaces – one representing the real part and the other the imaginary part of a function value – cut through each other, thereby creating four half-lines (or rays) which join at the origin.

[Figure: the real and imaginary parts of w = z2, plotted as two surfaces that cut through each other]

So we could have something similar for the w = z1/2 function if we'd have one surface representing the imaginary part of z1/2 and another representing the real part of z1/2. The sketch below illustrates the point. It is a cross-section of the Riemann surface along the x-axis (so the imaginary part of z is zero there, as the values of θ are limited to 0, π, 2π, 3π, and back to 4π = 0), but with both the real as well as the imaginary part of z1/2 on it. It is obvious that, for the w = z1/2 function, two of the four half-lines marking where the two surfaces are crossing each other coincide with the positive and negative real axis respectively: indeed, Re(z1/2) = 0 for θ = π and 3π (so that's the negative real axis), and Im(z1/2) = 0 for θ = 0, 2π and 4π (so that's the positive real axis).

[Sketch: cross-section of the Riemann surface for √z along the x-axis, showing both Re(√z) and Im(√z)]

The other two half-lines are orthogonal to the real axis. They follow a curved line, starting from the origin, whose orthogonal projection on the z plane coincides with the y axis. The shape of these two curved lines (i.e. the place where the two sheets intersect above and under the axis) is given by the values for the real and imaginary parts of the √z function, i.e. the vertical distance from the y axis is equal to ± (√2√r)/2.

Hmm… I guess that, by now, you're thinking that this is getting way too complicated. In addition, you'll say that the representation of the Riemann surface by just one number (i.e. either the real or the imaginary part) makes sense, because we want one point to represent one value of w only, don't we? So we want one point to represent one point only, and that's not what we're getting when plotting both the imaginary as well as the real part of w in a combined graph. Well… Yes and no. Insisting that we shouldn't forget about the imaginary part of the surface makes sense in light of the next post, in which I'll say a thing or two about 'compactifying' surfaces (or spaces) like the one above. But so that's for the next post only and, yes, you're right: I should stop here.

Analytic continuation

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics. :-/

Original post:

In my previous post, I promised to say something about analytic continuation. To do so, let me first recall Taylor’s theorem: if we have some function f(z) that is analytic in some domain D, then we can write this function as an infinite power series:

f(z) = ∑ an(z-z0)n

with n = 0, 1, 2,… going all the way to infinity (n = 0 → ∞) and the successive coefficients an equal to an = f(n)(z0)/n! (with f(n)(z0) denoting the nth-order derivative of f evaluated at z0, and n! the factorial function n! = 1 × 2 × 3 × … × n).

I should immediately add that this domain D will always be an open disk, as illustrated below. The term 'open' means that the boundary points (i.e. the circle itself) are not part of the domain. This open disk is the so-called circle of convergence for the complex function f(z) = 1/(1-z), which is equivalent to the (infinite) power series f(z) = 1 + z + z2 + z3 + z4 +… [A clever reader will probably try to check this using Taylor's theorem above, but I should note the exercise involves some gymnastics. Indeed, the development involves the use of the identity 1 + z + z2 + z3 + … + zn = (1 – zn+1)/(1 – z).]

[Figure: the open disk of convergence ¦z¦ < 1, with the singularity at the point (1, 0)]

This power series converges only when the absolute value of z is (strictly) smaller than 1, so only when ¦z¦ < 1. Indeed, the illustration above shows the singularity at the point 1 (or the point (1, 0) if you want) on the real axis: the denominator of the function 1/(1-z) effectively becomes zero there. But so that's one point only and, hence, we may ask ourselves why this domain should be bounded by a circle going through this one point. Why not some square or rectangle or some other weird shape avoiding this point? That question takes a few theorems to answer, and so I'll just say that this is just one of the many remarkable things about analytic functions: if a power series such as the one above converges to f(z) within some circle whose radius is the distance from z0 (in this case, z0 is the origin and so we're actually dealing with a so-called Maclaurin expansion here, i.e. an oft-used special case of the Taylor expansion) to the nearest point z1 where f fails to be analytic (that point z1 is equal to 1 in this case), then this circle will actually be the largest circle centered at z0 such that the series converges to f(z) for all z interior to it.

Puff! That's quite a mouthful, so let me rephrase it. What is being said here is that there's usually a condition of validity for the power series expansion of a function: if that condition of validity is not fulfilled, then the function cannot be represented by the power series. In this particular case, the expansion of f(z) = 1/(1-z) = 1 + z + z2 + z3 + z4 +… is only valid when ¦z¦ < 1, and so there is no larger circle about z0 (i.e. the origin in this particular case) such that, at each point interior to it, the Taylor series (or the Maclaurin series in this case) converges to f(z).
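It's easy to see this happening numerically, by the way (a little sketch of mine, not from Penrose or Brown and Churchill): inside the circle, the partial sums converge to 1/(1-z) very quickly; outside, they blow up, even though 1/(1-z) itself is perfectly finite there.

```python
import numpy as np

def partial_sum(z, n_terms):
    """Partial sum of the geometric series 1 + z + z^2 + ..."""
    return sum(z ** n for n in range(n_terms))

for z in (0.5 + 0.3j, 0.9j, 1.1 + 0j):
    target = 1 / (1 - z)
    print(z, abs(partial_sum(z, 200) - target))
# |z| < 1: the error is tiny. |z| > 1: the partial sums diverge, even
# though 1/(1-z) itself is well defined there (it equals -10 at z = 1.1).
```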

That being said, we can usually work our way around such singularities, especially when they are isolated, such as in this example (there is only this one point 1 that is causing trouble), and that is where the concept of analytic continuation comes in. However, before I explain this, I should first introduce Laurent's theorem, which is like Taylor's theorem but applies to functions which are not as 'nice' as the functions for which Taylor's theorem holds (i.e. functions that are not analytic everywhere), such as this 1/(1-z) function indeed. To be more specific, Laurent's theorem says that, if we have a function f which is analytic in an annular domain (i.e. the red area in the illustration below) centered at some point z0 (in the illustration below, that's point c), then f(z) will have a series representation involving both positive and negative powers of the term (z – z0).

[Figure: an annular domain centered at the point c (source: Wikipedia)]

More in particular, f(z) will be equal to f(z) = ∑ an(z – z0)n (with n = 0, 1,…, ∞) + ∑ bn/(z – z0)n (with n = 1, 2,…, ∞), with the an and bn coefficients involving complex integrals which I will not write down here because WordPress lacks a good formula editor and so it would look pretty messy. An alternative representation of the Laurent series is to write f(z) as f(z) = ∑ cn(z – z0)n, with cn = (1/2πi) ∫C [f(z)/(z – z0)n+1]dz (n = 0, ±1, ±2,…, ±∞). Well – so here I actually did write down the integral. I hope it's not too messy 🙂 .

It's relatively easy to verify that this Laurent series becomes the Taylor series if there are no singularities, i.e. if the domain covers the whole disk (so if there would be red everywhere, even at the origin point). In that case, the cn coefficient becomes (1/2πi) ∫C [f(z)/(z – z0)-1+1]dz for n = -1, and we can use the fact that, if f(z) is analytic everywhere, the integral ∫C f(z)dz will be zero along any closed contour in the domain of f(z). For n = -2, the integrand becomes f(z)/(z – z0)-2+1 = f(z)(z – z0) and that's an analytic function as well, because the function z – z0 is analytic everywhere and, hence, the product of this (analytic) function with f(z) will also be analytic everywhere (sums, products and compositions of analytic functions are also analytic). So the integral will be zero once again. Similarly, for n = -3, the integrand f(z)/(z – z0)-3+1 = f(z)(z – z0)2 is analytic and, hence, the integral is again zero. In short, all bn coefficients (i.e. the coefficients of all 'negative' powers) in the Laurent series will be zero. [In the cn notation, the n = 0 coefficient c0 is just the constant term a0.] As for the an coefficients, one can see they are equal to the Taylor coefficients by using what Penrose refers to as the 'higher-order' version of the Cauchy integral formula: f(n)(z0)/n! = (1/2πi) ∫C [f(z)/(z – z0)n+1]dz.

It is also easy to verify that this expression also holds for the special case of a so-called punctured disk, i.e. an annular domain for which the 'hole' at the center is limited to the center point only, so this 'annular domain' then consists of all points z around z0 for which 0 < ¦z – z0¦ < R. We can then write the Laurent series as f(z) = ∑ an(z – z0)n + b1/(z – z0) + b2/(z – z0)2 + … + bn/(z – z0)n + …, with n = 0, 1, 2,…, ∞.

OK. So what? Well… The point to note is that we can usually deal with singularities. That's what the so-called theory of residues and poles is for. The term pole is a more colorful term for what is, in essence, an isolated singular point: it has to do with the shape of the (modulus) surface of f(z) near this point, which is, well… shaped like a tent on a (vertical) pole indeed. As for the term residue, that's the term used to denote the coefficient b1 in the power series above. The value of the residue at one or more isolated singular points can be used to evaluate integrals (so residues are used for solving integrals), but we won't go into any more detail here, especially because, despite my initial promise, I still haven't explained what analytic continuation actually is. Let me do that now.
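Before I do, one small numerical aside (my own, and only a sketch): you can actually 'see' a residue by integrating around a pole. The function ez/z has a pole at the origin with residue e0 = 1, so integrating it around the unit circle should give 2πi times 1.

```python
import numpy as np

# Parametrize the unit circle: z(t) = exp(it), dz = i*exp(it) dt.
t = np.linspace(0, 2 * np.pi, 100001)
z = np.exp(1j * t)
integral = np.trapz(np.exp(z) / z * 1j * z, t)
print(integral, 2j * np.pi)   # both ~ 6.2832j: 2*pi*i times the residue
```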

For once, I must admit that Penrose’s explanation here is easier to follow than other texts (such as the Wikipedia article on analytic continuation, which I looked at but which, for once, seems to be less easy to follow than Penrose’s notes on it), so let me closely follow his line of reasoning here.

If, instead of the origin, we would use a non-zero point z0 for our expansion of this function f(z) = 1/(1-z) = 1 + z + z2 + z3 + z4 +… (i.e. a proper Taylor expansion, instead of the Maclaurin expansion around the origin), then we would, once again, find a circle of convergence for this function which would, once again, be bounded by the singularity at point (1, 0), as illustrated below. In fact, we can move even further out and expand this function around the (non-zero) point z1, and so on and so on. See the illustration: it is essential that the successive circles of convergence around the origin, z0, z1, etcetera overlap when 'moving out' like this.

[Figure: analytic continuation: successive overlapping circles of convergence 'moving out' from the origin]

So that’s this concept of ‘analytic continuation’. Paraphrasing Penrose, what’s happening here is that the domain D of the analytic function f(z) is being extended to a larger region D’ in which the function f(z) will also be analytic (or holomorphic – as this is the term which Penrose seems to prefer over ‘analytic’ when it comes to complex-valued functions).

Now, we should note something that, at first sight, seems to be incongruous: as we wander around a singularity like that (or, to use the more mathematically correct term here, a branch point of the function) and then return to our point of departure, we may get (in fact, we are likely to get) different function values 'back at base'. Indeed, the illustration below shows what happens when we are 'wandering' around the origin for the log z function. You'll remember (if not, see the previous posts) that, if we write z using polar coordinates (so we write z as z = reiθ), then log z is equal to log z = ln r + i(θ + 2nπ). So we have a multiple-valued function here and we dealt with that by using branches, i.e. we limited the values which the argument of z (arg z = θ) could take to some range α < θ < α + 2π. However, when we are wandering around the origin, we don't limit the range of θ. In fact, as we are wandering around the origin, we are effectively constructing this Riemann surface (which we introduced in one of our previous posts also), thereby effectively 'gluing' successive branches of the log z function together, and adding 2πi to the value of our log z function as we go around. [Note that the vertical axis in the illustration below keeps track of the imaginary part of log z only, i.e. the part with θ in it. If my imaginary reader would like to see the real part of log z, I should refer him to the post with the entry on Riemann surfaces.]

[Figure: the imaginary part of log z building up as we wander around the origin]

But what about the power series? Well… The log z function is just like any other analytic function and so we can and do expand it as we go. For example, if we expand the log z function about the point (1, 0), we get log z = (z – 1) – (1/2)(z – 1)2 + (1/3)(z – 1)3 – (1/4)(z – 1)4 +… etcetera. But as we wander around, we'll move into a different branch of the log z function and, hence, we'll get a different value when we get back to that point. However, I will leave the details of figuring that one out to you 🙂 and end this post, because the intention here is just to illustrate the principle, and not to copy some chapter out of a math course (or, at least, not to copy all of it let's say :-)).
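You can at least check that expansion numerically (my sketch below; note that numpy's log returns the principal branch, which is the right thing to compare with as long as we stay inside the circle of convergence ¦z – 1¦ < 1):

```python
import numpy as np

def log_series(z, n_terms=2000):
    """Taylor series of log z about the point 1:
    (z-1) - (z-1)^2/2 + (z-1)^3/3 - (z-1)^4/4 + ..."""
    k = np.arange(1, n_terms + 1)
    return np.sum((-1) ** (k + 1) * (z - 1) ** k / k)

z = 1 + 0.4j
print(log_series(z))   # agrees with the principal value...
print(np.log(z))       # ...because |z - 1| < 1 here
```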

If you can't work it out, you can always try to read the Wikipedia article on analytic continuation. While less 'intuitive' than Penrose's notes on it, it's definitely more complete, even if it does not quite exhaust the topic. Wikipedia defines analytic continuation as "a technique to extend the domain of a given analytic function by defining further values of that function, for example in a new region where an infinite series representation in terms of which it is initially defined becomes divergent." The Wikipedia article also notes that, "in practice, this continuation is often done by first establishing some functional equation on the small domain and then using this equation to extend the domain: examples are the Riemann zeta function and the gamma function." But so that's sturdier stuff which Penrose does not touch upon – for now at least: I expect him to develop such things in later Road to Reality chapters.

Post scriptum: Perhaps this is an appropriate place to note that, at first sight, singularities may look like no big deal: so we have an infinitesimally small hole in the domain of the function log z or 1/z or whatever – so what? Well… It's probably useful to note that, if we wouldn't have that 'hole' (i.e. the singularity), any integral of this function along a closed contour around that point, i.e. ∫C f(z)dz, would be equal to zero, but when we do have that little hole, like for f(z) = 1/z, we don't have that result. In this particular case (i.e. f(z) = 1/z), you should note that the integral ∫C (1/z)dz, for any closed contour around the origin, equals 2πi, or – just to give one more example here – that the value of the integral ∫C [f(z)/(z – z0)]dz is equal to 2πi·f(z0). Hence, even if f(z) would be analytic over the whole open disk, including the origin, the 'quotient function' f(z)/z will not be analytic at the origin and, hence, the value of the integral of this 'quotient function' f(z)/z around the origin will not be zero but equal to 2πi times the value of the original f(z) function at the origin, i.e. 2πi times f(0). Vice versa, if we find that the value of the integral of some function around a closed contour – and I mean any closed contour really – is not equal to zero, we know we've got a problem somewhere and so we should look out for one or more infinitesimally small 'holes' somewhere in the domain. Hence, singularities, and the complex theory of poles and residues which shows us how we can work with them, are extremely relevant indeed: it's surely not a matter of just trying to get some better approximation for this or that value or formula. 🙂

In light of the above, it is now also clear that the term 'residue' is well chosen: this coefficient b1 is equal to ∫C f(z)dz divided by 2πi (I take the case of a Maclaurin expansion here) and, hence, if there were no singularity, this integral (and, hence, the coefficient b1) would be equal to zero. Now, because of the singularity, we have a coefficient b1 ≠ 0 and, hence, using the term 'residue' for this 'remnant' is quite appropriate.

Complex integrals

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics. :-/

Original post:

Roger Penrose packs a lot in his chapter on complex-number calculus (Road to Reality, Chapter 7). He summarily introduces the concept of contour integration and then proceeds immediately to discuss power series representations of complex functions as well as fairly advanced ways to deal with singularities (see the section on analytic continuation). Brown and Churchill use no less than three chapters to develop this (Integrals (chapter 4), Series (chapter 5), and Residues and Poles (chapter 6)), and that's probably what is needed for some kind of understanding of it all. Let's start with integrals. However, let me first note here that WordPress does not seem to have a formula editor (so it is not like MS Word) and, hence, I have to keep the notation simple: I'll use the symbol ∫C for a contour (or line) integral along a curve C, and the symbol ∫[a, b] for an integral over a (closed) interval [a, b].

OK. Here we go. First, it is important to note that Penrose, and Brown and Churchill, are talking about complex integrals, i.e. integrals of complex-valued functions, whose value itself is (usually) a complex number too. That is very different from the line integrals I was exposed to when reading Feynman's Lectures on Physics. Indeed, Feynman's Lectures (Volume II, on Electromagnetism) offer a fine introduction to contour integrals, where the function to be integrated is either a scalar field (e.g. electric potential) or, else, some vector field (e.g. magnetic field, gravitational field). Now, such vector fields are (at least) two-dimensional things: they have an x- and a y-coordinate and, hence, we may think the functions involving vectors are complex too. They are, in a way but, that being said, the integrals involved all yield real-number values, because the integrand is likely to be a dot product of vectors (and dot products of vectors, as opposed to cross products, yield a real number). I won't go into the details here but, for those who'd want such details, the relevant Wikipedia article offers a fine description (including some very nice animations) of what integration over a line or a curve in such fields actually means. So I won't repeat that here. I can only note what Brown and Churchill say about them: these (real-valued) integrals can be interpreted as areas under a curve, and they would usually also have one or the other obvious physical meaning, but complex integrals do not usually have such a 'helpful geometric or physical interpretation'.

So what are they then? Let’s first start with some examples of curves.

[Figure: examples of (parametric) curves in the complex plane]

The illustration above makes it clear that, in practice, the curves which we are dealing with are usually parametric curves. In other words, the coordinates of all points z of the curve C can be represented as some function of a real-number parameter: z(t) = x(t) + iy(t). We can then define a complex integral as the integral of a complex-valued function f(z) of a complex variable z along a curve C from point z1 to point z2, and write such an integral as ∫C f(z)dz.

Moreover, if C can be parametrized as z(t), we will have some (real) numbers a and b such that z1 = z(a) and z2 = z(b) and, taking into account that dz = z'(t)dt with z'(t) = dz/dt (i.e. the derivative of the (complex-valued) function z(t) with respect to the (real) parameter t), we can write ∫C f(z)dz as:

∫C f(z)dz = ∫[a, b] f[z(t)]z'(t)dt
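To see that this parametrization formula actually works, here's a little example of my own: integrate f(z) = z2 along the quarter of the unit circle from 1 to i, and compare the numerical result with what the antiderivative F(z) = z3/3 predicts (that's the 'fundamental theorem' we'll get to in a minute).

```python
import numpy as np

t = np.linspace(0, np.pi / 2, 100001)
z = np.exp(1j * t)                 # the quarter circle from 1 to i
dz = 1j * np.exp(1j * t)           # z'(t)
numeric = np.trapz(z ** 2 * dz, t)

exact = (1j ** 3 - 1 ** 3) / 3     # F(i) - F(1), with F(z) = z^3/3
print(numeric, exact)              # both ~ -0.3333 - 0.3333j
```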

OK, so what? Well, there are a lot of interesting things to be said about this, but let me just summarize some of the main theorems. The first important theorem does not seem to be associated with any particular mathematician (unlike Cauchy or Goursat, whom I'll introduce in a minute) but is quite central: if we have some (complex-valued) function f(z) which happens to be continuous in some domain D, then all of the following statements will be true if one of them is true:

(I) f(z) has an antiderivative F(z) in D; (II) the integrals of f(z) along contours lying entirely in D and extending from any fixed point z1 to any fixed point z2 all have the same value; and, finally, (III) the integrals of f(z) around closed contours lying entirely in D all have value zero.

This basically means that the integration of f(z) from z1 to z2 is not dependent on the path that is taken. But so when do we have such path independence? Well… You may already have guessed the answer to that question: it's when the function is analytic or, in other words, when the Cauchy-Riemann equations ux = vy and uy = –vx are satisfied (see my other post on analytic (or holomorphic) complex-valued functions). That's, in a nutshell, what's stated in the so-called Cauchy-Goursat theorem. It should be noted that we also have the vice versa statement (which is known as Morera's theorem): if the integrals of f(z) around closed contours in some domain D are zero, then we know that f(z) is holomorphic.

In short, we'll always be dealing with 'nice' functions, and then we can show that the so-called 'fundamental' theorem of calculus (i.e. the one that links integrals with derivatives or – to be somewhat more precise – with the antiderivative of the integrand) also applies to complex-valued functions. We have:

∫C f(z)dz = ∫[a, b] f[z(t)]z'(t)dt = F[z(b)] – F[z(a)]

or, more in general: ∫C f(z)dz = F(z2) – F(z1)

We also need to note the Cauchy integral formula: if we have a function f that is analytic inside and on a closed contour C, then the value of this function at any point z0 inside C will be equal to:

f(z0) = (1/2πi) ∫C [f(z)/(z – z0)]dz

This may look like just another formula, but it's quite spectacular really: it basically says that the value of the function at any point z0 within a region enclosed by a curve is completely determined by the values of this function on that curve. Moreover, differentiating both sides of this equation repeatedly (with respect to z0) leads to similar formulas for the derivatives of the first, second, third, and higher order of f: f'(z0) = (1/2πi) ∫C [f(z)/(z – z0)2]dz, f''(z0) = (2!/2πi) ∫C [f(z)/(z – z0)3]dz or, more in general:

f(n)(z0) = (n!/2πi) ∫C [f(z)/(z – z0)n+1]dz (n = 1, 2, 3,…)

This formula is also known as Cauchy's differentiation formula. It is a central theorem in complex analysis really, as it leads to many other interesting theorems, including Gauss's mean value theorem, Liouville's theorem, and the maximum (and minimum) modulus principle. It is also essential for the next chapter in Brown and Churchill's course: power series representations of complex functions. However, I will stop here because I guess this 'introduction' to complex integrals is already confusing enough.
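Cauchy's differentiation formula is also easy to verify numerically (again, just a sketch of mine): take f(z) = ez, for which every derivative at z0 = 0 equals 1, and check that the contour integral reproduces exactly that.

```python
import numpy as np
from math import factorial

t = np.linspace(0, 2 * np.pi, 200001)
z = np.exp(1j * t)                 # unit circle around z0 = 0
for n in (1, 2, 3):
    integral = np.trapz(np.exp(z) / z ** (n + 1) * 1j * z, t)
    print(n, factorial(n) / (2j * np.pi) * integral)   # ~1 each time
```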

Post scriptum: I often wondered why one would label one theorem as 'fundamental', as it implies that all the other theorems may be important but, obviously, somewhat less fundamental. I checked it out and it turns out there is some randomness here. The Wikipedia article boldly states that the fundamental theorem of algebra (which states that every non-constant single-variable polynomial with complex coefficients has at least one complex root) is not all that 'fundamental' for modern algebra: its title just reflects the fact that there was a time when algebra focused almost exclusively on studying polynomials. The same might be true for the fundamental theorem of arithmetic (i.e. the unique(-prime)-factorization theorem), which states that every integer greater than 1 is either a prime itself or the product of prime numbers, e.g. 1200 = 2^4 · 3 · 5^2.

That being said, the fundamental theorem of calculus is obviously pretty ‘fundamental’ indeed. It leads to many results that are indeed key to understanding and solving problems in physics. One of these is the Divergence Theorem (or Gauss’s Theorem), which states that the outward flux of a vector field through a closed surface is equal to the volume integral of the divergence over the region inside the surface. Huh? Well… Yes. It pops up in any standard treatment of electromagnetism. There are others (like Stokes’ Theorem) but I’ll leave it at that for now, especially because these are theorems involving real-valued integrals.

Riemann surfaces (I)

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics. :-/

Original post:

In my previous post on this blog, I once again mentioned the issue of multiple-valuedness. It is probably time to deal with the issue once and for all by introducing Riemann surfaces.

Penrose attaches a lot of importance to these Riemann surfaces (so I must assume they are very important). In contrast, in their standard textbook on complex analysis, Brown and Churchill note that the two sections on Riemann surfaces are not essential reading, as it’s just ‘a geometric device’ to deal with multiple-valuedness. But so let’s go for it.

I already signaled that complex powers w = zc are multiple-valued functions of z and so that causes all kinds of problems, because we can't do derivatives and integrals and all that. In fact, zc = ec log z and so we have two components on the right-hand side of this equation. The first one is the (complex) exponential function ec, i.e. the real number e raised to a complex power c. We already know (see the other posts below) that this is a periodic function with (imaginary) period 2πi: ec = ec+2πi, because e2πi = 1. While this periodic component of zc is somewhat special (as compared to exponentiation in real analysis), it is not this periodic component but the log z component which is causing the problem of multiple-valuedness. [Of course, it's true that the problem of multiple-valuedness of the log function is, in fact, a logical consequence of the periodicity of the complex exponential function, but so you can figure that out yourself I guess.] So let's look at that log z function once again.

If we write z in its polar form z = reiθ, then log z will be equal to log z = ln r + i(θ+2nπ), with n = 0, ±1, ±2,… Hence, if we write log z in rectangular coordinates (i.e. log z = x + iy), then we note that the x component (i.e. the real part) of log z is equal to ln r and, hence, x is just an ordinary real number with some fixed value (x = ln r). However, the y component (i.e. the imaginary part of log z) does not have any fixed value: θ is just one of the values, but so are θ+2π, θ–2π, θ+4π, etcetera. In short, we have an infinite number of values for y, and so that's the issue: what do we do with all these values? It's not a proper function anymore.

Now, this problem of multiple-valuedness is usually solved by just picking a so-called principal value for log z, which is written as Log z = ln r + iθ, and which is defined by mathematicians by imposing the condition that θ takes a value in the interval between -π and +π only (hence, -π < θ < π). In short, the mathematicians usually just pretend that the 2nπ thing doesn't matter.

However, this is not trivial: as we are imposing these restrictions on the value of θ, we are actually defining some new single-valued function Log z = ln r + iθ. This Log z function, then, is a complex-valued analytic function with two real-valued components: x = ln r and y = θ. So, while x = ln r can take any value on the real axis, we let θ range from -π to +π only (in the usual counterclockwise or 'positive' direction, because that happens to be the convention). If we do this, we get a principal value for zc as well: P.V. zc = ec Log z, and so we've 'solved' the problem of multiple values for the function zc too in this way.

What we are doing here has a more general significance: we are taking a so-called branch out of a multiple-valued function, in order to make it single-valued and, hence, analytic. To illustrate what is really going on here, let us go back to the original multiple-valued log z = ln r + i(θ+2nπ) function and let's do away with this integer n by writing log z in the more general form log z = ln r + iΘ. Of course, Θ is equal to θ+2nπ but so we'll just forget about the θ and, most importantly, about the n, and allow the y component (i.e. the imaginary part) of the complex number log z = x + iy to take on any value Θ in the real field. In other words, we treat this angle Θ just like any other ordinary real number. We can now define branches of log z again, but in a more general way: we can pick any value α and use the ray Θ = α as a branch cut, as it will define a range α < Θ < α + 2π in which, once again, we limit the possible values of log z to just one.

For example, if we choose α = -π, then Θ will range from -π to +π and so then we’re back to log z’s principal branch, i.e. Log z. However, let us now, instead of taking this Log z branch, define another branch – we’ll call it the L(z) branch – by choosing α = 0 and, hence, letting Θ range from 0 to 2π. So we have 0 < Θ < 2π and, of course, you’ll note that this range overlaps with the range that is being used for the principal branch of log z (i.e. Log z). It does, and it’s not a problem. Indeed, for values 0 < Θ < π (i.e. the overlapping half-plane) we get the same set of values Log z = L(z) for log z, and so we are talking the same function indeed.
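To make this overlap concrete, here's a small (hypothetical) helper function of my own that picks the branch α < Θ < α + 2π: for a point in the upper half-plane, the Log z and L(z) branches agree, while for a point in the lower half-plane they differ by 2πi, exactly as described above.

```python
import numpy as np

def log_branch(z, alpha):
    """log z = ln r + i*Theta, with Theta shifted by a multiple of 2*pi
    into the branch that starts at the angle alpha."""
    theta = np.angle(z)   # principal value, in (-pi, pi]
    theta += 2 * np.pi * np.ceil((alpha - theta) / (2 * np.pi))
    return np.log(abs(z)) + 1j * theta

print(log_branch(-1 + 1j, -np.pi), log_branch(-1 + 1j, 0))  # same value
print(log_branch(1 - 1j, -np.pi), log_branch(1 - 1j, 0))    # differ by 2*pi*i
```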

OK. I guess we understand that. So what? Well… The fact is that we have found a very very nice way of illustrating the multiple-valuedness of the log z function and – more importantly – a nice way of ‘solving’ it too. Have a look at the beautiful 3D graph below. It represents the log z function. [Well… Let me be correct and note that, strictly speaking, this particular surface seems to represent the imaginary part of the log z function only, but that’s OK at this stage.]

[Figure: the Riemann surface for the log z function (the 'spiral ramp', source: Wikipedia)]

Huh? What’s happening here? Well, this spiral surface represents the log z function by ‘gluing’ successive log z branches together. I took the illustration from Wikipedia’s article on the complex logarithm and, to explain how this surface has been constructed, let’s start at the origin, which is located right in the center of this graph, between the yellow-green and the red-pinkish sheets (so the horizontal (x, y) plane we start from is not the bottom of this rectangular prism: you should imagine it at its center).

From there, we start building up the first 'level' of this graph (i.e. the yellowish level above the origin) as the angle Θ sweeps around the origin, in counterclockwise direction, across the upper half of the complex z plane. So it goes from 0 to π and, when Θ crosses the negative side of the real axis, it has added π to its original value. With 'original value', I mean its value when it crossed the positive real axis the previous time. As we've just started, Θ was equal to 0. We then go from π to 2π, across the lower half of the complex plane, back to the positive real axis: that gives us the first 'level' of this spiral staircase (so the vertical distance reflects the value of Θ indeed, which is the imaginary part of log z). Then we can go around the origin once more, and so Θ goes from 2π to 4π, and so that's how we get the second 'level' above the origin – i.e. the greenish one. But – hey! – how does that work? The angle 2π is the same as zero, isn't it? And 4π as well, no? Well… No. Not here. It is the same angle in the complex plane, but it is not the same 'angle' if we're using it here in this log z = ln r + iΘ function.

Let’s look at the first two levels (so the yellow-green ones) of this 3D graph once again. Let’s start with Θ = 0 and keep Θ fixed at this zero value for a while. The value of log z is then just the real component of this log z = ln r + iΘ function, and so we have log z = ln r + i0 = ln r. This ln r function (or ln(x) as it is written below) is just the (real) logarithmic function, which has the familiar form shown below. I guess there is no need to elaborate on that although I should, perhaps, remind you that r (or x in the graph below) is always some positive real number, as it’s the modulus of a vector – or a vector length if that’s easier to understand. So, while ln(r) can take on any (real-number) value between -∞ and +∞, the argument r is always a positive real number.

[Illustration: the graph of the real logarithm ln(x)]

Let us now look at what happens with this log z function as Θ moves from 0 to 2π: first through the upper half of the complex z plane, to Θ = π, and then further to 2π through the lower half of the complex plane. That’s less easy to visualize, but the illustration below might help. The circles in the plane below (which is the z plane) represent the real part of log z: the parametric representation of these circles is Re(log z) = ln r = constant. In short, when we’re on these circles, going around the origin, we keep r fixed in the z plane (and, hence, ln r is constant indeed) but we let the argument of z (i.e. Θ) vary from 0 to 2π and, hence, the imaginary part of log z (which is equal to Θ) will also vary. On the rays it is the other way around: we let r vary but we keep the argument Θ of the complex number z = re^(iΘ) fixed. Hence, each ray is the parametric representation of Im(log z) = Θ = constant, with Θ some fixed angle in the interval 0 < Θ < 2π.

[Illustration: circles (r constant) and rays (Θ constant) in the z plane]

Let’s now go back to that spiral surface and construct the first level of that surface (or the first ‘sheet’ as it’s often referred to) once again. In fact, there is more than one way to construct such a spiral surface: while the spiral ramp above seems to depict the imaginary part of log z only, the vertical distance on the illustration below includes both the real and the imaginary part of log z (i.e. Re(log z) + Im(log z) = ln r + Θ).

[Illustration: a spiral surface for log z with vertical coordinate ln r + Θ]

Again, we start at the origin, which is, again, the center of this graph (there is a zero (0) marker nearby, but that’s actually just the value of Θ on that ray (Θ = 0), not a marker for the origin point). If we move outwards from the center, i.e. from the origin, on the horizontal two-dimensional z = x + iy = (x, y) plane but along the ray Θ = 0, then we again have log z = ln r + i0 = ln r. So, looking from above, we would see an image resembling the illustration above: we move on a circle around the origin if we keep r constant, and we move on rays if we keep Θ constant. So, in this case, we fix the value of Θ at 0 and move out on a ray indeed and, in three dimensions, the shape of that ray reflects the ln r function. As we then become somewhat more adventurous and start moving around the origin, rather than just moving away from it, the iΘ term in this ln r + iΘ function kicks in and the imaginary part of w (i.e. Im(log z) = y = Θ) grows. To be precise, the value 2π gets added to y with every loop around the origin. You can actually ‘measure’ this distance 2π ≈ 6.3 between the various ‘sheets’ of the spiral surface along the vertical axis (that is if you could read the tiny little figures in these 3D graphs, which you probably can’t).

So, by now you should get what’s going on here. We’re looking at this spiral surface and combining both movements now. If we move outwards, away from the center, keeping Θ constant, we can see that the shape of this spiral surface reflects the shape of the ln r function, going to -∞ close to the center of the spiral, and taking on more moderate (positive) values further away from it. So if we move outwards from the center, we get higher up on this surface. We can also see that we move higher up this surface as we move (counterclockwise) around the origin, rather than away from it. Indeed, as mentioned above, the vertical coordinate in the graph above (i.e. the measurements along the vertical axis of the spiral surface) is equal to the sum of Re(log z) and Im(log z). In other words, the ‘z’ coordinate in the Euclidean three-dimensional (x, y, z) space which the illustrations above are using is equal to ln r + Θ and, hence, as 2π gets added to the previous value of Θ with every turn we’re making around the origin, we get to the next ‘level’ of the spiral, which is exactly 2π higher than the previous level. Vice versa, 2π gets subtracted from the previous value of Θ as we’re going down the spiral, i.e. as we are moving clockwise (or in the ‘negative’ direction as it is aptly termed).

OK. This has been a very lengthy explanation but I just wanted to make sure you got it. The horizontal plane is the z plane, so that’s all the points z = x + iy = re^(iθ), and so that’s the domain of the log z function. And then we have the image of all these points z under the log z function, i.e. the points w = ln r + iΘ right above or right below the z points on the horizontal plane through the origin.
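If you want to construct such a surface yourself, here is a minimal numpy/matplotlib sketch (my own construction, not the Wikipedia graph): it plots the points (r·cosΘ, r·sinΘ, ln r + Θ), with Θ running over three full turns around the origin.

```python
import numpy as np
import matplotlib.pyplot as plt

r = np.linspace(0.1, 2.0, 60)
theta = np.linspace(0.0, 6 * np.pi, 240)      # three full turns around the origin
R, T = np.meshgrid(r, theta)

X = R * np.cos(T)                             # the z plane underneath: z = r*e^(i*Theta)
Y = R * np.sin(T)
Z = np.log(R) + T                             # vertical coordinate: Re(log z) + Im(log z)

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis", linewidth=0)
ax.set_title("log z as a spiral surface: height = ln r + Theta")
plt.show()
```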

Fine. But so how does this ‘solve’ the problem of multiple-valuedness, apart from ‘illustrating’ it? Well… From the title of this post, you’ll have inferred – and rightly so – that the spiral surface which we have just constructed is one of these so-called Riemann surfaces.

We may look at this Riemann surface as just another complex surface because, just like the complex plane, it is a two-dimensional manifold. Indeed, even if we have represented it in 3D, it is not all that different from a sphere as a non-Euclidean two-dimensional surface: we only need two real numbers (r and Θ) to identify any point on this surface, so it is two-dimensional indeed (although it has more ‘structure’ than the ‘flat’ complex plane we are used to). It may help to note that there are other surfaces like this, such as the ones below, which are Riemann surfaces for other multiple-valued functions: in this case, the surfaces below are Riemann surfaces for the (complex) square root function f(z) = z^(1/2) and the (complex) arcsin(z) function.

[Illustrations: Riemann surfaces for z^(1/2) and arcsin(z)]

Nice graphs, you’ll say, but, again, what is this all about? These graphs surely illustrate the problem of multiple-valuedness, but how do they help to solve it? Well… The trick is to use such a Riemann surface as a domain: now that we’ve got this Riemann surface, we can actually use it as a domain, and then log z (or z^(1/2) or arcsin(z) if we use these other Riemann surfaces) will be a nice single-valued (and analytic) function for all points on that surface.

Huh? What? […] Hmm… I agree that it looks fishy: we first use the function itself to construct a ‘Riemannian’ surface, and then we use that very same surface as a ‘Riemannian’ domain for the function itself? Well… Yes. As Penrose puts it: “Complex (analytic) functions have a mind of their own, and decide themselves what their domain should be, irrespective of the region of the complex plane which we ourselves may initially have allotted to it. While we may regard the function’s domain to be represented by the Riemann surface associated with the function, the domain is not given ahead of time: it is the explicit form of the function itself that tells us which Riemann surface the domain actually is.”

I guess we’ll have to judge the value of this bright Riemannian idea (Bernhard Riemann had many bright ideas during his short lifetime, it seems) when we understand somewhat better why we’d need these surfaces for solving physics problems. Back to Penrose. 🙂

Post scriptum: Brown and Churchill seem to approach the matter of how to construct a Riemann surface somewhat less rigorously than I do, as they do not provide any 3D illustrations but just talk about joining thin sheets: cutting them along the positive half of the real axis and then joining the lower edge of the slit of the first sheet to the upper edge of the slit in the second sheet. This should be done, obviously, by making sure there is no (additional) tearing of the original sheet surfaces (so we’re talking ‘continuous deformations’ I guess), but it could be done, perhaps, without creating that ‘tornado vortex’ around the vertical axis, which you can clearly see in the gray 3D graph above. If we don’t include the ln r term in the definition of the ‘z’ coordinate in the Euclidean three-dimensional (x, y, z) space which the illustrations above are using, then we’d have a spiral ramp without a ‘hole’ in the center. That being said, in order to construct a ‘proper’ two-dimensional manifold, we would probably need some kind of function of r in the definition of ‘z’. In fact, we would probably need to write r as some function of Θ in order to make sure we’ve got a proper analytic mapping. I won’t go into detail here (because I don’t know the detail) but leave it to you to check it out on the Web: just look up various parametric representations of spiral ramps. There is usually (and probably always) a connection between Θ and how, and how steeply, a spiral ramp climbs around its vertical axis.

Complex functions and power series

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics. :-/

Original post:

As I am going back and forth between this textbook on complex analysis (Brown and Churchill, Complex Variables and Applications) and Roger Penrose’s Road to Reality, I start to wonder how complete Penrose’s ‘Complete Guide to the Laws of the Universe’ actually is or, to be somewhat more precise, how (in)accessible. I guess the advice of an old friend – a professor emeritus in nuclear physics, so he should know! – might have been appropriate. He basically said I should not try to take any shortcuts (because he thinks there aren’t any), and that I should just go for some standard graduate-level courses on physics and math, instead of all these introductory texts that I’ve been trying to read (such as Roger Penrose’s books – but it’s true I’ve tried others too). The advice makes sense, if only because such standard courses are now available online. Better still: they are totally free. One good example is the Physics OpenCourseWare (OCW) from MIT: I just went on their website (ocw.mit.edu/courses/physics) and I was truly amazed.

Roger Penrose is not easy to read indeed: he also takes almost 200 pages to explain complex analysis, i.e. as many pages as the Brown and Churchill textbook, but I find the more formal treatment of the subject-matter in the math handbook easier to read than Penrose’s prose. So, while I won’t drop Penrose just yet (this time I really do not want to give up), I will probably (continue to) invest more time in other books – proper textbooks really – than in reading Penrose. In fact, I’ve started to look at Penrose’s prose as a more creative approach, but one that makes sense only after you’ve gone through all of the ‘basics’. And these ‘basics’ are probably easier to grasp by reading some tried and tested textbooks on math and physics first.

That being said, let me get back to the matter at hand by making good on at least one of the promises I made in the previous posts, and that is to say something more about the Taylor expansion of analytic functions. I wrote in one of these posts that this Taylor expansion is something truly amazing. It is, in my humble view at least. We have all these (complex-valued) functions of complex variables out there – such as e^z, log z, z^c, complex polynomials, complex trigonometric and hyperbolic functions, and all of the possible combinations of the aforementioned – and all of these functions can be represented by an (infinite) sum of powers f(z) = Σ a_n(z - z0)^n (with n going from 0 to infinity and with z0 being some arbitrary point in the function’s domain). So that’s the Taylor power series.

All complex functions? Well… Yes. Or no. All analytic functions. I won’t go into the details (if only because it is hard to integrate mathematical formulas with the XML editor I am using here) but it is an amazing result, which leads to many other amazing results. In fact, the proof of Taylor’s Theorem is, in itself, rather marvelous (yes, I went through it) as it involves other spectacular formulas (such as the Cauchy integral formula). However, I won’t go into this here. Just take it for granted: Taylor’s Theorem is great stuff!
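To see the theorem at work, here is a small Python sketch (mine, not Brown and Churchill’s): the Taylor expansion of Log z around z0 = 1 is Σ (-1)^(n+1)·(z-1)^n/n, and its partial sums converge to cmath.log inside the disc |z - 1| < 1.

```python
import cmath

def log_taylor(z, terms=200):
    """Partial sum of the Taylor series of Log z around z0 = 1."""
    w = z - 1
    return sum((-1) ** (n + 1) * w**n / n for n in range(1, terms + 1))

z = 1.3 + 0.4j                 # inside the disc of convergence |z - 1| < 1
print(log_taylor(z))           # matches cmath.log(z) up to truncation error
print(cmath.log(z))
```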

But so the function has to be analytic – or well-behaved as I’d say. Otherwise we can’t use Taylor’s Theorem and, hence, this power series expansion doesn’t work. So let’s define (complex) analyticity: a function w = f(z) = f(x+iy) = u(x, y) + iv(x, y) is analytic (an often-used synonym is holomorphic) if its partial derivatives u_x, u_y, v_x and v_y exist (and are continuous) and respect the so-called Cauchy-Riemann equations: u_x = v_y and u_y = -v_x.

These conditions are restrictive (much more restrictive than the conditions for analyticity for real-valued functions). Indeed, there are many complex functions which look good at first sight – if only because there’s no problem whatsoever with their real-valued components u(x,y) and v(x,y) in real analysis/calculus – but which do not satisfy these Cauchy-Riemann conditions. Hence, they are not ‘well-behaved’ in the complex space (in Penrose’s words: they do not conform to the ‘Eulerian notion’ of a function), and so they are of relatively little use – for solving complex problems that is!

A function such as f(z) = 2x + ixy² is an example: there are no complex numbers for which the Cauchy-Riemann conditions hold (check it out: the Cauchy-Riemann conditions boil down to xy = 1 and y = 0, and these two equations contradict each other). Hence, we can’t do much with this function really. For other functions, such as x² + iy², the Cauchy-Riemann conditions are only satisfied in very limited subsets of the function’s domain: in this particular case, the Cauchy-Riemann conditions only hold when y = x. We also have functions for which the Cauchy-Riemann conditions hold everywhere except in one or more singularities. The very simple function f(z) = 1/z is an example of this: it is easy to see we have a problem when z = 0, because the function value is not determined there.
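For what it’s worth, here is a quick symbolic check of that first example – a sketch of mine using sympy, not anything from the textbook:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
u = 2 * x            # real part of f(z) = 2x + i*x*y**2
v = x * y**2         # imaginary part

cr1 = sp.Eq(sp.diff(u, x), sp.diff(v, y))    # u_x = v_y   ->  2 = 2*x*y
cr2 = sp.Eq(sp.diff(u, y), -sp.diff(v, x))   # u_y = -v_x  ->  0 = -y**2

print(sp.solve([cr1, cr2], [x, y]))          # []: the conditions contradict each other
```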

As for the last category of functions, one would expect there is an easy way out, using limits or something. And there is. Singularities are not a big problem and we can work our way around them. I found out that ‘working our way around them’ usually involves a so-called Laurent series representation of the function, which is a more generalized version of the Taylor expansion involving not only positive but also negative powers.

One of the other things I learned is how to solve contour integrals. Solving contour integrals is the equivalent, in the complex world, of integrating a real-valued function over an interval [a, b] on the real line. Contours are curves in the complex plane. They can be simple and closed (like a circle or an ellipse for instance), and usually they are, but then they don’t have to be simple and closed: they can self-intersect, for example, or they can go around some point or around some other curve more than once (and, yes, that makes a big difference: when you go around twice or more, you’re talking a different curve really).

But so these things can all be solved relatively easily – everything is relative of course 🙂 – if (and only if) the functions involved are analytic and/or if the singularities involved are isolated. In fact, we can extend the definition of analytic functions somewhat and define meromorphic functions: meromorphic functions are functions that are analytic throughout their domain except for one or more isolated singular points (referred to as poles, at least when the function blows up like some finite power 1/(z-z0)ⁿ near them).

Holomorphic (and meromorphic) functions w = f(z) can be looked at as transformations: they map some domain D in the (complex) z plane to some region (referred to as the image of D) in the (complex) w plane. A key property of holomorphic functions (wherever the derivative is non-zero) is that they are conformal: they preserve angles – as illustrated below.

[Illustration: a conformal map – a grid and its angle-preserving image]

If you have read the first post on this blog, then you have seen this illustration already. Let me therefore present something better. The image below illustrates the function w = f(z) = z² or, vice versa, the function z = √w = w^(1/2) (i.e. the square root of w). Indeed, that’s a very well-behaved function in the complex plane: every complex number (including negative real numbers) has two square roots in the complex plane, and that’s what is shown below.

[Illustrations: a portrait in the w plane and its two-fold pre-image in the z plane under w = z² (credit: Hans Lundmark)]

Huh? What’s this?

It’s simple: the illustration above uses color (in this case, a simple gray scale only really) to connect the points in the square region of the domain (i.e. the z plane) with an equally square region in the w plane (i.e. the image of the square region in the z plane). You can verify the properties of the z = w^(1/2) function indeed. At z = i we easily recognize a spot on the right ear of this person: it’s the w = −1 point in the w plane. Now, the same spot is found at z = −i. This reflects the fact that i² = (−i)² = −1. Similarly, this guy’s mouth, which represents the region near w = −i, is found near the two square roots of −i in the z plane, which are z = ±(1−i)/√2. In fact, every part of this man’s face is found at two places in the z plane, except for the spot between his eyes, which corresponds to w = 0, and also to z = 0 under this transformation. Finally, you can see that this transformation is conformal (as any holomorphic function with non-zero derivative is): all (local) angles are preserved. In that sense, it’s just like a conformal map of the Earth indeed. [Note, however, that I am glossing over the fact that z = w^(1/2) is multiple-valued: for each value of w, we have two square roots in the z plane. That actually creates a bit of a problem when interpreting the image above. See the post scriptum at the bottom of this post for more on this.]

[…] OK. This is fun. [And, no, it’s not me: I found this picture on the site of a Swedish guy called Hans Lundmark, and so I give him credit for making complex analysis so much fun: just Google him to find out more.] However, let’s get somewhat more serious again and ask ourselves why we’d need holomorphism?

Well… To be honest, I am not quite sure because I haven’t gone through the rest of the course material yet – or through all these other chapters in Penrose’s book (I’ve done 10 now, so there’s 24 left). That being said, I do note that, besides all of the niceties I described above (like easy solutions for contour integrals), it is also ‘nice’ that the real and imaginary parts of an analytic function automatically satisfy the Laplace equation.

Huh? Yes. Come on! I am sure you have heard about the Laplace equation in college: it is that partial differential equation which we encounter in most physics problems. In two dimensions (i.e. in the complex plane), it’s the condition that ∂²f/∂x² + ∂²f/∂y² equals zero. It is a condition which pops up in electrostatics, fluid dynamics and many other areas of physical research, and I am sure you’ve seen simple examples of it.
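Here is a quick symbolic spot-check of that claim – again a sketch of mine, using sympy and the analytic function f(z) = z³ as an arbitrary example:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = sp.expand((x + sp.I * y) ** 3)    # an analytic function: f(z) = z**3
u, v = sp.re(f), sp.im(f)             # u = x**3 - 3*x*y**2, v = 3*x**2*y - y**3

print(sp.diff(u, x, 2) + sp.diff(u, y, 2))   # 0: u satisfies Laplace's equation
print(sp.diff(v, x, 2) + sp.diff(v, y, 2))   # 0: and so does v
```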

So, this fact alone (i.e. the fact that analytic functions pop up everywhere in physics) should be justification enough in itself I guess. Indeed, the first nine chapters of Brown and Churchill’s course are only there because of the last three, which focus on applications of complex analysis in physics. But is there anything more to it? 

Of course there is. Roger Penrose would not dedicate more than 200 pages to all of the above if it was not for more serious stuff than some college-level problems in physics, or to explain fluid dynamics or electrostatics. Indeed, after explaining why hypercomplex numbers (such as quaternions) are less useful than one might expect (Chapter 11 of his Road to Reality is about hypercomplex numbers and why they are useful/not useful), he jumps straight into the study of higher-dimensional manifolds (Chapter 12) and symmetry groups (Chapter 13). Now I don’t understand anything of that, as yet that is, but I sure do understand I’ll need to work my way through it if I ever want to understand what follows after: spacetime and Minkowskian geometry, quantum algebra and quantum field theory, and then the truly exotic stuff, such as supersymmetry and string theory. [By the way, from what I just gathered from the Internet, string theory has not been affected by the experimental confirmation of the existence of the Higgs particle, as it is said to be compatible with the so-called Standard Model.]

So, onwards we go! I’ll keep you posted. However, as I look at that (long) list of MIT courses, it may take some time before you hear from me again. 🙂

Post scriptum:

The nice picture of this Swedish guy is also useful to illustrate the issue of multiple-valuedness, which is an issue that pops up almost everywhere when you’re dealing with complex functions. Indeed, if we write w in its polar form w = re^(iθ), then its square root can be written as z = w^(1/2) = (√r)e^(i(θ/2+kπ)), with k equal to either 0 or 1. So we have two square roots indeed for each w: each root has a length (i.e. its modulus or absolute value) equal to √r (i.e. the positive square root of r) but their arguments are θ/2 and θ/2 + π respectively, and so that’s not the same. It means that, if z is a square root of some w in the w plane, then -z will also be a square root of w. Indeed, if the argument of z is equal to θ/2, then the argument of -z will be θ/2 + π (or, equivalently, θ/2 + π − 2π = θ/2 − π: we just rotate the vector by 180 degrees, which corresponds to a reflection through the origin). It means that, as we let the vector w = re^(iθ) move around the origin – so if we let θ make a full circle starting from, let’s say, -π/2 (take the value w = -i for instance, i.e. near the guy’s mouth) – then the argument of the image of w will only go from (1/2)(-π/2) = -π/4 to (1/2)(-π/2 + 2π) = 3π/4. These two angles, i.e. -π/4 and 3π/4, correspond to the diagonal y = -x in the complex plane, and you can see that, as we go from -π/4 to 3π/4 in the z plane, the image over this 180-degree sweep does cover every feature of this guy’s face – and here I mean not half of the guy’s face, but all of it. Continuing in the same direction (i.e. counterclockwise) from 3π/4 back to -π/4 just repeats the image. I will leave it to you to find out what happens with the angles on the two symmetry axes (y = x and y = -x).
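A quick numeric check of those two roots – my own sketch, not part of the original picture:

```python
import cmath, math

w = -1j                                  # the point near the guy's mouth
r, theta = abs(w), cmath.phase(w)        # polar form w = r*e^(i*theta)
roots = [math.sqrt(r) * cmath.exp(1j * (theta / 2 + k * math.pi)) for k in (0, 1)]
print(roots)                             # approx +/-(1 - 1j)/sqrt(2)
print([z**2 for z in roots])             # both squares give back -1j
```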

Spaces

Pre-scriptum (added on 26 June 2020): The main problem with contemporary physics – all that Paul Ehrenfest referred to as the ‘unendlicher Heisenberg-Born-Dirac-Schrödinger Wurstmachinen-Physik-Betrieb’ [the never-ending Heisenberg-Born-Dirac-Schrödinger sausage-machine physics business] – is that it stopped analyzing the physicality of the situation. One should, therefore, probably distinguish physical from mathematical space. H.A. Lorentz said some useful things about that at the 1927 Solvay Conference. I wrote about that in a more recent paper.

Original post:

The term ‘space’ is all over the place when reading math. There are all kinds of spaces in mathematics and the terminology is quite confusing. Let’s first start with the definition of a space: a space (in mathematics) is, quite simply, just a set of things (elements or objects) with some kind of structure (and, yes, there is also a definition for the extremely general notion of ‘structure’: a structure on a set consists of ‘additional mathematical objects’ that relate to the set in some manner – I am not sure that helps but so there you go).

The elements of the set can be anything. From what I read, I understand that a topological space might be the most general notion of a mathematical space. The Wikipedia article on it defines a topological space as “a set of points, along with a set of neighborhoods for each point, that satisfies a set of axioms relating points and neighborhoods.” It also states that “other spaces, such as manifolds and metric spaces, are (nothing but) specializations of topological spaces with extra structures or constraints.” However, the symbolism involved in explaining the concept is complex and probably prevents me from understanding the finer points. I guess I’d need to study topology for that – but I am only doing a course in complex analysis right now. 🙂

Let’s go to something more familiar: the metric spaces. A metric space is a set where a notion of distance (i.e. a metric) between elements of the set is defined. That makes sense, doesn’t it?

We have Euclidean and non-Euclidean metric spaces. We all know Euclidean spaces, as that is what we use for the simple algebra and calculus we all had to learn as teenagers. They can have two, three or more dimensions, but there is only one Euclidean space in any dimension (a line, the plane or, more in general, the Cartesian multidimensional coordinate system). In Euclidean geometry, we have the parallel postulate: within a two-dimensional plane, for any given line L and a point p which is not on that line, there is only one line through that point – the parallel line – which does not intersect with the given line. Euclidean geometry is flat – well… Kind of.

Non-Euclidean metric spaces are a study object in their own right but – to put it simply – in non-Euclidean geometry, we do not have the parallel postulate. If we do away with that, then there are two broad possibilities: either we have more than one parallel line or, else, we have none. Let’s start with the latter, i.e. no parallels. The most often quoted example of this is the sphere. A sphere is a two-dimensional surface where ‘lines’ are actually great circles which divide the sphere in two equal hemispheres, like the equator or the meridians through the poles. They are the shortest path between two points, and so that corresponds to the definition of a line on a sphere indeed. Now, any ‘line’ (any geodesic that is) through a point p off a ‘line’ (geodesic) L will intersect with L, and so there are no parallel lines on a sphere – at least not as per the mathematical definition of a parallel line. Indeed, parallel lines do not intersect. I have to note that it is all a bit confusing because the so-called parallels of latitude on a globe are small circles, not great circles, and, hence, they are not the equivalent of a line in spherical geometry. In short, the parallels of latitude are not parallel lines – in the mathematical sense of the word that is. Does that make sense? For me it makes sense enough, so I guess I should move on.

Other Euclidean geometric facts, such as the angles of a triangle summing up to 180°, cannot be observed on a sphere either: the angles of a triangle on a sphere add up to more than 180° (for the big triangle illustrated below, they add up to 90 + 90 + 50 = 230°). That’s typical of Riemannian or elliptic geometry in general and so, yes, the sphere is an example of Riemannian or elliptic geometry.

[Illustration: triangles on a sphere – a large one with angles summing to 230° and a small, nearly Euclidean one]

Of course, it is probably useful to remind ourselves that a sphere is a two-dimensional surface in elliptic geometry, even if we’re visualizing it in a three-dimensional space when we’re discussing its properties. Think about the flatlander walking on the surface of the globe: for him, the sphere is two-dimensional indeed, and so it’s not us – our world is not flat: we think in 3D – but the flatlander who is living in a spherically geometric world. It’s probably useful to introduce the term ‘manifold’ here. Spheres – and other surfaces, like the saddle-shaped surfaces we will introduce next – are (two-dimensional) manifolds. Now what’s that? I won’t go into the etymology of the term because that doesn’t help I feel: apparently, it has nothing to do with the verb to fold, so it’s not something with many folds or so – although a two-dimensional manifold can look like something folded. A manifold is, quite simply, a topological space that near each point resembles Euclidean space, so we can define a metric on it and do all kinds of things with that metric – locally that is. If the flatlander does these things – like measuring angles and lengths and what have you – close enough to where he is, then he won’t notice he’s living in a non-Euclidean space. That ‘fact’ is also shown above: the angles of a small (i.e. a local) triangle do add up to 180° – approximately that is.

Fine. Let’s move on again.

The other type of non-Euclidean geometry is hyperbolic or Lobachevskian geometry. Hyperbolic geometry is the geometry of saddle-shaped surfaces. Many lines (in fact, an infinite number of them) can be drawn parallel to a given line through a given point (the illustration below shows just one), and the angles of a triangle add up to less than 180°.

[Illustration: a hyperbolic triangle on a saddle-shaped surface, with a parallel line through an external point]

OK… Enough about metric spaces perhaps – except for noting that, when physicists talk about curved space, they obviously mean that the space we are living in (i.e. the universe) is non-Euclidean: gravity curves it. And let’s add one or two other points as well. Anyone who has read something about Einstein’s special relativity theory will remember that the mathematician Hermann Minkowski added a time dimension to the three ordinary dimensions of space, creating a so-called Minkowski space, which is actually four-dimensional spacetime. So what’s that in mathematical terms? The ‘points’ in Minkowski’s four-dimensional spacetime are referred to as ‘events’. They are also referred to as four-vectors. The important thing to note here is that it’s not a Euclidean space: it is pseudo-Euclidean. Huh?

Let’s skip that for now and not make it any more complicated than it already is. Let us just note that the Minkowski space is not only a metric space (and a manifold obviously, which resembles Euclidean space locally, which is why we don’t notice its curvature really): a Minkowski space is also a vector space. So what’s a vector space? The formal definition is clear: a vector space V over a (scalar) field F is a set of elements (vectors) together with two binary operations: addition and scalar multiplication. So we can add the vectors (that is the elements of the vector space) and scale them with a scalar, i.e. an element of the field F (that’s actually where the word ‘scalar’ comes from: something that scales vectors). The addition and scalar multiplication operations need to satisfy a number of axioms but these are quite straightforward (like associativity, commutativity, distributivity, the existence of an additive and multiplicative inverse, etcetera). The scalars are usually real numbers: in that case, the field F is equal to R, the set of real numbers (sorry I can’t use blackboard bold here for symbols so I am just using a bold capital R for the set of the real numbers), and the vector space is referred to as a real vector space. However, they can also be complex numbers: in that case, the field F is equal to C, the set of complex numbers, and the vector space is, obviously, referred to as a complex vector space.

N-tuples of elements of F itself (a1, a2,…, an) are a very straightforward example of a vector space: Euclidean spaces can be denoted as R (the real line), R² (the Euclidean plane), R³ (Euclidean three-dimensional space), etcetera. Another example is the set C of complex numbers: that’s a vector space too, and not only over the real numbers (F = R), but also over itself (F = C). OK. Fair enough, I’d say. What’s next?

It becomes somewhat more complicated when the ‘vectors’ (i.e. the elements in the vector space) are mathematical functions: indeed, a space can consist of functions (a function is just another object, isn’t it?), and function spaces can also be vector spaces because we can perform (pointwise) addition and scalar multiplication on functions. Vector spaces can consist of other mathematical objects too. In short, the notion of a ‘vector’ as some kind of arrow defined by some point in space does not cover the true mathematical notion of a vector, which is very general (an element of a ‘vector space’ as defined above). The same goes for fields: we usually think a field consists of numbers (real, complex, or whatever other number one can think of), but a field can also consist of vectors. In fact, vector fields are as common as scalar fields in physics (think of vectors representing the speed and direction of a moving fluid for example, as opposed to its local temperature – which is a scalar quantity).

Quite confusing, isn’t it? Can we have a vector space over a vector field?

Tricky question. From what I read so far, I am not sure actually. I guess the answer is both yes and no. The second binary operation on the vector space is scalar multiplication, so the field F needs to be a field of scalars – not a vector field. Indeed, the formal definition of a vector space is quite formal on this point: F has to be a (scalar) field. But then complex numbers z can be looked at not only as scalars in their own right (that’s the focus of the complex analysis course that I am currently reading) but also as vectors in their own right. OK, we’re talking a special kind of vectors here (vectors from the origin to the point z) but so what? And then we already noted that C is a vector space over itself, didn’t we?

So how do we define scalar multiplication in this case? Can we use the ‘scalar’ product (aka the dot product, or the inner product) between two vectors here (as opposed to the cross product, aka the vector product tout court)? I am not sure. Not at all actually. Perhaps it is the usual product between two complex numbers – (x+iy)(u+iv) = (xu-yv) + i(xv+yu) – which, unlike the standard ‘scalar’ product between two vectors, returns another complex number as a result (as opposed to the dot product (x,y)·(u,v), which is equal to the real number xu + yv). The Brown & Churchill course on complex analysis which I am reading just notes that “this product [of two complex numbers] is, evidently, neither the scalar nor the vector product used in ordinary vector analysis”. However, because the ‘scalar’ product returns a single (real) number as a result, while the product of two complex numbers is – quite obviously – a complex number in itself, I must assume it’s the above-mentioned ‘usual product between two complex numbers’ that is to be used for ‘scalar’ multiplication in this case (i.e. the case of defining C as a vector space over itself). In addition, the geometric interpretation of multiplying two complex numbers shows that it actually is a matter of scaling: the ‘length’ of the complex number zw (i.e. its absolute value) is equal to the product of the absolute values of z and w respectively and, as for its argument (i.e. the angle from the real line), the argument of zw is equal to the sum of the arguments of z and w respectively. In short, this looks very much like what scalar multiplication is supposed to do.

Let’s see if we can confirm the above (i.e. C being a vector space over itself, with scalar multiplication being defined as the product of two complex numbers) at some later point in time. For the moment, I think I’ve had quite enough – for the moment at least. Indeed, while I note there are many other unexplored spaces, I will not attempt to penetrate these for now. For example, I note the frequent reference to Hilbert spaces but, from what I understand, this is also some kind of vector space, but then with even more additional structure and/or a more general definition of the ‘objects’ which make up its elements. Its definition involves Cauchy sequences of vectors – and that’s a concept I haven’t studied as yet. To make things even more complicated, there are also Banach spaces – and lots of other things really. So I will need to look at that but, again, for the moment I’ll just leave the matter alone and get back into the nitty-gritty of complex analysis. Doing so will probably clarify all of the more subtle points mentioned above.

[…] Wow! It’s been quite a journey so far, and I am still in the first chapters of the course only!

PS: I did look it up just now (i.e. a few days later than when I wrote the text above) and my inference is correct: C is a complex vector space over itself, and the formula to be used for scalar multiplication is the standard formula for multiplying complex numbers: (x+iy)(u+iv) = (xu-yv) + i(xv+yu). C², or Cⁿ in general, are other examples of complex vector spaces (i.e. vector spaces over C).
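A quick numeric illustration of why complex multiplication behaves like a scaling – my own sketch:

```python
import cmath, math

c = 2 * cmath.exp(1j * math.pi / 6)   # the 'scalar': stretch by 2, rotate by 30 degrees
z = 3 + 4j                            # the 'vector'

w = c * z
print(abs(w), abs(c) * abs(z))                          # 10.0 10.0
print(cmath.phase(w), cmath.phase(c) + cmath.phase(z))  # equal (modulo 2*pi)
```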

Euler’s formula

I went trekking (to the Annapurna Base Camp this time) and, hence, left the math and physics books alone for a week or two. When I came back, it was like I had forgotten everything, and I wasn’t able to re-do the exercises. Back to the basics of complex numbers once again. Let’s start with Euler’s formula:

e^(ix) = cos(x) + i·sin(x)

In his Lectures on Physics, Richard Feynman calls this equation ‘one of the most remarkable, almost astounding, formulas in all of mathematics’, so it’s probably no wonder I find it intriguing and, indeed, difficult to grasp. Let’s look at it. So we’ve got the real (but irrational) number e in it. That’s a fascinating number in itself because it pops up in different mathematical expressions which, at first sight, have nothing in common with each other. For example, e can be defined as the sum of the infinite series e = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + … etcetera (n! stands for the factorial of n in this formula), but one can also define it as that unique positive real number for which d(e^t)/dt = e^t (in other words, as the base of an exponential function which is its own derivative). And, last but not least, there are also some expressions involving limits which can be used to define e. Where to start? More importantly, what’s the relation between all these expressions and Euler’s formula?

First, we should note that e^(ix) is not just any number: it is a complex number – as opposed to the simpler e^x expression, which denotes the real exponential function (as opposed to the complex exponential function e^z). Moreover, we should note that e^(ix) is a complex number on the unit circle. So, using polar coordinates, we should say that e^(ix) is a complex number with modulus 1 (the modulus is the absolute value of the complex number (i.e. the distance from 0 to the point we are looking at) or, alternatively, we could say it is the magnitude of the vector defined by the point we are looking at) and argument x (the argument is the angle (expressed in radians) between the positive real axis and the line from 0 to the point we are looking at).

Now, it is self-evident that cos(x) + i·sin(x) represents exactly the same: a point on the unit circle defined by the angle x. But that doesn’t prove Euler’s formula: it only illustrates it. So let’s go to one or the other proof of the formula to try to understand it somewhat better. I’ll refer to Wikipedia for proving Euler’s formula in extenso but let me just summarize it. The Wikipedia article (as I looked at it today) gives three proofs.

The first proof uses the power series expansion (yes, the Taylor/Maclaurin series indeed – more about that later) for the exponential function: e^(ix) = 1 + ix + (ix)²/2! + (ix)³/3! + … etcetera. We then substitute using i² = -1, i³ = -i etcetera and so, when we then re-arrange the terms, we find the Maclaurin series for the cos(x) and sin(x) functions indeed. I will come back to these power series in another post.

The second proof uses one of the limit definitions for e^x but applies it to the complex exponential function. Indeed, one can write e^z (with z = x+iy) as e^z = lim(1 + z/n)^n for n going to infinity. The proof substitutes ix for z and then calculates the limit for very large (or infinite) n indeed. This proof is less obvious than it seems because we are dealing with limits of complex-valued expressions here, and so one has to take into account issues of convergence and all that.

The third proof also looks complicated but, in fact, is probably the most intuitive of the three proofs given because it uses the derivative definition of e. To be more precise, it takes the derivative of both sides of Euler’s formula using the polar coordinates expression for complex numbers. Indeed, e^(ix) is a complex number and, hence, can be written as some number z = r(cosθ + i·sinθ), and so the question to solve here is: what are r and θ? We need to write these two values as a function of x. How do we do that? Well… If we take the derivative of both sides, we get d(e^(ix))/dx = i·e^(ix) = (cosθ + i·sinθ)dr/dx + r[d(cosθ + i·sinθ)/dθ]dθ/dx. That’s just the chain rule for derivatives of course. Now, writing it all out and equating the real and imaginary parts on both sides of the expression yields the following: dr/dx = 0 and dθ/dx = 1. In addition, we must have that, for x = 0, e^(i·0) = e^0 = 1, so we have r(0) = 1 (the modulus of the complex number (1,0) is one) and θ(0) = 0 (the argument of (1,0) is zero). It follows that the functions r and θ are equal to r = 1 and θ = x, which proves the formula.

While these proofs are (relatively) easy to understand, the formula remains weird, as evidenced also from its special cases, like e^(i·0) = e^(i2π) = 1 = -e^(iπ) = -e^(-iπ) or, equivalently, e^(iπ) + 1 = 0, which is a formula which combines the five most basic quantities in mathematics: 0, 1, i, e and π. It is an amazing formula because we have two irrational numbers here, e and π, which have definitions which do not refer to each other at all (last time I checked, π was still being defined as the simple ratio of a circle’s circumference to its diameter, while the various definitions of e have nothing to do with circles), and so we combine these two seemingly unrelated numbers, also inserting the imaginary unit i (using iπ as an exponent for e) and we get minus 1 as a result (e^(iπ) = -1). Amazing indeed, isn’t it?
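For the skeptics, a two-line numeric check (mine, obviously not Euler’s):

```python
import cmath, math

x = 0.7
print(cmath.exp(1j * x), complex(math.cos(x), math.sin(x)))  # the same point on the unit circle
print(cmath.exp(1j * math.pi) + 1)                           # ~0 (up to rounding: about 1.2e-16j)
```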

[…] Well… I’d say at least as amazing as the Taylor or Maclaurin expansion of a function – but I’ll save my thoughts on these for another post (even if I am using the results of these expansions in this post). In my view, what Euler’s formula shows is the amazing power of mathematical notation really – and the creativity behind it. Indeed, let’s look at what we’re doing with complex numbers: we start from one or two definitions only and suddenly all kinds of wonderful stuff starts popping up. It goes more or less like this really:

We start off with these familiar x and y coordinates of points in a plane. Now we call the x-axis the real axis and then, just to distinguish them from the real numbers, we call the numbers on the y-axis imaginary numbers. Again, it is just to distinguish them from the real numbers because, in fact, imaginary numbers are not imaginary at all: they are as real as the real numbers – or perhaps we should say that the real numbers are as imaginary as the imaginary numbers because, when everything is said and done, the real numbers are mental constructs as well, aren’t they? Imaginary numbers just happen to lie on another line, perpendicular to our so-called real line, and so that’s why we add a little symbol i (the so-called imaginary unit) when we write them down. So we write 1i (or i tout court), 2i, 3i etcetera, or i/2 or whatever (it doesn’t matter if we write i before the real number or after – as long as we’re consistent).

Then we combine these two numbers – the real and imaginary numbers – to form a so-called complex number, which is nothing but a point (x, y) in this Cartesian plane. Indeed, while complex numbers are somewhat more complex than the numbers we’re used to in daily life, they are not out of this world I’d say: they’re just points in space, and so we can also represent them as vectors (‘arrows’) from the origin to (x, y).

But so this is what we are doing really: we combine the real and imaginary numbers by using the very familiar plus (+) sign, so we write z = x + iy. Now that is actually where the magic starts: we are not adding the same things here, like we would do when we are counting apples or so, or when we are adding integers or rational or real numbers in general. No, we are adding two different things here – real and imaginary numbers – which, in fact, we cannot really add. Indeed, your mommy told you that you cannot compare apples with oranges, didn’t she? Well… That’s exactly what we do here really, and so we will keep these real and imaginary numbers separate in our calculations indeed: we will add the real parts of complex numbers with each other only, and the imaginary parts of them also with each other only.

Addition is quite straightforward: we just add the two vectors. Multiplication is somewhat more tricky but (geometrically) easy to interpret as well: the product of two complex numbers is a vector with a length which is equal to the product of the lengths of the two vectors we are multiplying (i.e. the two complex numbers which make up the product), and its angle with the real axis is the sum of the angles of the two original vectors. From this definition, many things follow, all equally amazing indeed, but one of these amazing facts is that i² = -1, i³ = -i, i⁴ = 1, i⁵ = i, etcetera. Indeed: multiplying a complex number z = x + iy = (x, y) with the imaginary unit i amounts to rotating it 90° (counterclockwise) about the origin. So we are not defining i² as being equal to minus 1 (many textbooks treat this equality as a definition indeed): it just comes out as a fact which we can derive from the earlier definition of the complex product. Sweet, isn’t it?

So we have addition and multiplication now. We want to do much more of course. After defining addition and multiplication, we want to do complex powers, and so it’s here that this business with e pops up.

We first need to remind ourselves of the simple fact that the number e is just a real number: it’s equal to 2.718281828459045235360287471… etcetera. We have to write ‘etcetera’ because e is an irrational number, which – whatever the term ‘irrational’ may suggest in everyday language – simply means that e is not a fraction of any integer numbers (so irrational means ‘not rational’). e is also a transcendental number – a word which suggests all kinds of mystical properties but which, in mathematics, only means we cannot write it as a root of some polynomial (a polynomial with rational coefficients that is). So it’s a weird number. That being said, it is also the so-called ‘natural’ base for the exponential function. Huh? Why would mathematicians take such a strange number as a so-called ‘natural’ base? They must be irrational, no? Well… No. If we take e as the base for the exponential function e^x (so that’s just this real (but irrational) number e to the power x, with x being the variable running along the x-axis: hence, we have a function here which takes a value from the set of real numbers and which yields some other real number), then we have a function here which is its own derivative: d(e^x)/dx = e^x. It is also the natural base for the logarithmic function and, as mentioned above, it kind of ‘pops up’ – quite ‘naturally’ indeed I’d say – in many other expressions, such as compound interest calculations for example or the general exponential function a^x = e^(x·ln a). In other words, we need these exp(x) and ln(x) functions to define powers of real numbers in general. So that’s why mathematicians call it ‘natural’.

While the example of compound interest calculations does not sound very exciting, all these formulas with e and exponential functions and what have you did inspire all these 18th century mathematicians – like Euler – who were in search of a logical definition of complex powers.

Let’s state the problem once again: we can do addition and multiplication of complex numbers, but the question is how to do complex powers. When trying to figure that one out, Euler obviously wanted to preserve the usual properties of powers, like a^x·a^y = a^(x+y) and, effectively, this property of the so-called ‘natural’ exponential function that d(e^x)/dx = e^x. In other words, we also want the complex exponential function to be its own derivative, so d(e^z)/dz should give us e^z once again.

Now, while Euler was thinking of that (and of many other things too, of course), he was well aware of the fact that you can expand e^x into that power series which I mentioned above: e^x = 1/0! + x/1! + x²/2! + x³/3! + … etcetera. So Euler just sat down, substituted the real number x with the imaginary number ix and looked at it: e^(ix) = 1 + ix + (ix)²/2! + (ix)³/3! + … etcetera. Now lo and behold! Taking into account that i² = -1, i³ = -i, i⁴ = 1, i⁵ = i, etcetera, we can put that in and re-arrange the terms indeed, and so Euler found that this equation becomes e^(ix) = (1 - x²/2! + x⁴/4! - x⁶/6! + …) + i(x - x³/3! + x⁵/5! - …). Now these two terms do correspond to the Maclaurin series for the cosine and sine function respectively, so there he had it: e^(ix) = cos(x) + i·sin(x). His formula: Euler’s formula!
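Here is Euler’s trick rendered numerically – a sketch of my own, nothing historical: feed ix into the exponential series and check that the real and imaginary parts of the partial sums reproduce cos(x) and sin(x).

```python
import math

def series_eix(x, terms=25):
    """Partial sum of 1 + (ix) + (ix)**2/2! + ... for e^(ix)."""
    total, term = 0 + 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= (1j * x) / (n + 1)
    return total

x = 1.2
s = series_eix(x)
print(s.real, math.cos(x))   # both ~0.3624
print(s.imag, math.sin(x))   # both ~0.9320
```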

From there, there was only one more step to take, and that was to write e^z = e^(x+iy) as e^x·e^(iy), and so there we have our definition of a complex power: it is a product of two factors – e^x and e^(iy) – both of which we have effectively defined now. Note that the e^x factor is just a real number, even if we write it as e^x: it acts as a sort of scaling factor for e^(iy) which, you will remember (as we pointed it out above already), is a point on the unit circle. More generally, it can be shown that e^x is the absolute value of e^z (or the modulus or length or magnitude of the vector – whatever term you prefer: they all refer to the same), while y is the argument of the complex number e^z (i.e. the angle of the vector e^z with the real axis). [And, yes, for those who would still harbor some doubts here: e^z is just another complex number and, hence, a two-dimensional vector, i.e. just a point in the Cartesian plane, so we have a function which goes from the set of complex numbers here (it takes z as input) and which yields another complex number.]

Of course, you will note that we don’t have something like z^w here, i.e. a complex base (i.e. z) with a complex exponent (i.e. w), or even a formula for complex powers of real numbers in general, i.e. a formula for a^w with a any real number (so not only e but any real number indeed) and w a complex exponent. However, that’s a problem which can be solved easily through writing z and w in their so-called polar form, so we write z as z = |z|e^(iθ) = |z|(cosθ + i·sinθ) and w as |w|e^(iσ) = |w|(cosσ + i·sinσ), and then we can take it further from there. [Note that |z| and |w| represent the modulus (i.e. the length) of z and w respectively, and the angles θ and σ are obviously the arguments of the same z and w respectively.] Of course, if z is a (positive) real number (so if y = 0), then the angle θ will obviously be zero (i.e. the angle of the real axis with itself) and so z will be equal to a real number (i.e. its real part only, as its imaginary part is zero) and then we are back to the case of a real base and a complex exponent. In other words, that covers the a^w case.

[…] Well… Easily? OK. I am simplifying a bit here – as I need to keep the length of this post manageable – but, in fact, it actually really is a matter of using these common properties of powers (such as e^(a+bi)·e^c = e^((a+c)+bi)) and it actually does all work out. And all of this magic did actually start with simply ‘adding’ the so-called ‘real’ numbers x on the x-axis with the so-called ‘imaginary’ numbers on the y-axis. 🙂
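In practice, the standard recipe for a full complex power is z^w = e^(w·log z), using the principal branch of the logarithm. Here is a small sketch of mine (the helper cpow is my own naming) showing that this is also what Python’s built-in ** operator does for complex numbers:

```python
import cmath

def cpow(z, w):
    """z**w via the principal branch: e^(w * Log z)."""
    return cmath.exp(w * cmath.log(z))

z, w = 1 + 1j, 0.5 - 2j
print(cpow(z, w))
print(z**w)        # Python's complex power uses the same principal-branch convention
```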

Post scriptum:

Penrose’s Road to Reality dedicates a whole chapter to complex exponentiation (Chapter 5). However, the development is not all that simple and straightforward indeed. The first step in the process is to take integer powers – and integer roots – of complex numbers, so that’s z^n for n = 0, ±1, ±2, ±3… etcetera (or z^(1/2), z^(1/3), z^(1/4) if we’re talking integer roots). That’s easy because it can be solved through using the old formula of Abraham de Moivre: (cosθ + i·sinθ)^n = cos(nθ) + i·sin(nθ) (de Moivre penned this down in 1707 already, more than 40 years before Euler looked at the matter). However, going from there to full-blown complex powers is, unfortunately, not so straightforward, as it involves a bit of a detour: we need to work with the inverse of the (complex) exponential function e^z, i.e. the (complex) natural logarithm.

Now that is less easy than it sounds. Indeed, while the definition of a complex logarithm is as straightforward as the definition of real logarithms (ln z is a function for which e^(ln z) = z), the function itself is a bit more… well… complex I should say. For starters, it is a multiple-valued function: if we write the solution w = ln z as w = u+iv, then it is obvious that e^w will be equal to e^(u+iv) = e^u·e^(iv), and this complex number e^w can then be written in its polar form e^w = re^(iθ) with r = e^u and v = θ + 2nπ. Of course, ln(e^(u+iv)) = u + iv, and so the solution w will look like w = ln r + i(θ + 2nπ) with n = 0, ±1, ±2, ±3 etcetera. In short, we have an infinite number of solutions for w (one for every n we choose) and so we have this problem of multiple-valuedness indeed. We will not dwell on this here (at least not in this post) but simply note that this problem is linked to the properties of the complex exponential function e^z itself. Indeed, the complex exponential function e^z has very different properties than the real exponential function e^x. First, we should note that, unlike e^x (which, as we know, goes from zero at the far end of the negative side of the real axis to infinity as x goes big on the positive side), e^z is a periodic function – so it oscillates and yields the same values after some time – with this ‘after some time’ being the periodicity of the function. Indeed, e^z = e^(z+2πi) and so its period is 2πi (note that this period is an imaginary number – but so it’s a ‘real’ period, if you know what I mean :-)). In addition, and this is also very much unlike the real exponential function e^x, e^z can be negative (as well as assume all kinds of other complex values). For example, e^(iπ) = -1, as we noted above already.
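To make the multiple-valuedness concrete, a short sketch of mine listing a few of the infinitely many values of log(-1):

```python
import cmath, math

z = -1 + 0j
r, theta = abs(z), cmath.phase(z)          # r = 1, theta = pi
for n in range(-2, 3):
    w = complex(math.log(r), theta + 2 * n * math.pi)
    print(w, cmath.exp(w))                 # every one of these w satisfies e^w = -1
```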

That being said, the problem of multiple-valuedness can be solved through the definition of a principal value of ln z, and that, then, leads us to what we want here: a consistent definition of a complex power of a complex base (or the definition of a true complex exponential (and logarithmic) function in other words). To those who would want to see the details of this (i.e. my imaginary readers :-)), I would say that Penrose’s treatment of the matter in the above-mentioned Chapter 5 of The Road to Reality is rather cryptic – presumably because he has to keep his book around 1000 pages only (not a lot to explain all of the Laws of the Universe) and, hence, Brown & Churchill’s course (or whatever other course dealing with complex analysis) probably makes for easier reading.

[As for the problem of multiple-valuedness, we should probably also note the following: when taking the n-th root of a complex number (i.e. z^(1/n) with n = 2, 3, etcetera), we also obtain a set of n values c_k (with k = 0, 1, 2,… n-1), rather than one value only. However, once we have one of these values, we have all of them, as we can write these c_k as c_k = r^(1/n)·e^(i(θ/n + 2kπ/n)) (with the original complex number z equal to z = re^(iθ)), and so we could also just consider the principal value c_0 and, as such, consider the function as a single-valued one. In short, the problem of multiple-valued functions pops up almost everywhere in the complex space, but it is not a real issue. In fact, we encounter the problem of multiple-valuedness as soon as we extend the exponential function in the space of the real numbers and also allow rational and real exponents, instead of positive integers only. For example, 4^(1/2) is equal to ±2, so we have two results here too and, hence, multiple values. Another example would be the 4th root of 16: we have four 4th roots of 16: +2 and -2, and then two complex roots +2i and -2i. However, standard practice is that we only take the positive real value into account in order to ensure a ‘well-behaved’ exponential function. Indeed, the standard definition of a real exponential function is b^x = (e^(ln b))^x = e^(x·ln b), and so, if x = 1/n, this definition yields the positive n-th root only. Standard practice will also restrict the value of b to a positive real number (b > 0). These conventions not only ensure a positive result but also continuity of the function and, hence, the existence of a derivative which we can then use to do other things. By the way, the definition also shows – once again – why e is such a nice (or ‘natural’) number: we can use it to calculate the value of any exponential function (for any real base b > 0). But so we had mentioned that already, and it’s now really time to stop writing. I think the point is clear.]
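One last sketch of mine, turning that c_k formula into code and recovering the four 4th roots of 16:

```python
import cmath, math

def nth_roots(z, n):
    """All n-th roots of z: c_k = r**(1/n) * e^(i*(theta/n + 2*k*pi/n))."""
    r, theta = abs(z), cmath.phase(z)
    return [r ** (1 / n) * cmath.exp(1j * (theta / n + 2 * k * math.pi / n))
            for k in range(n)]

print(nth_roots(16, 4))   # approx [2, 2j, -2, -2j]
```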

No royal road to reality

Pre-scriptum dated 19 May 2020: The entry below shows how easy it is to get ‘lost in math’ when studying physics. One doesn’t really need this stuff.

Text: I got stuck in Penrose’s Road to Reality—at chapter 5. That is not very encouraging: the book has 34 chapters, and every chapter seems to be getting more difficult.

In Chapter 5, Penrose introduces complex algebra. As I tried to get through it, I realized I had to do some more self-study. Indeed, while Penrose claims no other books or courses are needed to get through his, I do not find this to be the case. So I bought a fairly standard course in complex analysis (James Ward Brown and Ruel V. Churchill, Complex Variables and Applications) and I’ve done chapters 1 and 2 now. Although these first two chapters do nothing more than introduce the subject-matter, I find the matter rather difficult and the terminology confusing. Examples:

1. The term ‘scalar’ is used to denote real numbers. So why use the term ‘scalar’ if the word ‘real’ is available as well? And why not use the term ‘real field’ instead of ‘scalar field’? Well… The term ‘real field’ actually means something else. A scalar field associates a (real) number with every point in space. So that’s simple: think of temperature or pressure. The term ‘scalar’ is said to be derived from ‘scaling’: a scalar is what scales vectors. Indeed, multiplying a vector by a scalar (i.e. a real number) changes the magnitude of the vector without changing its direction. So what is a real field then? Well… A (formally) real field is a field that can be extended with a (not necessarily unique) ordering which makes it an ordered field. Does that help? Somewhat, I guess. But why the qualifier ‘formally real’? I checked, and there is no such thing as an ‘informally real’ field. I guess it’s just to make sure we know what we are talking about, because ‘real’ is a word with many meanings.

2. So what’s a field in mathematics? It is an algebraic structure: a set of ‘things’ (like numbers) with operations defined on it, including the notions of addition, subtraction, multiplication, and division. As mentioned above, we have scalar fields and vector fields. In addition, we also have fields of complex numbers. We also have fields with some less likely candidates for addition and multiplication, such as functions (one can add and multiply functions with each other). In short, anything which satisfies the formal definition of a field – and here I should note that the above definition of a field is not formal – is a field. For example, the set of rational numbers satisfies the definition of a field too. So what is the formal definition? First of all, a field is a ring. Huh? Here we are in this abstract classification of algebraic structures: commutative groups, rings, fields, etcetera (there are also modules – a type of algebraic structure which I had never heard of before). To put it simply – because we have to move on, of course – a ring (no one seems to know where that word actually comes from) has addition and multiplication only, while a field has division too. In other words, a ring does not need to have multiplicative inverses. Huh? It’s simple, really: the integers form a ring, but the equation 2·x = 1 does not have a solution in integers (x = ½) and, hence, the integers do not form a field. The same example shows why the rational numbers do – see the little check below.
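Just to make that integers-versus-rationals example tangible, here is a two-line sketch using Python’s standard fractions module:

```python
from fractions import Fraction

# In the ring of integers, 2*x = 1 has no solution: 2 has no multiplicative inverse.
# In the field of rational numbers it does:
x = Fraction(1, 2)
print(2 * x == 1)       # True: 1/2 is the multiplicative inverse of 2
print(Fraction(1) / 2)  # Fraction(1, 2): division does not take us out of the field
```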

3. But what about a vector field? Can we do division with vectors? Yes, but not by zero – though that is not a problem, as that is understood in the definition of a field (or in the general definition of division, for that matter). In two-dimensional space, we can represent vectors by complex numbers, z = (x, y), and we have a formula for the so-called multiplicative inverse of a complex number: z-1 = (x/(x2+y2), -y/(x2+y2)). OK. That’s easy – and easy to verify too, as the quick check below shows. Then let’s move on to more advanced stuff.
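Here is that quick check – a sketch, multiplying z by its inverse to recover 1:

```python
# The multiplicative inverse of z = (x, y): z**-1 = (x/(x**2+y**2), -y/(x**2+y**2))
def inverse(x, y):
    d = x**2 + y**2  # |z|**2, so z must not be (0, 0)
    return (x / d, -y / d)

x, y = 3.0, 4.0
u, v = inverse(x, y)
# z * z**-1 should be 1: (x + iy)(u + iv) = (x*u - y*v) + i*(x*v + y*u)
print(x * u - y * v, x * v + y * u)  # 1.0 0.0
```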

4. In logic, we have the concept of well-formed formulas (wffs). In math, we have the concept of ‘well-behaved’: we have well-behaved sets, well-behaved spaces and lots of other well-behaved things, including well-behaved functions, which are, of course, those of interest to engineers and scientists (and, hence, in light of the objective of understanding Penrose’s Road to Reality, to me as well). I must admit that I was somewhat surprised to learn that ‘well-behaved’ is one of the very few terms in math that have no formal definition. Wikipedia notes that its definition, in the science of mathematics that is, depends on ‘mathematical interest, fashion, and taste’. Let me quote in full here: “To ensure that an object is ‘well-behaved’ mathematicians introduce further axioms to narrow down the domain of study. This has the benefit of making analysis easier, but cuts down on the generality of any conclusions reached. […] In both pure and applied mathematics (optimization, numerical integration, or mathematical physics, for example), well-behaved means not violating any assumptions needed to successfully apply whatever analysis is being discussed. The opposite case is usually labeled pathological.” Wikipedia also notes that “concepts like non-Euclidean geometry were once considered ill-behaved, but are now common objects of study.”

5. So what is a well-behaved function? There is actually a whole hierarchy, with varying degrees of ‘good’ behavior, so one function can be more ‘well-behaved’ than another. First, we have smooth functions: a smooth function has derivatives of all orders (as for its name, it’s well chosen: the graph of a smooth function is, well, smooth). Then we have analytic functions: analytic functions are smooth but, in addition to being smooth, an analytic function is a function that can be locally given by a convergent power series. Huh? Let me try an alternative definition: a function is analytic if and only if its Taylor series about x0 converges to the function in some neighborhood for every x0 in its domain. That’s not helping much either, is it? The classic counterexample may help: the function f(x) = exp(-1/x2) (with f(0) = 0) is perfectly smooth, but all of its derivatives at 0 vanish, so its Taylor series about 0 is identically zero and, hence, does not converge to the function itself. So smooth does not imply analytic – see the little check below. But let’s not dwell on it.
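A sympy sketch of that counterexample (every derivative tends to 0 at x = 0, so the Taylor series about 0 is the zero series):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.exp(-1 / x**2)  # extended with f(0) = 0: smooth, but not analytic at 0

# All derivatives vanish at 0, so the Taylor series about 0 is identically
# zero - yet f(x) > 0 for every x != 0:
for n in range(4):
    print(n, sp.limit(sp.diff(f, x, n), x, 0))  # prints 0 four times
```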

In fact, it may help to note that the authors of the course I am reading (J.W. Brown and R.V. Churchill, Complex Variables and Applications) use the terms analytic, regular and holomorphic interchangeably, and they define an analytic function simply as a function which has a derivative everywhere. While that’s helpful, it’s obviously a bit loose (what about the thing with the Taylor series?) and so I checked on Wikipedia, which clears up the confusion and also defines the terms ‘holomorphic’ and ‘regular’:

“A holomorphic function is a complex-valued function of one or more complex variables that is complex differentiable in a neighborhood of every point in its domain. The existence of a complex derivative in a neighborhood is a very strong condition, for it implies that any holomorphic function is actually infinitely differentiable and equal to its own Taylor series. The term analytic function is often used interchangeably with ‘holomorphic function’ although the word ‘analytic’ is also used in a broader sense to describe any function (real, complex, or of more general type) that can be written as a convergent power series in a neighborhood of each point in its domain. The fact that the class of complex analytic functions coincides with the class of holomorphic functions is a major theorem in complex analysis.”

Wikipedia also adds the following: “Holomorphic functions are also sometimes referred to as regular functions or as conformal maps. A holomorphic function whose domain is the whole complex plane is called an entire function. The phrase ‘holomorphic at a point z0’ means not just differentiable at z0, but differentiable everywhere within some neighborhood of z0 in the complex plane.”

6. What to make of all this? Differentiability is obviously the key and, although there are many similarities between real differentiability and complex differentiability (both are linear and obey the product rule, quotient rule, and chain rule), real-valued functions and complex-valued functions are different animals. What are the conditions for differentiability? For real-valued functions, it is a matter of checking whether or not the limit defining the derivative exists and, of course, a necessary (but not sufficient) condition is continuity of the function.

For complex-valued functions, it is a bit more sophisticated, because we’ve got the so-called Cauchy-Riemann conditions applying here. How does that work? Well… We write f(z) as the sum of two functions: f(z) = u(x,y) + iv(x,y). So the real-valued function u(x,y) yields the real part of f(z), while v(x,y) yields the imaginary part of f(z). The Cauchy-Riemann equations (to be interpreted as conditions, really) are the following: ux = vy and uy = -vx (note the minus sign in the second equation).

That looks simple enough, doesn’t it? However, as Wikipedia notes (see the quote above), satisfying these equations at the point z0 alone is not enough to make f(z) holomorphic there. We need to look at some neighborhood of the point z0 and see if these first-order derivatives (ux, uy, vx and vy) exist everywhere in that neighborhood and satisfy the Cauchy-Riemann equations there. So we need to look beyond the point z0 itself when doing our analysis: we need to ‘approach’ it from various directions before making any judgment. I know this sounds like Chinese, but it became clear to me when doing the exercises.
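Those exercises are easy to mimic with sympy, by the way. A sketch, checking the Cauchy-Riemann equations for f(z) = z2 (which is holomorphic everywhere) and for the conjugate function f(z) = z* (which is holomorphic nowhere):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# f(z) = z**2 = (x + i*y)**2, so u = x**2 - y**2 and v = 2*x*y:
u, v = x**2 - y**2, 2*x*y
print(sp.diff(u, x) == sp.diff(v, y))   # True:  u_x = v_y
print(sp.diff(u, y) == -sp.diff(v, x))  # True:  u_y = -v_x

# f(z) = z* = x - i*y, so u = x and v = -y:
u, v = x, -y
print(sp.diff(u, x) == sp.diff(v, y))   # False: u_x = 1 but v_y = -1
```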

7. OK. Phew! I got this far – but so that’s only chapters 1 and 2 of Brown & Churchill’s course! In fact, chapter 2 also includes a few sections on so-called harmonic functions and harmonic conjugates. Let’s first talk about harmonic functions. Harmonic functions are even better behaved than holomorphic or analytic functions. Well… That’s not the right way to put it, really. A harmonic function is a real-valued analytic function (its value could represent temperature, or pressure – just as an example) but, for a function to qualify as ‘harmonic’, an additional condition is imposed. That condition is known as Laplace’s equation: if we denote the harmonic function as H(x,y), then it has to have second-order derivatives which satisfy Hxx(x,y) + Hyy(x,y) = 0.
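Checking Laplace’s equation is, once again, a job for sympy. A little sketch:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

def is_harmonic(H):
    """Does H satisfy Laplace's equation H_xx + H_yy = 0?"""
    return sp.simplify(sp.diff(H, x, 2) + sp.diff(H, y, 2)) == 0

print(is_harmonic(x**2 - y**2))            # True
print(is_harmonic(sp.exp(x) * sp.cos(y)))  # True
print(is_harmonic(x**2 + y**2))            # False: H_xx + H_yy = 4
```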

Huh? Well… Laplace’s equation, and harmonic functions in general, play an important role in physics, as the condition being imposed often reflects a real-life physical constraint and, hence, the function H would describe real-life phenomena, such as the temperature of a thin plate (with the points on the plate defined by the (x,y) coordinates), or the electrostatic potential. More about that later. Let’s conclude this first entry with the definition of harmonic conjugates.

8. As stated above, a harmonic function is a real-valued function. However, we also noted that a complex function f(z) can actually be written as the sum of a real and an imaginary part, using two real-valued functions u(x,y) and v(x,y). More in particular, we can write f(z) = u(x,y) + iv(x,y), with i the imaginary unit (0,1). Now, if u and v happen to be harmonic functions (but so that’s an if, of course – see the Laplace condition imposed on their second-order derivatives in order to qualify for the ‘harmonic’ label) and if, in addition to that, their first-order derivatives happen to satisfy the Cauchy-Riemann equations (in other words, if f(z) is a well-behaved analytic function), then (and only then) we can label v as the harmonic conjugate of u.

What does that mean? First, one should note that when v is a harmonic conjugate of u in some domain, it is not generally true that u is a harmonic conjugate of v. So one cannot just switch the functions: the minus sign in the Cauchy-Riemann equations makes the relationship asymmetric. But so what’s the relevance of this definition of a harmonic conjugate? Well… There is a theorem that turns the definition around: ‘A function f(z) = u(x,y) + iv(x,y) is analytic (or holomorphic, to use standard terminology) in a domain D if and only if v is a harmonic conjugate of u.’ In other words, introducing the definition of a harmonic conjugate (and the conditions which their first- and second-order derivatives have to satisfy) allows us to check whether or not we have a well-behaved complex-valued function (and with ‘well-behaved’ I mean analytic or holomorphic).
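In practice, one constructs the harmonic conjugate by integrating the Cauchy-Riemann equations. A sympy sketch for u = x2 - y2 (glossing over the integration ‘constant’ g(x), which happens to be zero here):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

u = x**2 - y**2                     # a harmonic function (see the check above)
v = sp.integrate(sp.diff(u, x), y)  # v_y = u_x, so v = integral of u_x dy (+ g(x))
print(v)                            # 2*x*y, i.e. f(z) = u + i*v = z**2

# Verify the second Cauchy-Riemann equation, u_y = -v_x:
print(sp.simplify(sp.diff(u, y) + sp.diff(v, x)) == 0)  # True
```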

9. But, again, why do we need holomorphic functions? What’s so special about them? I am not sure for the moment, but I guess there’s something deeper in that one phrase which I quoted from Wikipedia above: “holomorphic functions are also sometimes referred to as regular functions or as conformal maps.” A conformal mapping preserves angles, as you can see in the illustration below, which shows a rectangular grid and its image under a conformal map: f maps pairs of lines intersecting at 90° to pairs of curves still intersecting at 90°. I guess that’s very relevant, although I do not know exactly why just yet. More about that in later posts.

[Illustration: a rectangular grid and its image under a conformal map (Conformal_map.svg)]
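In the meantime, the angle-preserving property is easy to verify numerically. A sketch with f(z) = z2, which is holomorphic and, hence, conformal wherever its derivative is nonzero:

```python
import numpy as np

# Two tiny direction vectors at z0, at 90 degrees to each other:
z0 = 1.0 + 1.0j
h = 1e-6
d1, d2 = h, h * 1j

f = lambda z: z**2
w1 = f(z0 + d1) - f(z0)  # image of the first direction under f
w2 = f(z0 + d2) - f(z0)  # image of the second direction under f

angle = lambda a, b: np.degrees(np.angle(b / a))
print(angle(d1, d2))  # 90.0
print(angle(w1, w2))  # ~90.0: f preserves the angle at z0
```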