Euler’s formula revisited

Pre-scriptum (dated 26 June 2020): This post – part of a series of rather simple posts on elementary math and physics – did not suffer much from the attack by the dark force—which is good because I still like it. Enjoy !

Original post:

This post intends to take some of the magic out of Euler’s formula. In fact, I started doing that in my previous post but I think that, in this post, I’ve done a better job at organizing the chain of thought. [Just to make sure: with ‘Euler’s formula’, I mean e^ix= cos(x) + isin(x). Euler produced a lot of formulas, indeed, but this one is, for math, what E = mc²is for physics. :-)]

The grand idea is to start with an initial linear approximation for the value of the complex exponential e^is near s = 0 (to be precise, we’ll use the eⁱ^ε = 1 + iε formula) and then show how the ‘magic’ of i – through the i²= –1 factor – gives us the sine and cosine functions. What we are going to do, basically, is to construct the sine and cosine functions algebraically.

Let us, as a starting point – just to get us focused – graph (i) the real exponential function e^x, i.e. the blue graph, and (ii) the real and imaginary part of the complex exponential function e^ix= cos(x) + isin(x), i.e. the red and green graph—the cosine and sine function. From these graphs, it’s clear that e^x and e^ixare two very different beasts.

1. e^xis just a real-valued function of x, so it ‘maps’ the real number x to some other real number y = e^x. That y value ‘rockets’ away, thereby demonstrating the power of exponential growth. There’s nothing really ‘special’ about e^x. Indeed, writing e^xinstead of 10^xobviously looks better when you’re doing a blog on math or physics but, frankly, there’s no real reason to use that strange number e ≈ 2.718 when all you need is just a standard real exponential. In fact, if you’re a high school student and you want to attract attention with some paper involving something that grows or shrinks, I’d recommend the use of π^x. 🙂

2. e^ixis something that’s very different. It’s a complex-valued function of x and it’s not about exponential growth (though it obviously is about exponentiation, i.e. repeated multiplication): y = e^ixdoes not ‘explode’. On the contrary: y is just a periodic ‘thing’ with two components: a sine and a cosine. [Note that we could also change the base, to 10, for example: then we write 10^ix. We’d also get something periodic, but let’s not get lost before we even start.]

Two different beasts, indeed. How can the addition of one tiny symbol – the little i in e^ix– can make such big difference?

The two beasts have one thing in common: the value of the function near x = 0 can be approximated by the same linear formula:

In case you wonder where this comes from, it’s basically the definition of the derivative of a function, as illustrated below. This is nothing special. It’s a so-called first-order approximation of a function. The point to note is that we have a similar-looking formula for the complex-valued e^ixfunction. Indeed, its derivative is d(e^ix)/dx = ie^ixand when we evaluate that derivative at x = 0, then we get ie⁰= i. So… Yes, the grand result is that we can effectively write:

e^iε≈ 1 + iε for small ε

Of course, 1 + iε is also a different ‘beast’ than 1 + ε. Indeed, 1 + ε is just a continuation of our usual walk along the real axis, but 1 + iε points in a different direction (see below). This post will show you where it’s headed.

Let’s first work with e^xagain, and think about a value for ε. We could take any value, of course, like 0.1 or some fraction 1/n. We’ll use a fraction—for reasons that will become clear in a moment. So the question now is: what value should we use for n in that 1/n fraction? Well… Because we are going to use this approximation as the initial value in a series of calculations—be patient: I’ll explain in a moment—we’d like to have a sufficiently small fraction, so our subsequent calculations based on that initial value are not too far off. But what’s sufficiently small? Is it 1/10, or 1/100,000, or 1/10¹⁰⁰? What gives us ‘good enough’ results? In fact, how do we define ‘good enough’?

Good question! In order to try to define what’s ‘good enough’, I’ll turn the whole thing on its head. In the table below, I calculate backwards from e¹= e by taking successive square roots of e. Huh? What? Patience, please! Just go along with me for a while. First, I calculate e^1/2, so our fraction ε, which I’ll just write as x, is equal to 1/2 here, so the approximation for e^1/2 is 1 + 1/2 = 1.5. That’s off. How much? Well… The actual value of e^1/2 is about 1.648721 (see the table below (or use a calculator or spreadsheet yourself): note that, because I copied the table from Excel, e^x is shown as e^x). Now, 1.648721 is 1.5 + 0.148721, so our approximation (1.5) is about 9% off (as compared to the actual value). Not all that much, but let’s see how we can improve. Let’s take the square root once again: (e^1/2)^1/2= e^1/4, so x = 1/4. And then I do that again, so I get e^1/8, and so on and so on. All the way down to x = 1/1024 = 1/2¹⁰, so that’s ten iterations. Our approximation 1 + x (see the fifth/last column in the table below is then equal to 1 + 1/1024 = 1 + 0.0009765625, which we rounded to 1.000977 in the table.

The actual value of e^1/1024is also about 1.000977, as you can see in the third column of the table. Not exactly, of course, but… Well… The accuracy of our approximation here is six digits behind the decimal point, so that’s equivalent to one part in a millionth. That’s not bad, but is it ‘good enough’? Hmm… Let’s think about it, but let’s first calculate some other things. The fourth column in the table above calculates the slope of that AB line in the illustration above: its value converges to one, as we would expect, because that’s the slope of the tangent line at x = 0. [So that’s the value of the derivative of e^xat x = 0. Just check it: de^x/dx = e^x, obviously, and e⁰= 1.] Note that our 1 + x approximation also converges to 1—as it should!

So… Well… Let’s now just assume we’re happy with with that approximation that’s accurate to one part in a million, so let’s just continue to work with this fraction 1/1024 for x. Hence, we will write that e^1/1024≈ 1 + 1/1024 and now we will use that value also for the complex exponential. Huh? What? Why? Just hang in here for a while. Be patient. 🙂 So we’ll just add the i again and, using that e^iε≈ 1 + iε expression, we write:

e^i/1024≈ 1 + i/1024

It’s quite obvious that 1 + i/1024 is a complex number: its real part is 1, and its imaginary part is 1/1024 = 0.0009765625.

Let’s now work our way up again by using that complex number 1 + i/1024 = 1 + i·0.0009765625 to calculate e^i/512, e^i/256, e^i/128etcetera. All the way back up to x = 1, i.e. eⁱ. I’ll just use a different symbol for x: in the table below, I’ll substitute x for s because I’ll refer to the real part of our complex numbers as ‘x’ from time to time (even if I write a and b in the table below), and so I can’t use the symbol x to denote the fraction. [I could have started with s, but then… Well… Real numbers are usually denoted by x, and so it was easier to start that way.] In any case…

The thing to note is how I calculate those values e^i/512, e^i/256, e^i/128etcetera. I am doing it by squaring, i.e. I just multiply the (complex) number by itself. To be very explicit, note that e^i/512 = (e^i/1024)²= e^i·2/1024 = (e^i/1024)(e^i/1024). So all that I am doing in the table below is multiply the complex number that I have with itself, and then I have a new result, and then I square that once again, and then again, and again, and again etcetera. In other words, when going back up, I am just taking the square of a (complex) number. Of course, you know how to multiply a number with itself but, because we’re talking complex numbers here, we should actually write it out:

(a + i·b)²= a²– b² + i·2ab = a²– b² + 2abi

[It would be good to always separate the imaginary unit i from real numbers like a, b, or ab, but then I am lazy and so I hope you’ll always recognize that i is the imaginary unit.] In any case… When we’re going back up (by squaring), the real part of the next number (i.e. the ‘x’ in x + iy) is a²– b² and the complex part (the ‘y’) is 2abi. So that’s what’s shown below—in the fourth and fifth column, that is.

Look at what happens. The x goes to zero and then becomes negative, and the y increases to one. Now, we went down from e^1/n = e¹ = e^1/1to e^1/n = e^1/1024, but we could have started with e², or e^4/n, or whatever. Hence, I should actually continue the calculations above so you can see what happens when s goes to 2, and then to 3, and then to 4, and so on and so on. What you’d see is that the value of the real and imaginary part of this complex exponential goes up and down between –1 and +1. You’d see both are periodic functions, like the sine and cosine functions, which I added in the last two columns of the table above. Now compare those a and b values (i.e. the second and third column) with the cosine and sine values (i.e. the last two columns). […] Do you see it? Do you see how close they are? Only a few parts in a million, indeed.

You need to let this sink it for a while. And I’d recommend you make a spreadsheet yourself, so you really ‘get’ what’s going on here. It’s all there is to the so-called ‘magic’ of Euler’s formula. That simple (a + ib)²= a²– b² + 2abi formula shows us why (and how) the real and imaginary part oscillate between –1 and +1, just like the cosine and sine function. In fact, the values are so close that it’s easy to understand what follows. They are the same—in the limit, of course.

Indeed, these values a²– b² and 2ab, i.e. the real and imaginary part of the next complex number in our series, are what Feynman refers to as the algebraic cosine and sine functions, because we calculate them as (a + ib)²= a²– b² + 2abi. These algebraic cosine and sine values are close to the real cosine and sine values, especially for small fractions s. Of course, there is a discrepancy becomes – when everything is said and done – we do carry a little error with us from the start, because we stopped at 1/n = 1/1024, before going back up.

There’s actually a much more obvious way to appreciate the error: we know that e^1/1024 should be some point on the unit circle itself. Therefore, we should not equate a with 1 if we have some value b > 0. Or – what amounts to saying the same – if if b is slightly bigger than 0, then a should be slightly smaller than 1. So the e^iε≈ 1 + iε is an approximation only. It cannot be exact for positive values of ε. It’s only exact when ε = 0.

So we’re off—but not far off as you can see. In addition, you should note that the error becomes bigger and bigger for larger s. For example, in the line for s = 1, we calculated the values of the algebraic cosine and sine for s = 2 (see the a^2 – b^2 and 2ab column) as –0.416553 and 0.910186, but the actual values are cos(2) = –0.416146 and sin(2) = 0.909297, which shows our algebraic cosine and sine function is gradually losing accuracy indeed (we’re off like one part in a thousand here, instead of one part in a million). That’s what we’d expect, of course, as we’re multiplying the errors as we move ‘back up’.

The graph below plots the values of the table.

This graph also shows that, as we’re doubling our ratio r all the time, the data points are being spaced out more and more. This ‘spacing out’ gets a lot worse when further increasing s: from s = 1 (that’s the ‘highest’ point in the graph above), we’d go to s = 2, and then to s = 4, s = 8, etcetera. Now, these values are not shown above but you can imagine where they are: for s = 2, we’re somewhere in the second quadrant, for s = 4, we’re in the third, etcetera. So that does not make for a smooth graph. We need points in-between. So let’s ‘fix’ this problem by taking just one value for s out of the table (s = 1/4, for example) and we’ll continue to use that value as a multiplier.

That’s what’s done in the table below. It looks somewhat daunting at first but it’s simple really. First, we multiply the value we got for e^1/4with itself once again, so that gives us a real and an imaginary part for e^1/8(we had that already in the table above and you can check: we get the same here). We then take that value (i.e. e^1/8) not to multiply it with itself but with e^1/4once again. Of course, because the complex numbers are not the same, we cannot use the (a + ib)²= a²– b² + 2abi rule any more. We must now use the more general rule for multiplying different complex numbers: (a + ib)(c + id) = (ac – bd) + i(ad + bc). So that’s why I have an a, b, c and d column in this table: a and b are the components of the first number, and c and d of the second (i.e. e^1/4= 0.969031 + 0.247434i)

In the table above, I let s range from zero (0) to seven (7) in steps of 0.25 (= 1/4). Once again, I’ve added the real cosine and sine values for these angles (they are, of course, expressed in radians), because that’s what s is here: an angle, aka as the phase of the complex number. So you can compare.

The table confirms, once again, that we’re slowly losing accuracy (we’re now 3 to 4 parts in a thousand off), but it is very slowly only indeed: we’d need to do many ‘loops’ around the center before we could actually see the difference on a graph. Hey! Let’s do a graph. [Excel is such a great tool, isn’t it?] Here we are: the thick black line describing a circle on the graph below connects the actual cosine and sine values associated with an angle of 1/4, 1/2, 3/8 etcetera, all the way up to 7 (7 is about 2.3π, so we’re some 40 degrees past our original point after the ‘loop’), while the little ‘+‘ marks are the data points for the algebraic cosine and sine. They match perfectly because our eye cannot see the little discrepancy.

So… That’s it. End of story.

What?

Yes. That’s it. End of story. I’ve done what I promised to do. I constructed the sine and cosine functions algebraically. No compass. 🙂 Just plain arithmetic, including one extra rule only: i²= –1. That’s it.

So I hope I succeeded. The goal was to take some of the magic out of Euler’s formula by showing how that eⁱ^ε = 1 + iε approximation and the definition of i²= –1 gives us the cosine and sine function itself as we move around the unit circle starting from the unity point on the real axis, as shown in that little graph:

Of course, the ε we were working with was much smaller than the size of the arrow suggests (it was equal to 1/1024 ≈ 0.000977 to be precise) but that’s just to show how differentials work. 🙂 Pretty good, isn’t it? 🙂

Post scriptum:

I. If anything, all this post did was to demonstrate multiplication of complex numbers. Indeed, when everything is said and done, exponentiation is repeated multiplication–both for real as well as for complex exponents. The only difference is–well… Complex exponents give us these oscillating things, because a complex exponent effectively throws a sine and cosine function in.

Now, we can do all kinds of things with that. In this post, we constructed a circle without a compass. Now, that’s not as good as squaring the circle 🙂 but, still, it would have awed Pythagoras. Below, I construct a spiral doing the same kind of math: I start off with a complex number again but now it’s somewhat more off the unit circle (1 + 0.247434i). In fact, I took the same sine value as the one we had for e^i/4but I replaced the cosine value (0.969031) with 1 exactly). In other words, my ε is a lot bigger here.

Then I multiply that complex number 1 + 0.247434i with itself to get the next number (0.938776 + 0.494868i), and then I multiply that result once again with my first number (1 + 0.247434i), just like we did when we were constructing the circle. And then it goes on and on and on. So the only difference is the initial value: that’s a bit more off the unit circle. [When we constructed the circle, our initial value was also a bit off but much less. Here we go for a much larger difference.]

So you can see what happens: multiplying complex numbers amounts to adding angles and multiplying magnitudes: αe^iβ·γe^iδ = αγe^i(β+^δ)=|αe^iβ|·|γe^iδ|e^i(β+^δ)| = |α||γ|e^i(β+^δ). So, because we started off with a complex number with magnitude slightly bigger than 1 (you calculate it using Pythagoras’ theorem: it’s 1.03, more or less, which is 3% off, as opposed less than one part in a million for the 1 + 0.000977i number), the next point is, of course, slightly off the unit circle too, and some more than 3% actually. And so that goes on and on and on and the ‘some more’ becomes bigger and bigger in the process.

Constructing a graph like this one is like doing the kind of silly stuff I did when programming little games with our Commodore 64 in the 1980s, so I shouldn’t dwell too much on this. In fact, now that I think of it: I should have started near –i, then my spiral would have resembled an e. 🙂 And, yes – for family reading this – this is also like the favorite hobby of our dad: calculating a better value for π. 🙂

However… The only thing I should note, perhaps, is that this kind of iterative process resembles – to some extent – the kind of process that iterative function systems (IFSs) use to create fractals. So… Well… It’s just nice, I guess. [OK. That’s just an excuse. Sorry.]

II. The other thing that I demonstrated in this post may seem to be trivial but I’ll emphasize it here because it helped me (not sure about you though) to understand the essence of real exponentials much better than I did before. So, what is it?

Well… It’s that rather remarkable fact that calculating (real) irrational powers amounts to doing some infinite iteration. What do I mean with that?

Well… Remember that we kept on taking the square root of e, so we calculated e^1/2, and then (e^1/2)^1/2= e^1/4, and then (e^1/4)^1/2= e^1/8, and then we went on: e^1/16, e^1/32, e^1/64, all the way down to e^1/1024, where we stopped. That was 10 iterations only. However, it was clear we could go on and on and on, to find that limit we know so well: e^1/Δtends to 1 (not to zero (0), and not to e either!) for Δ → ∞.

Now, e = e¹ is an exponential itself and so we can switch to another base, base-10 for example, using the general a^s= (b^k)^s= b^ks= b^tformula, with k = log_b(a). Let’s do base-10: we get e¹ = [10^log₁₀(e)]¹= 10^{0.434294…etcetera}. Now, because e is an irrational number, log₁₀(e) is irrational too, so we indeed have an infinite number of decimals behind the decimal point in 0.434294…etcetera. In fact, e is not only irrational but transcendental: we can’t calculate it algebraically, i.e. as the root of some polynomial with rational coefficients. Most irrational numbers are like that, by the way, so don’t think that being ‘transcendental’ is very special. In any case… That’s a finer point that doesn’t matter much here. You get the idea, I hope. It’s the following:

When we have a rational power a^m/n , it helps to think of it as a product of m factors a^1/n (and surely if we would want to calculate a^m/n without using a calculator, which, I admit, is not very fashionable anymore and so nobody ever does that: too bad, because the manual work involved does help to better understand things). Let’s write it down: a^m/n = a^m·(1/n) =(a^1/n)^m = a^1/n·a^1/n·a^1/n·a^1/n =·… (m times). That’s simple indeed: exponentiation is repeated multiplication. [Of course, if m is negative, then we just write a^m/nas 1/(a^m/n), but so that doesn’t change the general idea of exponentiation.]
However, it is much more difficult to see why, and how, exponentiation with irrational powers amounts to repeated multiplication too. The rather lengthy exposé above shows… Well, perhaps not why, but surely how. [And in math, if we can show how, that usually amounts to showing why also, isn’t it? :-)] Indeed, when we think of a^r (i.e. an irrational power of some (real) number a), we can think of it as a product of an infinite number of factors a^r/Δ. Indeed, we can write a^ras:

a^r= a^{r(1/Δ + 1/Δ + 1/Δ + 1/Δ +…)}= a^r/Δ·a^r/Δ·a^r/Δ·a^r/Δ…

Not convinced? Let’s work an example: 10^π= [e^ln10]^π= [e^ln10]^π = e^ln10·^π= e^ln10·^π= e^7.233784…Of course, if you take your calculator, you’ll find something like 1385.455731, both for 10^π and e^7.233784 (hopefully!), but so that’s not the point here. We’ve shown that e is an infinite product e^1/Δ·e^1/Δ·e^1/Δ·e^1/Δ·… =e^(1/Δ+^{1/Δ+1/Δ+1/Δ+…)}= e^Δ/Δ with Δ some infinitely large (but integer) number. In our example, we stopped the calculation at Δ = 1024, but you see the logic: we could have gone on forever. Therefore, we can write e^7.233784… as

e^7.233784… = e^7.233784…(^1/Δ+^{1/Δ+1/Δ+1/Δ+…)}= e^{7.233784…/Δ}·e^{7.233784…/Δ}·e^{7.233784…/Δ}…

Still not convinced? Let’s revert back to base 10. We can write the factors e^{7.233784…/Δ}as e^{(ln10·π)/Δ}= [e^ln10]^π/Δ= 10^π/Δ. So our original power 10^πis equal to: 10^π= 10^π/Δ·10^π/Δ·10^π/Δ·10^π/Δ·10^π/Δ·10^π/Δ… = 10^π(Δ/Δ), and of course, 10^1/Δ also tends to 1 as Δ goes to infinity (not to zero, and not to 10 either). 🙂 So, yes, we can do this for any real number a and for any r really.

Again, this may look very trivial to the trained mathematical eye but, as a novice in Mathematical Wonderland, I felt I had to go through this to truly understand irrational powers. So it may or may not help you, depending on where you are in MW.

[Proving that the limit for Δ/Δ goes to 1 as Δ goes to ∞ should not be necessary, I hope? 🙂 But, just in case you wonder how the formula for rational and irrational powers could possibly be related, we can just write a^m/n= a^{(m/n)(1/Δ + 1/Δ + 1/Δ + 1/Δ +…)}= a^m/nΔ·a^m/nΔ·a^m/nΔ·a^m/nΔ·…= (a^{1/Δ + 1/Δ + 1/Δ + 1/Δ +…})^m/n= a^m/n, as we would expect. :-)]

III. So how does that a^r= a^r/Δ·a^r/Δ·a^r/Δ·a^r/Δ… formula work for complex exponentials? We just add the i, so we write a^ir but we know what effect that has: we have a different beast now. A complex-valued function of r, or… Well… If we keep the exponent fixed, then it’s a complex-valued function of a! Indeed, do remember we have a choice here (and two inverse functions as well!).

However, note that we can write a^ir in two slightly different ways. We have two interpretations here really:

A. The first interpretation is the easiest one: we write a^ir as a^ir = (a^r)ⁱ= (a^{r/Δ + r/Δ + r/Δ + r/Δ +…})ⁱ.

So we have a real power here, a^r, and so that’s some real number, and then we raise it to the power i to create that new beast: a complex-valued function with two components, one imaginary and one real. And then we know how to relate these to the sine and cosine function: we just change the base to e and then we’re done.

In fact, now that we’re here, let’s go all the way and do it. As mentioned in my previous post – it follows out of that a^s= (e^k)^s= e^ks= e^tformula, with k = ln(a) – the only effect of a change of base is a change of scale of the horizontal axis: the graph of a^sis fully identical to the graph of e^tindeed: we just we need to substitute s by t = ks = ln(a)·s. That’s all. So we actually have our ‘Euler formula for a^ishere. For example, for base 10, we have 10^is= cos[ln(a)·s] + isin[ln(a)·s].

But let’s not get lost in the nitty-gritty here. The idea here is that we let i ‘act’ on a^r, so to say. And then, of course, we can write a^r as we want, but that doesn’t change the essence of what we’re dealing with.

B. The second interpretation is somewhat more tricky: we write a^ir as a^ir = a^ir/Δ·a^ir/Δ·a^ir/Δ·a^ir/Δ·…

So that’s a product of an (infinite) number of complex factors a^ir/Δ. Now, that is a very different interpretation than the one above, even if the mathematical result when putting real numbers in for a and r will – obviously – have to be the same. If the result is the same, then what am I saying really? Well… Nothing much, I guess. Just that the interpretation of an exponentiation as repeated multiplication makes sense for complex exponentials as well:

For rational r, we’ll have a finite number of complex factors: a^im/n = a^i/n·a^i/n·a^i/n·a^i/n·… (m times).
For irrational r, we’ll have an infinite number of complex factors a^ir = a^ir/Δ·a^ir/Δ·a^ir/Δ·a^ir/Δ… etcetera.

So the difference with the first interpretation is that, instead of looking at a^iras a real number a^rthat’s being raised to the complex power i, we’re looking at a^iras a complex number aⁱthat’s being raised to the real power r. As said, the mathematical result when putting real numbers in for a and r will – obviously – have to be the same. [Otherwise we’d be in serious trouble of course: math is math. We can’t have the same thing being associated with two different results.] But, as said, we can effectively interpret a^ir in two ways.

[…]

What I am doing here, of course, is contemplating all kinds of mathematical operations here – including exponentiation – on the complex space, rather on the real space. So the first step is to raise a complex number to a real power (as opposed to raising a real number to a complex power). The next step will be to raise a complex number to a complex power. So then we’re talking complex-valued functions of complex variables.

Now, that’s what complex analysis is all about, and I’ve written very extensively about that in my October-November 2013 post. So I would encourage you to re-read those, now that you’ve got, hopefully, a bit more of an ‘intuitive’ understanding of complex numbers with the background given in this and my previous post.

Complex analysis involves mapping (i.e. mapping from one complex space to another) and that, in turn, involves the concept of so-called analytic and/or holomorphic functions. Understanding those advanced concepts is, in turn, essential to understanding the kind of things that Penrose is writing about in Chapter 9 to 12 of his Road to Reality. […] I’ll probably re-visit these chapters myself in the coming weeks, as I realize I might understand them somewhat better now. If I could get through these, I’d be at page 250 or so, so that’s only one quarter of the total volume. Just an indication of how long that Road to Reality really is. 🙂

And then I am still not sure if it really leads to ‘reality’ because, when everything is said and done, those new theories (supersymmetry, M-theory, or string theory in general) are quite speculative, aren’t they? 🙂

Riemann surfaces (II)

Pre-scriptum (dated 26 June 2020): the material in this post remains interesting but is, strictly speaking, not a prerequisite to understand quantum mechanics. It’s yet another example of how one can get lost in math when studying or teaching physics.

Original post:

This is my second post on Riemann surfaces, so they must be important. [At least I hope so, because it takes quite some time to understand them. :-)]

From my first post on this topic, you may or may not remember that a Riemann surface is supposed to solve the problem of multivalued complex functions such as, for instance, the complex logarithmic function (log z = ln r + i(θ + 2nπ) or the complex exponential function (z^c = e^{c log z}). [Note that the problem of multivaluedness for the (complex) exponential function is a direct consequence of its definition in terms of the (complex) logarithmic function.]

In that same post, I also wrote that it all looked somewhat fishy to me: we first use the function causing the problem of multivaluedness to construct a Riemann surface, and then we use that very same surface as a domain for the function itself to solve the problem (i.e. to reduce the function to a single-valued (analytic) one). Penrose does not have any issues with that though. In Chapter 8 (yes, that’s where I am right now: I am moving very slowly on his Road to Reality, as it’s been three months of reading now, and there are 34 chapters!), he writes that “Complex (analytic) functions have a mind of their own, and decide themselves what their domain should be, irrespective of the region of the complex plane which we ourselves may initially have allotted to it. While we may regard the function’s domain to be represented by the Riemann surface associated with the function, the domain is not given ahead of time: it is the explicit form of the function itself that tells us which Riemann surface the domain actually is.”

Let me retrieve the graph of the Riemannian domain for the log z function once more:

For each point z in the complex plane (and we can represent z both with rectangular as well as polar coordinates: z = x + iy = re^iθ), we have an infinite number of log z values: one for each value of n in the log z = ln r + i(θ + 2nπ) expression (n = 0, ±1, ±2, ±3,…, ±∞). So what we do when we promote this Riemann surface as a domain for the log z function is equivalent to saying that point z is actually not one single point z with modulus r and argument θ + 2nπ, but an infinite collection of points: these points all have the same modulus ¦z¦ = r but we distinguish the various ‘representations’ of z by treating θ, θ ± 2π, θ ±+ 4π, θ ± 6π, etcetera, as separate argument values as we go up or down on that spiral ramp. So that is what is represented by that infinite number of sheets, which are separated from each other by a vertical distance of 2π. These sheets are all connected at or through the origin (at which the log z function is undefined: therefore, the origin is not part of the domain), which is the branch point for this function. Let me copy some formal language on the construction of that surface here:

“We treat the z plane, with the origin deleted, as a thin sheet R₀which is cut along the positive half of the real axis. On that sheet, let θ range from 0 to 2π. Let a second sheet R₁be cut in the same way and placed in front of the sheet R₀. The lower edge of the slit in R₀is then joined to the upper edge of the slit in R₁. On R₁, the angle θ ranges from 2π to 4π; so, when z is represented by a point on R₁, the imaginary component of log z ranges from 2π to 4π.” And then we repeat the whole thing, of course: “A sheet R₂is then cut in the same way and placed in front of R₁. The lower edge of the slit in R₁. is joined to the upper edge of the slit in this new sheet, and similarly for sheets R₃, R₄… A sheet R_-1, R_-2, R_-3,… are constructed in like manner.” (Brown and Churchill, Complex Variables and Applications, 7th edition, p. 335-336)

The key phrase above for me is this “when z is represented by a point on R₁“, because that’s what it is really: we have an infinite number of representations of z here, namely one representation of z for each branch of the log z function. So, as n = 0, ±1, , ±2, ±3 etcetera, we have an infinite number of them indeed. You’ll also remember that each branch covers a range from some random angle α to α + 2π. Imagine a continuous curve around the origin on this Riemann surface: as we move around, the angle of z changes from 0 to 2θ on sheet R₀, and then from 2π to 4π on sheet R₁and so on and so on.

The illustration above also illustrates the meaning of a branch point. Imagine yourself walking on that surface and approaching the origin, from any direction really. At the origin itself, you can choose what to do: either you take the elevator up or down to some other level or, else, the elevator doesn’t work and so then you have to walk up or down that ramp to get to another level. If you choose to walk along the ramp, the angle θ changes gradually or, to put it in mathematical terms, in a continuous way. However, if you took the elevator and got out at some other level, you’ll find that you’ve literally ‘jumped’ one or more levels. Indeed, remember that log z = ln r + i(θ + 2nπ) and so ln r, the horizontal distance from the origin didn’t change, but you did add some multiple of 2π to the vertical distance, i.e. the imaginary part of the log z value.

Let us now construct a Riemann surface for some other multiple-valued functions. Let’s keep it simple and start with the square root of z, so c = 1/2, which is nothing else than a specific example of the complex exponential function z^c = z^c = e^{c log z}: we just take a real number for c here. In fact, we’re taking a very simple rational number value for c: 1/2 = 0.5. Taking the square, cube, fourth or n^th root of a complex number is indeed nothing but a special case of the complex exponential function. The illustration below (taken from Wikipedia) shows us the Riemann surface for the square root function.

As you can see, the spiraling surface turns back into itself after two turns. So what’s going on here? Well… Our multivalued function here does not have an infinite number of values for each z: it has only two, namely √r e^i(θ/2)and √r e^{i(θ/2 + π)}. But what’s that? We just said that the log function – of which this function is a special case – had an infinite number of values? Well… To be somewhat more precise: z^1/2actually does have an infinite number of values for each z (just like any other complex exponential function), but it has only two values that are different from each other. All the others coincide with one of the two principal ones. Indeed, we can write the following:

w = √z = z^1/2= e^{(1/2) log z}= e^{(1/2)[ln r + i(θ + 2nπ)]}= r^1/2e^{i(θ/2 + nπ)}= √re^{i(θ/2 + nπ)}

(n = 0, ±1, ±2, ±3,…)

For n = 0, this expression reduces to z^1/2= √re^iθ/2. For n = ±1, we have z^1/2= √re^{i(θ/2 + π)}, which is different than the value we had for n = 0. In fact, it’s easy to see that this second root is the exact opposite of the first root: √re^{i(θ/2 + π)} = √re^iθ/2e^iπ = – √re^iθ/2). However, for n = 2, we have z^1/2= √re^{i(θ/2 + 2π)}, and so that’s the same value (z^1/2= √re^iθ/2) as for n = 0. Indeed, taking the value of n = 2 amounts to adding 2π to the argument of w and so get the same point as the one we found for n = 0. [As for the plus or minus sign, note that, for n = -1, we have z^1/2= √re^{i(θ/2 -π)}= √re^{i(θ/2 -π+2π)}= √re^{i(θ/2 +π)}and, hence, the plus or minus sign for n does not make any difference indeed.]

In short, as mentioned above, we have only two different values for w = √z = z^1/2and so we have to construct two sheets only, instead of an infinite number of them, like we had to do for the log z function. To be more precise, because the sheet for n = ±2 will be the same sheet as for n = 0, we need to construct one sheet for n = 0 and one sheet for n = ±1, and so that’s what shown above: the surface has two sheets (one for each branch of the function) and so if we make two turns around the origin (one on each sheet), we’re back at the same point, which means that, while we have a one-to-two relationship between each point z on the complex plane and the two values z^1/2for this point, we’ve got a one-on-one relationship between every value of z^1/2and each point on this surface.

For ease of reference in future discussions, I will introduce a personal nonsensical convention here: I will refer to (i) the n = 0 case as the ‘positive’ root, or as w₁, i.e. the ‘first’ root, and to (ii) the n = ± 1 case as the ‘negative’ root, or w₂, i.e. the ‘second’ root. The convention is nonsensical because there is no such thing as positive or negative complex numbers: only their real and imaginary parts (i.e. real numbers) have a sign. Also, these roots also do not have any particular order: there are just two of them, but neither of the two is like the ‘principal’ one or so. However, you can see where it comes from: the two roots are each other’s exact opposite w₂= u₂ + iv₂= —w₁= -u₁ – iv₁. [Note that, of course, we have w₁w₁ = w₁²= w₂w₂ = w₂²= z, but that the product of the two distinct roots is equal to —z. Indeed, w₁w₂= w₂w₁= √re^i(θ/2)√re^{i(θ/2 + π)} = re^i(θ+^π) = re^iθeⁱ^π= -re^iθ= -z.]

What’s the upshot? Well… As I mentioned above already, what’s happening here is that we treat z = re^i(θ+2π)as a different ‘point’ than z = re^iθ. Why? Well… Because of that square root function. Indeed, we have θ going from 0 to 2π on the first ‘sheet’, and then from 2π0 to 4π on the second ‘sheet’. Then this second sheet turns back into the first sheet and so then we’re back at normal and, hence, while θ going from 0π to 2π is not the same as θ going from 2π to 4π, θ going from 4π to 6π is the same as θ going from 0 to 2π (in the sense that it does not affect the value of w = z^1/2). That’s quite logical indeed because, if we denote w as w = √r e^iΘ(with Θ = θ/2 + nπ, and n = 0 or ± 1), then it’s clear that arg w = Θ will range from 0 to 2π if (and only if) arg z = θ ranges from 0 to 4π. So as the argument of w makes one loop around the origin – which is what ‘normal’ complex numbers do – the argument of z makes two loops. However, once we’re back at Θ = 2π, then we’ve got the same complex number w again and so then it’s business as usual.

So that will help you to understand why this Riemann surface is said to have two complex dimensions, as opposed to the plane, which has only one complex dimension.

OK. That should be clear enough. Perhaps one question remains: how do you construct a nice graph like the one above?

Well, look carefully at the shape of it. The vertical distance reflects the real part of √z for n = 0, i.e. √r cos(θ/2). Indeed, the horizontal plane is the the complex z plane and so the horizontal axes are x and y respectively (i.e. the x and y coordinates of z = x + iy). So this vertical distance equals 1 when x = 1 and y = 0 and that’s the highest point on the upper half of the top sheet on this plot (i.e. the ‘high-water mark’ on the right-hand (back-)side of the cuboid (or rectangular prism) in which this graph is being plotted). So the argument of z is zero there (θ = 0). The value on the vertical axis then falls from one to zero as we turn counterclockwise on the surface of this first sheet, and that’s consistent with a value for θ being equal to π there (θ = π), because then we have cos(π/2) = 0. Then we go underneath the z plane and make another half turn, so we add another π radians to the value θ and we arrive at the lowest point on the lower half of the bottom sheet on this plot, right under the point where we started, where θ = 2π and, hence, Re(√z) = √r cos(θ/2) (for n = 0) = cos(2π/2) = cos(2π/2) = -1.

We can then move up again, counterclockwise on the bottom sheet, to arrive once again at the spot where the bottom sheet passes through the top sheet: the value of θ there should be equal to θ = 3π, as we have now made three half turns around the origin from our original point of departure (i.e. we added three times π to our original angle of departure, which was θ = 0) and, hence, we have Re(√z) = √r cos(3θ/2) = 0 again. Finally, another half turn brings us back to our point of departure, i.e. the positive half of the real axis, where θ has now reached the value of θ = 4π, i.e. zero plus two times 2π. At that point, the argument of w (i.e. Θ) will have reached the value of 2π, i.e. 4π/2, and so we’re talking the same w = z^1/2 as when we started indeed, where we had Θ = θ/2 = 0.

What about the imaginary part? Well… Nothing special really (as for now at least): a graph of the imaginary part of √z would be equally easy to establish: Im(√z) = √r sin(θ/2) and, hence, rotating this plot 180 degrees around the vertical axis will do the trick.

Hmm… OK. What’s next? Well… The graphs below show the Riemann surfaces for the third and fourth root of z respectively, i.e. z^1/3and z^1/4 respectively. It’s easy to see that we now have three and four sheets respectively (instead of two only), and that we have to take three and four full turns respectively to get back at our starting point, where we should find the same values for z^1/3and z^1/4 as where we started. That sounds logical, because we always have three cube roots of any (complex) numbers, and four fourth roots, so we’d expect to need the same number of sheets to differentiate between these three or four values respectively.

In fact, the table below may help to interpret what’s going on for the cube root function. We have three cube roots of z: w_1,w₂and w₃. These three values are symmetrical though, as indicated be the red, green and yellow colors in the table below: for example, the value of w for θ ranging from 4π to 6π for the n = 0 case (i.e. w₁) is the same as the value of w for θ ranging from 0 to 2π for the n = 1 case (or the n = -2 case, which is equivalent to the n = 1 case).

So the origin (i.e. the point zero) for all of the above surfaces is referred to as the branch point, and the number of turns one has to make to get back at the same point determines the so-called order of the branch point. So, for w = z^1/2, we have a branch point of order 2; for for w = z^1/3, we have a branch point of order 3; etcetera. In fact, for the log z function, the branch point does not have a finite order: it is said to have infinite order.

After a very brief discussion of all of this, Penrose then proceeds and transforms a ‘square root Riemann surface’ into a torus (i.e. a donut shape). The correspondence between a ‘square root Riemann surface’ and a torus does not depend on the number of branch points: it depends on the number of sheets, i.e. the order of the branch point. Indeed, Penrose’s example of a square root function is w = (1 – z³)^1/2, and so that’s a square root function with three branch points (the three roots of unity), but so these branch points are all of order two and, hence, there are two sheets only and, therefore, the torus is the appropriate shape for this kind of ‘transformation’. I will come back to that in the next post.

OK… But I still don’t quite get why this Riemann surfaces are so important. I must assume it has something to do with the mystery of rolled-up dimensions and all that (so that’s string theory), but I guess I’ll be able to shed some more light on that question only once I’ve gotten through that whole chapter on them (and the chapters following that one). I’ll keep you posted. 🙂

Post scriptum: On page 138 (Fig. 8.3), Penrose shows us how to construct the spiral ramp for the log z function. He insists on doing this by taking overlapping patches of space, such as the L(z) and Log z branch of the log z function, with θ going from 0 to 2π for the L(z) branch) and from -π to +π for the Log z branch (so we have an overlap here from 0 to +π). Indeed, one cannot glue or staple patches together if the patch surfaces don’t overlap to some extent… unless you use sellotape of course. 🙂 However, continuity requires some overlap and, hence, just joining the edges of patches of space with sellotape, instead of gluing overlapping areas together, is not allowed. 🙂

So, constructing a model of that spiral ramp is not an extraordinary intellectual challenge. However, constructing a model of the Riemann surfaces described above (i.e. z^1/2, z^1/3, z^1/4 or, more in general, constructing a Riemann surface for any rational power of z, i.e. any function w = z^n/m, is not all that easy: Brown and Churchill, for example, state that is actually ‘physically impossible’ to model that (see Brown and Churchill, Complex Variables and Applications (7th ed.), p. 337).

Huh? But so we just did that for z^1/2, z^1/3and z^1/4, didn’t we? Well… Look at that plot for w = z^1/2 once again. The problem is that the two sheets cut through each other. They have to do that, of course, because, unlike the sheets of the log z function, they have to join back together again, instead of just spiraling endlessly up or down. So we just let these sheets cross each other. However, at that spot (i.e. the line where the sheets cross each other), we would actually need two representations of z. Indeed, as the top sheet cuts through the bottom sheet (so as we’re moving down on that surface), the value of θ will be equal to π, and so that corresponds to a value for w equal to w = z^1/2 = √r e^iπ/2 (I am looking at the n = 0 case here). However, when the bottom sheet cuts through the top sheet (so if we’re moving up instead of down on that surface), θ’s value will be equal to 3π (because we’ve made three half-turns now, instead of just one) and, hence, that corresponds to a value for w equal to w = z^1/2 = √r e^3iπ/2, which is obviously different from √r e^iπ/2. I could do the same calculation for the n = ±1 case: just add ±π to the argument of w.

Huh? You’ll probably wonder what I am trying to say here. Well, what I am saying here is that plot of the surface gives us the impression that we do not have two separate roots w₁and w₂on the (negative) real axis. But so that’s not the case: we do have two roots there, but we can’t distinguish them with that plot of the surface because we’re only looking at the real part of w.

So what?

Well… I’d say that shouldn’t worry us all that much. When building a model, we just need to be aware that it’s a model only and, hence, we need to be aware of the limitations of what we’re doing. I actually build a paper model of that surface by taking two paper disks: one for the top sheet, and one for the bottom sheet. Then I cut those two disks along the radius and folded and glued both of them like a Chinese hat (yes, like the one the girl below is wearing). And then I took those two little paper Chinese hats, put one of them upside down, and ‘connected’ them (or should I say, ‘stitched’ or ‘welded’ perhaps? :-)) with the other one along the radius where I had cut into these disks. [I could go through the trouble of taking a digital picture of it but it’s better you try it yourself.]

Wow! I did not expect to be used as an illustration in a blog on math and physics! 🙂

🙂 OK. Let’s get somewhat more serious again. The point to note is that, while these models (both the plot as well as the two paper Chinese hats :-)) look nice enough, Brown and Churchill are right when they note that ‘the points where two of the edges are joined are distinct from the points where the two other edges are joined’. However, I don’t agree with their conclusion in the next phrase, which states that it is ‘thus physically impossible to build a model of that Riemann surface.’ Again, the plot above and my little paper Chinese hats are OK as a model – as long as we’re aware of how we should interpret that line where the sheets cross each other: that line represents two different sets of points.

Let me go one step further here (in an attempt to fully exhaust the topic) and insert a table here with the values of both the real and imaginary parts of √z for both roots (i.e. the n = 0 and n = ± 1 case). The table shows what is to be expected: the values for the n = ± 1 case are the same as for n = 0 but with the opposite sign. That reflects the fact that the two roots are each other’s opposite indeed, so when you’re plotting the two square roots of a complex number z = re^iθ, you’ll see they are on opposite sides on a circle with radius √r. Indeed, re^{i(θ/2 + π)}= re^i(θ/2)e^iπ= –re^i(θ/2). [If the illustration below is too small to read the print, then just click on it and it should expand.]

The grey and green colors in the table have the same role as the red, green and yellow colors I used to illustrated how the cube roots of z come back periodically. We have the same thing here indeed: the values we get for the n = 0 case are exactly the same as for the n = ± 1 case but with a difference in ‘phase’ I’d say of one turn around the origin, i.e. a ‘phase’ difference of 2π. In other words, the value of √z in the n = 0 case for θ going from 0 to 2π is equal to the value of √z in the n = ± 1 case but for θ going from 2π to 4π and, vice versa, the value of √z in the n = ±1 case for θ going from 0 to 2π is equal to the value of √z in the n = 0 case for θ going from 2π to 4π. Now what’s the meaning of that?

It’s quite simple really. The two different values of n mark the different branches of the w function, but branches of functions always overlap of course. Indeed, look at the value of the argument of w, i.e. Θ: for the n = 0 case, we have 0 < Θ < 2π, while for the n = ± 1 case, we have -π < Θ < +π. So we’ve got two different branches here indeed, but they overlap for all values Θ between 0 and π and, for these values, where Θ₁ = Θ₂, we will obviously get the same value for w, even if we’re looking at two different branches (Θ₁ is the argument of w₁, and Θ₂ is the argument of w₂).

OK. I guess that’s all very self-evident and so I should really stop here. However, let me conclude by noting the following: to understand the ‘fully story’ behind the graph, we should actually plot both the surface of the imaginary part of √z as well as the surface of the real part of of √z, and superimpose both. We’d obviously get something that would much more complicated than the ‘two Chinese hats’ picture. I haven’t learned how to master math software (such as Maple for instance), as yet, and so I’ll just copy a plot which I found on the web: it’s a plot of both the real and imaginary part of the function w = z². That’s obviously not the same as the w = z^1/2 function, because w = z²is a single-valued function and so we don’t have all these complications. However, the graph is illustrative because it shows how two surfaces – one representing the real part and the other the imaginary part of a function value – cut through each other thereby creating four half-lines (or rays) which join at the origin.

So we could have something similar for the w = z^1/2function if we’d have one surface representing the imaginary part of z^1/2 and another representing the real part of z^1/2. The sketch below illustrates the point. It is a cross-section of the Riemann surface along the x-axis (so the imaginary part of z is zero there, as the values of θ are limited to 0, π, 2π, 3π, back to 4π = 0), but with both the real as well as the imaginary part of z^1/2on it. It is obvious that, for the w = z^1/2function, two of the four half-lines marking where the two surfaces are crossing each other coincide with the positive and negative real axis respectively: indeed, Re( z^1/2) = 0 for θ = π and 3π (so that’s the negative real axis), and Im(z^1/2) = 0 for θ = 0, 2π and 4π (so that’s the positive real axis).

The other two half-lines are orthogonal to the real axis. They follow a curved line, starting from the origin, whose orthogonal projection on the z plane coincides with the y axis. The shape of these two curved lines (i.e. the place where the two sheets intersect above and under the y axis) is given by the values for the real and imaginary parts of the √z function, i.e. the vertical distance from the y axis is equal to ± (√2√r)/2.

Hmm… I guess that, by now, you’re thinking that this is getting way too complicated. In addition, you’ll say that the representation of the Riemann surface by just one number (i.e. either the real or the imaginary part) makes sense, because we want one point to represent one value of w only, don’t we? So we want one point to represent one point only, and that’s not what we’re getting when plotting both the imaginary as well as the real part of w in a combined graph. Well… Yes and no. Insisting that we shouldn’t forget about the imaginary part of the surface makes sense in light of the next post, in which I’ll say a think or two about ‘compactifying’ surfaces (or spaces) like the one above. But so that’s for the next post only and, yes, you’re right: I should stop here.

Complex integrals

Original post:

Roger Penrose packs a lot in his chapter on complex-number calculus (Road to Reality, Chapter 7). He summarily introduces the concept of contour integration and then proceeds immediately to discuss power series representations of complex functions as well as fairly advanced ways to deal with singularities (see the section on analytic continuation). Brown and Churchill use not less than three chapters to develop this (Integrals (chapter 4), Series (chapter 5), and Residues and Poles (chapter 6)), and that’s probably what is needed for some kind of understanding of it all. Let’s start with integrals. However, let me first note here that Wordpress does not seem to have a formula editor (so it is not like MS Word) and, hence, I have to keep the notation simple: I’ll use the symbol ∫_Cfor a contour (or line) integral along a curve C, and the symbol ∫_{[a, b]} for an integral on a (closed) interval [a, b].

OK. Here we go. First, it is important to note that Penrose, and Brown and Churchill, are talking about complex integrals, i.e. integrals of complex-valued functions, whose value itself is (usually) a complex number too. That is very different from the line integrals I was exposed to when reading Feynman’s Lectures on Physics. Indeed, Feynman’s Lectures (Volume II, on Electromagnetism) offer a fine introduction to contour integrals, where the function to be integrated is either a scalar field (e.g. electric potential) or, else, some vector field (e.g. magnetic field, gravitational field). Now, vector fields are two-dimensional things: they have an x- and a y-coordinate and, hence, we may think the functions involving vectors are complex too. They are but, that being said, the integrals involved all yield real-number values because the integrand is likely to be a dot product of vectors (and dot products of vectors, as opposed to cross products, yield a real number). I won’t go into the details here but, for those who’d want to have such details, the Wikipedia article offers a fine description (including some very nice animations) of what integration over a line or a curve in such fields actually means. So I won’t repeat that here. I can only note what Brown and Churchill say about them: these (real-valued) integrals can be interpreted as areas under the curve, and they would usually also have one or the other obvious physical meaning, but complex integrals do usually not have such ‘helpful geometric or physical interpretation.’

So what are they then? Let’s first start with some examples of curves.

The illustration above makes it clear that, in practice, the curves which we are dealing with are usually parametric curves. In other words, the coordinates of all points z of the curve C can be represented as some function of a real-number parameter: z(t) = x(t) + iy(t). We can then define a complex integral as the integral of a complex-valued function f(z) of a complex variable z along a curve C from point z₁to point z₂and write such integral as ∫_Cf(z)dz.

Moreover, if C can be parametrized as z(t), we will have some (real) number a and b such that z₁= z(a) and z₂= z(b) and, taking into account that dz = z'(t)dt with z'(t)=dz/dt (i.e. the derivative of the (complex-valued) function z(t) with respect to the (real) parameter t), we can write ∫_Cf(z)dz as:

∫_Cf(z)dz = ∫_{[a, b]}f[z(t)]z'(t)dt

OK, so what? Well, there are a lot of interesting things to be said about this, but let me just summarize some of the main theorems. The first important theorem does not seem to associated with any particular mathematician (unlike Cauchy or Goursat, which I’ll introduce in a minute) but is quite central: if we have some (complex-valued) function f(z) which would happen to be continuous in some domain D, then all of the following statements will be true if one of them is true:

(I) f(z) has an antiderivative F(z) in D ; (II) the integrals of f(z) along contours lying entirely in D and extending from any fixed point z₁to any fixed point z₂all have the same value; and, finally, (III) the integrals of f(z) around closed contours lying entirely in D all have value zero.

This basically means that the integration of f(z) from z₁to z₂is not dependent on the path that is taken. But so when do we have such path independence? Well… You may already have guessed the answer to that question: it’s when the function is analytic or, in other words, when these Cauchy-Riemann equations u_x= v_yand u_y= – v_xare satisfied (see my other post on analytic (or holomorphic) complex-valued functions). That’s, in a nutshell, what’s stated in the so-called Cauchy-Goursat theorem, and it should be noted that it is an equivalence really, so we also have the vice versa statement: if the integrals of f(z) around closed contours in some domain D are zero, then we know that f(z) is holomorphic.

In short, we’ll always be dealing with ‘nice’ functions and then we can show that the s0-called ‘fundamental’ theorem of calculus (i.e. the one that links integrals with derivatives, or – to be somewhat more precise – with the antiderivative of the integrand) also applies to complex-valued valued functions. We have:

∫_Cf(z)dz = ∫_{[a, b]}f[z(t)]z'(t)dt = F[z(b)] – F[z(a)]

or, more in general: ∫_Cf(z)dz = F(z₂) – F(z₁)

We also need to note the Cauchy integral formula: if we have a function f that is analytic inside and on a closed contour C, then the value of this function for any point z₀will be equal to:

f(z₀) = (1/2πi) ∫_C[f(z)/(z – z₀)]dz

This may look like just another formula, but it’s quite spectacular really: it basically says that the function value of any point z₀within a region enclosed by a curve is completely determined by the values of this function on this curve. Moreover, integrating both sides of this equation repeatedly leads to similar formulas for the derivatives of the first, second, third, and higher order of f: f'(z₀) = (1/2πi) ∫_C[f(z)/(z – z₀)²]dz, f”(z₀) = (1/2πi) ∫_C[f(z)/(z – z₀)³]dz or, more in general:

f⁽ⁿ⁾(z₀) = (n!/2πi) ∫_C[f(z)/(z – z₀)ⁿ⁺¹]dz (n = 1, 2, 3,…)

This formula is also known as Cauchy’s differentiation formula. It is a central theorem in complex analysis really, as it leads to many other interesting theorems, including Gauss’s mean value theorem, Liouville’s theorem, the maximum (and miniumum) modulus principle It is also essential for the next chapter in Brown and Churchill’s course: power series representations of complex functions. However, I will stop here because I guess this ‘introduction’ to complex integrals is already confusing enough.

Post scriptum: I often wondered why one would label one theorem as ‘fundamental’, as it implies that all the other theorems may be important but, obviously, somewhat less fundamental. I checked it out and it turns out there is some randomness here. The Wikipedia article boldly states that the fundamental theorem of algebra (which states that every non-constant single-variable polynomial with complex coefficients has at least one complex roots) is not all that ‘fundamental’ for modern algebra: its title just reflects the fact that there was a time when algebra focused almost exclusively on studying polynomials. The same might be true for the fundamental theorem of arithmetic (i.e. the unique(-prime)-factorization theorem), which states that every integer greater than 1 is either a prime itself or the product of prime numbers, e.g. 1200 = (2⁴)(3¹)(5²).

That being said, the fundamental theorem of calculus is obviously pretty ‘fundamental’ indeed. It leads to many results that are indeed key to understanding and solving problems in physics. One of these is the Divergence Theorem (or Gauss’s Theorem), which states that the outward flux of a vector field through a closed surface is equal to the volume integral of the divergence over the region inside the surface. Huh? Well… Yes. It pops up in any standard treatment of electromagnetism. There are others (like Stokes’ Theorem) but I’ll leave it at that for now, especially because these are theorems involving real-valued integrals.

Complex functions and power series

Original post:

As I am going back and forth between this textbook on complex analysis (Brown and Churchill, Complex Variables and Applications) and Roger Penrose’s Road to Reality, I start to wonder how complete Penrose’s ‘Complete Guide to the Laws of the Universe actually is or, to be somewhat more precise, how (in)accessible. I guess the advice of an old friend – a professor emeritus in nuclear physics, so he should know! – might have been appropriate. He basically said I should not try to take any shortcuts (because he thinks there aren’t any), and that I should just go for some standard graduate-level courses on physics and math, instead of all these introductory texts that I’ve been trying to read (such as Roger Penrose’s books – but it’s true I’ve tried others too). The advice makes sense, if only because such standard courses are now available on-line. Better still: they are totally free. One good example is the Physics OpenCourseWare (OCW) from MIT: I just went on their website (ocw.mit.edu/courses/physics) and I was truly amazed.

Roger Penrose is not easy to read indeed: he also takes almost 200 pages to explain complex analysis, i.e. as many pages as the Brown and Churchill textbook, but I find the more formal treatment of the subject-matter in the math handbook easier to read than Penrose’s prose. So, while I won’t drop Penrose as yet (this time I really do not want to give up), I will probably to (continue to) invest more time in other books – proper textbooks really – than in reading Penrose. In fact, I’ve started to look at Penrose’s prose as a more creative approach, but one that makes sense only after you’ve gone through all of the ‘basics’. And so these ‘basics’ are probably easier to grasp by reading some tried and tested textbooks on math and physics first.

That being said, let me get back to the matter on hand by making good on at least one of the promises I made in the previous posts, and that is to say something more about the Taylor expansion of analytic functions. I wrote in one of these posts that this Taylor expansion is something truly amazing. It is, in my humble view at least. We have all these (complex-valued) functions of complex variables out there – such as e^z, log z, z^c, complex polynomials, complex trigonometric and hyperbolic functions, and all of the possible combinations of the aforementioned – and so all of these functions can be represented by a (infinite) sum of powers f(z) = Σ a_n(z-z₀)ⁿ(with n going from 0 to infinity and with z₀being some arbitrary point in the function’s domain). So that’s the Taylor power series.

All complex functions? Well… Yes. Or no. All analytic functions. I won’t go into the details (if only because it is hard to integrate mathematical formulas with the XML editor I am using here) but so it is an amazing result, which leads to many other amazing results. In fact, the proof of Taylor’s Theorem is, in itself, rather marvelous (yes, I went through it) as it involves other spectacular formulas (such as the Cauchy integral formula). However, I won’t go into this here. Just take it for granted: Taylor’s Theorem is great stuff!

But so the function has to be analytic – or well-behaved as I’d say. Otherwise we can’t use Taylor’s Theorem and, hence, this power series expansion doesn’t work. So let’s define (complex) analyticity: a function w = f(z) = f(x+iy) = u(x) + i(y) is analytic (a often-used synonym is holomorphic) if its partial derivatives u_x, u_y, v_xand v_y exist and respect the so-called Cauchy-Riemann equations: u_x = v_y and u_y= -v_x.

These conditions are restrictive (much more restrictive than the conditions for analyticity for real-valued functions). Indeed, there are many complex functions which look good at first sight – if only because there’s no problem whatsoever with their real-valued components u(x,y) and v(x,y) in real analysis/calculus – but which do not satisfy these Cauchy-Riemann conditions. Hence, they are not ‘well-behaved’ in the complex space (in Penrose’s words: they do not conform to the ‘Eulerian notion’ of a function), and so they are of relatively little use – for solving complex problems that is!

A function such as f(z) = 2x + ixy² is an example: there are no complex numbers for which the Cauchy-Riemann conditions hold (check it out: the Cauchy-Riemann conditions are xy = 1 and y = 0, and these two equations contradict each other). Hence, we can’t do much with this function really. For other functions, such as x² + iy², the Cauchy-Riemann conditions are only satisfied in very limited subsets of the functions’ domain: in this particular case, the Cauchy-Riemann conditions only hold when y = x. We also have functions for which the Cauchy-Riemann conditions hold everywhere except in one or more singularities. The very simple function f(z) = 1/z is an example of this: it is easy to see we have a problem when z = 0, because the function value is not determined there.

As for the last category of functions, one would expect there is an easy way out, using limits or something. And there is. Singularities are not a big problem and we can work our way around them. I found out that ‘working our way around them’ usually involves a so-called Laurent series representation of the function, which is a more generalized version of the Taylor expansion involving not only positive but also negative powers.

One of the other things I learned is how to solve contour integrals. Solving contour integrals is the equivalent, in the complex world, of integrating a real-valued function over an interval [a, b] on the real line. Contours are curves in the complex plane. They can be simple and closed (like a circle or an ellipse for instance), and usually they are, but then they don’t have to be simple and closed: they can self-intersect, for example, or they can go around some point or around some other curve more than once (and, yes, that makes a big difference: when you go around twice or more, you’re talking a different curve really).

But so these things can all be solved relatively easily – everything is relative of course 🙂 – if (and only if) the functions involved are analytic and/or if the singularities involved are isolated. In fact, we can extend the definition of analytic functions somewhat and define meromorphic functions: meromorphic functions are functions that are analytic throughout their domain except for one or more isolated singular points (also referred to as poles for some strange reason).

Holomorphic (and meromorphic) functions w = f(z) can be looked at as transformations: they map some domain D in the (complex) z plane to some region (referred to as the image of D) in the (complex) w plane. What makes them holomorphic is the fact that they preserve angles – as illustrated below.

If you have read the first post on this blog, then you have seen this illustration already. Let me therefore present something better. The image below illustrates the function w = f(z) = z² or, vice versa, the function z = √w = w^1/2 (i.e. the square root of w). Indeed, that’s a very well-behaved function in the complex plane: every complex number (including negative real numbers) has two square roots in the complex plane, and so that’s what is shown below.

Huh? What’s this?

It’s simple: the illustration above uses color (in this case, a simple gray scale only really) to connect the points in the square region of the domain (i.e. the z plane) with an equally square region in the w plane (i.e. the image of the square region in the z plane). You can verify the properties of the z = w^1/2 function indeed. At z=i we easily recognize a spot on the right ear of this person: it’s the w=−1 point in the w plane. Now, the same spot is found at z=−i. This reflects the fact that i² = (-i)²=−1. Similarly, this guy’s mouth, which represents the region near w=−i, is found near the two square roots of −i in the z plane, which are z=±(1−i )/√2. In fact, every part of this man’s face is found at two places in the z plane, except for the spot between his eyes, which corresponds to w=0, and also to z=0 under this transformation. Finally, you can see that this transformation is holomorphic: all (local) angles are preserved. In that sense, it’s just like a conformal map of the Earth indeed. [Note, however, that I am glossing over the fact that z = w^1/2 is multiple-valued: for each value of w, we have two square roots in the z-plane. That actually creates a bit of a problem when interpreting the image above. See the post scriptum at the bottom of this post for more text on this.]

[…] OK. This is fun. [And, no, it’s not me: I found this picture on the site of a Swedish guy called Hans Lundmark, and so I give him credit for making complex analysis so much fun: just Google him to find out more.] However, let’s get somewhat more serious again and ask ourselves why we’d need holomorphism?

Well… To be honest, I am not quite sure because I haven’t gone through the rest of the course material yet – or through all these other chapters in Penrose’s book (I’ve done 10 now, so there’s 24 left). That being said, I do note that, besides all of the niceties I described above (like easy solutions for contour integrals), it is also ‘nice’ that the real and imaginary parts of an analytic function automatically satisfy the Laplace equation.

Huh? Yes. Come on! I am sure you have heard about the Laplace equation in college: it is that partial differential equation which we encounter in most physics problems. In two dimensions (i.e. in the complex plane), it’s the condition that δ²f/δx²+ δ²f/δy²equals zero. It is a condition which pops us in electrostatics, fluid dynamics and many other areas of physical research, and I am sure you’ve seen simple examples of it.

So, this fact alone (i.e. the fact that analytic functions pop up everywhere in physics) should be justification enough in itself I guess. Indeed, the first nine chapters of Brown and Churchill’s course are only there because of the last three, which focus on applications of complex analysis in physics. But is there anything more to it?

Of course there is. Roger Penrose would not dedicate more than 200 pages to all of the above if it was not for more serious stuff than some college-level problems in physics, or to explain fluid dynamics or electrostatics. Indeed, after explaining why hypercomplex numbers (such as quaternions) are less useful than one might expect (Chapter 11 of his Road to Reality is about hypercomplex numbers and why they are useful/not useful), he jumps straight into the study of higher-dimensional manifolds (Chapter 12) and symmetry groups (Chapter 13). Now I don’t understand anything of that, as yet that is, but I sure do understand I’ll need to work my way through it if I ever want to understand what follows after: spacetime and Minkowskian geometry, quantum algebra and quantum field theory, and then the truly exotic stuff, such as supersymmetry and string theory. [By the way, from what I just gathered from the Internet, string theory has not been affected by the experimental confirmation of the existence of the Higgs particle, as it is said to be compatible with the so-called Standard Model.]

So, onwards we go! I’ll keep you posted. However, as I look at that (long) list of MIT courses, it may take some time before you hear from me again. 🙂

Post scriptum:

The nice picture of this Swedish guy is also useful to illustrate the issue of multiple-valuedness, which is an issue that pops up almost everywhere when you’re dealing with complex functions. Indeed, if we write w in its polar form w = reⁱ^θ, then its square root can be written as z = w^1/2= (√r)eⁱ^(θ/2+kπ), with k equal to either 0 or 1. So we have two square roots indeed for each w: each root has a length (i.e. its modulus or absolute value) equal to √r (i.e the positive square root of r) but their arguments are θ/2 and θ/2 + π respectively, and so that’s not the same. It means that, if z is a square root of some w in the w plane, then -z will also be a square root of w. Indeed, if the argument of z is equal to θ/2, then the argument of -z will be π/2 + π = π/2 + π – 2π = π/2 – π (we just rotate the vector by 180 degrees, which corresponds to a reflection through the origin). It means that, as we let the vector w = reⁱ^θmove around the origin – so if we let θ make a full circle starting from, let’s say, -π/2 (take the value w = –i for instance, i.e. near the guy’s mouth) – then the argument of the image of w will only go from (1/2)(-π/2) = -π/4 to (1/2)(-π/2 + 2π) = 3π/4. These two angles, i.e. -π/4 and 3π/4, correspond to the diagonal y=-x in the complex plane, and you can see that, as we go from -π/4 to 3π/4 in the z-plane, the image over this 180 degree swoop does cover every feature of this guy’s face – and here I mean not half of the guy’s face, but all of it. Continuing in the same direction (i.e. counterclockwise) from 3π/4 back to -π/4 just repeats the image. I will leave it to you to find out what happens with the angles on the two symmetry axes (y = x and y = -x).

No royal road to reality

Pre-scriptum dated 19 May 2020: The entry below shows how easy it is to get ‘lost in math’ when studying physics. One doesn’t really need this stuff.

Text: I got stuck in Penrose’s Road to Reality—at chapter 5. That is not very encouraging: the book has 34 chapters, and every chapter seems to be getting more difficult.

In Chapter 5, Penrose introduces complex algebra. As I tried to get through it, I realized I had to do some more self-study. Indeed, while Penrose claims no other books or courses are needed to get through his, I do not find this to be the case. So I bought a fairly standard course in complex analysis (James Ward Brown and Ruel V. Churchill, Complex variables and applications) and I’ve done chapter 1 and 2 now. Although these first two chapters do not nothing else than introduce the subject-matter, I find the matter rather difficult and the terminology confusing. Examples:

1. The term ‘scalar’ is used to denote real numbers. So why use the term ‘scalar’ if the word ‘real’ is available as well? And why not use the term ‘real field’ instead of ‘scalar field’? Well… The term ‘real field’ actually means something else. A scalar field associates a (real) number to every point in space. So that’s simple: think of temperature or pressure. The term ‘scalar’ is said to be derived from ‘scaling’: a scalar is that what scales vectors. Indeed, scalar multiplication of a vector and a real number multiplies the magnitude of the vector without changing its direction. So what is a real field then? Well… A (formally) real field is a field that can be extended with a (not necessarily unique) ordering which makes it an ordered field. Does that help? Somewhat, I guess. But why the qualifier ‘formally real’? I checked and there is no such thing as an ‘informally real’ field. I guess it’s just to make sure we know what we are talking about, because ‘real’ is a word with many meanings.

2. So what’s a field in mathematics? It is an algebraic structure: a set of ‘things’ (like numbers) with operations defined on it, including the notions of addition, subtraction, multiplication, and division. As mentioned above, we have scalar fields and vector fields. In addition, we also have fields of complex numbers. We also have fields with some less likely candidates for addition and multiplication, such as functions (one can add and multiply functions with each other). In short, anything which satisfies the formal definition of a field – and here I should note that the above definition of a field is not formal – is a field. For example, the set of rational numbers satisfies the definition of a field too. So what is the formal definition? First of all, a field is a ring. Huh? Here we are in this abstract classification of algebraic structures: commutative groups, rings, fields, etcetera (there are also modules – a type of algebraic structure which I had never ever heard of before). To put it simply – because we have to move on, of course – a ring (no one seems to know where that word actually comes from) has addition and multiplication only, while a field has division too. In other words, a ring does not need to have multiplicative inverses. Huh? It’s simply, really: the integers form a ring, but the equation 2·x = 1 does not have a solution in integers (x = ½) and, hence, the integers do not form a field. The same example shows why rational numbers do.

3. But what about a vector field? Can we do division with vectors? Yes, but not by zero – but that is not a problem as that is understood in the definition of a field (or in the general definition of division for that matter). In two-dimensional space, we can represent vectors by complex numbers: z = (x,y), and we have a formula for the so-called multiplicative inverse of a complex number: z^-1 = (x/x²+y², -y/x²+y²). OK. That’s easy. Let’s move on to more advanced stuff.

4. In logic, we have the concept of well-formed formulas (wffs). In math, we have the concept of ‘well-behaved’: we have well-behaved sets, well-behaved spaces and lots of other well-behaved things, including well-behaved functions, which are, of course, those of interest to engineers and scientists (and, hence, in light of the objective of understanding Penrose’s Road to Reality, to me as well). I must admit that I was somewhat surprised to learn that ‘well-behaved’ is one of the very few terms in math that have no formal definition. Wikipedia notes that its definition, in the science of mathematics that is, depends on ‘mathematical interest, fashion, and taste’. Let me quote in full here: “To ensure that an object is ‘well-behaved’ mathematicians introduce further axioms to narrow down the domain of study. This has the benefit of making analysis easier, but cuts down on the generality of any conclusions reached. […] In both pure and applied mathematics (optimization, numerical integration, or mathematical physics, for example), well-behaved means not violating any assumptions needed to successfully apply whatever analysis is being discussed. The opposite case is usually labeled pathological.” Wikipedia also notes that “concepts like non-Euclidean geometry were once considered ill-behaved, but are now common objects of study.”

5. So what is a well-behaved function? There is actually a whole hierarchy, with varying degrees of ‘good’ behavior, so one function can be more ‘well-behaved’ than another. First, we have smooth functions: a smooth function has derivatives of all orders (as for its name, it’s actually well chosen: the graph of a smooth function is actually, well, smooth). Then we have analytic functions: analytic functions are smooth but, in addition to being smooth, an analytic function is a function that can be locally given by a convergent power series. Huh? Let me try an alternative definition: a function is analytic if and only if its Taylor series about x₀ converges to the function in some neighborhood for every x₀ in its domain. That’s not helping much either, is it? So… Well… Let’s just leave that one for now.

In fact, it may help to note that the authors of the course I am reading (J.W. Brown and R.V. Churchill, Complex Variables and Applications) use the terms analytic, regular and holomorphic as interchangeable, and they define an analytic function simply as a function which has a derivative everywhere. While that’s helpful, it’s obviously a bit loose (what’s the thing about the Taylor series?) and so I checked on Wikipedia, which clears the confusion and also defines the terms ‘holomorphic’ and ‘regular’:

“A holomorphic function is a complex-valued function of one or more complex variables that is complex differentiable in a neighborhood of every point in its domain. The existence of a complex derivative in a neighborhood is a very strong condition, for it implies that any holomorphic function is actually infinitely differentiable and equal to its own Taylor series. The term analytic function is often used interchangeably with ‘holomorphic function’ although the word ‘analytic’ is also used in a broader sense to describe any function (real, complex, or of more general type) that can be written as a convergent power series in a neighborhood of each point in its domain. The fact that the class of complex analytic functions coincides with the class of holomorphic functions is a major theorem in complex analysis.”

Wikipedia also adds following: “Holomorphic functions are also sometimes referred to as regular functions or as conformal maps. A holomorphic function whose domain is the whole complex plane is called an entire function. The phrase ‘holomorphic at a point z₀’ means not just differentiable at z₀, but differentiable everywhere within some neighborhood of z₀ in the complex plane.”

6. What to make of all this? Differentiability is obviously the key and, although there are many similarities between real differentiability and complex differentiability (both are linear and obey the product rule, quotient rule, and chain rule), real-valued functions and complex-valued functions are different animals. What are the conditions for differentiability? For real-valued functions, it is a matter of checking whether or not the limit defining the derivative exists and, of course, a necessary (but not sufficient) condition is continuity of the function.

For complex-valued functions, it is a bit more sophisticated because we’ve got so-called Cauchy-Riemann conditions applying here. How does that work? Well… We write f(z) as the sum of two functions: f(z) = u(x,y) + iv(x,y). So the real-valued function u(x,y) yields the real part of f(z), while v(x,y) yields the imaginary part of f(z). The Cauchy-Riemann equations (to be interpreted as conditions really) are the following: u_x = v_y and u_y = -v_x(note the minus sign in the second equation).

That looks simple enough, doesn’t it? However, as Wikipedia notes (see the quote above), differentiability at a point z₀ is not enough (to ensure the existence of the derivative of f(z) at that point). We need to look at some neighborhood of the point z₀ and see if these first-order derivatives (u_x, u_y, v_x and v_y) exist everywhere in that neighborhood and satisfy these Cauchy-Riemann equations. So we need to look beyond the point z₀itself hen doing our analysis: we need to ‘approach’ it from various directions before making any judgment. I know this sounds like Chinese but it became clear to me when doing the exercises.

7. OK. Phew! I got this far – but so that’s only chapter 1 and 2 of Brown & Churchill’s course ! In fact, chapter 2 also includes a few sections on so-called harmonic functions and harmonic conjugates. Let’s first talk about harmonic functions. Harmonic functions are even better behaved than holomorphic or analytic functions. Well… That’s not the right way to put it really. A harmonic function is a real-valued analytic function (its value could represent temperature, or pressure – just as an example) but, for a function to qualify as ‘harmonic’, an additional condition is imposed. That condition is known as Laplace’s equation: if we denote the harmonic function as H(x,y), then it has to have second-order derivatives which satisfies H_xx(x,y) + H_yy(x,y) = 0.

Huh? Laplace’s equation, or harmonic functions in general, plays an important role in physics, as the condition that is being imposed (the Laplace equation) often reflects a real-life physical constraint and, hence, the function H would describe real-life phenomena, such as the temperature of a thin plate (with the points on the plate defined by the (x,y) coordinates), or electrostatic potential. More about that later. Let’s conclude this first entry with the definition of harmonic conjugates.

8. As stated above, a harmonic function is a real-valued function. However, we also noted that a complex function f(z) can actually be written as a sum of a real and imaginary part using two real-valued functions u(x,y) and v(x,y). More in particular, we can write f(z) = u(x,y) + iv(x,y), with i the imaginary number (0,1). Now, if u and v would happen to be harmonic functions (but so that’s an if of course – see the Laplace condition imposed on their second-order derivatives in order to qualify for the ‘harmonic’ label) and, in addition to that, if their first-order derivatives would happen to satisfy the Cauchy-Riemann equations (in other words, f(z) should be a well-behaved analytic function), then (and only then) we can label v as the harmonic conjugate of u.

What does that mean? First, one should note that when v is a harmonic conjugate of u in some domain, it is not generally true that u is a harmonic conjugate of v. So one cannot just switch the functions. Indeed, the minus sign in the Cauchy–Riemann equations makes the relationship asymmetric. But so what’s the relevance of this definition of a harmonic conjugate? Well… There is a theorem that turns the definition around: ‘A function f(z) = u(x,y) + iv(x,y) is analytic (or holomorphic to use standard terminology) in a domain D if and only if v is a harmonic conjugate of u. In other words, introducing the definition of a harmonic conjugate (and the conditions which their first- and second-order derivatives have to satisfy), allows us to check whether or not we have a well-behaved complex-valued function (and with ‘well-behaved’ I mean analytic or holomorphic).

9. But, again, why do we need holomorphic functions? What’s so special about them? I am not sure for the moment, but I guess there’s something deeper in that one phrase which I quoted from Wikipedia above: “holomorphic functions are also sometimes referred to as regular functions or as conformal maps.” A conformal mapping preserves angles, as you can see on the illustration below, which shows a rectangular grid and its image under a conformal map: f maps pairs of lines intersecting at 90° to pairs of curves still intersecting at 90°. I guess that’s very relevant, although I do not know why exactly as for now. More about that in later posts.