Analytic continuation

In my previous post, I promised to say something about analytic continuation. To do so, let me first recall Taylor’s theorem: if we have some function f(z) that is analytic in some domain D, then we can write this function as an infinite power series:

f(z) = ∑ aₙ(z – z₀)ⁿ

with n = 0, 1, 2,… going all the way to infinity (n = 0 → ∞) and the successive coefficients aₙ equal to aₙ = f⁽ⁿ⁾(z₀)/n! (with f⁽ⁿ⁾(z₀) denoting the derivative of the nth order, and n! the factorial function n! = 1 x 2 x 3 x … x n).

I should immediately add that this domain D will always be an open disk, as illustrated below. The term ‘open’ means that the boundary points (i.e. the circle itself) are not part of the domain. This open disk is the so-called circle of convergence for the complex function f(z) = 1/(1 – z), which is equivalent to the (infinite) power series f(z) = 1 + z + z² + z³ + z⁴ +… [A clever reader will probably try to check this using Taylor’s theorem above, but I should note the exercise involves some gymnastics. Indeed, the development involves the use of the identity 1 + z + z² + z³ + … + zⁿ = (1 – zⁿ⁺¹)/(1 – z).]
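If my imaginary reader wants to check this numerically rather than algebraically, a few lines of Python will do (the point z = 0.3 + 0.4i is just an arbitrary pick with ¦z¦ < 1):

```python
# Partial sums of 1 + z + z^2 + ... + z^n versus the closed form (1 - z^(n+1))/(1 - z),
# and convergence to 1/(1 - z) inside the unit circle.
z = 0.3 + 0.4j          # |z| = 0.5 < 1, so we are inside the circle of convergence
n = 50
partial = sum(z**k for k in range(n + 1))
closed_form = (1 - z**(n + 1)) / (1 - z)
limit = 1 / (1 - z)

print(abs(partial - closed_form))  # the identity holds for every n
print(abs(partial - limit))        # and the partial sums approach 1/(1 - z)
```

Both differences print as (nearly) zero. Try a z with ¦z¦ > 1 instead and you’ll see the partial sums blow up, as they should.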

[Illustration: the open disk of convergence, with the singularity at z = 1 on its boundary]

This power series converges only when the absolute value of z is (strictly) smaller than 1, so only when ¦z¦ < 1. Indeed, the illustration above shows the singularity at the point 1 (or the point (1, 0) if you want) on the real axis: the denominator of the function 1/(1 – z) effectively becomes zero there. But so that’s one point only and, hence, we may ask ourselves why this domain should be bounded by a circle going through this one point. Why not some square or rectangle or some other weird shape avoiding this point? That question takes a few theorems to answer, and so I’ll just say that this is just one of the many remarkable things about analytic functions: if a power series such as the one above converges to f(z) within some circle whose radius is the distance from z₀ (in this case, z₀ is the origin and so we’re actually dealing with a so-called Maclaurin expansion here, i.e. an oft-used special case of the Taylor expansion) to the nearest point z₁ where f fails to be analytic (that point z₁ is equal to 1 in this case), then this circle will actually be the largest circle centered at z₀ such that the series converges to f(z) for all z interior to it.

Puff… That’s quite a mouthful, so let me rephrase it. What is being said here is that there’s usually a condition of validity for the power series expansion of a function: if that condition of validity is not fulfilled, then the function cannot be represented by the power series. In this particular case, the expansion of f(z) = 1/(1 – z) = 1 + z + z² + z³ + z⁴ +… is only valid when ¦z¦ < 1, and so there is no larger circle about z₀ (i.e. the origin in this particular case) such that, at each point interior to it, the Taylor series (or the Maclaurin series in this case) converges to f(z).

That being said, we can usually work our way around such singularities, especially when they are isolated, such as in this example (there is only this one point 1 that is causing trouble), and that is where the concept of analytic continuation comes in. However, before I explain this, I should first introduce Laurent’s theorem, which is like Taylor’s theorem but applies to functions which are not as ‘nice’ as the functions for which Taylor’s theorem holds (i.e. functions that are not analytic everywhere), such as this 1/(1 – z) function indeed. To be more specific, Laurent’s theorem says that, if we have a function f which is analytic in an annular domain (i.e. the red area in the illustration below) centered at some point z₀ (in the illustration below, that’s point c), then f(z) will have a series representation involving both positive and negative powers of the term (z – z₀).

[Illustration: an annular domain (the red area) centered at point c, on which the Laurent series converges]

More in particular, f(z) will be equal to f(z) = ∑ [aₙ(z – z₀)ⁿ] + ∑ [bₙ/(z – z₀)ⁿ], with n = 0, 1, 2,…, ∞ in the first sum and n = 1, 2,…, ∞ in the second, and with aₙ and bₙ coefficients involving complex integrals which I will not write down here because WordPress lacks a good formula editor and so it would look pretty messy. An alternative representation of the Laurent series is to write f(z) as f(z) = ∑ [cₙ(z – z₀)ⁿ], with cₙ = (1/2πi) ∫C [f(z)/(z – z₀)ⁿ⁺¹]dz (n = 0, ±1, ±2,…). Well – so here I actually did write down the integral. I hope it’s not too messy 🙂 .

It’s relatively easy to verify that this Laurent series becomes the Taylor series if there are no singularities, i.e. if the domain covers the whole disk (so if there would be red everywhere, even at the center point). In that case, the cₙ coefficient becomes (1/2πi) ∫C [f(z)/(z – z₀)⁻¹⁺¹]dz for n = –1 and we can use the fact that, if f(z) is analytic everywhere, the integral ∫C f(z)dz will be zero along any closed contour in the domain of f(z). For n = –2, the integrand becomes f(z)/(z – z₀)⁻²⁺¹ = f(z)(z – z₀) and that’s an analytic function as well, because the function z – z₀ is analytic everywhere and, hence, the product of this (analytic) function with f(z) will also be analytic everywhere (sums, products and compositions of analytic functions are also analytic). So the integral will be zero once again. Similarly, for n = –3, the integrand f(z)/(z – z₀)⁻³⁺¹ = f(z)(z – z₀)² is analytic and, hence, the integral is again zero. In short, all bₙ coefficients (i.e. all ‘negative’ powers) in the Laurent series will be zero, so only the non-negative powers remain, with c₀ = a₀. As for the aₙ coefficients, one can see they are equal to the Taylor coefficients by using what Penrose refers to as the ‘higher-order’ version of the Cauchy integral formula: f⁽ⁿ⁾(z₀)/n! = (1/2πi) ∫C [f(z)/(z – z₀)ⁿ⁺¹]dz.

It is also easy to verify that this expression also holds for the special case of a so-called punctured disk, i.e. an annular domain for which the ‘hole’ at the center is limited to the center point only, so this ‘annular domain’ then consists of all points z around z₀ for which 0 < ¦z – z₀¦ < R. We can then write the Laurent series as f(z) = ∑ aₙ(z – z₀)ⁿ + b₁/(z – z₀) + b₂/(z – z₀)² +…+ bₙ/(z – z₀)ⁿ +… with n = 0, 1, 2,…, ∞ in the first sum.
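For those who want to see the cₙ formula at work without doing any complex integration by hand, here is a quick numerical sketch (the helper laurent_coeff and the choice of f(z) = ez/z are mine, just for illustration):

```python
import cmath
from math import factorial

# A numerical check of c_n = (1/2πi) ∮ f(z)/(z - z0)^(n+1) dz for f(z) = e^z/z,
# which is analytic on the punctured disk 0 < |z| (so z0 = 0). Its Laurent series is
# e^z/z = 1/z + 1 + z/2! + z²/3! + ..., i.e. c_n = 1/(n+1)! for n >= -1, and 0 below that.

def laurent_coeff(f, n, radius=1.0, samples=2000):
    # approximate the contour integral over the circle |z| = radius by a Riemann sum
    total = 0j
    for k in range(samples):
        z = radius * cmath.exp(2j * cmath.pi * k / samples)
        dz = 1j * z * (2 * cmath.pi / samples)   # dz = iz dθ
        total += f(z) / z**(n + 1) * dz
    return total / (2j * cmath.pi)

f = lambda z: cmath.exp(z) / z
for n in range(-1, 3):
    print(n, laurent_coeff(f, n).real, 1 / factorial(n + 1))
```

Note that the n = –1 coefficient here is exactly the residue b₁ = 1 which I am about to talk about.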

OK. So what? Well… The point to note is that we can usually deal with singularities. That’s what the so-called theory of residues and poles is for. The term pole is a more illustrious term for what is, in essence, an isolated singular point: it has to do with the shape of the (modulus) surface of f(z) near this point, which is, well… shaped like a tent on a (vertical) pole indeed. As for the term residue, that’s a term used to denote this coefficient b1 in this power series above. The value of the residue at one or more isolated singular points can be used to evaluate integrals (so residues are used for solving integrals), but we won’t go into any more detail here, especially because, despite my initial promise, I still haven’t explained what analytic continuation actually is. Let me do that now.

For once, I must admit that Penrose’s explanation here is easier to follow than other texts (such as the Wikipedia article on analytic continuation, which I looked at but which, for once, seems to be less easy to follow than Penrose’s notes on it), so let me closely follow his line of reasoning here.

If, instead of the origin, we use a non-zero point z₀ for our expansion of this function f(z) = 1/(1 – z) = 1 + z + z² + z³ + z⁴ +… (i.e. a proper Taylor expansion, instead of the Maclaurin expansion around the origin), then we would, once again, find a circle of convergence for this function which would, once again, be bounded by the singularity at point (1, 0), as illustrated below. In fact, we can move even further out and expand this function around the (non-zero) point z₁ and so on and so on. See the illustration: it is essential that the successive circles of convergence around the origin, z₀, z₁, etcetera overlap when ‘moving out’ like this.

[Illustration: analytic continuation through overlapping circles of convergence]

So that’s this concept of ‘analytic continuation’. Paraphrasing Penrose, what’s happening here is that the domain D of the analytic function f(z) is being extended to a larger region D’ in which the function f(z) will also be analytic (or holomorphic – as this is the term which Penrose seems to prefer over ‘analytic’ when it comes to complex-valued functions).
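To see this ‘moving out’ in action numerically: re-expanding 1/(1 – z) about z₀ = –1 gives 1/(1 – z) = 1/(2 – (z + 1)) = ∑ (z + 1)ⁿ/2ⁿ⁺¹ (a standard exercise with the geometric series), and that series converges on a disk of radius 2 – again touching the singularity at z = 1 – so it reaches points where the original Maclaurin series is useless:

```python
# Re-expanding f(z) = 1/(1 - z) about z0 = -1:
# 1/(1 - z) = sum of (z + 1)^n / 2^(n+1), valid for |z + 1| < 2, i.e. inside a circle
# of radius 2 that again touches the singularity at z = 1.
z = -1.9                      # |z| > 1, so the Maclaurin series diverges here...
series = sum((z + 1)**n / 2**(n + 1) for n in range(200))
print(series, 1 / (1 - z))    # ...but the re-expanded series still converges to f(z)
```

The two printed numbers agree: the re-expanded series has analytically continued f beyond the original circle of convergence.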

Now, we should note something that, at first sight, seems to be incongruent: as we wander around a singularity like that (or, to use the mathematically correct term here, a branch point of the function) to then return to our point of departure, we may get (in fact, we are likely to get) different function values ‘back at base’. Indeed, the illustration below shows what happens when we are ‘wandering’ around the origin for the log z function. You’ll remember (if not, see the previous posts) that, if we write z using polar coordinates (so we write z as z = reiθ), then log z is equal to log z = ln r + i(θ + 2nπ). So we have a multiple-valued function here and we dealt with that by using branches, i.e. we limited the values which the argument of z (arg z = θ) could take to some range α < θ < α + 2π. However, when we are wandering around the origin, we don’t limit the range of θ. In fact, as we are wandering around the origin, we are effectively constructing this Riemann surface (which we introduced in one of our previous posts also), thereby effectively ‘gluing’ successive branches of the log z function together, and adding 2πi to the value of our log z function as we go around. [Note that the vertical axis in the illustration below keeps track of the imaginary part of log z only, i.e. the part with θ in it. If my imaginary reader would like to see the real part of log z, I should refer him to the post with the entry on Riemann surfaces.]

[Illustration: the imaginary part of log z, spiraling upward as we go around the origin]

But what about the power series? Well… The log z function is just like any other analytic function and so we can and do expand it as we go. For example, if we expand the log z function about the point (1, 0), we get log z = (z – 1) – (1/2)(z – 1)² + (1/3)(z – 1)³ – (1/4)(z – 1)⁴ +… etcetera. But as we wander around, we’ll move into a different branch of the log z function and, hence, we’ll get a different value when we get back to that point. However, I will leave the details of figuring that one out to you 🙂 and end this post, because the intention here is just to illustrate the principle, and not to copy some chapter out of a math course (or, at least, not to copy all of it let’s say :-)).
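Here’s a small numerical illustration of that branch-hopping (the step-by-step ‘gluing’ below is my own quick-and-dirty way of following log z continuously, not anything standard):

```python
import cmath

# Following log z continuously around the origin: at each step we pick the value of
# log z closest to the previous one (that's the 'gluing' of branches). After one full
# loop we are back at z = 1, but the value of log z has picked up 2πi.
steps = 1000
value = 0j                                     # log 1 = 0 on the principal branch
for k in range(1, steps + 1):
    z = cmath.exp(2j * cmath.pi * k / steps)   # walk the unit circle once
    candidates = [cmath.log(z) + 2j * cmath.pi * n for n in (-1, 0, 1)]
    value = min(candidates, key=lambda w: abs(w - value))
print(value)   # ≈ 2πi, not 0: we ended up on the next branch
```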

If you can’t work it out, you can always try to read the Wikipedia article on analytic continuation. While less ‘intuitive’ than Penrose’s notes on it, it’s definitely more complete, even if it does not quite exhaust the topic. Wikipedia defines analytic continuation as “a technique to extend the domain of a given analytic function by defining further values of that function, for example in a new region where an infinite series representation in terms of which it is initially defined becomes divergent.” The Wikipedia article also notes that, “in practice, this continuation is often done by first establishing some functional equation on the small domain and then using this equation to extend the domain: examples are the Riemann zeta function and the gamma function.” But so that’s sturdier stuff which Penrose does not touch upon – for now at least, but I expect him to develop such things in later Road to Reality chapters.

Post scriptum: Perhaps this is an appropriate place to note that, at first sight, singularities may look like no big deal: so we have an infinitesimally small hole in the domain of the function log z or 1/z or whatever – so what? Well… It’s probably useful to note that, if we wouldn’t have that ‘hole’ (i.e. the singularity), any integral of this function along a closed contour around that point, i.e. ∫C f(z)dz, would be equal to zero, but when we do have that little hole, like for f(z) = 1/z, we don’t have that result. In this particular case (i.e. f(z) = 1/z), you should note that the integral ∫C (1/z)dz, for any closed contour around the origin, equals 2πi, or, just to give one more example here, that the value of the integral ∫C [f(z)/(z – z₀)]dz is equal to 2πi·f(z₀). Hence, even if f(z) is analytic over the whole open disk, including the origin, the ‘quotient function’ f(z)/z will not be analytic at the origin and, hence, the value of the integral of this ‘quotient function’ f(z)/z around the origin will not be zero but equal to 2πi times the value of the original f(z) function at the origin, i.e. 2πi times f(0). Vice versa, if we find that the value of the integral of some function around a closed contour – and I mean any closed contour really – is not equal to zero, we know we’ve got a problem somewhere and so we should look out for one or more infinitesimally small little ‘holes’ somewhere in the domain. Hence, singularities – and the theory of poles and residues which shows us how we can work with them – are extremely relevant indeed: it’s surely not a matter of just trying to get some better approximation for this or that value or formula or so. 🙂
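These claims are easy to check numerically too (f(z) = cos z and z₀ = 0 below are arbitrary picks of mine, chosen only because cos 0 = 1 makes the expected answer 2πi):

```python
import cmath

# Numerical check: for f analytic on the disk, the integral of f(z)/z around the
# origin equals 2πi·f(0). Here f(z) = cos z, so the answer should be 2πi·cos 0 = 2πi.
samples = 2000
total = 0j
for k in range(samples):
    z = cmath.exp(2j * cmath.pi * k / samples)   # walk the unit circle
    dz = 1j * z * (2 * cmath.pi / samples)       # dz = iz dθ
    total += cmath.cos(z) / z * dz
print(total, 2j * cmath.pi)   # with f(z) = 1 this would be the ∮(1/z)dz = 2πi case
```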

In light of the above, it is now also clear that the term ‘residue’ is well chosen: this coefficient b₁ is equal to ∫C f(z)dz divided by 2πi (I take the case of an expansion about the origin here) and, hence, if there were no singularity, this integral (and, hence, the coefficient b₁) would be equal to zero. Now, because of the singularity, we have a coefficient b₁ ≠ 0 and, hence, using the term ‘residue’ for this ‘remnant’ is quite appropriate.

Euler’s formula

I went trekking (to the Annapurna Base Camp this time) and, hence, left the math and physics books alone for a week or two. When I came back, it was like I had forgotten everything, and I wasn’t able to re-do the exercises. Back to the basics of complex numbers once again. Let’s start with Euler’s formula:

eix = cos(x) + isin(x)

In his Lectures on Physics, Richard Feynman calls this equation ‘one of the most remarkable, almost astounding, formulas in all of mathematics’, so it’s probably no wonder I find it intriguing and, indeed, difficult to grasp. Let’s look at it. So we’ve got the real (but irrational) number e in it. That’s a fascinating number in itself because it pops up in different mathematical expressions which, at first sight, have nothing in common with each other. For example, e can be defined as the sum of the infinite series e = 1/0! + 1/1! + 1/2! + 1/3! + 1/4! + … etcetera (n! stands for the factorial of n in this formula), but one can also define it as that unique positive real number for which d(et)/dt = et (in other words, as the base of an exponential function which is its own derivative). And, last but not least, there are also some expressions involving limits which can be used to define e. Where to start? More importantly, what’s the relation between all these expressions and Euler’s formula?

First, we should note that eix is not just any number: it is a complex number – as opposed to the more simple ex expression, which denotes the real exponential function (as opposed to the complex exponential function ez). Moreover, we should note that eix is a complex number on the unit circle. So, using polar coordinates, we should say that eix  is a complex number with modulus 1 (the modulus is the absolute value of the complex number (i.e. the distance from 0 to the point we are looking at) or, alternatively, we could say it is the magnitude of the vector defined by the point we are looking at) and argument x (the argument is the angle (expressed in radians) between the positive real axis and the line from 0 to the point we are looking at).

Now, it is self-evident that cos(x) + isin(x) represents exactly the same: a point on the unit circle defined by the angle x. But so that doesn’t prove Euler’s formula: it only illustrates it. So let’s go to one or the other proof of the formula to try to understand it somewhat better. I’ll refer to Wikipedia for proving Euler’s formula in extenso but let me just summarize it. The Wikipedia article (as I looked at it today) gives three proofs.

The first proof uses the power series expansion (yes, the Taylor/Maclaurin series indeed – more about that later) for the exponential function: eix = 1 + ix + (ix)²/2! + (ix)³/3! +… etcetera. We then substitute using i² = –1, i³ = –i etcetera and so, when we then re-arrange the terms, we find the Maclaurin series for the cos(x) and sin(x) functions indeed. I will come back to these power series in another post.

The second proof uses one of the limit definitions for ex but applies it to the complex exponential function. Indeed, one can write ez (with z = x + iy) as ez = lim(1 + z/n)ⁿ for n going to infinity. The proof substitutes ix for z and then calculates the limit for very large (or infinite) n indeed. This proof is less obvious than it seems because we are dealing with limits of complex numbers here and so one has to take into account issues of convergence and all that.

The third proof also looks complicated but, in fact, is probably the most intuitive of the three proofs given because it uses the derivative definition of e. To be more precise, it takes the derivative of both sides of Euler’s formula using the polar coordinates expression for complex numbers. Indeed, eix is a complex number and, hence, can be written as some number z = r(cosθ + isinθ), and so the question to solve here is: what are r and θ? We need to write these two values as a function of x. How do we do that? Well… If we take the derivative of both sides, we get d(eix)/dx = i·eix = (cosθ + isinθ)dr/dx + r[d(cosθ + isinθ)/dθ]dθ/dx. That’s just the chain rule for derivatives of course. Now, writing it all out and equating the real and imaginary parts on both sides of the expression yields the following: dr/dx = 0 and dθ/dx = 1. In addition, for x = 0, we have ei0 = e0 = 1, so we have r(0) = 1 (the modulus of the complex number (1, 0) is one) and θ(0) = 0 (the argument of (1, 0) is zero). It follows that the functions r and θ are equal to r = 1 and θ = x, which proves the formula.

While these proofs are (relatively) easy to understand, the formula remains weird, as evidenced also from its special cases, like ei0 = 1 = –eiπ or, equivalently, eiπ + 1 = 0, which is a formula which combines the five most basic quantities in mathematics: 0, 1, i, e and π. It is an amazing formula because we have two irrational numbers here, e and π, which have definitions which do not refer to each other at all (last time I checked, π was still being defined as the simple ratio of a circle’s circumference to its diameter, while the various definitions of e have nothing to do with circles), and so we combine these two seemingly unrelated numbers, also inserting the imaginary unit i (using iπ as an exponent for e) and we get minus 1 as a result (eiπ = –1). Amazing indeed, isn’t it?

[…] Well… I’d say at least as amazing as the Taylor or Maclaurin expansion of a function – but I’ll save my thoughts on these for another post (even if I am using the results of these expansions in this post). In my view, what Euler’s formula shows is the amazing power of mathematical notation really – and the creativity behind it. Indeed, let’s look at what we’re doing with complex numbers: we start from one or two definitions only and suddenly all kinds of wonderful stuff starts popping up. It goes more or less like this really:

We start off with these familiar x and y coordinates of points in a plane. Now we call the x-axis the real axis and then, just to distinguish them from the real numbers, we call the numbers on the y-axis imaginary numbers. Again, it is just to distinguish them from the real numbers because, in fact, imaginary numbers are not imaginary at all: they are as real as the real numbers – or perhaps we should say that the real numbers are as imaginary as the imaginary numbers because, when everything is said and done, the real numbers are mental constructs as well, aren’t they? Imaginary numbers just happen to lie on another line, perpendicular to our so-called real line, and so that’s why we add a little symbol i (the so-called imaginary unit) when we write them down. So we write 1i (or i tout court), 2i, 3i etcetera, or i/2 or whatever (it doesn’t matter if we write i before the real number or after – as long as we’re consistent).

Then we combine these two numbers – the real and imaginary numbers – to form a so-called complex number, which is nothing but a point (x, y) in this Cartesian plane. Indeed, while complex numbers are somewhat more complex than the numbers we’re used to in daily life, they are not out of this world I’d say: they’re just points in space, and so we can also represent them as vectors (‘arrows’) from the origin to (x, y).

But so this is what we are doing really: we combine the real and imaginary numbers by using the very familiar plus (+) sign, so we write z = x + iy. Now that is actually where the magic starts: we are not adding the same things here, like we would do when we are counting apples or so, or when we are adding integers or rational or real numbers in general. No, we are adding two different things here – real and imaginary numbers – which, in fact, we cannot really add. Indeed, your mommy told you that you cannot compare apples with oranges, didn’t she? Well… That’s exactly what we do here really, and so we will keep these real and imaginary numbers separate in our calculations indeed: we will add the real parts of complex numbers with each other only, and the imaginary parts of them also with each other only.

Addition is quite straightforward: we just add the two vectors. Multiplication is somewhat more tricky but (geometrically) easy to interpret as well: the product of two complex numbers is a vector with a length which is equal to the product of the lengths of the two vectors we are multiplying (i.e. the two complex numbers which make up the product), and its angle with the real axis is the sum of the angles of the two original vectors. From this definition, many things follow, all equally amazing indeed, but one of these amazing facts is that i² = –1, i³ = –i, i⁴ = 1, i⁵ = i, etcetera. Indeed: multiplying a complex number z = x + iy = (x, y) with the imaginary unit i amounts to rotating it 90° (counterclockwise) about the origin. So we are not defining i² as being equal to minus 1 (many textbooks treat this equality as a definition indeed): it just comes as a fact which we can derive from the earlier definition of a complex product. Sweet, isn’t it?
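You can see this geometry at work in Python’s complex arithmetic (the lengths and angles below are arbitrary picks of mine):

```python
import cmath

# The geometric picture of complex multiplication: moduli multiply, arguments add.
z1 = 2 * cmath.exp(1j * 0.5)    # length 2, angle 0.5 rad
z2 = 3 * cmath.exp(1j * 0.7)    # length 3, angle 0.7 rad
product = z1 * z2
print(abs(product))             # 6.0: the lengths multiply (2 x 3)
print(cmath.phase(product))     # 1.2: the angles add (0.5 + 0.7)

# And multiplying by i is a 90° rotation, so doing it twice flips the sign: i·i = -1.
z = 1 + 1j
print(z * 1j)                   # z rotated a quarter turn counterclockwise: (-1+1j)
print(1j * 1j)                  # (-1+0j)
```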

So we have addition and multiplication now. We want to do much more of course. After defining addition and multiplication, we want to do complex powers, and so it’s here that this business with e pops up.

We first need to remind ourselves of the simple fact that the number e is just a real number: it’s equal to 2.718281828459045235360287471 etcetera. We have to write ‘etcetera’ because e is an irrational number, which – whatever the term ‘irrational’ may suggest in everyday language – simply means that e is not a fraction of any integer numbers (so irrational means ‘not rational’). e is also a transcendental number – a word which suggests all kinds of mystical properties but which, in mathematics, only means we cannot write it as a root of some polynomial (a polynomial with rational coefficients that is). So it’s a weird number. That being said, it is also the so-called ‘natural’ base for the exponential function. Huh? Why would mathematicians take such a strange number as a so-called ‘natural’ base? They must be irrational, no? Well… No. If we take e as the base for the exponential function ex (so that’s just this real (but irrational) number e to the power x, with x being the variable running along the x-axis: hence, we have a function here which takes a value from the set of real numbers and which yields some other real number), then we have a function here which is its own derivative: d(ex)/dx = ex. It is also the natural base for the logarithmic function and, as mentioned above, it kind of ‘pops up’ – quite ‘naturally’ indeed I’d say – in many other expressions, such as compound interest calculations for example or the general exponential function ax = ex·lna. In other words, we need this number e and the exp(x) and ln(x) functions to define powers of real numbers in general. So that’s why mathematicians call it ‘natural’.

While the example of compound interest calculations does not sound very exciting, all these formulas with e and exponential functions and what have you did inspire all these 18th century mathematicians – like Euler – who were in search of a logical definition of complex powers.

Let’s state the problem once again: we can do addition and multiplication of complex numbers but so the question is how to do complex powers. When trying to figure that one out, Euler obviously wanted to preserve the usual properties of powers, like axay = ax+y and, effectively, this property of the so-called ‘natural’ exponential function that d(ex)/dx = ex. In other words, we also want the complex exponential function to be its own derivative so d(ez)/dz should give us ez once again.

Now, while Euler was thinking of that (and of many other things too of course), he was well aware of the fact that you can expand ex into that power series which I mentioned above: ex = 1 + x/1! + x²/2! + x³/3! +… etcetera. So Euler just sat down, substituted the real number x with the imaginary number ix and looked at it: eix = 1 + ix + (ix)²/2! + (ix)³/3! +… etcetera. Now lo and behold! Taking into account that i² = –1, i³ = –i, i⁴ = 1, i⁵ = i, etcetera, we can put that in and re-arrange the terms indeed and so Euler found that this equation becomes eix = (1 – x²/2! + x⁴/4! – x⁶/6! +…) + i(x – x³/3! + x⁵/5! –…). Now these two terms do correspond to the Maclaurin series for the cosine and sine function respectively, so there he had it: eix = cos(x) + isin(x). His formula: Euler’s formula!
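We can replay Euler’s move numerically: sum the first terms of the series for eix and compare with cos(x) + isin(x) (the angle x = 0.75 below is just an arbitrary pick):

```python
from math import factorial, cos, sin

# Euler's substitution, done numerically: the partial sums of 1 + ix + (ix)²/2! + ...
# land exactly on cos(x) + i·sin(x).
x = 0.75
series = sum((1j * x)**n / factorial(n) for n in range(30))
print(series)
print(complex(cos(x), sin(x)))   # the same complex number, as Euler's formula says
```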

From there, there was only one more step to take, and that was to write ez = ex+iy as exeiy, and so there we have our definition of a complex power: it is a product of two factors – ex and eiy – both of which we have effectively defined now. Note that the ex factor is just a real number, even if we write it as ex: it acts as a sort of scaling factor for eiy which, you will remember (as we pointed it out above already), is a point on the unit circle. More generally, it can be shown that ex is the absolute value of ez (or the modulus or length or magnitude of the vector – whatever term you prefer: they all refer to the same), while y is the argument of the complex number ez (i.e. the angle of the vector ez with the real axis). [And, yes, for those who would still harbor some doubts here: ez is just another complex number and, hence, a two-dimensional vector, i.e. just a point in the Cartesian plane, so we have a function which takes a complex number z as input and which yields another complex number.]

Of course, you will note that we don’t have something like zw here, i.e. a complex base (i.e. z) with a complex exponent (i.e. w), or even a formula for complex powers of real numbers in general, i.e. a formula for aw with a any real number (so not only e but any real number indeed) and w a complex exponent. However, that’s a problem which can be solved easily through writing z and w in their so-called polar form, so we write z as z = ¦z¦eiθ = ¦z¦(cosθ + isinθ) and w as w = ¦w¦eiσ = ¦w¦(cosσ + isinσ), and then we can take it further from there. [Note that ¦z¦ and ¦w¦ represent the modulus (i.e. the length) of z and w respectively, and the angles θ and σ are obviously the arguments of the same z and w respectively.] Of course, if z is a positive real number (so if y = 0 and x > 0), then the angle θ will obviously be zero (i.e. the angle of the positive real axis with itself) and so z will be equal to a real number (i.e. its real part only, as its imaginary part is zero) and then we are back to the case of a real base and a complex exponent. In other words, that covers the aw case.

[…] Well… Easily? OK. I am simplifying a bit here – as I need to keep the length of this post manageable – but, in fact, it actually really is a matter of using these common properties of powers (such as ea+bi·ec = e(a+c)+bi) and it actually does all work out. And all of this magic did actually start with simply ‘adding’ the so-called ‘real’ numbers x on the x-axis with the so-called ‘imaginary’ numbers on the y-axis. 🙂
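A sketch of how a full complex power works out in practice, using the principal value of the logarithm to define zw = ew·lnz (the helper cpow is mine; Python’s built-in ** for complex numbers does the same thing). The famous example: ii = ei·lni = ei·iπ/2 = e–π/2, a real number!

```python
import cmath

# A complex power of a complex base via the principal branch: z^w = e^(w·log z).
def cpow(z, w):
    return cmath.exp(w * cmath.log(z))   # principal value of log z only

print(cpow(1j, 1j))                # ≈ 0.2079..., i.e. e^(-π/2)
print(cmath.exp(-cmath.pi / 2))
print(1j ** 1j)                    # Python's own complex power agrees
```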

Post scriptum:

Penrose’s Road to Reality dedicates a whole chapter to complex exponentiation (Chapter 5). However, the development is not all that simple and straightforward indeed. The first step in the process is to take integer powers – and integer roots – of complex numbers, so that’s zn for n = 0, ±1, ±2, ±3… etcetera (or z1/2, z1/3, z1/4 if we’re talking integer roots). That’s easy because it can be solved through using the old formula of Abraham de Moivre: (cosθ + isinθ)ⁿ = cos(nθ) + isin(nθ) (de Moivre penned this down in 1707 already, more than 40 years before Euler looked at the matter). However, going from there to full-blown complex powers is, unfortunately, not so straightforward, as it involves a bit of a detour: we need to work with the inverse of the (complex) exponential function ez, i.e. the (complex) natural logarithm.

Now that is less easy than it sounds. Indeed, while the definition of a complex logarithm is as straightforward as the definition of real logarithms (lnz is a function for which elnz = z), the function itself is a bit more… well… complex I should say. For starters, it is a multiple-valued function: if we write the solution w = lnz as w = u + iv, then it is obvious that ew will be equal to eu+iv = eueiv and this complex number ew can then be written in its polar form ew = reiθ with r = eu and v = θ + 2nπ. Of course, ln(eu+iv) = u + iv and so the solution of w will look like w = ln r + i(θ + 2nπ) with n = 0, ±1, ±2, ±3 etcetera. In short, we have an infinite number of solutions for w (one for every n we choose) and so we have this problem of multiple-valuedness indeed. We will not dwell on this here (at least not in this post) but simply note that this problem is linked to the properties of the complex exponential function ez itself. Indeed, the complex exponential function ez has very different properties than the real exponential function ex. First, we should note that, unlike ex (which, as we know, goes from zero at the far end of the negative side of the real axis to infinity as x goes big on the positive side), ez is a periodic function – so it oscillates and yields the same values after some time – with this ‘after some time’ being the periodicity of the function. Indeed, ez = ez+2πi and so its period is 2πi (note that this period is an imaginary number – but so it’s a ‘real’ period, if you know what I mean :-)). In addition, and this is also very much unlike the real exponential function ex, ez can be negative (as well as assume all kinds of other complex values). For example, eiπ = –1, as we noted above already.
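Both properties are easy to check numerically (z = 0.3 + 1.2i below is an arbitrary pick):

```python
import cmath

# The periodicity of the complex exponential: e^z = e^(z + 2πi), which is exactly
# why the complex logarithm has infinitely many values ln r + i(θ + 2nπ).
z = 0.3 + 1.2j
print(cmath.exp(z))
print(cmath.exp(z + 2j * cmath.pi))      # the same complex number

# All these w solve e^w = -1 (the principal one being iπ):
for n in (-1, 0, 1):
    w = cmath.log(-1) + 2j * cmath.pi * n
    print(w, cmath.exp(w))               # exp(w) ≈ -1 every time
```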

That being said, the problem of multiple-valuedness can be solved through the definition of a principal value of lnz and that, then, leads us to what we want here: a consistent definition of a complex power of a complex base (or the definition of a true complex exponential (and logarithmic) function in other words). To those who would want to see the details of this (i.e. my imaginary readers :-)), I would say that Penrose’s treatment of the matter in the above-mentioned Chapter 5 of The Road to Reality is rather cryptic – presumably because he has to keep his book around 1000 pages only (not a lot to explain all of the Laws of the Universe) and, hence, Brown & Churchill’s course (or whatever other course dealing with complex analysis) probably makes for easier reading.

[As for the problem of multiple-valuedness, we should probably also note the following: when taking the nth root of a complex number (i.e. z1/n with n = 2, 3, etcetera), we also obtain a set of n values ck (with k = 0, 1, 2,… n–1), rather than one value only. However, once we have one of these values, we have all of them, as we can write these ck as ck = r1/n·ei(θ/n + 2kπ/n) (with the original complex number z equal to z = reiθ), so we could also just consider the principal value c0 and, as such, consider the function as a single-valued one. In short, the problem of multiple-valued functions pops up almost everywhere in the complex space, but it is not a real issue. In fact, we encounter the problem of multiple-valuedness as soon as we extend the exponential function in the space of the real numbers and also allow rational and real exponents, instead of positive integers only. For example, the equation x² = 4 has two solutions (±2), so the square root of 4 could be said to have two values, and there are even four 4th roots of 16 if we allow complex numbers: +2, –2 and then the two imaginary roots +2i and –2i. However, standard practice is that we only take the positive real value into account in order to ensure a ‘well-behaved’ exponential function. Indeed, the standard definition of a real exponential function is bx = (elnb)x = ex·lnb, and this definition yields one positive value only. Standard practice will also restrict the value of b to a positive real number (b > 0). These conventions not only ensure a positive result but also continuity of the function and, hence, the existence of a derivative which we can then use to do other things. By the way, the definition also shows – once again – why e is such a nice (or ‘natural’) number: we can use it to calculate the value for any exponential function (for any real base b > 0). But so we had mentioned that already, and it’s now really time to stop writing. I think the point is clear.]
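A small sketch of these n values ck in Python (the helper nth_roots is mine, just spelling out the ck formula above):

```python
import cmath

# The n values of z^(1/n): c_k = r^(1/n) · e^(i(θ/n + 2kπ/n)), k = 0, 1, ..., n-1.
# For z = 16 and n = 4 this gives the four fourth roots 2, 2i, -2, -2i.
def nth_roots(z, n):
    r, theta = cmath.polar(z)
    return [r**(1 / n) * cmath.exp(1j * (theta / n + 2 * cmath.pi * k / n))
            for k in range(n)]

for c in nth_roots(16, 4):
    print(c, c**4)    # each root raised to the 4th power gives 16 back
```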

No royal road to reality

I got stuck in Penrose’s Road to Reality in Chapter 5 already. That is not very encouraging – because the book has 34 chapters, every chapter builds on the previous one, and each usually also looks more difficult than the one before.

In Chapter 5, Penrose introduces complex algebra. As I tried to get through it, I realized I had to do some more self-study. Indeed, while Penrose claims no other books or courses are needed to get through his, I do not find this to be the case. So I bought a fairly standard course in complex analysis (James Ward Brown and Ruel V. Churchill, Complex Variables and Applications) and I’ve done chapters 1 and 2 now. Although these first two chapters do nothing else than introduce the subject-matter, I find the matter rather difficult and the terminology confusing. Examples:

1. The term ‘scalar’ is used to denote real numbers. So why use the term ‘scalar’ if the word ‘real’ is available as well? And why not use the term ‘real field’ instead of ‘scalar field’? Well… The term ‘real field’ actually means something else. A scalar field associates a (real) number with every point in space. So that’s simple: think of temperature or pressure. The term ‘scalar’ is said to be derived from ‘scaling’: a scalar is what scales vectors. Indeed, scalar multiplication of a vector by a real number multiplies the magnitude of the vector without changing its direction. So what is a real field then? Well… A (formally) real field is a field that can be extended with a (not necessarily unique) ordering which makes it an ordered field. Does that help? Somewhat, I guess. But why the qualifier ‘formally real’? I checked, and there is no such thing as an ‘informally real’ field. I guess it’s just to make sure we know what we are talking about, as ‘real’ is a word with many meanings.

2. So what’s a field in mathematics? It is an algebraic structure: a set of ‘things’ (like numbers) with operations defined on it, including the notions of addition, subtraction, multiplication, and division. As mentioned above, we have scalar fields and vector fields. In addition, we also have fields of complex numbers. We also have fields with some less likely candidates for addition and multiplication, such as functions (one can add and multiply functions with each other). In short, anything which satisfies the formal definition of a field – and here I should note that the above definition of a field is not formal – is a field. For example, the set of rational numbers satisfies the definition of a field too. So what is the formal definition? First of all, a field is a ring. Huh? Here we are in this abstract classification of algebraic structures: commutative groups, rings, fields, etcetera (there are also modules – a type of algebraic structure which I had never heard of before). To put it simply – because we have to move on, of course – a ring (no one seems to know where that word actually comes from) has addition and multiplication only, while a field has division too. In other words, a ring does not need to have multiplicative inverses. Huh? It’s simple really: the integers form a ring, but the equation 2x = 1 does not have a solution in integers (x = ½) and, hence, the integers do not form a field. The same example shows why the rational numbers do.
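The 2x = 1 example can be made concrete with Python’s exact rational type (just my own illustration of the ring-versus-field distinction, not something from Brown & Churchill):

```python
from fractions import Fraction

# In the ring of integers, 2x = 1 has no solution: 1/2 is not an integer.
# In the field of rationals, every nonzero element has a multiplicative inverse.
x = Fraction(1, 2)               # the multiplicative inverse of 2
assert 2 * x == 1                # so 2x = 1 does have a rational solution...
assert x.denominator != 1        # ...but that solution is not an integer
```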

3. But what about a vector field? Can we do division with vectors? Yes, but not by zero – but that is not a problem, as that is understood in the definition of a field (or in the general definition of division, for that matter). In two-dimensional space, we can represent vectors by complex numbers: z = (x, y), and we have a formula for the so-called multiplicative inverse of a complex number: z^(−1) = (x/(x^2+y^2), −y/(x^2+y^2)). OK. That’s easy. Let’s move on to more advanced stuff.
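We can verify that inverse formula in a few lines of Python (a quick sketch; the helper name is mine):

```python
def inverse(z):
    """Multiplicative inverse of z = (x, y) via z^(-1) = (x/(x^2+y^2), -y/(x^2+y^2))."""
    x, y = z.real, z.imag
    d = x * x + y * y            # |z|^2; must be nonzero (no division by zero)
    return complex(x / d, -y / d)

z = 3 + 4j
w = inverse(z)                   # (3/25, -4/25) = 0.12 - 0.16i
assert abs(z * w - 1) < 1e-12    # z * z^(-1) = 1
assert abs(w - 1 / z) < 1e-12    # agrees with Python's built-in complex division
```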

4. In logic, we have the concept of well-formed formulas (wffs). In math, we have the concept of ‘well-behaved’: we have well-behaved sets, well-behaved spaces and lots of other well-behaved things, including well-behaved functions, which are, of course, those of interest to engineers and scientists (and, hence, in light of the objective of understanding Penrose’s Road to Reality, to me as well). I must admit that I was somewhat surprised to learn that ‘well-behaved’ is one of the very few terms in math that have no formal definition. Wikipedia notes that its definition, in the science of mathematics that is, depends on ‘mathematical interest, fashion, and taste’. Let me quote in full here: “To ensure that an object is ‘well-behaved’ mathematicians introduce further axioms to narrow down the domain of study. This has the benefit of making analysis easier, but cuts down on the generality of any conclusions reached. […] In both pure and applied mathematics (optimization, numerical integration, or mathematical physics, for example), well-behaved means not violating any assumptions needed to successfully apply whatever analysis is being discussed. The opposite case is usually labeled pathological.” Wikipedia also notes that “concepts like non-Euclidean geometry were once considered ill-behaved, but are now common objects of study.”

5. So what is a well-behaved function? There is actually a whole hierarchy, with varying degrees of ‘good’ behavior, so one function can be more ‘well-behaved’ than another. First, we have smooth functions: a smooth function has derivatives of all orders (as for its name, it’s actually well chosen: the graph of a smooth function is, well, smooth). Then we have analytic functions: analytic functions are smooth but, in addition to being smooth, an analytic function is a function that can be locally given by a convergent power series. Huh? Let me try an alternative definition: a function is analytic if and only if its Taylor series about x0 converges to the function in some neighborhood of x0, for every x0 in its domain. That’s not helping much either, is it? Well… Let’s just leave that one for now.
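Still, the ‘convergent power series’ idea is easy to poke at numerically. Take the geometric series from the introduction, f(z) = 1/(1−z) = 1 + z + z^2 + …, which converges only for |z| < 1 (a quick sketch of my own):

```python
def partial_sum(z, n_terms):
    """Partial sum 1 + z + z^2 + ... + z^(n_terms - 1) of the geometric series."""
    return sum(z ** n for n in range(n_terms))

z_in = 0.5 + 0.3j                # |z| < 1: inside the circle of convergence
assert abs(partial_sum(z_in, 60) - 1 / (1 - z_in)) < 1e-9

z_out = 1.2                      # |z| > 1: the partial sums blow up
assert abs(partial_sum(z_out, 200)) > 1e10
```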

In fact, it may help to note that the authors of the course I am reading (J.W. Brown and R.V. Churchill, Complex Variables and Applications) use the terms analytic, regular and holomorphic interchangeably, and they define an analytic function simply as a function which has a derivative everywhere. While that’s helpful, it’s obviously a bit loose (what’s the thing about the Taylor series?) and so I checked on Wikipedia, which clears up the confusion and also defines the terms ‘holomorphic’ and ‘regular’:

“A holomorphic function is a complex-valued function of one or more complex variables that is complex differentiable in a neighborhood of every point in its domain. The existence of a complex derivative in a neighborhood is a very strong condition, for it implies that any holomorphic function is actually infinitely differentiable and equal to its own Taylor series. The term analytic function is often used interchangeably with ‘holomorphic function’ although the word ‘analytic’ is also used in a broader sense to describe any function (real, complex, or of more general type) that can be written as a convergent power series in a neighborhood of each point in its domain. The fact that the class of complex analytic functions coincides with the class of holomorphic functions is a major theorem in complex analysis.”

Wikipedia also adds the following: “Holomorphic functions are also sometimes referred to as regular functions or as conformal maps. A holomorphic function whose domain is the whole complex plane is called an entire function. The phrase ‘holomorphic at a point z0’ means not just differentiable at z0, but differentiable everywhere within some neighborhood of z0 in the complex plane.”

6. What to make of all this? Differentiability is obviously the key and, although there are many similarities between real differentiability and complex differentiability (both are linear and obey the product rule, quotient rule, and chain rule), real-valued functions and complex-valued functions are different animals. What are the conditions for differentiability? For real-valued functions, it is a matter of checking whether or not the limit defining the derivative exists and, of course, a necessary (but not sufficient) condition is continuity of the function.

For complex-valued functions, it is a bit more sophisticated, because we’ve got the so-called Cauchy–Riemann conditions applying here. How does that work? Well… We write f(z) as the sum of two functions: f(z) = u(x,y) + iv(x,y). So the real-valued function u(x,y) yields the real part of f(z), while v(x,y) yields the imaginary part of f(z). The Cauchy–Riemann equations (to be interpreted as conditions, really) are the following: u_x = v_y and u_y = −v_x (note the minus sign in the second equation).

That looks simple enough, doesn’t it? However, as Wikipedia notes (see the quote above), satisfying these conditions at the single point z0 is not enough to ensure that f(z) is analytic at that point. We need to look at some neighborhood of the point z0 and see if these first-order derivatives (u_x, u_y, v_x and v_y) exist everywhere in that neighborhood and satisfy the Cauchy–Riemann equations there. So we need to look beyond the point z0 itself when doing our analysis: we need to ‘approach’ it from various directions before making any judgment. I know this sounds like Chinese, but it became clear to me when doing the exercises.
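A small numerical experiment (my own sketch, using finite differences) shows the Cauchy–Riemann conditions at work: f(z) = z^2 satisfies them, while f(z) = conj(z), i.e. complex conjugation, does not:

```python
def cr_residuals(f, z, h=1e-6):
    """Finite-difference estimates of u_x - v_y and u_y + v_x at the point z."""
    fx = (f(z + h) - f(z - h)) / (2 * h)            # partial d/dx of u + iv
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial d/dy of u + iv
    # u_x = fx.real, v_x = fx.imag, u_y = fy.real, v_y = fy.imag
    return fx.real - fy.imag, fy.real + fx.imag     # both should be 0

z0 = 1.0 + 2.0j
ok = cr_residuals(lambda z: z * z, z0)              # z^2: both residuals ~ 0
bad = cr_residuals(lambda z: z.conjugate(), z0)     # conj(z): u_x - v_y = 2, not 0
assert max(abs(ok[0]), abs(ok[1])) < 1e-6
assert abs(bad[0] - 2) < 1e-6
```

For conj(z) we have u = x and v = −y, so u_x = 1 while v_y = −1: the first Cauchy–Riemann equation fails everywhere, which is why conjugation has no complex derivative at all.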

7. OK. Phew! I got this far – but that’s only chapters 1 and 2 of Brown & Churchill’s course! In fact, chapter 2 also includes a few sections on so-called harmonic functions and harmonic conjugates. Let’s first talk about harmonic functions. Harmonic functions are even better behaved than holomorphic or analytic functions. Well… That’s not the right way to put it really. A harmonic function is a real-valued analytic function (its value could represent temperature, or pressure – just as an example) but, for a function to qualify as ‘harmonic’, an additional condition is imposed. That condition is known as Laplace’s equation: if we denote the harmonic function as H(x,y), then it has to have second-order derivatives which satisfy H_xx(x,y) + H_yy(x,y) = 0.
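As a toy check (my own sketch again), we can verify numerically that u(x, y) = x^2 − y^2 satisfies Laplace’s equation, while x^2 + y^2 does not:

```python
def laplacian(H, x, y, h=1e-4):
    """Finite-difference estimate of H_xx(x, y) + H_yy(x, y)."""
    hxx = (H(x + h, y) - 2 * H(x, y) + H(x - h, y)) / h ** 2
    hyy = (H(x, y + h) - 2 * H(x, y) + H(x, y - h)) / h ** 2
    return hxx + hyy

u = lambda x, y: x ** 2 - y ** 2     # harmonic: H_xx + H_yy = 2 - 2 = 0
assert abs(laplacian(u, 1.5, -0.7)) < 1e-5

w = lambda x, y: x ** 2 + y ** 2     # not harmonic: H_xx + H_yy = 2 + 2 = 4
assert abs(laplacian(w, 1.5, -0.7) - 4) < 1e-4
```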

Huh? Laplace’s equation, or harmonic functions in general, plays an important role in physics, as the condition that is being imposed (the Laplace equation) often reflects a real-life physical constraint and, hence, the function H would describe real-life phenomena, such as the temperature of a thin plate (with the points on the plate defined by the (x,y) coordinates), or electrostatic potential. More about that later. Let’s conclude this first entry with the definition of harmonic conjugates.

8. As stated above, a harmonic function is a real-valued function. However, we also noted that a complex function f(z) can actually be written as the sum of a real and an imaginary part using two real-valued functions u(x,y) and v(x,y). More in particular, we can write f(z) = u(x,y) + iv(x,y), with i the imaginary unit (0,1). Now, if u and v happen to be harmonic functions (but that’s an if, of course – see the Laplace condition imposed on their second-order derivatives in order to qualify for the ‘harmonic’ label) and if, in addition, their first-order derivatives happen to satisfy the Cauchy–Riemann equations (in other words, if f(z) is a well-behaved analytic function), then (and only then) we can label v as the harmonic conjugate of u.

What does that mean? First, one should note that when v is a harmonic conjugate of u in some domain, it is not generally true that u is a harmonic conjugate of v. So one cannot just switch the functions. Indeed, the minus sign in the Cauchy–Riemann equations makes the relationship asymmetric. But what’s the relevance of this definition of a harmonic conjugate? Well… There is a theorem that turns the definition around: ‘A function f(z) = u(x,y) + iv(x,y) is analytic (or holomorphic, to use standard terminology) in a domain D if and only if v is a harmonic conjugate of u.’ In other words, introducing the definition of a harmonic conjugate (and the conditions which the first- and second-order derivatives have to satisfy) allows us to check whether or not we have a well-behaved complex-valued function (and with ‘well-behaved’ I mean analytic or holomorphic).
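The classic textbook example is u(x, y) = x^2 − y^2 with harmonic conjugate v(x, y) = 2xy: together they reassemble into the analytic function f(z) = z^2. A quick Python check (my own sketch), which also illustrates the asymmetry mentioned above:

```python
u = lambda x, y: x ** 2 - y ** 2       # harmonic
v = lambda x, y: 2 * x * y             # its harmonic conjugate

z = 1.3 + 0.4j
f = complex(u(z.real, z.imag), v(z.real, z.imag))
assert abs(f - z * z) < 1e-12          # u + iv reassembles into f(z) = z^2

# The relationship is asymmetric: u is not a harmonic conjugate of v, but -u is:
g = complex(v(z.real, z.imag), -u(z.real, z.imag))
assert abs(g - (-1j) * z * z) < 1e-12  # v + i(-u) = -i * z^2, also analytic
```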

9. But, again, why do we need holomorphic functions? What’s so special about them? I am not sure for the moment, but I guess there’s something deeper in that one phrase which I quoted from Wikipedia above: “holomorphic functions are also sometimes referred to as regular functions or as conformal maps.” A conformal mapping preserves angles, as you can see in the illustration below, which shows a rectangular grid and its image under a conformal map: f maps pairs of lines intersecting at 90° to pairs of curves still intersecting at 90°. I guess that’s very relevant, although I do not know why exactly for now. More about that in later posts.

[Illustration: a rectangular grid and its image under a conformal map]