Music and Math

Pre-scriptum (dated 26 June 2020): These posts on elementary math and physics have not suffered much from the attack by the dark force, which is good because I still like them. While my views on the true nature of light, matter and the force or forces that act on them have evolved significantly as part of my explorations of a more realist (classical) explanation of quantum mechanics, I think most (if not all) of the analysis in this post remains valid and fun to read. In fact, I find the simplest stuff is often the best. 🙂

Original post:

I ended my previous post, on Music and Physics, by emphatically making the point that music is all about structure, about mathematical relations. Let me summarize the basics:

1. The octave is the musical unit, defined as the interval between two pitches with the higher frequency being twice the frequency of the lower pitch. Let’s denote the lower and higher pitch by a and b respectively, so we say that b’s frequency is twice that of a.

2. We then divide the [a, b] interval (whose length is unity) in twelve equal sub-intervals, which define eleven notes in-between a and b. The pitch of the notes in-between is defined by the exponential function connecting a and b. What exponential function? The exponential function with base 2, so that’s the function y = 2^x.

Why base 2? Because of the doubling of the frequencies when going from a to b, and when going from b to b + 1, and from b + 1 to b + 2, etcetera. In music, we give a, b, b + 1, b + 2, etcetera the same name, or symbol: A, for example. Or Do. Or C. Or Re. Whatever. If we have the unit and the number of sub-intervals, all the rest follows. We just add a number to distinguish the various As, or Cs, or Gs, so we write A1, A2, etcetera. Or C1, C2, etcetera. The graph below illustrates the principle for the interval between C4 and C5. Don’t think the function is linear. It’s exponential: note the logarithmic frequency scale. To make the point, I also inserted another illustration (credit for that graph goes to another blogger).

Frequency_vs_name

equal-tempered-scale-graph-linear

You’ll wonder: why twelve sub-intervals? Well… That’s random. Non-Western cultures use a different number. Eight instead of twelve, for example, which is more logical, at first sight at least: eight intervals amounts to dividing the interval in two equal halves, and the halves in halves again, and then once more: so the length of the sub-interval is then 1/2·1/2·1/2 = (1/2)³ = 1/8. But why wouldn’t we divide by three, so we have 9 = 3·3 sub-intervals? Or by 27 = 3·3·3? Or by 16? Or by 5?

The answer is: we don’t know. The limited sensitivity of our ear demands that the intervals be cut up somehow. [You can do tests of the sensitivity of your ear to relative frequency differences online: it’s fun. Just try them! Some of the sites may recommend a hearing aid, but don’t take that crap.] So… The bottom line is that, somehow, mankind settled on twelve sub-intervals within our musical unit (or our sound unit, I should say). So it is what it is, and the ratio of the frequencies between two successive (semi)tones (e.g. C and C#, or E and F, as E and F are also separated by one half-step only) is 2^(1/12) = 1.059463… Hence, the pitch of each note is about 6% higher than the pitch of the previous note. OK. Next thing.
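To make the 12th root of 2 a bit more tangible, here is a minimal sketch that rebuilds the twelve pitches between C4 and C5 from that ratio alone. It assumes A4 = 440 Hz as the reference pitch, and the names `SEMITONE`, `NOTE_NAMES` and `frequency` are just illustrative choices of mine.

```python
# Equal temperament: every semitone multiplies the frequency by 2**(1/12).
# Assumption: A4 = 440 Hz is taken as the reference pitch.
SEMITONE = 2 ** (1 / 12)
A4 = 440.0

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency(name: str, octave: int) -> float:
    """Frequency in Hz of a note, counting semitone steps away from A4."""
    steps = NOTE_NAMES.index(name) - NOTE_NAMES.index("A") + 12 * (octave - 4)
    return A4 * SEMITONE ** steps


if __name__ == "__main__":
    print(f"semitone ratio: {SEMITONE:.6f}")
    for name in NOTE_NAMES:
        print(f"{name}4: {frequency(name, 4):8.3f} Hz")
    print(f"C5: {frequency('C', 5):8.3f} Hz")
```

Twelve steps of about 6% each compound to exactly a doubling, which is why C5 comes out at twice the frequency of C4.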

3. What’s the similarity between C1, C2, C3 etcetera? Or between A1, A2, A3 etcetera? The answer is: harmonics. The frequency of the first overtone of a string tuned at pitch A3 (i.e. 220 Hz) is equal to the fundamental frequency of a string tuned at pitch A4 (i.e. 440 Hz). Likewise, the frequency of the (pitch of the) C4 note above (which is the so-called middle C) is 261.626 Hz, while the frequency of the (pitch of the) next C note (C5) is twice that frequency: 523.251 Hz. [I should quickly clarify the terminology here: a tone consists of several harmonics, with frequencies f, 2·f, 3·f, …, n·f, … The first harmonic is referred to as the fundamental, with frequency f. The second, third, etc. harmonics are referred to as overtones, with frequency 2·f, 3·f, etc.]

To make a long story short: our ear is able to identify the individual harmonics in a tone, and if the frequency of the first harmonic of one tone (i.e. the fundamental) is the same frequency as the second harmonic of another, then we feel they are separated by one musical unit.

Isn’t that most remarkable? Why would it be that way?

My intuition tells me I should look at the energy of the components. The energy theorem tells us that the total energy in a wave is just the sum of the energies in all of the Fourier components. Surely, the fundamental must carry most of the energy, and then the first overtone, and then the second. Really? Is that so?

Well… I checked online to see if there’s anything on that, but my quick check reveals there’s nothing much out there in terms of research: if you google ‘energy levels of overtones’, you’ll get hundreds of links to research on the vibrational modes of molecules, but nothing that’s related to music theory. So… Well… Perhaps this is my first truly original post! 🙂 Let’s go for it. 🙂

The energy in a wave is proportional to the square of its amplitude, and we must integrate over one period (T) of the oscillation. The illustration below should help you to understand what’s going on. The fundamental mode of the wave is an oscillation with a wavelength (λ₁) that is twice the length of the string (L). For the second mode, the wavelength (λ₂) is just L. For the third mode, we find that λ₃ = (2/3)·L. More in general, the wavelength of the nth mode is λₙ = (2/n)·L.

modes

The illustration above shows that we’re talking sine waves here, differing in their frequency (or wavelength) only. [The speed of the wave (c), as it travels back and forth along the string, is constant, so frequency and wavelength are in that simple relationship: c = f·λ.] Simplifying and normalizing (i.e. choosing the ‘right’ units by multiplying scales with some proportionality constant), the energy of the first mode would be (proportional to):

Integral 1

What about the second and third modes? For the second mode, we have two oscillations per cycle, but we still need to integrate over the period of the first mode T = T₁, which is twice the period of the second mode: T₁ = 2·T₂. Hence, T₂ = (1/2)·T₁. Therefore, the argument of the sine wave (i.e. the x variable in the integral above) should go from 0 to 4π. However, we want to compare the energies of the various modes, so let’s substitute cleverly. We write:

Integral 2

The period of the third mode is equal to T₃ = (1/3)·T₁. Conversely, T₁ = 3·T₃. Hence, the argument of the sine wave should go from 0 to 6π. Again, we’ll substitute cleverly so as to make the energies comparable. We write:

Integral 3

Now that is interesting! For a so-called ideal string, whose motion is the sum of a sinusoidal oscillation at the fundamental frequency f, another at the second harmonic frequency 2·f, another at the third harmonic 3·f, etcetera, we find that the energies of the various modes are proportional to the values in the harmonic series 1, 1/2, 1/3, 1/4, …, 1/n, etcetera. Again, Pythagoras’ conclusion was wrong (the frequencies of individual notes do not respect simple ratios), but his intuition was right: the harmonic series ∑ n⁻¹ (n = 1, 2, …, ∞) is very relevant in describing natural phenomena. It gives us the respective energies of the various natural modes of a vibrating string! In the graph below, the values are represented as areas. It is all quite deep and mysterious really!

602px-Integral_Test

So now we know why we feel C4 and C5 have so much in common that we call them by the same name: C, or Do. It also helps us to understand why the E and A tones have so much in common: the third harmonic of the 110 Hz A2 string corresponds (almost exactly) to the fundamental frequency of the E4 string: 330 Hz versus about 329.6 Hz in equal temperament! Hence, E and A have ‘energy in common’, so to speak, but less ‘energy in common’ than two successive E notes, or two successive A notes, or two successive C notes (like C4 and C5).
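How close is that third harmonic to the E4 we actually tune to? A small sketch, assuming equal temperament with A4 = 440 Hz (so E4 sits five semitones below A4) and measuring the gap in cents, i.e. hundredths of a semitone. The helper name `cents` is mine.

```python
import math


def cents(f_low: float, f_high: float) -> float:
    """Interval from f_low up to f_high in cents (1200 cents per octave)."""
    return 1200 * math.log2(f_high / f_low)


A2 = 110.0
third_harmonic = 3 * A2               # third harmonic of the A2 string
e4_equal = 440.0 * 2 ** (-5 / 12)     # equal-tempered E4, 5 semitones below A4
gap = cents(e4_equal, third_harmonic)

if __name__ == "__main__":
    print(f"third harmonic of A2: {third_harmonic:.3f} Hz")
    print(f"equal-tempered E4:    {e4_equal:.3f} Hz")
    print(f"difference:           {gap:.3f} cents")
```

The gap is just under 2 cents, so the harmonic and the tempered note coincide for all practical purposes, but not exactly.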

[…] Well… Sort of… In fact, the analysis above is quite appealing but (I hate to say it) it’s wrong, as I explain in my post scriptum to this post. It’s like Pythagoras’ number theory of the Universe: the intuition behind it is OK, but the conclusions aren’t quite right. 🙂

Ideality versus reality

We’ve been talking ideal strings. Actual tones coming out of actual strings have a quality, which is determined by the relative amounts of the various harmonics that are present in the tone, which is not some simple sum of sinusoidal functions. Actual tones have a waveform that may resemble something like the waveform I presented in my previous post, when discussing Fourier analysis. Let me insert that illustration once again (and let me also acknowledge its source once more: it’s Wikipedia). The red waveform is the sum of six sine functions, with harmonically related frequencies, but with different amplitudes. Hence, the energy levels of the various modes will not be proportional to the values in that harmonic series ∑ n⁻¹, with n = 1, 2, …, ∞.

Fourier_series_and_transform

Das wohltemperierte Klavier

Nothing in what I wrote above is related to questions of taste like: why do I seldom select a classical music channel on my online radio station? Or why am I not into hip hop, even if my taste for music is quite similar to that of the common crowd (as evidenced by the fact that I like ‘Listeners’ Top’ hit lists)?

Not sure. It’s an unresolved topic, I guess, involving rhythm and other ‘structures’ I did not mention. Indeed, all of the above just tells us a nice story about the structure of the language of music: it’s a story about the tones, and how they are related to each other. That relation is, in essence, an exponential function with base 2. That’s all. Nothing more, nothing less. It’s remarkably simple and, at the same time, endlessly deep. 🙂 But it is not a story about the structure of a musical piece itself, of a pop song by Ellie Goulding, for instance, or one of Bach’s preludes or fugues.

That brings me back to the original question I raised in my previous post. It’s a question which was triggered, a long time ago, when I tried to read Douglas Hofstadter’s Gödel, Escher, Bach, frustrated because my brother seemed to understand it, and I didn’t. So I put it down, and never ever looked at it again. So what is it really about that famous piece of Bach?

Frankly, I’m still not sure. As I mentioned in my previous post, musicians were struggling to find a tuning system that would allow them to easily transpose musical compositions. Transposing music amounts to changing the so-called key of a musical piece, so that’s moving the whole piece up or down in pitch by some constant interval that is not equal to an octave. It’s a piece of cake now. In fact, increasing or decreasing the playback speed of a recording also amounts to transposing a piece: an increase or decrease of the playback speed by 6% will shift the pitch up or down by about one semitone. Why? Well… Go back to what I wrote above about that 12th root of 2. We’ve got the right tuning system now, and so everything is easy. Logarithms are great! 🙂
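The playback-speed remark is just the 12th root of 2 read backwards: the shift in semitones is 12 times the base-2 logarithm of the speed ratio. A minimal sketch, with function names that are purely illustrative:

```python
import math


def semitone_shift(speed_ratio: float) -> float:
    """Pitch shift in semitones caused by playing a recording speed_ratio times faster."""
    return 12 * math.log2(speed_ratio)


def speed_for_semitones(semitones: float) -> float:
    """Playback speed ratio needed to transpose by the given number of semitones."""
    return 2 ** (semitones / 12)


if __name__ == "__main__":
    print(f"6% faster  -> {semitone_shift(1.06):+.3f} semitones")
    print(f"+1 semitone -> speed x {speed_for_semitones(1):.6f}")
    print(f"-1 semitone -> speed x {speed_for_semitones(-1):.6f}")
```

So a 6% speed-up is very slightly more than one semitone, and slowing down by one semitone takes a bit less than a 6% reduction, because the scale is multiplicative rather than additive.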

Back to Bach. Despite their admiration for the Greek ideas around aesthetics (and, most notably, their fascination with harmonic ratios!), (almost) all Renaissance musicians were struggling with the so-called Pythagorean tuning system, which was used until the 18th century and which was based on a correct observation (similar strings, under the same tension but differing in length, sound ‘pleasant’ when sounded together if, and only if, the ratio of the length of the strings is like 1:2, 2:3, 3:4, 3:5, 4:5, etcetera) but a wrong conclusion (the frequencies of musical tones should also obey the same harmonic ratios). Bach’s so-called ‘good’ temperament tuning system was designed such that the piece could, indeed, be played in most keys without sounding… well… out of tune. 🙂

Having said that, the modern ‘equal temperament’ tuning system, which prescribes that tuning should be done such that the notes are in the above-described simple logarithmic relation to each other, had already been invented. So the true question is: why didn’t Bach embrace it? Why did he stick to ratios? Why did it take so long for the right system to be accepted?

I don’t know. If you google, you’ll find a zillion possible explanations. As far as I can see, most are rather mystical. More importantly, most of them do not mention many facts. My explanation is rather simple: while Bach was, obviously, a musical genius, he may not have understood what an exponential, or a logarithm, is all about. Indeed, a quick read of summary biographies reveals that Bach studied a wide range of topics, like Latin and Greek, and theology, of course! But math is not mentioned. He didn’t write about tuning and all that: all of his time went to writing musical masterpieces!

What the biographies do mention is that he always found other people’s tunings unsatisfactory, and that he tuned his harpsichords and clavichords himself. Now that is quite revealing, I’d say! In my view, Bach couldn’t care less about the ratios. He knew something was wrong with the Pythagorean system (or the variants that were then used, which are referred to as meantone temperament) and, as a musical genius, he probably ended up tuning by ear. [For those who’d wonder what I am talking about, let me quickly insert a Wikipedia graph illustrating the difference between the Pythagorean system (and two of these meantone variants) and the equal temperament tuning system in use today.]

Meantone
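For readers who prefer numbers to graphs, here is a rough sketch of how a Pythagorean scale drifts away from equal temperament. It is one possible construction only: I assume the twelve notes are obtained by stacking pure fifths (ratio 3/2) upward from the starting note and folding each back into a single octave, which is not necessarily the exact variant shown in the Wikipedia graph. The function names are hypothetical.

```python
import math
from fractions import Fraction


def cents(ratio) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(float(ratio))


def pythagorean_ratios():
    """Twelve ratios in [1, 2), built by stacking pure fifths upward (an assumption)."""
    ratios = []
    r = Fraction(1)
    for _ in range(12):
        ratios.append(r)
        r *= Fraction(3, 2)
        while r >= 2:          # fold back into the octave
            r /= 2
    return sorted(ratios)


# Twelve pure fifths overshoot seven octaves by the so-called Pythagorean comma.
pythagorean_comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7

if __name__ == "__main__":
    for step, ratio in enumerate(pythagorean_ratios()):
        deviation = cents(ratio) - 100 * step   # equal temperament: 100 cents per step
        print(f"step {step:2d}: {str(ratio):>15}  {deviation:+6.2f} cents vs equal")
    print(f"Pythagorean comma: {pythagorean_comma} = {cents(pythagorean_comma):.2f} cents")
```

The comma (about a quarter of a semitone) is the mismatch that has to be hidden somewhere in any ratio-based tuning; equal temperament spreads it evenly over all twelve fifths.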

So… What’s the point I am trying to make? Well… Frankly, I’d bet Bach’s own tuning was actually equal temperament, and so he should have named his masterpiece Das gleichtemperierte Klavier. Then we wouldn’t have all that ‘noise’ around it. 🙂

Post scriptum: Did you like the argument on the respective energy levels of the harmonics of an ideal string? Too bad. It’s wrong. I made a common mistake: when substituting variables in the integral, I ‘forgot’ to substitute the lower and upper bound of the interval over which I was integrating the function. The calculation below corrects the mistake, and so it does the required substitutions, for the first three modes at least. What’s going on here? Well… Nothing much… I just integrate over the length L taking a snapshot at t = 0 (as mentioned, we can always shift the origin of our independent variable, so here we do it for time and so it’s OK). Hence, the argument of our wave function sin(kx − ωt) reduces to kx, with k = 2π/λ, and λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L for the first, second and third mode respectively. [As for solving the integral of the sine squared, you can google the formula, and please do check my substitutions. They should be OK, but… Well… We never know, do we? :-)]

energy integrals

[…] No… This doesn’t make all that much sense either. Those integrals yield the same energy for all three modes. Something must be wrong: shorter wavelengths (i.e. higher frequencies) are associated with higher energy levels. Full stop. So the ‘solution’ above can’t be right… […] You’re right. That’s where the time aspect comes into play. We were taking a snapshot, indeed, and the mean value of the sine squared function is 1/2 = 0.5, as should be clear from Pythagoras’ theorem: cos²x + sin²x = 1. So what I was doing is like integrating a constant function over the same-length interval. So… Well… Yes: no wonder I get the same value again and again.
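If you’d rather check the snapshot integrals numerically than trust my substitutions, here is a small sketch. It assumes unit amplitude and a string of length L = 1 (both are just convenient choices), and it uses a plain midpoint-rule sum for the integral of sin²(k·x) over [0, L], with k = 2π/λₙ and λₙ = (2/n)·L. The name `snapshot_energy` is made up for the occasion.

```python
import math


def snapshot_energy(n: int, length: float = 1.0, steps: int = 10_000) -> float:
    """Midpoint-rule estimate of the integral of sin^2(k_n * x) over [0, length].

    k_n = 2*pi / lambda_n with lambda_n = 2*length/n, so k_n = n*pi/length.
    """
    k = 2 * math.pi / (2 * length / n)
    dx = length / steps
    return sum(math.sin(k * (i + 0.5) * dx) ** 2 for i in range(steps)) * dx


if __name__ == "__main__":
    for mode in (1, 2, 3):
        print(f"mode {mode}: {snapshot_energy(mode):.6f}")
```

All three modes come out at L/2, which is just the mean value 1/2 of the sine squared multiplied by the length of the string.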

[…]

We need to integrate over the same time interval. You could do that, as an exercise, but there’s a more direct approach to it: the energy of a wave is directly proportional to its frequency, so we write: E ∝ f. If the frequency doubles, triples, quadruples etcetera, then its energy doubles, triples, quadruples etcetera too. But (remember) we’re talking one string only here, with a fixed wave speed c = λ·f, so f = c/λ (read: the frequency is inversely proportional to the wavelength), and, therefore (assuming the same (maximum) amplitude), we get that the energy level of each mode is inversely proportional to the wavelength, so we find that E ∝ 1/λ.

Now, with direct or inverse proportionality relations, we can always invent some new unit that makes the relationship an identity, so let’s do that and turn it into an equation indeed. [And, yes, sorry… I apologize again to your old math teacher: he may not quite agree with the shortcut I am taking here, but he’ll justify the logic behind it.] So… Remembering that λ₁ = 2L, λ₂ = L, λ₃ = (2/3)·L, etcetera, we can then write:

E₁ = (1/2)/L, E₂ = (2/2)/L, E₃ = (3/2)/L, E₄ = (4/2)/L, E₅ = (5/2)/L, …, Eₙ = (n/2)/L, …
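The same list, spelled out as a tiny sketch with exact fractions. It simply takes Eₙ = 1/λₙ with λₙ = (2/n)·L in the made-up units of this post; L = 1 is an arbitrary choice and the function names are mine.

```python
from fractions import Fraction


def wavelength(n: int, length: Fraction) -> Fraction:
    """Wavelength of the nth mode of a string of the given length: (2/n)*L."""
    return Fraction(2, n) * length


def energy(n: int, length: Fraction) -> Fraction:
    """E_n = 1/lambda_n = (n/2)/L, in the post's made-up units."""
    return 1 / wavelength(n, length)


L = Fraction(1)
energies = [energy(n, L) for n in range(1, 6)]
spacings = [b - a for a, b in zip(energies, energies[1:])]

if __name__ == "__main__":
    print("energies:", [str(e) for e in energies])
    print("spacings:", [str(s) for s in spacings])
```

Every spacing equals (1/2)/L, whatever length you plug in.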

That’s a really nice result, because… Well… In quantum theory, the permitted energy levels of a harmonic oscillator are equally spaced, with the interval between them equal to h·f (or, if you use the angular frequency ω = 2π·f to describe the wave, ħ·ω, with ħ = h/2π). So here we’ve got equally spaced energy levels too, with the interval between the various energy levels equal to (1/2)/L.

You’ll say: So what? Frankly, if this doesn’t amaze you, stop reading. But if this doesn’t amaze you, you actually stopped reading a long time ago. 🙂 Look at what we’ve got here. We didn’t specify anything about that string, so we didn’t care about its materials or diameter or tension or how it was made (a wound guitar string is a terribly complicated thing!) or about whatever. Still, we know its fundamental (or normal) modes, and their frequency or nodes or energy or whatever depend on the length of the string only, with the ‘fundamental’ unit of energy being equal to the reciprocal length. Full stop. So all is just a matter of size and proportions. In other words, it’s all about structure. Absolute measurements don’t matter.

You may say: Bull****. What’s the conclusion? You still didn’t tell me anything about how the total energy of the wave is supposed to be distributed over its normal modes!

That’s true. I didn’t. Why? Well… I am not sure, really. I presented a lot of stuff here, but I did not present a clear and unambiguous answer as to how the total energy of a string is distributed over its modes. Not for actual strings, nor for ideal strings. Let me be honest: I don’t know. I really don’t. Having said that, my gut instinct that most of the energy (of, let’s say, a C4 note) should be in the primary mode (i.e. in the fundamental frequency) must be right: otherwise we would not call it a C4 note. So let’s try to make some assumptions. However, before doing so, let’s first briefly touch base with reality.

For actual strings (or actual musical sounds), I suspect the analysis can be quite complicated, as evidenced by the following illustration, which I took from one of the many interesting sites on this topic. Let me quote the author: “A flute is essentially a tube that is open at both ends. Air is blown across one end and sound comes out the other. The harmonics are all whole number multiples of the fundamental frequency (436 Hz, a slightly flat A4 – a bit lower in frequency than is normally acceptable). Note how the second harmonic is nearly as intense as the fundamental. [My = blog writer’s 🙂 italics] This strong second harmonic is part of what makes a flute sound like a flute.”

Hmmm… What I see in the graph is a first overtone (i.e. a second harmonic) that is actually more intense than the fundamental, so what’s that all about? So can we actually associate a specific frequency to that tone? Not sure. :-/ So we’re in trouble already.

flute

If reality doesn’t match our thinking, what about ideality? Hmmm… What to say? As for ideal strings (or ideal flutes 🙂), I’d venture to say that the most obvious distribution of energy over the various modes (or harmonics, when we’re talking sound) would be the Boltzmann distribution.

Huh? Yes. Have a look at one of my posts on statistical mechanics. It’s a weird thing: the distribution of molecular speeds in a gas, or the density of the air in the atmosphere, or whatever involving many particles and/or a great degree of complexity (so many, or such a degree of complexity, that only some kind of statistical approach to the problem works): all that involves Boltzmann’s Law, which basically says the distribution function will be a function of the energy levels involved: f = e^(−energy). So… Well… Yes. It’s the logarithmic scale again. It seems to govern the Universe. 🙂

Huh? Yes. That’s what I think: the distribution of the total energy of the oscillation should be some Boltzmann function, so it should depend on the energy of the modes: most of the energy will be in the lower modes, and most of that in the fundamental. […] Hmmm… It again raises the question: how much exactly?

Well… The Boltzmann distribution strongly resembles the ‘harmonic’ distribution shown above (1, 1/2, 1/3, 1/4 etc), but it’s not quite the same. The graph below shows how they are similar and dissimilar in shape. You can experiment yourself with coefficients and all that, but your conclusion will be the same. As they say in Asia: they are “same-same but different.” 🙂 […] It’s like the ‘good’ and ‘equal’ temperament used when tuning musical instruments: the ‘good’ temperament (which is based on harmonic ratios) is good, but not good enough. Only the ‘equal’ temperament obeys the logarithmic scale and, therefore, is perfect. So, as I mentioned already, while my assumption isn’t quite right (the distribution is not harmonic, in the Pythagorean sense), the intuition behind it is OK. So it’s just like Pythagoras’ number theory of the Universe. Having said that, I’ll leave it to you to draw the correct conclusions from it. 🙂

graph
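If you want to experiment with the coefficients yourself, here is a minimal sketch that puts the two shapes side by side. The assumptions are mine: the energy of mode n is taken to be proportional to n (as in the post scriptum), the Boltzmann weight is e^(−β·n) with a free coefficient β (the hypothetical `beta` parameter), and both distributions are normalized over the first N modes so that they can be compared as shares of the total.

```python
import math


def normalized(weights):
    """Scale a list of positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def harmonic_weights(n_modes: int):
    """Shares proportional to 1, 1/2, 1/3, ..., 1/n_modes."""
    return normalized([1 / n for n in range(1, n_modes + 1)])


def boltzmann_weights(n_modes: int, beta: float = 1.0):
    """Shares proportional to exp(-beta * n); beta is a free, assumed coefficient."""
    return normalized([math.exp(-beta * n) for n in range(1, n_modes + 1)])


if __name__ == "__main__":
    n_modes = 6
    harmonic = harmonic_weights(n_modes)
    boltzmann = boltzmann_weights(n_modes, beta=0.5)
    for n, (h, b) in enumerate(zip(harmonic, boltzmann), start=1):
        print(f"mode {n}: harmonic {h:.3f}   Boltzmann {b:.3f}")
```

Both curves fall off monotonically, but the exponential one drops by a constant factor per mode while the harmonic one has a much fatter tail, which is the ‘same-same but different’ in numbers.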