Visualizing Music

Over in my guest book topic, Judah posed a question. I started composing a response but it got too long, so I've posted my stab at an answer here.

Hey, Virge, question for you: I was just listening to music and watching the visualizations on Windows Media Player, and they really seemed to flow with the music and mean something. It seemed like a strange thing that these randomly generated visualizations should have such a close bond with the music. I know that some of the input for the program comes from the music itself, but even so, how smart can the algorithm be? It doesn't know what's going on in my mind.


Is music able to convey relatively complex things through simple cues (volume, pitch, changes in pitch), and can those same cues be reflected in a visualization to the same effect? Or do the music and visualization convey only relatively simple things, which in turn are the cues for my mind to amplify on? Or is the whole thing a projection: what I hear in the music, I project onto the visualization, which in reality does not reflect the music except in a very simple way? What do you think?

As far as I can see, most of the Media Player visualizations are based on a combination of 5 elements:

  1. real-time spectrum analysis of the audio--working out how much sound volume is present at each frequency, from the deepest bass to the highest sizzles,
  2. a geometric mapping--mapping the frequency and volume parts of the audio spectrum to different positions on the display and/or to different shapes or colours on the display,
  3. a time-based filtering algorithm so that a short-duration change in the spectrum (e.g., the blip in the spectrum due to a single drum hit) will produce a change in the image that morphs and fades over a few seconds,
  4. some slow time-based input independent of the audio, e.g. a steady progression of the colors through parts of the palette, or a steady rotation of the whole picture,
  5. some pseudo-random input to introduce variety of images even if the audio is repetitive.
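To make the list a little more concrete, here is a minimal Python sketch of how the five elements might fit together. Everything in it is an assumption for illustration: the names `band_levels` and `Visualizer`, the naive DFT standing in for a real FFT, the equal-width bands and the decay constant are all hypothetical, and this is not how Media Player's visualizers are actually written.

```python
import cmath
import math
import random


def band_levels(frame, num_bands):
    """Element 1 (hypothetical sketch): crude spectrum analysis.

    Runs a naive DFT over one frame of samples and returns the average
    magnitude in num_bands equal-width bands from 0 Hz up to Nyquist.
    Real visualizers would use an FFT and usually log-spaced bands.
    """
    n = len(frame)
    half = n // 2
    width = half // num_bands
    if width == 0:
        raise ValueError("frame too short for that many bands")
    mags = []
    for k in range(half):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s) / n)
    return [sum(mags[b * width:(b + 1) * width]) / width for b in range(num_bands)]


class Visualizer:
    """Hypothetical visualizer combining elements 1 to 5."""

    def __init__(self, num_bands=8, decay=0.8, seed=0):
        self.num_bands = num_bands
        self.decay = decay                  # element 3: fraction of old level kept per frame
        self.levels = [0.0] * num_bands
        self.hue = 0.0                      # element 4: drift independent of the audio
        self.rng = random.Random(seed)      # element 5: pseudo-random variety

    def step(self, frame):
        new = band_levels(frame, self.num_bands)
        # Element 3: jump up immediately on a new peak, otherwise fade gradually.
        self.levels = [max(lv, self.decay * old) for lv, old in zip(new, self.levels)]
        # Element 4: rotate the palette a little every frame, whatever the audio does.
        self.hue = (self.hue + 1.0) % 360.0
        # Element 2: map each band to a bar (x position, height, hue in degrees).
        bars = []
        for b, level in enumerate(self.levels):
            jitter = self.rng.uniform(-5.0, 5.0)  # element 5
            bars.append((b, level, (self.hue + 30.0 * b + jitter) % 360.0))
        return bars


# A 64-sample frame holding a pure tone (a stand-in for a drum hit), then silence.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
viz = Visualizer(decay=0.5)
print(viz.step(tone)[2][1])         # band 2 lights up
print(viz.step([0.0] * 64)[2][1])   # ...and fades by half on the next silent frame
```

Only `band_levels` ever looks at the audio; the decay, the hue drift and the jitter all shape *how* that raw spectrum is drawn, which is why only element 1 can carry anything of the music itself.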

Of those elements, only #1 has any ability to connect the mood or feel of music to a visual effect. The spectrum is a very, very raw measure of the instantaneous timbre of the combined instruments.

This question reminds me of something I read recently (probably by Daniel Dennett, since I read two of his books over the last couple of months) about the problems of reductionism: we may eventually be able to completely explain in physical terms all the parts of the experience of music, but such a description will never be satisfying. Understanding each of the parts doesn't lead to an understanding of the whole. There are so many layers* in which a description of micro-behavior of one level depends on the macro-behavior of the underlying level, that our understanding of music as an experience cannot be usefully described in the language of acoustics. The information presented in a real-time updating audio spectrum is a very "mechanical" description of the music. What we experience when we listen to music is overwhelmingly determined by what we already have stored in our heads from past listening experiences. As you say, music is "able to convey relatively complex things through simple cues" and this is because we have a rich set of conditioned reactions to the language of music that we've absorbed as part of our culture.

In our westernized culture we associate certain sounds with certain emotions, e.g., a kazoo or tin whistle playing in a major key = playful; a violin playing in a minor key = wistful (even if you keep the same rhythm and tempo). The difference in tonality between major and minor makes a huge difference to the mood conveyed by a piece of music, but it will be practically invisible in a spectrum-based visualization. It is possible to analyze the tonal and harmonic content of music automatically, and a visualization based on that analysis might be able to run in real time on a current PC, but I'm pretty sure none of the Media Player visualizers do so.
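A quick way to see why major versus minor barely registers on a spectrum display: a C major and a C minor triad differ in just one note (E versus E-flat), about 18.5 Hz apart. The Python sketch below assumes a hypothetical display with octave-wide bands starting at 31.25 Hz (an arbitrary choice, not any real visualizer's layout) and equal-tempered tuning with A4 = 440 Hz. A finer-grained analyser would separate the two peaks, but the picture would still differ by only one slightly shifted spike, with nothing that signals "major" or "minor" as such.

```python
import math

A4 = 440.0  # assumed concert pitch


def note_freq(semitones_from_a4):
    """Equal-tempered frequency a given number of semitones from A4."""
    return A4 * 2 ** (semitones_from_a4 / 12)


def octave_band(freq, lowest=31.25):
    """Index i of the hypothetical band [lowest * 2**i, lowest * 2**(i+1)) holding freq."""
    return int(math.floor(math.log2(freq / lowest)))


# C4 is 9 semitones below A4, E4 is 5 below, E-flat4 is 6 below, G4 is 2 below.
c_major = [note_freq(s) for s in (-9, -5, -2)]
c_minor = [note_freq(s) for s in (-9, -6, -2)]

major_bands = [octave_band(f) for f in c_major]
minor_bands = [octave_band(f) for f in c_minor]
print(major_bands, minor_bands)  # [3, 3, 3] [3, 3, 3]: every note lands in one band
print(round(c_major[1] - c_minor[1], 1))  # the only difference, in Hz
```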

I think you're on the right track with the idea of projection, but it's not the whole story. Two other things are also contributing to the sensation:

  1. The people who design and select visualizations are actively choosing the ones that are interesting and pleasing representations of the music. Of all the possible ways to convert spectrum information to a picture, the only ones you get to see in the end product are ones that pleased the software geeks, and the marketers, and the customer focus groups, etc. So if there are certain mappings from audio spectrum -> color/shape/movement -> human image recognition -> association/memories -> emotions that do happen to correlate with the listening experience in our culture, then these will be selected for in the product development and marketing process.
  2. You are continually learning, whether you realize it or not. When you listen to a piece of music and have an attractive pattern projected to your eyes, it doesn't take long for your brain to start recognizing patterns (subconsciously) and building associations between those visual and audio patterns. If the visualization were mostly random, the associations wouldn't develop (except by accident). Because the visualizations are based primarily on consistent algorithmic manipulations of the audio information, the patterns are consistent and your brain learns them. Even if you go to a new visualization with different transformations, there is enough commonality in the way they process audio spectra for your brain to adapt, re-use what it has learned, and condition how you react to what you see.

* Here's my over-simplified map of the layers of knowledge required: mechanical vibrations -> sound transmission -> the ear as an auto-tuning sound transducer -> neural firings -> networked interconnections that have learned to "recognize" sounds/rhythms -> combinations of recognitions -> memory associations -> emotional states.


Thank you for that insightful response, Virge.

I don't agree with your implication that our associations with things like major/minor keys are entirely an artifact of Western culture. At best, I think that's begging the question: if we associate minor with sad because our culture does, why does our culture make that association? It has to start somewhere. But that's a different discussion.

It's an intriguing discussion too. Where does an arbitrary cultural convention start? The best explanations I've seen are in terms of memes.

If there were some reason or function behind our Western major=happy/minor=sad convention, then we'd expect to see it exhibited in other cultures. Take a listen to the scales employed in Indian or Middle Eastern music, and you'll begin to appreciate how much our Western ears are culturally tuned.

There are many aspects to harmony that have more than just cultural roots. The use of perfect fifths, octaves and unisons in medieval music was not arbitrary. When notes separated by these exact intervals are played or sung, they sound like they belong together because there is a simple mathematical relationship between their frequencies (unison = frequency ratio 1:1, perfect fifth = 3:2, octave = 2:1), and they produce no evident beats.
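The "no evident beats" point can be checked with a little arithmetic. Assuming each note has harmonic partials (integer multiples of its fundamental, which holds well for voices and most bowed or blown instruments), two notes in an exact p:q ratio share a harmonic, so nothing beats; nudge one note slightly off that ratio and the two nearly-matching harmonics beat at their difference frequency. In this Python sketch the function name `partial_beat` and the choice of A3 = 220 Hz are illustrative assumptions; the equal-tempered fifth, 2^(7/12), is the slightly-narrow fifth of a modern keyboard.

```python
def partial_beat(f_low, f_high, p, q):
    """Beat rate in Hz between harmonic p of the lower note and harmonic q
    of the upper note. These harmonics coincide exactly (no beating)
    when f_high / f_low == p / q."""
    return abs(p * f_low - q * f_high)


A3 = 220.0  # illustrative fundamental

# Exact ratios: the matching harmonics line up, so there is nothing to beat.
print(partial_beat(A3, A3, 1, 1))          # unison 1:1
print(partial_beat(A3, 2 * A3, 2, 1))      # octave 2:1
print(partial_beat(A3, 1.5 * A3, 3, 2))    # perfect fifth 3:2

# Slightly off the exact ratio: the nearly-matching harmonics beat slowly.
print(partial_beat(A3, 222.0, 1, 1))                # unison 2 Hz out of tune
print(partial_beat(A3, A3 * 2 ** (7 / 12), 3, 2))   # equal-tempered fifth, under 1 Hz
```

A beat rate under about one per second is gentle enough to go unnoticed in most music, which fits the description of these intervals as sounding like they belong together.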

Moving further up the harmonic series to intervals of a major third (approx. 5:4) and minor third (approx. 6:5), we can see why these might sound good to the ear (because of their mathematical relationship) but not as consonant as unisons, fifths and octaves. The difference tones make them sound harsher.
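One simple way to put a number on "further up the harmonic series" is Tenney height, log2(p*q) for a ratio p:q in lowest terms. It is just one of several proposed proxies for how complex an interval is, ignores register and timbre, and is my choice here rather than anything the argument depends on. The Python sketch below ranks the intervals mentioned in this discussion (including the major seventh, taken as 15:8) and also prints each interval's size in cents (1200 * log2 of the ratio).

```python
import math

# Just-intonation ratios (upper:lower) for intervals mentioned in the discussion.
INTERVALS = {
    "unison": (1, 1),
    "octave": (2, 1),
    "perfect fifth": (3, 2),
    "major third": (5, 4),
    "minor third": (6, 5),
    "major seventh": (15, 8),
}


def tenney_height(p, q):
    """log2(p*q): smaller means the two notes share more low harmonics."""
    return math.log2(p * q)


def cents(ratio):
    """Size of a frequency ratio in cents (100 cents = one equal-tempered semitone)."""
    return 1200 * math.log2(ratio)


ranked = sorted(INTERVALS, key=lambda name: tenney_height(*INTERVALS[name]))
for name in ranked:
    p, q = INTERVALS[name]
    print(f"{name:14s} {p}:{q}  height={tenney_height(p, q):.2f}  {cents(p / q):8.2f} cents")
```

The ordering comes out unison, octave, fifth, major third, minor third, major seventh, matching the consonance ranking in the text. On an equal-tempered keyboard the major third is 400 cents and the minor third 300, roughly 14 and 16 cents away from the just values of about 386 and 316, which is why "approx." is the right word for those ratios.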

How good notes sound together can be traced back (mostly) to the mathematics of superposition of vibrations, but I think the connection between these recognizable harmonies and emotions is best explained as purely cultural, and it probably had a coincidental start. Once a particular musical mode becomes, to some small extent, associated with an emotion, it is a convenient communication tool for a singer to use to help convey that emotion. This creates a feedback loop that locks in the association.

I think musical modes emerge in similar ways to idiomatic expressions in language. Take an expression like "the bee's knees". Why has this come to be associated with something highly admired? There is no intrinsic meaning of the individual words or their combination that would connect them to the culturally accepted meaning. The origin is more or less accidental. Someone coins it. Others use it (because it's fun/clever, not because of any logic). It either locks in or it fades depending on many factors, including luck.

Well, I've heard a bit of Arabic music, and I have to agree with you that culture does play a role in our musical aesthetic. On the other hand, when you infer that the relation between harmony and emotion is purely cultural, you pass over the possibility that there are many scales/harmonies that are not mutually exclusive. Different cultures have latched onto one or the other, but that doesn't mean that they create the emotional associations out of whole cloth.

My main reason for believing that our associations with music are something intrinsic is empirical: from my own experience, I don't think that I've had nearly enough cultural saturation to hammer these associations into my mind from nothing. I don't think that I started with a blank slate and then heard enough minor-key songs with external cues telling me "sad" that I finally just assimilated that notion into my consciousness without even realizing that I had done so. Heck, I've heard songs whose words did not fit the melody in the least, so I'm not even receiving one homogeneous impression from the "culture" around me.

Your metaphor of "the bee's knees" only goes part-way. Sure, I know what "the bee's knees" means. It's part of my idiom; I might even use it. But my knowledge of it is clearly something assimilated. I know why I know what it means. In my brain, "the meaning of 'the bee's knees'" is filed under externally acquired knowledge. My associations with music are far more visceral. I'm not prepared to ascribe that (fully) to some sort of unconscious cultural indoctrination. I'm ready to meet you half-way, though, and agree that the cultural feedback loop plays a role in reinforcing these associations.

Don't trust introspection. You are the person who knows the most about how you feel and what you know, but that doesn't make introspection an infallible guide to how you know and why you feel. It was personal experience and introspection that convinced our ancestors that the human heart was the centre of emotion. (We look back now and realise how easy it would have been to make that mistake. Emotions trigger physiological changes in the whole body, and a person's heart rate is an easily detectable symptom.)

Just because you don't notice how you're assimilating the cultural connotations of music doesn't mean they aren't seeping in: in lullabies your parents sang, in the songs you learned in pre-school, in every radio and TV advert, in the matching of lyrics to tonality within songs, and in movie scores (which most people don't even notice while they concentrate on the attention-grabbing aspects of the movie). Music has a privileged path to your emotions. It doesn't need to be consciously parsed for meaning, so it usually escapes notice unless you've trained yourself to pay attention to it, analyze it and work out what it is about particular music that makes you feel as you do. An education in music sufficient to do this is difficult to acquire before one's late teens, so most of one's musical culture is already in place before one can hope to understand how it got there.

That said, I'm prepared to concede (as I did above in the discussion of perfect intervals vs major and minor thirds) that some parts of harmonic structure are tied to emotions in a pre-wired fashion. There are intervals (like a major 7th) that grate when presented in the raw, without any other sounds, and would probably grate on any person from any culture unless they had been counter-programmed to associate that sound with something very desirable. (Even so, the major seventh can be, and has been, featured in very easy-listening songs throughout the 20th century--Burt Bacharach springs to mind. We totally ignore the dissonance of the interval. In earlier centuries that sound would have meant chaos, strife, anger. Prior to the 15th century it would have been held to be demonic.) Since there are differences in the "smoothness" or "niceness" of harmonic intervals that are physically measurable and correlated with people's perceptions, I'll grant that there could be some natural bias in the interpretation of modes like major and minor. But, having seen the way music has changed over the centuries and how our perceptions of what "sounds right" and what "sounds wrong" have changed, I'm going to give most of the credit to culture.