Over in my guest book topic, Judah posed a question. I started composing a response but it got too long, so I've posted my stab at an answer here.
Hey, Virge, question for you: I was just listening to music and watching the visualizations on Windows Media Player, and they really seemed to flow with the music and mean something. It seemed like a strange thing that these randomly generated visualizations should have such a close bond with the music. I know that some of the input for the program comes from the music itself, but even so, how smart can the algorithm be? It doesn't know what's going on in my mind.
Is music able to convey relatively complex things through simple cues (volume, pitch, changes in pitch), and can those same cues be reflected in a visualization to the same effect? Or do the music and visualization convey only relatively simple things, which in turn are the cues for my mind to amplify on? Or is the whole thing a projection: what I hear in the music, I project onto the visualization, which in reality does not reflect the music except in a very simple way? What do you think?
As far as I can see, most of the Media Player visualizations are based on a combination of 5 elements:
1. real-time spectrum analysis of the audio--working out how much sound volume is present at each frequency, from the deepest bass to the highest sizzles,
2. a geometric mapping--mapping the frequency and volume parts of the audio spectrum to different positions on the display and/or to different shapes or colors on the display,
3. a time-based filtering algorithm, so that a short-duration change in the spectrum (e.g., the blip due to a single drum hit) produces a change in the image that morphs and fades over a few seconds,
4. some slow time-based input independent of the audio, e.g., a steady progression of the colors through parts of the palette, or a steady rotation of the whole picture,
5. some pseudo-random input to introduce variety into the images even when the audio is repetitive.
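To make the first three elements concrete, here's a toy sketch in Python with NumPy. This is purely my own illustration, not Media Player's actual code; all the function names and the decay constant are made up. It computes the spectrum of one block of audio, maps the frequency bins onto a row of display bars, and fades the display over successive frames so a transient lingers instead of vanishing:

```python
import numpy as np

def spectrum(block):
    """Element 1: how much volume is present at each frequency
    for one short block of audio samples."""
    windowed = block * np.hanning(len(block))   # taper the edges to reduce spectral leakage
    return np.abs(np.fft.rfft(windowed))        # magnitude per frequency bin

def smooth(prev_display, new_mags, decay=0.85):
    """Element 3: a drum hit jumps the display up instantly,
    then it fades over several frames rather than disappearing
    on the very next block."""
    return np.maximum(new_mags, prev_display * decay)

def to_bars(mags, n_bars=16):
    """Element 2: map many frequency bins onto a few screen
    positions (leftmost bar = deepest bass, rightmost = sizzles)."""
    return np.array([c.mean() for c in np.array_split(mags, n_bars)])

# Demo: a 440 Hz tone burst followed by silence. The bin containing
# 440 Hz lights up, then decays frame by frame during the silence.
sr, n = 44100, 1024
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 440 * t)

display = np.zeros(n // 2 + 1)
display = smooth(display, spectrum(tone))
peak_bin = int(np.argmax(display))
initial_peak = display[peak_bin]

for _ in range(3):                              # three silent frames
    display = smooth(display, spectrum(np.zeros(n)))
bars = to_bars(display)
```

Notice that everything here operates on instantaneous volume-per-frequency; nothing in this pipeline knows anything about melody, harmony, or mood.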
Of those elements, only #1 has any ability to connect the mood or feel of music to a visual effect. The spectrum is a very, very raw measure of the instantaneous timbre of the combined instruments.
This question reminds me of something I read recently (probably by Daniel Dennett, since I've read two of his books over the last couple of months) about the problems of reductionism: we may eventually be able to explain every part of the experience of music in purely physical terms, but such a description will never be satisfying. Understanding each of the parts doesn't add up to an understanding of the whole. There are so many layers* in which the micro-behavior of one level depends on the macro-behavior of the level beneath it that our experience of music cannot be usefully described in the language of acoustics.

The information presented in a real-time updating audio spectrum is a very "mechanical" description of the music. What we experience when we listen to music is overwhelmingly determined by what we already have stored in our heads from past listening experiences. As you say, music is "able to convey relatively complex things through simple cues", and that's because we carry a rich set of conditioned reactions to the language of music, absorbed as part of our culture.
In our westernized culture we associate certain sounds with certain emotions, e.g., a kazoo or tin whistle playing in a major key = playful; a violin playing in a minor key = wistful (even if you keep the same rhythm and tempo). The difference in tonality between major and minor makes a huge difference to the mood conveyed by a piece of music, yet it is completely undetectable in a spectrum-based visualization. It is possible to analyze the tonal and harmonic content of music automatically, and a visualization based on such analysis could probably run in real time on a current PC, but I'm pretty sure none of the Media Player visualizers do so.
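To show the kind of analysis I mean (again, my own toy sketch, not anything Media Player does): fold the note strengths of a passage into the 12 pitch classes, then see whether they line up better with a major or a minor key template. The template numbers below are the published Krumhansl-Kessler probe-tone profiles; the function name is my own invention:

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles: how strongly each of the 12
# pitch classes "belongs" in a major key vs. a minor key.
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def guess_mode(chroma):
    """Correlate a 12-bin pitch-class histogram against both profiles,
    trying all 12 rotations (we don't know the key in advance)."""
    best = {"major": -2.0, "minor": -2.0}
    for shift in range(12):
        rotated = np.roll(chroma, -shift)
        for name, profile in (("major", MAJOR), ("minor", MINOR)):
            r = np.corrcoef(rotated, profile)[0, 1]
            best[name] = max(best[name], r)
    return "major" if best["major"] >= best["minor"] else "minor"

# Two chords with identical overall loudness: a C-major triad (C, E, G)
# and a C-minor triad (C, Eb, G). A volume spectrum barely tells them
# apart; a pitch-class correlation does.
c_major = np.zeros(12); c_major[[0, 4, 7]] = 1.0
c_minor = np.zeros(12); c_minor[[0, 3, 7]] = 1.0
```

The two triads differ by a single semitone in one note, which is exactly the distinction an instantaneous volume spectrum throws away and a tonal analysis recovers.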
I think you're on the right track with the idea of projection, but it's not the whole story. Two other things are also contributing to the sensation:
- The people who design and select visualizations are actively choosing the ones that are interesting and pleasing representations of the music. Of all the possible ways to convert spectrum information to a picture, the only ones you get to see in the end product are ones that pleased the software geeks, and the marketers, and the customer focus groups, etc. So if there are certain mappings from audio spectrum -> color/shape/movement -> human image recognition -> association/memories -> emotions that do happen to correlate with the listening experience in our culture, then these will be selected for in the product development and marketing process.
- You are continually learning, whether you realize it or not. When you listen to a piece of music with an attractive pattern projected before your eyes, it doesn't take long for your brain to start recognizing patterns (subconsciously) and building associations between the visual and audio patterns. If the visualization were mostly random, the associations wouldn't develop (except by accident). Because the visualizations are based primarily on consistent algorithmic manipulations of the audio information, the patterns are consistent and your brain learns them. Even if you switch to a new visualization with different transformations, there is enough commonality in the way they process audio spectra for your brain to adapt, re-use what it has learned, and condition how you react to what you see.
* Here's my over-simplified map of the layers of knowledge required: mechanical vibrations -> sound transmission -> the ear as an auto-tuning sound transducer -> neural firings -> networked interconnections that have learned to "recognize" sounds/rhythms -> combinations of recognitions -> memory associations -> emotional states.