Please recall my prior post referencing Pink Floyd’s production of “Money” from the album “The Dark Side of the Moon.” The point made there was that you have two basic approaches to directionality:
- You can try to accurately capture, convey and reproduce an original acoustic event
- You can impart directionality as an effect, without regard for the original sound source
Both of these approaches can be taken to the nth degree of sophistication.
I think the idea that you can synthesize directional cues is rooted firmly in the decision to treat directionality purely as an effect. If you go that way, you must ask yourself a series of questions:
- What am I really trying to convey during a conference call?
- Do I need to shift focus to the dominant speaker in the room?
- Does that change the sonic mix or image positioning?
- What is the effect of rotating the sonic image when the visual image stays with the video screen that shows the remote sites?
- As I move from one acoustic perspective to another, will I suffer the equivalent of acoustic whiplash?
- How do I overlay the acoustic perspective of multiple sites?
- What is their relative orientation?
Let’s get what some might feel is a rather controversial statement out on the table. “Surround sound” as we commonly find in the entertainment industry, including all the various forms of 5.1 and 7.1 surround configurations, is not an effort to accurately convey anything at all. The acoustic perspectives presented are usually much more dramatic than the actual events would be. It’s all done for effect.
With 5.1 or 7.1 surround you get the effect of something happening beside or behind you, but typically no really rich directional cues. All the dialogue is arbitrarily placed in the front channels so as to be located on-screen with the characters. Most directional information is based upon relative audio levels in the various channels, usually with few temporal cues at all.
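To make the distinction between level cues and temporal cues concrete, here is a minimal sketch in Python. It assumes a simple two-channel constant-power pan law and a very crude interaural delay model (straight-line path difference between two ears 0.18 m apart). The function names, the ear spacing and the pan range are my own illustrative choices, not anything taken from a particular surround format.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 C


def constant_power_pan(pan):
    """Return (left_gain, right_gain) for pan in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, 1.0 is hard right. The gains
    satisfy left**2 + right**2 == 1, so perceived loudness stays
    roughly constant as the source moves. Only levels change; both
    channels carry the signal at exactly the same time.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan to 0..pi/2
    return math.cos(angle), math.sin(angle)


def level_difference_db(left_gain, right_gain):
    """Level difference in dB (positive means right is louder).

    Only meaningful when both gains are non-zero.
    """
    return 20.0 * math.log10(right_gain / left_gain)


def interaural_delay_s(azimuth_deg, ear_spacing_m=0.18):
    """Crude arrival-time difference between the ears for a distant source.

    Assumption: path difference = ear_spacing * sin(azimuth). This ignores
    the head itself and is only meant to show the order of magnitude of
    the temporal cue that a pure level panner never produces.
    """
    return ear_spacing_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND_M_S


if __name__ == "__main__":
    left, right = constant_power_pan(0.5)  # halfway to the right
    print(f"gains: L={left:.3f} R={right:.3f}")
    print(f"level cue: {level_difference_db(left, right):.1f} dB")
    print(f"time cue a real source at 45 degrees would add: "
          f"{interaural_delay_s(45.0) * 1e6:.0f} microseconds")
```

The panner moves the image by changing gains only. The delay function shows the sort of sub-millisecond timing difference that a real source off to one side would also produce, and that level-based mixing leaves out.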
Surround sound systems, and specifically home theater systems, are designed around certain practical realities. There’s a limit to what can be sensibly implemented in a theater or your living room, and every living room is a different acoustic environment.
Nonetheless, in the case of TV and movies the surround sound effect is often very good. I suspect this is because there’s a lot going on in most scenes. With a very rich sound environment the ear can get by with simplistic directional cues. When there’s less activity in the scene, your attention is more focused on the dialogue, so having it on-screen seems natural.
None of the common surround sound systems even takes the vertical plane into consideration. That is, they don’t convey height at all. The curious 22.2 channel surround system proposed by NHK for their new Super High-Definition TV (four times 1080p in each dimension!) is the first to make that effort. Even then, they make little effort to convey vertical information in the surround channels, offering it only in the frontal hemisphere.
The very fact that there are so many different, incompatible surround sound schemes tends to suggest that none of them take a comprehensive, scalable approach. It’s a bit like the old days of quadraphonic audio.
My suspicion is that, given the two primary approaches, we’ll be better off taking the stricter path: trying to accurately encode the directional cues inherent in the source event, and recreating them at the remote ends.
This approach is uniquely and elegantly embodied in an existing technology known as “Ambisonics.” That will be a topic for another time.
For the moment I will close with words of wisdom from Ole Johansson:
“In Sweden the average household broadband connection is 10 MBps both ways so there’s no reason to compress. It’s time to go wideband. It’s time to go stereo…5.1 surround. Experiment with a new kind of communication experience instead of trying to emulate the PSTN.”
“…the technology has to be advanced anyway. You have to do something beyond the traditional PSTN network. Because, if you don’t, what’s the point of VOIP?”
“…maybe wideband in itself, or stereo won’t pay for itself, but it will certainly create data traffic…and someone is paying for that data traffic.”
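As a rough check on the “no reason to compress” remark, here is a back-of-the-envelope calculation in Python. It assumes the quoted figure means 10 Mbit/s in each direction, that audio is sent as plain uncompressed PCM, and that packet overhead is ignored. The sample rates and bit depths are illustrative choices of mine, not numbers from the quote.

```python
LINK_KBIT_S = 10_000  # assumption: "10 MBps" read as 10 Mbit/s each way


def pcm_bitrate_kbit_s(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM payload rate in kbit/s, ignoring packet overhead."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# (sample rate in Hz, bits per sample, channel count); illustrative only
FORMATS = {
    "narrowband mono, PSTN-style (8 kHz, 8-bit)": (8_000, 8, 1),
    "wideband mono (16 kHz, 16-bit)": (16_000, 16, 1),
    "stereo (48 kHz, 16-bit)": (48_000, 16, 2),
    "5.1 surround (48 kHz, 16-bit, 6 channels)": (48_000, 16, 6),
}

if __name__ == "__main__":
    for name, params in FORMATS.items():
        rate = pcm_bitrate_kbit_s(*params)
        share = 100 * rate / LINK_KBIT_S
        print(f"{name}: {rate:.0f} kbit/s ({share:.1f}% of the link)")
```

Under these assumptions even six channels of uncompressed 48 kHz audio come to 4608 kbit/s, under half of such a link, while a PSTN-style narrowband channel is 64 kbit/s.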