Immersive Audio: Sound All Around You

Michael Graves

17 years ago

Ok, this is me diving into the deep-end of something that very possibly literally no-one in the world cares about. That’s just so typical of me. Actually, I know that a few people are starting to clue in about this because I’ve heard it come up here and there in conversation, most recently at the Sept 15th HDComms event in NYC.

This post is actually the third in a series. In the first (Pink Floyd: The Making Of Money & Directionality) I took a quick look at pop music recording practices, specifically the practice of recording things “close-mic’d” and then adding ambience through synthetic means. In the second (Codecs, Wideband & Stereo: A Conversation At Amoocon) I followed a conversation in the hallway at AMOOCON 2009, noting aspects of the discussion pertaining to “stereo” or the conveyance of directionality.

Once we get beyond PSTN audio quality, when wideband is accepted as normal, then “dimensional” or “immersive” audio becomes a new frontier for exploration in telephony. In fact, in some limited ways we’re already doing this in larger video conference room & telepresence suites.

While I don’t have deep knowledge of the video conference marketplace, I’m comfortable in saying that no-one to this point is taking dimensional audio seriously. Those earlier two posts really served to highlight two facts:

The entertainment industry in its various forms (music, movies, home theater, live theater, etc.) doesn’t usually try to capture and accurately recreate an original acoustic event.
Most often people use the common “surround sound” technologies for an effect, with no particular basis in reality. In some ways it’s just the newest way to convey “tires-squealing-on-sand” for CSI:Miami.

In the rest of this post I’d like to focus on a conference room setting as this is the most typical facility that can be enhanced by the use of accurate directional/dimensional audio.

Conference rooms are multi-dimensional. It sounds odd but it’s true. They have length, width and height, and they exist in time. That’s four dimensions, all to be considered. Most conference room audio systems resort to simple stereo playback. That is, playback using two speakers, usually one on the left and another on the right.

It’s not really stereo, it’s actually dual mono. Why? Well, the far end pickup of the sound is usually with a single microphone and conveyed as a single signal. It might actually be an array of microphones built into one physical device, but it’s processed to be one signal, as if from one microphone. Polycom and LifeSize do this, so I presume that the various others do as well.

The limiting factor is that they use the array of microphones to intelligently home in on the loudest speaker. They work out who is the current speaker based upon the directionality of the loudest sound source. It’s a relatively simple process at work, not unlike using the balance knob on a stereo to make one side louder than the other. It’s about equally effective at conveying what’s actually going on in the room. It’s simply based on relative loudness. That’s only the crudest aspect of how we hear direction.

There is a question that you need to ask yourself. In considering directional audio for a conference room what are you trying to achieve? If you just want to hear the current speaker and nothing else, then the existing systems do a decent job. Good night and thanks for watching.

If you find something is lacking then perhaps what you really want is to more completely transport the people in the room into the other conference room, from a sonic perspective. Perhaps it’s just as important to know that the VP of R&D is fidgeting and shuffling papers as the CEO gives the pitch? It’s the acoustic equivalent of non-verbal communication. It’s capturing the entirety of the sound of what’s going on at the point of origin and faithfully reproducing that at the other end. In all its dimensions. Yes, all four dimensions.

There is a long-established, well-researched and rational approach to doing this. It’s called Ambisonics and it is the brainchild of a brilliant mathematician, the late Michael Gerzon of Oxford University in the UK.

There’s more to this topic than I can possibly convey as I really don’t have the math to get into the details. If I did I’d bore you out of your mind anyway, so just be happy that I can only give you a rough outline.

The basics are this: with a carefully designed array of microphone capsules you can capture all the information necessary to recreate the sound heard at an exact point in space. The microphone capsules are mounted very close together, as if they were on the surfaces of a regular tetrahedron. When placed in a sound field (i.e. a room with something going on) all four of the mic capsules will “hear” the incident sound, but given that they each point in a different direction, each will pick it up at a different level and, because of the small spacing between the capsules, at a slightly different phase angle.

The four signals, one from each mic capsule, effectively contain all the information required to record & replay the acoustic events at that location with extreme precision. In the language of Ambisonics this set of four signals is called “A Format.” They are the most basic signals possible in a working system.

When summed together they create a signal known as W that is essentially a simple mono representation of the soundfield. When the signals from the various mics are processed in a matrix manner, summing and differencing according to some mathematical guidelines, you can derive directional signals of various sorts.
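
To make the “summing and differencing” a little more concrete, here is a minimal sketch in Python of the textbook sum-and-difference matrix. The capsule names (front-left-up, front-right-down, back-left-down, back-right-up) and the function name are my own assumptions for illustration, and the three directional outputs are conventionally called X, Y and Z. A real microphone also needs frequency-dependent correction filters and calibrated gains, which this sketch ignores entirely.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Combine one sample from each tetrahedral capsule into (W, X, Y, Z).

    Assumed capsule orientations (hypothetical naming):
    flu = front-left-up, frd = front-right-down,
    bld = back-left-down, bru = back-right-up.
    """
    w = flu + frd + bld + bru  # sum of everything: the mono "W" signal
    x = flu + frd - bld - bru  # front capsules minus back capsules
    y = flu - frd + bld - bru  # left capsules minus right capsules
    z = flu - frd - bld + bru  # upward capsules minus downward capsules
    return w, x, y, z
```

As a quick check, `a_to_b_format(1.0, 1.0, 0.0, 0.0)` (sound reaching only the two front capsules) gives a positive front-back signal and zero left-right and up-down signals.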

Here’s an example that’s especially easy to understand. The microphone array has a definite front so its orientation is important, but it’s omnidirectional. It picks up sound from all directions. By tinkering with the relationships between the signals from the various capsules we can make the microphone directional, emphasizing sounds from one direction. This effect can be varied to be subtle or profound, even acoustically “zoomed” like a lens.
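
For the curious, the “tinkering” can be sketched as a weighted blend of the mono signal and the directional signals. This is only an illustration under assumptions of mine: the four signals are called W, X, Y, Z, they are scaled so that a sound from dead ahead gives W equal to X (classic recordings often carry W 3 dB lower), and angles are measured counter-clockwise from the front. The function names are hypothetical.

```python
import math

def direction(azimuth_deg, elevation_deg=0.0):
    """Unit vector for a direction; azimuth is counter-clockwise from front."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el))

def encode_plane_wave(sample, azimuth_deg, elevation_deg=0.0):
    """Idealised W, X, Y, Z for a single distant source (assumed unit-gain W)."""
    ux, uy, uz = direction(azimuth_deg, elevation_deg)
    return sample, sample * ux, sample * uy, sample * uz

def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Derive a directional microphone aimed wherever we like.

    pattern = 1.0 is omnidirectional, 0.5 is cardioid,
    0.0 is figure-of-eight; values in between "zoom" the pickup.
    """
    ux, uy, uz = direction(azimuth_deg, elevation_deg)
    return pattern * w + (1.0 - pattern) * (x * ux + y * uy + z * uz)
```

With a source straight ahead, a cardioid aimed at the front passes it at full level, while the same cardioid aimed at the rear rejects it almost completely.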

We can even synthetically turn the microphone, to “aim” it at a different part of the performance. This kind of steerable directivity gives the ability, if desired, to fly a sound source around the room on playback. More likely we would cut from one acoustic perspective to another, like an editor switching from scene to scene. All of this is a relatively simple signal processing task that, in the case of a recording, can happen as a post-production process long after the live event has passed.
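
Turning the whole scene is about as simple as signal processing gets. As a rough sketch, assuming the directional signals are named X (front-back), Y (left-right) and Z (up-down), a turn about the vertical axis is just a 2-D rotation of X and Y. The function name is hypothetical.

```python
import math

def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate the soundfield counter-clockwise about the vertical axis.

    W (mono) and Z (height) are untouched; only the horizontal
    components X and Y are mixed together.
    """
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z
```

Sweeping `angle_deg` slowly over time would “fly” every source around the listener, while jumping it from one value to another gives the editor-style cut between perspectives.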

The great thing about the single point Ambisonic microphone concept is that it’s well suited to conference room applications. A single, very small, centrally located microphone can pick up a good sized room. If necessary more than one can be used and their outputs integrated to be processed as a single acoustic scene.

Microphones like the Core Sound Tetramic (pictured above right) are designed to extreme specifications. Care is taken to make the capsule array as small as possible so that the microphone is effectively a single point sample of the soundfield over the widest possible range of audio frequencies. The distance between the capsules becomes a source of differential phase or delay as the frequency of sound goes up.
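
A bit of back-of-the-envelope arithmetic shows why the spacing matters more at high frequencies. The sketch below assumes sound travelling straight along the line between two capsules at 343 m/s. The 15 mm spacing in the usage note is a made-up figure for illustration, not a Tetramic specification.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly room temperature

def worst_case_phase_deg(spacing_m, frequency_hz, c=SPEED_OF_SOUND_M_S):
    """Phase difference, in degrees, between two capsules `spacing_m` apart
    for a wave arriving along the line that joins them."""
    delay_s = spacing_m / c               # extra travel time to the far capsule
    return 360.0 * frequency_hz * delay_s  # fraction of a cycle, in degrees
```

For a hypothetical 15 mm spacing, `worst_case_phase_deg(0.015, 1000)` comes out at roughly 16 degrees, while at 10 kHz it is roughly 157 degrees, at which point the capsules can no longer be treated as a single point.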

Happily, for the purposes of video conferencing we don’t need such a tight spec. We’re not capturing the complexity of a symphony orchestra, so we don’t need dead flat frequency response from “DC-to-light.”

The playback end of an Ambisonic system is also something that needs to be carefully considered. Playback is normally via an array of loudspeakers. Yes, an array. Usually more than two, often 6-8 identical speakers.

The physical layout of the speakers is variable. Since the source microphone encodes the entirety of the soundfield at the source site, the signal to each speaker can be tailored to reflect the speakers’ positions at the other end. That is, if a given speaker is not in an ideal location the signal being fed to that channel can be tweaked (phase & delay) to compensate. This is not unlike the directional mixing described earlier when considering post-production signal processing. We can derive the appropriate signal for any given speaker location based upon the actual speaker location relative to the orientation of the soundfield.

Also, given the scope of information encoded in the source signals, any number of channels can be derived from the 4 basic signals. If you have a very large room like a theater you can extract any number of completely separate channels from the base signals. Each speaker receives a unique signal that makes its location fit in the entirety of the playback environment.
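
Here is a minimal sketch of how one feed per speaker might be derived from the four signals. It is the simplest possible “point a virtual microphone at each speaker” approach, not a production decoder: real decoders add things like frequency-dependent shelving and distance compensation. I am assuming the signals are named W, X, Y, Z, with W at the same scale as the directional signals, and the names `speaker_feeds` and `CUBE` are hypothetical.

```python
import math

def speaker_feeds(w, x, y, z, speaker_positions):
    """Return one output sample per speaker.

    speaker_positions: list of (x, y, z) coordinates relative to the
    listener, with +x = front, +y = left, +z = up. Only the direction
    of each speaker is used here; distance is ignored.
    """
    n = len(speaker_positions)
    feeds = []
    for px, py, pz in speaker_positions:
        r = math.sqrt(px * px + py * py + pz * pz)
        ux, uy, uz = px / r, py / r, pz / r  # unit vector toward the speaker
        feeds.append((w + x * ux + y * uy + z * uz) / n)
    return feeds

# Eight speakers on the corners of a cube around the listener.
CUBE = [(sx, sy, sz) for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
```

Feeding in a source located toward the front-left-up corner makes that corner’s speaker the loudest and leaves the diagonally opposite speaker silent. Changing the layout only means changing the list of positions.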

One of the major benefits of Ambisonics is that it supports fully periphonic reproduction. The term “Periphonic” means including height information. The common surround systems used in films are called “planar surround” systems as they only deal with sound along the left-right and front-back axes, that is, in a single horizontal plane. An Ambisonic system can convey the sound of airplanes overhead, not that that’s a factor in a typical conference call. However, to know that a speaking party has stood up could be useful.

(Photo above) A portable Ambisonic playback rig used to demonstrate truly effective periphonic surround sound. The eight small speakers form the corners of a cube. This system was demonstrated by Hugh Pyle at OpenDork in Boston, September 2009.

For our purposes in conferencing it may be less than desirable to transmit all four channels of the basic Ambisonic signal set. We simply may not want to use that much bandwidth. Thankfully, Ambisonics provides a two channel signal format referred to as UHJ encoding that we can use. UHJ encoding was intended to allow a full Ambisonic mix to be conveyed in the more common two-channel (stereo) media like FM radio, LPs, CDs, video & audio cassettes.

In fact, an Ambisonic recording when UHJ encoded can be played back on an ordinary stereo system, with no decoder at all, and it will sound great! The dimensional effects are lost because many of the directional cues are folded back into the plane of the stereo speakers, but the recording…or in our case the conference call…will still sound like high-quality stereo. Simple stereo positional effects will be sustained.
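
For a feel of what the two-channel encoding does, here is a small sketch using the commonly published UHJ coefficients. To keep it short I treat each signal as a complex phasor for a single steady tone, so the 90 degree phase shift in the equations becomes a plain multiplication by `1j`; a real encoder needs wideband phase-shift filters instead. I am assuming the classic convention in which W carries a gain of 1/sqrt(2), and the function names are hypothetical.

```python
import math

def uhj_encode_phasor(w, x, y):
    """Two-channel UHJ encode of horizontal W, X, Y phasors -> (left, right)."""
    s = 0.9396926 * w + 0.1855740 * x                           # "sum" signal
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y   # "difference"
    left = (s + d) / 2.0
    right = (s - d) / 2.0
    return left, right

def horizontal_source(azimuth_deg):
    """W, X, Y for a unit tone at the given azimuth (counter-clockwise
    from front), with W at the classic 1/sqrt(2) gain."""
    az = math.radians(azimuth_deg)
    return 1.0 / math.sqrt(2.0), math.cos(az), math.sin(az)
```

A source straight ahead lands at equal level in both channels, while a source at hard left comes out much louder in the left channel, which is the “simple stereo positional effect” an undecoded listener hears.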

The greatest service that I can provide is to point those still interested in this area to a list of references. There are literally hundreds of papers and articles on Ambisonics dating back to the early 1970s. Here’s a short list of resources that will no doubt lead to countless hours of reading:

Whatever Happened To Ambisonics? – A great primer
Ambisonia
The Ambisonia Wiki
Ambisonics.Net
Wikipedia on Ambisonics
List of Ambisonic publications
University of York Music Technology Group
Surround Sound mailing list at Virginia Tech