Immersive Audio: Sound All Around You
There is a long-established, well-researched and rational approach to doing this. It’s called Ambisonics, and it is the brainchild of a brilliant mathematician, the late Michael Gerzon of Oxford University in the UK.
There’s more to this topic than I can possibly convey as I really don’t have the math to get into the details. If I did I’d bore you out of your mind anyway, so just be happy that I can only give you a rough outline.
The basics are this: with a carefully designed array of microphone capsules you can capture all the information necessary to recreate the sound heard at an exact point in space. The capsules are mounted very close together, as if on the faces of a regular tetrahedron. When placed in a sound field (i.e. a room with something going on) all four of the mic capsules will “hear” the incident sound, but since each points in a different direction, each picks up the sound with a different amplitude and phase.
The four signals, one from each mic capsule, effectively contain all the information required to record & replay the acoustic events at that location with extreme precision. In the language of Ambisonics this set of four signals is called “A Format.” They are the most basic signals possible in a working system.
When summed together they create a signal known as W that is essentially a simple mono representation of the soundfield. When the capsule signals are processed in a matrix fashion, summing and differencing according to some mathematical guidelines, you can derive directional signals of various sorts; W together with the three directional components X, Y and Z is known as “B Format.”
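As a sketch of that matrixing step, here is the commonly published sum-and-difference conversion from the four A-Format capsule signals to B-Format. The capsule labels (FLU, FRD, BLD, BRU for front-left-up and so on) follow one common naming convention, and real microphones also apply per-capsule calibration filters, which are omitted here.

```python
import numpy as np

def a_to_b(flu, frd, bld, bru):
    """Convert A-Format signals from a tetrahedral capsule array to
    first-order B-Format using simple sums and differences.
    (Per-capsule calibration filtering is omitted for clarity.)"""
    w = flu + frd + bld + bru   # omnidirectional (mono) component
    x = flu + frd - bld - bru   # front/back difference
    y = flu - frd + bld - bru   # left/right difference
    z = flu - frd - bld + bru   # up/down difference
    return w, x, y, z

# Toy check: the same signal at all four capsules (no directional
# preference at all) should produce only a W component.
sig = np.array([0.5, -0.25, 1.0])
w, x, y, z = a_to_b(sig, sig, sig, sig)
```

The point is how little processing is involved: the entire A-to-B conversion is addition and subtraction.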
Here’s an example that’s especially easy to understand. The microphone array has a definite front, so its orientation is important, but it’s omnidirectional: it picks up sound from all directions. By tinkering with the relationships between the signals from the various capsules we can make the microphone directional, emphasizing sounds from one direction. This effect can be varied from subtle to profound, even acoustically “zoomed” like a lens.
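A minimal sketch of that tinkering, working from the horizontal B-Format components rather than the raw capsule feeds, and ignoring the sqrt(2) scaling of W that some conventions apply: a virtual first-order microphone aimed at a chosen azimuth is just a weighted sum, and a single parameter p slides the pattern from omni (p = 0) through cardioid (p = 1) to figure-of-eight (p = 2).

```python
import numpy as np

def virtual_mic(w, x, y, azimuth, p=1.0):
    """Point a virtual first-order microphone at `azimuth` (radians).
    p = 0 -> omni, 1 -> cardioid, 2 -> figure-of-eight.
    Assumes W carries unit gain (no sqrt(2) convention)."""
    return 0.5 * ((2.0 - p) * w
                  + p * (np.cos(azimuth) * x + np.sin(azimuth) * y))

# Toy plane wave from azimuth phi: W = s, X = s*cos(phi), Y = s*sin(phi).
s, phi = 1.0, np.pi / 3
w, x, y = s, s * np.cos(phi), s * np.sin(phi)

front = virtual_mic(w, x, y, azimuth=phi)           # aimed at the source
back  = virtual_mic(w, x, y, azimuth=phi + np.pi)   # aimed away from it
```

Aimed at the source, the virtual cardioid passes it at full level; aimed away, it nulls it out entirely, which is the “emphasizing sounds from one direction” described above.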
We can even synthetically turn the microphone, to “aim” it at a different part of the performance. This kind of steerable directivity gives the ability, if desired, to fly a sound source around the room on playback. More likely we would cut from one acoustic perspective to another, like an editor switching from scene to scene. All of this is a relatively simple signal processing task that, in the case of a recording, can happen as a post-production process long after the live event has passed.
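Steering is equally simple in B-Format terms: turning the whole soundfield about the vertical axis is just a 2x2 rotation applied to X and Y, with W and Z passing through untouched. A sketch, using the same toy plane-wave encoding as in the examples above:

```python
import numpy as np

def rotate_z(x, y, theta):
    """Rotate the horizontal soundfield by `theta` radians.
    W and Z are unaffected by a rotation about the vertical axis."""
    xr = np.cos(theta) * x - np.sin(theta) * y
    yr = np.sin(theta) * x + np.cos(theta) * y
    return xr, yr

# A source encoded at azimuth phi should land at phi + theta afterwards.
phi, theta = np.pi / 6, np.pi / 2
x, y = np.cos(phi), np.sin(phi)
xr, yr = rotate_z(x, y, theta)
```

Because the rotation is applied to the recorded B-Format signals, not at the microphone, it can indeed happen in post-production, long after the event.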
The great thing about the single-point Ambisonic microphone concept is that it’s well suited to conference room applications. A single, very small, centrally located microphone can pick up a good-sized room. If necessary, more than one can be used and their outputs integrated to be processed as a single acoustic scene.
Microphones like the Core Sound TetraMic (pictured above right) are designed to extreme specifications. Care is taken to make the capsule array as small as possible so that the microphone is effectively a single-point sample of the soundfield over the widest possible range of audio frequencies. As the frequency of sound goes up, the distance between the capsules becomes a source of differential phase, or delay, between them.
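To put a rough number on that, here is a back-of-envelope sketch. The 25 mm spacing below is an illustrative figure, not the TetraMic’s actual spec: the worst-case extra path length between two capsules converts to a phase offset that grows linearly with frequency.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature

def inter_capsule_phase_deg(spacing_m, freq_hz):
    """Worst-case phase difference (degrees) between two capsules
    separated by `spacing_m`, for a plane wave along that axis."""
    return 360.0 * freq_hz * spacing_m / SPEED_OF_SOUND

spacing = 0.025  # 25 mm -- illustrative value, not a quoted spec
for f in (500, 5_000, 15_000):
    print(f"{f:>6} Hz: {inter_capsule_phase_deg(spacing, f):6.1f} deg")
```

At low frequencies the error is negligible, but well inside the audio band it exceeds a full quarter cycle, which is why the capsule array has to be so small.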
Happily, for the purposes of video conferencing we don’t need such a tight spec. We’re not capturing the complexity of a symphony orchestra, so we don’t need dead flat frequency response from “DC-to-light.”
Reminds me of 3D video conferencing – it’s the next frontier in multimedia communications.
Indeed, there are similarities. The recent IBC show in Amsterdam was highlighted by very real movement towards 3D-HDTV. As that business becomes mature there will be economies of scale that can be leveraged in the video conferencing arena.
The big difference between 3D sound and vision is that there is a lot more prior art in the realm of dimensional audio. Whether via the various failed quad formats from the 60s & 70s, or Ambisonics, the one truly sensible technological approach, dimensional audio can be implemented today. Further, given the current capabilities of software, CPUs and DSPs, it doesn’t need to be costly.
Thanks for your article and for focusing on Ambisonics. I have a hard time understanding why you would argue for making UHJ stereo from the 4ch recording. If you lose the ability to play back spherical audio on the receiving end, then what is the point?
Kind regards,
Jens Toyberg
For the purposes of business conference applications like tele-presence, height may have limited value. Planar surround may be enough to adequately convey the goings-on in a board room.
On this basis, some may prefer to limit the bandwidth requirements by passing only two channels. Existing video conference/tele-presence systems already do this, so passing UHJ-encoded audio is an improvement that can be realized without additional bandwidth burden. Other approaches may elect to pass B Format audio, and so deliver reproduction with height, where bandwidth constraints are not a concern.
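The UHJ down-mix being discussed can be sketched from the published two-channel encoding equations, where j denotes a 90-degree phase shift: S = 0.9397W + 0.1856X, D = j(-0.3420W + 0.5099X) + 0.6555Y, Left = (S + D)/2, Right = (S - D)/2. The FFT-based phase shifter below is an approximation, and its sign convention varies between references; the mono-compatibility property (L + R collapsing back to S) holds either way.

```python
import numpy as np

def shift90(s):
    """Approximate 90-degree phase shift of a real signal via the FFT.
    Sign convention differs between references; hedged accordingly."""
    spectrum = np.fft.rfft(s)
    spectrum[1:] *= -1j            # shift all non-DC bins by 90 degrees
    return np.fft.irfft(spectrum, n=len(s))

def uhj_encode(w, x, y):
    """Two-channel UHJ encode from horizontal B-Format, using the
    published coefficients. Returns (left, right)."""
    S = 0.9397 * w + 0.1856 * x
    D = shift90(-0.3420 * w + 0.5099 * x) + 0.6555 * y
    return (S + D) / 2.0, (S - D) / 2.0

# Mono compatibility: L + R collapses to S, with the phase-shifted
# directional material cancelling out entirely.
rng = np.random.default_rng(0)
w, x, y = rng.standard_normal((3, 1024))
left, right = uhj_encode(w, x, y)
```

This is why the two-channel compromise is attractive here: the result plays as ordinary stereo (and folds down to ordinary mono) on existing gear, while a receiver that knows about UHJ can still recover a usable planar surround image from the same two channels.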