Immersive Audio: Sound All Around You

Ok, this is me diving into the deep end of something that very possibly literally no-one in the world cares about. That’s just so typical of me. Actually, I know that a few people are starting to clue in about this because I’ve heard it come up here and there in conversation, most recently at the Sept 15th HDComms event in NYC.

This post is actually the third in a series. In the first (Pink Floyd: The Making Of Money & Directionality) I took a quick look at pop music recording practices, specifically the practice of recording things “close-mic’d” then adding ambience through synthetic means. In the second (Codecs, Wideband & Stereo: A Conversation At Amoocon) I followed a conversation in the hallway at AMOOCON 2009, noting aspects of the discussion pertaining to “stereo” or the conveyance of directionality.

Once we get beyond PSTN audio quality, when wideband is accepted as normal, then “dimensional” or “immersive” audio becomes a new frontier for exploration in telephony. In fact, in some limited ways we’re already doing this in larger video conference room & telepresence suites.

While I don’t have deep knowledge of the video conference marketplace, I’m comfortable in saying that no-one to this point is taking dimensional audio seriously. Those earlier two posts really served to highlight two facts:

  1. The entertainment industry in its various forms (music, movies, home theater, live theater, etc.) doesn’t usually try to capture and accurately recreate an original acoustic event.
  2. Most often people use the common “surround sound” technologies for an effect, with no particular basis in reality. In some ways it’s just the newest way to convey “tires-squealing-on-sand” for CSI:Miami.

In the rest of this post I’d like to focus on a conference room setting as this is the most typical facility that can be enhanced by the use of accurate directional/dimensional audio.

Conference rooms are multi-dimensional. It sounds odd but it’s true. They have length, width, and height, and they exist in time. That’s four dimensions, all to be considered. Most conference room audio systems resort to simple stereo playback. That is, playback using two speakers, usually one on the left and another on the right.

It’s not really stereo, it’s actually dual mono. Why? Well, the far end pickup of the sound is usually with a single microphone and conveyed as a single signal. It might actually be an array of microphones built into one physical device, but it’s processed to be one signal, as if from one microphone. Polycom and LifeSize do this, so I presume that the various others do as well.

Conference-phones

The limiting factor is that they use the array of microphones to intelligently home in on the loudest speaker. They work out who is the current speaker based upon the directionality of the loudest sound source. It’s a relatively simple process at work, not unlike using the balance knob on a stereo to make one side louder than the other. It’s about equally effective at conveying what’s actually going on in the room. It’s simply based on relative loudness. That’s only the most crude aspect of how we hear direction.
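To make the balance knob comparison concrete, here is a minimal Python sketch of loudness-only panning. It is my own illustration, not code from any Polycom or LifeSize product, and the function names (pan_gains, pan_mono) are made up for this example. It uses the common constant-power pan law: a single mono signal is copied to both speakers, and only the relative gain changes.

```python
import math

def pan_gains(position):
    """Return (left_gain, right_gain) for a constant-power pan.

    position runs from -1.0 (hard left) through 0.0 (center)
    to +1.0 (hard right). The name and range are assumptions
    made for this illustration.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    # Map the position onto a quarter circle: 0 .. pi/2 radians.
    angle = (position + 1.0) * math.pi / 4.0
    # cos^2 + sin^2 = 1, so total power stays constant as we pan.
    return math.cos(angle), math.sin(angle)

def pan_mono(samples, position):
    """Turn a list of mono samples into (left, right) pairs."""
    left_gain, right_gain = pan_gains(position)
    return [(s * left_gain, s * right_gain) for s in samples]

# Example: place a talker halfway toward the left speaker.
stereo = pan_mono([0.5, -0.25, 0.1], -0.5)
```

Note what is missing: both channels carry exactly the same waveform, so there are no arrival-time differences and no room ambience. Level difference is the only directional cue.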

There is a question that you need to ask yourself. In considering directional audio for a conference room what are you trying to achieve? If you just want to hear the current speaker and nothing else, then the existing systems do a decent job. Good night and thanks for watching.

If you find something is lacking then perhaps what you really want is to more completely transport the people in the room into the other conference room, from a sonic perspective. Perhaps it’s just as important to know that the VP of R&D is fidgeting and shuffling papers as the CEO gives the pitch? It’s the acoustic equivalent of non-verbal communication. It’s capturing the entirety of the sound of what’s going on at the point of origin and faithfully reproducing that at the other end. In all its dimensions. Yes, all four dimensions.

  • Reminds me of 3D video conferencing – it’s the next frontier in multimedia communications.

    • Indeed, there are similarities. The recent IBC show in Amsterdam was highlighted by very real movement towards 3D-HDTV. As that business becomes mature there will be economies of scale that can be leveraged in the video conferencing arena.

      The big difference between 3D sound and vision is that there is a lot more prior art in the realm of dimensional audio. Whether the various failed quad formats from the 60s & 70s, or Ambisonics, which was the one truly sensible technological approach, dimensional audio can be implemented today. Further, given the current capabilities of software, CPUs and DSPs, it doesn’t need to be costly.

  • Toyberg

    Thanks for your article and for focusing on ambisonics. I have a hard time understanding why you would argue for making a UHJ stereo from the 4ch recording. If you lose the ability to play back spherical audio on the receiving end then what is the point?
    Kind regards,
    Jens Toyberg

    • mjgraves

      For the purposes of business conference applications like tele-presence, height may have limited value. Planar surround may be enough to adequately convey goings on in a board room.

      On this basis, some may prefer to limit the bandwidth requirements by passing only two channels. Existing video conference/tele-presence systems already do this. Thus passing UHJ encoded audio is an improvement that can be realized without additional bandwidth burden. Some approaches may elect to pass B format audio, and so deliver reproduction with height where bandwidth constraints are not a concern.
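
For readers unfamiliar with B format, here is a small Python sketch of how a single sound source is encoded into its four channels. It assumes the traditional first-order convention (W attenuated by 1/sqrt(2), X pointing front, Y pointing left, Z pointing up), and the function name encode_b_format is made up for this illustration. It does not show UHJ encoding, which additionally needs a 90-degree phase-shift network.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B format (W, X, Y, Z).

    azimuth_deg is measured counter-clockwise from straight ahead,
    so +90 is directly left. elevation_deg is measured upward from
    the horizontal plane. Hypothetical helper, traditional convention.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back figure-of-eight
    y = sample * math.sin(az) * math.cos(el)  # left-right figure-of-eight
    z = sample * math.sin(el)                 # up-down figure-of-eight
    return w, x, y, z

# A talker seated 45 degrees to the left, at ear height.
w, x, y, z = encode_b_format(1.0, 45.0)
```

With the source at ear height (elevation 0) the Z channel is silent, which is why planar surround needs only W, X and Y. Height information lives entirely in the fourth channel.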