There is a long-established, well-researched, and rational approach to doing this. It’s called Ambisonics, and it is the brainchild of a brilliant mathematician, the late Michael Gerzon of Oxford University in the UK.
There’s more to this topic than I can possibly convey, as I really don’t have the math to get into the details. If I did, I’d bore you out of your mind anyway, so just be happy that I can only give you a rough outline.
The basic idea is this: with a carefully designed array of microphone capsules you can capture all the information necessary to recreate the sound heard at an exact point in space. The capsules are mounted very close together, as if on the faces of a regular tetrahedron. When placed in a sound field (i.e. a room with something going on) all four of the capsules will “hear” the incident sound, but since each points in a different direction, each conveys that sound at a different phase angle.
The four signals, one from each capsule, effectively contain all the information required to record and replay the acoustic events at that location with extreme precision. In the language of Ambisonics this set of four signals is called “A-format.” They are the most basic signals possible in a working system.
When summed together they create a signal known as W, which is essentially a simple mono representation of the soundfield. When the signals from the four capsules are processed in a matrix fashion, summed and differenced according to some mathematical guidelines, you can derive directional signals of various sorts.
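To give a flavour of what that matrixing looks like, here is a minimal sketch of the classic sum-and-difference step in Python. It assumes the conventional tetrahedral capsule layout (front-left-up, front-right-down, back-left-down, back-right-up) and ignores the per-capsule equalization a real converter also applies; the function and variable names are mine, not from any particular library.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Sum/difference matrix from A-format to first-order B-format.

    flu, frd, bld, bru: NumPy arrays of samples from the front-left-up,
    front-right-down, back-left-down and back-right-up capsules.
    A real converter also applies per-capsule equalization and gain
    matching; this sketch shows only the matrixing step.
    """
    w = flu + frd + bld + bru   # omnidirectional "pressure" signal
    x = flu + frd - bld - bru   # front-minus-back figure-of-eight
    y = flu - frd + bld - bru   # left-minus-right figure-of-eight
    z = flu - frd - bld + bru   # up-minus-down figure-of-eight
    return w, x, y, z
```

W is the summed mono signal described above; X, Y and Z are the front/back, left/right and up/down difference signals, and together the four make up what Ambisonics calls “B-format.”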
Here’s an example that’s especially easy to understand. The microphone array has a definite front, so its orientation matters, but it’s omnidirectional: it picks up sound from all directions. By tinkering with the relationships between the signals from the various capsules we can make the microphone directional, emphasizing sounds from one direction. This effect can be varied from subtle to profound, even acoustically “zoomed” like a lens.
We can even synthetically turn the microphone to “aim” it at a different part of the performance. This kind of steerable directivity gives us the ability, if desired, to fly a sound source around the room on playback. More likely we would cut from one acoustic perspective to another, like an editor switching from scene to scene. All of this is a relatively simple signal processing task that, in the case of a recording, can happen in post-production long after the live event has passed.
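As a rough sketch of how that aiming and pattern control can be done, here is a first-order “virtual microphone” derived from the W, X, Y, Z signals above. It follows one common convention (W carried with the traditional −3 dB gain); the exact scaling varies between systems, and the function name is mine.

```python
import numpy as np

def virtual_mic(w, x, y, z, azimuth, elevation=0.0, pattern=0.5):
    """Point a synthetic first-order microphone anywhere in the soundfield.

    azimuth and elevation are in radians (0 azimuth = straight ahead,
    positive azimuth = to the left in the usual Ambisonic convention).
    pattern = 1.0 gives an omni, 0.5 a cardioid, 0.0 a figure-of-eight.
    Assumes W was recorded with the traditional -3 dB (1/sqrt(2)) gain,
    which is why it is scaled back up by sqrt(2) here.
    """
    aim = (np.cos(azimuth) * np.cos(elevation) * x +
           np.sin(azimuth) * np.cos(elevation) * y +
           np.sin(elevation) * z)
    return pattern * np.sqrt(2.0) * w + (1.0 - pattern) * aim
```

Calling `virtual_mic(w, x, y, z, np.pi / 2)` gives a cardioid aimed 90° to the left; sweeping the azimuth over time smoothly rotates the acoustic perspective, while switching between a few fixed azimuths is the cut-from-one-perspective-to-another case described above.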
The great thing about the single-point Ambisonic microphone concept is that it’s well suited to conference room applications. A single, very small, centrally located microphone can pick up a good-sized room. If necessary, more than one can be used and their outputs integrated and processed as a single acoustic scene.
Microphones like the Core Sound TetraMic (pictured above right) are designed to extreme specifications. Care is taken to make the capsule array as small as possible so that the microphone is effectively a single-point sample of the soundfield over the widest possible range of audio frequencies. As the frequency of the sound goes up, the distance between the capsules becomes a source of differential phase, or delay.
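A quick back-of-the-envelope calculation shows why the spacing sets an upper frequency limit. The 2 cm spacing here is my own illustrative figure, not the TetraMic’s actual geometry, and the half-wavelength rule of thumb is just one common way to draw the line.

```python
# Back-of-the-envelope look at where capsule spacing starts to matter.
# The 2 cm figure is an illustrative assumption, not the TetraMic's spec.
SPEED_OF_SOUND = 343.0      # metres per second at room temperature
capsule_spacing = 0.02      # metres between capsule centres (assumed)

# The matrixing treats the capsules as coincident, which only holds while
# the wavelength stays long compared with the spacing.  A common rule of
# thumb puts the limit near the half-wavelength frequency:
upper_limit_hz = SPEED_OF_SOUND / (2.0 * capsule_spacing)
print(f"Coincidence assumption holds up to roughly {upper_limit_hz:.0f} Hz")
# -> about 8575 Hz for a 2 cm spacing
```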
Happily, for the purposes of video conferencing we don’t need such a tight spec. We’re not capturing the complexity of a symphony orchestra, so we don’t need dead flat frequency response from “DC-to-light.”