Creating binaural audio content and the object-based mixing workflow.

The concept of spatial or binaural audio can sound quite geeky and complicated, so let's demystify the subject here and now.

In traditional stereo mixing, audio engineers send audio signals to two channels, Left and Right: what we all know as a stereo master bus. This is called channel-based mixing. As explained in part 1 of this article series, stereo does not reflect the reality of how we hear.

Nowadays, we are becoming increasingly familiar with the concept of object-based mixing, which simply means that each source (an object) is positioned in a virtual environment (a virtual room) together with its spatial parameters. Object-Based Audio (OBA) represents a breakthrough in live production, with next-generation codecs enabling the mixer to represent the soundfield (the scene) as an immersive image instead of just two channels.
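
To make "object plus spatial parameters" concrete, here is a minimal sketch in Python. The class and field names are illustrative only, not taken from any particular renderer or standard:

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    """One sound source in the virtual room (illustrative, not a real API)."""
    name: str          # e.g. "lead vocal"
    channel: int       # index of the mono audio feed for this object
    azimuth: float     # horizontal angle in degrees, 0 = straight ahead
    elevation: float   # vertical angle in degrees, 0 = ear level
    distance: float    # metres from the listener/reference point
    gain: float = 1.0  # linear gain applied before rendering

# A "scene" is just the collection of objects with their positions; the
# renderer, not the mix, decides how this becomes speaker or headphone feeds.
scene = [
    AudioObject("lead vocal", 0, azimuth=0.0, elevation=0.0, distance=2.0),
    AudioObject("guitar", 1, azimuth=-30.0, elevation=0.0, distance=3.0),
    AudioObject("ride cymbal", 2, azimuth=45.0, elevation=20.0, distance=4.0),
]
```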

The second piece of this puzzle is to take this immersive sound image (all the audio objects) and render it to the desired playback format. A binaural rendering gives us what we need to deliver the audio over headphones, while a channel-based rendering gives us what we need to deliver it through loudspeakers.
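
As a toy illustration of the channel-based half of that split, here is a sketch that renders the scene above to plain stereo with a constant-power pan. The pan law and function name are my own inventions for illustration; a binaural renderer would replace the two gains with HRTF filtering, sketched later in this article:

```python
import numpy as np

def render_stereo(scene, feeds):
    """Channel-based rendering sketch: collapse each object to L/R gains.

    `feeds` maps obj.channel -> mono numpy array. A constant-power pan
    over azimuth (positive = right in this sketch) stands in for a real
    speaker renderer.
    """
    length = max(len(sig) for sig in feeds.values())
    out = np.zeros((length, 2))
    for obj in scene:
        sig = feeds[obj.channel] * obj.gain
        # Map azimuth -90..+90 degrees onto a 0..pi/2 pan angle.
        theta = (np.clip(obj.azimuth, -90.0, 90.0) + 90.0) / 180.0 * (np.pi / 2)
        out[:len(sig), 0] += sig * np.cos(theta)  # left channel
        out[:len(sig), 1] += sig * np.sin(theta)  # right channel
    return out
```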

Object-based mixing is far from new; it's been used in movie productions for many years. The multi-channel audio experience you hear in a cinema is usually composed of multiple audio objects that have been positioned and moved within a virtual environment by a mixing engineer. Unlike binaural rendering, this type of multi-channel rendering is designed for a multi-speaker system, using various panning techniques which we won't cover here. The speaker arrangements in a movie theatre, or in your home entertainment system, are just different channel-based diffusion formats. Think Dolby Atmos, Auro-3D, DTS, IMAX and all the other common surround sound formats.

Moving to object-based mixing is quite simple in essence for live sound engineers. Individual audio tracks that were previously balanced (panned) between two stereo channels are now declared as objects and defined by their position. This workflow makes these mixes completely agnostic of the rendering type or format arrangement.

A mix is now based around a sound image that can be played through a real-time renderer or exported as a multichannel audio file with a standardized metadata model such as the Audio Definition Model (ADM). From a portability and deliverability perspective, these exports are ideal: they aren't limited to a specific speaker arrangement or channel count, and can be rendered in whatever format is desired. For engineers, moving to object-based mixing is truly a game-changer, opening the door to any format or stream type for a mix.
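
For a feel of what such an export carries, here is a deliberately simplified sketch of ADM-flavoured metadata for the scene above. A real deliverable is a BW64 file carrying a complete ITU-R BS.2076 document produced by a compliant tool; the element nesting here is flattened for readability (in the full model, positions live in audioBlockFormat elements under audioChannelFormat, referenced by the object):

```python
import xml.etree.ElementTree as ET

def adm_fragment(scene):
    """Sketch of ADM-style metadata; shows the flavour, not a compliant file."""
    root = ET.Element("audioFormatExtended")
    for i, obj in enumerate(scene, start=1):
        ao = ET.SubElement(root, "audioObject",
                           audioObjectID=f"AO_{i:04d}",
                           audioObjectName=obj.name)
        block = ET.SubElement(ao, "audioBlockFormat")  # nesting simplified
        for coord, value in (("azimuth", obj.azimuth),
                             ("elevation", obj.elevation),
                             ("distance", obj.distance)):
            pos = ET.SubElement(block, "position", coordinate=coord)
            pos.text = str(value)
    return ET.tostring(root, encoding="unicode")
```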

Now back to binaural rendering.

As we stated in part 1 of this article series, binaural audio differs from stereo in that it is a synthesis that virtualizes every object, and delivers the mix over headphones in two conventional audio channels. Let’s look at some of the challenges in delivering immersive binaural content. 
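
In signal-processing terms, that virtualization is, at its simplest, a pair of convolutions per object with head-related impulse responses (HRIRs, the time-domain counterpart of the HRTF filters discussed below). A minimal numpy sketch, assuming the HRIR pair for the object's direction is already at hand:

```python
import numpy as np

def binauralize(sig, hrir_left, hrir_right):
    """Virtualize one object for headphones: convolve its mono feed with
    the head-related impulse response (HRIR) pair for its direction."""
    return np.stack([np.convolve(sig, hrir_left),
                     np.convolve(sig, hrir_right)], axis=-1)

# The binaural mix is then the sum over all objects, each convolved with
# the HRIR pair measured at (or interpolated to) its azimuth/elevation.
```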

Picture this: if we stick a microphone in each of your ears, record what you hear, and play it back to another individual, will it sound the same to them as it did to you? The answer is yes, but only to some extent. For each of us, our body (ears, upper torso, etc.) plays an important role in how we perceive sound. So while another listener will get some sense of localization, it's like making them hear with your ears.

Much research has been conducted on how we perceive and localize sounds. As sound reaches the listener, a number of factors influence how it is perceived, including the size and shape of the head, the outer ears, the ear canal, the nasal cavities, and more. These effects are captured in what we call the Head-Related Transfer Function (HRTF). The most common tool for measuring HRTFs is a generic dummy head. Humans can adapt to and compensate for a generic HRTF that isn't a perfect signature of their own hearing.

When streaming and recording material, many rely on generic HRTFs such as KEMAR, or the Neumann KU 100, the latter being a popular choice in 360/VR pipelines.

Ideally, a mixing engineer wanting to work in binaural audio gets the most truthful and reliable monitoring experience, with a far more natural sense of space and direction, from their own individual HRTF. Unfortunately, a personalized HRTF is not easy to come by. Some laboratories do offer them, but the expense makes them largely impractical for most engineers.

That being said, services do exist for creating your own personal HRTF, such as Genelec Aural ID. Starting with a video of your head and shoulder region shot on your mobile phone, the Aural ID process builds an accurately scaled 3D model of your head and upper torso, and from this delivers your personal HRTF file. A simple import of that file into your binaural renderer or monitoring tool gives you a sonic reproduction adapted specifically to you.
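
Assuming the personal HRTF arrives as a SOFA (AES69) file following the common SimpleFreeFieldHRIR convention, which is netCDF underneath, a minimal loading sketch might look like this; the helper names here are my own, and a real renderer would interpolate between measured directions rather than snapping to the nearest one:

```python
import numpy as np
from netCDF4 import Dataset

def load_hrirs(path):
    """Load HRIRs from a SOFA (AES69) file, e.g. a personal HRTF export.

    In the SimpleFreeFieldHRIR convention, 'Data.IR' holds the impulse
    responses (measurements x 2 ears x samples) and 'SourcePosition' the
    measurement directions (azimuth, elevation, distance).
    """
    with Dataset(path, "r") as sofa:
        hrirs = np.array(sofa.variables["Data.IR"][:])               # (M, 2, N)
        directions = np.array(sofa.variables["SourcePosition"][:])   # (M, 3)
        fs = float(np.array(sofa.variables["Data.SamplingRate"][:]).ravel()[0])
    return hrirs, directions, fs

def nearest_hrir(hrirs, directions, azimuth, elevation):
    """Pick the measured HRIR pair closest to the requested direction."""
    daz = (directions[:, 0] - azimuth + 180.0) % 360.0 - 180.0  # wrap-safe
    m = int(np.argmin(daz ** 2 + (directions[:, 1] - elevation) ** 2))
    return hrirs[m, 0], hrirs[m, 1]
```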

As mentioned earlier, headphones break the link to these natural mechanisms we have acquired over a lifetime, making it harder to localize sounds: sounds from headphones seem to reside 'inside' our heads rather than all around us. Your personal HRTF restores that link, describing exactly how your head, external ears, and upper body colour the audio you hear, so the renderer can apply that signature to the mix.

