Headsets Won’t Work Miracles: Here is How Digital Sound Gets Degraded in the 21st Century
by Cristian Guiducci, EU-accredited conference interpreter and semi-professional sound engineer
Highlights
- Sound quality in the booth is influenced by two main macro-components: a) the audio transmission chain; b) headphones and microphones. The audio chain is what can degrade sound the most, and yet for some reason the discussion seems to focus on headphones and microphones alone.
- Interpreters have no control over the complex transmission chain where most audio mismanagement can happen, and often does. This is the primary cause of poor sound in our headsets, and remote simultaneous interpreting (RSI) platforms complicate things even further: since they do without the necessary dedicated infrastructure, platforms have to rely on artificial intelligence and algorithms whose efficacy is far from sufficient to provide good quality audio.
- But if we still want to address peripherals (headsets and mics), two things have to be kept in mind: a) the main problems concern the sound feed coming from meeting participants, not from interpreters; b) one of the main purposes of manufacturer specifications is marketing, so manufacturer specifications and even ISO compliance have to be interpreted and understood within the proper technical context.
- Our ears are not ISO-compliant machines. Auditory perception is an extremely complex “analogue” system and deserves to be treated as our supreme judge. No doubt, like any judge, it needs adequate training to distinguish poor sound from quality sound; otherwise it will never be able to tell good from evil.
Discussion
As a blind interpreter with an audio engineering background, I have to rely on good sound not only in the booth, but also in my daily life. Too often have I heard colleagues dismiss sound quality as a minor problem, and then make superhuman efforts to make sense of unintelligible speech owing precisely to poor sound. Even in the EU conference setting sound is often poor, and that in spite of the fact that headsets and microphones comply with ISO standards. A growing number of colleagues suffer from acoustic disorders like tinnitus, partial hearing loss, Ménière's disease etc. Canadian parliamentary interpreters have also experienced the consequences of poor sound, both before and after the transition to remote simultaneous interpreting.
Why can we get poor sound despite headphone compliance with ISO standards?
The transmission chain, i.e. everything that comes between the participants' microphones and the interpreters' headphones, usually introduces most of the sound alterations leading to poor or degraded audio. Merely concentrating on end-user peripherals to improve sound quality is therefore pointless. Without an adequate and well-functioning audio chain, no ISO-compliant headset or microphone, cheap or expensive, can restore good quality sound.
Why is concentrating only on headsets or microphones not enough?
Until the late 1990s, an analogue conference audio system - though full of cables and hidden gear - was relatively simple in terms of its electronic components. The typical chain from speaker to interpreter would consist of a room microphone, a low-noise preamplifier, a professional mixer, the interpreter's console (functioning as a headphone amplifier) and, to close the chain, the interpreter's headphones. In this “old school” setting, headsets undeniably were, along with preamplifiers, the weakest link in the chain.
Good quality microphones with good frequency response were rare and expensive. And robust, durable, lightweight open-back headphones with a wide frequency response were neither cheap, nor easy to manufacture.
But times change fast, and today even a pair of well-chosen and, above all, properly managed $2 condenser microphones can produce studio-like stereo recordings of a symphony orchestra.
How come then excellent sound gets butchered before it reaches our headphones?
Did you ever wonder why a friend's voice message sounds much clearer, crisper and more pleasant to your ears than that long online meeting that left your ears and brain exhausted? That voice message is the result of a self-contained and, above all, well designed and properly managed sound chain.
Now in the conference setting, and even more so in the RSI setting, no matter how good and/or expensive the equipment you and other participants are using might be, the audio chain is often poorly managed: poorly tuned equalisers, compressors, limiters or feedback prevention mechanisms will still result in huge sound degradation even if, on paper, the whole installation is ISO compliant. Furthermore, AI algorithms have no ears, so they have no idea what the final result of their activation will sound like.
Theoretically, modern technology enables the digital transmission of high quality sound either on site or across the globe with reduced cabling, cheaper setups and greater language regime flexibility and/or scalability.
However, audio chains between meeting participants and interpreters are very complex systems and are full of pitfalls. In large, institutional conference settings involving IP-based equipment, recent trends have seen their management outsourced to a remote location, and there is not much the sound technicians operating on site can do to improve things, no matter how hard they try: access to advanced functions is restricted.
What is then so complex about these chains?
First, a preamplified speaker microphone output is transformed into binary code at a given sampling rate and with a specific bit depth by an analogue-to-digital converter (ADC). This digitised signal then undergoes digital compression via specific audio codecs relying on a compression algorithm with a given bitrate. The binary information travels on site through various network facilities. A mixing board is then used to process this data flow, and sound manipulation capabilities are endless: voice compression, echo cancellation, automatic gain control, noise reduction, automatic feedback prevention and parametric equalisers with fine frequency-band adjustment, just to name a few. An exhaustive list clearly goes beyond the scope of this article.
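To put some rough numbers on those first steps of the chain, here is a minimal back-of-the-envelope sketch in Python. The sampling rates, bit depths and codec bitrate below are illustrative assumptions, not measurements of any particular system:

```python
# Back-of-the-envelope arithmetic for a digitised speech signal.
# All figures below are illustrative assumptions, not measurements.

def pcm_bitrate(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return sample_rate_hz * bit_depth * channels

def nyquist_limit_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2

for label, fs, bits in [("narrowband telephony", 8_000, 16),
                        ("wideband VoIP", 16_000, 16),
                        ("studio quality", 48_000, 24)]:
    print(f"{label}: {pcm_bitrate(fs, bits) / 1000:.0f} kbit/s raw PCM, "
          f"nothing above {nyquist_limit_hz(fs) / 1000:.0f} kHz survives")

# A codec then squeezes this further: a 24 kbit/s speech stream is a
# 48:1 reduction of 24-bit/48 kHz PCM, and something has to give.
```

Every choice in that little calculation - sampling rate, bit depth, codec bitrate - is made somewhere upstream, far from the interpreter's ears.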
Hundreds of different variables can exert a negative influence on the transmission chain. If the chain is not managed with great care, the sound quality delivered to our headphones can become much lower than that of the original sound.
What happens when we add remote participants or RSI to the mix?
On site, processed data is fed into our interpreting consoles, converted back to analogue sound by a digital-to-analogue converter (DAC), amplified and sent into our headphones.
But when a distant site is added, data will often travel thousands of miles through the internet and, here too, transmission protocol resilience, bandwidth capacity, overall network latency and average packet loss all play a crucial role in determining what reaches the interpreters' headphones (and what does not), and in what condition.
Sound is usually muffled due to poor equalisation or exaggerated use of feedback control mechanisms. The overall frequency response on the interpreter’s end of the chain is severely reduced by low sampling rates or low quality real time compression codecs. Destabilising speech artefacts originate from the use of noise reduction filters or echo suppression algorithms that will inevitably end up cutting out some useful voice information.
Microphone sound cuts caused by excessive network jitter can also puzzle interpreters and negatively affect speech intelligibility. The list could go on for pages… In such a complex situation, our modern microphones and headphones are very seldom the weakest link in the whole chain. Extreme sound quality degradation can occur at various levels throughout the digital path, and our poor headphones cannot work miracles, no matter how expensive or funky they are.
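As a toy illustration of what packet loss alone does to speech, the sketch below drops random 20 ms frames from a one-second test tone, the way a congested network drops audio packets. The frame size and loss rate are assumptions chosen purely for demonstration:

```python
import random
import numpy as np

# Toy model of a lossy network dropping whole audio frames.
# Frame size and loss rate are illustrative assumptions.
SAMPLE_RATE = 16_000   # Hz
FRAME_MS = 20          # a typical VoIP packet carries ~20 ms of audio
LOSS_RATE = 0.05       # 5% of packets never arrive

frame_len = SAMPLE_RATE * FRAME_MS // 1000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 220 * t)   # one second of a 220 Hz test tone

received = signal.copy()
lost = 0
for start in range(0, len(received) - frame_len + 1, frame_len):
    if random.random() < LOSS_RATE:
        received[start:start + frame_len] = 0.0   # silence where the packet was
        lost += 1

print(f"{lost} frames lost -> {lost * FRAME_MS} ms of audio simply gone")
# A single 20 ms hole can swallow an entire consonant; loss-concealment
# algorithms can only guess at what was inside it.
```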
Manufacturer specifications are marketing tools: they are not necessarily reliable
When sound is poor, lack of headphone compliance with ISO standards is very unlikely to be the primary suspect, especially when transmission chains are so polluted by overprocessing. But if we really insist on talking standards and headsets, the discussion should at least be technically sound and precise. First, real technical specifications must not be confused with marketing devices: manufacturers often take advantage of their customers' lack of technical expertise and make aggressive, dubious and misleading marketing statements that tend to overstate the real performance of their equipment. In their guidelines, interpreter organisations rely almost solely on raw frequency response, sensitivity and weight.
AIIC's recently published recommendations on headsets focus almost exclusively on frequency response, without an adequate discussion of other physical characteristics that make a real difference in the booth.
Headphones are, for instance, either open-back (or half-open) or closed-back (or half-closed). Interpreters should clearly prefer open-back headphones, whether circumaural (around the ear) or supra-aural (on the ear). Open-back headphones notoriously provide a relaxed sound for long listening sessions, produce much less auditory fatigue, and interfere less with phonation. Using closed-back headsets, we cannot hear our voice naturally through our eardrums and have to rely on bone transmission only, which typically leads to increased vocal effort to compensate for reduced proprioception: when one or both ears are covered by a closed-back headset, we tend to force our voice through the “barrier”, and this strains our vocal folds. Hearing ourselves well is crucial to control our output and prosody, and ensure customer satisfaction. Ironically and sadly enough, most headphones included in AIIC recommendations are closed-back, or half-open-back at best.
As far as frequency response and marketing statements are concerned, $10 mics or headphones have, on paper, the same specifications as microphones and headphones that cost hundreds of dollars, but of course they sound completely different. It is basically like comparing lead to gold. You can hear the difference in this videoclip.
Peripherals claiming a frequency response range between 100 and 18000 Hz might be ISO compliant and still sound horrible. Those figures alone give no clue as to how that mic or headphone sounds in the real world. Why? Because it is totally useless to say that a headset transducer can reproduce frequencies from 20 to 20000 Hz if no efficiency curve (plus or minus 3 dB) is provided, and the same is true for microphones.
This additional specification, which I could not find in AIIC's latest recommendations, is absolutely necessary to understand how linear the frequency response is: it indicates how flat the curve is at the lower and higher ends of the frequency spectrum. On paper, I can easily claim that my headset can reproduce 20 Hz frequencies from a church organ, but it might reproduce them 60 dB softer than the rest of the spectrum, so a human ear will most probably not be able to hear them. Conversely, if frequencies around 2000 Hz are reproduced 70 dB louder than frequencies of 10000 Hz and above, it is very unlikely that the user will be able to use the upper part of the frequency spectrum. Quoting just a raw marketed frequency response does not mean good sound, even though compliance with ISO PAS 24019 / ISO 20109 might be ensured.
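To make this concrete, here is a minimal sketch of how such an efficiency curve would be read. The measurement points are invented for illustration; real curves come from independent measurements or the manufacturer's datasheet:

```python
# Hypothetical frequency-response measurements for a headset, in dB
# relative to the level at 1 kHz. These points are invented for
# illustration only.
response_db = {
    20: -60.0,    # "reproduces 20 Hz" -- but 60 dB down, i.e. inaudible
    50: -24.0,
    100: -6.0,
    200: -2.5,
    1_000: 0.0,
    5_000: 1.5,
    10_000: -2.0,
    15_000: -8.0,
    20_000: -30.0,
}

TOLERANCE_DB = 3.0  # the usual +/-3 dB window

flat = [f for f, level in response_db.items() if abs(level) <= TOLERANCE_DB]
print(f"Marketed range: {min(response_db)}-{max(response_db)} Hz")
print(f"Usable (+/-3 dB) range: {min(flat)}-{max(flat)} Hz")
# Marketed: 20-20000 Hz. Usable: 200-10000 Hz. Same device, very
# different story.
```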
Noise cancelling is not a desirable feature and is often pure “marketing jargon”.
Noise cancelling microphone capsules simply do not exist and noise cancelling headphones are not suitable for interpreters.
Marketing claims on noise cancellation need debunking. AIIC's recommendations include noise cancelling in the specs, but these specs are pure marketing devices, and they mean almost nothing.
So-called noise-cancelling microphones can be fitted both to USB headsets and to 3.5mm jack headphones. However, analogue headsets have no electronics, so by definition no noise-cancelling function is possible, either for the headphones or for the microphone. Most of them use passive neodymium dynamic transducers for the headphones, and condenser microphones relying on plug-in power supplied by the connected sound interface. Thank God, no electronic circuit capable of noise cancellation is present: its function would hamper our ability to work well.
Indeed, some headsets can be fitted with a cardioid mic. Cardioid microphones have a directional polar pattern, with greatest sensitivity at the front and rejection at the back; although this is marketed as “noise cancelling”, it has nothing to do with active noise cancellation at all. If we elaborate further, USB headsets with built-in electronics may theoretically have some form of active noise cancelling function both for mics and headphones, though this would require additional mics to create an "inverted phase" signal to cancel external noise. But in the booth, this is not desirable, neither for headphones nor for microphones. Good noise cancelling requires very expensive, patented algorithms; its use should be limited to very noisy environments; and, most importantly, these algorithms usually create sound artefacts and generate additional pneumatic pressure on our ears.
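For the curious, the principle behind that "inverted phase" signal is simple arithmetic: add a copy of the noise multiplied by -1 and the sum is zero, but only if the anti-noise estimate is perfect. The sketch below, with purely illustrative figures, shows how even a half-millisecond timing error leaves a clearly non-zero residue:

```python
import numpy as np

SAMPLE_RATE = 48_000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
noise = np.sin(2 * np.pi * 100 * t)   # one second of a steady 100 Hz hum

# Ideal ANC: the anti-noise is a perfect inverted copy -> total silence.
perfect_residual = noise + (-noise)

# Real ANC: reference mics and processing are never perfectly aligned.
# Here the anti-noise arrives just 0.5 ms late (an assumed mismatch).
delay = int(0.0005 * SAMPLE_RATE)            # 24 samples
residual = noise + (-np.roll(noise, delay))

print(f"perfect cancellation, peak residual: {np.max(np.abs(perfect_residual)):.3f}")
print(f"0.5 ms mismatch, peak residual:      {np.max(np.abs(residual)):.3f}")
# Even a sub-millisecond error leaves audible leftovers; on real speech
# (not a steady hum) those leftovers become the artefacts we hear.
```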
Wrong technical claims are more dangerous than marketing jargon
The recent AIIC checklist on RSI reads: "? Does the RSI platform provide adequate protection against acoustic shock (at least 102 dBSLP peak loads as per G616 guideline or ISO 20109-compliant: 94 dBA SPL for any duration longer than 100ms)".
Though full of legal references and technical jargon, this is totally nonsensical and electronically meaningless.
An RSI platform is just a "middle man", and output limiting can never occur in the middle of the chain where these platforms operate. If you listen to your favourite radio station, you can, on your end of the audio chain, decide whether you want to listen at a comfortable headphone sound level or amplify the sound using your 300-watt loudspeaker system and blast your windows. If this happens, you will certainly not be able to sue the radio station for not limiting sound "pressure", which, by the way, doesn't even exist as such in the digital chain.
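If acoustic-shock protection is to mean anything, the limiting has to happen at the listener's end of the chain, where digital samples are about to become analogue sound. Below is a minimal sketch of a brick-wall peak limiter of the kind that belongs just before the DAC and headphone amplifier; it is a toy under stated assumptions, not a production design:

```python
import numpy as np

def limit_peaks(samples: np.ndarray, ceiling: float = 0.5) -> np.ndarray:
    """Brick-wall peak limiter: clamp samples to +/- ceiling.

    In the digital domain there is no sound 'pressure', only numbers
    between -1.0 and 1.0 full scale. How loud those numbers end up is
    decided by the amplifier and transducer downstream, which is why
    limiting only makes sense at the listener's end of the chain, just
    before the DAC and the headphone amplifier.
    """
    return np.clip(samples, -ceiling, ceiling)

# A sudden full-scale spike (a dropped microphone, say) in the stream:
stream = np.array([0.1, 0.12, 1.0, -1.0, 0.11, 0.09])
print(limit_peaks(stream))   # the spike is clamped to +/-0.5 before playback
```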
A pair of well trained human ears is our only supreme court
Training our ears to distinguish good from bad quality sound is essential and no algorithm or ISO standard will ever replace that. Stringent food labelling is very useful, but if you keep eating junk food manufactured in compliance with all the applicable standards and rules day in and day out, no “labelling” standard will protect your health.
Conclusion
To conclude on a positive note, as a blind interpreter I am grateful and open to all technological advances that can offer me good sound and help me protect my auditory system. ISO standards are a milestone, and their further development will improve our profession even further. But extreme caution is needed: misinterpreting these standards or neglecting to monitor compliance with them, relying on headphone limiters when the literature has shown they do not solve problems, or limiting our analysis to marketing specs while neglecting muddy, distorted, muffled and artificially manipulated sound can be really dangerous and give interpreters a false sense of safety. Along the same lines, underestimating what happens down the digital audio chain and not knowing about its countless sound-degrading algorithms, both in the conference and in the RSI setting, or ignoring frequent packet loss, which swallows fragments of useful speech, is extremely risky.
If we miss this fast-moving target and fail to rise to all these challenges, our ears will suffer in the not so "remote future" much more than we dare imagine today.
A great thank you to Andrea Caniato for the excellent peer review along with both graphical and textual editing.
Conference interpreter EN-ES-FR-PL>IT (EU accredited)
Dear Cristian, after reading your article I decided to share with you a fun remix flagged by Politico: https://www.youtube.com/watch?v=R3dUqV6CO3o. It confirms quite well what you say about sound being degraded and "showcases" how this is impacting the EP's day-to-day meetings. By the way, Politico refers to meetings during the lockdown but believe me: nothing has changed so far
Conference Interpreter - French, German & Polish into English
Thank you for a fascinating and informative article. It's quite daunting to see how much is behind the console, so to speak. I do agree with colleagues who have suggested that headsets and microphones are a good place to START looking for improvements. And it's not just about having a quality headset, but about the difference between ANY headset with a built-in microphone and talking into a laptop's built-in microphone or a mobile phone. My limited personal experience is very clear... a headset, any headset, makes a huge difference to sound quality. But that is not to say it creates great sound on its own. It creates an improvement! Once everyone is wearing a headset... and the sound is still bad... then we make the case you so eloquently make, that it's complicated, difficult, and that there is no one entity who can control the full transmission chain. (And maybe that will encourage more people to hold their meetings face-to-face!)
Freelance interpreter EU - activist
You are brilliant and I am probably asking a stupid question... But the problems we encounter are almost the same problems our clients and normal participants encounter, right? Except that we are more exposed... Or have I gotten it all wrong? Because it is better when it's a common problem...?
Interpreter and Conference Organizer, CIT Consultant
How would you phrase it then? The recent AIIC checklist on RSI reads: "? Does the RSI platform provide adequate protection against acoustic shock (at least 102 dBSLP peak loads as per G616 guideline or ISO 20109-compliant: 94 dBA SPL for any duration longer than 100ms)".
French-English conference interpreter, AIIC, ATIO, owner, Osmosis Communications
Thanks, excellent article. Two observations. Firstly, I think you'll agree that the logic invoked in favour of a good audio transmission chain applies in reverse, i.e. a quality, well-managed audio transmission chain can't make a crappy-sounding microphone sound better. The same holds true for a lousy headset. In other words, a chain is only as strong as its weakest link. Second, though it is true that a platform is only an intermediary between the participant's mic and the interpreter's headset, I believe that any sound impulse capable of leading to acoustic shock should be intercepted as far upstream as possible. Hence, if it is produced at a participant's mic or in transit to the platform due to mismanaged or faulty transmission equipment, it should be dealt with upon arrival at the platform and not allowed to move further downstream. Any dangerous sound impulses produced after leaving the platform should be dealt with at the level of the interpreter console (hard or soft) or via an outboard compressor-limiter between console and headsets.