Why aren't your conference recordings useful?
Toby Allen
Solving your customer identity challenges with Auth0 - Senior Solutions Engineer | CISSP | CCSP
How often does a call get recorded and not used for anything?
Perhaps one person, working remotely in a far-flung corner of the globe, listens to it because they didn't want to get up at 3 AM. Other than that, most recordings get pushed off into some dark corner of your storage, never to be seen again.
How do you make the recordings useful?
Transcription!
Transcription is a great way to get usable information out of a recorded call. When everyone knows the call is being transcribed, you can forget about taking notes and focus on the content of the meeting. No need to scribble down half-captured action points. Accurate transcriptions also pave the way for AI and ML to extract meaningful insights from recordings: automatic summarisation, task extraction and more all become possible with high-quality transcriptions.
The accuracy of a transcription depends on a number of factors, with the quality and clarity of the recording being the single biggest one. The old saying "garbage in, garbage out" absolutely applies here.
- Codec choice - You should always use good old G.711, right? WRONG! G.711 is a narrowband codec designed to meet the constraints of old telephone systems: it samples at 8 kHz and carries only about 300 to 3400 Hz of speech. You want a wideband codec, and the wider the better. Opus really is the only sensible choice for a system today. It supports sampling rates up to 48 kHz (fullband audio) and bitrates from 6 to 510 kbit/s, and it is open source and royalty free. It is also important to choose the best codec on a user-by-user basis. Don't force an entire conference down to PSTN quality just because one person had to dial in.
- Channel separation - Don't throw away good quality audio by recording a single channel of fully mixed audio. What's the point of picking a high-quality wideband codec and then shoving every participant together into a mono MP3 file? Transcription works best on one speaker at a time, so give your transcription engine the best input it can get: one speaker per audio stream or channel. The MP4 container can hold multiple audio tracks, each with its own codec and bitrate, so take advantage of it. The vast majority of conferencing and CPaaS providers don't support channel separation, or if they do, it's targeted at contact centre scenarios and only supports two channels.
- Context - Provide your transcription engine with context, usually in the form of a custom lexicon (sometimes called a custom vocabulary). Tell it people's names, common company terms, the cities where people are based and so on. Any unusual words that might come up in the conversation should be included. This reduces the chance of words being dropped or misrecognised.
- Behavioural adaptation - Finally, when attendees know they are being transcribed, they learn to adapt their behaviour on the call. Standardised phrases become more common: "Tom, your action is to research ..." When you review a transcript later, you can jump straight to these key phrases and grab your action items.
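To put rough numbers on the codec point, here is a small Python sketch that applies the Nyquist limit (a sampling rate of fs can only represent frequencies below fs / 2) to the two codecs mentioned above. The `CodecMode` class and its field names are illustrative only, not part of any library; the figures are the standard G.711 telephone band and the Opus fullband mode described in RFC 6716.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecMode:
    # Illustrative container, not a real codec API
    name: str
    sample_rate_hz: int   # samples per second
    top_audio_hz: int     # highest audio frequency the codec actually carries

    @property
    def nyquist_hz(self) -> float:
        # Nyquist-Shannon: sampling at fs can only represent frequencies below fs / 2
        return self.sample_rate_hz / 2


MODES = [
    # G.711: 8 kHz sampling, telephone band of roughly 300-3400 Hz
    CodecMode("G.711", 8_000, 3_400),
    # Opus fullband: 48 kHz sampling, audio bandwidth up to 20 kHz (RFC 6716)
    CodecMode("Opus fullband", 48_000, 20_000),
]

for mode in MODES:
    print(
        f"{mode.name}: sampled at {mode.sample_rate_hz / 1000:g} kHz, "
        f"Nyquist limit {mode.nyquist_hz / 1000:g} kHz, "
        f"carries audio up to about {mode.top_audio_hz / 1000:g} kHz"
    )
```

The top of the Opus fullband range is almost six times higher than G.711's, which is exactly the high-frequency detail (consonants such as "s" and "f") a transcription engine benefits from.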
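Standardised phrasing, as described in the behavioural adaptation point, also makes transcripts easy to search mechanically. The sketch below assumes a plain-text transcript and the hypothetical convention "<Name>, your action is to <task>."; the function name `extract_actions` and the sample transcript are made up for illustration, and a regular expression is just one simple way to pull out the phrases.

```python
import re

# Hypothetical team convention: "<Name>, your action is to <task>."
# The task is everything up to the next sentence-ending punctuation mark.
ACTION_PATTERN = re.compile(
    r"\b(?P<owner>[A-Z][a-z]+), your action is to (?P<task>[^.?!]+)"
)


def extract_actions(transcript: str) -> list[tuple[str, str]]:
    """Return (owner, task) pairs for every standardised action phrase."""
    return [
        (m.group("owner"), m.group("task").strip())
        for m in ACTION_PATTERN.finditer(transcript)
    ]


# Made-up transcript text for illustration
transcript = (
    "Thanks everyone. Tom, your action is to research Opus support in the SDK. "
    "Priya, your action is to book the venue for March."
)

for owner, task in extract_actions(transcript):
    print(f"{owner}: {task}")
```

Because the match depends on exact wording, it only works as well as the attendees' habit of using the phrase, which is why that behavioural shift matters.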
At Clique we're obsessed with call quality. We've built our CPaaS platform to use and record the highest-quality audio possible. This means using wideband codecs and storing every channel of a multiparty call individually to ensure the most accurate transcriptions.
If you're interested in experiencing this, check out Caw.me, a high-quality audio conferencing solution built on the CliqueAPI platform.