The ongoing effort to improve Rossa process from an AI Researcher

Cinnamon AI

Cinnamon AI provides an AI platform for enterprises to utilize unstructured data.

Published: January 4, 2022

Meeting diarization is one of the focuses of the Rossa Voice team. It aims to answer “who spoke what” in a meeting, and it has various applications in daily conversations and call centers.

Speaker diarization, which determines who is speaking at each point in a conversation, is one of the most important tasks in meeting diarization. It splits an audio recording into smaller segments, each containing the voice of only one speaker, and identifies the speaker of each segment. When the speakers are unknown, speaker diarization becomes the task of clustering speech segments into groups, where each group corresponds to one speaker.

Essentially, speaker diarization builds a feature extraction model, called a speaker embedding model, to extract voice characteristics from a given audio segment. It then groups the audio segments based on the similarity of the extracted features.
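The grouping step can be illustrated with a small sketch. It assumes the speaker embedding model has already turned each audio segment into a vector; the toy vectors, the `cluster_segments` helper, the cosine similarity measure, and the 0.8 threshold are all illustrative assumptions, not the method used in Facilun.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster_segments(embeddings, threshold=0.8):
    """Greedy clustering: each segment joins the most similar existing
    speaker if the similarity reaches the threshold, otherwise it starts
    a new speaker. Returns one integer speaker label per segment."""
    centroids = []  # one running sum of embeddings per speaker
    labels = []
    for emb in embeddings:
        best_idx, best_sim = -1, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx == -1:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            # Cosine similarity ignores vector length, so a running sum
            # points in the same direction as the mean embedding.
            centroids[best_idx] = [c + e for c, e in zip(centroids[best_idx], emb)]
            labels.append(best_idx)
    return labels

# Hypothetical 3-dimensional embeddings for five audio segments.
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [1.0, 0.0, 0.1],
]
labels = cluster_segments(toy_embeddings)
print(labels)  # [0, 0, 1, 1, 0]
```

Real speaker embeddings usually have hundreds of dimensions, and production systems often rely on more robust clustering such as agglomerative or spectral methods. The greedy version is used here only because it is short enough to read at a glance.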

The Rossa Voice team started the meeting diarization product, Facilun, from scratch, with no prior research into speaker recognition. Understanding the team’s difficulties in distinguishing voices, Aiden decided to research the problem and work on this process. He faced many challenges at the beginning: he did not know which model would be most effective or which dataset could be used. Through many experiments, failing, trying again, and… failing again, Aiden gradually mastered the technologies. He came to understand which models are suitable, when they fail, and why they fail. It is hard to say that we have a perfect speaker diarization model yet, but some good baseline models have been built. We are approaching the level required for application, where accuracy must exceed 90%, meaning that error rates stay below 10% in normal recording conditions.
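The article does not name the error metric behind the 10% figure. A common choice in the field is the diarization error rate (DER), which divides the time lost to false alarms, missed speech, and speaker confusion by the total reference speech time. The sketch below assumes that metric; the function name and the durations are hypothetical.

```python
def diarization_error_rate(false_alarm, missed, confusion, total_speech):
    """DER = (false alarm + missed speech + speaker confusion) / total speech.
    All arguments are durations in seconds."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech

# Hypothetical 10-minute meeting with 600 seconds of reference speech.
der = diarization_error_rate(false_alarm=12.0, missed=18.0,
                             confusion=24.0, total_speech=600.0)
print(f"DER = {der:.1%}")   # DER = 9.0%
print(der < 0.10)           # True: within the 10% error budget
```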

With speakers separated, the speech-to-text output becomes more readable. Speaker separation also supports many downstream tasks that process the ASR output. Punctuation, which segments the ASR output into sentences, becomes more accurate. Information extraction from the ASR output becomes easier because we know whose important sentences should be extracted. Meeting content can be classified better because we know whether it comes from the call center’s operator or from the customer.
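One way to picture how speaker separation makes ASR output more readable is to attach a speaker to each recognized word by time overlap. The sketch below assumes the ASR module returns words with start and end times and the diarization module returns speaker turns; the data layout, the `attribute_words` helper, and the sample call are hypothetical, not Rossa’s actual interfaces.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def attribute_words(words, turns):
    """Assign each ASR word to the speaker turn it overlaps most, then
    merge consecutive words from the same speaker into one utterance.
    words: list of (text, start, end); turns: list of (speaker, start, end)."""
    utterances = []  # list of [speaker, list of words]
    for text, w_start, w_end in words:
        best_speaker, best_overlap = "unknown", 0.0
        for speaker, t_start, t_end in turns:
            ov = overlap(w_start, w_end, t_start, t_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        if utterances and utterances[-1][0] == best_speaker:
            utterances[-1][1].append(text)
        else:
            utterances.append([best_speaker, [text]])
    return [(speaker, " ".join(ws)) for speaker, ws in utterances]

# Hypothetical diarization turns and ASR words for a short call.
turns = [("operator", 0.0, 2.0), ("customer", 2.0, 4.5)]
words = [
    ("hello", 0.1, 0.5), ("how", 0.6, 0.9), ("can", 1.0, 1.2),
    ("I", 1.3, 1.4), ("help", 1.5, 1.9),
    ("my", 2.1, 2.4), ("order", 2.5, 3.0), ("is", 3.1, 3.3), ("late", 3.4, 3.9),
]
for speaker, sentence in attribute_words(words, turns):
    print(f"{speaker}: {sentence}")
# operator: hello how can I help
# customer: my order is late
```

Once words are grouped by speaker, later steps such as punctuation or information extraction can work on one speaker’s utterance at a time.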

Almost all members of Rossa are involved in developing the diarization modules. Some of the core features are developed in collaboration with the ASR module, and some NLP techniques are applied. Diarization improved over time as the team implemented techniques similar to those used in NLP and ASR, through collaboration among AI Researchers across Rossa’s projects. One of the lessons learned from this collaboration is to listen and learn to compromise. It’s essential to listen closely to each team member’s ideas, feedback, and advice, and to respond in a considerate and respectful manner. Reaching a compromise is the best way to reconcile different perspectives.

Although speaker diarization has achieved certain results and improved various tasks, some limitations in performance remain, especially when the recording environment is poor. Speaker diarization needs to be improved further to satisfy the requirements of such applications. In addition, speaker diarization can be of little value when used alone. Further adaptation and collaboration with other speech processing modules will be needed to apply it more efficiently and bring true value to Rossa’s clients.
