Automatic Meeting Summaries: Get access to what matters!

In the traditional pre-pandemic world we all went to meetings – in-person interactions that took place in offices, conference rooms, cafes, and even bars. Some of us flew across the country, or even around the world, for the more important meetings: to meet face to face, shake hands, and look each other in the eyes. Meetings are our medium for exchanging verbal, visual, and behavioral information, and for sharing emotions. For such an exchange to occur, we all had to be in the same place at the same time. During meetings, ideas are exchanged, questions are posed and answered. People smile, nod, gesticulate, and interact in many different ways. Some take notes in notepads or on phones; others type away furiously. And yet, despite the vibrant attempts to capture the moment with all its details and nuances, fast forward 24 hours: “What did Julian mean when he mentioned that?”; “Adrienne knew how to do it – what was it?” In other words, only seldom do notes and bullet points survive, and mostly out of context. I’ll get back to meetings in a second, but for now, what’s the alternative?

What are summaries? Summaries are condensed versions of a body of information, and they are so natural to us – this is why Twitter, Instagram, TikTok, and all the TL;DRs exist and are so popular today. Summaries capture the critical bits of information that are relevant in a larger context, and suppress the connective and illustrative components that make interactions and conversations flow. What if we could summarize meetings by capturing everything relevant that happens visually, verbally, and behaviorally? What if a machine could understand human interactions and put forward only the most relevant summary?

Summarizing interactions and conversations at dinner parties is hard: they are often undirected and open ended. Abstractive summarization of generic content is a hard problem in machine learning; it is largely unconstrained and often ill defined – the result may be too shallow for some and too pedantic for others. Summarizing meetings, on the other hand, with clear goals, domain specifics, and constraints, is concrete. Extractive summaries are much more rigorous: given an entire body of information, simply keep the bits that matter and redact those that don’t. This formulation makes the problem a bit simpler – a discriminative model is all you need – but how does one know what to keep? The number of ways to summarize a meeting is combinatorially explosive. The key, however, is that most of those summaries are bad, and only a few are relevant. Good extractive summaries for a given meeting differ only in their target audience: some people prefer one topic – design-focused discussions, for example – while others care about technical details. Summaries can be generic (a population model) or user specific (personalized summaries). On the spectrum from population to personalized approaches, domain-specific priors such as preferences, interests, experience, and expertise guide the algorithms to prioritize certain details over others. As a toy illustration of this framing, consider the sketch below.
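As an illustration only – not Headroom’s actual implementation – here is a minimal Python sketch of that extractive, discriminative framing: each utterance gets a relevance score from some trained classifier, the score is optionally re-weighted by a user-specific topic prior, and the top-k utterances are kept in meeting order. The relevance model, topic tags, and weights here are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Utterance:
    speaker: str
    text: str
    topic: str  # e.g. "design" or "technical"; assumes an upstream topic-tagging step


def extract_summary(
    utterances: list[Utterance],
    relevance: Callable[[Utterance], float],          # discriminative model: how worth keeping?
    topic_prior: Optional[dict[str, float]] = None,   # personalized weights; None = population model
    k: int = 5,
) -> list[Utterance]:
    """Keep the k highest-scoring utterances, returned in original meeting order."""
    prior = topic_prior or {}
    scored = [(relevance(u) * prior.get(u.topic, 1.0), i, u) for i, u in enumerate(utterances)]
    top_k = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
    return [u for _, _, u in sorted(top_k, key=lambda t: t[1])]


# Population model: no prior. Personalized: boost one topic, say "technical".
# summary = extract_summary(meeting, relevance=my_classifier, topic_prior={"technical": 2.0}, k=5)
```

In practice the relevance scorer would be a learned model over multimodal features rather than a hand-written heuristic. Now let’s get back to meetings!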

We don’t go to meetings anymore, we don’t fly to meetings, we don’t shake people’s hands in meetings – yet we are in even more meetings. And instead of exercising our social EQ to interact, we stare at a screen full of small faces, trying to pay attention to someone’s PowerPoint slides – it’s hard, extremely ineffective, and unproductive. As humans, we are trained to understand each other through all modalities of interaction: how we act, what we say, how we say it, and in what context. As Peter Drucker said: “The most important thing in communication is hearing what isn’t said.” How can we get all that from a Zoom-like video call? Most of us have a hard time multitasking: listening and taking notes, watching for people’s reactions, let alone communicating non-verbally. However, if algorithms can understand what we say, how we say it, and how we behave in virtual meetings, then perhaps they can summarize those meetings for us! I’ve long been a proponent of Human-Centered AI (HCAI), not Artificial General Intelligence (AGI) – AI that amplifies humans rather than replacing them. Recently Yann LeCun agreed: “I think the phrase AGI should be retired and replaced by ‘human-level AI’.” As humans, we are great at interacting, collaborating, and connecting; at multitasking, not so much. Using HCAI to help us communicate is Headroom’s goal!

In a talk at the ICCV’21 Multi-Task Learning workshop, I gave an overview of existing multimodal approaches to multi-task learning as they pertain to understanding human interactions. Advances in sequence-to-sequence modeling with transformers have had a great impact on Natural Language Understanding, and the now-commonplace GPT models are widely applied across many language tasks. However, as mentioned above, analyzing language alone is not sufficient to understand human interactions, even in a virtual meeting format; other senses must be considered for an accurate summary. Combining several modalities in a unified transformer model is the focus of our ongoing work. The model takes audio-visual streams, along with participation activity in the meeting, as input, and jointly trains on multiple tasks ranging from engagement analysis and natural language understanding to joint vision-and-language reasoning. Each modality is encoded as a sequence of hidden states by a transformer encoder preceded by a convolutional feature extractor; a transformer decoder attends over the encoded input modalities, and task-specific output heads applied to the decoder hidden states make the final predictions for each task. A simplified sketch of this architecture is shown below.
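To make that description concrete, here is a simplified PyTorch sketch – a hypothetical reconstruction from the prose above, not the production model. Per-modality convolutional feature extractors feed transformer encoders, a shared transformer decoder attends over the concatenated encoder states, and task-specific heads on the decoder outputs produce the per-task predictions. All dimensions, layer counts, modality names, and task names are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """CNN feature extractor followed by a transformer encoder for one modality."""

    def __init__(self, in_channels: int, d_model: int = 256, n_layers: int = 2):
        super().__init__()
        # 1-D convolution over the raw feature stream (audio, video, activity).
        self.conv = nn.Conv1d(in_channels, d_model, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, in_channels)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # -> (batch, time, d_model)
        return self.encoder(h)


class MultimodalMultiTask(nn.Module):
    """Shared decoder over all encoded modalities, with one output head per task."""

    def __init__(self, modality_dims: dict[str, int], task_dims: dict[str, int],
                 d_model: int = 256, n_queries: int = 32):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(dim, d_model) for name, dim in modality_dims.items()})
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        # Learned queries stand in for the decoder input sequence.
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.heads = nn.ModuleDict(
            {task: nn.Linear(d_model, dim) for task, dim in task_dims.items()})

    def forward(self, inputs: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
        # Encode each modality, then concatenate along time as the decoder memory.
        memory = torch.cat([self.encoders[m](x) for m, x in inputs.items()], dim=1)
        tgt = self.queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        dec = self.decoder(tgt, memory)                   # (batch, n_queries, d_model)
        return {task: head(dec.mean(dim=1)) for task, head in self.heads.items()}


# Example instantiation with made-up feature dimensions and task names.
model = MultimodalMultiTask(
    modality_dims={"audio": 80, "video": 512, "activity": 8},
    task_dims={"engagement": 3, "keep_utterance": 2},
)
```

Each task head contributes its own loss during training, so the shared encoders and decoder learn a representation useful across engagement analysis, language understanding, and vision-and-language reasoning at once.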

Today, we launch the first generation of automatic multimodal summaries for all meetings in Headroom. These summaries capture key statements, actions, and participant engagement in meetings in real time. Using the latest algorithms in vision and language, we developed a multimodal approach to extractive summarization of meetings that will save you hours every day, allowing you to be more productive and focus on the things you love, rather than on … “What did Adrienne say?”

After an incredibly successful public launch of Headroom on Product Hunt last month, there’s no reason to spend time trying to capture a moment as it’s happening, or to review a full day of meetings just to catch up. Meet in Headroom – focus on what matters!

Herbert Bay

CEO at Earkick | AI Leader | LLM | Entrepreneur | Board Member | Unicorn Hunter | Speaker | Adventurer

2 years

Well done Andrew and team

Tuyen Trung Truong

Professor at University of Oslo (UiO)

2 years

Extremely interesting! What are the formats of the summaries (does the software extract some moments/actions from the video, or does it write some text itself), and how accurate are they? More importantly, can you say something about how the software achieves this feat?

Very exciting to see this development, Andrew Rabinovich, Julian Green! Many congratulations. Relevant for both the enterprise and education domains!

Julian Green

Building AI weather and climate forecasting tools for all.

2 years

Try it out at www.goheadroom.com
