The Stripe of Automatic Speech Recognition (ASR)

What is AssemblyAI?

I have a special summer discount going for my AI Supremacy Newsletter. Help me get to 100 paid subscribers.

Get 25% off for 1 year

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here. I cannot continue to write without community support (follow the link below). For the price of a cup of coffee, join 84 other paying subscribers.

https://aisupremacy.substack.com/subscribe

AI-as-a-Service in an audio-friendly world - powering Automatic Speech Recognition (ASR)

Can I just say, I really like this startup? But there are so many A.I. startups now where I feel that way. This article first appeared on my Artificial Intelligence Survey.

JULY 14TH, 2022 4:15 PM MONTREAL, CANADA

The speech-to-text AI startup AssemblyAI has now raised $64 million, according to Crunchbase.

AssemblyAI today announced that it raised $30 million in a Series B round led by Insight Partners with participation from Y Combinator and Accel. To date, AssemblyAI has raised $64 million, which founder and CEO Dylan Fox tells TechCrunch is being invested in growing the company’s research and engineering teams and its data center capacity for AI model training.

On LinkedIn, Dylan says he talks about #ai, #startups, #deeplearning, #speechtotext, and #speechrecognition - that sounds about right.

You are reading AI Supremacy, the top machine learning Newsletter on Substack covering AI’s impact on business, society and technology. If you can share it with colleagues, friends and family, or on Reddit, Hacker News or LinkedIn, I would be grateful.


Origin Story


Fox founded AssemblyAI after a 2-year stint at Cisco, where he worked on machine learning for collaboration products. Prior to that, he started YouGive1, an organization that worked with companies to reward customers with product offers in exchange for nonprofit donations.

Here is all they achieved in 2021.

AssemblyAI has a great future with audio.

AssemblyAI is all about leveraging the same AI technology used to create popular AI models like DALL-E 2, GPT-3, and Google’s LaMDA model (including Transformers, Large Language Models, massive GPU clusters, and large datasets) to create State-of-the-Art AI models for transcribing, understanding, and analyzing audio and video data.

First-of-their-kind audio-first social networks, like Twitter Spaces and Clubhouse, have started popping up everywhere, including Substack’s App and LinkedIn Audio Events. There’s an API for that.

Smaller companies struggle to keep up, which is why many turn to “AI-as-a-service” vendors that handle the challenging work of creating models and charge for access to them through an API. One such vendor is AssemblyAI, which focuses specifically on speech-to-text and text analysis services.

Automatically convert audio and video files and live audio streams to text with AssemblyAI's Speech-to-Text APIs. Do more with Audio Intelligence - summarization, content moderation, topic detection, and more. Powered by cutting-edge AI models.


The Stripe for Speech to Text


Four months ago, they announced their $28M Series A led by Accel, with participation from Y Combinator, the Stripe founders (John and Patrick Collison), Nat Friedman, and Daniel Gross.

Now there’s yet another round. They are moving fast.

Democratizing Speech to Text


Their goal is to expose this progress to every developer and product team on the internet via a simple set of APIs. As they continue to research and train State-of-the-Art AI models for ASR and NLP tasks (like speech recognition, summarization, language identification, and many others), they will continue to expose these AI models to developers and product teams via simple APIs, available for free.

Over the past 6 months of 2022 alone, they’ve already launched ASR support for 15 new languages (including Spanish, German, French, Italian, Hindi, and Japanese), released major improvements to their Auto Chapters and Summarization models, Real-Time ASR models, and Content Moderation models, and shipped countless other product updates.

They were founded in 2017.

“I was looking for speech recognition and natural language processing (NLP) APIs for past projects, and started AssemblyAI after seeing how limited, and low-accuracy, the available options were back in 2017,” Fox told TechCrunch in an email interview. “The company’s goal is to research and deploy cutting-edge AI models for NLP and speech recognition, and expose those models to developers in very simple software development kits and APIs that are free and easy to integrate.” - Dylan Fox

ASR stands for automatic speech recognition.

I consider AssemblyAI a rather promising AI startup.

With this new funding, they will be able to accelerate their product roadmap, build out better AI infrastructure to accelerate their AI research and inference engines, and grow their AI research team, which today includes researchers from DeepMind, Google Brain, Meta AI, BMW, and Cisco.

The API for ASR


AssemblyAI has the go-to solution for analyzing speech, offering ultra-simple API access for transcribing, summarizing and otherwise figuring out what’s going on in thousands of audio streams at a time.

Their value proposition to me is really strong. Think about it: AssemblyAI offers AI-powered, API-based services in over 80 languages for automatic transcription, topic detection, and content moderation as well as “auto chapters,” which breaks down audio and video files into “chapters” with summaries for each. Using the platform, developers can call various APIs to perform tasks like “identify the speakers in this conversation” or “check this podcast for prohibited content” at a relatively low cost, starting at $0.00025 per audio-second.

  • Automatic transcription
  • Topic detection
  • Content moderation
  • Auto Chapters
  • Speaker identification
  • Super cheap: $0.00025 per audio-second. That’s $0.90 for 1 hour.
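To make the pricing arithmetic concrete, here is a minimal sketch that reproduces the per-hour figure from the quoted per-second rate. The function name `transcription_cost` is purely illustrative, and the rate is simply the starting price quoted in this article, not a current price list.

```python
from decimal import Decimal

# Starting price quoted in this article; actual pricing may differ.
PRICE_PER_AUDIO_SECOND = Decimal("0.00025")


def transcription_cost(duration_seconds: int) -> Decimal:
    """Return the cost in dollars for a given amount of audio (illustrative helper)."""
    return PRICE_PER_AUDIO_SECOND * duration_seconds


# One hour of audio is 60 * 60 = 3600 seconds.
one_hour = transcription_cost(60 * 60)
print(f"1 hour of audio: ${one_hour:.2f}")
```

Decimal is used instead of float so that small per-second rates do not pick up rounding noise when multiplied out.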

Super Minimalistic APIs


AssemblyAI offers a handful of different APIs that you can call extremely simply (a line or two of code) to perform tasks like “check this podcast for prohibited content,” or “identify the speakers in this conversation,” or “summarize this meeting into less than 100 words.”
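As a rough sketch of what such a call could look like, the snippet below builds a single transcription request with a few of the features mentioned above switched on. None of this comes from the article: the endpoint URL, the `authorization` header, and the field names `audio_url`, `speaker_labels`, `content_safety`, and `auto_chapters` are my assumptions about AssemblyAI's public v2 REST API, and the helper names `build_transcript_request` and `submit` are hypothetical. Check the official documentation before relying on any of it.

```python
import json
import urllib.request

# Assumed endpoint for AssemblyAI's v2 API; verify against the official docs.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(audio_url, api_key, speakers=False,
                             moderation=False, chapters=False):
    """Build (but do not send) a transcription request. Hypothetical helper."""
    payload = {"audio_url": audio_url}
    # Field names below are assumptions about the API, not taken from the article.
    if speakers:
        payload["speaker_labels"] = True   # "identify the speakers"
    if moderation:
        payload["content_safety"] = True   # "check for prohibited content"
    if chapters:
        payload["auto_chapters"] = True    # chapter-level summaries
    return urllib.request.Request(
        TRANSCRIPT_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request needs no network access; submit() would actually send it.
request = build_transcript_request(
    "https://example.com/podcast.mp3", "YOUR_API_KEY", speakers=True, moderation=True
)
```

As far as I understand it, the service processes audio asynchronously, so the response to this call would describe a queued job to check back on later rather than the finished transcript.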

So Many Use Cases


Fox says AssemblyAI continues to grow at a fast clip, fueled by the pandemic and, by extension, the rise of remote work. Audio and video are being incorporated into an expanding number of products, he notes, like videoconferencing and even dating apps. That’s led product teams to look for ways to build additive, high-value features on top of audio and video data.

Thanks for reading!

What do you think of their unique value proposition, product market fit and future potential?

Respond in a comment below.

Takahide Maruoka

Credly Top Legacy Badge Earner | ISO/IEC FDIS 42001 | ISO/IEC 27001:2022 | NVIDIA | Google | IBM | Cisco Systems | Generative AI

2 years ago

Japanese startups are raising funds through crowdfunding. In Japan, audio clubhouses were popular last year. This year, the clubhouse boom has cooled. You are right to focus on audio. I wanted to be scientifically more advanced in voice recognition than in image recognition technology. Image recognition is being researched and developed more by large and small companies. There is a need, especially in the medical field. If we can develop a service specializing in audio relations with speech recognition, it will create business opportunities. I think start-up companies are better suited for this.

Mitch Austin

CEO at Spirare Center for Airway and Sinus

2 years ago

This is exciting tech. I’d like to see it used translationally for speech analytics as well. This looks like a route of diagnostics for neurological issues in speech and therapy. Educational speech and learning development may also benefit from these algorithms.

POOJA JAIN

Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022

2 years ago

Insightful share, Michael Spencer

Netra Hirani

Analyst at Bain & Co. | AI specialist | Author

2 years ago

It's brilliant! A great tool for audio analysis and conversational AI!

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

2 years ago

Such a promising startup. Joseph Zaghloul
