Building a conversational voice AI agent? Check out our guide on everything we learned from experimenting with OpenAI, ElevenLabs, Vapi, Google Gemini and LiveKit.
Don't believe the demos you see from top frontier AI companies. Very little of what you see works as advertised. We learned this the hard way as we tried to build a conversational voice AI agent using the latest tech in the space: OpenAI, ElevenLabs, Google Gemini, LiveKit and Vapi. After 6 weeks of building and 50 hours of audio testing, we found that no solution could provide the critical balance of latency, quality and reliability we needed. So, we reverted to a voice infrastructure built on tools that have been around since 2023. Not as flashy, but it actually works.

I get why frontier AI companies optimize for flashy demos; showcasing the latest mind-blowing capability is what attracts investment and talent. But I also hope we see more focus on what's needed to build production-grade applications, much of which comes down to good old-fashioned, "unsexy" software engineering: stable, low-latency APIs, debugging tools, and graceful error handling, to name a few.

Despite these frustrations, the space is moving *so* fast, and it's an exciting challenge to build on the cutting edge. In the last week, both ElevenLabs and LiveKit reached out to update us on promising new features that might resolve some of the issues we encountered. These teams are working hard and achieving new breakthroughs every day, and we're excited to continue to collaborate and share feedback.

In the meantime, Tristan Jehan wrote a guide on everything we learned so that you don't make the same mistakes we did (link in comments). And if you're building voice AI, we'd love to connect and share notes!