We had a fantastic time hosting our “How (not) to test AI” event at TECH WEEK by a16z. A big thank you to all our wonderful attendees. We weren’t able to host everyone, as we received hundreds of applications, but we promise to host more events of this sort soon, in both San Francisco and London. Follow us to stay tuned!
The tl;dr of our event: the key framework for implementing successful and controllable AI applications:
1. Generate test datasets relevant to your AI use case and company
2. Test your AI application systematically with those datasets
3. Improve your AI models based on those tests, e.g. by adjusting prompts or fine-tuning
4. Run this process continuously
At Maihem, we provide a platform to automate all of that, so you can build AI that works, responsibly and at scale. Reach out to us to learn how we can help your organization set up our industry-leading AI quality assurance and security testing solution. A big shoutout to our sponsors and supporters for this event at Urban Innovation Fund and Presidio Legal, P.C. See you at our next event! #SFTechWeek #AI #AIsafety #AIsecurity #AIquality #qualityassurance #LLMs #YC
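The four-step framework in the event recap above can be sketched in a few lines of Python. This is a minimal illustration only, not Maihem's platform or API: `TestCase`, `run_suite`, `baseline_app`, and `improved_app` are hypothetical names, the "AI application" is a hard-coded stand-in, and the keyword check is a deliberately simple pass criterion.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestCase:
    prompt: str            # input sent to the application under test
    expected_keyword: str  # hypothetical pass criterion: keyword must appear in the reply


def run_suite(app: Callable[[str], str], cases: List[TestCase]) -> float:
    """Step 2: run every test case and return the fraction that pass."""
    passed = sum(
        1 for case in cases
        if case.expected_keyword.lower() in app(case.prompt).lower()
    )
    return passed / len(cases)


def baseline_app(prompt: str) -> str:
    # Stand-in for a real AI application (assumption: no model is called here).
    if "refund" in prompt.lower():
        return "You can request a refund within 30 days."
    return "Sorry, I cannot help with that."


def improved_app(prompt: str) -> str:
    # Step 3: the "improved" version after acting on the failed test.
    if "hours" in prompt.lower():
        return "Our opening hours are 9am to 5pm."
    return baseline_app(prompt)


# Step 1: a small test dataset relevant to the (toy) use case.
cases = [
    TestCase("How do I get a refund?", "refund"),
    TestCase("What are your opening hours?", "hours"),
]

# Step 4: re-run the same suite after every change.
baseline_rate = run_suite(baseline_app, cases)
improved_rate = run_suite(improved_app, cases)
print(f"baseline: {baseline_rate:.0%}, improved: {improved_rate:.0%}")
```

In practice the test cases would be generated for the specific use case and the pass criterion would be far richer than a keyword match; the point of the sketch is only the loop of testing, improving, and re-running.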
MAIHEM (YC W24)
Data Infrastructure and Analytics
San Francisco, California · 1,200 followers
Industry-leading quality testing for mission critical AI applications | YC W24 | www.maihem.ai
About us
At MAIHEM, we create AI agents that continuously test AI products, such as conversational AI chat- and voice bots. We help companies improve and stress-test their AI products – automating quality assurance, red-teaming, and customer experience optimization.
- Website: https://www.maihem.ai/
- Industry: Data Infrastructure and Analytics
- Company size: 2-10 employees
- Headquarters: San Francisco, California
- Type: Privately held
- Founded: 2023
- Specialties: LLM analytics, generative AI analytics, AI safety, AI robustness, AI user analytics, NLP, LLMs, predictive customer analytics, business analytics, AI analytics, AI fairness, AI regulation readiness, risk evaluations, performance evaluations, fine-tuning, synthetic data, synthetic data simulations, AI risk simulations, AI agents, AI quality assurance, quality assurance, AI compliance, AI regulation, and automated quality assurance
Employees at MAIHEM (YC W24)
-
Lorcan Delaney
Principal at firstminute capital | ex-Morgan Stanley IB | Oxford CS
-
Eduardo Candela
Co-Founder @ MAIHEM (YC W24) | PhD AI Safety, Imperial | MIT alum | ex-Tesla
-
Gabriele Morello
Founding AI Engineer @ MAIHEM (YC W24) | prev. CERN, KTH
-
[Jayce] Lye Jia Jun
AI Red Team Engineer @ MAIHEM.ai (YC W24) | AI Safety Researcher | SMU Information Systems Undergraduate | NYP Top Cybersecurity Graduate | Former…
Updates
-
Excited to be at KI.ckstart this week! Meet our CEO Max Ahrens there, where he is speaking about the future of AI testing, best practices, and how to make genAI applications ready for companies to deploy.
It's not the hype that counts, but the benefit: what generative AI really delivers. Together with Artur Felic, I had the opportunity yesterday to attend the KI.ckstart Gen AI event, and I have to say: it was a concentrated dose of inspiration, practical knowledge, and networking around generative AI and its possible applications. A central message: “Don't start with the solution, start with the problem.” This sentence from Alexander Britz (Microsoft Deutschland) ran like a common thread through the keynotes and discussions. Rémi Denoix and Kirstin Heinl (Zukunftszentrum Süd) also emphasized that the key to successful AI projects lies in starting with the problem, not with the solution. My highlight: the hands-on look at the secure development of LLM applications. Max Ahrens of MAIHEM (YC W24) showed how important testing and monitoring are for reducing risks in LLM applications such as hallucinations, prompt injections, or sensitive topics like politics. For me, as someone who does a lot of prototyping and research in the LLM space, a very exciting topic! AI is a powerful tool, but it will only move us and our companies forward if we use it correctly. My takeaway: AI offers gigantic opportunities, but the first question should always be: which problem do I want to solve? How about you? Which problem are you currently tackling with generative AI?
-
Learn how to detect and mitigate AI hallucinations with our industry-leading AI testing and monitoring solutions. In our latest article, we discuss how we leverage cutting-edge AI research findings to build AI hallucination detectors that beat all benchmark models.
Our solution: a Two-Pass Detection Method inspired by Map-Reduce. Multiple LLM systems analyze each claim and categorize it as supported, unsupported, contradicted, or inferred. The results are then automatically consolidated and discrepancies resolved, yielding an AI hallucination detector that achieves state-of-the-art results on the most challenging claim verification benchmarks.
Benefits:
1. Highest accuracy: Catching critical hallucinations in your AI product.
2. Scalable & efficient: Cost-effective processing at large scale.
3. Reliable & interpretable outputs: Enhancing the trustworthiness of your AI-generated content and of our evaluations.
Read the full article here to learn how we can help your organization with our industry-leading approach to AI hallucination detection: https://lnkd.in/e-AdfG5P #AI #genAI #AItesting #AIqualityassurance #hallucinations #AIreliability #trustworthyAI #AIsafety
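The shape of the two-pass, Map-Reduce-style method described in the post above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the three "judges" are simple rule-based stand-ins rather than LLM systems, and the post does not say how discrepancies are resolved, so the majority vote with a most-severe-label tie-break is a hypothetical choice, not Maihem's method.

```python
from collections import Counter
from typing import Callable, Dict, List

# Labels from the post, ordered from least to most severe (ordering is an assumption).
SEVERITY = ["supported", "inferred", "unsupported", "contradicted"]

Judge = Callable[[str, str], str]


def exact_judge(claim: str, source: str) -> str:
    # Toy judge: the claim must appear verbatim in the source.
    return "supported" if claim.lower() in source.lower() else "unsupported"


def lenient_judge(claim: str, source: str) -> str:
    # Toy judge: counts how many longer words of the claim occur in the source.
    words = [w for w in claim.lower().split() if len(w) > 3]
    hits = sum(w in source.lower() for w in words)
    if words and hits == len(words):
        return "supported"
    if words and hits * 2 >= len(words):
        return "inferred"
    return "unsupported"


def number_judge(claim: str, source: str) -> str:
    # Toy judge: any number in the claim that is missing from the source is a contradiction.
    numbers = [w for w in claim.split() if w.isdigit()]
    return "contradicted" if any(n not in source for n in numbers) else "supported"


def consolidate(votes: List[str]) -> str:
    """Second pass (reduce): majority vote, ties go to the most severe label."""
    counts = Counter(votes)
    best = max(counts.values())
    tied = [label for label, n in counts.items() if n == best]
    return max(tied, key=SEVERITY.index)


def detect(claims: List[str], source: str, judges: List[Judge]) -> Dict[str, str]:
    """First pass (map): every judge labels every claim; then consolidate per claim."""
    return {
        claim: consolidate([judge(claim, source) for judge in judges])
        for claim in claims
    }


source = "The Eiffel Tower is in Paris. It opened in 1889."
claims = ["The Eiffel Tower is in Paris", "It opened in 1900"]
verdicts = detect(claims, source, [exact_judge, lenient_judge, number_judge])
for claim, label in verdicts.items():
    print(f"{label:>12}: {claim}")
```

For the second claim the three toy judges disagree (unsupported, inferred, contradicted), so the tie-break decides the final label. This is the kind of discrepancy the consolidation pass exists to resolve.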
-
Nothing excites us more than happy customers! That's why we're putting in the hard work every single day: building something people want, building the future of testing AI applications. Meet Risotto (YC W24): they are building at the bleeding edge of technology, creating AI-powered IT co-pilots to streamline IT support. Companies like Medium and Retool already use Risotto. We're incredibly proud to help Risotto build AI products that meet the highest quality and security standards. If you’re building AI-powered applications, from chatbots to co-pilots to AI agents, check out Maihem to future-proof your AI development and testing: test your AI products systematically, continuously, and automatically; ship AI products and features faster; build AI products that meet the highest quality standards. Book a call with us to learn how we can help your company: https://lnkd.in/e-YBay2p
Let agents test your agents! Imagine my surprise when a test agent found an edge case in Risotto (YC W24) I hadn't considered: someone having trouble accessing an app they should have access to. Risotto applied the IT adage: turn it off and back on again. This week, I turned to MAIHEM (YC W24) to help us automate our smoke tests to prevent regressions because we're shipping fast and growing more complex by the day. It was easy to set up and the MAIHEM team has been super responsive. This feels like the future of testing, especially if you're building a product with a natural language interface. How much of your QA process is automated? How’s it working?
-
Check out how Chris Paul, Co-Founder of Risotto (YC W24), uses Maihem to test their novel, full-featured IT support AI agent! Risotto is one of our favorite AI use cases to test: mission-critical AI agents.
-
This is a great short article to learn about the top 10 vulnerabilities of LLM applications, and how to test and address them.
AI applications are transforming industries, but they also bring new security challenges. In our latest article, we dive into how to test and secure the OWASP Top 10 vulnerabilities for LLMs using Maihem’s automated AI testing platform, so you can deploy your AI applications with confidence! Read the full article here: https://lnkd.in/eNKPWkZD #AI #Security #LLM #Cybersecurity #MachineLearning #OWASP #LLMSecurity #AITesting #AISecurity
How to Test the OWASP Top 10 Critical Vulnerabilities for LLMs
maihem.ai
-
Overwhelmed by how to properly test your AI applications? Do you have a feeling that 'eyeballing it' might not be industry best practice? Your feeling is right. At Maihem, we provide you with a platform to automate quality and security testing, so you can build AI products that work, responsibly and at scale. Happy Friday, everyone.
-
Our CEO Max Ahrens spoke at the AI agent panel during TECH WEEK by a16z hosted by Untapped Ventures and Georgian! Check out the highlights in the post below!
Driving AI Innovation as General Partner, Untapped Ventures | AI Keynote Speaker | Proud Husband & Father of 3 Boys
What an incredible event at SF TECH WEEK by a16z, with over 170 attendees! From thought-provoking discussions to building meaningful connections, this was truly an event to remember! We were honored to co-host with Georgian and bring together some of the brightest minds in VC and AI for an unforgettable gathering focused on Agentic and Vertical AI. A massive thank you to our amazing lineup of speakers: Jon Chu (Khosla Ventures), Nahim Nasser (Georgian), Pranav R. (Conviction), Alejandra Vergara (Bee Partners), Arash Afrakhteh (Pear VC), Andrew Brackin (Gradient), Saurabh Sharma (You.com), Ashar Rizqi (Bounti.ai), Jacky W. (Nullify), Aria Attar (TensorStax), Jonas Diezun (Beam AI), and Max Ahrens (MAIHEM (YC W24)). Check out some of the highlights below! If you're in LA next week, we're hosting another amazing panel at LA Tech Week and would love to have you there! Register here: https://lnkd.in/grCuhnZt #SFTechWeek #AgenticAI #VerticalAI #AIInnovation #VC #AIVC #techweek
-
We're hosting an exclusive #SFTechWeek lunch event for founders and senior technical leaders from VC-backed startups, scale-ups, and established companies that build LLM-based applications. Oct 9, 1pm PT, San Francisco. We'll be keeping the audience size small (max. 12 attendees, 1 attendee per company) to create an intimate gathering, focusing on candid conversations and valuable knowledge-sharing among senior decision makers around quality testing for AI applications. Chatham House rules. Attendees will be carefully selected to ensure a diverse mix of leaders actively shaping the future of AI. Apply here to attend: https://lnkd.in/erK6zyR6 Few spots remaining. #AI #AIquality #AIsafety #AIsecurity #SFTechWeek #YC
RSVP to AI pioneers lunch – how (not) to test AI products | Partiful
partiful.com