Patronus AI

Technology, Information and Internet

San Francisco, California 5,762 followers

Powerful AI Evaluation and Optimization

View all 31 employees

About us

Patronus AI is the leading AI evaluation and optimization company. Our research-backed product enables AI engineers to optimize their agents, access powerful evaluation models, and automatically detect LLM system performance issues across 50+ modes. Leading technology companies and enterprises like AngelList, Etsy, and Pearson use Patronus AI to ship top-tier AI products. Founded by machine learning experts from Meta, Patronus AI is on a mission to accelerate the world's adoption of generative AI. We are backed by Notable Capital, Lightspeed Venture Partners, Stanford University, Datadog, Gokul Rajaram, and leading software and AI executives.

Website: https://patronus.ai
Industry: Technology, Information and Internet
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Privately held
Founded: 2023

Locations

Primary

San Francisco, California, US

Updates

Patronus AI reposted
Anand Kannappan
8 hours ago (edited)
Had an awesome time last week hosting 100+ AI researchers and engineers from NVIDIA, Databricks, Meta, Palantir Technologies, and more!

At Patronus AI, AI research is a big part of what we do. We've begun to develop an important 10-year research agenda, led by the one and only Rebecca Qian. More to come on this soon.

When we started the company, we wanted to balance research and product. Product is how we quantify the true impact of research, so research should ultimately serve customers. This means that the research we do is "applied", in the purest sense of the word. We are usually bottlenecked by engineering skill and speed, not simply research ideation (and maybe we're bottlenecked by compute too sometimes).

Applied research might not be for everyone. But the most exciting part about applied research is you get to see your ideas come to life so quickly. That feeling is irreplaceable.

At Patronus AI, we created:
- The first widely popular open source LLM judge (Lynx)
- The first standardized domain-specific LLM benchmark (FinanceBench)
- The first SLM judge to beat GPT-4o mini (Glider)
- The first Multimodal LLM judge
- The first explainable evaluations

And it's still Day -1 for us. If you're excited to join our team and push the frontiers of AI evaluation, let’s talk!

Apply here: https://lnkd.in/gQHxW-RR
Patronus AI

5,762 followers
2 days ago
In this article, you will learn about best-in-class prompt testing techniques to ensure the quality, reliability, and accuracy of LLM systems, including prompt types, evaluation criteria, datasets, LLM-as-a-judge models, and testing infrastructure. All based on the latest AI research produced by the Patronus AI Team and the broader research community. #AI #NLP #businessintelligence

AI LLM Test Prompts: Best Practices for AI Evaluation

Patronus AI, published on LinkedIn

Patronus AI reposted
AI Brief

5,039 followers
1 week ago
Patronus AI has launched a new tool called Multimodal LLM-as-a-Judge for evaluating image-to-text AI systems. The tool, powered by Google Gemini, helps developers improve AI applications by checking text presence, grid structure, and object identification. Etsy is already using this technology to reduce AI hallucinations in product image captions. Read more: https://lnkd.in/eEjFqA5Y Subscribe to the Daily AI Brief: https://lnkd.in/epHYTU3i #ai #artificialintelligence #ainews
Patronus AI reposted
Anand Kannappan
1 week ago (edited)
Today, I am super excited to introduce the first Multimodal LLM-as-a-Judge!

Over the past year, creative teams have started to invest in image AI to unlock new user value. However, as they scale these new experiences, the unpredictability of their multimodal systems scales as well. We also kept hearing from our customers that they wanted to use our product for AI use cases beyond text. That’s why we developed the first MLLM-as-a-Judge, starting with image evaluation.

Our MLLM-as-a-Judge helps AI engineers score and optimize for image input to text output use cases. The product also supports out-of-the-box evaluators that can scan for text presence, grid structure, spatial orientation, and object identification. We're also thrilled to share how Etsy uses Patronus AI's MLLM-as-a-Judge to detect image caption hallucination.

MLLM-as-a-Judge represents the next phase of our vision to advance scalable oversight of AI. The best is yet to come.

Blog: https://lnkd.in/ebhvjbEX

11 comments

Patronus AI

5,762 followers
1 week ago (edited)
We are hosting an AI Bagels event with Notable Capital on Thursday in NYC! Come talk about AI with researchers from NVIDIA, Databricks, and Runway. And yes, we will have real bagels, not virtual bagels. See you there! RSVP: https://lu.ma/fw31gni6
Patronus AI

5,762 followers
2 weeks ago
Each month, Patronus will bring you the latest educational content on AI engineering. This includes methods and frameworks developed in our industry-leading AI evaluation research. In this month's inaugural article, you will learn the best practices and techniques for evaluating LLM-based applications using a combination of open-source tools, synthetic prompts, real-time monitoring, specialized LLM-as-a-Judge models, and reference frameworks. #ai #nlp #businessintelligence

LLM Testing: The Latest Techniques & Best Practices

Patronus AI, published on LinkedIn

Patronus AI reposted
Anand Kannappan
2 weeks ago
We are hiring AI researchers and ML engineers! The next generation of intelligent, agentic systems requires intelligent, scalable AI evaluation. At Patronus AI, we’ve trained and open sourced SOTA LLM judges, exposed vulnerabilities in leading models, and built a research-backed product that has been used by leading AI teams across companies like AngelList, HP, and OpenAI. We’re looking for a research lead, research scientists, and ML engineers to develop AI that provides high quality oversight and feedback to AI systems. If you're excited about supervising autonomous applications, driving ambitious research projects, and working on the frontiers of AI evaluation, let’s talk! DM me or apply here: https://lnkd.in/gQHxW-RR

19 comments

Patronus AI

5,762 followers
3 weeks ago (edited)
Exciting to see Databricks use our eval benchmark FinanceBench to evaluate how well fine-tuning embedding models with synthetic data improves RAG performance!

FinanceBench is the industry’s first standardized benchmark for LLM performance on financial questions. It's a large-scale set of 10k question and answer pairs based on public filings like SEC 10Ks. Since its launch, it has been used by thousands of financial institutions, universities, regulatory groups, and leading AI companies around the world.

We’re thrilled to see Databricks push forward in RAG research, and we at Patronus AI are excited to continue bringing alpha evals to AI teams.

Read the Databricks blog post: https://lnkd.in/dhqKH_zW
Download the FinanceBench sample on Hugging Face: https://lnkd.in/emBP3DGu
Read the FinanceBench arXiv paper: https://lnkd.in/eThVhwVy

Reach out to us to learn more!

Improving Retrieval and RAG with Embedding Model Finetuning

databricks.com

Patronus AI reposted
Notable Capital

65,241 followers
2 months ago
With our first year as Notable Capital behind us, we closed out 2024 with a lot to celebrate! From a number of incredible investments like Patronus AI, Parafin, LocalStack, and more...to celebrating portfolio exits from HashiCorp, Ibotta, and Gem Security (acquired by Wiz) just to name a few - 2024 was a year that set the stage for what’s ahead. We recently shared our 2024 Year in Review letter with our LPs, highlighting the moments that made this year truly *notable* which you can see some of here: https://lnkd.in/grqibaDq

2024 | Notable Capital

notablecap.com

3 comments

Patronus AI

5,762 followers
2 months ago
Last Thursday marked the end of the 12 Days of Christmas at Patronus AI. In case you missed it, here's a recap of everything we announced:

Day 1: Automatic Failure Highlighting in LLM Outputs
Day 2: FinanceBench v1.1
Day 3: Adaptive Dataset Uploads
Day 4: 100 Prompt Injections
Day 5: Patronus Experiments
Day 6: Patronus Comparisons 2.0
Day 7: SOC-2 Type 1 Compliance
Day 8: Excessive Agency Test Suite
Day 9: 360 Degree Human Annotation
Day 10: Lynx 2.0
Day 11: Criteria Copilot
Day 12: Glider

More coming soon. But for now, merry Christmas!

Funding

Patronus AI: 2 total rounds

Last round

Series A, June 22, 2024

US$17,000,000.00

Investors

Notable Capital + 11 other investors

See more info on Crunchbase

Employees at Patronus AI

Glenn Solomon

Managing Partner at Notable Capital

Nnamdi Iregbulem

Partner at Lightspeed Venture Partners

Alexandra Chou

Sales at Patronus AI

Maciej Ge?don

Self Employed

Similar pages

Vertis

Nomic AI

Notable Capital

Kyber Technologies

Substrate

Lightspeed

Braintrust

Patronus Group

Exa

LlamaIndex
