Patronus AI

Technology, Information and Internet

New York, New York · 5,340 followers

Automated AI Evaluation and Security

About us

Patronus AI is the leading automated AI evaluation and security company. Our world-class platform enables enterprise development teams to score LLM performance, generate adversarial test cases, benchmark LLMs, and more. Customers use Patronus AI to detect LLM mistakes at scale and deploy AI products safely and confidently. Founded by machine learning experts from Meta AI and Meta Reality Labs, Patronus AI is on a mission to boost enterprise confidence in generative AI. We are backed by Lightspeed Venture Partners, Replit CEO Amjad Masad, Gokul Rajaram, and Fortune 500 executives and board members.

Website
https://patronus.ai
Industry
Technology, Information and Internet
Company size
11-50 employees
Headquarters
New York, New York
Type
Privately held
Founded
2023

Locations

Patronus AI employees

Posts

  • Patronus AI

    5,340 followers

    Exciting to see Databricks use our eval benchmark FinanceBench to evaluate how well fine-tuning embedding models with synthetic data improves RAG performance! FinanceBench is the industry's first standardized benchmark for LLM performance on financial questions: a large-scale set of 10k question-and-answer pairs based on public filings like SEC 10-Ks. Since its launch, it has been used by thousands of financial institutions, universities, regulatory groups, and leading AI companies around the world. We're thrilled to see Databricks push forward in RAG research, and we at Patronus AI are excited to keep bringing alpha evals to AI teams.
    Read the Databricks blog post: https://lnkd.in/dhqKH_zW
    Download the FinanceBench sample on Hugging Face: https://lnkd.in/emBP3DGu
    Read the FinanceBench arXiv paper: https://lnkd.in/eThVhwVy
    Reach out to us to learn more!
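The workflow above, scoring a RAG system against FinanceBench-style question/answer records, can be sketched in a few lines. The sample records and the stand-in RAG function below are hypothetical placeholders for illustration, not rows from the actual dataset linked above.

```python
# Minimal sketch: exact-match scoring of a RAG system on FinanceBench-style
# records. Field names (question, answer, doc_name) mirror the published
# sample; the rows here are made up for illustration.

def exact_match_accuracy(records, rag_answer_fn):
    """Fraction of questions where the system's answer matches the reference."""
    hits = 0
    for rec in records:
        predicted = rag_answer_fn(rec["question"]).strip().lower()
        if predicted == rec["answer"].strip().lower():
            hits += 1
    return hits / len(records)

sample = [
    {"question": "What was ACME's FY2022 revenue?", "answer": "$10.0B",
     "doc_name": "ACME_2022_10K"},
    {"question": "What was ACME's FY2022 net income?", "answer": "$1.2B",
     "doc_name": "ACME_2022_10K"},
]

# A stand-in "RAG system" that only knows the first reference answer.
answers = {"What was ACME's FY2022 revenue?": "$10.0B"}
acc = exact_match_accuracy(sample, lambda q: answers.get(q, "unknown"))
print(acc)  # 0.5
```

Real use would swap the in-memory `sample` for the full 10k-pair dataset and a fuzzier matcher, since financial answers often differ in formatting rather than substance.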

  • Patronus AI reposted

    Notable Capital

    65,137 followers

    With our first year as Notable Capital behind us, we closed out 2024 with a lot to celebrate! From a number of incredible investments like Patronus AI, Parafin, LocalStack, and more, to portfolio exits from HashiCorp, Ibotta, and Gem Security (acquired by Wiz), to name just a few, 2024 was a year that set the stage for what's ahead. We recently shared our 2024 Year in Review letter with our LPs, highlighting the moments that made this year truly *notable*, some of which you can see here: https://lnkd.in/grqibaDq

  • Patronus AI

    5,340 followers

    Last Thursday marked the end of the 12 Days of Christmas at Patronus AI. In case you missed it, here's a recap of everything we announced:
    Day 1: Automatic Failure Highlighting in LLM Outputs
    Day 2: FinanceBench v1.1
    Day 3: Adaptive Dataset Uploads
    Day 4: 100 Prompt Injections
    Day 5: Patronus Experiments
    Day 6: Patronus Comparisons 2.0
    Day 7: SOC-2 Type 1 Compliance
    Day 8: Excessive Agency Test Suite
    Day 9: 360 Degree Human Annotation
    Day 10: Lynx 2.0
    Day 11: Criteria Copilot
    Day 12: Glider
    More coming soon. But for now, merry Christmas!

  • Patronus AI

    5,340 followers

    Introducing Glider, the smallest model to beat GPT-4o-mini on eval tasks!
    - Open source, open weights, open code
    - Explainable evaluations by nature
    - Trained on 183 criteria and 685 domains
    And that's our 12th day of Christmas at Patronus AI!
    Download Glider on Hugging Face: https://lnkd.in/eud69M8w
    Try out Glider on Patronus for free: https://app.patronus.ai
    Read the arXiv paper: https://lnkd.in/eSnAmZ9g
    Read our blog: https://lnkd.in/ej3a4fME
    Glider demo on Hugging Face Spaces: https://lnkd.in/eRyn3WY8
    Read the VentureBeat coverage by Michael Nuñez: https://lnkd.in/eZ8xrg-2
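Judge models like Glider are typically driven by a structured prompt that carries the pass criteria and a scoring rubric alongside the input/output pair to grade. The section layout below is a hypothetical sketch of that pattern, not Glider's actual template, which ships with the model card:

```python
def build_judge_prompt(user_input, model_output, pass_criteria, rubric):
    """Assemble a rubric-based evaluation prompt for an LLM judge.

    The section headers are illustrative; a real judge model such as
    Glider defines its own template on its model card.
    """
    return (
        "Evaluate the MODEL OUTPUT for the USER INPUT against the "
        "PASS CRITERIA, using the RUBRIC. Explain your reasoning, "
        "then give a score.\n\n"
        f"USER INPUT:\n{user_input}\n\n"
        f"MODEL OUTPUT:\n{model_output}\n\n"
        f"PASS CRITERIA:\n{pass_criteria}\n\n"
        f"RUBRIC:\n{rubric}\n"
    )

prompt = build_judge_prompt(
    user_input="Summarize the quarterly report.",
    model_output="Revenue grew 12% year over year.",
    pass_criteria="The summary must be faithful to the source.",
    rubric="1 = unfaithful, 5 = fully faithful",
)
```

Keeping the criteria and rubric as explicit prompt sections is what makes the resulting evaluations explainable: the judge's reasoning can cite the rubric it was graded against.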

  • Patronus AI

    5,340 followers

    On the 11th day of Christmas, we are announcing… Criteria Copilot! Customers frequently tell us they want their AI products to work well on specific dimensions they care about, like brand voice and topic relevance. Our Judge Evaluators help with exactly this: you can use our LLM judges to test against specific evaluation criteria that you define by hand. Today, we're announcing the Criteria Copilot to make these evaluation criteria easier to define. After you write out your evaluation criteria in natural language, the Criteria Copilot automatically flags semantic ambiguities and grammatical issues and suggests formatting improvements!
    Try it out here: https://app.patronus.ai
    Read the docs: https://lnkd.in/eNn-7BqS
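As a rough illustration of what linting natural-language criteria can look like (this toy checker is not the Criteria Copilot), one can flag vague qualifiers that make a criterion hard to judge consistently:

```python
# Toy ambiguity check for evaluation criteria: vague qualifiers like
# "good" or "appropriate" leave an LLM judge too much room to interpret.
VAGUE_TERMS = {"good", "bad", "appropriate", "reasonable", "etc", "nice"}

def flag_ambiguities(criterion: str) -> list[str]:
    """Return the vague words that make a criterion hard to apply."""
    words = [w.strip(".,!?").lower() for w in criterion.split()]
    return sorted(set(words) & VAGUE_TERMS)

flag_ambiguities("The answer should be good and use an appropriate tone.")
# → ['appropriate', 'good']
flag_ambiguities("Cite the exact revenue figure from the 10-K.")
# → []
```

A production copilot would of course use a language model rather than a word list, but the goal is the same: turn fuzzy criteria into ones a judge can apply the same way every time.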

  • Patronus AI

    5,340 followers

    Introducing Lynx v2.0, an 8B state-of-the-art RAG hallucination detection model! Since we released Lynx v1.1 a few months ago, hundreds of thousands of developers have used it for real-time RAG hallucination detection in all kinds of real-world applications. Now, Lynx v2.0 is even better, trained on long-context data from real-world domains like finance and medicine.
    - Beats Claude-3.5-Sonnet on HaluBench by 2.2%
    - 3.4% higher accuracy than Lynx v1.1 on HaluBench
    - Optimized for long-context use cases
    - Detects 8 types of common hallucinations, including Coreference Errors, Calculation Errors, CoT hallucinations, and more!
    Use Lynx 2.0 with any of our Day 1 integration partners like NVIDIA, MongoDB, and Nomic AI. And that's our 10th day of Christmas at Patronus AI. 2 more to go!
    Try it out with the Patronus API: https://app.patronus.ai
    Read the docs: https://lnkd.in/e-rMzMe8
    Read the Lynx arXiv paper: https://lnkd.in/eznVjrWA
    Read the Lynx blog: https://lnkd.in/eYaP5Zpe
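Hallucination judges like Lynx are typically prompted with the question, the retrieved document, and the answer, and return a structured verdict. The JSON shape below (a REASONING string plus a PASS/FAIL SCORE) follows the format described in the Lynx paper, but treat the prompt wording as an assumption and check the model card for the exact template:

```python
import json

def build_faithfulness_prompt(question, document, answer):
    """Hypothetical Lynx-style prompt for a RAG faithfulness check."""
    return (
        "Given the QUESTION, DOCUMENT and ANSWER, decide whether the "
        "ANSWER is faithful to the DOCUMENT. Respond in JSON with keys "
        '"REASONING" and "SCORE" ("PASS" or "FAIL").\n\n'
        f"QUESTION: {question}\nDOCUMENT: {document}\nANSWER: {answer}\n"
    )

def parse_verdict(model_response: str) -> bool:
    """True if the judge scored the answer as faithful (PASS)."""
    return json.loads(model_response)["SCORE"] == "PASS"

# Parsing a mocked judge response (a real call would send the prompt
# to the model and parse its completion):
parse_verdict('{"REASONING": "The figure matches the filing.", "SCORE": "PASS"}')
# → True
```

Because the verdict is machine-readable, this pattern slots directly into real-time pipelines: generate, judge, and gate the response before it reaches the user.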

  • Patronus AI

    5,340 followers

    On the 9th day of Christmas, we are announcing… 360 Degree Human Annotation! We have always believed that AI evaluation should involve both AI and humans in the loop. Humans are the gold standard: while our state-of-the-art automated evaluators can catch a variety of failures, humans are great at uncovering new trends and problems while reviewing those failures. For example, many customers tell us that studying hallucinations flagged by Lynx often leads to the discovery of entirely new failure modes. That's why we're introducing 360 Degree Human Annotation, a new way to deepen AI-human collaboration during evaluation:
    - Define Annotation Criteria directly in the UI
    - Rate your AI product's outputs using feedback categories like Discrete and Continuous
    - Add detailed explanations to your annotations for extra depth and insight
    Try it out here: https://app.patronus.ai
    Read the docs: https://lnkd.in/er5rP6iB
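A minimal data model for annotations like these pairs a criterion with either a discrete label or a continuous score, plus a free-text explanation. The field names here are assumptions for illustration, not the Patronus schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    """Illustrative human-annotation record (hypothetical field names)."""
    criterion: str                            # e.g. "brand voice"
    discrete_label: Optional[str] = None      # e.g. "pass" / "fail"
    continuous_score: Optional[float] = None  # e.g. 0.0 to 1.0
    explanation: str = ""                     # free-text reviewer rationale

a = Annotation(
    criterion="brand voice",
    discrete_label="fail",
    explanation="Tone is too casual for a support reply.",
)
```

Keeping the explanation alongside the label is what makes the human pass useful downstream: the free text is where new failure modes tend to surface.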

  • Patronus AI

    5,340 followers

    On the 8th day of Christmas, we are announcing… Excessive Agency Test Suite! As AI agents become more popular, we frequently hear that developers want to limit agent permissions and scope in order to guard against end users attempting fraud and scams. Developers don't want their agents accessing tools or taking autonomous actions that could put everything at risk. The OWASP Top 10 LLM Vulnerabilities list comprehensively captures security failure modes, and Excessive Agency, at #6 on the list, is an important one to address during the agent build phase. That's why we are releasing the Excessive Agency Test Suite. Our research team developed powerful generator models to create this comprehensive test suite and found that these new tests have high attack success rates against AI agents. You can search for "owasp-llm06-excessive-agency" in Patronus Datasets to view and download the dataset, or access it remotely in code using the Patronus SDK.
    Try it out here: https://app.patronus.ai
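The permission-scoping idea behind excessive-agency testing can be sketched as a simple allowlist gate on an agent's tool calls. This is an illustration of the mitigation the post describes, not the Patronus test suite or SDK:

```python
# Excessive agency (OWASP LLM06) arises when an agent can invoke tools
# beyond its intended scope. A deny-by-default allowlist is the simplest
# guardrail: anything not explicitly approved is refused.
ALLOWED_TOOLS = {"search_docs", "get_order_status"}  # agent's approved scope

def authorize_tool_call(tool_name: str, allowed=ALLOWED_TOOLS) -> bool:
    """Reject any tool call outside the agent's approved scope."""
    return tool_name in allowed

# A benign, in-scope call passes; a high-risk, out-of-scope call is refused
# even if a prompt-injected user convinces the model to attempt it.
authorize_tool_call("get_order_status")  # → True
authorize_tool_call("issue_refund")      # → False (excessive agency)
```

Test suites like the one announced here probe whether adversarial inputs can talk the agent past exactly this kind of boundary.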

  • Patronus AI

    5,340 followers

    On the 7th day of Christmas, we are announcing... SOC-2 Type 1 Compliance! Not much else to say here, except that your data will always be safe and secure with us. More on compliance coming next year!

  • Patronus AI

    5,340 followers

    On the 6th day of Christmas, we are announcing… Patronus Comparisons 2.0! When customers start fixing issues with their LLM systems after using our evaluators, they usually want to know whether those fixes actually hold up against unseen data. AI engineers run evals all the time, experimenting over and over with new prompts, system parameters, fine-tuned LLMs, or LLM providers. But hill climbing only works if you know you're actually going up the hill! We built Patronus Comparisons for exactly this reason. Developers can visually compare the performance of their agents and RAG systems across time periods, offline vs. online settings, LLMs, evaluators, and more. With 2.0, we rolled out a redesign along with new features like side-by-side sample comparisons, saved views with customizable filters, exportability, and more visualizations.
    Try it out here: https://lnkd.in/gXfHvyww
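Under the visualizations, the core of such a comparison is just per-evaluator pass-rate deltas between two eval runs. A minimal sketch with hypothetical data (not the Patronus API):

```python
def pass_rate(results):
    """Fraction of samples marked passing (1) in one eval run."""
    return sum(results) / len(results)

def compare_runs(baseline: dict, candidate: dict) -> dict:
    """Per-evaluator change in pass rate between two runs.

    Positive deltas mean the candidate run (e.g. a new prompt or
    fine-tuned model) improved on that evaluator; negative means
    regression, i.e. you were not actually going up the hill.
    """
    return {
        name: round(pass_rate(candidate[name]) - pass_rate(baseline[name]), 3)
        for name in baseline
    }

# Hypothetical pass/fail results for two evaluators, before and after a fix.
before = {"hallucination": [1, 0, 0, 1], "brand-voice": [1, 1, 0, 1]}
after  = {"hallucination": [1, 1, 0, 1], "brand-voice": [1, 1, 1, 1]}
compare_runs(before, after)
# → {'hallucination': 0.25, 'brand-voice': 0.25}
```

Side-by-side sample comparisons then drill into the individual items whose verdicts flipped, which the aggregate deltas alone cannot show.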

Similar pages

View jobs

Funding