Patronus AI

Technology, Information and Internet

New York, New York · 5,340 followers

Automated AI Evaluation and Security

About us

Patronus AI is the leading automated AI evaluation and security company. Our world-class platform enables enterprise development teams to score LLM performance, generate adversarial test cases, benchmark LLMs, and more. Customers use Patronus AI to detect LLM mistakes at scale and deploy AI products safely and confidently. Founded by machine learning experts from Meta AI and Meta Reality Labs, Patronus AI is on a mission to boost enterprise confidence in generative AI. We are backed by Lightspeed Venture Partners, Replit CEO Amjad Masad, Gokul Rajaram, and Fortune 500 executives and board members.

Website
https://patronus.ai
Industry
Technology, Information and Internet
Company size
11-50 employees
Headquarters
New York, New York
Type
Privately held
Founded
2023

Locations

Patronus AI employees

Posts

  • Patronus AI

    5,340 followers

    Exciting to see Databricks use our eval benchmark FinanceBench to evaluate how well fine-tuning embedding models with synthetic data improves RAG performance! FinanceBench is the industry's first standardized benchmark for LLM performance on financial questions: a large-scale set of 10k question-and-answer pairs based on public filings like SEC 10-Ks. Since its launch, it has been used by thousands of financial institutions, universities, regulatory groups, and leading AI companies around the world. We're thrilled to see Databricks push forward in RAG research, and we at Patronus AI are excited to keep bringing alpha evals to AI teams.
    Read the Databricks blog post: https://lnkd.in/dhqKH_zW
    Download the FinanceBench sample on Hugging Face: https://lnkd.in/emBP3DGu
    Read the FinanceBench arXiv paper: https://lnkd.in/eThVhwVy
    Reach out to us to learn more!
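The workflow above, scoring a RAG system against FinanceBench-style question/answer records, can be sketched in a few lines. The sample records and the stand-in RAG function below are hypothetical placeholders for illustration, not rows from the actual dataset linked above.

```python
# Minimal sketch: exact-match scoring of a RAG system on FinanceBench-style
# records. Field names (question, answer, doc_name) mirror the published
# sample; the rows here are made up for illustration.

def exact_match_accuracy(records, rag_answer_fn):
    """Fraction of questions where the system's answer matches the reference."""
    hits = 0
    for rec in records:
        predicted = rag_answer_fn(rec["question"]).strip().lower()
        if predicted == rec["answer"].strip().lower():
            hits += 1
    return hits / len(records)

sample = [
    {"question": "What was ACME's FY2022 revenue?", "answer": "$10.0B",
     "doc_name": "ACME_2022_10K"},
    {"question": "What was ACME's FY2022 net income?", "answer": "$1.2B",
     "doc_name": "ACME_2022_10K"},
]

# A stand-in "RAG system" that only knows the first reference answer.
answers = {"What was ACME's FY2022 revenue?": "$10.0B"}
acc = exact_match_accuracy(sample, lambda q: answers.get(q, "unknown"))
print(acc)  # 0.5
```

Real use would swap the in-memory `sample` for the full 10k-pair dataset and a fuzzier matcher, since financial answers often differ in formatting rather than substance.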

  • Patronus AI reposted

    Notable Capital

    65,137 followers

    With our first year as Notable Capital behind us, we closed out 2024 with a lot to celebrate! From a number of incredible investments like Patronus AI, Parafin, LocalStack, and more, to portfolio exits from HashiCorp, Ibotta, and Gem Security (acquired by Wiz), to name just a few, 2024 was a year that set the stage for what's ahead. We recently shared our 2024 Year in Review letter with our LPs, highlighting the moments that made this year truly *notable*, some of which you can see here: https://lnkd.in/grqibaDq

  • Patronus AI

    5,340 followers

    Last Thursday marked the end of the 12 Days of Christmas at Patronus AI. In case you missed it, here's a recap of everything we announced:
    Day 1: Automatic Failure Highlighting in LLM Outputs
    Day 2: FinanceBench v1.1
    Day 3: Adaptive Dataset Uploads
    Day 4: 100 Prompt Injections
    Day 5: Patronus Experiments
    Day 6: Patronus Comparisons 2.0
    Day 7: SOC-2 Type 1 Compliance
    Day 8: Excessive Agency Test Suite
    Day 9: 360 Degree Human Annotation
    Day 10: Lynx 2.0
    Day 11: Criteria Copilot
    Day 12: Glider
    More coming soon. But for now, merry Christmas!

  • Patronus AI

    5,340 followers

    Introducing Glider, the smallest model to beat GPT-4o-mini on eval tasks!
    - Open source, open weights, open code
    - Explainable evaluations by nature
    - Trained on 183 criteria and 685 domains
    And that's our 12th day of Christmas at Patronus AI!
    Download Glider on Hugging Face: https://lnkd.in/eud69M8w
    Try out Glider on Patronus for free: https://app.patronus.ai
    Read the arXiv paper: https://lnkd.in/eSnAmZ9g
    Read our blog: https://lnkd.in/ej3a4fME
    Glider demo on Hugging Face Spaces: https://lnkd.in/eRyn3WY8
    Read the VentureBeat coverage by Michael Nuñez: https://lnkd.in/eZ8xrg-2
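Judge models like Glider are typically driven by a structured prompt that carries the pass criteria and a scoring rubric alongside the input/output pair to grade. The section layout below is a hypothetical sketch of that pattern, not Glider's actual template, which ships with the model card:

```python
def build_judge_prompt(user_input, model_output, pass_criteria, rubric):
    """Assemble a rubric-based evaluation prompt for an LLM judge.

    The section headers are illustrative; a real judge model such as
    Glider defines its own template on its model card.
    """
    return (
        "Evaluate the MODEL OUTPUT for the USER INPUT against the "
        "PASS CRITERIA, using the RUBRIC. Explain your reasoning, "
        "then give a score.\n\n"
        f"USER INPUT:\n{user_input}\n\n"
        f"MODEL OUTPUT:\n{model_output}\n\n"
        f"PASS CRITERIA:\n{pass_criteria}\n\n"
        f"RUBRIC:\n{rubric}\n"
    )

prompt = build_judge_prompt(
    user_input="Summarize the quarterly report.",
    model_output="Revenue grew 12% year over year.",
    pass_criteria="The summary must be faithful to the source.",
    rubric="1 = unfaithful, 5 = fully faithful",
)
```

Keeping the criteria and rubric as explicit prompt sections is what makes the resulting evaluations explainable: the judge's reasoning can cite the rubric it was graded against.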

  • Patronus AI

    5,340 followers

    On the 11th day of Christmas, we are announcing… Criteria Copilot! Customers frequently tell us they want their AI products to work well on specific dimensions they care about, like brand voice and topic relevance. Our Judge Evaluators help with exactly this: you can use our LLM judges to test against specific evaluation criteria that you define by hand. Today, we're announcing the Criteria Copilot to make these evaluation criteria easier to define. After you write out your evaluation criteria in natural language, the Criteria Copilot automatically flags semantic ambiguities and grammatical issues and suggests formatting improvements!
    Try it out here: https://app.patronus.ai
    Read the docs: https://lnkd.in/eNn-7BqS
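As a rough illustration of what linting natural-language criteria can look like (this toy checker is not the Criteria Copilot), one can flag vague qualifiers that make a criterion hard to judge consistently:

```python
# Toy ambiguity check for evaluation criteria: vague qualifiers like
# "good" or "appropriate" leave an LLM judge too much room to interpret.
VAGUE_TERMS = {"good", "bad", "appropriate", "reasonable", "etc", "nice"}

def flag_ambiguities(criterion: str) -> list[str]:
    """Return the vague words that make a criterion hard to apply."""
    words = [w.strip(".,!?").lower() for w in criterion.split()]
    return sorted(set(words) & VAGUE_TERMS)

flag_ambiguities("The answer should be good and use an appropriate tone.")
# → ['appropriate', 'good']
flag_ambiguities("Cite the exact revenue figure from the 10-K.")
# → []
```

A production copilot would of course use a language model rather than a word list, but the goal is the same: turn fuzzy criteria into ones a judge can apply the same way every time.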

  • Patronus AI

    5,340 followers

    Introducing Lynx v2.0, an 8B state-of-the-art RAG hallucination detection model! Since we released Lynx v1.1 a few months ago, hundreds of thousands of developers have used it for real-time RAG hallucination detection in all kinds of real-world applications. Now, Lynx v2.0 is even better, trained on long-context data from real-world domains like finance and medicine.
    - Beats Claude-3.5-Sonnet on HaluBench by 2.2%
    - 3.4% higher accuracy than Lynx v1.1 on HaluBench
    - Optimized for long-context use cases
    - Detects 8 types of common hallucinations, including Coreference Errors, Calculation Errors, CoT hallucinations, and more!
    Use Lynx 2.0 with any of our Day 1 integration partners like NVIDIA, MongoDB, and Nomic AI. And that's our 10th day of Christmas at Patronus AI. 2 more to go!
    Try it out with the Patronus API: https://app.patronus.ai
    Read the docs: https://lnkd.in/e-rMzMe8
    Read the Lynx arXiv paper: https://lnkd.in/eznVjrWA
    Read the Lynx blog: https://lnkd.in/eYaP5Zpe
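Hallucination judges like Lynx are typically prompted with the question, the retrieved document, and the answer, and return a structured verdict. The JSON shape below (a REASONING string plus a PASS/FAIL SCORE) follows the format described in the Lynx paper, but treat the prompt wording as an assumption and check the model card for the exact template:

```python
import json

def build_faithfulness_prompt(question, document, answer):
    """Hypothetical Lynx-style prompt for a RAG faithfulness check."""
    return (
        "Given the QUESTION, DOCUMENT and ANSWER, decide whether the "
        "ANSWER is faithful to the DOCUMENT. Respond in JSON with keys "
        '"REASONING" and "SCORE" ("PASS" or "FAIL").\n\n'
        f"QUESTION: {question}\nDOCUMENT: {document}\nANSWER: {answer}\n"
    )

def parse_verdict(model_response: str) -> bool:
    """True if the judge scored the answer as faithful (PASS)."""
    return json.loads(model_response)["SCORE"] == "PASS"

# Parsing a mocked judge response (a real call would send the prompt
# to the model and parse its completion):
parse_verdict('{"REASONING": "The figure matches the filing.", "SCORE": "PASS"}')
# → True
```

Because the verdict is machine-readable, this pattern slots directly into real-time pipelines: generate, judge, and gate the response before it reaches the user.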

  • Patronus AI

    5,340 followers

    On the 9th day of Christmas, we are announcing… 360 Degree Human Annotation! We have always believed that AI evaluation should involve both AI and humans in the loop. Humans are the gold standard: while our state-of-the-art automated evaluators can catch a variety of failures, humans are great at uncovering new trends and problems while reviewing those failures. For example, many customers tell us that studying hallucinations flagged by Lynx often leads to the discovery of entirely new failure modes. That's why we're introducing 360 Degree Human Annotation, a new way to deepen AI-human collaboration during evaluation:
    - Define Annotation Criteria directly in the UI
    - Rate your AI product's outputs using feedback categories like Discrete and Continuous
    - Add detailed explanations to your annotations for extra depth and insight
    Try it out here: https://app.patronus.ai
    Read the docs: https://lnkd.in/er5rP6iB
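A minimal data model for annotations like these pairs a criterion with either a discrete label or a continuous score, plus a free-text explanation. The field names here are assumptions for illustration, not the Patronus schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    """Illustrative human-annotation record (hypothetical field names)."""
    criterion: str                            # e.g. "brand voice"
    discrete_label: Optional[str] = None      # e.g. "pass" / "fail"
    continuous_score: Optional[float] = None  # e.g. 0.0 to 1.0
    explanation: str = ""                     # free-text reviewer rationale

a = Annotation(
    criterion="brand voice",
    discrete_label="fail",
    explanation="Tone is too casual for a support reply.",
)
```

Keeping the explanation alongside the label is what makes the human pass useful downstream: the free text is where new failure modes tend to surface.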

  • Patronus AI

    5,340 followers

    On the 8th day of Christmas, we are announcing… Excessive Agency Test Suite! As AI agents become more popular, we frequently hear that developers want to limit agent permissions and scope in order to guard against end users attempting fraud and scams. Developers don't want their agents accessing tools or taking autonomous actions that could put everything at risk. The OWASP Top 10 LLM Vulnerabilities list comprehensively captures security failure modes, and Excessive Agency, at #6 on the list, is an important one to address during the agent build phase. That's why we are releasing the Excessive Agency Test Suite. Our research team developed powerful generator models to create this comprehensive test suite and found that these new tests have high attack success rates against AI agents. You can search for "owasp-llm06-excessive-agency" in Patronus Datasets to view and download the dataset, or access it remotely in code using the Patronus SDK.
    Try it out here: https://app.patronus.ai
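The permission-scoping idea behind excessive-agency testing can be sketched as a simple allowlist gate on an agent's tool calls. This is an illustration of the mitigation the post describes, not the Patronus test suite or SDK:

```python
# Excessive agency (OWASP LLM06) arises when an agent can invoke tools
# beyond its intended scope. A deny-by-default allowlist is the simplest
# guardrail: anything not explicitly approved is refused.
ALLOWED_TOOLS = {"search_docs", "get_order_status"}  # agent's approved scope

def authorize_tool_call(tool_name: str, allowed=ALLOWED_TOOLS) -> bool:
    """Reject any tool call outside the agent's approved scope."""
    return tool_name in allowed

# A benign, in-scope call passes; a high-risk, out-of-scope call is refused
# even if a prompt-injected user convinces the model to attempt it.
authorize_tool_call("get_order_status")  # → True
authorize_tool_call("issue_refund")      # → False (excessive agency)
```

Test suites like the one announced here probe whether adversarial inputs can talk the agent past exactly this kind of boundary.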

  • Patronus AI

    5,340 followers

    On the 7th day of Christmas, we are announcing... SOC-2 Type 1 Compliance! Not much else to say here, except that your data will always be safe and secure with us. More on compliance coming next year!

  • Patronus AI

    5,340 followers

    On the 6th day of Christmas, we are announcing… Patronus Comparisons 2.0! When customers start fixing issues with their LLM systems after using our evaluators, they usually want to know whether those fixes actually hold up against unseen data. AI engineers run evals all the time, experimenting over and over with new prompts, system parameters, fine-tuned LLMs, or LLM providers. But hill climbing only works if you know you're actually going up the hill! We built Patronus Comparisons for exactly this reason. Developers can visually compare the performance of their agents and RAG systems across time periods, offline vs. online settings, LLMs, evaluators, and more. With 2.0, we rolled out a redesign along with new features like side-by-side sample comparisons, saved views with customizable filters, exportability, and more visualizations.
    Try it out here: https://lnkd.in/gXfHvyww
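Under the visualizations, the core of such a comparison is just per-evaluator pass-rate deltas between two eval runs. A minimal sketch with hypothetical data (not the Patronus API):

```python
def pass_rate(results):
    """Fraction of samples marked passing (1) in one eval run."""
    return sum(results) / len(results)

def compare_runs(baseline: dict, candidate: dict) -> dict:
    """Per-evaluator change in pass rate between two runs.

    Positive deltas mean the candidate run (e.g. a new prompt or
    fine-tuned model) improved on that evaluator; negative means
    regression, i.e. you were not actually going up the hill.
    """
    return {
        name: round(pass_rate(candidate[name]) - pass_rate(baseline[name]), 3)
        for name in baseline
    }

# Hypothetical pass/fail results for two evaluators, before and after a fix.
before = {"hallucination": [1, 0, 0, 1], "brand-voice": [1, 1, 0, 1]}
after  = {"hallucination": [1, 1, 0, 1], "brand-voice": [1, 1, 1, 1]}
compare_runs(before, after)
# → {'hallucination': 0.25, 'brand-voice': 0.25}
```

Side-by-side sample comparisons then drill into the individual items whose verdicts flipped, which the aggregate deltas alone cannot show.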

Similar pages

View jobs

Funding