Quotient AI

Software Development

Boston, Massachusetts · 2,316 followers

Build better AI products fast.

About us

Build better AI products fast.

Website
www.quotientai.co
Industry
Software Development
Company size
2-10 employees
Headquarters
Boston, Massachusetts
Type
Privately Held

Locations

Employees at Quotient AI

Updates

  • Quotient AI reposted this

    View Julia Neagu's profile

    CEO & Co-Founder | Quotient AI

    “When we first started hooking Copilot Chat in, we realized we’d get everything under the sun—people asking for random stuff that had nothing to do with code. We had billions of requests, so we had to cluster the logs just to figure out what was actually happening. That’s how we discovered real usage patterns—and that’s how we got serious about building our eval harness [...] At the end of the day, without evaluations, you’re flying completely blind. If you can’t measure it, you can’t improve it.” Check out Freddie Vargus and Reid Mayo from OpenPipe dropping some knowledge in the latest podcast.
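    Julia's clustering step is straightforward to prototype. A minimal sketch, assuming sentence-transformers embeddings and k-means; the actual GitHub pipeline is not public, and the model choice here is illustrative:

    ```python
    # Sketch: cluster raw chat requests by embedding to surface usage patterns.
    # Assumes sentence-transformers and scikit-learn are installed.
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    def cluster_requests(requests: list[str], n_clusters: int = 20) -> dict[int, list[str]]:
        """Group raw user requests into rough usage-pattern clusters."""
        embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(requests)
        labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)
        clusters: dict[int, list[str]] = {}
        for label, request in zip(labels, requests):
            clusters.setdefault(int(label), []).append(request)
        return clusters
    ```

    Inspecting the largest clusters first reveals the dominant usage patterns; each cluster is then a candidate slice for a targeted eval set.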

    View Reid Mayo's profile

    Founding AI Engineer @ OpenPipe (YC23) | The End-to-End LLM Fine-tuning Platform for Developers

    Engineers who know me know I’ve been on an evals kick for a few months now – interviewing top founders, staff AI engineers, and thought leaders in the space. I’ve traveled all over the US going to AI conferences back to back, and I keep doubling down on the evals topic for one practical reason: it continues to pop up in conversation after conversation as the single most challenging problem in the applied AI engineering space. This was my experience in late 2023 – and it’s still true in 2025. For this reason I’m incredibly excited to announce my interview with one of the leading minds in evals – Freddie Vargus. Freddie Vargus (and his co-founder Julia Neagu) led the team that built the evals for the first significant LLM-backed product post-ChatGPT. You know, the one whose name defined all "human in the loop" AI products ever since? GitHub Copilot. So they’ve been deeply serious about this topic for years. After GitHub they decided to go all-in by founding the evals company Quotient AI. Their mission since has been to make SOTA evals techniques accessible for builders (who want to get sh*t done, but in a way that doesn’t compromise the future of their tech).

    Key insights from our convo:
    - CIAI (Continuous Improvement of AI): Audit usage logs to surface gaps in your knowledge base or other canonical sources of ground truth, then patch those gaps to systematically improve agent/copilot quality. (Jared Scheel knows all about this.)
    - Monitor outcome distributions: Map real-world output distributions to expected output distributions to surface potential issues. Agent has four tools but calls one 99% of the time? Look into that. (A sketch of this check follows below.)
    - Evals ARE your product: Measuring and monitoring quality is more than just “tech debt reduction.” Unless you are OpenAI, Anthropic, DeepSeek, or some other SOTA lab building foundational AI, the fundamental value of your GenAI product is its ability to STEER foundational AI and ALIGN IT to your end customer’s needs. Evals are critical to both.
    - Two-week evals sprint: Bootstrapping evals for a project can feel daunting. Take (balanced) action by predefining the evals objectives/tasks you will execute on, and set hard deadlines to avoid quagmires.
    - Evaluate subcomponents: Don’t just evaluate final outputs; isolate and test retrieval pipelines, tool calls – everything upstream of the final output has potential side effects on the output.

    Freddie is a hardcore technical founder and he’s unusually hardcore on this topic – don’t miss it. https://lnkd.in/gHUx3wbD
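    A minimal illustration of the outcome-distribution check above, assuming a hypothetical log schema; the tolerance threshold and tool names are invented for the example:

    ```python
    # Sketch: flag tools whose observed call share deviates from expectations.
    # The log format ({"tool": ...}) and the tolerance are assumptions.
    from collections import Counter

    def tool_call_skew(logs: list[dict], expected: dict[str, float],
                       tolerance: float = 0.25) -> list[str]:
        """Return tools whose observed call share strays from the expected share."""
        counts = Counter(entry["tool"] for entry in logs)
        total = sum(counts.values())
        flagged = []
        for tool, share in expected.items():
            observed = counts.get(tool, 0) / total if total else 0.0
            if abs(observed - share) > tolerance:
                flagged.append(f"{tool}: expected ~{share:.0%}, observed {observed:.0%}")
        return flagged

    # An agent with four tools that should see roughly even use:
    logs = [{"tool": "search"}] * 99 + [{"tool": "calculator"}]
    print(tool_call_skew(logs, {t: 0.25 for t in ("search", "calculator", "code", "docs")}))
    ```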

  • Quotient AI reposted this

    View Julia Neagu's profile

    CEO & Co-Founder | Quotient AI

    I'm excited to launch Evaluations in Quotient's Python SDK. Whether you're testing different models, iterating on prompts, or validating outputs against ground-truth data, our platform makes evaluating AI applications so easy it's a no-brainer to integrate into your routine developer workflows.

    Why did we build this? Talking with engineers from small AI-native startups to large Fortune 500s, we kept hearing the same challenge: the barrier to setting up a reliable evaluation workflow is so high that most prefer to circumvent that process entirely and test manually or in production. This leads to unpredictable performance, undetected regressions, and ultimately, poor user experiences.

    We believe developers should focus on building amazing products, not babysitting infrastructure. We've been committed from day one to making comprehensive evaluation as simple as possible - just a few lines of code and a few minutes. Behind the scenes, our distributed infrastructure handles all the heavy lifting asynchronously, so you can get back to what matters.

    Best of all? This is all included in our comprehensive free tier: 10,000 evaluation rows / month, with inference costs included to get you started.

    Want to see Quotient Evaluations in action? We've published a practical cookbook showing how we used our SDK to evaluate OpenAI's o1 and DeepSeek R1 models on tax-related questions. Find the release announcement and cookbook in the comments.

    Curious to see how Quotient can transform your AI development workflow? Let's chat! Email me directly at julia @ quotientai.co.
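    The post doesn't include code, so here is a generic sketch of the workflow it describes - scoring several models' outputs against shared ground truth - in plain Python rather than Quotient's actual SDK:

    ```python
    # Generic evaluation-harness sketch (not Quotient's SDK API):
    # mean exact-match score per model over a shared ground-truth set.
    def exact_match(prediction: str, reference: str) -> float:
        return float(prediction.strip().lower() == reference.strip().lower())

    def evaluate(outputs: dict[str, list[str]], references: list[str]) -> dict[str, float]:
        """outputs maps a model name to its answers, aligned with references."""
        return {
            model: sum(exact_match(p, r) for p, r in zip(preds, references)) / len(references)
            for model, preds in outputs.items()
        }

    # e.g. evaluate({"o1": o1_answers, "deepseek-r1": r1_answers}, gold_answers)
    ```

    A hosted service adds the parts this sketch omits: running inference, storing runs, and doing it all asynchronously at scale.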

  • Quotient AI reposted this

    View Julia Neagu's profile

    CEO & Co-Founder | Quotient AI

    Just spotted something that made my day: Miguel from On Project Labs in Spain made a kick-ass video about Quotient AI! Miguel described how Quotient helps him solve real challenges in his AI development process: finding the right prompts, comparing different models, and making smart decisions about which providers to use. He even demonstrated how he used it to generate LinkedIn posts! It's so exciting to see Quotient being discovered and shared organically by developers across the globe. Building tools that actually solve problems for people is what it's all about!

    View Miguel García Tenorio's profile

    AI engineer

    Tools that will save your life: Part 1 - Quotient. Whenever I build an application that involves an LLM or a call to OpenAI, I have always run into these problems: What is the best prompt I could use to get what I need? Which model should I use? Which provider is better - OpenAI or Anthropic? gpt-4o vs gpt-4 vs gpt-XXXX? Until now I have never really had the answer, and deciding which prompt is optimal, or which model gives the best quality for the price, is an endless process of trial and error. But a few weeks ago, while researching possible solutions, I found this incredible tool. It's called Quotient AI, and it lets you evaluate all of this in a very simple way. Among other things, it offers:
    - The ability to test the same prompt against different models with a single click, to see how each one behaves
    - Feedback on responses: based on your feedback, it analyzes your prompt and proposes an improved version
    - Time and cost analysis of each model for your prompt
    - Prompt storage: a prompt repository that can be shared with your team and integrated into your application through its SDK
    In short, this tool is becoming indispensable for me: it lets me quickly test my prompts and decide which model and which prompt to use in each case. I tried it with a prompt for generating LinkedIn posts, so you can see how simple the tool is and how much value it can add. Do you know of any similar tools? #promptengineering #OpenAi #Anthropic #Evals #InteligenciaAritficial #AI #IA #ChatGPT #GPT #AIengineer #Automation
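    For context, the manual comparison Miguel describes - one prompt, several providers, timing each call - looks roughly like the sketch below; model names are illustrative and API keys are assumed to be set in the environment:

    ```python
    # Sketch: compare one prompt across providers by hand, timing each call.
    # Requires OPENAI_API_KEY and ANTHROPIC_API_KEY; model names are illustrative.
    import time

    import anthropic
    from openai import OpenAI

    def compare(prompt: str) -> None:
        start = time.perf_counter()
        gpt = OpenAI().chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"openai ({time.perf_counter() - start:.1f}s):",
              gpt.choices[0].message.content)

        start = time.perf_counter()
        claude = anthropic.Anthropic().messages.create(
            model="claude-3-5-haiku-latest",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"anthropic ({time.perf_counter() - start:.1f}s):",
              claude.content[0].text)

    compare("Write a LinkedIn post announcing an open-source evals library.")
    ```

    Doing this by hand for every prompt change is exactly the trial-and-error loop a tool like Quotient replaces.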

  • Quotient AI reposted this

    View Julia Neagu's profile

    CEO & Co-Founder | Quotient AI

    Curious about how AI search engines like Perplexity, Exa, and Google's Gemini measure up? Check out our new Hugging Face cookbook that walks you through systematically evaluating and comparing their outputs using our open-source judges library. This comprehensive tutorial provides step-by-step guidance, real-world examples, and practical tips on using LLM-as-a-judge. Link in comments.
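    As a generic illustration of the LLM-as-a-judge pattern the cookbook applies (not the `judges` library's actual API), a pairwise comparison of two search engines' answers can be as simple as:

    ```python
    # Generic pairwise LLM-as-a-judge sketch (not the `judges` library API).
    from openai import OpenAI

    JUDGE_PROMPT = """You are comparing two answers to the same question.
    Question: {question}
    Answer A: {answer_a}
    Answer B: {answer_b}
    Which answer is more accurate and complete? Reply with exactly "A" or "B"."""

    def pairwise_judge(question: str, answer_a: str, answer_b: str) -> str:
        response = OpenAI().chat.completions.create(
            model="gpt-4o",  # judge-model choice is an assumption
            temperature=0,   # keep verdicts as deterministic as possible
            messages=[{"role": "user", "content": JUDGE_PROMPT.format(
                question=question, answer_a=answer_a, answer_b=answer_b)}],
        )
        return response.choices[0].message.content.strip()
    ```

    To blunt position bias, run each pair twice with A and B swapped and keep only verdicts that agree.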

  • Quotient AI reposted this

    View Databricks' organization page

    884,353 followers

    Keep your dataset a secret – even from yourself. That’s one of the “five rules of evaluations” from Julia Neagu, CEO & Co-founder of Quotient AI, for bringing AI to production. Explore more tips for deploying large-scale AI – including the best tools and infrastructure, methods for reducing bias, and human-in-the-loop systems – in the latest episode of Data Brew, hosted by Brooke Wenig and Denny Lee. https://lnkd.in/gV6KijTt

  • Quotient AI reposted this

    Julia Neagu puts the notion of 'vibe development' to rest in the latest episode of Data Brew. Discover the rules of AI evaluations for bringing products to production in this captivating conversation. Watch the full episode on your favorite podcast platform:
    Apple: https://hubs.ly/Q02Tt0qb0
    Spotify: https://hubs.ly/Q02TsYxX0
    YouTube: https://lnkd.in/emSpKpuH
    Data Brew by Databricks is hosted by Brooke Wenig and Denny Lee

  • View Quotient AI's organization page

    2,316 followers

    Our CEO Julia Neagu chatted with Brooke Wenig and Denny Lee from Databricks about shipping the best AI products.

  • Quotient AI reposted this

    View Julia Neagu's profile

    CEO & Co-Founder | Quotient AI

    We've open-sourced two tools to make LLM evaluation straightforward:
    - judges: A collection of research-backed prompts that use LLMs to evaluate other LLMs. Start evaluating your models immediately with battle-tested approaches.
    - autojudge: Build your own evaluators that match your team's standards. Feed in your labeled data and feedback, get back evaluator prompts aligned with how your team thinks about quality.
    There's no need to reinvent the wheel – start with proven LLM-as-a-judge prompts, grow into custom ones tailored to your use case.
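    autojudge's exact interface isn't shown in the post, so here is a conceptual sketch of the idea as described - folding labeled examples and feedback into a few-shot judge prompt - with all names hypothetical:

    ```python
    # Conceptual sketch of the autojudge idea (not its actual API):
    # turn labeled outputs plus human feedback into a few-shot judge prompt.
    def build_judge_prompt(task: str, labeled_examples: list[dict]) -> str:
        """labeled_examples: [{"output": ..., "label": "good"|"bad", "feedback": ...}]"""
        shots = "\n\n".join(
            f"Output: {ex['output']}\nVerdict: {ex['label']}\nReason: {ex['feedback']}"
            for ex in labeled_examples
        )
        return (
            f"You are grading outputs for this task: {task}\n"
            f"Apply the standards shown in these graded examples:\n\n{shots}\n\n"
            "Now grade the next output the same way.\n"
            "Reply with 'good' or 'bad' and a one-sentence reason.\n"
            "Output: {output}\nVerdict:"
        )
    ```

    The returned template is then filled with each new output and sent to a judge model; the labeled shots are what align its verdicts with the team's standards.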

  • View Quotient AI's organization page

    2,316 followers

    We've released `judges` - an OSS library of SOTA, research-backed evaluators for common use cases like hallucination, harmfulness, and empathy.

    View Freddie Vargus's profile

    co-founder & cto — Quotient AI

    Today we're releasing judges, our new open-source library of LLM-as-a-judge evaluators. judges contains a curated set of evaluators, backed by published research, to bootstrap your LLM projects. Use them out of the box or as a foundation for your own custom evaluators.

    Why judges? We've spoken to a lot of folks who want to use LLM-as-a-judge but don't know where to start. judges is here to help! These evaluators are not silver bullets, but they're good starting points and sources of inspiration for building your own. Give it a try and let us know what you think! https://lnkd.in/e2GWN_CR - we also invite people to request improvements, suggest papers, and report bugs on our GitHub.

    What's next: we want to integrate with more models, and we're keeping track of single- and multi-step research for evaluators. We also want to hear what use cases people have for LLM-as-a-judge, and we're looking to integrate our work from SMELL to make it easier for people to bootstrap custom LLM judges (blog: https://lnkd.in/e6wWBqks).

  • Quotient AI reposted this

    View Qdrant's organization page

    34,689 followers

    RAG Systems: Build It, Then Break It (For Science). Building a RAG system is just the beginning. The real challenge? Making sure it performs consistently under pressure. Missed retrievals? Hallucinated answers? Poorly optimized pipelines? These are the bottlenecks that turn promising systems into frustrating ones. In our latest blog, we break down what it takes to go from “it works” to “it works well”:
    - Spot retrieval blind spots using precision metrics and relevance testing.
    - Fine-tune embedding strategies to ensure accurate context is passed to your LLM.
    - Measure and reduce hallucination rates, so your LLM generates grounded, factual responses.
    We’re talking real frameworks (Ragas, Quotient AI, Arize AI Phoenix), specific fixes for underperforming pipelines, and concrete metrics like NDCG and Recall to measure success. Learn how to evaluate your RAG system: https://lnkd.in/ez2DEEhj
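    The two retrieval metrics named above are easy to state precisely. A reference sketch for a single query, in pure Python with no framework:

    ```python
    # Recall@k and NDCG@k for a single query, as plain reference implementations.
    import math

    def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
        """Fraction of the relevant documents that appear in the top-k results."""
        hits = sum(1 for doc in retrieved[:k] if doc in relevant)
        return hits / len(relevant) if relevant else 0.0

    def ndcg_at_k(retrieved: list[str], relevance: dict[str, float], k: int) -> float:
        """Discounted cumulative gain of the ranking, normalized by the ideal ranking."""
        dcg = sum(
            relevance.get(doc, 0.0) / math.log2(rank + 2)
            for rank, doc in enumerate(retrieved[:k])
        )
        ideal = sum(
            rel / math.log2(rank + 2)
            for rank, rel in enumerate(sorted(relevance.values(), reverse=True)[:k])
        )
        return dcg / ideal if ideal else 0.0

    print(recall_at_k(["d1", "d3", "d9"], relevant={"d1", "d2"}, k=3))  # 0.5
    print(ndcg_at_k(["d1", "d3"], {"d1": 3.0, "d2": 2.0}, k=2))         # ~0.70
    ```

    Averaging these over a query set gives the pipeline-level numbers you can track across retrieval configurations.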

