Braintrust

Software Development

Braintrust is the end-to-end platform for building AI applications

About us

Braintrust is the enterprise-grade stack for building AI products. From evaluations to a prompt playground to data management, we take the uncertainty and tedium out of incorporating AI into your business.

Website
https://braintrustdata.com/
Industry
Software Development
Company size
11-50 employees
Type
Privately held
Founded
2023

Posts

  • Braintrust · 2,816 followers

    We’re thrilled to announce that we've raised a $36M Series A led by Martin Casado at Andreessen Horowitz to advance the future of AI software engineering, bringing our total funding to $45 million. Through our work with top AI engineering and product teams from Notion, Stripe, Vercel, Airtable, Instacart, Zapier, Coda, The Browser Company, and many others, we’ve had a front-row seat to what it takes to build world-class AI products. Along the way, we’ve learned a few key lessons:

    - Crafting effective prompts requires active iteration.
    - Evaluations are crucial for systematically improving quality over time.
    - Production logs provide a vital feedback loop, generating new data points that drive better evaluations.

    Evals are just the first step to building AI apps. That’s why we’re also excited to introduce functions, the flexible primitive for creating prompts, tools, and scorers that sync between your codebase and the Braintrust UI.
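
    For readers unfamiliar with what an eval looks like in code, here is a minimal sketch using the Braintrust Python SDK and the autoevals scorer library; the project name, dataset, and trivial string task are illustrative placeholders standing in for a real LLM call:

        # Minimal Braintrust eval sketch; names and data are illustrative.
        from braintrust import Eval
        from autoevals import Levenshtein

        Eval(
            "Say Hi Bot",  # hypothetical project name
            data=lambda: [
                {"input": "Foo", "expected": "Hi Foo"},
                {"input": "Bar", "expected": "Hi Bar"},
            ],
            task=lambda input: "Hi " + input,  # stand-in for a real LLM call
            scores=[Levenshtein],  # string-similarity scorer from autoevals
        )

    Running this (e.g., with the braintrust eval CLI command) records each row's output and score as an experiment you can inspect in the Braintrust UI.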

  • Braintrust reposted

    Eddie Siegel · CTO/Co-Founder @ Fractional AI - we're hiring!

    Every genAI project involves building evals to define and measure correctness, but it's rarely straightforward. Take our recent AI adtech project: we’re taking heaps of unstructured, inconsistent data and organizing it into a hierarchical taxonomy with LLMs (imagine a usable output like “Food, Beverages & Tobacco > Food Items > Meat, Seafood & Eggs > Meat”). At first glance, you might think, “Easy, just define correctness.” But the reality is far more complex: there are hundreds of thousands of categories, getting the top-level category right is much more critical than perfecting the fourth level of the taxonomy, and there are many dimensions of correctness. So what did we do? Here’s how Joshua Marker and Dan Girellini built custom evals using Braintrust (code included; a sketch of the level-weighting idea follows the link below). https://lnkd.in/gxT-4H-F

    Boosting Reliability with Evaluators (engineering.fractional.ai)
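
    As a concrete illustration of the level-weighting idea from the post above, here is a hypothetical scorer of the kind Braintrust accepts as a plain Python function; the halving weight scheme and the names are assumptions for illustration, not Fractional AI's actual code:

        # Hypothetical level-weighted scorer for hierarchical taxonomy output.
        # Weights are illustrative: each level counts half as much as the one above.
        def taxonomy_path_score(output: str, expected: str) -> float:
            """Score a predicted taxonomy path against the expected one,
            weighting top levels more heavily than deeper ones."""
            pred = [p.strip() for p in output.split(">")]
            true = [p.strip() for p in expected.split(">")]

            depth = max(len(pred), len(true))
            weights = [2.0 ** -i for i in range(depth)]  # 1, 0.5, 0.25, ...

            earned = 0.0
            for i, w in enumerate(weights):
                if i < len(pred) and i < len(true) and pred[i] == true[i]:
                    earned += w
                else:
                    break  # a miss at level i invalidates deeper levels
            return earned / sum(weights)

        # Top three levels match, fourth differs: (1 + 0.5 + 0.25) / 1.875 ≈ 0.93
        print(taxonomy_path_score(
            "Food, Beverages & Tobacco > Food Items > Meat, Seafood & Eggs > Poultry",
            "Food, Beverages & Tobacco > Food Items > Meat, Seafood & Eggs > Meat",
        ))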

  • Braintrust reposted

    Madrona · 26,101 followers

    We are excited to unveil the 2024 Intelligent Applications 40, highlighting this year’s most compelling private companies leveraging artificial intelligence and machine learning. We received over 380 nominations from more than 70 venture investors across 54 top-tier venture and corporate investment firms, and continued our partnership with PitchBook to incorporate its data-driven scoring into the voting process to further enhance the rigor and research behind our decision-making.

    At Madrona, we have spent over a decade partnering with and investing in intelligent application companies, observing their influence and impact across various verticals. These applications are poised to shape the future of software and the next wave of computing. We believe they deserve to be recognized and celebrated for their achievements!

    Read more insights and thoughts about this year's winners and the current AI landscape: https://lnkd.in/g2CjX_rj
    See the full list and methodology here: https://lnkd.in/gqy9z34A

    We're celebrating the #IA40 winners on Oct. 2 at the 2024 #IASummit in partnership with Microsoft, Amazon Web Services (AWS), Delta Air Lines, NYSE, Morgan Stanley, & McKinsey & Company. Request an invite here: https://lnkd.in/gQ4bPc7S

  • Braintrust reposted

    Sanghamitra Deb · AI & ML Leadership

    Last week we had a great experience at an AI dinner hosted by Braintrust and Greylock. Having a round table with fellow AI practitioners was insightful. One of the topics of discussion was “fine-tuning” versus using RAG combined with foundation models available from external vendors. It is indeed interesting to see a lot of the industry move from fine-tuning to using external APIs, as Claude, Gemini, GPT, and Llama models have new and improved versions available. There is definitely a debate. Hemant Jain brought up: is fine-tuning done for most use cases, or will we see a reversal of the trend? Will companies that have been moving away from fine-tuning towards using RAG with external models go back to fine-tuning?

    In domains such as finance or healthcare, where most of the data is not public, fine-tuned models might make sense. In domains where numerical accuracy is not the goal, such as summarization, small, heavily fine-tuned models might lead to great cost savings. Of course, all fine-tuning efforts require a constant source of high-quality data, which can be a challenge for a lot of companies. For a lot of general use cases in Q&A and conversational search, it could be possible to get the desired quality of content with few-shot prompting, RAG, or an agentic framework without fine-tuning. This saves time and gives you the opportunity to experiment with multiple solutions to improve your product. It was great to hear Himanshu Gahlot & Debarag Banerjee talk about their experiences working with agents.

    Now, there is no free lunch. Your ability to create a multitude of solutions for content generation also implies that you need to figure out how to evaluate each of the solutions you generate. You need a rubric that can measure the high quality required for your product, and then you need a supply of SMEs and automated evaluation techniques to identify which solutions should be surfaced to customers. Right now the practice of GenAI product building is more than a year old, and we are at a point where #accountability in #ai generated content has become more important than ever before. Currently a lot of effort is going into creating a framework for evolving quality rubrics and automated evaluations.

    Braintrust spoke about their evaluation platform and encouraged a discussion of “fun” (or painful) evaluation issues faced by the room. My personal favorite is formatting issues, with their never-ending parade of partial solutions such as function calling, none of them foolproof, so your content is rendered incorrectly at least a few percent of the time. What new ideas do you have for improving or correcting the formatting generated by your favorite pesky foundation model? (One common mitigation is sketched below.) #genai #roundtables #aidinner #rag #finetuning #evaluations
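
    On the formatting pain point: a common, vendor-agnostic mitigation is to validate structured output and re-prompt on failure. Here is a minimal sketch, assuming a hypothetical call_model wrapper and a required "answer" field (both placeholders, not any particular vendor's API):

        # Minimal sketch: validate LLM JSON output and retry with the error
        # message on failure. call_model is a hypothetical stand-in for your
        # provider's SDK; the required "answer" field is also an assumption.
        import json

        def call_model(prompt: str) -> str:
            """Hypothetical LLM call; replace with your provider's SDK."""
            raise NotImplementedError

        def get_structured_answer(prompt: str, max_retries: int = 3) -> dict:
            for attempt in range(max_retries):
                raw = call_model(prompt)
                try:
                    data = json.loads(raw)
                    if not isinstance(data, dict) or "answer" not in data:
                        raise ValueError("missing required 'answer' field")
                    return data  # well-formed: a dict with an "answer" key
                except (json.JSONDecodeError, ValueError) as err:
                    # Feed the failure back so the model can fix its formatting.
                    prompt += f"\n\nYour last reply was invalid ({err}). Reply with JSON only."
            raise RuntimeError(f"no valid JSON after {max_retries} attempts")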

