We're #hiring a new Founding Software Engineer in Mountain View, California. Apply today or share this post with your network.
Bespoke Labs
Data Infrastructure and Analytics
Mountain View, California · 1,020 followers
Bespoke Labs is a venture funded startup creating AI tools for data curation and post-training LLMs. (We are hiring!)
About us
Data curation and Small Specialized Models using Generative AI.
- Website
- https://bespokelabs.ai/
- Industry
- Data Infrastructure and Analytics
- Company size
- 2-10 employees
- Headquarters
- Mountain View, California
- Type
- Privately held
Locations
-
Primary
800 W El Camino Real
Mountain View, California 94040, US
Bespoke Labs employees
-
Mahesh (Maheswaran) Sathiamoorthy
Founder of Bespoke Labs. Ex-Google DeepMind
-
Alex Dimakis
Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for…
-
Negin Raoof
Graduate Student Researcher @ UT Austin
-
Ryan Marten
AI Researcher
Updates
-
Bespoke Labs is excited to contribute to Evalchemy, an open-source platform for LLM evaluation. The problem: running popular evals for an LLM, such as MMLU, MTBench, WildBench, RepoBench, IFEval, and AlpacaEval, requires you to download and install more than 10 repos, each with different dependencies and issues. This is, as you might expect, an actual nightmare. Many LM benchmarks are not optimized for performance and cost, and can take dozens of hours to compute. Evalchemy can run the full battery of benchmarks 3x faster than previous repos, thanks to parallelism optimizations in our implementation. It also offers easy installation and a consistent platform to run benchmarks and track results on a leaderboard. We also support adding your own custom benchmarks and leaderboards. https://lnkd.in/gZQG9ZTY We hope that the open-source community will help us develop this library into a convenient evaluation tool for AI engineers. Please tell us about your favorite benchmarks or features and we can add them!
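For a rough sense of the consolidation Evalchemy aims for, here is a minimal, hypothetical sketch of running several independent benchmarks behind one interface with thread-level parallelism. The function names and result format are illustrative only and are not Evalchemy's actual API; see the linked repo for the real usage.

```python
# Hypothetical sketch: a single entry point that fans out several benchmark
# runs in parallel. Names like `run_benchmark` are illustrative placeholders,
# not Evalchemy's real API.
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(name: str, model_id: str) -> dict:
    """Placeholder for one benchmark run (e.g. MMLU, IFEval, AlpacaEval)."""
    # In a real harness this would dispatch to the benchmark's own
    # evaluation code and return aggregate metrics.
    return {"benchmark": name, "model": model_id, "score": None}


def evaluate(model_id: str, benchmarks: list[str], workers: int = 4) -> list[dict]:
    # Independent benchmarks can run concurrently, which is where most of
    # the wall-clock savings come from.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_benchmark, b, model_id) for b in benchmarks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    results = evaluate("my-model", ["MMLU", "MTBench", "IFEval", "AlpacaEval"])
    for row in results:
        print(row)
```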
-
Bespoke Labs reposted this
Alex Dimakis · Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for post-training (we are hiring).
Very happy about the news that our paper "Which questions should I answer? Salience Prediction of Inquisitive Questions" received an Outstanding Paper Award at EMNLP 2024. Congratulations to Yating, Ritika, and the whole team. #EMNLP2024 The paper is available online: https://lnkd.in/gNQBenZ6
-
Bespoke Labs reposted this
Alex Dimakis · Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for post-training (we are hiring).
AI monoliths vs. the Unix philosophy: the case for Small Specialized Models.
The current thinking in AI is that AGI is coming, and that one gigantic model will be able to reason and solve business problems ranging from customer support to product development. Currently, agents are basically big system prompts on the same gigantic model. Through prompt engineering, AI builders are trying to plan and execute complex multi-step processes. This is not working very well.
This monolith view of AI is in sharp contrast to how we teach engineers to build systems. When multiple people have to build complex systems, they should build specialized, modular components. This makes systems reliable and helps large teams coordinate around specs that are easy to explain, engineer, and evaluate. Monolithic gigantic AI systems are also extremely wasteful in terms of energy and cost: using GPT-4o as a summarizer, fact checker, or user-intent detector reminds me of the early days of the big data wave, when people were spinning up Hadoop clusters to process 1 GB of data.
Instead, I would like to make the case for Small Specialized Models following the Unix philosophy guidelines:
1. Write programs that do one thing and do it well.
2. Write programs to work together.
3. Write programs to handle text streams, because that is a universal interface.
Now replace "programs" with "AI models." I believe the best way to engineer AI systems will be to use post-training to specialize small Llama models for narrow, focused jobs. "Programming" these small specialized models will be done by creating post-training datasets. These datasets will be created by transforming internal data: prompting big foundation models and then distilling them through post-training. This is similar to "Textbooks Are All You Need," but for narrow jobs like summarization and legal QA, as opposed to building general-purpose small models. Several papers have shown that it is possible to create post-training datasets by prompting big models and to build small specialized models that are faster and also outperform their big teachers on narrow tasks.
Creating small specialized models is currently hard. Evaluation, post-training data curation, and fine-tuning are tricky, and better tools are needed. Still, it's good to go back to the Unix philosophy to inform our future architectures.
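As a concrete illustration of the distillation recipe described in the post above, here is a minimal sketch that prompts a large teacher model to produce a narrow post-training dataset (summarization in this example). It assumes the `openai` Python client and an API key; the helper names, output file, and the downstream fine-tuning step are hypothetical, not a specific Bespoke Labs pipeline.

```python
# Minimal sketch of the distillation recipe: prompt a large teacher model on
# internal documents to build a narrow post-training dataset (here,
# summarization), then fine-tune a small model on the result elsewhere.
# Assumes the `openai` Python client and an API key are available.
import json
from openai import OpenAI

client = OpenAI()


def teacher_summary(document: str) -> str:
    """Ask the big teacher model to do the narrow job we want to distill."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize the document in 3 sentences."},
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content


def build_dataset(documents: list[str], out_path: str = "summarization_sft.jsonl") -> None:
    """Write (instruction, input, output) records suitable for supervised fine-tuning."""
    with open(out_path, "w") as f:
        for doc in documents:
            record = {
                "instruction": "Summarize the document in 3 sentences.",
                "input": doc,
                "output": teacher_summary(doc),
            }
            f.write(json.dumps(record) + "\n")

# The resulting JSONL can then be fed to any SFT pipeline (e.g. a small Llama
# model) to produce a specialized summarizer that is cheaper than the teacher.
```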
-
Bespoke Labs is excited to support the DataComp community and related open-source AI efforts with curation tools, datasets, and compute.
Alex Dimakis · Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for post-training (we are hiring).
Wow, I just realized that our DataComp datasets had 800k downloads last month on Hugging Face! Excited to see this project come so far. (If you don't know it already, DataComp is the largest public multimodal dataset of images and captions.)
-
Quite the speaker list for the Metadata & AI Summit.
Learn about the hottest trends, biggest challenges, and best solutions around metadata and AI from a STAR-STUDDED speaker lineup starting tomorrow at the 2024 Metadata & AI Summit! Register here: https://lnkd.in/enwxCq-p
Apple - Deepak Chandramouli, Ravi Sharma, Satish Kotha
Netflix - Alicia J., Ashwin Iyer, Kevin C.
Meta - Raghotham Murthy
Slack - Nedra Albrecht
Pinterest - Deepak Agarwal
LinkedIn - Raghavan Muthuregunathan
Microsoft - Sadid Hasan
RunLLM - Joseph Gonzalez
Deutsche Telekom Digital Labs - Shashidhar Singhal
Checkout - Matthew Coudert
Accenture - Teresa Tung
DeepLearning - Joe Reis
Kraft Heinz - Jeffrey Tackes
Generationship - Michelle Yi
UC Berkeley - Joe Hellerstein
Bespoke Labs - Alex Dimakis
Grab - Harvey LI
Merck - Dr. Harsha Gurulingappa
Star Tree - Chinmay Soman
Acryl Data & DataHub - Shirshanka Das, Maggie Hays
We hope to see you there!
-
Stellar panel: our Chief Scientist Alex Dimakis will be presenting on the struggles of moving AI from research to production at a panel at #MetadataAISummit2024 with Joe Hellerstein, Teresa Tung, and Deepak Agarwal, hosted by Shirshanka Das.
Why do enterprise AI initiatives often struggle to move from research to production? And more importantly, how can we bridge this gap effectively? I'm excited to moderate a stellar panel at #MetadataAISummit2024 featuring experts who are at the cutting edge and have successfully taken AI from research to production, repeatedly:
- Teresa Tung (Senior Managing Director, Accenture - leading AI transformation initiatives)
- Joe Hellerstein (Jim Gray Professor of CS, UC Berkeley - pioneer in distributed systems and databases)
- Deepak Agarwal (Chief AI Officer & VP, Pinterest - at the forefront of Internet-scale AI for more than a decade)
- Alex Dimakis (Co-founder & Chief Scientist, BespokeLabsAI - leading researcher in ML systems)
We'll mix theory and practical solutions, and opine about the future. Stuff like:
- How to stop AI models from going off the rails
- Proven governance frameworks
- Why metadata matters and how to collect it cheaply
- Balancing innovation with safety and reliability considerations
- Battle-tested scaling strategies
Whether you're an optimist who can't wait to have AI take over our daily lives, or a pessimist worried about how we can safely use AI in production, you're invited to listen in!
When: Oct 29, 2024 (Tuesday), 1:40 - 2:40pm EDT
Register: https://lnkd.in/g_JJ6Pmx
#AIGovernance #MLOps #ResponsibleAI #EnterpriseAI #Metadata
-
This and the Nobel Prize for Hinton have made our day :)
We benchmarked the OpenAI DevDay eval product and Bespoke Labs' Minicheck for hallucination detection. Minicheck is the current best hallucination detector on Guardrails AI Hub.
OpenAI:
- Accuracy: 69.19%
- F1: 0.7564
- High recall, lower precision
Minicheck:
- Accuracy: 74.96%
- F1: 0.7516
- Better at detecting hallucinations
Overall, OpenAI classifies more hallucinations as factual, but has high recall when detecting factual statements. However, the most precise model currently is Minicheck.
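For readers who want to run this kind of comparison on their own labeled examples, the following self-contained sketch shows how the reported metrics (accuracy, precision, recall, F1) are computed for a binary factuality classifier. The toy labels are purely illustrative; the numbers in the post come from the Guardrails AI benchmark, not from this code.

```python
# Self-contained sketch: compute accuracy, precision, recall, and F1 for a
# binary hallucination detector, where the positive class (1) is "factual".
# The labels below are toy data, not the benchmark used in the post.

def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# A detector with high recall rarely misses factual statements but may also
# wave through hallucinations; a high-precision detector does the opposite.
print(binary_metrics([1, 0, 1, 1, 0], [1, 1, 1, 1, 0]))
```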
-
Our small verifier model is now integrated with Ollama. Now we can all check for hallucinations locally.
Bespoke Labs released Bespoke-Minicheck, a 7B fact-checking model that is now available in Ollama! It answers with Yes/No, and you can use it to fact-check claims against your own documents. How to use the model, with examples: https://lnkd.in/gD9_9mCw
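A minimal local usage sketch, assuming Ollama is running, the model has already been pulled (`ollama pull bespoke-minicheck`), and the `ollama` Python client is installed. The Document/Claim prompt layout shown here is an assumption based on the model's described Yes/No behavior; check the link above for the exact template.

```python
# Local grounded fact-checking sketch with the `ollama` Python client.
# Assumes `ollama pull bespoke-minicheck` has been run and the server is up.
import ollama


def check_claim(document: str, claim: str) -> str:
    # Prompt layout is an assumption; consult the model card for the exact format.
    prompt = f"Document: {document}\nClaim: {claim}"
    response = ollama.generate(model="bespoke-minicheck", prompt=prompt)
    # The model is trained to answer "Yes" (supported by the document) or "No".
    return response["response"].strip()


doc = "Bespoke Labs is a startup in Mountain View building data curation tools."
print(check_claim(doc, "Bespoke Labs is headquartered in Mountain View."))  # expected: Yes
print(check_claim(doc, "Bespoke Labs builds self-driving cars."))           # expected: No
```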
-
Bespoke-Minicheck-7B is better than GPT-o1 on grounded fact checking (and small enough to run on a MacBook).
Alex Dimakis · Professor, University of Texas at Austin. Co-Director, Center for the Foundations of Machine Learning. BespokeLabsAI: data curation for post-training (we are hiring).
Is GPT-o1 crushing all the benchmarks? It's better than GPT-4o on grounded fact checking (78.5 raised to 79.7 on WiCE), but it is more expensive and slower. Happily, our 7B model Bespoke-Minicheck is even better and scores 83 on this benchmark. https://lnkd.in/ggYNxUxR