Read how Intercom's Fin, powered by Claude, helps 25,000+ companies deliver instant, high-quality support: https://lnkd.in/eYSKc__n
Anthropic
Research Services
Anthropic is an AI safety and research company working to build reliable, interpretable, and steerable AI systems.
About us
We're an AI research company that builds reliable, interpretable, and steerable AI systems. Our first product is Claude, an AI assistant for tasks at any scale. Our research interests span multiple areas including natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
- Website
- https://www.anthropic.com/
- Industry
- Research Services
- Company size
- 501-1,000 employees
- Type
- Privately held
Posts
-
With styles, you can now customize how Claude responds. Select from the new preset options: Concise, Explanatory, or Formal. Whether you're a developer writing formal documentation, a marketer crafting clear brand guidelines, or a product team planning extensive project requirements, Claude can adapt to your preferred way of writing.
-
We’re expanding our collaboration with Amazon Web Services (AWS) to develop and deploy next-generation AI systems. This includes a new $4 billion investment from Amazon and establishes AWS as our primary cloud and training partner. This brings Amazon's total investment in Anthropic to $8 billion.

Working closely with AWS, we're developing future generations of Trainium chips. Designing both hardware and software together lets us optimize every aspect of model training. Our engineers work with the AWS chip design team to maximize computational efficiency, writing low-level kernels to directly interface with Trainium silicon and contributing to the AWS Neuron software stack.

Through Amazon Bedrock, Claude has become core infrastructure for tens of thousands of companies seeking reliable and practical AI at scale. Together, we're laying a new technological foundation, from silicon to software, to train and power our most advanced AI models. (A minimal sketch of calling Claude through Bedrock appears after this post.)

Read more: https://lnkd.in/gfbKewvz
Powering the next generation of AI development with AWS
anthropic.com
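Not part of the post above, but as a rough sketch of what "Claude through Amazon Bedrock" looks like in practice: a minimal boto3 call against the Bedrock runtime. The region and model ID below are assumptions and may differ for your account.

```python
# Minimal sketch: calling Claude on Amazon Bedrock via boto3.
# Region and model ID are assumptions; check the Bedrock console
# for the identifiers enabled in your account.
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",  # required field for Anthropic models on Bedrock
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": "Summarize our Q3 support metrics in three bullet points."}
    ],
}

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # assumed model ID
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```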
-
Our new research paper: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post: https://lnkd.in/d2jKfpyT

When a new AI model is released, the accompanying model card typically reports a matrix of evaluation scores on a variety of standard evaluations, such as MMLU, GPQA, or the LSAT. But it’s unusual for these scores to include any indication of the uncertainty, or randomness, surrounding them. This omission makes it difficult to compare the evaluation scores of two models in a rigorous way.

“Randomness” in language model evaluations may take a couple of forms. Any stream of output tokens from a model may be nondeterministic, and so re-evaluating the same model on the same evaluation may produce slightly different results each time. This randomness is known as measurement error. But there’s another form of randomness that’s not visible by the time an evaluation is performed. This is the sampling error: of all possible questions one could ask about a topic, we decide to include some questions in the evaluation, but not others.

In our research paper, we recommend techniques for reducing measurement error and properly quantifying sampling error in model evaluations. With a simple assumption in place (that evaluation questions were randomly drawn from some underlying distribution), we develop an analytic framework for model evaluations using statistical theory. Drawing on the science of experimental design, we make a series of recommendations for performing evaluations and reporting the results in a way that maximizes the amount of information conveyed.

Our paper makes five core recommendations. These recommendations will likely not surprise readers with a background in statistics or experimentation, but they are not standard in the world of model evaluations. Specifically, our paper recommends:

1. Computing standard errors using the Central Limit Theorem
2. Using clustered standard errors when questions are drawn in related groups
3. Reducing variance by resampling answers and by analyzing next-token probabilities
4. Using paired analysis when two models are tested on the same questions
5. Conducting power analysis to determine whether an evaluation can answer a specific hypothesis

(An illustrative numerical sketch of recommendations 1 and 4 appears after this post.)

For mathematical details on the theory behind each recommendation, read the full research paper here: https://lnkd.in/dBrr9zFi.
A statistical approach to model evaluations
anthropic.com
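As an illustrative sketch of recommendations 1 and 4 above (not code from the paper), the snippet below computes a CLT-based standard error for one model's eval score, and a paired standard error for the score difference between two models graded on the same questions. The per-question data is made up for illustration.

```python
# Illustrative sketch of two recommendations from the post (not the paper's code):
#  1. CLT-based standard error for a single model's eval score.
#  4. Paired analysis when two models answer the same questions.
import numpy as np

rng = np.random.default_rng(0)

# Made-up per-question correctness (1 = correct, 0 = incorrect) on a 500-question eval.
model_a = rng.binomial(1, 0.78, size=500)
model_b = rng.binomial(1, 0.74, size=500)

# Recommendation 1: report the mean score with a CLT standard error.
n = len(model_a)
mean_a = model_a.mean()
se_a = model_a.std(ddof=1) / np.sqrt(n)
print(f"Model A: {mean_a:.3f} +/- {1.96 * se_a:.3f} (95% CI)")

# Recommendation 4: paired analysis of per-question score differences.
# Because both models saw the same questions, the variance of the difference
# is typically much smaller than if the two scores were treated as independent.
diff = model_a - model_b
se_diff = diff.std(ddof=1) / np.sqrt(n)
print(f"A - B: {diff.mean():.3f} +/- {1.96 * se_diff:.3f} (95% CI)")
```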
-
We’ve added a new prompt improver to the Anthropic Console. Take an existing prompt and Claude will automatically refine it with prompt engineering techniques like chain-of-thought reasoning. The prompt improver also makes it easy to adapt prompts originally written for other AI models to work better with Claude. Read more: https://lnkd.in/dx-5sp5P.
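The prompt improver itself runs in the Anthropic Console, so the snippet below is only an illustration of the kind of chain-of-thought structure it can add to a prompt. The rewritten prompt wording, the tags, and the model name are assumptions, not the Console's actual output.

```python
# Illustrative sketch only: a "before" prompt and a chain-of-thought "after" prompt,
# sent through the Anthropic Python SDK. Prompt wording and model name are assumptions.
import anthropic

original_prompt = "Classify this support ticket as billing, bug, or feature request: {ticket}"

# Hypothetical improved version: asks the model to reason step by step before answering.
improved_prompt = (
    "You will classify a customer support ticket.\n"
    "First, think step by step inside <thinking> tags: note the key phrases in the\n"
    "ticket and which category they point to.\n"
    "Then output exactly one label inside <answer> tags: billing, bug, or feature request.\n\n"
    "Ticket: {ticket}"
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # assumed model name
    max_tokens=300,
    messages=[{
        "role": "user",
        "content": improved_prompt.format(ticket="I was charged twice this month."),
    }],
)
print(message.content[0].text)
```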
-
Coinbase customers now get faster and more accurate support with Claude powering their chatbot, help center search, and customer service teams across 100+ countries: https://lnkd.in/gWCvNy2u
Coinbase transforms their customer support with Claude
anthropic.com
-
Read how Asana uses Claude to help 150,000+ companies automate workflows and save countless hours on tasks. https://lnkd.in/dee_PFDr
How Asana transforms work management with Claude for 150,000 global customers
anthropic.com
-
Claude 3.5 Haiku is now available on our API, Amazon Bedrock, and Google Cloud's Vertex AI. Haiku is fast and particularly strong at coding. It outperforms state-of-the-art models, including GPT-4o, on SWE-bench Verified, which measures how models solve real software issues. During final testing, Haiku surpassed Claude 3 Opus, our previous flagship model, on many benchmarks, at a fraction of the cost. As a result, we've increased pricing for Claude 3.5 Haiku to reflect its increased intelligence: anthropic.com/claude/haiku. Claude 3 Haiku remains available for use cases that benefit from image input or its lower price point: https://lnkd.in/e9yNTtNp.
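Not from the post, but as a rough sketch of what using Haiku through the API can look like (the dated model identifier below is an assumption), here is a minimal streaming call with the Python SDK:

```python
# Minimal sketch: streaming a response from Claude 3.5 Haiku with the Python SDK.
# The model identifier is an assumption; check the current model list in the docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-3-5-haiku-20241022",  # assumed dated model ID
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a Python function that de-duplicates a list while preserving order.",
    }],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```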