Artificial Analysis

Technology, Information and Internet

Independent analysis of AI models and hosting providers: https://artificialanalysis.ai/

About

The leading independent benchmark for LLMs - compare quality, speed and price to pick the best model for your use case.

Website
https://artificialanalysis.ai/
Industry
Technology, Information and Internet
Company size
11-50 employees
Type
Privately Held

Employees at Artificial Analysis

Posts

Thanks for the support, Andrew Ng! Completely agree: faster token generation will become increasingly important as a greater proportion of output tokens is consumed by models, such as in multi-step agentic workflows, rather than being read by people.

    Andrew Ng

    Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI

    Shoutout to the team that built https://lnkd.in/g3Y-Zj3W . Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM that focus more on the quality of the outputs. I hope benchmarks like this encourage more providers to work on fast token generation, which is critical for agentic workflows!

    Model & API Providers Analysis | Artificial Analysis
    artificialanalysis.ai

Groq has launched an endpoint for OpenAI's new Whisper Large v3 Turbo speech-to-text model!

OpenAI released a new 'Turbo' version of Whisper Large v3 last week, which is nearly 50% smaller than the non-Turbo Large v3 model, reducing the parameter count from 1.55B to 0.8B. The new Turbo model's word error rate is marginally higher than the non-Turbo Large v3 model, at 12% vs. 10% in our evaluation. However, it is much faster and less compute-intensive given its smaller size, making it an attractive option for speed-dependent transcription use-cases.

OpenAI has stated they achieved the reduction in size with marginal quality impact by reducing the number of model layers from 32 to 4 and by further post-training: fine-tuning for another 2 epochs on multilingual transcription data.

With today's launch, Groq is allowing everyone to access these speed gains. We are benchmarking a Speed Factor of ~216x real-time audio, >6X faster than OpenAI's Whisper v2 endpoint and a ~15% gain over Groq's non-Turbo Large v3 endpoint. Well done Groq on the fast launch of an API endpoint for the model, and on making it available to everyone! A minimal call sketch follows below.

Link to our Speech to Text analysis below.
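For developers who want to try the endpoint, here is a minimal sketch. It assumes Groq exposes the model through its OpenAI-compatible API; the base URL, model identifier and file name below are assumptions to check against Groq's documentation, not confirmed details from this post.

```python
# Minimal sketch: transcription via Groq's OpenAI-compatible API.
# The base URL and model id are assumptions; "meeting.mp3" is a placeholder file.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],         # assumes a Groq API key in the environment
    base_url="https://api.groq.com/openai/v1",  # assumed OpenAI-compatible base URL
)

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",         # assumed model id for the Turbo endpoint
        file=audio_file,
    )

print(transcript.text)
```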

Groq has set a world record in LLM inference API speed by serving Llama 3.2 1B at >3k tokens/s.

Meta's Llama 3.2 3B and 1B models are well positioned for two categories of use-cases: first, applications running on edge devices or on-device, where compute resources are limited; second, use-cases which require very fast response times and/or very cheap token generation.

Groq, with their custom LPU chips, are taking fast and cheap token generation to the extreme by serving the models at >3k tokens per second and pricing at $0.04/1M input/output tokens. To put this in context, that is ~25X faster than GPT-4o's API and ~110X cheaper (rough arithmetic below). While the intelligence of these models is not comparable to the much larger frontier models, not all use-cases require frontier intelligence. Consumer apps which require real-time interaction and cheap token generation, as well as live monitoring and classification, are example use-cases which suit these smaller models.

Link below for our analysis of how Llama 3.2 3B & 1B compare to other smaller models, and of the providers serving them.
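To make the quoted figures concrete, here is a back-of-the-envelope calculation. The output speed and price come from the post; the 500-token response length is a hypothetical example.

```python
# Illustrative arithmetic only: what >3k tokens/s and $0.04/1M tokens imply
# for a single request. The 500-token response length is hypothetical.
output_speed_tps = 3_000        # output tokens per second (quoted above)
price_per_million_usd = 0.04    # USD per 1M input/output tokens (quoted above)

output_tokens = 500             # hypothetical response length
generation_time_s = output_tokens / output_speed_tps
cost_usd = output_tokens / 1_000_000 * price_per_million_usd

print(f"~{generation_time_s:.2f}s of generation, ~${cost_usd:.6f} per response")
# -> ~0.17s of generation, ~$0.000020 per response
```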

The blueberry reveal: congratulations to Black Forest Labs on Flux1.1 [pro], the new best text-to-image model in the world!

Flux1.1 [pro] has been tested under the pseudonym 'blueberry' in the Artificial Analysis Image Arena over the past week and has topped the leaderboard. It is available now on Black Forest Labs, Replicate, fal and Freepik.

Check out the Artificial Analysis Image Arena to see what it can do!

OpenAI's speech-to-speech release today is the latest reminder that perceived limitations of AI, and frankly of transformers, are consistently being overcome at a rapid pace, not only within individual areas but also as different areas of AI interact.

OpenAI's speech-to-speech release can take into account the "emotion, emphasis and accents" of input audio. This shows how progress in different areas of AI, here text-to-speech, speech-to-text and text-to-text capabilities, can result in 'sudden jumps' of advancement as they interact. It is likely no coincidence that a new version of Whisper (speech-to-text) was released yesterday too.

Models of various modalities (e.g., text-to-image, text-to-video, image/video-to-text, text-to-music) and specialized areas (such as medicine) continue to develop rapidly. As such, it is a relatively safe bet that we are going to see continued 'jumps' of advancement as these progress and converge.

All to say: it might be smart to be cautious when declaring limitations of AI, or when thinking you can predict the future of AI precisely. The rapid pace of advancement, especially as different fields converge, makes it difficult to predict how the different areas will interact and what the outcomes and limitations will be.

AI inference is getting faster with the new and much smaller Llama 3.2 models!

SambaNova Systems and Groq are offering Llama 3.2 1B at >2k output tokens per second, and 3B at >1,400 tokens per second. Honorable mention to Cerebras Systems, who are offering Llama 3.1 8B at >2k tokens per second. Nvidia-based providers, and particularly Fireworks AI, are also offering the models at very fast speeds relative to other popular 'small' models.

For use-cases which require very fast inference (including real-time applications or live monitoring) and/or very cheap inference, and which do not have demanding intelligence needs, Llama 3.2 3B and 1B are very compelling options. Further, these models are small enough to run on consumer hardware, including laptops and phones, and as such suit self-hosted use-cases well (see the local-inference sketch below).

Link below for how the intelligence of Llama 3.2 3B & 1B compares to other smaller models, and for our analysis of the providers serving them.
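As an illustration of how small these models are, here is a minimal local-inference sketch. It assumes the Hugging Face transformers library and the model id meta-llama/Llama-3.2-1B-Instruct (gated behind Meta's license); the classification prompt is just an example.

```python
# Minimal sketch of running Llama 3.2 1B on consumer hardware with
# Hugging Face transformers. The model id is an assumption and the
# checkpoint requires accepting Meta's license on Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # ~1B parameters; fits on a laptop
)

prompt = "Classify this log line as INFO, WARN or ERROR: 'disk usage at 91%'.\nAnswer:"
result = generator(prompt, max_new_tokens=16, do_sample=False)
print(result[0]["generated_text"])
```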

AI-focused chips may be what is required for o1-style reasoning in most production use-cases.

High end-to-end latency and cost are what currently prevent the use of OpenAI's o1-preview and o1-mini models in most production use-cases. The median request in our benchmarks takes ~15s for o1-mini and ~34s for o1-preview, and o1-preview can be >30X more expensive than GPT-4o. The reason is essentially that these models generate many more output tokens than other models: we previously benchmarked ~2.6X and ~6X more output tokens generated by o1-mini and o1-preview respectively.

The emerging AI-focused chip companies, including Groq, SambaNova Systems and Cerebras Systems, offer output speeds multiple times faster than Nvidia-based providers. They also generally have lower per-token pricing on their API offerings and generally claim lower TCO relative to system throughput for their chips. These output speeds and lower costs may be what is required for o1's approach of reasoning at inference time to be used in production, where many use-cases are latency-sensitive, cost-sensitive, or both. A rough illustration of how latency scales with reasoning tokens and output speed is sketched below.

Looking forward, there seems to be a relationship between the length of reasoning and the quality of output: o1-preview reasons for far more tokens than o1-mini, and with faster output speeds this could potentially be extended for further intelligence gains.

Note that because streaming is not supported on the o1 models, output speed is measured end-to-end rather than after the first token is received.

Link to our analysis below.
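The sketch below is purely illustrative arithmetic, not benchmark data: it assumes a hypothetical 300-token answer, applies the ~6X output-token multiplier cited above, and compares two assumed serving speeds to show how end-to-end latency scales.

```python
# Illustrative only: how end-to-end latency scales with the extra reasoning
# tokens an o1-style model emits, at two assumed serving speeds.
# None of these numbers are benchmark results.
baseline_answer_tokens = 300      # hypothetical length of the visible answer
reasoning_multiplier = 6.0        # ~6X more output tokens cited for o1-preview

total_output_tokens = baseline_answer_tokens * reasoning_multiplier

assumed_speeds = {
    "GPU-based provider (assumed ~60 tok/s)": 60,
    "AI-chip provider (assumed ~1,000 tok/s)": 1_000,
}

for provider, tokens_per_second in assumed_speeds.items():
    latency_s = total_output_tokens / tokens_per_second
    print(f"{provider}: ~{latency_s:.0f}s end-to-end for {total_output_tokens:.0f} output tokens")
```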

Output speed & price comparison of Llama 3.2 90B providers: a clear trade-off curve exists.

Together AI and Fireworks are the fastest providers of Llama 3.2 90B, while DeepInfra and Hyperbolic are the cheapest. Across the non-hyperscaler inference providers, there is a clear speed vs. price trade-off curve: providers with slower output speeds generally have lower per-token pricing (a small selection sketch follows below). Only providers which support vision input are shown.

Note that these benchmarks are based on language capabilities. We will soon be sharing information on image input pricing, and on latency when image input is considered.

Congratulations to Together AI, Fireworks AI, Deep Infra Inc., Hyperbolic and Amazon Web Services (AWS) on the fast launch!

Link to analysis below.
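One way to act on a speed vs. price trade-off like this is to keep only Pareto-optimal providers, i.e. those no other provider beats on both axes. The sketch below uses entirely made-up placeholder numbers and provider names to show the idea; it is not taken from our benchmark data.

```python
# Placeholder numbers only: filter providers to the speed/price Pareto frontier.
providers = {
    # name: (output tokens/s, USD per 1M tokens) -- illustrative values, not benchmarks
    "provider_a": (120.0, 1.20),
    "provider_b": (60.0, 0.60),
    "provider_c": (50.0, 0.90),   # slower AND pricier than provider_b -> dominated
}

def pareto_frontier(options):
    frontier = {}
    for name, (speed, price) in options.items():
        dominated = any(
            s >= speed and p <= price and (s > speed or p < price)
            for other, (s, p) in options.items() if other != name
        )
        if not dominated:
            frontier[name] = (speed, price)
    return frontier

print(pareto_frontier(providers))  # provider_c is dropped
```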

Important note regarding yesterday's Llama 3.2 90B and 11B (Vision) release: the text capabilities remain exactly the same as Meta's Llama 3.1 70B and 8B respectively.

As Meta noted in their announcement, they "intentionally did not update the language-model parameters". Vision capabilities were added using a set of adapter weights that add an image encoder to the models (hence the larger parameter count). The adapter weights act as a bridge between the image encoder and the language model. These adapter weights, particularly within the cross-attention layers, align the visual representations from the image with the text embeddings by mapping them to the same latent space, allowing the language model to interpret images alongside text (a schematic sketch of the idea is below).

To train the adapter weights, Meta trained on diverse sets of text and image pairs. These were refined in post-training using "supervised fine-tuning, rejection sampling, and direct preference optimization" techniques, along with synthetic data generated by Llama 3.1 and ranked using a reward model.

Ultimately, this is why we do not see differences in our language-focused eval results. We will shortly be releasing vision-focused eval results comparing against other models with vision capabilities, including GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro and Flash.
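To make the adapter idea concrete, here is a schematic PyTorch sketch. It is not Meta's actual implementation: the module name, dimensions and layer layout are illustrative assumptions, and it only shows the core mechanism of language-model hidden states cross-attending to projected image features in a shared latent space.

```python
# Schematic sketch (not Meta's implementation) of a cross-attention adapter:
# frozen language-model hidden states attend to image-encoder outputs that
# have been projected into the language model's latent space.
import torch
import torch.nn as nn

class VisionCrossAttentionAdapter(nn.Module):
    def __init__(self, text_dim=4096, vision_dim=1280, num_heads=32):
        super().__init__()
        # Project image-encoder features into the language model's latent space.
        self.vision_proj = nn.Linear(vision_dim, text_dim)
        # Text hidden states query the projected image features.
        self.cross_attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(text_dim)

    def forward(self, text_hidden, image_features):
        img = self.vision_proj(image_features)  # (batch, image_tokens, text_dim)
        attended, _ = self.cross_attn(query=text_hidden, key=img, value=img)
        # Residual connection keeps the original language pathway intact.
        return self.norm(text_hidden + attended)

# Toy shapes: 8 text tokens attend to 16 image patch embeddings.
adapter = VisionCrossAttentionAdapter()
out = adapter(torch.randn(1, 8, 4096), torch.randn(1, 16, 1280))
print(out.shape)  # torch.Size([1, 8, 4096])
```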

Similar pages