Has GenAI Peaked? Three Key Areas of Progress to Watch

Generative AI (GenAI) has advanced rapidly in recent years, prompting debate about whether progress has peaked. To address this question, it is worth examining three critical areas of progress over the past twelve months: reducing inference costs, increasing inference speed, and enhancing the performance of small language models.

Reducing Inference Costs

The cost of inference, the expense of generating outputs from a trained AI model, remains a primary concern for organisations adopting GenAI. Recent developments have targeted these costs through both hardware and software innovation. Newer, more efficient chips offer faster processing that significantly lowers the computational expense of inference tasks. In parallel, the trend towards local machine learning, where models run on local devices rather than relying solely on cloud servers, has emerged as a cost-effective strategy: it reduces dependency on expensive cloud-based resources, making AI technologies accessible to a wider range of organisations.
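The cloud-versus-local trade-off above can be framed as simple break-even arithmetic. The sketch below uses illustrative numbers (the $2,000 hardware cost and $5-per-million-token rate are assumptions for the example, not vendor quotes), and deliberately ignores power, maintenance, and model-quality differences.

```python
# Illustrative break-even between cloud API spend and a one-off
# local hardware purchase. All figures are assumed for the sketch.

def breakeven_mtokens(hardware_cost: float, cloud_price_per_mtok: float) -> float:
    """Millions of tokens at which cumulative cloud API spend
    equals a one-off local hardware cost (power and ops ignored)."""
    return hardware_cost / cloud_price_per_mtok

# e.g. a $2,000 workstation GPU vs a $5 per-million-token API rate
print(breakeven_mtokens(2000, 5.0))  # 400.0 -> local pays off after ~400M tokens
```

At high, sustained token volumes the local option amortises quickly, which is one reason on-device inference is gaining attention.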

For example, OpenAI's GPT-4o model is priced at $5 per million input tokens and $15 per million output tokens, half the $10 and $30 rates of its predecessor, GPT-4 Turbo. GPT-4o mini goes further still, at $0.15 per million input tokens and $0.60 per million output tokens, undercutting even GPT-3.5 Turbo's early rate of approximately $2 per million tokens while delivering stronger performance. These reductions are enabling broader adoption across various sectors.
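To make the per-million-token rates concrete, the helper below prices a single request. The 1,000-input / 500-output token mix is an arbitrary example; the rates are the GPT-4o and GPT-4o mini figures quoted above.

```python
def request_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """USD cost of one request, with prices given per million tokens."""
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# 1,000 input + 500 output tokens at the listed rates
gpt4o = request_cost(1000, 500, 5.00, 15.00)   # $0.0125
mini = request_cost(1000, 500, 0.15, 0.60)     # $0.00045
print(gpt4o / mini)  # mini is roughly 28x cheaper on this mix
```

The output-token price dominates for long completions, so the input/output mix matters as much as the headline rate when comparing models.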

Increasing Inference Speed

Enhancements in inference speed, or the time required for a model to generate a response, have been substantial. This improvement is crucial for applications that demand real-time processing, such as customer service chatbots and interactive AI systems. Advances in hardware, including the development of faster processors and more efficient architectures, have significantly contributed to these speed enhancements.

For instance, OpenAI's GPT-4o generates text at approximately 94 tokens per second, compared with 35.68 tokens per second for GPT-4 Turbo, a substantial speed-up that allows more responsive handling of complex tasks. Similarly, Meta's Llama 3.1 70B model achieves an output rate of approximately 61.8 tokens per second, a marked improvement over previous iterations that makes such models suitable for high-demand environments.
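The practical effect of those throughput figures is easy to see as wall-clock time for a typical response. The sketch below uses the rates cited above and a 500-token answer as an illustrative length; it ignores prompt-processing and network latency.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream num_tokens at a steady decode rate
    (prompt processing and network latency are not modelled)."""
    return num_tokens / tokens_per_second

# A 500-token answer at the throughputs cited above
print(generation_seconds(500, 94.0))   # ~5.3 s at GPT-4o's rate
print(generation_seconds(500, 35.68))  # ~14.0 s at GPT-4 Turbo's rate
```

Cutting a 14-second wait to around 5 seconds is the difference between a chatbot that feels sluggish and one that feels conversational.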

Performance of Small Language Models

Small language models (SLMs) have gained prominence due to their efficiency and cost-effectiveness. Unlike large language models, which necessitate substantial computational resources, SLMs are engineered to perform effectively on specific tasks with fewer parameters. Recent research has highlighted that small models can enhance the outputs of larger models by refining their predictions, thereby improving performance without extensive fine-tuning. The market for small language models is anticipated to expand considerably, driven by their ability to deliver high accuracy and speed while remaining accessible to a broader spectrum of organisations.

For example, Meta's Llama 3.1 70B model scores 86.0% on the MMLU benchmark, up from its predecessor's 80.9%, while remaining far smaller than frontier-scale models. OpenAI's GPT-4o mini achieves 82% on MMLU and extends to multimodal reasoning, scoring 59.4% on the MMMU evaluation. Microsoft's Phi-3.5-mini posts strong results across most benchmarks, including 74.5 on HellaSwag and 52.8 on ANLI, often outperforming larger models such as Mistral-7b and Llama-3-instruct-8b and rivalling GPT-3.5 on those tasks. These results show that compact models can perform at levels once reserved for much larger, more resource-intensive systems.

Statistics and Trends

  • Algorithmic Progress: The level of compute needed to achieve a given level of performance in language models has halved roughly every 8 months, with a 95% confidence interval of 5 to 14 months.
  • Small Language Model Market: The global Small Language Model market is projected to reach US$ 17,180 million in 2029, increasing from US$ 5,180 million in 2022, with a CAGR of 17.8% during the period of 2023 to 2029.
  • Efficient Benchmarking: Efficient benchmarking techniques can reduce computation costs by roughly 100 times without compromising reliability.
  • Multimodal Small Language Models: Models like GPT-4o Mini have demonstrated superior performance across a majority of benchmarks when compared to larger models, without the need for additional training data.

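The algorithmic-progress figure in the list above compounds quickly. The sketch below turns the cited ~8-month halving time into an effective compute multiplier over a given horizon; the 24-month horizon is just an example.

```python
def efficiency_gain(months: float, halving_months: float = 8.0) -> float:
    """Factor by which the compute needed for a fixed level of language-model
    performance shrinks after `months`, assuming requirements halve every
    `halving_months` months (the ~8-month estimate cited above)."""
    return 2 ** (months / halving_months)

print(efficiency_gain(24))  # 8.0 -> ~8x less compute for the same result in two years
```

Note the cited 95% confidence interval of 5 to 14 months means the two-year gain could plausibly range from about 3x to about 28x.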
These statistics underscore the rapid progress in GenAI, particularly in reducing inference costs, increasing inference speed, and enhancing the performance of small language models. The advancements in these areas suggest that GenAI is continuing to evolve, offering new possibilities and applications across various industries.


If you found this article informative and valuable, consider sharing it with your network to help others discover the power of AI.

