Towards AGI: Understanding the Tiers of Artificial Intelligence
Edward Lewis
Customer Success Leader | AI | Transformation | Growth | Board Member | 2x Exits
Navigating the Path to Artificial General Intelligence
Artificial General Intelligence (AGI) has emerged as the ultimate objective for numerous leading AI laboratories worldwide. If you'd like to know which ones, look up the top tech companies in the world. It's them.
This AGI term, though somewhat nebulous, spans a wide array of ambitions—from the creation of a "superhuman machine god" to a more modest goal of a machine capable of outperforming humans at any given task. Given the primary focus of these companies on training AI systems to achieve AGI, it's crucial to ponder when—or if—we might witness the advent of an AI that eclipses human capabilities across the board.
A 2023 survey of computer scientists reflected a broad consensus that AGI is within the realm of possibility. However, the approaches and timelines proposed by experts are deeply divided. The survey's aggregate forecast pegged 2047 as the expected date for achieving AGI, with a 10% chance of reaching the milestone as early as 2027. This author sees a real possibility that 2030 is the year. Despite these predictions, it's evident that today's AI systems do not yet perform better than humans in most tasks. Nevertheless, AI has already demonstrated superhuman performance in some remarkably complex and nuanced areas.
Superhuman Performance in AI: Real-World Examples
AI systems have shown remarkable prowess in tasks that require a high degree of empathy and judgment—traits traditionally considered exclusive to humans. For instance:
These instances underscore the potential of AI in surpassing human performance in certain tasks. However, the capabilities of AI remain uneven. For example, while an AI might generate excellent startup ideas, it may struggle with coding complex products without human assistance. Similarly, GPT-4 can offer sound medical diagnoses but might falter with simple arithmetic when prescribing medication.
The Jagged Frontier of AI
The concept of the Jagged Frontier of AI captures this uneven terrain. Current AI excels in specific areas but requires human partnership to navigate its limitations. This co-intelligence model, where humans and AI collaborate, highlights the current state of AI development.
Yet, the ultimate goal for major AI companies remains pushing this Frontier further, aiming for AI systems that outperform humans in all tasks. This aspiration brings forth essential questions: How do we determine the areas where AI excels? And how can we measure the pace of AI's advancements?
Testing AI: A Measure of Superhumanity
One approach to evaluating AI's capabilities is to administer tests designed for humans. OpenAI, for instance, showcased the advancements of GPT-4 over its predecessor, GPT-3.5, by illustrating how it fared against human test-takers. Despite impressive results, this method is not without its flaws. Issues like "overfitting"—where AI systems memorize answers from their training data—can skew results. Moreover, standardized tests may not fully capture an AI's capabilities. For example, GPT-4's 90th percentile performance on the Bar Exam was later scrutinized, revealing it might only fall within the 69th percentile when appropriately prompted.
To gauge AI's progression towards AGI, it's vital to consider continuous benchmarks. Many of these benchmarks, however, are idiosyncratic and often focus on coding skills or general knowledge. One notable benchmark is the MMLU (Massive Multitask Language Understanding), which tests AI on a variety of topics. While MMLU scores provide insight into AI's progress, they also highlight the limitations of these assessments. The results may be influenced by pre-exposure to test questions, and the uncalibrated nature of these tests makes it challenging to measure incremental improvements accurately.
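To make the mechanics concrete, here is a minimal sketch of how an MMLU-style multiple-choice evaluation is scored: pose a question with lettered options, ask the model for a letter, and count matches against the answer key. The ask_model stub and the sample question are illustrative placeholders, not the actual benchmark harness.

```python
# Minimal sketch of an MMLU-style multiple-choice evaluation loop.
# The sample question and ask_model stub are placeholders for illustration;
# the real benchmark spans thousands of questions across 57 subjects.

questions = [
    {
        "prompt": "Which planet is closest to the Sun?",
        "choices": {"A": "Venus", "B": "Mercury", "C": "Mars", "D": "Earth"},
        "answer": "B",
    },
]

def ask_model(prompt: str, choices: dict) -> str:
    """Placeholder: format the question, query a model, return a choice letter."""
    formatted = prompt + "\n" + "\n".join(f"{k}. {v}" for k, v in choices.items())
    # In practice this would call an LLM API and parse the returned letter.
    return "B"

correct = sum(
    1 for q in questions
    if ask_model(q["prompt"], q["choices"]).strip().upper() == q["answer"]
)
print(f"MMLU-style accuracy: {correct / len(questions):.1%}")
```

Because scoring reduces to matching a single letter, any question the model has already seen during training inflates the score—which is exactly the overfitting concern raised above.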
Benchmarks and the Road Ahead
Despite their imperfections, benchmarks like the MMLU offer valuable insights. They illustrate the rapid advancements in AI, showing how models such as GPT-4, Gemini, and Claude have approached and, in some cases, surpassed human performance. The Arena Leaderboard, which uses user preferences to compare AI models, further corroborates these trends.
As AI models grow larger and more sophisticated, their performance continues to improve. This progress suggests that we are moving closer to AGI, even if the path remains fraught with challenges. The ongoing development of open-weight models by entities like Meta and the proprietary advancements by companies like Google and OpenAI indicate a thriving and competitive landscape in AI research.
Understanding AGI: A Tiered Approach
Given the complexities of measuring AI capabilities, it might be helpful to conceptualize AGI development in tiers:
Currently, we see AI excelling in Tier 3 and Tier 4 scenarios. For instance, in specific domains like medical diagnostics and legal analysis, AI can offer insights that surpass those of human experts. However, the unique strengths and limitations of AI mean that a co-intelligence model remains essential.
As AI continues to evolve, it is likely to disrupt various industries significantly. The rise of Artificial Focused Intelligence and co-intelligence systems will drive productivity and efficiency gains, necessitating a reevaluation of human roles in decision-making processes. While the journey to true AGI remains uncertain, the broader cognitive revolution is undeniably underway, promising transformative impacts across multiple sectors.
A Closer Look at Benchmarks and Testing
The limitations of current benchmarks highlight the complexity of truly understanding AI’s capabilities. The MMLU, for instance, although a common measure, is often critiqued for its idiosyncratic questions and the potential for overfitting. However, these benchmarks are still crucial as they provide a comparative measure of AI's progress over time.
The Arena Leaderboard offers an alternative evaluation method by comparing AI models based on user preferences in over a million conversations. This system, which uses the Elo rating (originally developed for ranking chess players), helps measure the "vibes" or subjective quality of AI responses. Interestingly, the trends observed in the Arena Leaderboard closely mirror those in MMLU, indicating a consistent improvement across various evaluation metrics.
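For readers unfamiliar with Elo, the update rule itself is simple, and a short sketch makes the Arena's mechanics concrete: each preference vote shifts ratings in proportion to how surprising the outcome was. The ratings and K-factor below are illustrative values, not the leaderboard's actual parameters.

```python
# Standard Elo update: after each head-to-head comparison, the preferred model
# gains rating and the other loses it, scaled by how surprising the result was.
# The ratings and K-factor here are illustrative, not the Arena's actual values.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated ratings for both models after one user preference vote."""
    e_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Example: a 1250-rated model beats a 1300-rated model in a blind comparison;
# the lower-rated model gains more than half the K-factor because the win was unexpected.
print(update_elo(1250, 1300, a_won=True))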
The Role of Open-Weight Models
The development of open-weight models is particularly noteworthy. Meta's Llama series, for example, represents a significant step forward. The largest version of Llama 3, expected to achieve GPT-4 class performance, underscores the potential of open-weight models to democratize AI research and application. These models, freely available for anyone to download and use, promote transparency and collaboration in AI development, contrasting with the closed-source models of companies like Google, Anthropic, and OpenAI.
Scaling Laws and the Future of AI
The rapid improvements in AI capabilities can be largely attributed to the scaling laws of AI, which suggest that larger models (requiring more data and training time) tend to perform better. This principle has held true for several years and continues to drive advancements. However, the future trajectory of AI improvements remains uncertain. While some experts predict that the scaling laws will continue to yield performance gains, others caution about potential bottlenecks related to computational expense and data availability.
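To see what a scaling law looks like in practice, the sketch below plugs numbers into a Chinchilla-style power law, in which predicted loss falls off as a power of parameter count and training tokens. The constants are made-up placeholders for illustration, not fitted values from any published model.

```python
# Illustrative power-law scaling curve of the kind described in the scaling-law
# literature: loss falls as a power of model size and data size, with
# diminishing returns. All constants below are made-up placeholders.

def predicted_loss(params: float, tokens: float,
                   e: float = 1.7, a: float = 400.0, b: float = 1800.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Loss ~ E + A / N^alpha + B / D^beta (Chinchilla-style functional form)."""
    return e + a / params**alpha + b / tokens**beta

# Scaling parameters and data 10x at a time keeps shaving loss, but each step buys less.
for n, d in [(7e9, 1.4e12), (70e9, 1.4e13), (700e9, 1.4e14)]:
    print(f"{n:.0e} params, {d:.0e} tokens -> predicted loss {predicted_loss(n, d):.3f}")
```

The shape of that curve is why "just make it bigger" has worked so far, and also why compute cost and data supply are the bottlenecks people worry about.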
Based on insights from industry insiders, it appears we may still have several years of rapid ability increases ahead. The continuous enhancement of AI models, alongside innovative approaches to overcome existing limitations, suggests a dynamic and evolving landscape. The end of rapid increases could be imminent, or it might be years away—only time and ongoing research will tell.