Can We Really Hand-Engineer Level 2+ AGI?
When I was working on recommendation engines for Yahoo News personalization in 2014, the state-of-the-art (SOTA) in natural language processing (NLP) involved learning task-specific representations and designing task-specific architectures, which were then integrated through extensive, manually engineered pipelines of software components. Generating even a summary could take 40-50 pipeline steps. The immense effort required to manually connect components for complex problems like summarization or question answering (Q&A) never seemed sensible to me. I viewed it as an attempt to replicate human capabilities through manually curated code, often amounting to only hundreds or thousands of lines of software logic.
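To make the contrast concrete, here is a minimal sketch of what one stage of such a hand-engineered pipeline looked like: a naive frequency-based extractive summarizer written as deterministic code. The stage names and scoring heuristic are illustrative inventions for this post, not a reproduction of any actual production system.

```python
import re
from collections import Counter

def split_sentences(text: str) -> list[str]:
    # Stage 1: sentence segmentation via a crude punctuation heuristic.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence: str) -> list[str]:
    # Stage 2: lowercase word tokenization.
    return re.findall(r"[a-z']+", sentence.lower())

def summarize(text: str, max_sentences: int = 2) -> str:
    # Stages 3-5: score sentences by word frequency, select the top ones,
    # and stitch them back together in their original order.
    sentences = split_sentences(text)
    word_freq = Counter(w for s in sentences for w in tokenize(s))
    scores = {s: sum(word_freq[w] for w in tokenize(s)) for s in sentences}
    top = set(sorted(sentences, key=scores.get, reverse=True)[:max_sentences])
    return " ".join(s for s in sentences if s in top)

print(summarize("The launch was delayed. Engineers found a fuel leak. "
                "The leak was repaired overnight. The launch finally succeeded."))
```

Every behavior here is hand-specified; real pipelines chained dozens of such stages, each one a patch on the limitations of the previous one.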
Within five years, by 2019, after Google researchers published the now-famous transformer architecture in 2017 [1], SOTA NLP had shifted toward task-agnostic pre-training and task-agnostic architectures [2]. This advance enabled capabilities such as summarization and Q&A that engineering teams had failed to achieve over the previous five years by merely adding more components to enterprise architectures. We learned that NLP wasn't an engineering problem but a research one. The model's complexity was such that no one could simply write the code; it had to be learned by the computer. This gave rise to large language models (LLMs): tens to hundreds of gigabytes of machine-learned parameters acting as code whose inner workings even companies like OpenAI do not fully understand.
Five years later, today, there is a return to engineering extensive pipelines around a single model, now termed "scaffolds," or, as we used to call it, software architecture. They vary in form: Meta’s RAG [3], Microsoft’s WizardLM [4], UIUC’s LATS [5], Google’s ReAct [6], Microsoft’s AutoGen [7], DeepMind’s FunSearch [8], MetaGPT [9], and many others, up to fully engineered solutions for increasingly complex task reasoning and execution problems. Opening one of these Git repositories reveals only hundreds or thousands of lines of software logic that you can understand in a few hours. The applications range from marketing specialists to sales cold calls.
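To illustrate what "scaffold" means here, below is a minimal sketch of a ReAct-style loop: deterministic glue code that routes a probabilistic model's outputs to tools and feeds observations back. The `call_llm` stub and the tool registry are hypothetical placeholders for this post, not the API of any of the frameworks cited above.

```python
from typing import Callable

# Hypothetical tool registry; real scaffolds wire in search, code execution, APIs, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"(stub) top results for: {query}",
    "echo": lambda text: text,
}

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an HTTP request to a hosted LLM)."""
    raise NotImplementedError("plug in your model endpoint here")

def react_loop(question: str, max_steps: int = 5) -> str:
    # The scaffold itself is ordinary deterministic code: build a prompt,
    # parse the reply, dispatch to a tool, append the observation, repeat.
    transcript = f"Question: {question}\n"
    instruction = "Reply with 'Action: <tool>: <input>' or 'Final: <answer>'.\n"
    for _ in range(max_steps):
        reply = call_llm(transcript + instruction).strip()
        if reply.startswith("Final:"):
            return reply.removeprefix("Final:").strip()
        _, tool, tool_input = (part.strip() for part in reply.split(":", 2))
        observation = TOOLS.get(tool, lambda _: "unknown tool")(tool_input)
        transcript += f"{reply}\nObservation: {observation}\n"
    return "No answer within the step budget."
```

Everything outside `call_llm` is a screenful of readable, deterministic logic; the capability lives entirely inside the model call.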
In January 2024, Google DeepMind researchers introduced a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models [10]. To date, it is the most current and precise classification framework available. The ongoing efforts in complex task solving target Level 2 General AI ("Competent AGI"): by the DeepMind researchers' own assessment, current frontier language models qualify only as Level 1 General AI ("Emerging AGI") until they achieve higher performance across a broader set of tasks, at which point they would meet the criteria for Level 2.
I believe the field is repeating the mistake of a decade ago by attempting to hand-craft human capabilities through software logic. There’s a peculiar expectation that just the right combination of "Lego blocks" can build a spaceship to the moon. Just as we did not know how to write an LLM by hand, and still don’t, but instead have to train it, I doubt there is any combination of hand-written software code humans can devise that achieves Level 2 AGI.
I'm not sure when engineering and data science became conflated, with engineers teaching data scientists to think deterministically and write deterministic code, rather than data scientists teaching engineers to think in terms of probability and train probabilistic models. However, if there's any validity to my viewpoint, then our best bet lies in focusing our efforts on model training rather than engineering. Maybe this is the natural progression. Only time will tell, I suppose.
1) Attention Is All You Need - Google Brain 2017 - https://arxiv.org/pdf/1706.03762.pdf
2) Language Models are Unsupervised Multitask Learners - OpenAI 2019 - https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
3) Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Meta 2021 - https://arxiv.org/pdf/2005.11401.pdf
4) WizardLM: Empowering Large Language Models to Follow Complex Instructions - Microsoft 2023 - https://arxiv.org/abs/2304.12244
5) Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models - UIUC 2023 - https://arxiv.org/abs/2310.04406
6) ReAct: Synergizing Reasoning and Acting in Language Models - Google 2022 - https://arxiv.org/abs/2210.03629
7) AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation - Microsoft 2023 - https://arxiv.org/pdf/2308.08155.pdf
8) Mathematical discoveries from program search with large language models - Google DeepMind 2023 - https://www.nature.com/articles/s41586-023-06924-6
9) MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework - 2023 - https://arxiv.org/abs/2308.00352
10) Levels of AGI: Operationalizing Progress on the Path to AGI - DeepMind 2024 - https://arxiv.org/pdf/2311.02462.pdf