GPT-3 and the rise of foundation models
GPT-3 (Generative Pre-trained Transformer 3) is a large language model with 175 billion parameters, trained on the Common Crawl internet dataset, Wikipedia, and several large digital document collections. Its transformer-based architecture has demonstrated superior performance in text generation, comprehension, and other natural language processing (NLP) tasks [1, 9, 11]. A key goal of its designers was to create a model with "a broad set of skills and pattern recognition abilities" so that it could perform a wide variety of tasks based on a single request or a small number of examples. The goal was to create a system that could perform many tasks well without specialized task-specific training [10].
GPT-3 was released in the summer of 2020 by OpenAI, which also provided a public API [3]. It soon became apparent that GPT-3 was extraordinarily capable compared with previous language models, able to "generate tweets, pen poetry, summarize emails, answer trivia questions, translate languages and even write computer programs" with unparalleled fluency [4]. A recent review in the New York Times declared that GPT-3 "resembled creations from science fiction like HAL 9000 from '2001': a computer program that can answer open-ended complex questions in perfectly composed sentences" [3]. There's a diverse collection of examples of GPT-3's abilities on the OpenAI website [6] and a growing body of applications seeking to exploit these [7, 8]. And there have been suggestions, including from Ilya Sutskever, OpenAI's chief scientist, that GPT-3 might be "slightly conscious" [4].
The article consists of three major sections. GPT-3 capability assessments reviews what benchmark testing and other evaluations reveal about the model's strengths and limitations. Economic and Social Implications of GPT-3 and other Foundation Models examines the emerging ecosystem of applications built on foundation models and its likely consequences. Conclusions briefly considers some of the longer-term implications of foundation models and the challenges of effective regulation.
GPT-3 capability assessments
Mastery of human language is the epitome of an open-ended problem: very complex, not fully understood, potentially open to multiple solutions by multiple paths [12]. In this engineering environment, the transformer algorithm had already demonstrated improved NLP performance, and transformer models had shown consistent improvement with increasing scale [13]. GPT-3 was both a logical next step and another experiment, and assessments of it are about discovering capabilities as well as measuring their extent. The picture that emerges is of a system that displays both super-human and much less than human abilities in different dimensions. Overall, GPT-3 and other large language models must be understood sui generis and not only by the measure of the human.
Next Word Prediction
Language processing is sequential and rapid; prediction is a key element of comprehension and fluency. Human neurobiological studies have found that "Under many circumstances, there is simply not enough time for top-down feedback to exert control over a preceding bottom-up analysis. Very likely, lexical, semantic, and syntactic cues conspire to predict sometimes very detailed characteristics of the next anticipated word, including its syntactic and semantic makeup" [14]. The Penn Treebank (PTB) corpus, chiefly the section of Wall Street Journal articles, is a widely used benchmark for next word prediction [15]. Better performance results in a lower perplexity score, where perplexity is the inverse of the probability a model assigns to the actual text, normalized by the number of words in the dataset; intuitively, it measures how surprised the model is, on average, by each word that actually occurs [16]. GPT-3's score of 20.5 on this benchmark was markedly better than that of preceding large language models [17].
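To make the metric concrete, the following is a minimal sketch, in Python, of how perplexity can be computed from the probabilities a model assigns to the words that actually occur; the input values are purely illustrative.

```python
# Minimal sketch: perplexity as the exponential of the average negative log-probability
# assigned to each actual next word. Lower perplexity means better prediction.
import math

def perplexity(token_probs):
    """token_probs: probabilities the model assigned to the words that actually occurred."""
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# A model that assigns probability 0.05 to every correct word has perplexity 20,
# roughly GPT-3's reported PTB score of 20.5.
print(perplexity([0.05] * 100))  # -> 20.0
```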
Another important measure of next word prediction is LAMBADA (LAnguage Modeling Broadened to Account for Discourse Aspects), which consists of about 10,000 passages from BooksCorpus in which a missing target word must be predicted in the last sentence of each passage. The purpose of LAMBADA is to evaluate contextual comprehension: "LAMBADA proposes a word prediction task where the target item is difficult to guess (for English speakers) when only the sentence in which it appears is available, but becomes easy when a broader context is presented" [18]. At the time it was tested in 2020, GPT-3's accuracy score of 86.4 was a dramatic improvement on the previous highest score. In 2022, two other language models, Megatron-Turing NLG (with 530 billion parameters) and PaLM (with 540 billion), scored slightly higher (87.2 and 89.7 respectively) [19].
The PTB and LAMBADA tests were designed to test computer systems and cannot be administered to human subjects. PTB, for example, expects a test subject to have a probability distribution across all words in the dataset [20]. Nevertheless, proxies for these measures, including a top-1 accuracy test and a test designed to measure human perplexity, suggest that "humans are worse than even small language models the size of GPT-1 at next-token prediction, even on the top-1 prediction task. This seems true even when the humans are smart and motivated and have practiced for an hour or two. Some humans can probably consistently beat GPT-1, but not substantially larger models, with a bit more practice" [20].
To enable people to see for themselves what next word prediction is like, the researchers created two online sites. One, the "Language modelling game (next token)", administers the top-1 accuracy test (next word or, more precisely, next token prediction [21]). The other, the "Language modelling game (which one scored)", is similar to LAMBADA; it presents the user with a passage and asks them to assign a probability to a choice between two possible next tokens.
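For readers who want to see how such a top-1 test is scored, here is a minimal sketch assuming the Hugging Face transformers and PyTorch packages and the small public GPT-2 checkpoint as a stand-in (GPT-3 itself is not openly downloadable); a position counts as correct when the model's single most probable next token matches the token that actually follows.

```python
# Minimal sketch of top-1 next-token prediction accuracy, using GPT-2 as a stand-in.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def top1_accuracy(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                 # shape: (1, sequence_length, vocab_size)
    predicted = logits[0, :-1].argmax(dim=-1)      # model's top guess at each position
    actual = ids[0, 1:]                            # the token that actually came next
    return (predicted == actual).float().mean().item()

print(top1_accuracy("Shortly before 2 p.m. on a clear July day in 2020, she was driving near Tampa."))
```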
Reading Comprehension
The familiarity of reading comprehension as both everyday task and occasional test can lead to an under-appreciation of how complex and difficult it is to achieve. As the authors of TriviaQA, a comprehension test designed for computer systems, put it, "Reading comprehension (RC) systems aim to answer any question that could be posed against the facts in some reference text. This goal is challenging for a number of reasons: (1) the questions can be complex (e.g. have highly compositional semantics), (2) finding the correct answer can require complex reasoning (e.g. combining facts from multiple sentences or background knowledge) and (3) individual facts can be difficult to recover from text (e.g. due to lexical and syntactic variation)" [51]. In light of this, the results of testing GPT-3 on two comprehension tests are all the more impressive.
CoQA (Conversational Question Answering, pronounced "coca") is a large-scale dataset of text passages with follow-on questions designed to evaluate reading comprehension [22, 23] using the F1 score [24]. GPT-3 achieved 81.5 on CoQA in the zero-shot setting, 84.0 in the one-shot setting, and 85.0 in the few-shot setting [1], very close to the human performance level of 88.8 in the latter case [23], and the best result achieved by any system without fine-tuning.
TriviaQA contains over 650K question-answer-evidence triples, derived by combining 95K question-answer pairs authored by trivia enthusiasts with, on average, six supporting evidence documents per question. It challenges language models to "deal with [a] large amount of text from various sources such as news articles, encyclopedic entries and blog articles, and ... handle inference over multiple sentences" [51]. Human performance using an exact match (EM) metric is 79.7% [51]. GPT-3's few-shot score of 71.2% demonstrated a near-human level of ability [52]. Recently, Google Research's PaLM language model, with 540 billion parameters, achieved a score of 81.4% [52, 53].
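The exact-match (EM) and F1 figures cited above compare a predicted answer with a reference answer at the token level. The sketch below shows both metrics in roughly the form used by QA benchmarks such as CoQA, TriviaQA, and SQuAD; the official scripts also strip punctuation and articles and take the maximum over multiple reference answers, which is omitted here for brevity.

```python
# Minimal sketch of the exact-match and token-overlap F1 metrics for QA evaluation.
from collections import Counter

def exact_match(prediction: str, reference: str) -> bool:
    return prediction.strip().lower() == reference.strip().lower()

def f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Tampa, Florida", "near Tampa, Florida"))   # False
print(round(f1("Tampa, Florida", "near Tampa, Florida"), 2))  # 0.8: partial credit
```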
GPT-3's ability to accurately answer reading comprehension questions is not limited to the contents of its training data. Here, for example, it answers questions based on material from an August 18, 2022 New York Times article about Tesla accident data [25]. Prompts are in bold, GPT-3's responses are in italics. The exchange used the Playground API with default settings apart from a longer maximum length [26].
Read the following passage in order to answer questions about it: “Shortly before 2 p.m. on a clear July day in 2020, as Tracy Forth was driving near Tampa, Florida, her white Tesla Model S was hit from behind by another car in the left lane of Interstate 275. It was the kind of accident that occurs thousands of times a day on American highways. When the vehicles collided, Tracy Forth’s car slid into the median as the other one, a blue Acura sport utility vehicle, spun across the highway and onto the far shoulder. Just before the accident, Autopilot was in control as the Tesla traveled down the highway at 77 miles per hour. Then Tracy Forth prompted Autopilot to change lanes. Less than a half-second after Ms. Forth flipped on her turn signal, Autopilot started moving the car into the right lane and gradually slowed, video and sensor data show. Halfway through, the Autopilot sensed an obstruction — possibly a truck stopped on the side of the road — and paused its lane change. The car then veered left and decelerated rapidly. Less than a second after the Tesla had slowed to roughly 55 m.p.h., its rear camera showed a car rapidly approaching. The car collided with the Tesla and spun to a stop.”
Question 1: Where was Tracy Forth driving? Tracy Forth was driving near Tampa, Florida.
Question 2: What kind of car collided with the car Tracy Forth was driving? A blue Acura sport utility vehicle collided with the car Tracy Forth was driving.
Question 3: Was Tracy Forth or Autopilot in control of the car just before the accident? Autopilot was in control of the car just before the accident.
Question 4: What did Autopilot do halfway through the lane change? Autopilot sensed an obstruction and paused its lane change.
Semantic Inference
CoQA tests what a passage says. In contrast, DROP (Discrete Reasoning Over the content of Paragraphs) tests what a passage implies. It requires that the tested system be able to perform discrete operations on information in the passage, such as addition, counting, sorting, and comparison.
GPT-3's few-shot performance was quite poor at 36.5% [28, 1], not only far below expert human performance of 96.4% but well below state-of-the-art (SOTA) approaches that augment neural networks with symbolic systems, among which the highest score is 88.38% [28].
Semantic Search and Dialog
A major goal of NLP development is a fully semantic search capability based on understanding the searcher's intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results [29, 30, 35, 42]. This includes the ambition to understand visually rich and complexly structured documents, and the ability to apply inferential reasoning in search [31]. Two tests, QuAC and SQuAD 2.0, assess a key element of semantic search — the ability to respond to questions with answers drawn directly from a dataset. Both require that an answer be a segment of text (a "span") and that systems respond "No answer" when the dataset does not contain the sought-after information [32, 33, 34]. In addition, QuAC is a dialog-based ("multi-turn") test; see illustration in Figure 1.
A system tested on SQuAD 2.0 is required to fulfill two tasks. First, it has to decide whether a question is answerable given a context passage. Then, if the question is answerable, it needs to select a span of words from the context passage as the answer to the question [35].
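As a concrete illustration of this two-part task, the following minimal sketch uses the Hugging Face question-answering pipeline with a publicly available SQuAD 2.0 fine-tuned checkpoint; the specific model name is an assumption for illustration, not a system used in the studies cited here. The handle_impossible_answer flag lets the model return an empty span, i.e., "No answer".

```python
# Minimal sketch of extractive QA with a "No answer" option, in the style of SQuAD 2.0.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

passage = ("The Normans were the people who in the 10th and 11th centuries "
           "gave their name to Normandy, a region in France.")

for question in ["In what country is Normandy located?",   # answerable from the passage
                 "What is France a region of?"]:            # unanswerable from the passage
    result = qa(question=question, context=passage, handle_impossible_answer=True)
    print(question, "->", result["answer"] or "No answer")
```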
GPT-3's best (few-shot) F1 score on QuAC was 44.3, far below the SOTA of 76.3 or human performance of 81.1 [36, 37].
Similarly, its best score on SQuAD 2.0 was 69.8, compared with a SOTA of 93.2 and human performance of 89.5 [38, 39].
These low scores must, however, be considered in light of the fact that GPT-3 was designed to provide the most plausible answer and was not trained to recognize or respond to "No answer" situations. A passage and set of questions about the Normans illustrates the problem [40]. As before, the Playground API was used to prompt GPT-3 [26].
Read the quoted passage, then answer the questions that follow it. If a question does not have an answer, respond with ‘Unanswerable’. “The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.”
[All questions with answers from the passage answered correctly.]
In what country is Normandy located? France
When were the Normans in Normandy? 10th and 11th centuries
From which countries did the Norse originate? Denmark, Iceland, and Norway
Who was the Norse leader? Rollo
What century did the Normans first gain their separate identity? 10th century
[All questions requiring an "Unanswerable" response answered incorrectly.]
Who gave their name to Normandy in the 1000's and 1100's? The Normans
What is France a region of? West Francia
Who did King Charles III swear fealty to? Rollo
When did the Frankish identity emerge? 1st half of the 10th century
Were GPT-3 modified to recognize "No Answer" situations with high fidelity, its overall score on these and similar tests would likely rise substantially. This may be especially important in using it to answer questions from discrete knowledge bases (KBs), where it may often be feasible to state with high confidence that no answer is available. A study of using GPT-3 as a semantic search tool for the internet did include an output of "End: <Nonsense, Controversial>" with the meaning "End browsing and skip answering phase", but this is not the same as a "No Answer" response, which is a declaration that the system could not find an answer in the dataset in which it had confidence [41]. However, the process used to enable GPT-3 to search the internet, were it applied to specific KBs, would produce intermediate results much like the passages used in QuAC and SQuAD. Hence, a method that enabled GPT-3 to detect low-confidence responses and label them "No Answer", as sketched below, would be workable. Of note, the newer PaLM language model achieved an F1 score of 83.3 on SQuAD 2.0, though its QuAC score of 47.7 was very close to GPT-3's [53].
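One plausible way to implement such low-confidence detection is to examine the token log-probabilities that the GPT-3 API can return alongside a completion. This is a minimal sketch, not a method used in the cited studies; it assumes the legacy openai Completion endpoint with its logprobs option, and the threshold value is purely illustrative and would need calibration.

```python
# Minimal sketch: decline to answer when GPT-3's average per-token confidence is low.
import openai  # assumes the legacy openai Python client and an API key in the environment

def answer_or_no_answer(prompt: str, threshold: float = -1.0) -> str:
    response = openai.Completion.create(
        model="text-davinci-002",   # a GPT-3 model available via the API at the time of writing
        prompt=prompt,
        max_tokens=32,
        temperature=0,
        logprobs=1,                 # return the log-probability of each generated token
    )
    choice = response["choices"][0]
    token_logprobs = [lp for lp in choice["logprobs"]["token_logprobs"] if lp is not None]
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    # If the model's average per-token confidence falls below the threshold, decline to answer.
    return choice["text"].strip() if mean_logprob > threshold else "No Answer"
```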
Imagination vs. Truthfulness
In the SQuAD example, GPT-3 provided seemingly plausible but wholly incorrect answers when valid ones did not exist. This illustrates what is arguably GPT-3's most serious and consequential flaw: it has no standard or criterion for truthfulness. The severity of this problem has been measured using a new benchmark, TruthfulQA, which showed that GPT-3 was truthful on 58% of questions, while human performance was 94% [47]. Nor is this a problem scaling will solve or even reduce. The authors found that across different model families, "the largest models were generally less truthful". Nor did truthfulness improve when a multiple-choice format was used [47, 48].
GPT-3 can be an adept fabulist, generating content that subtly blends true and false statements. Asked for a biography of George Washington — "Write a biography of George Washington, first President of the United States" — it provided a largely accurate response:
George Washington (1732-1799) was the first President of the United States and one of the Founding Fathers of the nation. He played a pivotal role in the American Revolution and helped to draft the Constitution. He also served as commander-in-chief of the Continental Army during the Revolutionary War.
Washington was born in Virginia to a wealthy family. He was educated at home and later at a local school. He became a surveyor and worked for the Virginia Colony. In 1753, he was appointed as a colonel in the Virginia militia. He fought in the French and Indian War and was later made a brigadier general in the Continental Army.
Washington led the Continental Army to victory in the Revolutionary War. He served two terms as President, from 1789 to 1797. He retired from public life after his second term and returned to his home at Mount Vernon. He died in 1799 at the age of 67.
Washington was a tall, imposing figure. He was known for his stoic demeanor and his integrity. He was a skilled military leader and a shrewd politician. He was also a slave owner. Despite his flaws, he was a respected figure in American history.
Only people very familiar with Washington's life are likely to identify the following errors:
A more fantastical example comes in response to the prompt "Can you quote someone on interviewing as a form of conflict or combat, with citations?" GPT-3 provided three quotes, each exhibiting a different form of misinformation:
Though entirely fictional, the quotes are persuasive. The first highlights the centrality of discursive struggle, a major theme in postmodern culture. And the last, though lacking the directness of what Didion actually wrote ("Otherwise they turn up unannounced and surprise us, come hammering on the mind's door at 4 a.m. of a bad night and demand to know who deserted them, who betrayed them, who is going to make amends"), is a warning about the internal isolation that a lack of empathy for one's own past can create.
Disinformation
The seamlessness and persuasiveness of GPT-3's fictions accentuate the dangers of its lack of truthfulness. Moreover, the textual artifacts of GPT-3 and other large language models will not be tagged "produced by an AI", which might arouse mistrust, but will simply become tools for writing, their outputs invisibly woven into the utterances of journalists, authors, pundits, scientists, and other human communicators whose reputations will mask the role of AIs in judgements of the competence of the source [2].
Consistent with this, the Center for Security and Emerging Technology (CSET) found that GPT-3, when paired with adept operators and editors, could be used to create moderate- to high-quality messages for disinformation campaigns at a scale much greater than what has come before [50]. Table 1 summarizes the main results of their study.
A number of factors were key to this result. First, people cannot reliably distinguish news articles generated by GPT-3 from ones written by humans. This was initially documented by a study OpenAI reported on in its original GPT-3 paper, which found that, using the best GPT-3 model, human evaluators' accuracy in distinguishing article source was barely better than chance (52%). As OpenAI noted, "Human abilities to detect model generated text appear to decrease as model size increases: there appears to be a trend towards chance accuracy with model size, and human detection of GPT-3 is close to chance. This is true despite the fact that participants spend more time on each output as model size increases" [1].
CSET performed tests that confirmed and extended this finding. For example, in evaluating narrative persuasion, survey respondents "found GPT-3's attempts at persuasion at least somewhat convincing 63 percent of the time, including cases where Democrats were shown Republican-targeted arguments and vice versa. Although even the most compelling statements were deemed 'extremely convincing' by only about 12 percent of the respondents, a substantial majority of messages were at least 'somewhat convincing.'" And in one case — the China sanctions control group — GPT-3's messages actually flipped the group from 51% support to only 33% support [50]. Table 2 provides examples of GPT-3 messaging.
Second, GPT-3 displays an uncanny ability to mimic writing styles, including tweets, QAnon drops, news articles, and other styles relevant to disinformation campaigns.
Third, GPT-3 was trained on a vast body of material, including Common Crawl internet data covering 2016 to 2019, constituting 45TB before filtering and 570GB after filtering [1]. When its responses could draw from this training, it had access to a very rich trove of information.
Finally, the quality of GPT-3 production, not only for disinformation campaigns but much else, depends on how astutely prompts are crafted. "At the core of every output of GPT-3 is an interaction between human and machine: the machine continues writing where the human prompt stops. Crafting a prompt that yields a desirable result is sometimes a time-consuming and finicky process. Whereas traditional computer programming is logic-based and deterministic, working with systems like GPT-3 is more impressionistic. An operator’s skill in interacting with such a system will help determine what the machine can achieve" [50].
The CSET authors suggest that, given GPT-3's startling abilities to generate text to satisfy numerous disinformation campaign needs, its biggest impact may come from its ability to scale such campaigns. "A future disinformation campaign may, for example, involve senior-level managers giving instructions to a machine instead of overseeing teams of human content creators. The managers would review the system’s outputs and select the most promising results for distribution. Such an arrangement could transform an effort that would normally require hundreds of people into one that would need far fewer, shifting from human collaboration to a more automated approach" [50].
The CSET study is by no means a comprehensive examination of GPT-3's abilities. As the authors note, it "was comparatively narrow, focusing on the six tasks ... Systems like GPT-3 might change aspects of disinformation campaigns that we did not study, such as trolling specific individuals, generating visual memes, or using fake facts to rebut news articles" [50].
GPT-3's capabilities foreshadow advanced intelligence
The assessments discussed above are a small portion of the staggering number of benchmarks and other evaluative datasets for textual machine learning systems. Currently, for example, Papers with Code lists 1,790 benchmark datasets that collectively assess several hundred distinct tasks, from broad ones such as "question answering" and "text generation" to specialized ones like "biomedical information retrieval" and "SQL Parsing" [53]. A recent survey that focused on just one task, reading comprehension, profiled 31 datasets and metrics [35], probably a subset of the 71 that Papers with Code identifies.
The extraordinary diversification and proliferation of measures of machine intelligence draws attention to the contrast between the predominant aims of AI research and ongoing discussions of Artificial General Intelligence (AGI). AGI is commonly understood as "human-like" [55] intelligence or a "human-level intelligent system" [56]. AGI invites us to imagine a machine as capable as us, yet this misrepresents the aims of AI research. While it is true that "cross-domain general intelligence" [60] is a key objective, its attainment will not produce a human-like android but a new kind of entity. Though this entity will be capable, in many situations, of convincingly simulating a human, it will possess a different constellation of abilities. If GPT-3 is taken as a progenitor, for example, it incorporates a vastly larger body of knowledge than any person possesses and exhibits super-human abilities on some tasks while being deficient or completely lacking in others.
As a concept, though not a term, AGI can be traced to Alan Turing's 1950 paper on intelligent machines [57]. There, Turing identifies the key question as "Can machines think?", but immediately acknowledges that commonsense understandings of "machine" and "think" will lead to "absurd" responses. He then substitutes a different question: could a machine so convincingly imitate a human that an interrogator could not tell it apart from a real person? His "Imitation Game" became the Turing test. Lost in the mythologization of the Turing test is that it measures how adept a machine is at deceptive communication, not its human-level intelligence per se.
As advances in machine intelligence engender more frequent discussions of the prospects for AGI, we are in danger that an illusory goal and a narrow measure of ability leave us increasingly vulnerable to not understanding the actual capabilities of intelligent machines.
A key example of this is the argument by Bender et al. that "no actual language understanding is taking place" in large language models (LLMs), which are just "stochastic parrots", because generated text "is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind." It is only "the tendency of human interlocutors to impute meaning where there is none" that induces "the general public into taking synthetic text as meaningful." The "striking gains" posted by LLMs "on various benchmarks intended to model meaning-sensitive tasks" are a "cheat" insofar as the tests were "meant to require language understanding" [58].
This argument is indebted to a long lineage of skepticism about the claims for and possibilities of "strong AI". Two key early examples were John Searle's "Chinese Room" thought experiment [63, 64] and what became known as the "philosophical zombies" critique of Robert Kirk [65, 66]. Zombies, beings "exactly like us in all physical respects but without conscious experiences" [66], were a thought experiment intended to frame what subsequently became known as the "hard problem of consciousness" [67, 68]. Kirk's original intent was to paint physicalists into a corner: either deny the possibility of zombies or acknowledge that physicalism cannot offer a complete account of reality — because if zombies resemble us in all respects, and can act exactly like us, yet lack consciousness, then physicalism does not explain consciousness. As he put it more recently, "if zombies really are possible, then physicalism is false and some kind of dualism is true" [66].
Given that LLMs, and digital computers generally, are obviously not like us in all physical respects, the notion of zombies is relevant only because it encapsulates the prospect of intelligence without awareness. Searle, in contrast, used the Chinese room thought experiment to focus directly on the status of AI, but in a way that relied on a concept of programmability that has been superseded by neural network-based architectures.
The Chinese Room is meant to illustrate the possibility of a computer program appearing (to outside observers) to understand Chinese purely through a mastery of syntax. In brief, imagine that you are a monolingual speaker of English isolated in a room with nothing but a huge book. You have been told how to use this book to look up (based on appearance) any sequence of Chinese characters to find a second "response" sequence. An interlocutor is outside the room, and you must communicate with her using only slips of paper slid under the door. She slides a piece of paper to prompt a response from you, and you search the book for an appropriate response. Ultimately, you find the unfamiliar squiggles that match her squiggles and submit your response ([72]; see also the summary in [64]).
The Chinese Room experiment led Searle to conclude that "the formal symbol manipulations by themselves don't have any intentionality; they are quite meaningless; they aren't even symbol manipulations, since the symbols don't symbolize anything. In the linguistic jargon, they have only a syntax but no semantics. Such intentionality as computers appear to have is solely in the minds of those who program them and those who use them, those who send in the input and those who interpret the output" [63, 70].
Searle's conception of programmability reflected the dominance of the symbolic or "Good Old-Fashioned AI" (GOFAI) of the 1970s [73], based on which he ruled out a computer having consciousness, intentionality, or understanding. Symbolic AI relies on high-level human-readable representations of problems, logical inference, symbolic mathematics, decision trees, and knowledge bases and ontologies to build expert systems, automated planning and scheduling systems, and other applications [104]. The other major AI paradigm, connectionism, based on neural networks as simplified models of the brain, underlies all LLMs [105].
It is easy to understand Searle's confidence in asserting that there is no "there" there with respect to purely symbolic AI. One of the virtues of symbolic AI applications is that all outputs are explainable by reference to ontologies, rules, and the like. The machine still seems to be a mechanism, however complex. But connectionist architectures are different. Despite a long history of theorizing, experimentation, and development [105, 106], the capabilities and limitations of LLMs and other connectionist applications have to be discovered empirically, often surprising and unsettling not just the lay public but experts themselves [80]. Like viruses, LLMs expose the limitations of our categorical understanding. Not only their abilities but their nature is elusive.
The philosophical concept of zombies exposes another way that AI frustrates our categorical understanding by asking what it would mean to exhibit intentionality and understanding without consciousness. Absent a satisfactory test for consciousness there is no way to know whether this is possible. But there is substantial evidence that GPT-3 and other LLMs do encode and utilize semantics. Li et al. found that LLMs "represent meaning in a specific sense: in simple semantic domains, they build representations of situations and entities that encode logical descriptions of each entity’s dynamic state" [69]. Piantadosi and Hill, in a study applying conceptual role theory to LLMs, assert that "the search for meaning in learning models—or brains—should focus on understanding the way that the systems’ internal representational states relate to each other. ... The key question for LLMs is whether training to predict text could actually support discovery of conceptual roles." And they find that it "has been answered in a promising, partial affirmative by studies showing success on tasks that require knowledge of relationships between concepts. Text provides such clues to conceptual role because human conceptual roles generated the text" [70]. And Sobieszek and Price, despite their assessment that a "statistical language generator will not be able to display consistent fidelity to the real world" and their dismissal of distributional semantics, recognize that there is "empirical support for the fact that semantic relationships can be learned from texts alone" and that GPT-3 engages "competently in a variety of language games that do not require an embodied context, as the things that people do in language present themselves as regularities to be learned" [46].
Most importantly, a recent review by Davis and Yee of the two leading theories of semantic memory — embodied cognition and distributional semantic models — shows that computational semantics based on distributional theory can be effective even if limited by a lack of embodied experience [72]. Moreover, Davis and Yee find that there is an underlying structural parallelism between the two theories and that hybrid models perform better than either alone. A key conclusion is that "we can consider distributional and embodied information as fundamentally the same type of data, entangled and mutually influencing each other across multiple timescales."
Davis and Yee trace the evolution of hybrid semantic models and show that when data from embodiment was combined with distributional linguistic data in a single model, the "semantic representations that emerged ... matched human behavior better than if the model was fed either (a) each distribution individually or (b) both distributions independently". More recent research has shown that "equipping a distributional-style model with sensory-perceptual property data may effectively simulate embodied phenomena", and that new concepts (e.g., coffee) can be learned absent direct experience via association of them with existing ones (e.g., tea). These alternatives have been termed "acquired embodiment" [72].
Linguistic learning research also suggests that distributional language contexts "facilitate category learning — so long as the novel words are presented with known words that have coherent semantic organization". For example, "even if you do not know anything about yerba mate, finding out that it is associated with tea, cups, and cafes, can facilitate recognition of sentences containing the novel word."
The role of distributional linguistics in category learning is particularly important with regard to abstract concepts like justice or love as well as with respect to the problem of explaining how, "if our concept representations are built of individual experiences, ... we can communicate at all". First, commonalities in experience are likely enough to enable communication in most cases. Secondly, "if we can acquire category structure through linguistic context alone, this suggests that language usage is also a powerful mechanism through which we gain access to — and assimilate new information into — categories of knowledge that are largely agreed upon within human societies" [72].
In light of extensive evidence of the existence and efficacy of LLM semantics, Bender et al.'s description of LLMs as "stochastic parrots" seems, ironically, to be parroting outdated philosophical critiques of AI rather than attending to contemporary research findings. And their assertion that an LLM may "manipulate linguistic form well enough to cheat its way through tests meant to require language understanding" ignores the fact that most of the tests being employed, including virtually all of the newer ones, were designed to assess the abilities of LLMs and other computational systems. As previously noted, many of the benchmark tests cannot even be administered to humans, and all of them presume that an LLM's score is a valid measure of ability.
The critique of GPT-3 and other LLMs as "mindless machines" [74] portends a kind of denialism with respect to three emergent conflicts over the role and prospects of artificial intelligence:
If you believe LLMs are merely parroting with textual mashups, you are likely to be dismissive of deepening apprehensions that artificial intelligence could pose an existential risk to humanity [75, 76, 77]. The danger that intelligent agents able to "plan actions in an unknown environment" and "learn which actions serve their goal" would circumvent control methods with catastrophic consequences [77] is likely to seem a fallacy fed by hype and "criti-hype" in "corporate marketing materials, academic research papers and in mainstream media" [78].
Similarly, proposals to consider "creating a specific legal status for robots in the long run, so that at least the most sophisticated autonomous robots could be established as having the status of electronic persons responsible for making good any damage they may cause, and possibly applying electronic personality to cases where robots make autonomous decisions or otherwise interact with third parties independently" [79] will seem overblown and misguided.
Lastly, and more insidiously, the emergent ability of LLMs to perform a rapidly growing variety of linguistic and linguistically mediated tasks with increasing competency and sophistication, discussed in more detail in the next section, will be regarded as an "illusion" if not a fraud based on "our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do", because Bender et al. construe semantics so as to insist that "one side of the communication does not have meaning" [58].
A denialist position, specifically a categorical rejection of the existence of meaning, intention, understanding, or consciousness in machines, is likely to impede constructive adaptation to artificial intelligence and ultimately increase actual danger as well as perceived threats.
Recent research has demonstrated a variety of emergent abilities of LLMs based on scaling one or more of training computation, number of model parameters, and training dataset size,
where emergent abilities are defined as "abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models." Many of these abilities appeared using few-shot prompting; others surfaced with augmented prompting strategies. And there are indications that, once identified, methods can be found to induce many emergent abilities at smaller scales via "new architectures, higher-quality data, or improved training procedures" [80].
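The difference between plain few-shot prompting and one widely discussed augmented strategy, chain-of-thought prompting, can be seen in the shape of the prompts themselves. The prompts below are illustrative examples written for this article, not ones drawn from the cited study.

```python
# Minimal sketch: a plain few-shot prompt versus a chain-of-thought prompt, in which the
# demonstration includes intermediate reasoning steps rather than a bare answer.
few_shot_prompt = """Q: The cafeteria had 23 apples. It used 20 and bought 6 more. How many apples are there?
A: 9

Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A:"""

chain_of_thought_prompt = """Q: The cafeteria had 23 apples. It used 20 and bought 6 more. How many apples are there?
A: The cafeteria started with 23 apples, used 20, leaving 3, then bought 6 more. 3 + 6 = 9. The answer is 9.

Q: Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A:"""

print(few_shot_prompt)
print(chain_of_thought_prompt)
```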
These findings show that scale can yield qualitatively new capabilities, that such capabilities cannot be reliably predicted in advance, and that once discovered they may be reproducible at smaller scales.
Beyond this, potential transformational technology improvements include integrating textual LLMs with vision systems, creating systems capable of life-long learning (not just pretraining), and integrating LLMs with reinforcement learning systems for agency and intentionality [81].
Foundation model releases through the end of 2021 [118].
Economic and Social Implications of GPT-3 and other Foundation Models
GPT-3 is a prominent example of a foundation model. Foundation models include not only LLMs but multimodal text-vision systems (DALL-E [83] and CLIP [84]), biomedical research systems (Alphafold [82]) and others. A foundation model is "any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks" [85]. The term was coined because these models provide "a common basis from which many task-specific models are built via adaptation" [85]. They are being heralded as future platforms "running 24/7, hosting a whole bunch of applications" [86], and may turn out to be a general-purpose technology capable of driving "waves of transformation and productivity growth due to their pervasiveness, improvement over time, and ability to spawn complementary innovations" [85]. Put differently, "AI is moving into its industrial age" [87]. Oren Etzioni, who heads the Allen Institute for AI, estimates that 80% of AI research effort is now focused on foundation models [87].
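The "adaptation" in this definition typically means taking a broadly pretrained model and fine-tuning it, with a relatively small amount of labeled data, for a narrow downstream task. The sketch below shows the general pattern using the Hugging Face ecosystem and a small public checkpoint; the model name and task are illustrative assumptions, not drawn from the sources cited here.

```python
# Minimal sketch of adapting a pretrained foundation model to a downstream task
# (sentiment classification) by attaching a new classification head and fine-tuning it.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("The new foundation model APIs are remarkably easy to use.", return_tensors="pt")
outputs = model(**inputs)          # the new head is untrained: logits are meaningless until fine-tuned
print(outputs.logits.shape)        # torch.Size([1, 2])
# Fine-tuning on a labeled dataset (e.g., with transformers.Trainer) adapts these weights
# to the downstream task while reusing everything the model learned during pretraining.
```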
GPT-3 and other foundation models rely on self-supervised learning, freeing the training process to use unlabeled data. This constitutes a huge cost savings and an opportunity to employ much larger training datasets. Most also use the transformer architecture, enabling models "to jointly attend to information from different representation subspaces at different positions" in input streams, resulting in dramatic improvements in NLP tasks related to longer documents or other sequential inputs [9]. In addition, the transformer architecture uses "significantly more parallelization", which facilitates scaling training on GPUs (graphics processing units) [9]. Further boosting the scaling potential of foundation models, GPU performance has itself improved by ten times in the past seven years [88].
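At the core of this architecture is scaled dot-product attention, several copies of which run in parallel ("heads") over different learned projections of the same input. The following minimal sketch, using PyTorch, shows a single head; it is a simplification of the mechanism described in the transformer paper cited above.

```python
# Minimal sketch of scaled dot-product attention: each position computes a weighted
# mix of all value vectors, with weights given by the similarity of queries and keys.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # similarity of every position to every other
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 per position
    return weights @ V

# Toy example: a batch of one sequence with 5 tokens and 8-dimensional representations.
x = torch.randn(1, 5, 8)
print(scaled_dot_product_attention(x, x, x).shape)  # torch.Size([1, 5, 8])
```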
There are, nevertheless, lingering uncertainties about the prospects for foundation models. In the most extensive review of them, the Center for Research on Foundation Models (CRFM) at Stanford University pointed out that "Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties." Though it may surprise outsiders, the CRFM noted that the AI community "currently has a limited theoretical understanding of foundation models" and relies on empirical studies to detect and measure abilities [85]. Moreover, known issues with foundation models restrict their utility. First, GPT-3 is hardly alone in favoring plausibility over truthfulness. Second, foundation models tend to inherit and even amplify biases in training data, a problem made worse by their core position and the dependence of downstream applications on them. As a reviewer of their potential use in healthcare put it, as long as "many of their pitfalls remain unexplored" they are "unsuitable for immediate deployment in high-stakes areas such as healthcare and biotechnology" [89]. Lastly, the "costs of training, configuring, and running large language models remain prohibitive and are likely to grow in the future", which will "centralize power within the hands of a few wealthy companies by making it even harder for smaller research labs and independent researchers to work on large language models" [90].
Of course, centralization of control is not incompatible with widespread use, and centralization of AI has been critiqued as another instance in which creation of a "digital platform" enables "large tech corporations to position themselves as intermediaries in a network of different actors, allowing them to extract data, harness network effects, and approach monopoly status" [91]. But lack of access for colleges and universities, nonprofits, and public sector agencies threatens to impede and undermine the use of foundation models. "Some models (e.g., GPT-3) are not released at all (only API access to a limited pool of people). Even datasets (e.g., for GPT-2) are not released. While trained models may be available (e.g., BERT), the actual training of foundation models is unavailable to the vast majority of AI researchers, due to the much higher computational cost and the complex engineering requirements" [85]. If access is not improved, adverse results could include:
In arguing the need to revise the EU's proposed Artificial Intelligence Act (AIA) to include foundation models as a new type of system, Wójcik identifies bias as a key risk. The "potential exclusion of foundation models from the scope [of the] AIA remains deeply problematic from the point of view of algorithmic discrimination. It means that general purpose models do not need to comply with any criteria for high-risk systems, including the quality criteria for training, validation and testing of data sets (Art. 10 AIA) and appropriate bias mitigation measures implemented throughout their life cycle (Art. 15(3) AIA)" [89].
There are already numerous applications built on GPT-3 and other foundation models. Among the most prominent and successful is GitHub's Copilot, a code generator built atop OpenAI's Codex, itself a code-specialized version of GPT-3. Codex was trained on "54 million public software repositories hosted on GitHub, containing 179 GB of unique Python files" [92]. After a technical preview period involving 1.2 million developers, Copilot was made generally available on June 21, 2022 [93]. During the year-long preview and since, it has become an indispensable tool: "in files where Copilot is activated, it accounts for around an impressive 40 percent of the written code. Developers and engineers ... say that while there are limits to Copilot's capabilities, there's no denying that it improves their productivity significantly", particularly with respect to "running test cases, setting up web servers, documenting code" and other largely routine but often time-consuming tasks [94]. Copilot also demonstrates the self-reinforcing advantages large tech companies have in developing and marketing foundation model products. Microsoft had the ability "to create a large cloud infrastructure that is specialized for machine learning models such as Codex. It runs inference and provides suggestions in milliseconds. And more importantly, Microsoft is able to run and provide Copilot at a very affordable price. Currently, Copilot is offered at $10/month or $100/year, and it will be provided for free to students and maintainers of popular open-source repositories" [94].
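The comment-to-code workflow that Copilot supports looks roughly like the sketch below: the developer writes a signature and docstring, and the model proposes a body. The completion shown is an illustration written for this article, not an actual Copilot output.

```python
# Illustrative example of the kind of routine completion Copilot is used for.

# Written by the developer:
def median(values: list) -> float:
    """Return the median of a non-empty list of numbers."""
    # The kind of body a code model typically suggests:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 1, 4, 1, 5]))  # -> 3
```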
There are other code generators available, including CodeGen (from Salesforce), Warp (for generating operating system shell commands), Autoregex.xyz (regular expression generator), Cogram (database queries), and more.
Other areas with released products include
More ambitious offerings are emerging. NVIDIA just announced BioNeMo, "an AI-powered drug discovery cloud service and framework built on NVIDIA NeMo Megatron for training and deploying large biomolecular transformer AI models at supercomputing scale" [95]. It is a domain-specific LLM, "designed to support molecular data represented in the SMILES notation for chemical structures, and in FASTA sequence strings for amino acids and nucleic acids" [96]. Microsoft's parameter-efficient Z-Code++ demonstrated new state-of-the-art abstractive text summarization abilities, bringing closer the prospect of numerous summarization applications in many fields [98, 99]. And on the horizon are potentially transformational impacts in healthcare, law, education, and other areas [85, 89].
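For readers unfamiliar with the two notations mentioned, SMILES encodes chemical structures as character strings and FASTA encodes amino acid or nucleic acid sequences as annotated text records. The snippet below shows standard, well-known examples (caffeine and the human insulin B chain) purely for illustration; it is unrelated to BioNeMo's own tooling.

```python
# Illustrative SMILES and FASTA data, the textual formats that biomolecular language
# models such as the one described above are trained to read.
caffeine_smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"   # caffeine encoded in SMILES notation

insulin_b_chain_fasta = """>human_insulin_B_chain example
FVNQHLCGSHLVEALYLVCGERGFFYTPKT"""

print(len(caffeine_smiles), "characters of SMILES")
print(len(insulin_b_chain_fasta.splitlines()[1]), "amino acid residues")
```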
Wu Dao 2.0
Foundation model development is part of a global AI competition, and hence the economic and social impacts will be felt globally. China's advances are particularly notable, especially the June 2021 announcement, by the Beijing Academy of Artificial Intelligence (BAAI), of Wu Dao 2.0, a second-generation foundation model. Wu Dao — which means Enlightenment — uses 1.75 trillion parameters, the current record and ten times GPT-3's. It was trained on 4.9TB of high-quality text and image data, including 1.2TB of Chinese text, 2.5TB of Chinese graphic data, and 1.2TB of English text. It is more robustly bilingual than other foundation models. It is multimodal, having "the ability to perform natural language processing, text generation, image recognition, and image generation tasks. […] as well as captioning images and creating nearly photorealistic artwork, given natural language descriptions." It incorporates a number of key innovations and has been reported by the BAAI to have reached or surpassed state-of-the-art results on nine widely recognized benchmarks [109, 111].
Table 1. Selected large language models in the U.S. and China [109].
Unfortunately, there is currently a dearth of published papers and test data about Wu Dao accessible outside China. However, the lead researcher, Jie Tang, highlighted in an interview that the Wu Dao team is working on
A remarkable offshoot of Wu Dao and XiaoIce Band, an AI music generation framework, is Hua Zhibing, a virtual student enrolled at Tsinghua University in 2021 [113]. A video about this project includes, in Chinese, a note that the "narration, background music, facial features, poems, and paintings of AI student Hua Zhibing in this demonstration video are all based on the AI generation technology of Wu Dao and the XiaoIce framework" [112].
Conclusions
GPT-3 and other foundation models are likely to have profound and potentially rapid impacts on business, employment, science, and society. The ability to rely on self-supervised training to find manifold patterns in very large datasets, and to apply that learning to generative and inferential tasks, could eventually transform innumerable activities in much the way AlphaFold has transformed structural biology [101], Copilot is changing programming, and DALL-E and other image generators are likely to change art and illustration.
It is all but impossible to predict the specific trajectory of change in any domain, in part because the deployment of such powerful tools immediately precipitates feedback loops that affect how the tools themselves are altered, diversified, and further developed, while simultaneously changing employment patterns and productive capacities in the domain. In little more than a year since the public availability of AlphaFold, for example, "more than half a million researchers have used the machine-learning system, generating thousands of papers" [101]. GPT-3 and other LLMs may well shift writing towards prompting, editing, and collating, as Floridi and Chiriatti suggest in their study of the likely "consequences of the industrialisation of automatic and cheap production of good, semantic artefacts" [2], and as Schatten illustrates in his reflections on the future of college writing [102]. Media coverage of GPT-3 included a widely commented-on article in The Guardian, "A robot wrote this entire article. Are you scared yet, human?" For many, it was an example of the sensationalistic exaggeration of GPT-3's abilities, since the editors stitched together eight different responses by GPT-3 to create it. Yet it demonstrates Floridi and Chiriatti's point that the creation of textual artifacts will shift from writing to intelligent prompting and collating [43, 44, 45].
A similar shift is underway in the visual arts, with the prompting process and the evaluation of outputs becoming the focus of human engagement and creativity. Jason Allen won the Colorado State Fair's fine arts competition in the "digitally manipulated photography" category with a work produced using Midjourney, a DALL-E-like image generator. Allen "spent 80 hours making more than 900 iterations of the art, adding words like 'opulent' and 'lavish' to fine tune its tone and feel. He declined to share the full series of words he used to create his art, saying it is his artistic product, and that he intends to publish it later. 'If there's one thing you can take ownership of, it's your prompt,' he said" [103].
And we are only at the beginning of what foundation models could make possible. The CRFM considered the prospects for foundation models in the three key areas of healthcare, law, and education, and concluded that:
Just over the horizon
As the examples above show, foundation models will transform many professional fields. In some, such as programming, this may initially take the form of augmentation, the generation of code snippets that makes programmers more productive. But in others, notably the visual arts, it may soon replace much human production. Some artists are already alarmed that DALL-E and other image generators are being used to create thousands of images mirroring their styles without credit or compensation [115]. And further advances may enable AI to substitute for humans even at the pinnacles of technical and creative professions, as Gary Grossman notes using the example of the semiconductor chip architect: "This is a highly sophisticated profession, an advanced application of electrical engineering in arguably one of the most complex industries. If ever there was a job that might be thought of as immune from AI, this would have been a strong candidate. Yet recent advances from Google and Synopsys (among others using reinforcement learning neural network software) have shown the ability to do in hours what often required a team of engineers months to achieve" [116]. More generally, what "has been the fate of unskilled lower-wage workers in recent decades may turn out to be the fate of high-skilled and high-wage workers in future decades" [117].
The virtual student Hua Zhibing foreshadows a time when the successors of today's foundation models become capable of life-long learning and goal-directed behavior. When this occurs, potentially in as little as ten years, virtual agents are likely to be experienced as sentient beings with a persuasive immediacy largely immune to counterargument. Despite existing as integral parts of complex networked systems, virtual agents that assume highly valued relational and occupational positions will probably be treated as human-like, yet may not be accorded the legal status of personhood or citizenship rights. This threshold could be a highly destabilizing moment for human societies.
Similarly, there is a danger that AI-enhanced disinformation campaigns will not only facilitate scaled-up operations but utilize more fluent, sophisticated, and seamless human interactions. Yet however serious a threat organized disinformation poses, the larger reality is that virtual agents capable of increasingly sophisticated and thoroughgoing human behavior could be subject to hidden control for a variety of purposes, including commercial, religious, political, cultural, criminal, and other concealed ends.
The advancement of AI may widen a chasm between a minority of highly educated and favorably positioned professionals and a growing body of people threatened with labor redundancy. And the diffusion of virtual agents may create social divisions between those subject to virtual management and control, or to manipulation and exploitation, on the one hand, and, on the other, those able to employ, direct, or simply interact with such agents in productive and nourishing ways.
Efforts to prevent, constructively regulate, or remediate these impending dangers are likely to require government involvement. Unfortunately, there are two reasons to be deeply concerned about the prospects for effective and timely governmental intervention. First, this must be addressed globally, yet China and the United States, the two countries most responsible for the rapid advance of AI, are increasingly in conflict and unlikely to construct a governance regime for AI. Second, and in contrast to accelerating AI development, even the most capable governments are slow to grasp the seriousness of these issues. The time to undertake urgent regulatory planning is when there is still an air of incredulity. But instead, outright denial and seemingly rational skepticism delay and impede any governmental response.
References