AI Trends 2022 — I — Large Language Models

A language model is the “brain” of language understanding. These AI models rely on machine learning to determine how closely related phrases, sentences, or paragraphs are. A language model learns a language by ingesting a large amount of text and building a statistical model of how likely words, phrases, and sentences are to follow one another.
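
To make the “statistical model” idea concrete, here is a minimal sketch of a toy bigram model that estimates how likely one word is to follow another by counting pairs in a tiny corpus. This is only an illustration of the underlying probability idea, not how modern neural language models such as GPT-3 are actually built.

```python
from collections import Counter, defaultdict

# Toy corpus; real language models are trained on billions of words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows another (bigram counts).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_word_probabilities(word):
    """Estimate P(next word | word) from the bigram counts."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
print(next_word_probabilities("sat"))  # {'on': 1.0}
```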

Language models are getting larger and, at the same time, more refined in their understanding of language. They can process and generate more human-like text, using semantic techniques that improve the quality of their results.

Another benefit of these large language models is that they require just a few training examples to adapt to a new problem. Previously, AI solutions required large amounts of human-labeled data, which is difficult and expensive to create. With larger AI models, we can now achieve the same or better results with only one or a few training examples. This reduces the cost of applying artificial intelligence, and we should expect many more business processes to be automated as a result.
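
To show what “a few training examples” can look like in practice, here is a minimal sketch of few-shot prompting, where labeled examples are embedded directly in the prompt instead of being used to retrain the model. The review texts are made up, and the `send_to_model()` function is a hypothetical placeholder for whichever language-model API you actually call.

```python
# Few-shot prompt: a handful of labeled examples followed by the new input.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The delivery was fast and the product works perfectly."
Sentiment: Positive

Review: "It broke after two days and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and the results exceeded my expectations."
Sentiment:"""

def send_to_model(prompt: str) -> str:
    # Placeholder: call your language-model API here and return its completion.
    raise NotImplementedError

print(few_shot_prompt)
```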

As language models grow, their capabilities change in unexpected ways

GPT-3 (Generative Pre-trained Transformer 3) has 175 billion parameters and was trained on 570 gigabytes of text. For comparison, its predecessor, GPT-2, was over 100 times smaller, at 1.5 billion parameters. This increase in scale drastically changes the behavior of the model — GPT-3 is able to perform tasks it was not explicitly trained on, like translating sentences from English to French, with few to no training examples. This behavior was mostly absent in GPT-2. Furthermore, for some tasks, GPT-3 outperforms models that were explicitly trained to solve those tasks, although in other tasks it falls short. Workshop participants said they were surprised that such behavior emerges from simple scaling of data and computational resources, and they expressed curiosity about what further capabilities would emerge from further scaling.

GPT-3’s uses and their downstream effects on the economy are unknown

GPT-3 has an unusually large set of capabilities, including text summarization, chatbots, search, and code generation. Future users are likely to discover even more capabilities. This makes it difficult to characterize all possible uses (and misuses) of large language models in order to forecast the impact GPT-3 might have on society. Furthermore, it’s unclear what effect highly capable models will have on the labor market. This raises the question of when (or what) jobs could (or should) be automated by large language models.

Is GPT-3 intelligent, and does it matter?

Unlike chess engines, which solve a specific problem, humans are “generally” intelligent and can learn to do anything from writing poetry to playing soccer to filing tax returns. In contrast to most current AI systems, GPT-3 is edging closer to such general intelligence, workshop participants agreed. However, participants differed in terms of where they felt GPT-3 fell short in this regard.

Some participants said that GPT-3 lacked intentions, goals, and the ability to understand cause and effect — all hallmarks of human cognition. On the other hand, some noted that GPT-3 might not need to understand to successfully perform tasks — after all, a non-French speaker recently won the French Scrabble championship.

Future models won’t be restricted to learning just from language

GPT-3 was trained primarily on text. Participants agreed that future language models would be trained on data from other modalities (e.g., images, audio recordings, videos, etc.) to enable more diverse capabilities, provide a more robust learning signal, and increase learning speed. In fact, shortly after the workshop, OpenAI took a step in this direction and released a model called DALL-E, a version of GPT-3 that generates images from text descriptions. One surprising aspect of DALL-E is its ability to sensibly synthesize visual images from whimsical text descriptions. For example, it can generate a convincing rendition of “a baby daikon radish in a tutu walking a dog.”

Furthermore, some workshop participants also felt future models should be embodied — meaning that they should be situated in an environment they can interact with. Some argued this would help models learn cause and effect the way humans do, through physically interacting with their surroundings.

Disinformation is a real concern, but several unknowns remain

Models like GPT-3 can be used to create false or misleading essays, tweets, or news stories. Still, participants questioned whether it’s easier, cheaper, and more effective to hire humans to create such propaganda. One participant noted that we could learn from the similar alarms raised when the photo-editing program Photoshop was introduced. Most agreed that we need a better understanding of the economics of automated versus human-generated disinformation before we can judge how much of a threat GPT-3 poses.

Future models won’t merely reflect the data — they will reflect our chosen values

GPT-3 can exhibit undesirable behavior, including known racial, gender, and religious biases. Participants noted that it’s difficult to define what it means to mitigate such behavior in a universal manner — either in the training data or in the trained model — since appropriate language use varies across contexts and cultures. Nevertheless, participants discussed several potential solutions, including filtering the training data or model outputs, changing the way the model is trained, and learning from human feedback and testing. However, participants agreed there is no silver bullet and further cross-disciplinary research is needed on what values we should imbue these models with and how to accomplish this.
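
Of the mitigations mentioned above, filtering model outputs is the simplest to picture. The sketch below shows a deliberately crude keyword-blocklist filter; the blocklist terms and the filtering logic are illustrative placeholders only, and real mitigation combines data curation, training-time changes, learned classifiers, and human feedback rather than anything this simple.

```python
# Highly simplified output filter: withhold generations containing terms
# from a curated blocklist. Placeholder terms stand in for a real,
# reviewer-maintained list.
BLOCKLIST = {"blocked_term_1", "blocked_term_2"}

def filter_output(text: str) -> str:
    """Return the text unchanged, or a refusal message if it trips the blocklist."""
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return "[output withheld: flagged by content filter]"
    return text

print(filter_output("A perfectly harmless sentence."))
print(filter_output("Something containing blocked_term_1 would be withheld."))
```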

We should develop norms and principles for deploying language models now

Who should build and deploy these large language models? How will they be held accountable for possible harms resulting from poor performance, bias, or misuse? Workshop participants considered a range of ideas: Increase resources available to universities so that academia can build and evaluate new models, legally require disclosure when AI is used to generate synthetic media, and develop tools and metrics to evaluate possible harms and misuses.


Pervading the workshop conversation was also a sense of urgency — organizations developing large language models will have only a short window of opportunity before others develop similar or better models. Those currently on the cutting edge, participants argued, have a unique ability and responsibility to set norms and guidelines that others may follow.

Annoberry, with its unique expertise and capable workforce, specializes in the data annotation that makes AI models like these possible.

Visit us at annoberry.com or follow us on LinkedIn for more at https://www.dhirubhai.net/company/annoberry.
