NLP, GPT & Future of Design, Part 1
Sandeep Ozarde
Founder Director at Leaf Design; PhD Student at University of Hertfordshire
Natural language processing (NLP) refers to the branch of computer science—and more specifically, the branch of artificial intelligence, or AI—concerned with giving computers the ability to understand text and spoken words in much the same way human beings can.
NLP combines computational linguistics—rule-based modeling of human language—with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.
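As a toy illustration of the statistical side of NLP, the short Python sketch below splits text into word tokens and counts their frequencies. Real systems use far more sophisticated tokenizers and learned models; the sentence here is purely illustrative.

```python
import re
from collections import Counter

def tokenize(text):
    """A toy tokenizer: lowercase the text and split out word tokens."""
    return re.findall(r"[a-z']+", text.lower())

sentence = "NLP enables computers to process human language, and language is data."
tokens = tokenize(sentence)
counts = Counter(tokens)
print(counts.most_common(2))
```

Counting token frequencies like this is the simplest possible statistical model of text; everything from bag-of-words classifiers to modern language models builds on richer versions of the same idea.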
NLP drives computer programs that translate text from one language to another, respond to spoken commands, and summarize large volumes of text rapidly—even in real time. There's a good chance you've interacted with NLP in the form of voice-operated GPS systems, digital assistants, speech-to-text dictation software, customer service chatbots, and other consumer conveniences. But NLP also plays a growing role in enterprise solutions that streamline business operations, increase employee productivity, support design and creativity, and simplify mission-critical business processes.
Heinz, working with the Canadian agency Rethink, decided to experiment with artificial intelligence: it asked the AI image-maker DALL·E 2 to create "ketchup" along with variations using aesthetic modifiers ("renaissance," "impressionism," "street art," and so on). DALL·E 2, an AI text-to-image tool, generated dozens of images in seconds in response to prompts like "ketchup bottle on table" or "impressionist-style painting of a bottle of ketchup." Though the results varied in style, they were all unmistakably Heinz-centric, notably referencing the distinctive shape of the Heinz label. Aside from being a fun stunt, it was a powerful demonstration of the strength of the Heinz brand.
Credit: AI art composite from Ogilvy Paris, Dentsu Creative Portugal, Wunderman Thompson, Omneky, Rethink, and TBWA/Melbourne.
In the realm of artificial intelligence, a recently developed technology known as Generative Pre-trained Transformer 3 (GPT-3) has generated a great deal of excitement among industry professionals and has quickly become the field's most popular buzzword. The technology, which has applications in web design and development, may look like just another announcement of a little-known piece of technology; however, this shift will have profound repercussions for both businesses and society in the foreseeable future.
OpenAI released the beta version of GPT-3 in June 2020; it is the most advanced artificial intelligence language model ever made available to the general public. OpenAI carries out research in the field of artificial intelligence and is viewed as a competitor to DeepMind; its stated mission is to promote and develop friendly AI in a way that benefits humanity as a whole. The organization was established in San Francisco in 2015 by Elon Musk, Sam Altman, and several other individuals, who collectively pledged US$1 billion. In February 2018, Musk stepped down from his role as a board member but continued to contribute financially. Microsoft invested US$1 billion in OpenAI LP in 2019 and, according to a person familiar with the matter, is in advanced talks for a new round of funding as the software giant seeks to incorporate artificial intelligence further into its products.
In 2016, 20th Century Fox partnered with IBM to develop an AI-created movie trailer for its horror film Morgan. GPT-3 is the latest and arguably the most powerful member of a family of deep learning NLP models whose superstars include the Transformer (2017), BERT (2018), the GPT series (2018, 2019, 2020), and T5 (2019). Building on these, the research community has proposed numerous variations and improvements, approaching or even surpassing human performance on many NLP benchmark tasks.
GPT-3 is a neural network machine learning model trained on internet data to generate any type of text. It was developed by OpenAI and requires only a small amount of text as input to generate vast quantities of relevant and sophisticated machine-generated text. On the image side, DALL·E 2 uses the existing CLIP (Contrastive Language-Image Pre-training) technology, also developed by OpenAI, for its learning process. CLIP finds matching text descriptions for an image based on text-image pairs from the internet.
This allows you to create a variety of different images from different text inputs, e.g. the text input "Astronaut walking on the Mars surface".
The GPT-3 deep learning neural network has over 175 billion machine learning parameters. Before GPT-3, the largest trained language model was Microsoft's Turing NLG, with 17 billion parameters. As of early 2021, GPT-3 was the largest neural network ever developed, and it is superior to all previous models at producing text that appears to have been written by a person.
GPT-3 is, at its most fundamental, a text predictor: its output is a statistically plausible response to the input it is given, grounded in the data it was trained on. Some argue that GPT-3 is therefore not the best artificial intelligence system for answering questions and summarising text. Although GPT-3 performs worse than the state-of-the-art (SOTA) methods for each individual NLP task, it is significantly more general than any system before it, and future systems are likely to be designed along similar lines.
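To make the "text predictor" idea concrete, here is a minimal, purely illustrative bigram model in Python: it counts which word tends to follow which in a tiny invented corpus and predicts the most frequent continuation. GPT-3 does something conceptually similar, but with hundreds of billions of learned parameters rather than raw counts.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which word follows which: a minimal statistical text predictor."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model

def predict_next(model, word):
    """Return the continuation seen most often in training, or None."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# A tiny invented corpus; GPT-3 instead learns from hundreds of billions
# of tokens scraped from the internet.
corpus = [
    "the model predicts the next word",
    "the model learns from text",
    "the next word is predicted",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))
```

The key point carries over directly: the output is whatever is statistically plausible given the training data, nothing more.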
Research in natural language processing (NLP) originated as research in machine translation (MT). The first era ran from 1950 to 1969. It was hoped that translation could rapidly build on the significant achievements of computers in deciphering codes during World War II, and on both sides of the Cold War, researchers attempted to develop systems capable of translating other nations' scientific output. At the beginning of this era, however, almost nothing was known about the structure of human language, artificial intelligence, or machine learning.
The second era of natural language processing (NLP), which lasted from 1970 until 1992, was marked by the development of a whole series of NLP demonstration systems. These systems displayed a high level of sophistication and depth in handling phenomena of human language such as syntax and reference. They included SHRDLU, developed by Terry Winograd; LUNAR, developed by Bill Woods; Roger Schank's systems, such as SAM; Gary Hendrix's LIFER; and GUS, developed by Danny Bobrow. These were all hand-built, rule-based systems, but they began to model and exploit some of the complexity of how humans understand language, and some were even put into production for tasks like operational database querying. Linguistics and knowledge-based artificial intelligence advanced rapidly during this era, and in its second decade a new generation of hand-built systems emerged with a clear separation between declarative linguistic knowledge and procedural processing, benefiting from a range of more modern linguistic theories.
The direction of the work changed markedly in the third era, from roughly 1993 to 2012, when digital text became abundantly available. The clear goal was to develop algorithms that could achieve some level of language understanding over large amounts of natural text, using the sheer volume of that text to help provide the ability. This led to a fundamental reorientation of the field around empirical machine learning models of NLP, an orientation that still dominates the field today. The period from 2013 to the present extends that empirical orientation, but the work has been enormously changed by the introduction of deep learning, or artificial neural network, methods. This approach represents words and sentences as positions in a real-valued vector space of several hundred or thousand dimensions.
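The vector-space idea can be sketched in a few lines of Python: words become points in a vector space, and similarity of meaning becomes closeness of vectors, typically measured with cosine similarity. The 3-dimensional embeddings below are invented for illustration; real models learn vectors with hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented 3-d embeddings; real models learn hundreds of dimensions.
embeddings = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.20, 0.90],
}
print(cosine(embeddings["king"], embeddings["queen"]))  # high (related words)
print(cosine(embeddings["king"], embeddings["apple"]))  # lower (unrelated)
```

Because meaning lives in the geometry, related words end up near each other, which is what lets downstream models generalize from one word to its neighbours.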
Transformers were introduced in 2017 by a team at Google Brain and are increasingly the model of choice for NLP problems, replacing RNN models such as Long Short-Term Memory (LSTM). The additional training parallelization allows training on larger datasets. This led to the development of pretrained systems such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which were trained with large language datasets, such as the Wikipedia Corpus and Common Crawl, and can be fine-tuned for specific tasks.
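At the heart of the transformer is scaled dot-product attention. The pure-Python sketch below (with invented toy matrices) computes softmax(QK^T / sqrt(d))V, the core operation of the architecture; production implementations run this over batches of large matrices on GPUs.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    output = []
    for q in Q:
        # Score each key against this query, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        row = [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        output.append(row)
    return output

# Toy example: 2 query positions attending over 3 key/value positions, d = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(attention(Q, K, V))
```

Because every position attends to every other position independently, the whole computation parallelizes across the sequence, which is exactly the property that lets transformers train on much larger datasets than recurrent models like LSTMs.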
The IBM Deep Blue’s victory wasn’t so much a triumph of AI but a kind of death knell. It was a high-water mark for old-school computer intelligence, the laborious handcrafting of endless lines of code, which would soon be eclipsed by a rival form of AI: the neural net—in particular, the technique known as “deep learning.” For all the weight it threw around, Deep Blue was the lumbering dinosaur about to be killed by an asteroid; neural nets were the little mammals that would survive and transform the planet. Yet even today, deep into a world chock-full of everyday AI, computer scientists are still arguing whether machines will ever truly “think.” And when it comes to answering that question, Deep Blue may get the last laugh.
Transformers typically undergo self-supervised learning involving unsupervised pretraining followed by supervised fine-tuning. Pretraining is typically done on a larger dataset than fine-tuning, due to the limited availability of labelled training data. Tasks for pretraining and fine-tuning commonly include 1. Language modelling, 2. Next-sentence prediction, 3. Question answering, 4. Reading comprehension, 5. Sentiment analysis, 6. Paraphrasing.
The transformer has had great success in Natural Language Processing (NLP), for example the tasks of machine translation and time series prediction. Many pretrained models such as GPT-2, GPT-3, BERT, XLNet, and RoBERTa demonstrate the ability of transformers to perform a wide variety of such NLP-related tasks and have the potential to find real-world applications. These may include 1. Machine translation, 2. Document summarization, 3. Document generation, 4. Named entity recognition (NER), 5. Biological sequence analysis, 6. Video understanding.
In 2020, it was shown that the transformer architecture, more specifically GPT-2, could be tuned to play chess. Transformers have been applied to image processing with results competitive with convolutional neural networks. The transformer model has been implemented in standard deep learning frameworks such as TensorFlow and PyTorch. Transformers is a library produced by Hugging Face that supplies transformer-based architectures and pretrained models.
OpenAI launched DALL·E 2, a project that allowed users to generate art from strings of text and showed the fast advances in that segment of AI technology. Microsoft announced it was integrating DALL·E 2 with various products, including Microsoft Designer, a new graphic design app, and the image creator for the search app Bing. Alphabet Inc.'s Google and Microsoft have each developed their own image-generating models but have been slower to release them to the public because of ethical concerns.
With the advent of DALL·E 2 and open-source alternatives like Stable Diffusion in recent years, AI image generators have exploded in popularity. In September, OpenAI said that more than 1.5 million users, including artists, creative directors, and authors, were actively creating over 2 million images daily with DALL·E 2. Brands such as Stitch Fix, Nestlé and Heinz have piloted DALL·E 2 for ad campaigns and other commercial use cases, while certain architectural firms have used DALL·E 2 and similar tools to conceptualize new buildings.
DALL·E 2 is a new neural network algorithm that creates a picture from a short phrase or sentence you provide. In principle, anyone with enough resources and expertise can build a system like this: Google Research recently announced a similar text-to-image system of its own, and one independent developer is publicly developing a version that anyone can try on the web right now, although it is not yet as good as DALL·E 2 or Google's system. DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called "diffusion," which starts with a pattern of random dots and gradually alters that pattern towards an image as it recognizes specific aspects of that image.
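The iterative-refinement idea behind diffusion can be caricatured in a few lines of Python: start from random noise and repeatedly nudge it towards a target. This toy, with an invented target vector standing in for an image, only illustrates the direction of the process; a real diffusion model learns the denoising step from data rather than being handed the target.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and repeatedly move a fixed fraction toward
    the target: a cartoon of iterative refinement. A real diffusion model
    learns the denoising step from data instead of being given the target."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in target]
    for _ in range(steps):
        x = [xi + 0.2 * (ti - xi) for xi, ti in zip(x, target)]
    return x

target = [0.9, -0.3, 0.5, 0.0]   # stand-in for an image's pixel values
result = toy_denoise(target)
print(max(abs(r - t) for r, t in zip(result, target)))  # close to zero
```

After enough small steps, the noise has been pulled almost entirely onto the target; in a trained model, "the target" is whatever image best matches the text prompt according to what the network has learned.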
DALL·E 2's edit feature already enables changes within a generated or uploaded image, a capability known as inpainting. Now, with outpainting, users can extend the original image beyond its canvas, creating large-scale compositions in any aspect ratio. Outpainting takes the image's existing visual elements into account, including shadows, reflections, and textures, to maintain the context of the original. DALL·E 2 can also make realistic edits to existing images from a natural language caption, adding and removing elements while accounting for shadows, reflections, and textures, and it can take an image and create different variations inspired by the original.
Artists have already created remarkable images with the new outpainting feature, which helped OpenAI better understand its capabilities. GPT-3, meanwhile, continues to function as a language predictor: it does not think, has no mind of its own, and can only generate content based on the information fed into it. The original DALL·E was a 12-billion-parameter version of GPT-3 trained to generate images from text descriptions using a dataset of text-image pairs. Its capabilities are diverse, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.
Since GPT-4 is rumoured to have around 100 trillion parameters, five hundred times more than GPT-3, there is room for some hyper-inflated expectations. The human brain has approximately 100 trillion synapses, so a GPT-4 of that size raises hopes of a human-like language generator on the market. This brings the old debate about the efficiency of large models back into the limelight. Big companies like Google and Facebook are no exception to this trap: when OpenAI's Jared Kaplan and colleagues reported that scaling the number of parameters always results in improved performance, many took it at face value, only to see smaller but more efficient models beat out PaLM and GSLM. In other cases, such as MT-NLG, the model does not stand up to any benchmark when compared with smaller models like Gopher or Chinchilla.
OpenAI, which had access to a Microsoft-provided supercomputer with 10,000 Nvidia V100 GPUs (although the exact number of GPUs used has not been disclosed), decided not to retrain GPT-3 after researchers found a mistake, because retraining would have been infeasible. Rough calculations estimate a training cost of at least $4.6 million, out of reach for most companies, and that excludes research and development costs, which would raise the figure to $10–30M.
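A figure of that order can be reproduced with back-of-envelope arithmetic. Every constant in the sketch below is an assumption made for illustration (the common ~6 × parameters × tokens estimate of training FLOPs, an assumed V100 utilisation rate, and an assumed GPU-hour price), not OpenAI's actual accounting.

```python
# Back-of-envelope GPT-3 training-cost sketch. Every constant is an
# illustrative assumption, not a disclosed figure.
params = 175e9                 # GPT-3 parameters
tokens = 300e9                 # reported training tokens
train_flops = 6 * params * tokens       # common ~6*N*D estimate of training FLOPs

peak_flops = 125e12            # V100 mixed-precision peak FLOP/s (tensor cores)
utilisation = 0.30             # assumed effective utilisation
gpu_seconds = train_flops / (peak_flops * utilisation)
gpu_hours = gpu_seconds / 3600
cost = gpu_hours * 2.0         # assumed $2 per GPU-hour

print(f"~{gpu_hours:,.0f} GPU-hours, ~${cost / 1e6:.1f}M")
```

Under these assumptions the estimate lands in the single-digit millions of dollars, the same order of magnitude as the widely quoted $4.6 million figure.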
Multimodal models are the deep learning models of the future, because we live in a multimodal world and our brains are multisensory. Perceiving the world in only one mode at a time severely limits AI's ability to navigate and comprehend it. Making GPT-4 a text-only model could be an attempt to push language models to their limits, adjusting parameters like model and dataset size before moving on to the next generation of multimodal AI.
Sparsity: Sparse models that use conditional computation, activating different parts of the model for different inputs, have been successful. Such models scale quickly beyond the 1-trillion-parameter mark without incurring high computing costs, although the benefits of mixture-of-experts (MoE) approaches taper off on very large models. GPT-4, like GPT-2 and GPT-3, will be a dense model: all parameters will be used to process any given input.
Optimization: Assuming GPT-4 is larger than GPT-3, the number of training tokens required to be compute-optimal (according to DeepMind's findings) could be around 5 trillion, an order of magnitude greater than current datasets, and the number of FLOPs required to train the model to minimal loss would be 10–20x that of GPT-3. OpenAI is likely to focus on optimizing such variables rather than simply scaling the model.
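DeepMind's compute-optimal finding (the Chinchilla result, roughly 20 training tokens per parameter) makes these token and FLOP figures easy to sanity-check. The sketch below assumes, purely for illustration, a hypothetical GPT-4 with the same parameter count as GPT-3 and uses the common ~6 × parameters × tokens FLOP approximation.

```python
# Compute-optimal scaling sketch using DeepMind's rule of thumb of roughly
# 20 training tokens per parameter. All inputs are illustrative assumptions.
def train_flops(params, tokens):
    return 6 * params * tokens      # common ~6*N*D approximation

gpt3_flops = train_flops(175e9, 300e9)   # GPT-3: 175B params, 300B tokens

params = 175e9                           # hypothetical GPT-4-sized model
optimal_tokens = 20 * params             # ~3.5 trillion tokens
ratio = train_flops(params, optimal_tokens) / gpt3_flops

print(f"{optimal_tokens / 1e12:.1f}T tokens, ~{ratio:.0f}x GPT-3 training FLOPs")
```

Even without growing the model at all, training it compute-optimally would take trillions of tokens and roughly an order of magnitude more FLOPs than GPT-3 used, consistent with the 10–20x range quoted above.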
Alignment: OpenAI's north star is a beneficial AGI. OpenAI will likely build on the lessons of its InstructGPT models, which are trained with humans in the loop. InstructGPT was deployed as the default language model on OpenAI's API and is much better at following user intentions than GPT-3; techniques developed through alignment research also make the models more truthful and less toxic. However, that alignment work was limited to OpenAI employees and English-speaking labellers. GPT-4 is likely to be more aligned with humans than GPT-3.
InstructGPT is better than GPT-3 at following English instructions. GPT-3 models aren’t trained to follow user instructions. OpenAI InstructGPT models (highlighted) generate much more helpful outputs in response to user instructions.
Future advancements in design, art, movies, fashion, architecture and many other fields will be made possible by Human + AI tools. A large neural network can be programmed to carry out various text generation tasks using language, as demonstrated by OpenAI's GPT-3. Image GPT demonstrated that a similar kind of neural network could also be used to produce images with high levels of fidelity. We build on these findings to demonstrate that manipulating visual concepts via language is now feasible.
"DALL·E 2 is a handy assistant that amplifies what a person can normally do, but it is dependent on the creativity of the person using it. An artist or someone more creative can create some really interesting stuff," says Aditya Ramesh.
Like GPT-3, the original DALL·E is a transformer language model. It receives both the text and the image as a single stream of data containing up to 1280 tokens and is trained using maximum likelihood to generate all of the tokens sequentially. This training procedure enables the model to generate an image from scratch, and also to regenerate any rectangular region of an existing image that extends to the bottom-right corner, consistent with the text prompt.
A token is any symbol from a discrete vocabulary; for humans, each English letter is a token from a 26-letter alphabet. DALL·E's vocabulary has tokens for both text and image concepts. Specifically, each image caption is represented using at most 256 BPE-encoded tokens with a vocabulary size of 16384, and the image is represented using 1024 tokens with a vocabulary size of 8192.
The images are preprocessed to 256x256 resolution during training. Similar to VQ-VAE (Vector Quantized Variational Autoencoder), each image is compressed to a 32x32 grid of discrete latent codes using a discrete VAE that was pretrained using a continuous relaxation. The authors found that training with this relaxation obviates the need for an explicit codebook, EMA loss, or tricks like dead-code revival, and that it scales up to large vocabulary sizes.
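The token budget described above is easy to verify with simple arithmetic; the figures in the sketch below come directly from the numbers in the text.

```python
# Token-budget arithmetic from the DALL·E figures quoted above.
text_tokens = 256              # max BPE text tokens (vocabulary size 16384)
grid_side = 32                 # discrete VAE latent grid is 32 x 32
image_tokens = grid_side * grid_side     # 1024 image tokens (vocabulary 8192)
stream_length = text_tokens + image_tokens   # the 1280-token stream

pixels = 256 * 256             # training images are 256 x 256
pixels_per_code = pixels // image_tokens     # each latent code covers 64 pixels

print(stream_length, pixels_per_code)
```

So the 1280-token stream is simply 256 caption tokens plus the 1024 codes of the 32x32 latent grid, with each code summarising a 64-pixel patch of the original image.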
DALL·E's creators, Aditya Ramesh, Mikhail Pavlov, and Scott Gray, worked together to scale the model to 12 billion parameters and designed the infrastructure needed to draw samples from it. The software's name is a portmanteau of the names of the animated Pixar robot character WALL-E and the Spanish surrealist artist Salvador Dalí.
The near future of technological advancement will not render human designers or creativity obsolete. Instead, designers gain a team of assistants to handle their most mundane tasks, allowing them to concentrate on curating and enhancing good ideas or developing their own. In the same way that GPT-3's summarization, explanation, and generation of Python code makes programming accessible to non-programmers, iterative design enables non-designers to explore new avenues. A small business or individual designer now has access to capabilities that were previously available only to large organizations.
There is always a dual aspect to disruptive technologies. Tim Berners-Lee's 1989 proposal for networked information sharing paved the way for the World Wide Web, which he envisioned as a collaborative space for sharing information and facilitating communication. The Internet opened Pandora's box, offering unrestricted access to all knowledge and dissolving boundaries. Alongside its many advantages, it is a conduit for disinformation, trolling, doxing, crime, threats, and traumatic content.
Researchers at OpenAI have now trained language models, known as InstructGPT, that are significantly better than GPT-3 at following user intentions while also being more truthful and less harmful. These models are trained with humans in the loop: human feedback is used as reinforcement to guide the models' behaviour in desired directions, amplifying good results and inhibiting undesired behaviours. This marks a significant step forward in developing powerful AI systems that fulfil the needs of humans.
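The "humans in the loop" idea can be caricatured in a few lines of Python: collect candidate outputs, have humans rate them, and prefer the highest-rated one. This is only a cartoon of the preference signal (with invented candidates and scores); InstructGPT actually trains a reward model on such ratings and then fine-tunes the language model with reinforcement learning.

```python
def pick_preferred(candidates, ratings):
    """Return the candidate with the highest human rating -- a toy stand-in
    for the preference signal used to train InstructGPT-style models."""
    return max(zip(candidates, ratings), key=lambda pair: pair[1])[0]

# Hypothetical model outputs and hypothetical human scores.
candidates = [
    "Here are three clear steps to solve your problem...",
    "I am a language model and cannot help with that.",
    "Lorem ipsum dolor sit amet.",
]
ratings = [0.9, 0.4, 0.1]
print(pick_preferred(candidates, ratings))
```

The design choice worth noting is that the humans never write the outputs themselves; they only rank them, and that ranking becomes the training signal that steers the model toward helpful behaviour.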
OpenAI's new artificial intelligence tools can produce text that looks and reads as if humans wrote it, because they have access to vast amounts of data and powerful computing resources. GPT-3 can produce original prose, poetry, articles, dialogue, computer code, memes, and other forms of writing, and it has even been used to draft news copy. The most recent additions to OpenAI's arsenal include a generative artificial intelligence system that is anchored in context and can generate text when presented with a prompt or setting.
OpenAI is revolutionizing how software developers will use artificial intelligence technologies in the future. With GPT-3, OpenAI has provided the tools needed to bring about transformational changes within our communities. When OpenAI's GPT-3 language models were made available for commercial use, an explosion of AI-powered content-generation tools followed; later, OpenAI revised its policy on generated content, replacing some GPT-3 models with alternatives. Indeed, in one demonstration found on the Internet, GPT-3 showed how to create an app that looked and functioned like Instagram merely by using a plug-in to Figma, software widely used for app design.
GPT-3 is pretrained: it was not designed with domain-specific knowledge, yet it can perform domain-specific tasks such as translating between languages. You can instruct GPT-3 in natural language to complete tasks in different ways, and it will complete them much as a human would. Through this process, the model builds up incremental knowledge of the methods and questions most likely to be posed, which enables GPT-3 to produce more pertinent answers as you progress.
You can ask GPT-3 to become almost anything: once a model is trained, it can code like a programmer or write like a novelist. GPT-3 can produce anything with a language structure: it can answer questions, write essays, summarise lengthy documents and translate them into other languages, jot down notes, and even produce computer code. It is fair to call GPT-3 a new kind of language model, because it is exceptionally effective at creating language-structured content that is startlingly similar to the work produced by human professionals. GPT-3 is an autoregressive language model that learns and replicates the mannerisms of human writing, producing text in a way similar to that produced by humans.
Although GPT-3's language-generating capabilities are being hailed as the best artificial intelligence has ever seen, several significant concerns still need to be addressed. Anyone who has seen the results of AI language generation knows that the output can vary, and this system's output is undoubtedly a step up from what came before. Designers and developers have responded with everything from astonishment to horror, but it is indisputable that this straightforward predictive model of language will have a significant impact on the state of the art in artificial intelligence.
OpenAI has made its models publicly available to help people understand how language works and to transform content into the forms most helpful for users. OpenAI claims the models can be applied to almost any language-related task, such as semantic search, summarization, sentiment analysis, content generation, and translation, with just a handful of examples or by specifying the task in English.
In line with this, the programmer Sharif Shameem has demonstrated using GPT-3 to describe designs that it can then construct on its own, even though GPT-3 was never explicitly trained to generate code. According to its developers, OpenAI's GPT-3 model was trained on approximately 45 terabytes of text data taken from various sources, including Wikipedia and books. Compared with other language models, the full version of GPT-3 is by far the largest model trained, containing approximately 175 billion parameters.
References: