What is Generative AI?

What is Generative AI?

Generative AI (GenAI) is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data . The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks , or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.

On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.

Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.

The rapid advances in so-called large language models (LLMs ) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.

These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers . Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.

How does generative AI work?

Generative AI starts with a prompt that could be in the form of a text, an image, a video, a design, musical notes, or any input that the AI system can process. Various AI algorithms then return new content in response to the prompt. Content can include essays, solutions to problems, or realistic fakes created from pictures or audio of a person.

Early versions of generative AI required submitting data via an API or an otherwise complicated process. Developers had to familiarize themselves with special tools and write applications using languages such as Python.

Now, pioneers in generative AI are developing better user experiences that let you describe a request in plain language. After an initial response, you can also customize the results with feedback about the style, tone and other elements you want the generated content to reflect.

Generative AI models

Generative AI models combine various AI algorithms to represent and process content. For example, to generate text, various natural language processing techniques transform raw characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and actions, which are represented as vectors using multiple encoding techniques. Similarly, images are transformed into various visual elements, also expressed as vectors. One caution is that these techniques can also encode the biases, racism, deception and puffery contained in the training data.

Once developers settle on a way to represent the world, they apply a particular neural network to generate new content in response to a query or prompt. Techniques such as GANs and variational autoencoders (VAEs) -- neural networks with a decoder and encoder -- are suitable for generating realistic human faces, synthetic data for AI training or even facsimiles of particular humans.

Recent progress in transformers such as Google's Bidirectional Encoder Representations from Transformers (BERT ), OpenAI's GPT and Google AlphaFold have also resulted in neural networks that can not only encode language, images and proteins but also generate new content.?

How neural networks are transforming generative AI

Researchers have been creating AI and other tools for programmatically generating content since the early days of AI. The earliest approaches, known as rule-based systems and later as "expert systems," used explicitly crafted rules for generating responses or data sets.

Neural networks, which form the basis of much of the AI and machine learning applications today, flipped the problem around. Designed to mimic how the human brain works, neural networks "learn" the rules from finding patterns in existing data sets. Developed in the 1950s and 1960s, the first neural networks were limited by a lack of computational power and small data sets. It was not until the advent of big data in the mid-2000s and improvements in computer hardware that neural networks became practical for generating content.

The field accelerated when researchers found a way to get neural networks to run in parallel across the graphics processing units (GPUs) that were being used in the computer gaming industry to render video games. New machine learning techniques developed in the past decade, including the aforementioned generative adversarial networks and transformers , have set the stage for the recent remarkable advances in AI-generated content.

What are Dall-E, ChatGPT and Gemini?

ChatGPT, Dall-E and Gemini (formerly Bard) are popular generative AI interfaces.

Dall-E. Trained on a large data set of images and their associated text descriptions, Dall-E is an example of a multimodal AI application that identifies connections across multiple media, such as vision, text and audio. In this case, it connects the meaning of words to visual elements. It was built using OpenAI 's GPT implementation in 2021. Dall-E 2, a second, more capable version, was released in 2022. It enables users to generate imagery in multiple styles driven by user prompts.

ChatGPT. The AI-powered chatbot that took the world by storm in November 2022 was built on OpenAI's GPT-3.5 implementation. OpenAI has provided a way to interact and fine-tune text responses via a chat interface with interactive feedback. Earlier versions of GPT were only accessible via an API. GPT-4 was released March 14, 2023. ChatGPT incorporates the history of its conversation with a user into its results, simulating a real conversation. After the incredible popularity of the new GPT interface, Microsoft announced a significant new investment into OpenAI and integrated a version of GPT into its Bing search engine.

要查看或添加评论,请登录

社区洞察

其他会员也浏览了