The 4 Types Of Generative AI Transforming Our World
Bernard Marr
Internationally Best-selling #Author | #KeynoteSpeaker | #Futurist | #Business, #Tech & #Strategy Advisor
Thank you for reading my latest article, ‘The 4 Types Of Generative AI Transforming Our World’. Here at LinkedIn and at Forbes I regularly write about management and technology trends.
To read my future articles simply join my network by clicking 'Follow'. Also feel free to connect with me via Twitter, Facebook, Instagram, Podcast or YouTube.
The term generative AI refers to a relatively new field of AI that can create human-like content, from pictures and videos to poetry and even computer code.
To achieve this, several different techniques are used. These have mostly evolved over the last 10 years, building on earlier work carried out in the fields of deep learning, transformer models and neural networks.
All of them rely on data to effectively ‘learn’ how to generate content, but beyond that, they are built around quite different methodologies. Here’s my overview of some of the categories that they fall into, as well as the type of content they can be used to create.
Large Language Models
Large language models (LLMs) are the foundational technology behind breakthrough generative AI tools like ChatGPT, Claude and Google Gemini. Fundamentally, they are neural networks that are trained on huge amounts of text data, allowing them to learn the relationships between words and then predict the next word that should appear in any given sequence of words. They can then be further trained on texts from specialized domains – a process known as ‘fine-tuning’ – to enable them to carry out specific tasks.
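The core idea of learning word relationships and predicting the next word can be pictured with a deliberately tiny sketch. Real LLMs use deep neural networks trained on billions of documents; this toy bigram counter (all names and the training text are invented for illustration) only shows the basic principle:

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction: count which word follows
# which in a tiny training text, then predict the most likely
# continuation. Real LLMs learn far richer relationships with neural
# networks, but the prediction task is the same.

training_text = (
    "the cat sat on the mat the cat chased the mouse "
    "the dog sat on the rug"
)

follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the training text."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' – the most frequent follower of 'the'
print(predict_next("sat"))  # 'on'
```

Chaining such predictions – feeding each predicted word back in as input – is, at a very high level, how an LLM generates whole passages of text.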
Words are broken down into 'tokens,' which could be small, individual words, parts of longer words, or combinations of prefixes, suffixes and other linguistic elements that frequently appear together in text. The mathematical process of matrix transformation is then used to convert them into structured numerical data that can be analyzed by computers.
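Tokenization can also be sketched in a few lines. The vocabulary and the greedy longest-match rule below are invented for illustration; real tokenizers (such as byte-pair encoding) learn their vocabulary statistically from huge text corpora:

```python
# Toy illustration of tokenization: text is split into sub-word 'tokens',
# each of which maps to a numeric ID the model can process. The vocabulary
# here is hand-written; real tokenizers learn theirs from data.

def toy_tokenize(text, vocab):
    """Greedily match the longest known token at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # unknown character: fall back to one char
            i += 1
    return tokens

vocab = {"un", "believ", "able", " ", "that", " is"}
tokens = toy_tokenize("unbelievable", vocab)
ids = [sorted(vocab).index(t) for t in tokens]  # numeric IDs for the model
print(tokens)  # ['un', 'believ', 'able']
print(ids)     # [5, 3, 2]
```

Note how one word becomes three tokens – this is why prefixes and suffixes that appear in many words can be shared across the whole vocabulary.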
As well as creating text and computer code, LLMs have made it possible for computers to understand natural language inputs for many tasks, including language translation, sentiment analysis and other forms of generative AI such as text-to-image or text-to-voice. However, their use has created ethical concerns around bias, AI hallucination, misinformation, deepfakes and the use of intellectual property to train algorithms.
Diffusion Models
Diffusion models are widely used in image and video generation, and work via a process known as ‘iterative denoising’. Starting from a text prompt that tells the model what image it needs to create, random ‘noise’ is generated – you can think of this as starting a picture by scribbling randomly on a piece of paper.
The model then gradually refines these scribbles, drawing on its training data to understand which features should appear in the final image. At each step, ‘noise’ is removed as the image is adjusted to include the desired characteristics. Eventually, this leads to an entirely new image that matches the text prompt but doesn't appear anywhere in the training data.
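The shape of the denoising loop can be sketched as follows. This is purely conceptual: the ‘model prediction’ is replaced here by a fixed target so the loop structure is visible, whereas a real diffusion model learns to predict the noise to remove at each step from its training data:

```python
import random

# Conceptual sketch of iterative denoising: start from pure noise and,
# over many small steps, nudge each value toward what a trained model
# would predict the clean image should look like. The 'image' here is
# just a short list of numbers standing in for pixel values.

def denoise(target, steps=50, seed=0):
    rng = random.Random(seed)
    image = [rng.uniform(-1, 1) for _ in target]  # start from random noise
    for _ in range(steps):
        # remove a fraction of the remaining 'noise' at each step
        image = [x + 0.2 * (t - x) for x, t in zip(image, target)]
    return image

target = [0.9, -0.3, 0.5, 0.0]
result = denoise(target)
print([round(x, 3) for x in result])  # [0.9, -0.3, 0.5, 0.0]
```

The key point is that the final image emerges gradually from noise, rather than being produced in a single step.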
By following this process, today’s most advanced diffusion models, such as Stable Diffusion and Dall-E, can create photo-realistic images, as well as images that imitate paintings and drawings of any style. What’s more, they are increasingly able to generate videos, as recently demonstrated by OpenAI’s groundbreaking Sora model.
Generative Adversarial Networks
Generative Adversarial Networks (GANs) emerged in 2014 and quickly became one of the most effective models for generating synthetic content, both text and images. The basic principle involves pitting two algorithms against each other: one known as the ‘generator’ and the other as the ‘discriminator’, each tasked with out-foxing the other. The generator attempts to create realistic content, and the discriminator attempts to determine whether that content is real or fake. Each learns from the other, becoming better at its job until the generator can create content that’s as close as possible to ‘real’.
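The adversarial back-and-forth can be sketched with a drastically simplified toy. Here the ‘generator’ is a single number trying to produce samples that look like the real data, and the ‘discriminator’ is just a threshold separating real from fake – real GANs use two neural networks trained with gradient descent, so everything below is an invented illustration of the dynamic, not an actual GAN:

```python
import random

# Toy sketch of adversarial training: the generator shifts its output
# toward whatever the discriminator currently accepts as 'real', while
# the discriminator keeps moving its decision threshold to sit between
# the real and fake samples it sees.

rng = random.Random(42)
real_mean = 5.0   # the 'real' data is centred here
gen_mean = 0.0    # the generator starts far from reality
boundary = 2.5    # the discriminator's current real/fake threshold

for step in range(300):
    fake = gen_mean + rng.gauss(0, 0.1)
    real = real_mean + rng.gauss(0, 0.1)
    # discriminator: move the threshold to sit between real and fake
    boundary += 0.5 * ((real + fake) / 2 - boundary)
    # generator: if caught (on the 'fake' side), shift toward the real side
    if fake < boundary:
        gen_mean += 0.1

print(round(gen_mean, 1))  # ends near real_mean – the fakes now look real
```

As the generator closes in on the real data, the discriminator's threshold is squeezed until it can barely tell real from fake – which is exactly the point at which a trained GAN produces convincing content.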
Although they pre-date the large language models and diffusion models behind headline-grabbing tools like ChatGPT and Dall-E, GANs are still considered versatile and powerful tools for generating pictures, video, text and sound, and are widely used for computer vision and natural language processing tasks.
Neural Radiance Fields
Neural Radiance Fields (NeRFs) are the newest technology covered here, only emerging onto the scene in 2020. Unlike the other generative technologies, they are specifically used to create representations of 3D objects using deep learning. This means creating an aspect of an image that can't be seen by the ‘camera’ – for example, an object in the background of an image that's obscured by an object in the foreground or the rear aspect of an object that’s been pictured from the front.
This is done by predicting elements such as the volumetric properties of objects and mapping them to 3D spatial coordinates, using neural networks to model the geometry and properties such as the reflection of light around an object.
This allows, for example, a two-dimensional image of an object – say, a building or a tree – to be recreated as a three-dimensional representation that can be viewed from any angle. The technique, notably accelerated by Nvidia’s Instant NeRF work, is being used to create 3D worlds that can be explored in simulations and video games, as well as for visualizations in robotics, architecture and urban planning.
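The mapping at the heart of a radiance field – from a 3D point to a colour and a density – can be sketched conceptually. In the toy below the neural network is replaced by a hand-written function describing a single bright sphere, so the ray-sampling idea is visible; a real NeRF learns this mapping from 2D photographs of a scene, and also takes the viewing direction into account:

```python
import math

# Conceptual sketch of a radiance field and volume rendering: query a
# (here hand-written) scene function at points along a camera ray and
# accumulate colour, weighted by how much light each segment absorbs.

def radiance_field(x, y, z):
    """Return (brightness, density) at a 3D point: a bright sphere at the origin."""
    inside = x * x + y * y + z * z < 1.0
    return (0.8, 5.0) if inside else (0.0, 0.0)

def render_ray(origin, direction, steps=100, far=6.0):
    """Accumulate colour along a ray through the scene (volume rendering)."""
    dt = far / steps
    colour, transmittance = 0.0, 1.0
    for i in range(steps):
        t = i * dt
        p = [o + t * d for o, d in zip(origin, direction)]
        c, sigma = radiance_field(*p)
        alpha = 1.0 - math.exp(-sigma * dt)  # absorption over this segment
        colour += transmittance * alpha * c
        transmittance *= 1.0 - alpha
    return colour

# a ray through the sphere picks up its brightness; one that misses stays dark
print(round(render_ray((-3, 0, 0), (1, 0, 0)), 2))  # 0.8
print(render_ray((-3, 2, 0), (1, 0, 0)))            # 0.0
```

Because the scene is just a function of 3D coordinates, it can be queried from any viewpoint – which is what lets a NeRF show the back of an object that was only ever photographed from the front.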
Hybrid Models In Generative AI
One of the latest advancements in the field of generative AI is the development of hybrid models, which combine various techniques to create innovative content generation systems. These models draw on the strengths of different approaches, such as blending the adversarial training of GANs with the iterative denoising of diffusion models to produce more refined and realistic outputs. By integrating LLMs with other neural networks, hybrid models can offer enhanced context and adaptability, leading to more accurate and contextually relevant results.

This hybrid approach unlocks new possibilities for applications like text-to-image generation, where the fusion of different generative techniques leads to more complex and diverse outputs, as well as improved virtual environments. For example, DeepMind’s AlphaCode pairs large language models with large-scale sampling and filtering of candidate programs to generate high-quality computer code, demonstrating the versatility of hybrid approaches in software development. Another example is OpenAI’s CLIP, which learns a shared representation of text and images and is widely used to guide and evaluate text-to-image models. Because CLIP can understand complex relationships between text and images, it supports a broad range of generative applications.
Generative AI is constantly evolving, with new methodologies and applications emerging regularly. As the field continues to grow, we can expect to see even more innovative approaches that blend different techniques to create advanced AI systems. The next decade is likely to bring groundbreaking applications that will transform industries and reshape how we interact with technology.
About Bernard Marr
Bernard Marr is a world-renowned futurist, influencer and thought leader in the fields of business and technology, with a passion for using technology for the good of humanity. He is a best-selling author of over 20 books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations.
He has a combined following of 4 million people across his social media channels and newsletters and was ranked by LinkedIn as one of the top 5 business influencers in the world. Bernard’s latest book is ‘Generative AI in Practice’.