The Key Role Of Artificial Intelligence and Deep Learning For The Metaverse
Nick Rosa, FRSA
Industry Technology Innovation Lead Europe at Accenture | Director at AIXR | Book Author | Keynote Speaker | Podcast Host | Fellow of the Royal Society of Arts
The following is an extract from my book "Understanding the Metaverse: A Business and Ethical Guide", published by Wiley and available worldwide in hardcover, Kindle, and audiobook from this link.
As the title suggests, my objective in writing this book was to make the topic accessible to as many people as possible, clarifying what the Metaverse is from a business, technological, and ethical point of view. It explains the Metaverse's origins, why it is important, its main use cases, how businesses and consumers can apply an efficient Metaverse strategy, and the potential risks associated with this new wave of digital transformation, which will deeply change the way we perceive reality and value.
I wrote the book between March 2022 and August 2022, and Wiley published it in October 2022, before the onset of the AI hype we have all been witnessing lately. I hope you enjoy the read, and I look forward to all your comments.
The Key Role Of Artificial Intelligence and Deep Learning For The Metaverse
Deep learning, generative adversarial networks (GANs), large language models (LLMs), and neural radiance fields (NeRFs) all fall under the umbrella of artificial intelligence (AI). Harnessing these technologies, particularly in the Metaverse, presents fantastic opportunities as well as potential risks. Let’s look at what these might be and what we can do to seize the opportunities and mitigate the risks.
Application #1: Visual content generation
It will not be long before the demand for content exceeds supply, which means both creators and companies will need ways to automate the generation of 3D content and assets. Technologies like GANs and NeRFs can provide these kinds of generative AI services, and do so in a way that produces incredibly realistic results.
How might this work in practice? Imagine creating 3D models from a vast database of 2D images that AI can fully or partially convert into a model. This will be particularly useful for companies operating online stores in the Metaverse, because they will be able to create a 3D model of a store using 2D images and AI technology. In fact, photogrammetry already does something similar and is used in everything from real estate and engineering to forensics and entertainment. However, it requires a large amount of visual data to produce an accurate 3D model.
With a 3D GAN, a complete 3D model can be generated from just a couple of pictures, because the AI is able to analyse the images it is provided with and use this data to create replicas of what is within those images. NeRF, meanwhile, uses a process known as inverse rendering, where the AI approximates how light behaves in the real world based on a handful of 2D images taken from different angles, enabling it to create a 3D scene that fills in the gaps between those images.
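To make the NeRF idea concrete, here is a minimal Python sketch of the volume-rendering step at its core: marching along one camera ray, querying a field for density and colour, and blending the samples into a pixel. The `toy_field` function is just a stand-in for the trained neural network, so treat this as an illustration of the principle rather than a production renderer.

```python
import numpy as np

def toy_field(pts, view_dir):
    """Stand-in for a trained NeRF network: a fuzzy red sphere at the origin."""
    r = np.linalg.norm(pts, axis=-1)
    sigma = np.where(r < 1.0, 5.0, 0.0)              # density: solid inside the sphere
    rgb = np.tile([0.8, 0.3, 0.3], (len(pts), 1))    # flat reddish colour everywhere
    return sigma, rgb

def render_ray(origin, direction, field, near=0.1, far=6.0, n_samples=64):
    """Blend colour samples along one ray using the volume-rendering sum."""
    t = np.linspace(near, far, n_samples)            # sample depths along the ray
    pts = origin + t[:, None] * direction            # 3D positions of the samples
    sigma, rgb = field(pts, direction)               # density + colour per sample
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))  # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)             # opacity of each segment
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # light surviving so far
    return ((alpha * trans)[:, None] * rgb).sum(axis=0)  # final pixel colour

pixel = render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]), toy_field)
```

Training replaces `toy_field` with a network whose weights are tuned until rays rendered this way reproduce the handful of input photographs.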
You can see how this technology could easily be applied in Metaverse platforms where we are creating landscapes, buildings and whole new worlds to explore.
Application #2: More realistic avatars
An avatar is the digital persona of a real user in a virtual world. AI can be used to create more realistic avatars, particularly in relation to their facial expressions and inverse kinematics, which relates to the avatar’s movement. Inverse kinematics can be used to track the movements of a user in the real world and translate them into movements performed by the avatar in a virtual world. If you look at the Metaverse platforms we have at the time of this writing, the majority of the avatars you come across don’t have legs (they are literally floating torsos and heads that are cut off at the waist – weird!), because leg movement is especially difficult to translate into 3D worlds in a way that’s believable.
With AI technology, there is the potential to make the avatars we have in virtual settings much more realistic, both in terms of how they move and how they express emotions.
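For the curious, here is a toy Python illustration of the inverse-kinematics idea (a textbook two-bone solver, not any platform's actual code): given where a tracked foot should end up, it returns the hip and knee angles that put it there.

```python
import math

def two_bone_ik(tx, ty, l1=0.45, l2=0.42):
    """Return (hip, knee) angles in radians that place the foot at (tx, ty).

    l1 and l2 are thigh and shin lengths in metres. The law of cosines
    gives the knee bend; the hip angle is the direction to the target,
    corrected by the inner angle of the thigh-shin triangle."""
    d2 = tx * tx + ty * ty
    cos_knee = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    knee = math.acos(max(-1.0, min(1.0, cos_knee)))  # clamping handles unreachable targets
    hip = math.atan2(ty, tx) - math.atan2(l2 * math.sin(knee),
                                          l1 + l2 * math.cos(knee))
    return hip, knee

# Each frame, feed in the tracked foot position and pose the avatar's leg:
hip, knee = two_bone_ik(0.2, -0.7)
```

Full-body solvers chain many such joints together and add learned models to pick natural-looking poses when, as with legs in VR, there is no tracker to follow.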
Application #3: Autonomous AI-driven NPCs
We already mentioned that AI could provide more realistic avatars for real humans, but the data collected for that purpose can theoretically also be used to train AI agents that power fully automated, photorealistic virtual humans that look and behave much more like real humans. There are advantages and risks to this, however.
The advantage is that we will be able to create much more believable non-player characters (NPCs) in the Metaverse thanks to the power of conversational agents driven by large language models (LLMs), a technology that is becoming exponentially more sophisticated with each generation. These completely autonomous NPCs could act as guides within the Metaverse, helping new users learn how to navigate different platforms as well as providing ongoing support. Think of these NPCs as an avatar version of Siri or Alexa that can act as a concierge in your virtual world.
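The plumbing behind such an NPC can be surprisingly thin. Below is a hedged Python sketch of the conversational loop; `llm_complete` is a placeholder you would swap for a real text-generation model or API, and its canned reply just keeps the sketch runnable.

```python
# A sketch of the conversational loop behind an LLM-driven NPC guide.

SYSTEM_PROMPT = ("You are Nova, a concierge NPC in a Metaverse world. "
                 "Stay in character and help new users find their way around.")

def llm_complete(messages: list[dict]) -> str:
    """Placeholder for a real language model; swap in an actual endpoint here."""
    return "Welcome! The teleport hub is just behind you."

def npc_reply(history: list[dict], user_utterance: str) -> str:
    """Append the user's line, ask the model for the NPC's answer, remember both."""
    history.append({"role": "user", "content": user_utterance})
    reply = llm_complete([{"role": "system", "content": SYSTEM_PROMPT}] + history)
    history.append({"role": "assistant", "content": reply})
    return reply

history: list[dict] = []
print(npc_reply(history, "How do I get to the concert plaza?"))
```

The persona prompt and the running history are what keep the NPC consistent across a whole conversation.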
Of course, the risk is that these avatars could be used to manipulate us, either by encouraging us to buy certain products or even by changing our political views, as I explained in Chapters 3 and 7. This is why it is very important for us to consider what biometric data can be measured by the organisations running Metaverse platforms in the first place, and how any data that is measured is used.
While there is the potential for this data to be misused, I predict that the increased use of Metaverse platforms and immersive headsets (both augmented reality [AR] and virtual reality [VR]) will lead to immense data lakes of human behaviour. These could enable the creation of better-quality automated robotics, and possibly even allow us to realise the dream of having human-like, fully autonomous robots within the next century.
When you think about it, this makes sense, because robots behave using vectors of movement in space. Take an automated arm in a car factory – this arm uses vectors of movement to understand where it has to weld the different parts of the car. In order to input those vectors, you need precise data. To create believable movements in a human avatar, however, you can build reinforcement learning algorithms that learn to understand the position of a human body in different poses.
We are already seeing this happen in video games. Just look at how the game publisher Electronic Arts has used the kinematics of real football players performing different actions to vastly improve the players' movements in its FIFA-series football games. The company’s AI then works to create much more believable avatars on the virtual pitch.
Application #4: Automatic detection of antisocial behaviour and harassment in the Metaverse
One of the wonderful things about the Metaverse is its diversity. People will be able to express themselves however they want. Of course, there are some people in this world who don’t believe in freedom of expression and who will disagree with other people’s life choices, which could give rise to harassment and antisocial behaviour on Metaverse platforms. It is already proving difficult to police the Metaverse using the standard approaches taken by the likes of social media, such as checking images and written content for anything offensive. In Chapter 7, we explored harassment on Metaverse platforms and some of the steps that can be taken to prevent it, but the more users we have in the Metaverse, the harder this will be to keep track of.
This is where AI can play a role because it can be trained to look out for inappropriate behaviour between people’s avatars in the Metaverse and do so much more effectively than a team of humans ever could. Of course, users will always be able to report inappropriate behaviour, but having an automated system to detect harassment and inappropriate behaviour in the Metaverse will be very useful and, I believe, very much needed.
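As a simple illustration of what such a system might watch for, here is a Python sketch of one signal among many: persistent invasion of another avatar's personal space. The radius and threshold are assumptions a platform would tune, and a real system would combine many signals (voice, gestures, user reports) before acting.

```python
import math

PERSONAL_SPACE_M = 1.2   # assumed comfort radius around an avatar
INTRUSION_LIMIT = 5      # intrusions within the window before flagging

def flag_persistent_intrusion(track_a, track_b, window=30):
    """Flag avatar B if it repeatedly enters A's personal space over the last
    `window` frames of (x, y, z) positions; flagged pairs would then go to
    moderators or trigger an automatic safety bubble."""
    recent = zip(track_a[-window:], track_b[-window:])
    intrusions = sum(1 for a, b in recent if math.dist(a, b) < PERSONAL_SPACE_M)
    return intrusions >= INTRUSION_LIMIT

# Toy tracks: B stands half a metre from A for the whole window -> flagged.
a = [(0.0, 0.0, 0.0)] * 30
b = [(0.5, 0.0, 0.0)] * 30
print(flag_persistent_intrusion(a, b))  # True
```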
Application #5: Faster and more accessible world creation in the Metaverse
As I write this, to create or edit worlds in any Metaverse platform, you need some development and coding skills because you have to use the Unreal or Unity 3D engines for world-building. In the future, however, I envision that the development and alteration of 3D worlds in the Metaverse will become increasingly seamless and simple to do. There are already platforms, such as Dreams for PlayStation, that are exploring a low-code and even a no-code interface for world-building.
I believe this will become even simpler, to the point that users will be able to change what they can see in certain Metaverse environments through simple voice commands. Meta is already experimenting with this, and, in a demo, Mark Zuckerberg created a seaside scene, which was entirely AI-generated, through a series of voice commands (it might not be the most realistic or inspiring beach scene at the moment, but it shows the potential of the technology).
This potential is actually mind-blowing when you stop to think about it. Picture yourself on the ‘Holodeck’ of the Starship Enterprise. You ask it to take you to a 1930s jazz club in Chicago, and suddenly, you’re immersed in a dark, moody bar. Smoke curls through the single spotlight that is focused on the piano on a small stage, with the pianist having a drag of his cigarette before sitting down to play. As the music floats through the room, you turn and look behind you to the bar. One bartender is making a drink for another customer, while the other is lazily cleaning the countertop while watching the performance. If you’re a Star Trek fan, you no doubt have your own idea about which place and time you’d ask the Holodeck to recreate for you.
I believe, in the next 20 to 30 years, we can expect this level of sophistication in content programmatically generated based on user requests. Naturally, what experiences we can have and what situations and places we can recreate will depend on the data that is fed into the AI that sits behind and powers the platform. We’ve reached a critical mass with AI technology though; the snowball is rolling down the hill, and it’s only going to gather pace.
Application #6: Moving digital assets seamlessly between Metaverse platforms
In Chapter 3, I explained how there is work being carried out to set consistent design standards across different Metaverse platforms, with the aim of allowing us to move, with our digital assets, seamlessly from one platform to another.
This is particularly important for any brands that sell digital assets, whether that’s a designer handbag or a flashy sports car, because having the ability to take them into different worlds will matter to many Metaverse users. For example, if you sell a non-fungible token (NFT) for a Gucci handbag that can only be used in Roblox and not transferred with your avatar to Decentraland, it could make the NFT less appealing to consumers, and therefore less valuable.
Technology such as GANs and deep learning could be utilised to programmatically create 3D filters that automatically generate a version of any digital asset you hold in the graphical style of multiple platforms. Your Gucci bag might look a bit different when you’re carrying it in The Sandbox as opposed to in Roblox, because it will be rendered in a way that is compatible with the style of that platform, but it will still be your Gucci bag because of the NFT it is attached to.
AI is becoming increasingly sophisticated
As you’ve read about some of these applications, you might feel as though they are years, if not decades, away from being realised, but AI is more sophisticated than you might imagine and already incredibly accurate when it comes to creating photo-realistic images. Research published early in 2022 found that AI-synthesised faces are already almost indistinguishable from real faces and, perhaps more interestingly, are also considered to be more trustworthy than real faces.
This highlights where there are risks with the adoption of AI technology within the Metaverse, because if we can already generate not only photorealistic, but also more trustworthy, human faces, we have to consider how these could be used for nefarious purposes in the Metaverse, as I explained in Chapter 7 when we discussed data ethics within the Metaverse.
You can’t talk about AI without talking about data, because, for AI technology to be effective, it needs as large a pool of data as possible. This data also has to be in the right format, so an AI that is able to write needs text-based data, an AI that creates 2D images needs 2D image data, and so on. To create believable 3D worlds, these algorithms also need 3D content, which is increasingly easy to collect now that the likes of the iPhone 12 Pro (and later iPhone Pro models) come equipped with a light detection and ranging (LiDAR) scanner for 3D capture.
In fact, 6D.ai (which was acquired by Niantic Labs in 2020) set out to use this technology to create a persistent 3D map of the world. Since the acquisition, the company’s software development kit (SDK) has been folded into Niantic’s own SDK, called Lightship. This is significant because Niantic is the organisation behind Pokémon GO, which means that every person playing Pokémon GO around the world is now contributing to this 3D map. It has also shifted Niantic’s business model, because the company now makes its SDK available to anyone who wants to create a map-based game or application.
The key when it comes to creating 3D worlds using AI is to ensure that all the specific assets within any 3D world are semantically segmented, which essentially means they should be correctly labelled – for example, as a bicycle, a tree, a child and so on. This is important because, once an image is segmented and injected into an AI algorithm, the algorithm can start to understand the common characteristics of assets with the same semantic label, and this allows it to start producing other assets that are similar to, or a variation of, that asset.
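In code, a semantically segmented corpus can be as simple as geometry paired with a class label; the paths and labels below are purely illustrative. Grouping assets by label is what lets a generative model learn the shared characteristics of each class.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class LabelledAsset:
    mesh_path: str   # where the 3D geometry lives
    label: str       # semantic class, e.g. "bicycle", "tree", "child"

corpus = [
    LabelledAsset("assets/bike_01.glb", "bicycle"),
    LabelledAsset("assets/bike_02.glb", "bicycle"),
    LabelledAsset("assets/oak_07.glb", "tree"),
]

# Group by label so a generative model can be trained per semantic class.
by_label: dict[str, list[LabelledAsset]] = defaultdict(list)
for asset in corpus:
    by_label[asset.label].append(asset)
```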
We also can’t forget about GANs, which, as I explained earlier in this chapter, are AI algorithms literally designed to fill in the blanks. Nvidia has found a very interesting use for a GAN algorithm: producing super-resolution images from low-resolution ones. Its algorithm is essentially able to take images that have a resolution of 1080p and scale them up to, potentially, 4K resolution. To do this, the Nvidia algorithm fills in the blanks with a hallucination of what the AI agent computes should be there, based on the surrounding pixels. This means that if you are playing a game on a lower-spec computer at 1080p, you can upscale the images and display them in 4K without rendering natively at that resolution on the graphics processing unit (GPU). Instead, an AI agent running on the GPU uses a fraction of the power to compute a 4K image. This AI super-resolution technology could be one of the key ingredients to achieving high-quality graphic renderings on lower-spec devices such as mobile phones or self-contained VR headsets such as the Meta Quest 2.
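Nvidia's production upscaler is far more elaborate (and runs on dedicated tensor hardware), but the core idea (cheaply enlarge the frame, then let a trained network hallucinate the missing detail) can be sketched in a few lines of PyTorch. Everything here, from the layer sizes to the tile dimensions, is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUpscaler(nn.Module):
    """Bilinear 2x upscale plus a small conv net that predicts the
    high-frequency detail the plain upscale loses."""
    def __init__(self, channels=3, features=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(),
            nn.Conv2d(features, channels * 4, 3, padding=1),
            nn.PixelShuffle(2),   # folds channels into a 2x larger image
        )

    def forward(self, low_res):
        coarse = F.interpolate(low_res, scale_factor=2, mode="bilinear",
                               align_corners=False)
        return coarse + self.body(low_res)   # add the "hallucinated" detail

tile_lr = torch.rand(1, 3, 270, 480)   # a small tile standing in for a 1080p frame
tile_hr = TinyUpscaler()(tile_lr)      # -> (1, 3, 540, 960)
```

In a real system, the network's weights come from adversarial training against genuine high-resolution frames, which is where the GAN enters the picture.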
Holding the power of creation
The Metaverse has the potential to give us the opportunity to be almost God-like in the way we create and adjust our world. You could walk into an empty room in the Metaverse, say the words ‘Let there be light!’ and light will appear. Similarly, you could command trees, furniture and whatever else you want in your space, simply through the power of your voice and the AI algorithm that interprets and realises your commands.
These tools can make us into God-like creators, which can be wonderfully empowering. Of course, there are risks associated with this, as we have discussed, and if it is not regulated effectively, we face the slightly terrifying possibility of entering The Matrix: a digital reality so compelling, and so designed to trigger our dopamine receptors, that it always makes us happy and traps us inside it.
We want AI to enhance our lives, and the Metaverse has the potential to play a huge role in that, but we don’t want to be bewitched by AI algorithms that can outsmart our capacity for discerning fiction from reality and genuine behaviour from deception. We are standing at the dawn of a new era, and we have the opportunity to ensure that technology is used to empower and enhance our lives, rather than to manipulate and control them.