Will fakes look this real? Machine learning for film talent.
In recent years, machine learning has advanced to the point where it is now possible to generate realistic, high-quality video content with artificial intelligence (AI) models, even from a single picture. There are many areas where this technology is having a significant impact, but in my opinion it may revolutionize acting and casting for film.
In a recent research paper, a team of six computer scientists based in Russia and Armenia proposed a novel deep-learning model that can realistically animate a person's face in a video based on the movements of another person's face. The model combines appearance and motion encoders with warping generators and a 3D convolutional network to create a seamless transition between the two faces.
The project is called MegaPortraits: One-shot Megapixel Neural Head Avatars, by:
Nikita Drobyshev (Samsung AI Center, Moscow, Russia), Jenya Chelishev (Samsung AI Center, Moscow, Russia), Taras Khakhulin (Samsung AI Center, Moscow, and Skolkovo Institute of Science and Technology, Russia), Aleksei Ivakhnenko (Samsung AI Center, Moscow, Russia), Victor Lempitsky (Yandex, Armenia), and Egor Zakharov (Samsung AI Center, Moscow, and Skolkovo Institute of Science and Technology, Russia).
First, they take two pictures from a video and use one of them as a guide for how the other one should move. Then, they break the pictures down into separate parts: for example, what the face looks like (its appearance) and where it is and how it moves in the picture (its motion).
Next, they use a neural network to figure out how the different parts of the pictures are moving. They can then use that information to make the picture look different or move in a different way.
Finally, they put everything back together to make a new picture that looks like the source but moves like the driver.
Let's not be shy about geeking out on it. The basic model:
v_{s→d} = w_{→d} ∘ G_{3D}(w_{s→} ∘ v_s)
This process involves sampling two frames from a training video: a source frame (x_s) and a driving frame (x_d). The driving frame is used both as an input to the system and as the ground truth. The source frame is passed through an appearance encoder (E_app) that outputs local volumetric features (v_s) and a global descriptor (e_s).
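To make that concrete, here is a minimal sketch in PyTorch of what the appearance encoder's interface could look like. Everything here (layer choices, channel counts, volume depth) is an illustrative assumption, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class AppearanceEncoder(nn.Module):
    """Illustrative stand-in for E_app: maps a source image x_s to
    local volumetric features v_s and a global descriptor e_s.
    All sizes are placeholders, not the paper's architecture."""
    def __init__(self, feat_channels=32, depth=16, embed_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(            # 2D feature extractor
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_channels * depth, 3, padding=1),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)       # pools features for the global descriptor
        self.fc = nn.Linear(feat_channels * depth, embed_dim)
        self.feat_channels, self.depth = feat_channels, depth

    def forward(self, x_s):                       # x_s: (B, 3, H, W)
        f = self.backbone(x_s)                    # (B, C*D, H', W')
        B, _, H, W = f.shape
        v_s = f.view(B, self.feat_channels, self.depth, H, W)  # reshape into a 3D feature volume
        e_s = self.fc(self.pool(f).flatten(1))    # (B, embed_dim) global descriptor
        return v_s, e_s
```

The key point is the pair of outputs: a 3D feature volume that keeps local detail of the face, and a single vector that summarizes the person's overall appearance.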
In parallel, motion descriptors for the source and driving frames are computed separately by a motion encoder (E_mtn). This encoder outputs head rotations (R_{s/d}), translations (t_{s/d}), and latent expression descriptors (z_{s/d}). The source tuple (R_s, t_s, z_s, e_s) is then input into a warping generator (W_{s→}) to produce a 3D warping field (w_{s→}), which removes the motion data from the volumetric features by mapping them into a canonical coordinate space. These features are then processed by a 3D convolutional network (G_{3D}).
Finally, the driver tuple (R_d, t_d, z_d, e_s) is fed into a separate warping generator (W_{→d}), which outputs a w_{→d} field used to impose the driver's motion. The final 4D volumetric features are obtained by applying both warping fields and passing the intermediate result through G_{3D}, as in the equation above.
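Here is how those pieces could fit together in code. This is a hedged sketch of the forward pass only: the encoders and generators are passed in as black-box modules with hypothetical signatures, and the warping fields are treated as dense sampling grids. It mirrors the equation above, not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def drive_source_with_driver(x_s, x_d, E_app, E_mtn, W_src, W_drv, G3D):
    """Sketch of v_{s→d} = w_{→d} ∘ G_{3D}(w_{s→} ∘ v_s).
    E_app, E_mtn, W_src, W_drv, and G3D are assumed modules with these
    (hypothetical) signatures; all shapes are illustrative."""
    v_s, e_s = E_app(x_s)                  # appearance: feature volume + global descriptor
    R_s, t_s, z_s = E_mtn(x_s)             # source head pose + expression
    R_d, t_d, z_d = E_mtn(x_d)             # driver head pose + expression

    w_s2c = W_src(R_s, t_s, z_s, e_s)      # 3D warp field: source -> canonical
    w_c2d = W_drv(R_d, t_d, z_d, e_s)      # 3D warp field: canonical -> driver

    # Treat each warp as a sampling grid of shape (B, D, H, W, 3).
    v_canon = F.grid_sample(v_s, w_s2c, align_corners=True)    # strip the source's motion
    v_canon = G3D(v_canon)                 # process features in canonical space
    v_s2d = F.grid_sample(v_canon, w_c2d, align_corners=True)  # impose the driver's motion
    return v_s2d                           # final volumetric features v_{s→d}
```

In the full system, these features would still need to be decoded into the output image; the point of the sketch is the composition order, motion is removed from the source, the features are processed in a canonical space, and the driver's motion is then re-imposed.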
In short, the source tuple and the global descriptor strip the source's motion by warping the volumetric features into a canonical coordinate space, the 3D convolutional network processes them there, and the driver tuple re-imposes the driving frame's motion, yielding the final 4D volumetric features.
The potential impact of this technology on the film industry is significant. One immediate application is in the creation of virtual actors for use in films and television shows. Currently, the process of creating digital actors involves either costly motion capture techniques or extensive manual animation. With this technology, a digital actor could be created using the facial movements of an existing actor, eliminating the need for motion capture or manual animation.
This technology also has the potential to revolutionize the casting process for films. Traditionally, casting involves a lengthy and expensive process of auditioning actors, and it is often difficult to find the perfect fit for a particular role. With the ability to create digital actors based on the facial movements of existing actors, casting directors could potentially bypass the audition process entirely, simply selecting the best facial match for a particular role.
These new neural architectures and training methods have made it possible to create convincing high-resolution neural avatars that outperform previous methods. This technology could transform industries such as entertainment, gaming, and virtual production by allowing for a more immersive and interactive experience. As the field continues to advance, we can expect even more exciting developments in neural head avatar technology.
Do you think there are potential ethical concerns that come with this technology? For example, is there a risk that digital actors could be used to replace real actors in films, leading to a loss of jobs in the industry? There is also the question of consent: would an actor need to give permission for their facial movements to be used in this way? I would love to know your thoughts.
Link to the paper: