Driving Video: A New Approach to Portrait Animation
What Is "LivePortrait"?
A new research paper from a team in China introduces LivePortrait, an experimental AI tool you can try for free on Hugging Face. This innovative technology creates realistic animations from a single still image.
I've written before about tools that animate still photos with ease. These technologies are fascinating and innovative, but they raise complex ethical questions. We'll dive into the research, and then wade through the murky ethical waters.
How Does It Work?
LivePortrait uses a two-stage process to create its animations: first, a base model learns to extract and combine appearance and motion information; second, stitching and retargeting modules are trained on top of it to smooth transitions between expressions and give fine control over features like the eyes and lips.
What Is a "Driving Video?"
Unlike the videos I insist on capturing with my Tesla's dashcam, a "driving video" means something different in the context of LivePortrait. Instead of showing ladders in the middle of Highway 101 or spectacularly bad driving, a driving video here is a key part of LivePortrait's animation process: it provides the motion data that drives the animation of the still image. Essentially, the driving video shows a person or animal making various facial expressions and head movements.
LivePortrait uses this video to map these movements onto the still image, making it appear as though the image is performing the same actions. This technique helps the animated portrait look natural and lifelike. This is one of the sample driving videos provided by LivePortrait.
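To make the mapping idea concrete, here is a minimal conceptual sketch of that loop: read each frame of the driving video, estimate what the face in that frame is doing, and apply that motion to the still source image. This is not LivePortrait's actual code; the two motion functions and the filenames are placeholders standing in for the model's learned components.

```python
# Conceptual sketch of driving-video animation (NOT LivePortrait's actual code).
# Frame I/O uses OpenCV; the two motion functions and filenames are placeholders.
import cv2
import numpy as np

def estimate_motion(driving_frame: np.ndarray) -> np.ndarray:
    """Placeholder: a real model would return learned motion features,
    e.g. a small set of implicit 3D keypoints, for this frame."""
    return np.zeros((21, 3), dtype=np.float32)

def apply_motion_to_source(source_image: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Placeholder: a real model would warp and render the source portrait
    so that it matches the motion. Identity keeps the sketch runnable."""
    return source_image

source = cv2.imread("source_portrait.jpg")            # the still "source portrait"
driving = cv2.VideoCapture("driving_video.mp4")       # the "driving video"

fps = driving.get(cv2.CAP_PROP_FPS) or 25
h, w = source.shape[:2]
writer = cv2.VideoWriter("animated.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = driving.read()
    if not ok:
        break                                # end of the driving video
    motion = estimate_motion(frame)          # what is the face doing in this frame?
    writer.write(apply_motion_to_source(source, motion))  # make the portrait do it too

driving.release()
writer.release()
```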
If you create a free Hugging Face account, you can try this tool out for yourself. Hugging Face is an open source data science and machine learning platform. Recognizable by an adorable little smiley face emoji with hands, the platform acts as a collaboration space for new models, datasets and applications.
Testing Out LivePortrait
Step #1: Choose a "Source Portrait." I chose a favorite image of my son, taken about a decade ago.
Step #2: Add the driving video. I chose one of the sample videos whose expression matched the serious expression of Dr. Jones.
Step #3: Click the "Animate" button and wait for the magic to happen. It rendered very quickly on a Mac with an "M"-series chip. (On a high-end GPU, it can render a frame in as little as 12.8 milliseconds.)
The combination of the driving video and the still portrait = animated young Dr. Jones.
This is a playable video showing the driving video in action. (The rest of the shots below are screencaps for those who cannot access video.)
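If you would rather run LivePortrait locally than through the Hugging Face demo, the researchers have open-sourced the code. The sketch below shows roughly what local inference looks like; the -s (source image) and -d (driving video) flags are my reading of the project's README at the time of writing, so treat them as assumptions and check the repository before running.

```python
# Hedged sketch: running LivePortrait locally after cloning its open-source repo
# and installing its dependencies. Run this from inside the cloned repository.
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "-s", "my_source_portrait.jpg",  # hypothetical path to your still image
        "-d", "my_driving_video.mp4",    # hypothetical path to your driving video
    ],
    check=True,  # raise an error if the script exits with a non-zero status
)
```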
Other Key Features:
Versatility: It works on various styles of images, from realistic photos to oil paintings and even 3D renderings.
Fine Control: Users can adjust specific aspects of the animation, like how wide the eyes open or how much the lips move (a small illustrative sketch follows this list).
Generalization: With some additional training, the system can even animate animals like cats, dogs, and pandas.
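Here is that illustrative sketch: a few invented parameters suggesting the kind of per-feature knobs the retargeting modules expose. None of these names come from LivePortrait's actual API; they are placeholders meant to show the shape of "fine control."

```python
# Purely illustrative: invented names suggesting the kind of per-feature control
# LivePortrait's retargeting modules provide. This is NOT its real API.
from dataclasses import dataclass

@dataclass
class RetargetingControls:
    eye_openness: float = 1.0   # 1.0 = copy the driving video's eye motion exactly
    lip_motion: float = 1.0     # scales how strongly mouth movement is transferred
    head_movement: float = 1.0  # scales head rotation taken from the driving video

# Example: tone the eyes down a little and exaggerate the lips slightly.
controls = RetargetingControls(eye_openness=0.7, lip_motion=1.2)
print(controls)
```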
Technical Innovations
Under the hood, LivePortrait represents facial motion with compact implicit keypoints, trains on a mix of still images and videos, and layers stitching and retargeting modules on top of its base model. Together, these choices keep the system fast while still giving users fine-grained control.
Potential Applications
- Creating virtual avatars for video calls or online presentations
- Bringing historical figures to life in educational content
- Enhancing storytelling in digital media and entertainment
- Generating animated content for social media and marketing
OK, now let's get into those murky ethical areas.
Ethical Considerations
"I see dead people." - Cole. The Sixth Sense.
Using LivePortrait, you can take any image and combine it with a driving video.
Animating the Deceased
The ability to bring historical figures or recently deceased individuals "back to life" is a double-edged sword. If you have a photo of someone who has died, you can combine it with a driving video and move their face around. This is a huge and thriving business in China, and I fully respect that other cultures have different views on how they regard their dearly departed. For myself, I'd be uncomfortable animating a photo of the brother I lost to cancer. The memories I have of him are very special to me, and I don't wish to "reanimate" him in AI form.
Potential Benefits: Preserving memories, educational uses (as with the historical figures mentioned above), and comfort for families who find this kind of remembrance healing.
Ethical Concerns: The deceased cannot consent, surviving family members may find the results distressing, and an animated likeness can be made to express things the person never did.
Deepfakes and Political Manipulation
In an election year, the implications of such technology become even more concerning:
Potential for Misinformation: An animated likeness of a candidate or public figure could be made to appear to say or do things they never did, and such clips spread quickly on social media.
To be fair, the public these days should be continually reminded that "seeing is not believing." Part of why I like the slightly dystopian style of much AI video is that I don't have to wonder what is real and what is not. Flying goats are clearly not real, but they sure are fun.
Consent and Ownership
The issue of consent becomes complicated when dealing with historical figures or the deceased: they cannot agree to, or refuse, the use of their likeness, so that decision falls to estates, platforms, or sometimes no one at all.
Potential Safeguards and Regulations
To address these ethical concerns, several measures could be considered:
Legal Frameworks: Developing laws that govern the use of AI-generated likenesses, especially of deceased individuals
Ethical Guidelines: Creating industry standards for the responsible use of portrait animation technology
Digital Watermarking: Implementing mandatory markers to identify AI-generated content (a toy example follows this list)
Education: Promoting digital literacy to help the public critically evaluate AI-generated media
Consent Mechanisms: Establishing clear processes for obtaining permission from individuals (or their estates) for the use of their likeness
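Here is that toy example of digital watermarking: it hides a short tag such as "AI-GEN" in the least significant bit of an image's pixels and reads it back. Real provenance and watermarking systems are far more robust than this; the filenames below are placeholders.

```python
# Toy least-significant-bit (LSB) watermark: hides a short text tag in the lowest
# bit of an image's pixels and reads it back. For illustration only; the filenames
# are placeholders, and a lossless format (e.g. PNG) is required to preserve bits.
import numpy as np
from PIL import Image

def embed_tag(image_path: str, tag: str, out_path: str) -> None:
    pixels = np.array(Image.open(image_path).convert("RGB"))
    bits = np.unpackbits(np.frombuffer(tag.encode("utf-8"), dtype=np.uint8))
    flat = pixels.reshape(-1)
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits  # overwrite the lowest bit
    Image.fromarray(flat.reshape(pixels.shape)).save(out_path)

def read_tag(image_path: str, length: int) -> str:
    flat = np.array(Image.open(image_path).convert("RGB")).reshape(-1)
    bits = flat[: length * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8")

embed_tag("animated_frame.png", "AI-GEN", "animated_frame_marked.png")
print(read_tag("animated_frame_marked.png", len("AI-GEN")))  # -> "AI-GEN"
```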
Final Thoughts
As we continue to develop and use technologies like LivePortrait, it's crucial to engage in ongoing ethical discussions and establish frameworks that protect individuals' rights while allowing for innovation and creative expression.
I am a retired educator who enjoys writing about AI.
Learn something new every day with #DeepLearningDaily.
Key Terms From This Article:
Driving Video
A video used as a reference for motion and expressions, which the AI uses to animate the still image. It provides the movement patterns that are then applied to the source image.
Base Model
The foundational AI model in LivePortrait that learns to extract and combine appearance and motion information.
Stitching Module
A component in LivePortrait that enhances the quality of the animation by improving the transition between different facial expressions.
Retargeting Modules
Features in LivePortrait that allow fine-tuned control over specific facial elements like eyes and lips in the animated output.
Deepfake
Synthetic media where a person's likeness is replaced with someone else's in existing images or videos, often using AI techniques.
GPU (Graphics Processing Unit)
A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images, often used in AI processing for faster computations.
Mixed Training Strategy
An approach in AI where the model is trained on both still images and videos to improve its ability to handle various types of input.
Implicit Keypoints
A technique used in LivePortrait to represent facial movements efficiently with minimal data.
Fine-tuning
The process of making small adjustments to a pre-trained AI model to improve its performance on a specific task or dataset.
Virtual Avatar
A digital representation of a person, often used in video calls, online presentations, or virtual environments.
Digital Right of Publicity
A proposed concept extending an individual's right to control the commercial use of their name, image, or likeness to the digital realm, potentially even after death.
Digital Watermarking
The process of embedding information into digital media, which could be used to identify AI-generated content.
FAQs
Additional Resources for Inquisitive Minds:
#AITechnology, #PortraitAnimation, #LivePortrait, #DeepfakeEthics, #AIEthics, #DeepLearning