Face Rigging
AYUSH GUPTA
AI Consultant | Lead AI & ML Expert | GenAI Innovator | Proficient in LLMs & Multimodal AI Models | Strategic Advisor in AI-Driven Transformations | Empowering Businesses with Intelligent Solutions!
To capture and replicate the motion of any driving video in an anime character with AI.
In brief, the module converts motion onto anime characters with the help of several components: a face shifter, a rotator, and a combiner. All of this is made possible by a 68-point facial landmark detection approach. The model works well with or without a webcam and supports custom background effects.
My scope is to adapt the model so that the motion of a video drives any anime character. I started with the model configuration, focusing on facial landmark detection, which is straightforward to achieve. The next step is to animate those facial landmarks, which is done by the face shifter and the rotator using the landmark values; finally, all of these data points are merged through a model combiner to produce the output.
Goal and Vision:
To visualise and reproduce the motion of a driving video on custom characters, with customisation and effects that turn it into a cartoon world. Initially I have tried to cover this with anime characters; if it works, I will also customise it for humans and for a cartoon world in both 2D and 3D.
Model brief and Implementation:
The approach works with animated characters (a 3D-model approach) and consists of a face shifter and a face rotator, which trace out the facial expressions and forward the data points to the face combiner, where the actual motion results can be seen. The model does not generate any intermediate files during execution; it transfers the data points from one model to the next, which improves performance.
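As a rough illustration of this handoff (the names face_shifter, face_rotator and combiner below are hypothetical stand-ins for the actual modules), each stage passes tensors directly to the next one instead of writing files:

import torch

def animate_frame(face_shifter, face_rotator, combiner, anime_image, pose):
    # Sketch only: every stage returns tensors that are fed straight into
    # the next stage, so no intermediate files are created.
    with torch.no_grad():
        # The shifter adjusts eyes/mouth from the expression part of the pose.
        shifted = face_shifter(anime_image, pose)
        # The rotator turns the head from the rotation part of the pose.
        rotated = face_rotator(shifted, pose)
        # The combiner merges both results into the final posed frame.
        return combiner(shifted, rotated, pose)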
The model was first tested with predefined files to animate the characters. I then applied it to custom characters; this required a good deal of morphing and editing locally, under specific constraints, with other software such as GNU GIMP, but after that it works with custom characters as well.
Technique:
The images fed from VoxCeleb2 are resized from 224x224 to 256x256 by zero-padding. This is done so that the spatial dimensions are not affected when passing through the downsampling layers.
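For example, padding a 224x224 frame to 256x256 with zeros (rather than rescaling) can be done as below; whether the padding is centred or one-sided is an assumption, and this sketch centres it with a 16-pixel border on each side:

import numpy as np

def zero_pad_to_256(image: np.ndarray) -> np.ndarray:
    # Pad a 224x224xC image to 256x256xC with zeros (16 px on each side).
    h, w = image.shape[:2]
    pad_h, pad_w = 256 - h, 256 - w
    top, left = pad_h // 2, pad_w // 2
    return np.pad(
        image,
        ((top, pad_h - top), (left, pad_w - left), (0, 0)),
        mode="constant",
        constant_values=0,
    )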
The embedder uses 6 downsampling residual blocks, with a self-attention layer added in the middle; at the end, the output of the last residual block is reduced to a vector of size 512 through max pooling.
The generator and discriminator use the same architecture as the embedder.
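A minimal PyTorch sketch of such an embedder is shown below; the channel widths, the exact placement of the attention layer, and the residual/attention implementations are assumptions for illustration, not the actual layers:

import torch
import torch.nn as nn

class DownResBlock(nn.Module):
    # Residual block that halves the spatial resolution.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.relu(self.conv1(x))
        y = self.conv2(y)
        return self.relu(y + self.skip(x))

class SelfAttention(nn.Module):
    # Simple SAGAN-style self-attention over the spatial positions.
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).reshape(b, -1, h * w).permute(0, 2, 1)   # B x HW x C'
        k = self.k(x).reshape(b, -1, h * w)                    # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)                    # B x HW x HW
        v = self.v(x).reshape(b, c, h * w)                     # B x C x HW
        out = (v @ attn.permute(0, 2, 1)).reshape(b, c, h, w)
        return self.gamma * out + x

class Embedder(nn.Module):
    # 6 downsampling residual blocks, self-attention in the middle,
    # global max pooling down to a 512-dimensional embedding.
    def __init__(self):
        super().__init__()
        chs = [3, 64, 128, 256, 512, 512, 512]  # channel widths are assumptions
        blocks = []
        for i in range(6):
            blocks.append(DownResBlock(chs[i], chs[i + 1]))
            if i == 2:  # insert self-attention roughly in the middle
                blocks.append(SelfAttention(chs[i + 1]))
        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):              # x: B x 3 x 256 x 256
        feat = self.blocks(x)          # B x 512 x 4 x 4
        return feat.amax(dim=(2, 3))   # global max pooling -> B x 512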
Module with code sample:
Code description: it takes a driving video and an anime image as input, processes them, and outputs the anime character reproducing the driving motion.
The code works both with a webcam and with a pre-recorded video.
Code workflow:
The code imports the necessary packages and libraries.
I provide the driving video, the input anime image, and the output path as command-line arguments. The driving video needs to be in mp4 format; alternatively, I can provide a live webcam feed through its index (--human_video 0). The input anime image must be in RGBA format with a transparent background and a size of 256 x 256 px.
I can also provide a background image to customise the final output's background; a sketch of this interface follows.
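A hedged sketch of this command-line interface is shown below; only --human_video appears in the original code, so the other flag names are assumptions for illustration:

import argparse

parser = argparse.ArgumentParser(description="Drive an anime image with a human video.")
# 0 selects the default webcam; a path selects an .mp4 driving video.
parser.add_argument("--human_video", default="0",
                    help="path to the driving .mp4 video, or a webcam index such as 0")
# The anime image must be 256x256 RGBA with a transparent background.
parser.add_argument("--anime_image", required=True,
                    help="path to the 256x256 RGBA anime image")
parser.add_argument("--output", default="output.mp4",
                    help="path for the rendered output video")
parser.add_argument("--bg_image", default=None,
                    help="optional background image for the final output")
opt = parser.parse_args()

# cv2.VideoCapture accepts either an integer index or a file path,
# so a numeric string is converted to int for the webcam case.
human_video = int(opt.human_video) if opt.human_video.isdigit() else opt.human_video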
The model processes the driving video frame by frame and extracts the facial landmarks from each frame using the landmark detection method, then calculates the head, eye, and mouth positions from them using Euler angles and passes these values to the next stage.
Note: this step is covered in the function update_image(self).
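As a rough sketch of this per-frame step, assuming dlib's 68-point landmark predictor and OpenCV's solvePnP (the model file, the reference 3D points, and the exact pose parametrisation are assumptions, not the project's code):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Path to the 68-point landmark model is an assumption; it is downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Generic 3D reference points for nose tip, chin, eye corners and mouth corners.
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip    (landmark 30)
    (0.0, -330.0, -65.0),      # chin        (landmark 8)
    (-225.0, 170.0, -135.0),   # left eye    (landmark 36)
    (225.0, 170.0, -135.0),    # right eye   (landmark 45)
    (-150.0, -150.0, -125.0),  # left mouth  (landmark 48)
    (150.0, -150.0, -125.0),   # right mouth (landmark 54)
], dtype=np.float64)

def head_pose(frame):
    # Return (yaw, pitch, roll) in degrees for the first detected face, or None.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y)
                    for i in (30, 8, 36, 45, 48, 54)], dtype=np.float64)
    h, w = frame.shape[:2]
    cam = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, pts, cam, np.zeros(4))
    if not ok:
        return None
    rot, _ = cv2.Rodrigues(rvec)
    angles = cv2.RQDecomp3x3(rot)[0]  # Euler angles (pitch, yaw, roll) in degrees
    return angles[1], angles[0], angles[2]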
Next, the anime image and the computed pose are passed to the poser module, which returns the generated anime image:
posed_image = self.poser.pose(self.source_image, self.current_pose).detach().cpu()
The algorithm then performs post-processing, where it improves the image quality by removing noise and also adds the background to the processed image by calling the function:
pil_image = self.post_processing(pil_image, resized_frame_bg)
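A sketch of what such a post-processing step could look like is shown below; the use of a median filter for de-noising and alpha compositing for the background are assumptions about the implementation, not the project's actual function:

from PIL import Image, ImageFilter

def post_processing(pil_image: Image.Image, background: Image.Image) -> Image.Image:
    # Reduce noise in the generated frame and composite it over a background.
    # A light median filter suppresses speckle noise from the generator.
    cleaned = pil_image.filter(ImageFilter.MedianFilter(size=3))
    # Alpha-composite the RGBA character over the (resized) background.
    bg = background.convert("RGBA").resize(cleaned.size)
    return Image.alpha_composite(bg, cleaned.convert("RGBA")).convert("RGB")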
The final output image is returned to the Tkinter module, which displays the processed result.
The result is then updated and regenerated by the call:
self.master.after(1000 // 60, self.update_image)
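As a minimal illustration of this refresh pattern, here is a simplified Tkinter loop; it is not the project's actual class, and render_frame is a hypothetical placeholder for the real pipeline:

import tkinter as tk
from PIL import Image, ImageTk

class Viewer:
    def __init__(self, master: tk.Tk):
        self.master = master
        self.label = tk.Label(master)
        self.label.pack()
        self.update_image()

    def render_frame(self) -> Image.Image:
        # Placeholder for the real pipeline (pose -> poser -> post-processing).
        return Image.new("RGB", (256, 256), "black")

    def update_image(self):
        frame = ImageTk.PhotoImage(self.render_frame())
        self.label.configure(image=frame)
        self.label.image = frame  # keep a reference so Tk does not discard it
        # Schedule the next refresh at roughly 60 frames per second.
        self.master.after(1000 // 60, self.update_image)

root = tk.Tk()
Viewer(root)
root.mainloop()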
For processing with a webcam, I can edit the code and change the index to 0 so that it works with the webcam as well:
video_capture = cv2.VideoCapture(opt.human_video)
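For reference, a minimal capture loop around this call looks like the following sketch, with the rendering step elided:

import cv2

# 0 selects the default webcam; a file path would select a driving video instead.
video_capture = cv2.VideoCapture(0)
while True:
    ok, frame = video_capture.read()
    if not ok:
        break
    # ... extract landmarks, compute the pose and render the anime frame here ...
    cv2.imshow("driving frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
video_capture.release()
cv2.destroyAllWindows()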
Results:
Customised Results:
To make use of these features, I have made some changes in the modules to work with a custom image background, a custom video background, and a webcam.
Extensions:
This section has some ideas for extending this article that you may wish to explore.
- Different Datasets - Update the example to use it on some other datasets.
- Without Identity Mapping - Update the example to train the generator models without the identity mapping and then compare results.
- Complete Software Package - Extend this module into a complete software package.
If you explore any of these extensions, I’d love to know.
Thank You.