AnimateAnything: Consistent and Controllable Animation for Video Generation

Today's paper introduces AnimateAnything, a new approach that generates high-quality videos from a single image while letting users steer the generation process through multiple input signals, such as camera trajectories, text prompts, and user motion annotations. The method unifies these different types of motion control into a common optical flow representation, enabling precise and coherent video manipulation while maintaining high visual quality.

Method Overview

The approach consists of a two-stage pipeline. In the first stage, all visual control signals (like camera trajectories, user annotations, or reference videos) are converted into a unified optical flow representation. This unification helps manage different types of motion controls coherently and reduces potential conflicts between different control signals.
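To make the unification idea concrete, here is a minimal sketch of how heterogeneous control signals might be merged into one flow map. All function names are hypothetical illustrations, not the paper's actual code: user-drawn arrows become a sparse flow that overrides a dense camera-induced flow wherever the user annotated motion.

```python
import numpy as np

def arrows_to_sparse_flow(arrows, height, width):
    """Rasterize user-drawn motion arrows into a sparse optical-flow map.

    Each arrow is ((x0, y0), (x1, y1)); the flow at the start point is the
    displacement to the end point. Hypothetical helper for illustration.
    """
    flow = np.zeros((height, width, 2), dtype=np.float32)
    mask = np.zeros((height, width), dtype=bool)
    for (x0, y0), (x1, y1) in arrows:
        flow[y0, x0] = (x1 - x0, y1 - y0)
        mask[y0, x0] = True
    return flow, mask

def merge_flows(camera_flow, sparse_flow, sparse_mask):
    """Unify camera-induced flow with sparse user flow.

    A simple conflict-resolution rule: user annotations take precedence
    over the global camera flow wherever they are defined.
    """
    unified = camera_flow.copy()
    unified[sparse_mask] = sparse_flow[sparse_mask]
    return unified
```

In the actual method the unified flow then conditions the video generator; this sketch only illustrates why a single representation avoids conflicts between control signals: precedence is resolved once, in flow space, before generation.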

The second stage uses this unified optical flow to guide the actual video generation process. The method incorporates a novel frequency stabilization module that operates in the frequency domain to reduce flickering and maintain temporal consistency in the generated videos.

Explicit controls (like user-drawn arrows) are directly converted to sparse optical flows, while implicit controls (like camera trajectories) are processed through a specialized Camera Reference Model. This dual approach allows the system to handle both local object motions and global camera movements effectively.
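An implicit control such as a camera zoom induces a dense, globally structured flow field rather than a handful of sparse vectors. The following sketch (my own illustration, not the paper's Camera Reference Model) shows the flow produced by a zoom about the image center, which hints at why camera motions need dedicated handling.

```python
import numpy as np

def zoom_flow(height, width, scale):
    """Dense flow field induced by a camera zoom about the image center.

    Under a zoom of factor `scale`, a pixel p maps to c + scale * (p - c),
    where c is the image center; the flow is destination minus source,
    i.e. (scale - 1) * (p - c). Illustrative sketch only.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    flow = np.stack(
        [(scale - 1.0) * (xs - cx),   # horizontal displacement
         (scale - 1.0) * (ys - cy)],  # vertical displacement
        axis=-1,
    )
    return flow
```

Note how every pixel moves, with magnitude growing toward the image borders; a sparse arrow representation could not capture this, which motivates the dual explicit/implicit design.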

The method also introduces a frequency-based stabilization technique that helps maintain temporal coherence by ensuring consistency in the frequency domain of the generated video, which is particularly important for reducing flickering in cases with large motion changes.
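As a rough intuition for frequency-domain stabilization, the sketch below low-passes a video along its time axis: flicker shows up as high temporal frequencies, and attenuating them smooths the result. This is a simplified post-processing analogy of my own; the paper's module operates inside the generation process, not on finished frames.

```python
import numpy as np

def temporal_lowpass(frames, keep_ratio=0.5):
    """Reduce flicker by suppressing high temporal frequencies.

    frames: (T, H, W) array of grayscale frames. FFT along the time axis,
    zero out bins above `keep_ratio` of the Nyquist frequency, inverse FFT.
    Illustrative analogy only, not the paper's stabilization module.
    """
    spec = np.fft.fft(frames, axis=0)          # per-pixel temporal spectrum
    freqs = np.fft.fftfreq(frames.shape[0])    # normalized frequencies
    cutoff = keep_ratio * np.abs(freqs).max()
    spec[np.abs(freqs) > cutoff] = 0           # drop high-frequency flicker
    return np.fft.ifft(spec, axis=0).real
```

A steady video passes through unchanged (all its energy sits at frequency zero), while a frame-to-frame flickering video is pulled toward its temporal mean, which is exactly the behavior one wants from a flicker-reduction step.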

Results

The paper demonstrates superior performance compared to existing methods across multiple metrics. The approach shows:

  • Better video quality metrics (FID, SSIM, PSNR, LPIPS)
  • Improved temporal consistency and reduced flickering
  • More precise camera trajectory control
  • Better handling of user-specified motion annotations
  • Strong generalization capabilities across different types of scenes and motions

Conclusion

AnimateAnything introduces a unified approach to handling multiple types of motion control. The two-stage pipeline, combined with the frequency stabilization module, enables the creation of high-quality, stable videos while maintaining precise control over various aspects of the generation process. For more information, please consult the full paper.

Congrats to the authors for their work!

Lei, Guojun, et al. "AnimateAnything: Consistent and Controllable Animation for Video Generation." arXiv preprint arXiv:2411.10836 (2024).
