Your Daily AI Research tl;dr | 2022-06-02

Image from the first paper.

Welcome to your official daily AI research tl;dr (and news) intended for AI professionals and enthusiasts.

In this newsletter, I share the most exciting papers I find each day, along with a short summary to help you quickly gauge whether a paper is worth investigating. I will also take this opportunity to share interesting news from the field. I hope you enjoy the format of this newsletter, and I would gladly take any feedback you have in the comments to improve it.

Now, let's get started with this iteration!

1️⃣ CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers

We first saw large language models such as GPT-3 generate text. Then, similar Transformer-based architectures were adapted to images, yielding a lot of progress in text-to-image generation, especially recently with DALL-E 2 and Imagen. Now, let's jump to the next logical step: text-to-video. Generating videos isn't just a matter of adding a time dimension to images and stitching them together. Each frame needs to follow the previous ones and make sense, respecting physics and our world's laws. Not only does the video need to stay physically coherent, it also needs to stay aligned with the input text. This task is extremely difficult even for humans, so imagine how hard it is for a machine.

Side note regarding "human-created videos": I recommend reading Creativity Inc. by Pixar's CEO. A great and really insightful read.

In this paper, Wenyi Hong et al. tackle the text-to-video task with CogVideo, once again a large Transformer-based model, and report impressive results: "CogVideo outperforms all publicly available models at a large margin in machine and human evaluations."

Link to the paper: https://arxiv.org/pdf/2205.15868.pdf

Code: https://github.com/THUDM/CogVideo
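
To give a rough intuition for how these Transformer-based generators work: frames are first turned into discrete tokens (e.g., by an image tokenizer), and the model then predicts the whole video as one long token sequence conditioned on the text prompt. Below is a minimal conceptual sketch of that flattened autoregressive loop; `model`, its signature, and the token counts are hypothetical placeholders, not the actual CogVideo API (which builds on the pretrained CogView2 and adds a multi-frame-rate hierarchical training scheme on top of this basic idea).

```python
import torch

def generate_video_tokens(model, text_tokens, frames=8, tokens_per_frame=400):
    """Conceptual sketch of autoregressive text-to-video generation.

    `model` is a hypothetical decoder-only Transformer that maps a token
    sequence of shape (1, L) to next-token logits of shape (1, L, vocab).
    This is an illustration of the general approach, not CogVideo's code.
    """
    seq = text_tokens.clone()  # (1, T_text): the text prompt tokens
    for _ in range(frames * tokens_per_frame):
        logits = model(seq)[:, -1, :]          # distribution over next token
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)  # sample one token
        seq = torch.cat([seq, next_tok], dim=1)
    # Drop the prompt and reshape the generated tokens into per-frame grids
    video = seq[:, text_tokens.size(1):].view(1, frames, tokens_per_frame)
    return video
```

The sampled token grids would then be decoded back into pixels by the tokenizer's decoder; the hard part the paper addresses is keeping those frames temporally coherent at scale.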

2️⃣ CyCLIP: Cyclic Contrastive Language-Image Pretraining

This new paper from UCLA and Adobe Research suggests that the image and text representations learned by CLIP are not interchangeable and can lead to inconsistent downstream predictions, which is bad news for all CLIP-based applications. And we know there are many of them.

They introduce CyCLIP, "a framework for contrastive representation learning that explicitly optimizes for the learned representations to be geometrically consistent in the image and text space."

From the abstract: "we show that the improved consistency in CYCLIP translates to significant gains over CLIP, with gains ranging from 10%–24% for zero-shot classification accuracy on standard benchmarks (CIFAR-10, CIFAR-100, ImageNet1K) and 10%–27% for robustness to various natural distribution shifts."

Link to the paper: https://arxiv.org/pdf/2205.14459.pdf

Code: https://github.com/goel-shashank/CyCLIP
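
To make the "geometric consistency" idea concrete, here is a minimal PyTorch sketch of a CyCLIP-style training objective: the standard CLIP contrastive loss plus a cross-modal term (the similarity of image j to text k should match that of image k to text j) and an in-modal term (image-image similarities should match the corresponding text-text similarities). The loss weights and temperature below are illustrative placeholders, not the authors' settings; see the official repo for the exact formulation.

```python
import torch
import torch.nn.functional as F

def cyclip_loss(image_emb, text_emb, temperature=0.07,
                lambda_cross=0.25, lambda_in=0.25):
    """CLIP's symmetric InfoNCE loss plus CyCLIP-style cyclic
    consistency regularizers, for a batch of N paired embeddings."""
    # L2-normalize so dot products are cosine similarities
    I = F.normalize(image_emb, dim=-1)  # (N, d)
    T = F.normalize(text_emb, dim=-1)   # (N, d)

    # Standard CLIP loss: matched pairs on the diagonal are the targets
    logits = I @ T.t() / temperature    # (N, N)
    targets = torch.arange(I.size(0), device=I.device)
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets)) / 2

    # Cross-modal consistency: sim(I_j, T_k) should equal sim(I_k, T_j)
    sim_it = I @ T.t()
    cross_modal = ((sim_it - sim_it.t()) ** 2).mean()

    # In-modal consistency: the image-space and text-space similarity
    # structures should agree with each other
    in_modal = ((I @ I.t() - T @ T.t()) ** 2).mean()

    return contrastive + lambda_cross * cross_modal + lambda_in * in_modal
```

Intuitively, the two extra terms push the image and text embedding spaces toward the same geometry, which is what makes the representations more interchangeable downstream.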


The 5 Best AI Articles of the Month!

In this iteration of the weekly newsletter, we are diving into five amazing articles written by the AI community!

With great commentary by Lauren Keegan, as always!

Most of them come from people exchanging daily with us on the Towards AI Discord community, and we would love to see more creative people join us and share their pieces. If you work with AI, whether as a blogger, YouTuber, or coder, or if you're simply learning AI, consider joining the Learn AI Together Discord server!

Or watch the video here...


And we are already at the end of this iteration! If you've enjoyed it, please subscribe and share it with your techy friends. Once again, let me know how to improve this format: it's something I have wanted to do for quite some time without figuring out the best way to do it. I hope you like the decisions made here, and I would be glad to hear from you to make it even better over time.

Thank you for reading! A fellow AI enthusiast and researcher.
