Your Daily AI Research tl;dr

Welcome to a new and unique newsletter, a tl;dr focusing on AI research (and sometimes news) intended for AI professionals and enthusiasts.

In this newsletter, I will share the most exciting papers I find each day, along with a short summary to help you quickly decide whether a paper is worth investigating. I will also take the opportunity to share interesting news from the field. I hope you enjoy the format of this newsletter, and I would gladly take any feedback you have in the comments to improve it. Now, let's get started with this first iteration!

1️⃣ The one and only: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

If you thought DALL-E 2 had great results, wait until you see what this new model from Google Brain can do.

DALL-E 2 is amazing but often lacks realism, and that is precisely what the team tackled with this new model, called Imagen.

They share a lot of results on their project page, as well as DrawBench, a benchmark they introduce for comparing text-to-image models, on which Imagen clearly outperforms DALL-E 2 and previous image generation approaches. Learn more in the paper...

Link to the paper: https://arxiv.org/pdf/2205.11487.pdf

Video overview of the paper: https://youtu.be/qhtYPhPWCsI

Implementation is linked below!

2️⃣ Fine-grained Image Captioning with CLIP Reward

This is a really interesting paper that takes a different approach to image captioning: instead of focusing on the most salient common objects, as most models do, it focuses on the specific, detailed aspects of an image that distinguish it from similar ones. This should yield more precise and distinctive descriptions of queried images, rather than descriptions of a generic scene that could apply to many similar images.

They also introduced a new dataset for this task called FineCapEval.

Link to the paper: https://arxiv.org/pdf/2205.13115.pdf

Code and data: https://github.com/j-min/CLIP-Caption-Reward
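
Under the hood, the idea is to fine-tune a captioning model with reinforcement learning, using the similarity CLIP assigns to an image-caption pair as the reward: distinctive captions score higher than generic ones. Below is a minimal sketch of such a reward using Hugging Face's CLIP; the model checkpoint and the reward shaping here are my own illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a CLIP-based caption reward (illustrative only;
# the paper's exact reward and fine-tuned text encoder differ).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_reward(image: Image.Image, captions: list[str]) -> torch.Tensor:
    """Return the cosine similarity between the image and each candidate caption."""
    inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    # Normalize, then take the dot product -> cosine similarity in [-1, 1].
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (text_emb @ image_emb.T).squeeze(-1)

# Usage: the more distinctive caption should earn the higher reward.
# image = Image.open("photo.jpg")
# rewards = clip_reward(image, ["a dog", "a brown corgi lying on a red couch"])
```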

3️⃣ AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

You know that Transformers are a hot topic, and what's even hotter are Vision Transformers (ViTs). They are powerful architectures that often (though not always) surpass CNNs and can be trained on very large datasets, enabling "large models" like GPT-3 (a language Transformer).

In the visual world, ViTs are hard to adapt to new tasks due to heavy computation and storage burdens; instead, we often create and fine-tune a new model for each task. AdaptFormer addresses this challenge by proposing an "effective adaptation approach for Transformer, namely AdaptFormer, which can adapt the pre-trained ViTs into many different image and video tasks efficiently." By adding fewer than 2% extra parameters to a pre-trained ViT, they can adapt it to a new task, significantly outperforming the unmodified model and even beating fully fine-tuned ones.

Link to the paper: https://arxiv.org/pdf/2205.13535.pdf

Code: https://github.com/ShoufaChen/AdaptFormer
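
Concretely, AdaptFormer adds a lightweight bottleneck module (called AdaptMLP in the paper) in parallel with the MLP block of each frozen Transformer layer, and only these new parameters are trained. Here is a rough PyTorch sketch of that idea; the bottleneck width, scaling factor, and names are my own illustrative choices, so check the official code for the real module.

```python
# Rough sketch of an AdaptFormer-style parallel adapter (illustrative only;
# see the official repository for the actual AdaptMLP implementation).
import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    """Bottleneck MLP trained alongside a frozen backbone."""
    def __init__(self, dim: int, bottleneck: int = 64, scale: float = 0.1):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # down-projection
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, dim)    # up-projection
        self.scale = scale                      # weights the adapter branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x))) * self.scale

class AdaptedMLPBlock(nn.Module):
    """Frozen MLP block plus a parallel adapter: out = mlp(x) + adapter(x)."""
    def __init__(self, frozen_mlp: nn.Module, dim: int):
        super().__init__()
        self.mlp = frozen_mlp
        for p in self.mlp.parameters():  # keep the pre-trained weights frozen
            p.requires_grad = False
        self.adapter = ParallelAdapter(dim)  # the only trainable parameters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x) + self.adapter(x)
```

Since only the adapters (plus a task head) are trained and stored per task, the overhead stays at a few percent of the full model, which is where the sub-2% figure comes from.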

Bonus: An implementation of Imagen, Google's text-to-image neural network that beats DALL-E 2, in PyTorch.

This is the new state of the art (SOTA) for text-to-image synthesis discussed in 1️⃣ above. The repository contains an implementation of Google's text-to-image neural network, Imagen, which is architecturally much simpler than DALL-E 2.

GitHub repo: https://github.com/lucidrains/imagen-pytorch
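
To give you a feel for the API, here is a sketch adapted from the repository's README at the time of writing; argument names and defaults may have changed since, so treat it as illustrative rather than definitive.

```python
# Sketch of imagen-pytorch usage, adapted from the repository's README
# (illustrative; check the repo for the current API).
import torch
from imagen_pytorch import Unet, Imagen

# A base 64x64 U-Net and a super-resolution U-Net form the cascade.
unet1 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4, 8),
             num_resnet_blocks=3,
             layer_attns=(False, True, True, True),
             layer_cross_attns=(False, True, True, True))

unet2 = Unet(dim=32, cond_dim=512, dim_mults=(1, 2, 4, 8),
             num_resnet_blocks=(2, 4, 8, 8),
             layer_attns=(False, False, False, True),
             layer_cross_attns=(False, False, False, True))

# Imagen chains the U-Nets; text conditioning comes from a frozen T5 encoder.
imagen = Imagen(unets=(unet1, unet2),
                image_sizes=(64, 256),   # output resolution of each stage
                timesteps=1000,
                cond_drop_prob=0.1)      # enables classifier-free guidance

texts = ["a photo of a corgi riding a skateboard"] * 4
images = torch.randn(4, 3, 256, 256)  # stand-in for real training images

# Each U-Net in the cascade is trained separately.
loss = imagen(images, texts=texts, unet_number=1)
loss.backward()

# After training both U-Nets, sample images directly from text prompts.
samples = imagen.sample(texts=["a dragon made of stained glass"], cond_scale=3.0)
```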


And we are already at the end of this first iteration! Please subscribe and share it with your friends if you enjoyed it. Once again, let me know how to improve this format; it is something I have wanted to do for quite some time without figuring out the best way to go about it. I hope you liked the decisions made here, and I would be glad to hear from you to make it even better over time.

Thank you for reading, a fellow AI enthusiast.
