Generative AI - Short & Sweet 07 - Perpetual Views / Videos
Generated with DALL-E


Sign Up | GAI Course | Sponsor

[Migrating my newsletter. This is from the 13th of September.]

The GAI topic of the week is …

Perpetual View/Video Generation (PVG). AI models generate open-ended videos from a single image only - at least, that is the newest approach from Google Research [1]. These videos "fly" through the scene depicted in the input image. The range of potential applications is endless. Further, the tech behind it is incredibly smart, and it marks the next cornerstone of generative AI.

Glossary to understand AI better

  • AGI = Artificial general intelligence
  • HGNG = Hybrid Generative Neural Graphics

PVG's complex tech simplified

Generating a perpetual video is not trivial at all, as becomes quite obvious when reading Google's paper. However, I have broken it down into 3+1 main ingredients:

1) In-painting: As the virtual camera moves, the space behind objects like trees and mountains opens up and needs to be filled. For more info, see episode 4 of this newsletter [2].
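To build some intuition, here is a toy in-painting sketch in plain Python. This is not Google's actual method (which uses a learned generative model); it simply fills unknown pixels with the average of their known neighbors, growing inward from the border of the hole:

```python
def inpaint(image, mask, iterations=10):
    """Toy diffusion-style in-painting.

    image: 2D list of floats (grayscale pixel values).
    mask:  2D list of bools, True where the pixel is unknown.
    Unknown pixels are filled with the mean of their already-known
    neighbors; each pass fills one more ring of the hole.
    """
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]
    known = [[not m for m in row] for row in mask]
    for _ in range(iterations):
        for y in range(h):
            for x in range(w):
                if known[y][x]:
                    continue
                # collect the 4-neighborhood values that are already known
                vals = [img[ny][nx]
                        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= ny < h and 0 <= nx < w and known[ny][nx]]
                if vals:
                    img[y][x] = sum(vals) / len(vals)
                    known[y][x] = True
    return img
```

Real in-painting models hallucinate plausible texture instead of just blurring neighbors, but the problem setup - fill the masked region consistently with its surroundings - is the same.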

2) Out-painting: When the camera moves beyond what the input image captured, the AI needs to fill the new space with freshly generated content. See again episode 4 [2].
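As a crude stand-in for what a generative model does here, out-painting can be sketched as growing the canvas and filling the new border region - below simply by replicating the nearest original pixel, where the real system would hallucinate plausible new scenery:

```python
def outpaint_edges(image, pad):
    """Grow a 2D image by `pad` pixels on every side.

    New border pixels copy the nearest original pixel (edge replication),
    a naive placeholder for generative out-painting.
    """
    h, w = len(image), len(image[0])

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    return [[image[clamp(y - pad, 0, h - 1)][clamp(x - pad, 0, w - 1)]
             for x in range(w + 2 * pad)]
            for y in range(h + 2 * pad)]
```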

3) Super-resolution: As the camera moves into the scene, we effectively zoom into a subset of the pixels, which lowers the image resolution. This is tackled by upscaling the image with super-resolution techniques, e.g. TecoGAN [3a]. Also worth a look is the work of Chitwan Saharia, a talented Research Engineer at Google Brain to follow [3b].
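For intuition: the trivial baseline that learned super-resolution methods like TecoGAN try to beat is plain nearest-neighbor upscaling, which enlarges the image without adding any new detail - the learned models' whole job is to hallucinate the missing high-frequency detail instead:

```python
def upscale_nearest(image, factor):
    """Nearest-neighbor upscaling of a 2D image by an integer factor.

    Every output pixel copies the source pixel it maps back onto, so the
    result is blocky - no new detail is created.
    """
    return [[image[y // factor][x // factor]
             for x in range(len(image[0]) * factor)]
            for y in range(len(image) * factor)]
```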

+1) The dataset: For good PVG, the AI models are data-hungry. The Google Research team put a lot of work into synthesizing a labeled dataset of around 10 million videos, each including a depth profile of the objects in it - labeling at that scale would be impossible to do by hand. This brings us to a new frontier of creating and enhancing datasets synthetically, an evolving field that enables new AI heights and will get more and more attention.
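A minimal sketch of the idea, with made-up toy logic (the real pipeline renders or processes actual footage): because the scene is generated programmatically, every synthetic sample comes with a perfect ground-truth depth label for free:

```python
import random

def synth_sample(width=8, height=8, seed=None):
    """Toy synthetic data generator: an (image, depth) training pair.

    Each pixel gets a random ground-truth depth, and the 'image' is
    derived from it (farther = dimmer in this toy model), so the label
    is exact by construction - the whole appeal of synthetic data.
    """
    rng = random.Random(seed)
    depth = [[rng.uniform(1.0, 100.0) for _ in range(width)]
             for _ in range(height)]
    image = [[1.0 / d for d in row] for row in depth]
    return image, depth
```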

The tech research firm Gartner likewise forecasts that synthetic data will become the main form of data used in AI training [4].


Sorry for that quick digression, back to PVG.

Google's new approach pushes the boundaries here. The generated videos are significantly longer than those of other methods while keeping good, photorealistic quality. Further, it supports both linear and non-linear camera movements.

There is room for improvement, e.g. around video consistency and resolution. But that is A-OK, because this field is just starting to open up. For instance, look at DALL-E's evolution: at first, DALL-E 1 produced low-quality images [5], and now, just one year later, the whole community is stoked about DALL-E 2's stunning results [6]. I am positive that we will see a similar progression with PVG.

A picture is worth a thousand words, so please visit [1] to see an example video. Anyway, here is a screenshot:

[Screenshot of a generated fly-through video from [1]]

And, what could this mean?

I always think about how a result could evolve one to five papers further down the research line, and how it could be used in industry, at home, or for good. Starting with the obvious one, these are my thoughts:

  • Video generation for various purposes: PVG is an important piece in the evolution of video generation, like HGNG from episode 5 of this newsletter [7].
  • Models are becoming more complex: AI models are getting bigger, and an increasing number of distinct AI tasks are being integrated into single, larger models. To me, this indicates the progression towards AGI. By the way, I do recommend listening to Lex Fridman's interview with John Carmack [8] on this topic.
  • Merge with Google Maps: Now imagine Google combined this technology with Google Maps. Recurring images, i.e. users' snapshots of places and Google Street View footage, would then support the AI as reference points for frame recalibration. The generated "flying" video could keep high quality in resolution and transitions while perpetually generating video material.

My mind goes in all kinds of directions here. What about 360-degree views? What could be the impact on the gaming industry? And most of all, what do YOU think is an interesting angle to this unfolding? Let us know.

Finally, the Top 3 GAI Gems

  1. Play around with (Stable Diffusion) AI image generation.
  2. An awesome, trippy AI-generated music video. Try [9].
  3. Old but gold: OpenAI's Jukebox generates music, including singing.


Subscribing to, giving feedback about, and sharing the newsletter as well as our renowned online course is highly appreciated and helps a lot.

Do you know about GAI Gems that we should consider? Or other matters?

Please, respond to this email.

Thank you for reading,

Martin


References:

[1] Google's new AI scientific paper homepage and many examples.

[2] Episode 4 from the 23rd of August 2022 of Generative AI - Short & Sweet.

[3a] TecoGAN for super-resolution.

[3b] Chitwan Saharia's homepage.

[4] Nvidia: What is synthetic data.

[5] DALL-E 1 results.

[6] DALL-E 2 impressive images - video from Two Minute Papers (a great YouTube channel).

[7] Episode 5 from the 30th of August 2022 of Generative AI - Short & Sweet.

[8] Interview between Lex Fridman and John Carmack, with a timestamp to find the part about AGI, because it is a long but interesting interview.

[9] IPython Notebook to try it out yourself.
