Current AI Trends and how to create slide decks with the AI-Slides project
Stable Diffusion generated image

2022 was a really good year for AI. Today we want to introduce https://ai-slides.com, an AI-powered slide deck generator prototype that lets you create a whole slide deck from a single prompt. In this first blog post, I will give an overview of the kinds of AI that are out there. In the following posts, I will dive deeper into the details.

So let's have a look at what is underneath this technology and the latest trends in AI.

Image generation AI takes off

Artificial intelligence has made significant progress in the realm of image generation in recent years. One of the first notable examples was CLIP (Contrastive Language-Image Pre-training) from OpenAI, which learned to match images with text captions. OpenAI then showed that the process could be turned around to create images from captions. An early example of such a generative AI was Dall-E, introduced at the beginning of 2021. However, since neither the model nor a public API was available, we could not make use of these neural networks until 2022. So people started recreating the models or building similar ones. Craiyon, introduced in early 2022, did a fairly good job of reproducing the results of the first version of Dall-E. Then all hell broke loose.

Dall-E 2 image depicting a robot in a classroom

With the introduction of Dall-E 2 and its public availability, you could finally put the model to use. Stable Diffusion took a different approach to implementing an AI capable of creating images. Around the same time, several other implementations like Midjourney saw the light of day. With all these different AIs, it is now possible to create your own images of whatever you like without much effort. However, at the moment only some of the models, such as Craiyon and Stable Diffusion, can be run locally on your own computer. These models can even be trained with your own pipeline on arbitrary data. The technique can also be used to alter images by replacing parts of the input image with different content. And it doesn't stop there. Stable Diffusion has also been fine-tuned on images of audio spectrograms in order to create music from text prompts.

Stable Diffusion Image "A Cyberpunk Christmas Tree"

Despite its young age, this technology, which combines techniques from both Computer Vision and Natural Language Processing, shows great potential and is already used as a tool in many domains.

Major improvements in Text Generation

ChatGPT conversation

The second domain that improved drastically in 2022 was generative text models. These have been around a bit longer than image generation AI, with OpenAI's GPT-2 (Generative Pre-trained Transformer 2) being introduced in 2019. Since then, a third version of GPT has been released. The models have been trained on bigger and better data sets. Additionally, they have more parameters in their networks, which allows them to better capture human language and the context in which a specific statement is made. At the same time, OpenAI's competitors have been trying to build similar neural networks. Aleph Alpha has created a similar system with its Luminous models. Unfortunately, neither Luminous nor GPT-3 has been open-sourced, so they can only be used via their APIs. But in 2022, we have also seen the first open source text generation AIs like BLOOM. Text generation models have now been around for a few years and can also be used in other domains like code creation. With OpenAI Codex and GitHub Copilot, neural networks can assist you in your coding work. In autumn 2022, however, OpenAI introduced ChatGPT, a model based on the GPT-3.5 series that was specifically trained to respond like a human chat partner. This AI, although not perfect, gives us a glimpse of what is possible in the domain of Natural Language Processing and Understanding.

What else is out there?

NVIDIA Omniverse Audio2Face

Neural networks can be found in pretty much any domain these days. A few notable examples are:

  • Text-to-speech systems based on machine learning models have been around for years and are part of the smart assistants on any modern phone. Services such as Amazon Polly can read text aloud in a very convincing manner.
  • Based on a WAV file of a voice, neural networks like NVIDIA's Audio2Face are capable of mimicking the movement of facial muscles on a 3D animated character, so that it looks as if the character were reading out the text.
  • Real-time deepfakes can digitally replace a face with that of any other person, based on a few minutes of training material.

So what are we going to build with all of this?

Over the next few blog posts, I will give you an introduction to neural networks based on our slide deck generator. This application will create an HTML5 slide deck based on just the title. It will:

  • create slide captions and a few paragraphs of text for each caption
  • create a suitable image for each of the captions
  • use text to speech to read the text automatically
  • and use a Deepfake Avatar as a convincing narrator

With each blog post, I will focus on one of those topics.
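
To make the list above a bit more concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the actual AI-Slides implementation: the names Slide, build_deck, stub_captions, stub_text and stub_image are hypothetical, and the three stub functions merely stand in for real text and image generation models. The text-to-speech and avatar steps are left out to keep the example short.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List


@dataclass
class Slide:
    caption: str
    paragraphs: List[str]
    image_url: str


def render_slide(slide: Slide) -> str:
    """Render one slide as an HTML5 <section>, escaping all generated text."""
    body = "".join(f"<p>{escape(p)}</p>" for p in slide.paragraphs)
    return (
        "<section>"
        f"<h2>{escape(slide.caption)}</h2>"
        f'<img src="{escape(slide.image_url)}" alt="{escape(slide.caption)}">'
        f"{body}"
        "</section>"
    )


def build_deck(
    title: str,
    make_captions: Callable[[str], List[str]],
    make_text: Callable[[str, str], List[str]],
    make_image: Callable[[str], str],
) -> str:
    """Build an HTML5 slide deck from just a title.

    The three callables are placeholders for the generative models:
    captions from the title, paragraphs per caption, one image per caption.
    """
    sections = []
    for caption in make_captions(title):
        slide = Slide(caption, make_text(title, caption), make_image(caption))
        sections.append(render_slide(slide))
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n" + "\n".join(sections) + "\n</body></html>"
    )


# Hypothetical stand-ins for the real models, so the sketch runs offline.
def stub_captions(title: str) -> List[str]:
    return [f"{title}: Introduction", f"{title}: Outlook"]


def stub_text(title: str, caption: str) -> List[str]:
    return [f"Placeholder text for the slide {caption}."]


def stub_image(caption: str) -> str:
    return "placeholder.png"


if __name__ == "__main__":
    print(build_deck("AI Trends", stub_captions, stub_text, stub_image))
```

Passing the generators in as plain functions keeps the deck assembly independent of any particular model, so each stub could later be swapped for a call to a text or image generation service without touching the HTML code.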

I want to do this myself!

So if you thought to yourself while reading this blog post "I want to build something with AI, but I don't know where to start", please drop me a message. The Innovation Hacking Team at TNG and I can help you get started with Computer Vision, NLP and other AI applications.
