GenAI in DIY-Mode (*Do It Yourself)
Everyone can learn how to use Generative AI tools - by Roberto Frossard

GenAI in DIY-Mode (*Do It Yourself)

Listen in English | Portuguese | Spanish

What are some experiments we can do with Generative AI?

So, last week I created a poll to ask which content / format I would write the next episode of the Tech Momentum newsletter (link here ).

No alt text provided for this image
The most voted format/content was "Quick hands-on experiments"


Recap on GenAI

Generative AI (GenAI), a branch of artificial intelligence (AI), is transforming the way we create and consume content. This technology, which includes models like OpenAI's GPT-4 and open-sourced Falcon 40B Model, can generate human-like text, compose music, create art, and even design software. It's trained on vast amounts of data, learning patterns and structures, and then uses this knowledge to produce unique outputs.

But what does this mean for you? Well, the applications of GenAI are vast and can impact various aspects of your life and work. If you're a journalist or content creator, AI can help automate and enhance your content creation process, freeing up your time for more complex tasks. For musicians and artists, AI can be a new tool for creativity, helping you compose new songs or create art in ways you hadn't imagined. If you're a software developer, tools like GitHub Copilot can act as your AI pair programmer, suggesting code and helping you become more efficient.

In essence, GenAI is not just a technological advancement; it's a tool that can enhance your creativity, boost your productivity, and transform the way you work. By understanding and leveraging GenAI, you can stay ahead of the curve in this rapidly evolving market.


Learn-by-Doing

I'm really glad that the most voted format was the "2-3 Quick Hands-on Experiments"!

The impact of the "learn by doing" approach, also known as experiential learning, on our ability to absorb and retain information is significant. Research has shown that people generally remember 20% of what they read, 30% of what they see, but an impressive 90% of what they do or simulate. This is often referred to as the "Learning Pyramid" or the "Cone of Learning" developed by the National Training Laboratories.

In a study conducted by the NTL Institute for Applied Behavioral Science, it was found that retention rates for learning by doing were six times higher than for traditional lecture-based learning. This approach not only improves retention but also enhances understanding, as it allows learners to apply theoretical concepts to practical situations, thereby deepening their comprehension of the subject matter.

In this article, we'll be diving into three DIY experiments that will give you firsthand experience with GenAI. These experiments are designed to be accessible and engaging, regardless of your background or expertise in AI. They involve practical applications of GenAI, such as creating PowerPoint presentations based on a tagline, cloning your own voice, and generating 3D images from plain photos. By working through these experiments, you'll not only learn about the technical aspects of GenAI but also discover how it can be used creatively and innovatively. So, let's roll up our sleeves and start the experimentation!


Experiment 1: 3D Image Generation from your photos

First things, first. We will use HuggingFace's Spaces to execute our experiments. I found it more accessible than most platforms for our experiments.

In case you haven't heard of HuggingFace, it is an open-source hub to enable the development of software applications based on machine learning, which is perfect for who wants to learn more about GenAI, new LLMs and learn from the community.

No alt text provided for this image
HuggingFace Space being built

As you can see, I just made the first experiment available, but it might be put on hold if it's not being used for a while - so let me know if you have any problems accessing it!

The "Image and 3D Model Creator" is an application that utilizes a technology known as Pixel-Aligned Implicit Function (PIFu), which is a high-resolution method for digitizing clothed humans.


PIFu is a type of generative AI that is capable of creating detailed 3D models from 2D images. It has been used in various applications, including the creation of digital avatars and the digitization of real-world objects. The technology behind PIFu involves the use of implicit functions that align with the pixels of an input image to generate a corresponding 3D model.


Follow these steps to try yourself:

  1. Access the following link
  2. Select the tab "Image-to-3D-Model"
  3. Select a PNG image (in my test, I searched for "person standing" online, and then tested some of my photos). You can use the sample images on the bottom of the page too.
  4. Click Submit (do not change the parameters in your first test, but feel free to change later and see the differences)
  5. It's going to take some minutes (sorry!) because it's an open source infrastructure running very sophisticated AI models, but it should work!

No alt text provided for this image
3D object/image generated from a photo


Experiment 2: Cloning Your Voice with AI

For our second experiment, we're going to dive into the fascinating world of voice cloning. We'll be using another Hugging Face Space, which you learned by now it's a perfect platform to get hands-on with GenAI and learn from the community.

Before we start, a quick disclaimer:

While the potential of real-time voice cloning is exciting, it's crucial to address the ethical implications of this technology. The ability to mimic someone's voice can be misused for deceptive purposes, such as deepfake audio scams or spreading misinformation. Therefore, it's essential to use this technology responsibly and ethically. Always seek consent before using someone else's voice, and refrain from using the technology for deceptive or harmful purposes.

The "Real-Time Voice Cloning" application uses a technology that can mimic a speaker's voice from a short audio sample. This is a fascinating example of generative AI's capabilities, demonstrating how it can replicate human voice with remarkable accuracy. The application uses audio files for testing and benchmarking purposes. These are the same reference utterances used by the SV2TTS authors to generate the audio samples.

Here's how you can try it out for yourself:

  • Access the following link
  • If you want to record your voice (20-30 seconds should be more than enough), click on the "file" option, and then click back on the "mic" option. With that, a "record from microphone" option should be displayed below.

No alt text provided for this image
How to enable the "Record from microphone" option

  • If you don't know what to say during the recording, I recommend looking for the lyrics of a song you like and just pronounce the words in a natural way (I did in Portuguese and English for different tests).
  • Once you upload or record your voice, the application will then generate a synthetic voice that mimics the characteristics of your voice. The application uses compressed versions of audios from the VCTK corpus and the LibriSpeech dataset.
  • Input any text you want the application to read out in your cloned voice.
  • Click Submit. It might take a few minutes again!


No alt text provided for this image
"Hi folks! Can you believe this voice was synthetically generated with a recording of my own voice while I was reading a text in Portuguese? It took me 30 seconds of recording, and then another minute to generate the voice!"


This experiment not only provides an engaging way to learn about GenAI but also demonstrates the potential of AI in transforming our interaction with digital technology. By cloning your voice, you can create personalized digital content, develop unique user experiences, and explore new possibilities in human-computer interaction.


Experiment 3: PowerPoint Presentations with AI

For our third and final experiment, we're going to keep exploring content creation, but specifically a combination of OpenAI's GPT-3.5-turbo model and GoogleImageCrawler to generate an entire PowerPoint presentation.

GPT-3.5-turbo model is used to generate the content for the PowerPoint slides. The user provides a topic and the desired number of slides, and the model generates the content for each slide. The generated content is then inserted into a PowerPoint template using Python's pptx library.

The application also uses the GoogleImageCrawler from the icrawler library to download images based on the generated content. These images are then inserted into the PowerPoint slides.

By automating the initial stages of creating a PowerPoint presentation, this experiment shows how users can focus more on refining their ideas and less on the manual task of setting up slides. This not only saves time but also enhances the quality of the final output.

What's more, I've created my own PowerPoint template for this application. This means that the content generated by the AI is adapted to fit the format I want, further demonstrating the flexibility and adaptability of GenAI.

Here's how you can try it out for yourself:

  1. Access the following link
  2. You will need to use your own OpenAI API key to use this application. To do this, visit the OpenAI website , create an account if you don't have one, and navigate to the API section to generate your key (“View API Keys” icon in the top-right area of your screen, and select the “Create an API Key” icon to set your API Key)
  3. Input your OpenAI API key into the application
  4. Input a tagline or a topic for your presentation (I used the following: "Trade-offs of Generative AI in Content Creation")
  5. Click Generate. It might return an error the first time, but click on Generate again. It will take a minute, but again - it should work!

No alt text provided for this image
PowerPoint presentation generated under 1 minute based on the topic


Once the file is generated, click on download to visualize and fine tune your PPT!

No alt text provided for this image
Original slides generated (without editing)


Some of the tests I did, had more images. So, if you're unhappy with the first test, keep on trying some variations!

I hope you liked the hands-on experience of how GenAI can be used to assist in creating content, saving time, and allowing us to focus on more valuable tasks.


Open Source: The power behind these experiments

Open source software has been instrumental in the creation and execution of these experiments. The platforms and libraries used, such as HuggingFace Spaces, Gradio, and the pptx library, are all open source. This means that their source code is freely available for anyone to view, modify, and distribute.

The beauty of open source is that it fosters a collaborative environment where knowledge is shared, and innovation is accelerated. The open source mindset is making Generative AI evolve like never before!

By testing these experiments on HuggingFace, we are all contributing to the community. Some of you might replicate these experiments, and perhaps even improve upon them. This is the power of open source - it's a cycle of continuous learning, sharing, and innovation.


Advanced Experiments with GenAI

If you're up to more advanced experiments with GenAI, I strongly recommend keeping the learn-by-doing approach going!

Some of the other spaces I've used recently that you can try too:


In conclusion...

The experiments we've explored in this article demonstrate the transformative potential of GenAI. From creating 3D models from 2D images, to cloning voices, and even generating PowerPoint presentations, the possibilities are vast and exciting.

But perhaps the most important takeaway is not just what these technologies can do, but how we can interact with them. The principle of 'learn by doing' has shown us that we can actively engage with these advanced AI models, experiment with them, and even shape their development. The future of AI is not a one-way street, but a collaborative journey.

As we continue to explore and experiment with GenAI, one thing is clear: the competition is not between humans and AI, but rather, how we can best collaborate with AI to augment our capabilities and create a better future. So, let's continue to learn, experiment, and innovate together. The future is not just about AI, it's about us, teaming up with AI, to be better, faster.

要查看或添加评论,请登录

社区洞察

其他会员也浏览了