#10 Taming the Screenshot Beast: How AI Helped Me Build My First Useful App
Muhammad Haroon Butt
Financial Analysis & Modelling | Cost Optimization & Analysis | Telecommunications Strategic Planning & Execution | ChatGPT Prompting & Excel Automation
The joy of creating something is truly enchanting. It's no wonder that amazing creators who build disruptive things seem to be constantly fueled by the pure feeling of joy and the dopamine rush it gives them. I recently realized that I, too, love creating stuff, but I never had the tools to actually build anything substantial. My journey started with watching YouTube tutorials and following along, even though many tutorials were incomplete, often holding back some key steps for their "registered" users.
Despite these challenges, whenever I managed to build something, it was always an incredible feeling. I created websites, online stores, and YouTube channels. Sure, none of these projects blew up or got me a massive following, but I kept building anyway, driven by the sheer thrill of creation.
This year, I decided to take AI tools seriously. I learned how to prompt effectively and started using them to automate my office work in Excel. Let me tell you, the capabilities of ChatGPT blew my mind – you can do almost anything with this thing!
Recently, I created a small app using ChatGPT, and while it might not be groundbreaking, it solved a specific issue I had. Just imagine – a person with zero coding experience creating an app to automate some of their work. How cool is that?
Let me share what I needed and how ChatGPT came to the rescue. As I dove deeper into AI and LLMs, social media algorithms started bombarding me with relevant content, keeping me glued to AI-related posts. Often, I'd spot something useful, take a screenshot, and plan to use it later. But soon, managing all these screenshots became a nightmare, and I struggled to organize them into my notes.
I heard that GPT-4 could extract text from images, so I gave it a shot. But there were limits on how many images I could process, even with the paid version. Loading 20 images took forever, and sometimes I'd miss the "Continue" prompt if I got distracted, wasting precious time. Then it hit me: why not create my own image-to-text app using ChatGPT?
I'd made a couple of simple apps before, so I went to ChatGPT, gave it a basic prompt, and boom – it provided the code. I asked for step-by-step guidance, and followed along:
Prompt: you are expert python coder and are extremely helpful and think out of the box to solve the problem and come up with solutions. i want to create a simple app that can extract text that is written in any image that is uploaded. i will ultimately be using render to deploy the app online and gradio or any other simplest and lightest tool. the app has to be very light and operational so that it can be uploaded on render free platform which don't offer much processing power on free bucket. it should ask user to upload images and then extract text from image in a format that is understandable and editable. it should also allow multiple images to be uploaded and then once all images are loaded it should start working on each one and then give a word document download to get the text.
ChatGPT responded with full instructions, code and all, which was overwhelming for me, so I gave it another prompt:
Prompt: i have zero coding experience so guide me one step at a time. i already have python installed so continue from there. one step at a time
That solved the problem: from then on, ChatGPT guided me literally one step at a time. Let me share some of the steps it gave me.
Using Gradio for a simple front-end (because who needs fancy when functional does the trick?), I whipped up a local server where I could upload images and extract text in a flash. This app was lightning-fast, letting me convert over 200 images into text in less than half an hour. The whole setup process took me less than 30 minutes, despite having no real coding chops. That's the real power in the hands of us regular folks – solving our own problems without relying on anyone else.
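For anyone curious what an app like this can look like, here is a minimal sketch, not the exact code ChatGPT produced. The post does not name the text-extraction package, so pytesseract with Pillow is an assumption here, as is python-docx for the Word file. All function names are made up for illustration. The third-party imports sit inside the functions so the batch loop can be tried with any stand-in OCR function.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def ocr_with_tesseract(image_path: str) -> str:
    """Assumed OCR backend (pytesseract + Pillow); the post does not name the package."""
    # Imported lazily so the rest of the file works without these installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def extract_all(
    image_paths: Iterable[str],
    ocr: Callable[[str], str] = ocr_with_tesseract,
) -> List[Tuple[str, str]]:
    """Run OCR on every image and return (file name, text) pairs in upload order."""
    results = []
    for path in image_paths:
        try:
            text = ocr(path).strip()
        except Exception as exc:
            # One unreadable screenshot should not stop the whole batch.
            text = f"[could not read this image: {exc}]"
        results.append((Path(path).name, text))
    return results


def save_as_docx(results, out_path: str = "extracted_text.docx") -> str:
    """Write one heading plus one paragraph per image (assumes python-docx)."""
    from docx import Document

    doc = Document()
    for name, text in results:
        doc.add_heading(name, level=2)
        doc.add_paragraph(text)
    doc.save(out_path)
    return out_path


def build_app():
    """Gradio front end: upload many images, download one Word document."""
    import gradio as gr

    def process(files):
        # Depending on the Gradio version, each item is a path string
        # or a temp-file object whose .name attribute holds the path.
        paths = [f if isinstance(f, str) else f.name for f in (files or [])]
        return save_as_docx(extract_all(paths))

    return gr.Interface(
        fn=process,
        inputs=gr.File(file_count="multiple", label="Screenshots"),
        outputs=gr.File(label="Word document"),
        title="Image to Text",
    )

# To start the local server: build_app().launch()
```

Keeping the OCR step as a swappable function means a different extraction library can be dropped in later without touching the upload or Word-export code.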
Next up, I tried to upload this app on Render. They've got a free plan with limited CPU juice, but it's enough for simple tasks. I'd previously uploaded a journaling guide app there, so I figured I'd give it a go with this new app. Hit a few snags though – the app went live, but it choked when trying to convert images because the main text-extraction package wasn't playing nice with Render.
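The exact cause of the Render failure isn't recorded here, so treat this as a guess rather than a diagnosis. One common reason an OCR app works locally but fails on a hosted platform is that the Python wrapper gets installed while the OCR program it calls does not. The sketch below assumes a Tesseract-based setup and checks for that at startup. It also reads the port from a `PORT` environment variable, which hosted web services typically provide. Both helper names are hypothetical.

```python
import os
import shutil


def ocr_engine_available(binary: str = "tesseract") -> bool:
    """True if the OCR executable is on PATH (installing the Python wrapper alone does not add it)."""
    return shutil.which(binary) is not None


def server_settings(env=os.environ) -> dict:
    """Host and port for a hosted container; falls back to Gradio's usual local port."""
    return {
        "server_name": "0.0.0.0",  # listen on all interfaces, not just localhost
        "server_port": int(env.get("PORT", 7860)),
    }


if not ocr_engine_available():
    print("OCR engine not found on PATH; text extraction will fail on this machine.")
```

With a Gradio app object, these settings could be passed along as `app.launch(**server_settings())`.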
For now, the app's humming along locally, and I can use it for future images. I've got my very own converter that works without limits and at warp speed. While the output from my Python app isn't as polished as ChatGPT's image-to-text conversion, I just upload the document and use a cleanup prompt in ChatGPT to make it shine.
This journey of creating my own solutions has been incredibly empowering. It's like I've unlocked a superpower, and I can't wait to see what other cool stuff I can build with AI by my side. Who knows? Maybe you'll be inspired to create your own AI-powered solution too!