Why can't ChatGPT generate a photo with a correctly written name on it?
I saw the Sora videos, and they were great, generated from nothing but a prompt. So why the hell can't it generate the simplest photo with text in it? You can see how ChatGPT rendered my name in the photo, or a simple quote by Leonard Sweet.
While DALL·E (ChatGPT) is highly advanced at interpreting prompts and generating visuals, it still has difficulties, especially when it comes to rendering precise text within images.
Let's imagine we have a robot designed to bake cakes using recipes from a big cookbook. This robot is smart and can follow instructions to bake various cakes. However, when it's time to decorate these cakes with special messages written in icing, things don't always go as planned. This situation helps us understand why AI, like our robot, sometimes struggles with writing text correctly in images. Let's dive into the reasons through this simple analogy.
1. Learning from a Broad Range of Examples
Our robot has learned to bake cakes by looking at many pictures and recipes, but it was never specifically taught how to decorate cakes with messages. This is similar to how AI models learn to generate images: they see lots of data, but relatively little of it focuses on written text. So while our robot can make icing, writing neatly with it is another story. It knows roughly what icing letters should look like but gets them right on the cake only some of the time.
2. Grasping the Meaning
The robot knows that messages on cakes matter but doesn't really get the meaning behind the words. For example, writing "Happy Birthday" needs a different touch than a "Congratulations" message. This shows how AI might recognize words but not fully understand when and how to use them properly in an image. It's as if the robot knows to put words on the cake but misses what makes each message special.
3. Having the Right Tools
Imagine our robot only has basic tools for cake decoration, not the precise ones needed for detailed writing. This is like AI's limitation in creating clear text in images. The AI has tools for making pictures but not for fine text details. So, when our robot tries to write on a cake, the letters might not look very good, just like AI might mess up the text in a picture.
4. Seeing the Details
Let's say the robot's camera, used to check the finished cake, isn't sharp enough to see small mistakes in the icing text. It looks at the overall cake but misses the little errors. This is similar to how AI focuses on making the main part of the image but might not pay enough attention to the text details. So, just like our robot might not notice a smudge in the letters, AI-generated images might have text that's not quite right.
Conclusion
Through this cake-baking robot story, we can see why AI has a tough time with text in images. It's about how the AI learns, understands context, uses its tools, and pays attention to details. Just like our robot tries to improve its cake decorating, AI technology is always getting better. We can hope for clearer and more accurate text in AI images as technology advances.
Author and Independent Scholar
2 months ago: Bull doogy! It misspells things on purpose, so you can't sell graphics, at least not easily. For example, you can't generate a graphic for New Year with 2025.
Specializing in personality assessments for the nursing/medical industry for onboarding, hiring, team building, and reduction of high employee turnovers. Rebuilding workplace culture for your company.
8 months ago: So, how can I fix it? It's a wonderful image...
IBM MSFT SAP - B2B product management coach, consultant, trainer, and speaker passionate about increasing business impact with innovative, customized programs for individuals and organizations.
9 months ago: This behavior is obviously intentional, as it gets it wrong 100% of the time. The interesting conversation is their motivation. At the top of my list? Probably future monetization: they don't want people using it for commercial graphics. It's lame AF.
Senior Software Engineer & Architect
9 months ago: I'm sorry, but your assessment of the situation is extremely naive. ChatGPT has the capability of generating suitable text. That's a fact. Measuring text in the context of an image is technology that has been around for as long as text has been written to a screen. That's also a fact. Even assuming they're not using pre-existing fonts directly for rendering, I promise you that they are using them as training data to understand how to write the language. So I promise you this is not a *technical* issue for them. That's simply not a rational assessment.
Energy Transition Expert, Energy Auditing and Management
1 year ago: Because they spent so much time and money enhancing CAPTCHA, they would basically be working against themselves.