Information Resolution in the World of AI
This is just a thought experiment. . .
ChatGPT has proven to be a game changer. With a few simple prompts, it can answer questions with impressive accuracy, write full-length articles, compose songs, and even write code. While other AI systems can also do some of these tasks, the accuracy, thoroughness, and literary ability of ChatGPT have shown the world the impact that AI will have on the future. But what will happen when all of the world's information is generated by AI models, and new models are trained on information generated by old models? Similar to images losing resolution after being reposted multiple times, will we start to lose "Information Resolution" as AI models consume and regurgitate in an iterative loop?
But first, how does an AI like ChatGPT work?
I definitely do not have the expertise to answer that question, but from my novice understanding, AI models are 'fed' incredible amounts of data in order to 'train' the model. ChatGPT, for example, is built on OpenAI's GPT-3, which was trained on over 45 terabytes of text data from sources such as Wikipedia, book collections, and general web crawling. Basically, ChatGPT has the knowledge of the internet up to 2021. For more details on how it works, check out this link: https://www.springboard.com/blog/data-science/machine-learning-gpt-3-open-ai/
Therefore, AI models basically use information gathered from millions of web sources in order to learn the most commonly accepted truths about a wide array of topics. Then, when asked questions, the AI can generate unique responses that are not simple "copy and paste" responses from other websites. When asked to develop new content in the style of a particular genre, it can generate that as well. The applications really are incredible.
This new technology has already seen widespread adoption for many forms of content creation. Blogs, articles, and even books are being written using AI prompts. Schools are now concerned that students will use these AI models to complete assignments and write essays.
So what is the harm in using AI models to generate unique pieces of information? If ChatGPT is a new tool in the modern toolbelt, are there any issues with using it?
The issue is not in using these AI models. The issue is when the world stops creating authentic new information. Right now, AI models are trained on information created by real people. This is actually the key to AI models: they need data to be "trained" on. But what happens when an AI model is trained on data generated by an AI model? And then an even newer AI model is trained on data from this previous AI model?
Let's do a thought experiment. For simplicity, let's imagine an AI model that is trained solely on Wikipedia articles, and let's say that currently 100% of Wikipedia articles are written by humans. Let's call this AI "Wik.ai.v1", where v1 represents version 1. Wik.ai.v1 has consumed all of Wikipedia, and now the general public can use it to generate information. So people use Wik.ai.v1 to update and generate new Wikipedia articles.
Time goes by, and now 80% of Wikipedia articles are written by real humans, while 20% were generated by AI. Version 2 of Wik.ai is trained on this data from Wikipedia, and eventually Wik.ai.v2 is released to the public. Again, people use Wik.ai.v2 to update and generate Wikipedia articles.
Now 40% of Wikipedia articles are written by real humans and 60% were generated by AI. And the same thing happens. Again and again, the AI models use the data to evolve into new versions, and people use these new versions to create new data. Eventually, 100% of Wikipedia articles become AI generated. At that point the AI models are being trained on AI-generated information, and a closed informational loop is formed. What happens to the quality of that information?
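Just for fun, the thought experiment above can be sketched as a toy simulation. The replacement rate and version count here are completely made up for illustration; the point is only that if people keep regenerating some share of the human-written articles each cycle, the human-written fraction decays toward zero:

```python
# Toy simulation of the Wik.ai thought experiment: with each new model
# version, people use the model to regenerate a fixed share of the
# remaining human-written articles. All numbers are invented.

def simulate(versions: int, replace_rate: float) -> list[float]:
    """Return the fraction of human-written articles after each version."""
    human_fraction = 1.0  # start: 100% of articles are human-written
    history = [human_fraction]
    for _ in range(versions):
        # Each cycle, a share of the remaining human-written articles
        # gets rewritten by the latest AI model.
        human_fraction *= (1 - replace_rate)
        history.append(human_fraction)
    return history

if __name__ == "__main__":
    for version, frac in enumerate(simulate(versions=5, replace_rate=0.4)):
        print(f"Wik.ai.v{version + 1} trains on a corpus that is "
              f"{frac:.0%} human-written")
```

In this toy model the human-written share never quite hits exactly zero, but it shrinks geometrically, so within a handful of versions the training data is overwhelmingly AI-generated.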
Consider the analogy of an Instagram image that is posted, screenshotted, reposted, and then screenshotted and reposted many more times. Every time the image is screenshotted and reposted, it loses some of its resolution. In fact, the image at the top of this article shows an example of an image that was reposted 90 times. By the 90th time, the image is unrecognizable. You can see the experiment here: https://petapixel.com/2015/02/11/experiment-shows-happens-repost-photo-instagram-90-times/
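You can see the same effect in a toy version of that experiment. Here a one-dimensional "image" (a list of pixel brightness values) stands in for the photo, and a simple lossy blur stands in for one screenshot-and-repost cycle; the specific signal and loss model are my own stand-ins, not how Instagram actually re-encodes images. Measuring the spread of pixel values shows the detail draining away with each cycle:

```python
import statistics

def repost(image: list[float]) -> list[float]:
    """One 'repost': a lossy 3-tap box blur, standing in for re-encoding."""
    padded = [image[0]] + image + [image[-1]]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]

def detail(image: list[float]) -> float:
    """Measure remaining detail as the spread (std dev) of pixel values."""
    return statistics.pstdev(image)

if __name__ == "__main__":
    image = [0.0, 1.0] * 16  # a high-contrast pattern of alternating pixels
    print(f"after 0 reposts: detail = {detail(image):.4f}")
    for n in range(1, 91):
        image = repost(image)
        if n % 30 == 0:
            print(f"after {n} reposts: detail = {detail(image):.4f}")
```

Each individual repost only loses a little, but after 90 rounds the once-sharp pattern is nearly a flat gray, which is exactly the fear with an AI-training-on-AI loop.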
In a world where AI can write your blog in seconds, and the purpose of that blog is to help you be seen as the authority in a subject matter, will we get to a point where the majority of the world's public information becomes AI generated? And will this then lead to the iterative loop of losing "Information Resolution"?
Even if we don't fear losing Information Resolution, could this be the beginning of the end of authentically new ideas? Will everything start to become a simple regurgitation, or a copy-paste-tweak, of other information?
I genuinely have no idea. And quite frankly, I'm not actually concerned about this happening. This is more just a thought experiment that I decided to work through. I genuinely believe that authentic creativity and a desire for purpose are universal elements ingrained in the human experience. With this being the case, we will always have those who want to discover and generate real and new ideas . . . which can then be added to the wealth of data for AI models to regurgitate.
By no means am I an expert in artificial intelligence. I am merely a man with a thought that I decided to write down.
Fun fact: ChatGPT was down while I was writing this, therefore this is 100% genuine human writing! Enjoy it while it lasts ;p