Getting rid of crap data
Gerry McGovern
Developer of Top Tasks research method. Author of World Wide Waste: How digital is killing the planet and what to do about it.
Most of the data that exists shouldn’t. It’s crap. In practically every data environment I have worked in over almost 30 years, you can easily delete 90% of the data and make everything work much better. We store all this crap data because it costs less to store it than it costs to think about how to clean it up. We also store it because there is a culture in technology that doesn’t think very deeply about the quality of data. To the average technologist, data is a storage problem, not a knowledge opportunity.
For years I have stored the vast majority of my data in a backup system. I only keep data in places such as Dropbox when I need to share it with others on a day-to-day basis. The backup system is awkward and clunky, and it takes several minutes to retrieve a file. That’s okay because it is about 100 times less polluting than keeping the same data in Dropbox, and I rarely need to retrieve any files. (It’s also a lot cheaper.)
For a long time, I noticed that the weight of data in my backup system was greater than that of the data it was supposed to be backing up on my computer. When I opened my backup system, I would find lots of temporary files, weird file formats and files I had long ago deleted. I would check the settings, and they seemed to be set up to delete temporary files as well as files I had deleted from my computer. Yet when I checked back, these junk files would still be there.
I’d ring up support. That was an experience. I was treated like an exotic creature. “Why do you care?” was the basic response. I was only using 20% of my allocated space. They would go through a process with me and say the problem was fixed. It was never fixed, because fixing it would break the business case of the industry they worked in.
Recently, I checked again. There were 291 GB stored, almost four times the 78 GB on my computer that I wanted to back up. That’s 213 GB of pure data junk. I decided to do a real spring clean of the 78 GB of data I had. I went through my files and got rid of everything that wasn’t important, focusing particularly on large files. At the end of the cleanup, I had 7.37 GB of data, about 9% of the data on my computer and 3% of the backed-up data.
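The article doesn’t say what tools were used for the cleanup. For readers who want to copy the “focus on large files” step, here is a minimal Python sketch, assuming the files sit in an ordinary local folder. It walks that folder and lists the biggest regular files first. The function names `largest_files` and `human` are hypothetical, made up for this example. The script only reports sizes. Deciding what is actually junk is still up to you.

```python
import heapq
import os
import stat
import sys


def largest_files(root, top_n=20):
    """Return up to top_n (size_in_bytes, path) tuples for the largest
    regular files under root, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so links are not counted as files
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, path))
    # nlargest avoids sorting the whole list when only the top few are wanted
    return heapq.nlargest(top_n, sizes)


def human(size):
    """Format a byte count with decimal (1000-based) units, e.g. 7.4 GB."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


if __name__ == "__main__":
    # Folder to scan: first command-line argument, or the current directory
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for size, path in largest_files(root):
        print(f"{human(size):>10}  {path}")
```

You might run it as `python largest_files.py ~/Documents` and start with the top of the list. A few large files usually account for most of the space. Decimal units are used here because that is how the GB figures above are usually quoted. Your operating system may show binary (1024-based) sizes instead.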
I didn’t even bother getting in touch with my backup provider. I simply cancelled my subscription and started again. I have thus removed 284 GB of data junk, data that had no purpose. Data that was consuming materials, water and electricity. Think of me multiplied by millions.
Data center growth is explosive. Between now and 2026, this growth is predicted to add electricity demand equivalent to that of all of Germany, not to mention hundreds of millions of additional gallons of water consumed every day. Why? To store junk. To cool AI. It’s one thing for humans to destroy our environment because we need to eat and drink. To destroy our environment so that AI and data junk can feed and drink is an environmental crime. Big Tech profits as waste grows: the more data waste, the greater the profit for the data centers. We can do something by deleting our crap data.
Digital Cleanup Day is March 16 this year. Find out what you can do to clean up the huge mess of data that’s out there. Less data means less waste and fewer data centers. Data growth is totally out of control and is having a really serious impact on our environment.