Keeping all data is no longer an option

Data production is growing at around 25% per year. Humans produced about 60 zettabytes of data in 2020, and Statista estimates that over 2,000 zettabytes of data will be produced by 2035. This is a wholly unsustainable growth rate and it will have cataclysmic impacts on the environment if it is not radically reduced.
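The growth figures above can be sanity-checked with simple compound-growth arithmetic. This is only a sketch using the numbers quoted in the paragraph (60 zettabytes in 2020, roughly 25% annual growth); it is not a reproduction of Statista's model.

```python
# Back-of-envelope check of the growth figures quoted above.
# Assumptions: ~60 ZB produced in 2020, steady ~25% annual growth.
base_zb = 60          # zettabytes produced in 2020
growth = 1.25         # 25% year-on-year growth

projected_2035 = base_zb * growth ** (2035 - 2020)
print(f"Projected annual data production in 2035: {projected_2035:,.0f} ZB")
```

At a constant 25%, this lands around 1,700 zettabytes, so Statista's figure of over 2,000 zettabytes implies growth somewhat above 25% in at least some years.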

The culture of information technology is to store everything, because you never know what might be important in the future. That was never a wise strategy, but IT departments and CIOs could get away with it while both the quantities of data and the storage costs were relatively low. Now, data storage can eat up 30% of an IT department's budget.

There are two challenges here: what to store, and what not to create in the first place. According to Bob Clark, director of archives at the US Rockefeller Archive Center, the rule of thumb among professional archivists is that at most 5% of stuff is worth saving. My experience over almost thirty years of working with data and content is that 90% of data in practically any environment can be easily deleted, and things will work better.

Someone in the organization needs to start actually managing data. Right now, too many IT departments behave like crude warehouses, or, in reality, more like data landfills. IT sees its job as adding more space to dump data. It's not asking the crucial questions:

Why are we storing this?

Why are we creating it in the first place?

When new IT systems are installed, often the old systems don't get properly decommissioned and all the data—regardless of quality—gets migrated to the new system. "They're not performing an overall analysis of why we have got that particular application and its physical hardware," data center expert John Booth told me. "Why are we moving something that's already a zombie into the Cloud? A lot of IT departments treat every single application that they have as mission critical, when actually it certainly isn't."

Just because you can create data doesn't mean you should. It is simply not sustainable to create thousands of zettabytes of data every year. By some estimates, almost 100 zettabytes of data were created in 2022. Storing all this data required about 70 million servers, with each server's manufacture generating between one and two tons of CO2. To store 2,000 zettabytes would require around 1.5 billion servers. That's not sustainable.
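The scaling in the paragraph above is straightforward to check. This sketch assumes the quoted figures (100 ZB stored on roughly 70 million servers, one to two tons of CO2 per server manufactured) and assumes server count scales linearly with data volume:

```python
# Rough scaling of the server figures quoted above.
# Assumptions: ~100 ZB stored required ~70 million servers,
# and server count scales linearly with data volume.
servers_per_zb = 70e6 / 100                    # ~700,000 servers per zettabyte

servers_for_2000_zb = servers_per_zb * 2000    # ~1.4 billion servers
print(f"Servers for 2,000 ZB: {servers_for_2000_zb / 1e9:.1f} billion")

# Manufacturing footprint at 1-2 tons of CO2 per server:
co2_low = servers_for_2000_zb * 1 / 1e9        # billions of tons
co2_high = servers_for_2000_zb * 2 / 1e9
print(f"Manufacturing CO2: {co2_low:.1f}-{co2_high:.1f} billion tons")
```

Linear scaling gives about 1.4 billion servers, in line with the rounded 1.5 billion figure, and a manufacturing footprint alone in the billions of tons of CO2.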

Data growth is out of control. Most data is useless. The emergence of AI, automation, and the Internet of Things means we are only at the beginning of the data explosion. The cost of data—to create, manage, analyse and store it—is going to vastly outstrip the value it creates.

We must create much less data of a much higher quality. That will require a huge cultural shift among technology professionals. We will need many more data editors, whose primary job will be to decide what data not to create. We will need many more data archivists, whose primary job will be to decide that 95% of data already created needs to be deleted.

Kashmir Hill, "Your Memories. Their Cloud.", The New York Times, 2022

Komprise, "State of Unstructured Data Management Report", 2022

Podcast with John Booth: "Data centers: Data theatre and the tsunami of frivolous data"


GDPR broadly requires organizations to stop keeping all data forever, just in case.
