The "Big Data" hype is over. Aftermath.
Google Trends for "big data" in the United States.

Web searches for "big data" have been steadily declining, and the trend appears irreversible. The "Big Data" hype started around 2010, and today we can confidently say that it's effectively over. What are we left with as a result?

All technology hypes eventually fizzle, but most of them leave something useful in the aftermath. That includes not just technologies and products, but also lessons learned. As I see it, the "Big Data" hype resulted in one major technological advance and one lesson:


Serverless and fully managed data stores

The hype gave us "serverless" data lakes and data warehouses, and they will likely stay relevant for decades ahead. Apache Spark (and Databricks), Amazon S3, Snowflake, Google BigQuery, and similar products have gained wide popularity.

Another big shift was the separation of storage and compute: storage can be serverless, while compute is fully managed and sometimes even provided by a different vendor.

It remains to be seen whether there will be open-source analogs of BigQuery or Snowflake with Postgres-level maturity, suitable for self-hosting on a mixed pool of virtual and bare-metal machines. Or maybe one already exists, and I'm just not aware of it.


What about leveraging unstructured data?

Querying and extracting knowledge from unstructured data was promoted as one of the main benefits of "Big Data", but it failed to live up to the promise, at least at scale.

For now, SQL remains the main tool for querying "big data".


Medium data

One of the surprising lessons from all the fuss about "big data" is that it ... almost never exists. There are two explanations for this:

First, while the hype lasted, hardware specs kept following Moore's law and jumped a few orders of magnitude. That has shifted what counts as "big data". No, 10TB is no longer the "big data" it once was. Nowadays, you can trivially procure a Windows machine with 1TB of RAM and tens of terabytes of local disk storage. For 99% of analytical workloads, that's effectively a "cloud warehouse" and a "data lake" in one box. Put a columnar database on it, and its performance will suffice for 99% of organizations out there. Its biggest downside: it won't look cool on a resume.

Second, as it turns out, "big data" is almost never needed in practice. Jordan Tigani, in his excellent article "Big Data is Dead," makes many great points. For instance, he argues that the median data volume of a real-life data warehouse lies in the 100GB range. That was big back when MS Access was popular, but today you can load the whole thing into memory on a machine that costs less than $100/month. Of course, you can use "Big Data" technology with all its complexity even for a 100GB dataset, but would that be reasonable?
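The arithmetic behind that claim is simple enough to check on a napkin. A sketch, using the 100GB median figure from the article and a hypothetical 1TB-RAM machine like the one mentioned above:

```python
# Back-of-envelope check: a median warehouse vs. one big machine's RAM.
# The 100 GB figure is cited above; the 1 TB machine is a hypothetical spec.
dataset_gb = 100    # median warehouse size per "Big Data is Dead"
ram_gb = 1024       # a single machine with 1 TB of RAM

fraction_used = dataset_gb / ram_gb
print(f"The whole warehouse occupies {fraction_used:.0%} of RAM")
```

In other words, the entire median warehouse fits in memory roughly ten times over, with no cluster involved.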

So what we've learned from the "Big Data" hype is that most of us who work in data engineering actually work with medium data. An unexpected but, in retrospect, logical finding.

There never was a "Medium Data" hype, probably because it rests on a much cheaper (and thus less lucrative) and simpler technology stack. It has fewer barriers, requires much less ceremony, and, importantly, is approachable and usable for less technical business professionals.

Meanwhile, a new hype is in full swing. I guess you know what I'm talking about :) It will be interesting to see what remains in the aftermath of that one.

Time will tell.

PS. Looking for a great data automation platform for medium data? Check out EasyMorph.
