Evolution of the data tech stack
Given the announcements by both Snowflake and Databricks in the past two weeks, it may be helpful to recap how far we have come in data warehousing and analytics.
2000s
As organizations started to realize the business value of the data they had amassed, there was a need for large-scale, distributed computing paradigms to unlock that value.
This was the impetus for various initiatives, with Apache Hadoop being the most successful of the various open-source projects (with an initial release on Sep 4, 2007).
This was the basis for multiple vendors to offer commercialized offerings around Hadoop (the more successful ones being Cloudera, MapR, and Hortonworks). In addition, large-scale enterprise data warehouses continued to evolve, with the likes of Teradata and Netezza leading the charge. They offered appliances with dedicated hardware and proprietary software to provide scale-out capabilities for ever-growing structured data volumes. Adoption of these platforms was hindered by their complexity and by the level of custom development required in programming languages that most organizations lacked expertise in.
2010s
With the adoption of public cloud, the desire to decouple data storage from compute gained momentum. The likes of AWS EMR provided a more cost-effective alternative to the incumbents while offering a pay-as-you-go model. The integration complexity, as well as the development needed to provide a unified view, still existed though.
The open-source ecosystem around data lakes also gained traction as high-tech consumers such as Uber and Netflix sought to address shortcomings in scale and complexity. This resulted in multiple, seemingly redundant projects and efforts to address the technical challenges, as well as the governance considerations, posed by open-source data lakes. Nascent interoperability standards such as open file formats emerged, along with a new set of compute engines (e.g., Spark, and Presto, now Trino) to make use of them.
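To make that interoperability point concrete, below is a minimal PySpark sketch of the pattern these engines enabled: one engine writes an open file format (Parquet) to shared storage, and any Parquet-aware engine (Spark, Trino, etc.) can read the same files. The paths and column names are hypothetical, not from any specific deployment.

```python
# Minimal sketch: persist data in an open file format (Parquet) so that any
# Parquet-aware engine can consume it. Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("open-formats-demo").getOrCreate()

# Write a small DataFrame to Parquet; in practice this would live on shared
# object storage (e.g., S3) rather than the local path used here.
rides = spark.createDataFrame(
    [("r1", "NYC", 12.5), ("r2", "SF", 30.0)],
    ["ride_id", "city", "fare"],
)
rides.write.mode("overwrite").parquet("/tmp/demo/rides")

# Any engine that speaks Parquet (Spark, Trino, etc.) can now read the same
# files; here Spark SQL plays the role of the downstream consumer.
spark.read.parquet("/tmp/demo/rides").createOrReplaceTempView("rides")
spark.sql("SELECT city, AVG(fare) AS avg_fare FROM rides GROUP BY city").show()
```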
2020s
To aid adoption of a unified stack, a new set of solution providers emerged offering a pre-integrated stack that addressed additional non-functional requirements such as fine-grained entitlements, job observability, and polyglot language support. The desire to ‘bundle’ services together has driven better interoperability and standardization across vendors.
Key capabilities that a Lakehouse architecture provides:
1. Support for multiple compute engines (e.g., Spark, Flink) to cover various use cases (e.g., streaming, batch)
2. Transactional support for data (i.e., ACID transactions; see the sketch after this list)
3. Primitives / tools to allow for pattern-based integration with third parties
4. Management of data pipelines at scale
5. SSO integration with leading identity providers (e.g., Okta / Azure AD)
6. Support for fine-grained entitlements (e.g., row-level access, field obfuscation, attribute-based permission models)
7. Schema enforcement (i.e., a systematic approach to filtering out non-conformant data; also shown in the sketch below)
8. Broader and more performant support for SQL semantics, including accessibility via leading business intelligence and visualization tools (e.g., Tableau, Microsoft Power BI)
9. True decoupling of data storage from compute
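As a concrete illustration of items 2 and 7, here is a hedged sketch of lakehouse table semantics using the open-source Delta Lake format with PySpark. It assumes the delta-spark package is installed, and all paths, table, and column names are hypothetical; other table formats such as Apache Iceberg and Apache Hudi offer similar guarantees.

```python
# Sketch of lakehouse table semantics (ACID writes and schema enforcement)
# using open-source Delta Lake with PySpark. Assumes `pip install delta-spark`;
# all paths and names below are illustrative.
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("lakehouse-demo")
    # Delta Lake's documented session settings for Spark SQL support.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# ACID write: the commit either fully succeeds or is invisible to readers;
# concurrent readers never observe a half-written table.
orders = spark.createDataFrame(
    [(1, "open"), (2, "shipped")], ["order_id", "status"]
)
orders.write.format("delta").mode("overwrite").save("/tmp/lakehouse/orders")

# Schema enforcement: appending data whose schema does not match the table
# raises an error instead of silently admitting non-conformant rows.
bad_rows = spark.createDataFrame([("oops", 3.14)], ["wrong_col", "fare"])
try:
    bad_rows.write.format("delta").mode("append").save("/tmp/lakehouse/orders")
except Exception as err:  # Delta raises a schema-mismatch AnalysisException
    print(f"Write rejected by schema enforcement: {err}")
```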
Currently, only a few vendors provide an end-to-end ecosystem. Databricks leads, given its multi-cloud support and its delivery of some capabilities via an OSS model (perhaps more of an exit-strategy play), with AWS aligning its services around Lake Formation and the AWS Glue catalog. Snowflake has recently made some progress in opening up its proprietary offerings, though there is more to come in terms of the level of interoperability and the completeness of its integrations.
2026 and beyond
There is a lot of innovation in GenAI, though it is early innings for how the overall LLM workflow will be integrated into current data stacks. While there are some initial forays, most approaches are point solutions, and more general approaches remain reliant on feedback from early adopters. The hope is that we will see some of the initial outcomes in the next 18-24 months.