Modern Enterprise Data Evolution
Amit Priyadarshi
Director @ Synechron | Data Practice | Data Engineering | Big data & Analytics | DWH, ETL, Data Modelling | Cloud, DevOps
The YouTube video is right here.
Here is a summary of it.
*************************************************************************************
A lot has changed in the data space over the last couple of years, and anyone joining this field anew finds themselves extremely confused by the different jargon, the different tools and the requirements.
The first phase ran from the early 2000s till around 2010, when RDBMS, Unix and ETL-based systems served the need of building data warehouses and BI systems. Oracle ruled the database world, and Informatica was the ETL leader along with Ab Initio and, later, DataStage. We used to build complex data warehouses using complex data models: mainly a normalized form for the enterprise data model, and dimensional models for the data marts, which were meant for the specific needs of the consumers. ETL tools like Informatica would load nightly batch data into these warehouses, converting data from different standards by applying complex rules derived from those data models. Business Objects and Cognos were the leaders among BI platforms, and together with the ETL tools they formed the data platform of any organization, generating complex reports and analyses of the data, mainly descriptive ones.
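To make the dimensional-model idea concrete, here is a minimal sketch in Python (using sqlite3 in place of Oracle, with hypothetical table and column names) of one nightly-batch load step: look up or create a surrogate key in a customer dimension, then insert the fact row that references it.

```python
import sqlite3

# A toy star schema: one dimension table and one fact table.
# Table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   TEXT UNIQUE,       -- natural key from the source system
        customer_name TEXT
    );
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        sale_date    TEXT,
        amount       REAL
    );
""")

def load_sale(conn, customer_id, customer_name, sale_date, amount):
    """Nightly-batch style load: resolve the surrogate key, then insert the fact."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO dim_customer (customer_id, customer_name) VALUES (?, ?)",
            (customer_id, customer_name),
        )
        customer_key = cur.lastrowid
    else:
        customer_key = row[0]
    conn.execute(
        "INSERT INTO fact_sales (customer_key, sale_date, amount) VALUES (?, ?, ?)",
        (customer_key, sale_date, amount),
    )

# Simulated batch extracted from a source system.
for rec in [("C001", "Acme Corp", "2009-06-30", 120.0),
            ("C001", "Acme Corp", "2009-07-01", 75.5)]:
    load_sale(conn, *rec)
conn.commit()

print(conn.execute(
    "SELECT c.customer_name, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer c ON c.customer_key = f.customer_key "
    "GROUP BY c.customer_name"
).fetchall())
```

In a real warehouse the same pattern would run inside an ETL tool like Informatica against Oracle; the logic itself (surrogate-key lookup plus fact insert) is the part derived from the dimensional model.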
The second wave of the data journey started when Hadoop and MapReduce were introduced from a different world and forced upon the data world. Why do I say forced? Because on the data side we never really had to do programming in languages like Java or C. We had complex query languages, but they never added that kind of overhead for a developer: a developer would only focus on developing the logic, while the complexity of running and optimizing it lay with the database and the ETL engine. The new world of MapReduce was a different game. HDFS, the Hadoop Distributed File System, was a distributed file system spread across multiple commodity machines, and MapReduce, written in Java, was the framework that supported data processing by adding code to its mapper and reducer classes. And who was going to do the coding? That was a big question mark: data people had never done Java programming, and Java people had never seen large volumes of messy data, which tells its stories and reveals its problems only at execution time, when millions of records are being loaded, and solving those problems overnight is not for the faint-hearted. A solution arrived in the form of Apache Hive, which provided its own query language called HiveQL. Thus the Big Data ecosystem was formed, to handle large amounts of data with lots of variety and demanding processing speeds.
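The original MapReduce API was Java, with mapper and reducer classes; the sketch below simulates the same map, shuffle and reduce flow in plain Python (with made-up sample records) just to show the programming model data people were suddenly asked to learn.

```python
from collections import defaultdict

# Sample input records ("customer_id,amount" lines), as might sit in HDFS.
records = [
    "C001,120.0",
    "C002,40.0",
    "C001,75.5",
]

def mapper(line):
    """Emit (key, value) pairs, like the map() method of a Hadoop Mapper class."""
    customer_id, amount = line.split(",")
    yield customer_id, float(amount)

def reducer(key, values):
    """Aggregate all values for one key, like the reduce() method of a Reducer class."""
    yield key, sum(values)

# Shuffle phase: group mapper output by key (the Hadoop framework does this for you).
grouped = defaultdict(list)
for line in records:
    for key, value in mapper(line):
        grouped[key].append(value)

# Reduce phase.
results = {}
for key, values in grouped.items():
    for out_key, out_value in reducer(key, values):
        results[out_key] = out_value

print(results)  # {'C001': 195.5, 'C002': 40.0}
```

Hive lets you express the same aggregation as a single HiveQL query (SELECT customer_id, SUM(amount) ... GROUP BY customer_id), which is why it made the ecosystem accessible to SQL-minded data people.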
The final phase of the data evolution is kind of scary. With stability in the underlying data platforms and their use cases, the focus has shifted heavily towards non-functional requirements: huge cloud adoption in terms of AWS, automated code deployment through CI/CD mechanisms such as Jenkins, and infrastructure as code using Ansible and Terraform. With all of this, there is a lot more a data developer needs to know, or have hands-on experience with. Someone who only needed SQL ten years ago now needs to know how AWS works, along with its networking and load balancing; how automated code is written and how automated testing, for example with JUnit, is done; and how to build a data product, which is quite different from creating data pipelines for individual tables. We now have to think in terms of, and follow, a product development cycle. One needs to understand how Docker and Kubernetes work, and to get exposure to creating microservices and all the patterns attached to them, such as event sourcing, CQRS and API gateways.
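As one small illustration of the automated-testing expectation, here is a sketch of a unit test for a data transformation step, the Python-world equivalent of what JUnit does on the Java side (the transformation function and its rules are hypothetical).

```python
import unittest

def standardize_record(record):
    """Hypothetical pipeline step: trim whitespace, upper-case the country code,
    and convert the amount to a float."""
    return {
        "customer_id": record["customer_id"].strip(),
        "country": record["country"].strip().upper(),
        "amount": float(record["amount"]),
    }

class StandardizeRecordTest(unittest.TestCase):
    def test_trims_and_uppercases(self):
        raw = {"customer_id": " C001 ", "country": "in ", "amount": "120.0"}
        clean = standardize_record(raw)
        self.assertEqual(clean["customer_id"], "C001")
        self.assertEqual(clean["country"], "IN")
        self.assertAlmostEqual(clean["amount"], 120.0)

    def test_rejects_non_numeric_amount(self):
        raw = {"customer_id": "C002", "country": "US", "amount": "abc"}
        with self.assertRaises(ValueError):
            standardize_record(raw)

if __name__ == "__main__":
    unittest.main()
```

Tests like this would typically run in the CI/CD pipeline, for example as a Jenkins job on every commit, which is exactly the product-development discipline the data world is now expected to follow.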