Modern Enterprise Data Evolution
Amit Priyadarshi
Director @ Synechron | Data Practice | Data Engineering | Big data & Analytics | DWH, ETL, Data Modelling | Cloud, DevOps
The YouTube video is right here.
Here is a summary of it.
*************************************************************************************
A lot has changed in the data space over the last couple of years, and anyone joining this field anew finds themselves extremely confused by the different jargon, the different tools and the requirements.
The first phase ran from the early 2000s till around 2010, when RDBMS, Unix and ETL-based systems served the need of building data warehouses and BI systems. Oracle ruled the database world, and Informatica was the ETL leader along with Ab Initio and, later, DataStage. We used to build complex data warehouses using complex data models: mainly a normalized form for the enterprise data model, and dimensional models for the data marts, which were meant for the specific needs of the consumers. ETL tools like Informatica would load nightly batch data into these warehouses, converting data from different standards by applying complex rules derived from those data models. Business Objects and Cognos were the leaders among BI platforms, and together with the ETL tools they formed the data platform of any organization, generating complex reports and analyses of the data, mainly descriptive ones.
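To make the dimensional-model idea concrete, here is a minimal sketch in Python (using sqlite3 in place of Oracle, with hypothetical table and column names) of one nightly-batch load step: look up or create a surrogate key in a customer dimension, then insert the fact row that references it.

```python
import sqlite3

# A toy star schema: one dimension table and one fact table.
# Table and column names are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id   TEXT UNIQUE,       -- natural key from the source system
        customer_name TEXT
    );
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        sale_date    TEXT,
        amount       REAL
    );
""")

def load_sale(conn, customer_id, customer_name, sale_date, amount):
    """Nightly-batch style load: resolve the surrogate key, then insert the fact."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO dim_customer (customer_id, customer_name) VALUES (?, ?)",
            (customer_id, customer_name),
        )
        customer_key = cur.lastrowid
    else:
        customer_key = row[0]
    conn.execute(
        "INSERT INTO fact_sales (customer_key, sale_date, amount) VALUES (?, ?, ?)",
        (customer_key, sale_date, amount),
    )

# Simulated batch extracted from a source system.
for rec in [("C001", "Acme Corp", "2009-06-30", 120.0),
            ("C001", "Acme Corp", "2009-07-01", 75.5)]:
    load_sale(conn, *rec)
conn.commit()

print(conn.execute(
    "SELECT c.customer_name, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_customer c ON c.customer_key = f.customer_key "
    "GROUP BY c.customer_name"
).fetchall())
```

In a real warehouse the same pattern would run inside an ETL tool like Informatica against Oracle; the logic itself (surrogate-key lookup plus fact insert) is the part derived from the dimensional model.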
The second wave of the data journey started when Hadoop and MapReduce were introduced from a different world and forced upon the data world. Why do I say forced? Because on the data side we never really had to do programming in languages like Java or C. We had complex query languages, but they never added that kind of overhead for a developer: a developer would only focus on developing the logic, while the complexity of running and optimizing it lay with the database and the ETL engine. The new world of MapReduce was a different game. HDFS, the Hadoop Distributed File System, was a distributed file system spread across multiple commodity machines, and MapReduce, written in Java, was the framework that supported data processing by adding code to its mapper and reducer classes. And who was going to do the coding? That was a big question mark: data people had never done Java programming, and Java people had never seen large volumes of messy data, which tells its stories and reveals its problems only at execution time, when millions of records are being loaded, and solving those problems overnight is not for the faint-hearted. A solution arrived in the form of Apache Hive, which provided its own query language called HiveQL. Thus the Big Data ecosystem was formed, to handle large amounts of data with lots of variety and demanding processing speeds.
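The original MapReduce API was Java, with mapper and reducer classes; the sketch below simulates the same map, shuffle and reduce flow in plain Python (with made-up sample records) just to show the programming model data people were suddenly asked to learn.

```python
from collections import defaultdict

# Sample input records ("customer_id,amount" lines), as might sit in HDFS.
records = [
    "C001,120.0",
    "C002,40.0",
    "C001,75.5",
]

def mapper(line):
    """Emit (key, value) pairs, like the map() method of a Hadoop Mapper class."""
    customer_id, amount = line.split(",")
    yield customer_id, float(amount)

def reducer(key, values):
    """Aggregate all values for one key, like the reduce() method of a Reducer class."""
    yield key, sum(values)

# Shuffle phase: group mapper output by key (the Hadoop framework does this for you).
grouped = defaultdict(list)
for line in records:
    for key, value in mapper(line):
        grouped[key].append(value)

# Reduce phase.
results = {}
for key, values in grouped.items():
    for out_key, out_value in reducer(key, values):
        results[out_key] = out_value

print(results)  # {'C001': 195.5, 'C002': 40.0}
```

Hive lets you express the same aggregation as a single HiveQL query (SELECT customer_id, SUM(amount) ... GROUP BY customer_id), which is why it made the ecosystem accessible to SQL-minded data people.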
The final phase of the data evolution is kind of scary. With stability in the underlying data platforms and their use cases, the focus has shifted heavily towards non-functional requirements: huge cloud adoption in terms of AWS, automated code deployment through CI/CD mechanisms such as Jenkins, and infrastructure as code using Ansible and Terraform. With all of this, there is a lot more a data developer needs to know, or have hands-on experience with. Someone who only needed SQL ten years ago now needs to know how AWS works, along with its networking and load balancing; how automated code is written and how automated testing, for example with JUnit, is done; and how to build a data product, which is quite different from creating data pipelines for individual tables. We now have to think in terms of, and follow, a product development cycle. One needs to understand how Docker and Kubernetes work, and to get exposure to creating microservices and all the patterns attached to them, such as event sourcing, CQRS and API gateways.
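As one small illustration of the automated-testing expectation, here is a sketch of a unit test for a data transformation step, the Python-world equivalent of what JUnit does on the Java side (the transformation function and its rules are hypothetical).

```python
import unittest

def standardize_record(record):
    """Hypothetical pipeline step: trim whitespace, upper-case the country code,
    and convert the amount to a float."""
    return {
        "customer_id": record["customer_id"].strip(),
        "country": record["country"].strip().upper(),
        "amount": float(record["amount"]),
    }

class StandardizeRecordTest(unittest.TestCase):
    def test_trims_and_uppercases(self):
        raw = {"customer_id": " C001 ", "country": "in ", "amount": "120.0"}
        clean = standardize_record(raw)
        self.assertEqual(clean["customer_id"], "C001")
        self.assertEqual(clean["country"], "IN")
        self.assertAlmostEqual(clean["amount"], 120.0)

    def test_rejects_non_numeric_amount(self):
        raw = {"customer_id": "C002", "country": "US", "amount": "abc"}
        with self.assertRaises(ValueError):
            standardize_record(raw)

if __name__ == "__main__":
    unittest.main()
```

Tests like this would typically run in the CI/CD pipeline, for example as a Jenkins job on every commit, which is exactly the product-development discipline the data world is now expected to follow.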