The Futuristic Way To Consume Data
Mukesh Chaudhary
Managing Director and Lead - Data and AI, Advanced Technology Centers in India
Companies that use data to make better, data-driven decisions are differentiating themselves through scale, agility, and the power to drive reinvention. A modern cloud foundation makes data accessible and trustworthy at scale.
Cloud enables data democratization
The latest cloud-based tools democratize data, enabling more people across your enterprise to find and consume the data relevant to their specific business needs quickly and easily. Data becomes accessible and usable in a timely manner while feeding multiple consumption models (self-service, BI systems, and analytics).
Mitigating the challenges of multiple sources of data
When data moves through multiple systems, it can lose completeness, accuracy, quality, consistency, and value. Each additional hop makes retrieving the data more complex and increases the risk of failure. In first-generation platforms (before data lakes and Hadoop), data was extracted, transformed, and loaded (ETLed) from operational systems directly into a warehouse. In the current second generation, data is first loaded into a lake without any transformation, to serve analytics and data science use cases (data scientists prefer access to more data), and is then ETLed again into a warehouse to serve BI consumption use cases.
Having multiple data stores also causes unavoidable delays, because data professionals must copy or move data several times. It also makes data governance very convoluted, especially under regulations such as GDPR and GxP. Data in the lake may not match data in the warehouse, resulting in rework or preventing parallel teams from working together.
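To make the copy-and-drift problem concrete, here is a minimal, hypothetical sketch in plain Python. The two stores are ordinary dictionaries, and the function names (`load_to_lake`, `etl_to_warehouse`, `out_of_sync`) and record fields are invented for illustration; a real platform would use actual storage and ETL tooling. It shows that in the second-generation pattern every record is stored twice, and that a correction made in the lake after the warehouse load leaves the two copies out of sync until the ETL job runs again.

```python
from copy import deepcopy

# Hypothetical in-memory stand-ins for the two stores of a second-generation platform.
lake: dict[int, dict] = {}
warehouse: dict[int, dict] = {}


def load_to_lake(records: list[dict]) -> None:
    """Land raw records in the lake without transformation, keyed by order id."""
    for rec in records:
        lake[rec["order_id"]] = deepcopy(rec)


def etl_to_warehouse() -> None:
    """Copy every lake record into the warehouse, applying a simple transformation."""
    for order_id, rec in lake.items():
        warehouse[order_id] = {
            "order_id": order_id,
            "amount_usd": round(rec["amount_cents"] / 100, 2),  # transform step
        }


def out_of_sync() -> list[int]:
    """Return the ids whose warehouse copy no longer matches the lake."""
    return [
        oid for oid, rec in lake.items()
        if oid not in warehouse
        or warehouse[oid]["amount_usd"] != round(rec["amount_cents"] / 100, 2)
    ]


load_to_lake([
    {"order_id": 1, "amount_cents": 1250},
    {"order_id": 2, "amount_cents": 990},
])
etl_to_warehouse()
print(len(lake) + len(warehouse))  # 4 stored copies of 2 orders
print(out_of_sync())               # []

# A data scientist corrects a value in the lake; the warehouse copy is now stale.
lake[2]["amount_cents"] = 1090
print(out_of_sync())               # [2]
```

Re-running `etl_to_warehouse()` removes the drift, but only by copying the data again; that repeated movement is the overhead the Lakehouse pattern aims to eliminate.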
Hence the need for a Data Lakehouse, which helps data consumers find the right data for their respective needs in one place. This significantly reduces the overhead of moving data to multiple locations, shortening time to market and making governance easier.
Why is a Data Lakehouse possible now?
New cloud-based technologies make this shift possible, above all fast SQL access directly on data lakes, which was not possible until recently. Industry pioneers have made this available, for example Delta Lake from Databricks and Redshift Spectrum from AWS, among others. For instance, Spark 3.0 (the engine at the heart of Delta Lake) is far more advanced than previous versions, making “Kimball possible in Lake”, that is, Kimball-style dimensional modeling directly on lake data.
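Kimball-style dimensional modeling means fact tables joined to dimension tables and aggregated with SQL. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for a SQL engine such as Spark SQL or Redshift Spectrum querying lake tables; the table names, columns, and sample rows are hypothetical. It shows the kind of star-schema query that lakehouse engines aim to run directly over lake storage rather than over a separate warehouse copy.

```python
import sqlite3

# sqlite3 is only a stand-in for a lake SQL engine; all names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO fact_sales VALUES
        (10, 1, 2, 200.0),
        (11, 2, 1, 50.0),
        (12, 1, 1, 120.0);
""")

# Classic star-schema query: join the fact table to a dimension, then aggregate.
rows = conn.execute("""
    SELECT d.category, SUM(f.quantity) AS units, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

for category, units, revenue in rows:
    print(category, units, revenue)  # Hardware 3 320.0, then Software 1 50.0
```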
Admittedly, it is still too early to celebrate success for the Data Lakehouse architecture: gaps remain in physical data models and transactional applications that still need to be addressed. However, the key enablers are in place to scale and advance it to the next level, and companies such as Databricks are betting big on it. AWS addresses the same need with a combination of Redshift Spectrum, Athena, and S3, and Azure similarly offers Synapse.
In one of our data and analytics implementation projects, we enabled self-service and analytics for business users by creating import jobs on lake data in S3. In addition, Databricks notebooks are used to expose lake data as tables or views in a mart layer for BI. Databricks also enables machine learning, data science, and real-time processing directly on the Lakehouse.
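A mart-layer view over lake data can be sketched in a few lines. Here Python's built-in `sqlite3` is only a stand-in for Databricks tables and views over S3 data, and the names `raw_orders`, `mart_daily_revenue`, and `daily_revenue`, as well as the sample rows, are hypothetical. Because the view is a query over the single stored copy rather than a second copy, a late-arriving record is visible to BI users as soon as it lands, with no additional load job.

```python
import sqlite3

# sqlite3 stands in for a lakehouse catalog; names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Raw table: data as it landed in the lake, untransformed.
conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("2021-06-01", 1250), ("2021-06-01", 990), ("2021-06-02", 500)],
)

# Mart-layer view for BI: a transformation defined over the same data, not a copy of it.
conn.execute("""
    CREATE VIEW mart_daily_revenue AS
    SELECT order_date, SUM(amount_cents) / 100.0 AS revenue_usd
    FROM raw_orders
    GROUP BY order_date
""")


def daily_revenue() -> list[tuple[str, float]]:
    """Query the view the way a BI tool would."""
    return conn.execute(
        "SELECT order_date, revenue_usd FROM mart_daily_revenue ORDER BY order_date"
    ).fetchall()


print(daily_revenue())  # [('2021-06-01', 22.4), ('2021-06-02', 5.0)]

# A late record lands in the raw table; the view reflects it without any reload.
conn.execute("INSERT INTO raw_orders VALUES ('2021-06-02', 250)")
print(daily_revenue())  # [('2021-06-01', 22.4), ('2021-06-02', 7.5)]
```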
Advantages of using a Data Lakehouse
A Data Lakehouse is a one-stop shop for data serving both BI and data science use cases. It lets teams take advantage of the low cost and flexibility of cloud storage while combining the best attributes of the data lake and the data warehouse.
Considering how fast the data landscape is evolving, we are hopeful that the Lakehouse will be able to resolve the data challenges described above. We believe it will change the way data warehouses exist in the future and reduce technology debt 3-4 years from now.
Here at Accenture, we believe in creating innovative solutions that put data to work using the latest technologies. If you are a data enthusiast, explore data careers through Cloud Careers.
Learn more about how Accenture is Lighting the way with Data on Cloud through our analyses of a modern cloud foundation, its characteristics, and the steps to transform data on the cloud.