Modernize data lakes to be ready for Generative AI
Karim Badr
Top Accounts Sales Executive | CXO-level engagement, Cloud Computing and Artificial Intelligence
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the world's largest corporations. Some argue, though, that the vast majority of these deployments have become data “swamps”. Regardless of which side of that debate you sit on, the reality is that a great deal of data still lives in these systems, and data at that volume is not easy to move, migrate or modernize.
In the case of Hadoop, one of the more popular data lake platforms, the promise of implementing such a repository with open-source software running on commodity hardware meant you could store a lot of data at very low cost. Data could be persisted in open formats, democratizing its consumption, and replicated automatically to sustain high availability. The default processing framework could recover from failures mid-flight. This was, without question, a significant departure from traditional analytic environments, which often meant vendor lock-in and the inability to work with data at scale.
The data lakehouse is an emerging architecture that offers the flexibility of a data lake with the performance and structure of a data warehouse. Most lakehouse solutions offer a high-performance query engine over low-cost storage in conjunction with a metadata governance layer. Intelligent metadata layers make it easier for users to categorize and classify unstructured data, such as video and voice, and semi-structured data, such as XML, JSON and emails.
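To make the pattern concrete, here is a minimal sketch, assuming Parquet files already sit in an S3-compatible object-storage bucket: a lightweight SQL engine such as DuckDB can query the open-format files in place, with no ingestion step. The bucket name, endpoint and credentials below are placeholders.

```python
# Minimal lakehouse-style query: a SQL engine reading open-format (Parquet)
# files directly from S3-compatible object storage.
# Bucket name, endpoint and credentials are illustrative placeholders.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")                      # enables s3:// paths
con.sql("LOAD httpfs")
con.sql("SET s3_endpoint='s3.example.com'")    # any S3-compatible store
con.sql("SET s3_access_key_id='YOUR_KEY'")
con.sql("SET s3_secret_access_key='YOUR_SECRET'")

# Query the low-cost storage tier in place; no load/ingest step required.
top_customers = con.sql("""
    SELECT customer_id, COUNT(*) AS events
    FROM read_parquet('s3://analytics-bucket/events/*.parquet')
    GROUP BY customer_id
    ORDER BY events DESC
    LIMIT 10
""").df()
print(top_customers)
```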
Currently, we see the lakehouse as an augmentation of existing data stores, whether on-premises or in the cloud, not a replacement. A lakehouse should make it easy to combine new data from a variety of sources with the mission-critical data about customers and transactions that resides in existing repositories. New insights come from combining new data with existing data and identifying new relationships, and AI, both supervised and unsupervised machine learning, is the best, and sometimes the only, way to unlock those insights at scale.
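As a rough illustration of that idea, the sketch below joins hypothetical new clickstream data from the lakehouse with an existing customer extract and applies unsupervised clustering. Every file path, key and column name here is a placeholder, not a prescribed schema.

```python
# Sketch: combine new lake data with existing customer records, then use
# unsupervised ML (k-means) to surface segments neither source shows on its own.
# File paths, keys and column names are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# New data landed in the lakehouse (e.g. clickstream exports in Parquet).
clicks = pd.read_parquet("s3://analytics-bucket/clickstream/2024/")

# Existing mission-critical data extracted from the warehouse or core system.
customers = pd.read_csv("customer_master_extract.csv")

# Combine the two sources on a shared key.
df = clicks.merge(customers, on="customer_id", how="inner")

# Scale numeric features so no single column dominates the clustering.
features = StandardScaler().fit_transform(
    df[["sessions_last_30d", "avg_order_value", "tenure_months"]]
)

# Unsupervised learning: let the combined data reveal customer segments.
df["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(features)
print(df.groupby("segment")[["avg_order_value", "sessions_last_30d"]].mean())
```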
The data lakehouse was designed to bring together the best features of a data warehouse and a data lake, and it delivers specific benefits to its users: the open formats and low-cost storage of a lake, combined with the performance, structure and governance of a warehouse.
IBM’s answer to this analytics crossroads is watsonx.data, a new open data store for managing data at scale that allows companies to surround, augment and modernize their existing data lakes and data warehouses without the need to migrate. Its hybrid nature means you can run it on customer-managed infrastructure (on-premises and/or IaaS) as well as in the cloud.
A key differentiator is a multi-engine strategy that lets users apply the right technology to the right job at the right time, all through a unified data platform. watsonx.data also enables customers to implement fully dynamic tiered storage (and the associated compute), which can lead, over time, to very significant savings in data management and processing costs.
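The multi-engine idea can be illustrated generically. The sketch below is not the watsonx.data API; it simply shows one copy of open-format data on object storage being served by two different engines for two different jobs, interactive SQL versus batch transformation, with all paths and names as placeholders.

```python
# Generic illustration of a multi-engine pattern: one copy of the data in an
# open format on object storage, two engines for two different jobs.
# Paths and credentials are placeholders; this is not the watsonx.data API.
import duckdb
from pyspark.sql import SparkSession

# 1) Interactive, ad-hoc SQL with a lightweight engine (DuckDB).
#    (Assumes the httpfs extension and S3 credentials are configured as in the
#    earlier sketch.)
quick_look = duckdb.sql(
    "SELECT region, SUM(amount) AS revenue "
    "FROM read_parquet('s3://analytics-bucket/sales/*.parquet') "
    "GROUP BY region"
).df()

# 2) Heavy batch transformation over the same files with Spark.
#    (Spark's S3 connector configuration is omitted for brevity.)
spark = SparkSession.builder.appName("nightly-aggregation").getOrCreate()
(
    spark.read.parquet("s3a://analytics-bucket/sales/")
         .groupBy("region", "product")
         .sum("amount")
         .write.mode("overwrite")
         .parquet("s3a://analytics-bucket/sales_by_product/")
)
```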
If your organization has an existing on-premises big data implementation, a lakehouse offers a less expensive alternative: storing data in open formats on object storage. You’ll lower the cost of analytics, reduce complexity and improve time to value.
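As a simple sketch of what that move can look like, the example below converts a hypothetical extract from an existing system into Parquet on S3-compatible object storage. The endpoint, credentials, bucket and file names are placeholders.

```python
# Sketch: land data from an existing on-premises system as Parquet (an open,
# columnar format) on low-cost, S3-compatible object storage.
# Endpoint, credentials, bucket and file names are illustrative placeholders.
import pyarrow.csv as pv
import pyarrow.parquet as pq
import s3fs

fs = s3fs.S3FileSystem(
    key="YOUR_KEY",
    secret="YOUR_SECRET",
    client_kwargs={"endpoint_url": "https://s3.example.com"},  # on-prem object store
)

# A nightly extract from the legacy warehouse or Hadoop cluster.
table = pv.read_csv("daily_transactions_extract.csv")

# Parquet is columnar and compressed, so it is cheaper to store and faster to scan.
pq.write_table(
    table,
    "analytics-bucket/transactions/date=2024-05-01/part-000.parquet",
    filesystem=fs,
    compression="snappy",
)
```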