
Cloudera Big Data Platform Advantages on IBM Power Systems and ESS

Fredrik Lundholm

Published: May 26, 2022

Last year, I created and published a blueprint for running Cloudera Data Platform on IBM Power and ESS: Cloudera Data Platform (CDP) Private Cloud Base on IBM Power and IBM Elastic Storage System (ESS).

Figure 1: Cloudera CDP DC 7.1.7 SP1 running on IBM POWER and ESS

Cloudera have also published a comprehensive Architectural Blueprint for running CDP:

Cloudera Data Platform - Data Center (CDP-DC) Reference Architecture

What supporting statements are essential when building a Power and ESS based solution?

At first glance you might believe there is “missing memory” or “not enough compute” in an IBM proposal. This is because we design our proposal around Cloudera's strategic direction of segregating compute and storage requirements, which allows us to scale each independently and right-size the setup to your particular need.

Figure 2: Cloudera Modernization Strategy, Segregated Compute and Storage

Compute

This is what Cloudera has to say about the compute requirements:

This means for typical Spark jobs the 8-issue POWER9 processor is ideally crafted for Big Data workloads.

Evidence: Cloudera treats 10 POWER cores as 80 logical cores/threads.

Figure 3: 10-core POWER virtual machines with 80 logical processors.

In a typical distributed x86 environment, a large number of data nodes are required to hold 3 copies of the data. With 3 replicas, only those 3 nodes can access that data. This means that, for job scheduling purposes, 3x 20-core x86 nodes are similar to a single 15-core POWER9 partition, which can run as many simultaneous jobs.
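The thread arithmetic behind this comparison can be sketched as follows. This is a minimal illustration, not a sizing tool: the 8 threads per POWER core follows from the 10 cores / 80 logical processors figure above, while the 2 threads per x86 core is an assumption that the article does not state. The function and constant names are hypothetical.

```python
def logical_processors(cores: int, threads_per_core: int) -> int:
    """Return the number of logical processors (hardware threads) the OS sees."""
    return cores * threads_per_core


# From the text: a 10-core POWER partition shows up as 80 logical processors,
# which corresponds to 8 hardware threads per core.
POWER_THREADS_PER_CORE = 8

# Assumption (not stated in the article): 2 hardware threads per x86 core.
X86_THREADS_PER_CORE = 2

power_10_core = logical_processors(10, POWER_THREADS_PER_CORE)
three_x86_nodes = 3 * logical_processors(20, X86_THREADS_PER_CORE)
power_15_core = logical_processors(15, POWER_THREADS_PER_CORE)

print(power_10_core)    # 80
print(three_x86_nodes)  # 120
print(power_15_core)    # 120
```

Under that assumption, three 20-core x86 nodes and one 15-core POWER9 partition expose the same number of schedulable threads, which is one way to read the equivalence claimed above.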

But wait, it gets better. Because we use a shared-everything architecture, the artificial performance limitation that data locality imposes on x86 architectures does not apply. In contrast, on the POWER and ESS solution the full compute capacity of the cluster can access the same byte of data. This prevents data from becoming imbalanced, removes the need to shut down the cluster for periodic rebalancing, and provides better scalability.

Memory

Disaggregating compute and storage, as per Cloudera's strategy, allows each IBM POWER node to access all the data. With 3 x86 data nodes, each holding 384 GB of memory, a compute block of 40 TB of data addresses 768 GB RAM. With our disaggregated solution, the memory of all the data nodes is aggregated across the whole Spark cluster. This provides, for typical cluster sizes, 3x more available memory for the IBM solution and reduces the separate investment in memory and storage.
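One way to read the "3x" figure is as a ratio between the number of nodes that can work on a given piece of data with shared storage (all of them) and with replica-local storage (only the replica holders). The sketch below makes that ratio explicit. It assumes equally sized nodes and 3 replicas, and the cluster sizes are hypothetical values chosen only for illustration; `reach_factor` is not a Cloudera or IBM API.

```python
def reach_factor(cluster_nodes: int, replicas: int = 3) -> float:
    """Ratio of nodes (and, for equally sized nodes, aggregate memory) that can
    work on the same data with shared storage versus replica-local storage."""
    if replicas < 1 or cluster_nodes < replicas:
        raise ValueError("need at least as many nodes as replicas")
    return cluster_nodes / replicas


# Hypothetical cluster sizes, chosen only for illustration.
for nodes in (6, 9, 12):
    print(nodes, reach_factor(nodes))
```

With these assumptions, a 9-node cluster gives a factor of 3.0, and the factor grows with the number of nodes.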

Availability

The reference architecture documents also support the thesis that an OS/log data size of 500 GB+ should be sufficient. There is no practical value in reserving large areas for OS and kernel dumps on data nodes, as they are expected to fail without affecting the overall solution. Since HW and OS failures will not impact the solution architecture, only application errors would cause problems that need more investigation. These are user-space application errors and would not be captured in a system dump.

What is more, the POWER and ESS solution can be extended on the storage side to provide data replication (synchronous or asynchronous, depending on distance) for both Active/Active and Production/DR scenarios. In such cases, care must be taken to replicate the HDFS (Hive) metadata along with the solution and to keep a consistent namespace in order to avoid issues when failing over.

Figure 4: Potential Multi Site CDP design with POWER and ESS.

Hence the IBM Power and ESS solution is ideally crafted for an efficient Cloudera CDP DC workload implementation on a single site or across multiple locations.

Somas Kandhan Balasubramanian

Chief Architect - Focusing on assisting enterprise customers in their Hybrid MultiCloud, Data& AI solution needs.

2 years ago

Mr. Best Practice!!! Amazing view!


Tony Ojeil

Presales Manager at QuanTech SAL

2 years ago

Impressive reference material ...

Mohammad Safir

IBM Champion 2025 | Enterprise IT Architect | AI & Data Science Enthusiast

2 years ago

Awesome Fredrik Lundholm, this message has to reach every Big Data customer. If you could also build a case study on integrating with existing non-IBM CDP environments, that would be great.


Loay Tabbaa

Storage Technical Sales Leader - MEA

2 years ago

Great Fredrik Lundholm as always
