Unlock your Intelligent Data Fabric on Power10 with Cloudera Data Platform 7.1.8!
Data fabrics avoid static data silos!

Traditionally, data fabric projects consist of three distinct phases, often using three different sets of applications from different vendors.

  1. Ingest and Integrate data
  2. Store and process the data
  3. Analyze and present the data

The challenge is that an organisation has many specialized data teams with different requirements. This leads to a plethora of different tools that fundamentally do the same thing.

The move from static “Data Clusters” to elastic “Data Services” that are practitioner-focussed rather than operator-focussed is a journey. By adding centralized control and customized environments, we can establish an Intelligent Data Fabric.

To support Data Services using an Intelligent Data Fabric, your solution must support:

  • Modularity
  • Scalability
  • Decoupled Data and Compute

This is where the Cloudera Data Platform (CDP) Private Cloud Base on IBM Power10 and IBM Elastic Storage System whitepaper comes into play.

Disaggregated storage and compute lead to modularity and flexibility.

The IBM Power10 portfolio of servers enables flexible deployment options for running CDP Private Cloud Base. IBM recommends the IBM Power S1022 and S1024 servers for CDP Private Cloud Base deployment.

IBM Power servers provide performance, virtualization, reliability and availability, deliver twice the throughput of Intel processor-based offerings, and are highly economical for elastic data services deployments.

IBM PowerVM virtualizes the IBM Power Systems server without the performance penalties traditionally associated with software-based hypervisors, thanks to single root I/O virtualization (SR-IOV) and dedicated I/O virtualization options.

Elastic and Scalable CDP Solution using S1022 servers and ESS 3500.

To gain the benefits of the IBM Power and ESS technology stack, an elastic deployment topology is recommended.

A new option with the Power S1024 server is the traditional data cluster deployment topology. This avoids the need for a high-speed network and ESS in MVP or non-scalable scenarios. The Power S1024 server can accommodate 16 x 6.4 TB NVMe persistent storage modules. Each NVMe persistent storage module is independently assigned to a VM, which leaves a large number of high-capacity modules available for hosting worker nodes in the same physical server together with master nodes and gateway nodes.


Data Cluster with S1024 IBM Power10 Servers.


With three Power S1024 servers, each DataNode holds a full data replica, so inter-server communication is not expected to require high bandwidth. This eliminates the 100 GbE network requirement and makes it possible to deploy the data network on the regular 10/25 Gb ports provided in the data center.
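
As a rough illustration of the capacity arithmetic behind this topology, the sketch below assumes that all 16 NVMe modules in each server are given to DataNode storage and that the HDFS replication factor is 3, with one replica per server. Both are assumptions made for the example; in practice some modules would be assigned to master and gateway node VMs. The function and constant names are hypothetical.

```python
# Capacity sketch for a three-server Power S1024 data cluster.
# Assumptions (not from the whitepaper): every NVMe module is used for
# DataNode storage, and HDFS keeps 3 replicas, one per server.

MODULES_PER_SERVER = 16      # NVMe modules per Power S1024 (from the text)
MODULE_CAPACITY_GB = 6400    # 6.4 TB per module, in decimal GB
SERVERS = 3
REPLICATION_FACTOR = 3       # assumed: one full replica per server


def raw_capacity_gb(servers: int = SERVERS) -> int:
    """Total raw NVMe capacity across the given number of servers."""
    return servers * MODULES_PER_SERVER * MODULE_CAPACITY_GB


def usable_capacity_gb(servers: int = SERVERS,
                       replication: int = REPLICATION_FACTOR) -> int:
    """Capacity left for unique data when every block is stored `replication` times."""
    return raw_capacity_gb(servers) // replication


if __name__ == "__main__":
    print(f"Raw per server: {raw_capacity_gb(1) / 1000:.1f} TB")
    print(f"Raw, cluster:   {raw_capacity_gb() / 1000:.1f} TB")
    print(f"Usable data:    {usable_capacity_gb() / 1000:.1f} TB")
```

Under these assumptions the usable capacity equals the raw capacity of a single server, which is what you would expect when each server holds a full replica.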

This significantly lowers the barrier to adopting Power10 for Intelligent Data Fabric workloads.

Blueprint evolution with Power10

Disaggregating compute and storage with IBM Elastic Storage System (ESS) reduces the overhead of traditional 3-way HDFS data replication by up to 85% using IBM Spectrum Scale Native RAID.

  • A typical ESS system provides data throughput exceeding local NVMe storage implementations.
  • Shuffle and sort performance is also typically higher on Spectrum Scale file systems accessed remotely from an ESS than on local solid-state drives (SSDs).
  • In addition, the ESS provides multi-protocol data access. This results in greatly simplified data ingest scenarios where data might get ingested directly over NFS or POSIX without being copied into HDFS from a local file system.
  • Using AFM/DR, the ESS system can copy data to a secondary location. This greatly simplifies DR scenarios and avoids installing additional software components in your elastic data topology cluster for that purpose.
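
The replication overhead claim above can be made concrete with a small calculation. The sketch below compares 3-way replication with two erasure-code layouts, 8+2P and 8+3P. These layouts are assumptions chosen for illustration; the text does not say which layout or accounting the "up to 85%" figure is based on, and the function names are hypothetical. The calculation is idealized and ignores spare space and metadata.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Extra raw capacity per unit of data when storing `copies` full copies."""
    return Fraction(copies - 1)


def erasure_code_overhead(data_strips: int, parity_strips: int) -> Fraction:
    """Extra raw capacity per unit of data for a data+parity erasure code."""
    return Fraction(parity_strips, data_strips)


def overhead_reduction(baseline: Fraction, candidate: Fraction) -> Fraction:
    """Fraction of the baseline overhead that the candidate scheme removes."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    hdfs = replication_overhead(3)  # two extra copies, i.e. 200% overhead
    # Assumed layouts for illustration only.
    for name, (data, parity) in {"8+2P": (8, 2), "8+3P": (8, 3)}.items():
        ec = erasure_code_overhead(data, parity)
        saved = overhead_reduction(hdfs, ec)
        print(f"{name}: overhead {float(ec):.1%}, reduction {float(saved):.2%}")
```

With these assumed layouts the idealized reduction is 87.5% for 8+2P and 81.25% for 8+3P, which brackets the figure quoted above. Real-world numbers depend on the actual ESS configuration.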

Therefore, the IBM Power and ESS elastic deployment topology provides both performance advantages and potential cost savings from reduced raw data requirements and simplified DR scenarios.
