Data Tiering using MapR

Vishal Garg

Product Owner - Intelligent Application Platforms (Agentic AI,Gen AI, NLP, AIML, Data Serving App Development, Data Fabric/Mesh, Data Storage Solutions, MLOPs, Cloud Platforms) at Ericsson |Ex-IBMer

Published: April 4, 2021

The industry started with single-server database management systems, whose scalability was limited to that one server. The advent of unstructured data led to a flood of data in IT systems, and Big Data became a professional discipline. Big Data platforms scale horizontally by adding nodes to a cluster running on low-cost commodity hardware. Of late, with more and more data being generated, storage space has become a scarce resource, leading the industry to think about categorizing data and storing it accordingly.

It gives me immense pleasure to share this article, based on hands-on experience implementing the data tiering concept on our Big Data lake by defining a warm topology and a cold data store such as AWS S3.

Types of Data Tiers: Data starts off as hot when it is first written to local storage. It becomes warm or cold based on the rules and policies the administrator configures. The MapR Automated Storage Tiering (MAST) Gateway service can then automatically offload the data either to an erasure-coded volume on low-cost storage in the MapR cluster (warm tier) or to a low-cost third-party cloud object store such as AWS S3 (cold tier).
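As a rough illustration of how a rule can pick out data that is no longer hot, here is a minimal Python sketch. It does not use any MapR API: the `FileInfo` type, the `select_for_offload` helper, the file paths, and the 90-day threshold are all hypothetical, and an age-based rule is only one kind of criterion an administrator might configure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Hypothetical record describing one file on the hot tier."""
    path: str
    last_modified: datetime


def select_for_offload(files, now, min_age):
    """Return the files last modified at least `min_age` ago."""
    return [f for f in files if now - f.last_modified >= min_age]


# Illustrative data only: one recent file and one older file.
now = datetime(2021, 4, 4)
files = [
    FileInfo("/lake/events/recent.parquet", now - timedelta(days=1)),
    FileInfo("/lake/events/old.parquet", now - timedelta(days=124)),
]

# Assumed rule: offload anything untouched for 90 days or more.
to_offload = select_for_offload(files, now, timedelta(days=90))
print([f.path for f in to_offload])
```

Only the older file matches the assumed rule; the recent one stays hot on local storage.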

MapR provides rule-based automated tiering functionality that allows you to seamlessly integrate with:

Low-cost storage as an additional storage tier in the MapR cluster, for storing file data that is less frequently accessed ("warm" data) in an erasure-coded volume.
Third-party cloud object storage as an additional storage tier for the MapR cluster, for storing file data that is rarely accessed or archived ("cold" data).
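One reason an erasure-coded warm tier costs less is that it needs less raw disk per logical byte than full replication. The sketch below works through that arithmetic. The three-copy replication factor and the 4+2 erasure coding scheme are assumptions chosen for illustration, not values taken from this deployment, and the function names are hypothetical.

```python
def raw_size_replicated(logical_size, copies):
    """Raw capacity needed when every byte is stored `copies` times."""
    return logical_size * copies


def raw_size_erasure_coded(logical_size, data_fragments, parity_fragments):
    """Raw capacity needed for a data+parity erasure coding scheme."""
    return logical_size * (data_fragments + parity_fragments) / data_fragments


logical_tb = 100  # illustrative amount of logical data, in TB

hot_raw_tb = raw_size_replicated(logical_tb, copies=3)          # assumed 3x replication
warm_raw_tb = raw_size_erasure_coded(logical_tb, 4, 2)          # assumed 4+2 scheme

print(hot_raw_tb, warm_raw_tb)
```

Under these assumptions the same 100 TB of logical data occupies 300 TB of raw capacity when replicated and 150 TB when erasure coded.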

Data Offload and Purge

Warm Tiering & Purge: For volumes configured for warm tiering, the MAST Gateway service detects the files that meet the criteria in the configured rules, collects the data to offload from the read-write containers of the front-end volume on the MapR filesystem, and offloads it to the erasure-coded volume on the warm tier.

Cold Tiering & Purge: For volumes configured for cold tiering, the MAST Gateway service detects the files that meet the criteria in the configured rules and collects the data to offload from the read-write containers and snapshots of the volume on the MapR filesystem. In our setup, the data is offloaded to AWS.
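The offload-then-purge sequence can be pictured with a small in-memory simulation. This is only a sketch under stated assumptions: `FrontEndVolume`, `offload`, and `purge` are hypothetical names, a plain dictionary stands in for the warm or cold tier, and the model assumes that local data is released only once a copy exists on the tier, while the file's entry stays visible on the front-end volume.

```python
class FrontEndVolume:
    """Toy stand-in for a front-end volume (hypothetical, not a MapR API)."""

    def __init__(self, files):
        # path -> bytes held on local storage; None once the data is purged
        self.local_data = dict(files)
        self.offloaded = set()


def offload(volume, tier_store, selected_paths):
    """Copy the data of the selected files to the tier."""
    for path in selected_paths:
        if path in volume.offloaded:
            continue  # already on the tier, nothing to copy
        tier_store[path] = volume.local_data[path]
        volume.offloaded.add(path)


def purge(volume, tier_store):
    """Release local data for files confirmed on the tier; keep the entry."""
    for path in volume.offloaded:
        if path in tier_store:
            volume.local_data[path] = None


volume = FrontEndVolume({
    "/lake/old.parquet": b"old rows",
    "/lake/new.parquet": b"new rows",
})
cold_tier = {}  # stands in for an object store bucket

# Assume the configured rule selected only the old file.
offload(volume, cold_tier, ["/lake/old.parquet"])
purge(volume, cold_tier)
```

After the run, the old file's data lives only in `cold_tier`, its path is still listed on the volume, and the new file remains untouched on local storage.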

I invite all members of the Big Data community to read the article and share insights from their own experience. I wholeheartedly welcome any suggestions or questions regarding my work. Thank you for taking the time to read through my article.

Kalyana Bedhu

VP | AI & ML Transformation | GenAI Product Leader | AI Architecture & Strategy | AI Platform Owner

3 years ago

Vishal Garg Nice article. How did you derive the MAST rules? How can you take advantage of the different blob storage or S3 offerings by extending this design?
