Data Tiering using MapR

The industry started with single-server database management systems, whose scalability was limited to one machine. The advent of unstructured data led to a flood of data in IT systems, and Big Data became a professional discipline. Big Data platforms scale horizontally by adding nodes to a cluster running on low-cost commodity hardware. Of late, with more and more data being generated, storage space has become the new scarce resource, leading the industry to think about categorizing data and deciding where each category is stored.

It gives me immense pleasure to share this article, based on hands-on experience implementing the data tiering concept over our Big Data lake by defining a warm tier topology and a cold data store such as AWS.

Types of Data Tiers: Data starts off as hot when it is first written to local storage. It becomes warm or cold based on the rules and policies the administrator configures. The MapR Automated Storage Tiering (MAST) Gateway service can then offload it automatically, either to an erasure-coded volume on low-cost storage within the MapR cluster (warm tier) or to a low-cost third-party cloud object store such as AWS S3 (cold tier).
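To make the idea of a rule concrete, here is a minimal Python sketch of how a policy might select files for offload. It is only a conceptual model and not the MapR rule syntax or API: the `FileInfo` type, the `qualifies_for_offload` helper, the file paths, and the 90-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    last_modified: datetime


def qualifies_for_offload(f: FileInfo, now: datetime, max_age: timedelta) -> bool:
    """Return True when the file has not been modified within max_age."""
    return now - f.last_modified >= max_age


# Hypothetical inventory of files in a tiering-enabled volume.
now = datetime(2024, 1, 31)
files = [
    FileInfo("/lake/raw/2023-06.parquet", datetime(2023, 6, 30)),
    FileInfo("/lake/raw/2024-01.parquet", datetime(2024, 1, 25)),
]

# Hypothetical policy: offload anything untouched for 90 days or more.
rule_age = timedelta(days=90)
to_offload = [f.path for f in files if qualifies_for_offload(f, now, rule_age)]
print(to_offload)  # ['/lake/raw/2023-06.parquet']
```

In a real cluster the rule is defined by the administrator and evaluated by the MAST Gateway service; the sketch only shows the kind of age-based decision such a rule could encode.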


MapR provides rule-based automated tiering functionality that allows you to seamlessly integrate with:

  • Low-cost storage as an additional storage tier in the MapR cluster, for storing less frequently accessed file data ("warm" data) in an erasure-coded volume.
  • Third-party cloud object storage as an additional storage tier for the MapR cluster, for storing file data that is rarely accessed or archived ("cold" data).
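One way to see why an erasure-coded volume counts as low-cost storage is to compare the raw capacity it needs with that of a replicated volume. The sketch below is plain arithmetic under stated assumptions: 3-way replication for the hot tier and a 4+2 erasure-coding scheme (4 data fragments, 2 parity fragments) for the warm tier. The article does not say which schemes were used, and the function names are illustrative.

```python
def raw_capacity_replicated(logical_tb: float, replicas: int) -> float:
    """Raw capacity needed when every byte is stored `replicas` times."""
    return logical_tb * replicas


def raw_capacity_erasure_coded(logical_tb: float, data_frags: int, parity_frags: int) -> float:
    """Raw capacity needed when data is split into data_frags fragments
    and protected by parity_frags parity fragments."""
    return logical_tb * (data_frags + parity_frags) / data_frags


logical_tb = 100.0  # assumed amount of warm data, in TB

hot = raw_capacity_replicated(logical_tb, replicas=3)                            # assumed 3x replication
warm = raw_capacity_erasure_coded(logical_tb, data_frags=4, parity_frags=2)      # assumed 4+2 scheme

print(hot)   # 300.0
print(warm)  # 150.0
```

Under these assumptions the same 100 TB of data occupies half the raw capacity once it moves to the warm tier.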

Data Offload and Purge

Warm Tiering & Purge: For volumes configured for warm tiering, the MAST Gateway service detects the files that meet the criteria in the configured rules, collects the data to offload from the read-write containers of the front-end volume on the MapR filesystem, and offloads it to the erasure-coded volume on the warm tier.

Cold Tiering & Purge: For volumes configured for cold tiering, the MAST Gateway service detects the files that meet the criteria in the configured rules and collects the data to offload from the read-write containers and snapshots of the volume on the MapR filesystem. In this setup, the data is offloaded to the cold tier on AWS.
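The offload-then-purge sequence can be pictured with a small in-memory Python model. This is a sketch only, not how the MAST Gateway is implemented: `TieredFile`, `Tier`, `offload`, and `purge` are hypothetical names, and the rule that a file is purged only after a copy exists on the tier is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TieredFile:
    name: str
    local_data: Optional[bytes]  # None once purged from the hot tier
    offloaded: bool = False


@dataclass
class Tier:
    """Stand-in for a warm (erasure-coded volume) or cold (object store) tier."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)


def offload(f: TieredFile, tier: Tier) -> None:
    """Copy the file's data to the tier; the local copy is left in place."""
    if f.local_data is None:
        return  # nothing local to offload
    tier.objects[f.name] = f.local_data
    f.offloaded = True


def purge(f: TieredFile) -> None:
    """Free the local copy, but only if the data already lives on a tier."""
    if f.offloaded:
        f.local_data = None


cold = Tier("cold-tier")  # hypothetical tier name
f = TieredFile("events.log", b"payload")

purge(f)            # no effect: the file has not been offloaded yet
offload(f, cold)    # data now exists on the tier as well
purge(f)            # local copy released; the file entry itself remains

print(f.local_data is None, cold.objects["events.log"])
```

Keeping offload and purge as separate steps mirrors the heading above: data is first copied to the tier and only then released from local storage.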


I invite all members of the Big Data community to read the article and share insights from their own experience. I wholeheartedly welcome any suggestions or questions regarding my work. Thank you for taking the time to read through my article.

 


Kalyana Bedhu

VP | AI & ML Transformation | GenAI Product Leader | AI Architecture & Strategy | AI Platform Owner

3 years ago

Vishal Garg, nice article. How did you derive the MAST rules? How could you take advantage of the different blob storage or S3 offerings by extending this design?
