Data Tiering Using MapR
Vishal Garg
Product Owner - Intelligent Application Platforms (Agentic AI, Gen AI, NLP, AIML, Data Serving App Development, Data Fabric/Mesh, Data Storage Solutions, MLOps, Cloud Platforms) at Ericsson | Ex-IBMer
The industry started with single-server database management systems, whose scalability was limited to that one server. The advent of unstructured data led to a flood of data into IT systems, and Big Data became a professional discipline. Big Data platforms scale horizontally by adding nodes to a cluster running on low-cost commodity hardware. Of late, with more and more data being generated, storage space has become a scarce resource, leading the industry to think about how data should be categorized and where each category should be stored.
It gives me immense pleasure to share this article, based on hands-on experience implementing the data tiering concept over our Big Data lake by defining a warm topology and a cold data store such as AWS.
Types of Data Tiers: Data starts off as hot when it is first written to local storage. It becomes warm or cold based on the rules and policies the administrator configures. Data can then be set up to be offloaded automatically by the MapR Automated Storage Tiering (MAST) Gateway service, either to an erasure-coded volume on low-cost storage within the MapR cluster (warm tier) or to a third-party cloud object store such as AWS S3 (cold tier).
MapR provides rule-based automated tiering functionality that allows you to seamlessly integrate with:
- Low-cost storage as an additional storage tier in the MapR cluster, for storing less frequently accessed file data ("warm" data) in an erasure-coded volume.
- Third-party cloud object storage as an additional storage tier in the MapR cluster, for storing file data that is rarely accessed or archived ("cold" data).
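To make the hot/warm/cold distinction concrete, here is a minimal Python sketch of an age-based classification. It is only an illustration under stated assumptions: the `TieringPolicy` class, the `classify` function, and the 30-day and 180-day thresholds are all hypothetical, and this is not how MapR expresses or evaluates its tiering rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TieringPolicy:
    # Hypothetical thresholds; in practice the administrator defines the rules.
    warm_after: timedelta
    cold_after: timedelta


def classify(last_access: datetime, now: datetime, policy: TieringPolicy) -> str:
    """Return 'hot', 'warm', or 'cold' from how long a file has been idle."""
    idle = now - last_access
    if idle >= policy.cold_after:
        return "cold"
    if idle >= policy.warm_after:
        return "warm"
    return "hot"


# Assumed example policy: warm after 30 idle days, cold after 180 idle days.
policy = TieringPolicy(warm_after=timedelta(days=30), cold_after=timedelta(days=180))
now = datetime(2024, 1, 1)

for last_access in (datetime(2023, 12, 25), datetime(2023, 11, 1), datetime(2023, 1, 1)):
    print(last_access.date(), classify(last_access, now, policy))
```

A real deployment would more likely attach one rule and one tier type to each volume rather than move every file through all three states; the single function above just keeps the idea easy to read.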
Data Offload and Purge
Warm Tiering & Purge: For volumes configured for warm tiering, the MAST Gateway service detects the files that meet the criteria in the configured rules, collects the data to offload from the read-write containers of the front-end volume on the MapR filesystem, and offloads it to the erasure-coded volume on the warm tier.
Cold Tiering & Purge: For volumes configured for cold tiering, the MAST Gateway service detects the files that meet the criteria in the configured rules and collects the data to offload from the read-write containers and snapshots of the volume on the MapR filesystem. In our setup, the data is offloaded to AWS.
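The detect-and-collect step described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: `FileInfo`, `Volume`, `plan_offload`, the destination labels, and the sample rule are hypothetical names, and the code does not call or reproduce the MAST Gateway service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    idle_days: int


@dataclass
class Volume:
    name: str
    tier: str                          # "warm" or "cold"
    rule: Callable[[FileInfo], bool]   # stands in for the configured rule
    files: List[FileInfo]


# Hypothetical labels for where each tier type sends its data.
DESTINATIONS: Dict[str, str] = {
    "warm": "erasure-coded volume on the cluster",
    "cold": "third-party object store (for example AWS S3)",
}


def plan_offload(volume: Volume) -> dict:
    """Pick the files that match the volume's rule and summarise the offload."""
    matched = [f for f in volume.files if volume.rule(f)]
    return {
        "volume": volume.name,
        "destination": DESTINATIONS[volume.tier],
        "paths": [f.path for f in matched],
        "bytes": sum(f.size_bytes for f in matched),
    }


# Assumed example: a cold-tiered volume whose rule matches files idle for 90+ days.
logs = Volume(
    name="logs",
    tier="cold",
    rule=lambda f: f.idle_days >= 90,
    files=[
        FileInfo("/logs/2022.gz", 500, 400),
        FileInfo("/logs/2023.gz", 300, 120),
        FileInfo("/logs/today.log", 50, 0),
    ],
)
print(plan_offload(logs))
```

The sketch keeps the rule and the destination on the volume, mirroring the description that each volume is configured for either warm or cold tiering. It deliberately leaves out the purge step.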
I request all Big Data community members to read the article and share valuable insights from your own experience. I wholeheartedly welcome any suggestions or questions regarding my work. Thanks a lot for taking the time to read through my article.
VP | AI & ML Transformation | GenAI Product Leader | AI Architecture & Strategy | AI Platform Owner
3 years ago: Vishal Garg, nice article. How did you derive the MAST rules? How can you take advantage of the different blob storage or S3 offerings when extending this design?