The December 2023 MinIO Newsletter

Welcome to the December 2023 MinIO newsletter. This is a special time of year and we are grateful to have such an amazing crew of folks with us on the MinIO journey. Thank you for being a part of that. The newsletter growth continues to amaze.

Our experiment worked, so we will keep the grouped format again this month. If you have an opinion, please share it.


Bathe in the Modern Datalake:

Terminology matters, but we are a bit flummoxed by the datalake/datalakehouse debate. We have settled on “modern” datalake because that is what our clients talk about. It is multi-engine, supports open table formats, is big and is fast. We think AI datalake will be a thing shortly. Either way - we are deeply into the subject.

We start this month with a superb piece from Brenna Buuck on why the modern datalake is increasingly a private cloud affair. The transformation of datalake architecture is driven by cloud-native technologies and models, and building datalakes on private cloud infrastructure offers cost savings, flexibility, and future-proofing of data management strategies. Check it out here.

Brenna is back with another post, this time plumbing the depths of the datalake with Nessie, Dremio, Iceberg and MinIO. Nessie is pretty cool. It allows each engineer to work in separate branches while maintaining a single source of truth in the form of the main branch. Brenna provides a step-by-step guide demonstrating how Nessie, Dremio and MinIO work together to enhance data quality and collaboration in your data engineering workflows. Whether you're a data engineer, ML engineer, or just a modern data lake enthusiast, this blog equips you with the knowledge and tools needed to effectively enhance your data versioning practices.

Brenna completes the triple with her post on open table formats and data portability. Recent advancements in open table formats by Databricks' Delta Lake 3.0 and Apache Iceberg enhance interoperability and data portability across various environments, including public and private clouds, edge computing, and bare-metal setups. This shift, emphasizing cloud-native technology and the decoupling of storage and compute, marks a sea change in data management, aligning with the modern, modular data stack and the cloud operating model to optimize flexibility, efficiency, and cost-effectiveness in data infrastructure. Must-read stuff.


GenAI and Reproducibility:

We have two posts this month on the AI front.

The first is on generative AI - perhaps the hottest corner of the scorching hot AI landscape. SME Keith Pijanowski explains how generative AI offers enterprises a novel way to leverage internal data, using customized Large Language Models (LLMs) similar to ChatGPT but trained specifically on a company's unique data corpus. This has direct application to research, customer support and document creation - just to name a few. He compares and contrasts the GenAI approach with Retrieval Augmented Generation (RAG), an area we have written about extensively.


The second post is focused on the immensely important area of reproducibility in AI. Co-authored with the team at LakeFS, the post details how teams can leverage reproducibility to repeat experiments using the same procedures to get the same results. It’s the foundation of the scientific method and, therefore, a handy approach in ML. True reproducibility includes data, tools, libraries, frameworks, programming languages and operating systems. Reproducibility means that team members have the capability to time travel between multiple versions of the data, taking snapshots of the data at various periods and with varying degrees of modification. The post includes instructions on how to achieve reproducibility with MinIO and LakeFS.


Executive Corner:

The Executive Corner was super successful in its debut last month. As a reminder, these are less technical posts. They look at macro issues of interest to the C-Suite and its direct reports. They tend to be opinionated and are designed to challenge the “conventional wisdom” of the data world.

We will lead off with last month’s #1 read - The Unified Storage Narrative and Why It Is a Lie. There are some vendors that would have you believe that they are best in class at everything - storage, databases, AI. We all know that is not true. The best-in-class tool is almost never a Swiss Army knife; it is something designed for a specific problem class. The idea of supporting S3, NFS, SMB, HDFS, iSCSI, FCoE, NVMeoF and FCP all at once is a false promise.

Matt Sarrel has an excellent post that builds on some work in Harvard Business Review by Leandro DalleMule and Thomas H. Davenport. Their work creates a strategic framework for data management and analytics. Matt interprets that framework in the context of modern data lakes and MinIO-centric data infrastructure.

Jonathan Symonds also weighs in with his semi-annual review of KubeCon. Hint - he was a fan of this year’s event and sees maturity as the big takeaway for the Kubernetes community - just in time for the AI wave. The timing is very fortunate for the continued dominance of orchestration in large-scale data infra.


MinIO Internals:

There is a massive appetite for MinIO-specific content in the blog and the newsletter. We are going to consolidate those posts in a single section. Let’s start with how to use tags with MinIO. Tags are a valuable way to categorize objects saved to MinIO. Each tag is a key-value pair. You can assign tags to an object when it is saved to MinIO, or you can add them to existing objects. Go deep.
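As a rough illustration of the key-value model, the sketch below builds the URL-encoded tag string that the S3 API accepts in the `x-amz-tagging` header when an object is saved. The helper name `build_tagging_header` and the sample tags are hypothetical, and the limits it checks (10 tags per object, 128-character keys, 256-character values) are the documented S3 limits, assumed here to carry over to MinIO.

```python
from urllib.parse import urlencode

# Limits taken from the S3 object tagging rules; assumed (not confirmed by
# this newsletter) to apply to MinIO as well.
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def build_tagging_header(tags: dict) -> str:
    """Return tags encoded as URL query parameters, e.g. 'k1=v1&k2=v2'."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags are allowed per object")
    for key, value in tags.items():
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value too long for key {key!r}")
    return urlencode(tags)


if __name__ == "__main__":
    # Hypothetical tags for an object being uploaded.
    print(build_tagging_header({"project": "newsletter", "month": "2023-12"}))
```

The resulting string would travel with the upload request; tagging an object that already exists goes through a separate object-tagging call rather than this header.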

Next up is a cool piece on how to use object lambda for regulatory use cases. Object Lambdas are a feature in MinIO that enables on-the-fly customization of requested data, making it perfect for scenarios like redacting sensitive information, enriching data, or altering data formats without modifying the original stored data.
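To make the redaction use case concrete, here is a minimal sketch of only the transformation step, the part a handler would apply to the object body before handing it back to the requester. The function name `redact_ssns`, the pattern, and the placeholder are illustrative assumptions; fetching the original object and returning the response to MinIO are deliberately left out.

```python
import re

# Hypothetical pattern: US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str, placeholder: str = "[REDACTED]") -> str:
    """Return a copy of text with SSN-like values replaced by placeholder."""
    return SSN_PATTERN.sub(placeholder, text)


if __name__ == "__main__":
    stored = "name=Jane Doe, ssn=123-45-6789"
    # Only the returned copy is redacted; the stored text is left as it was.
    print(redact_ssns(stored))
```

Because the function returns a new string, the stored object stays untouched, which is the property that makes this approach attractive for regulatory scenarios.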

Staying in the same neighborhood as object lambda, MinIO's event notifications, also known as bucket notifications, allow administrators to send notifications to external services such as Kafka or RabbitMQ. The events can be operations like adding an object to a bucket, accessing an object in a bucket, deleting an object, and creating or deleting buckets. Delivery can be configured to be asynchronous or synchronous.
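On the receiving end (for example, a consumer reading from Kafka or RabbitMQ), each event typically arrives as a JSON document. The sketch below assumes the S3-style record layout (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`) with URL-encoded object keys; the sample payload and the `summarize_event` helper are made up for illustration.

```python
import json
from urllib.parse import unquote_plus


def summarize_event(payload: str) -> list:
    """Return (event name, bucket, object key) tuples from one notification."""
    summary = []
    for record in json.loads(payload).get("Records", []):
        s3 = record["s3"]
        # Object keys are assumed to arrive URL-encoded, so decode them.
        key = unquote_plus(s3["object"]["key"])
        summary.append((record["eventName"], s3["bucket"]["name"], key))
    return summary


if __name__ == "__main__":
    # Hypothetical notification for an object added to a bucket.
    sample = json.dumps({
        "Records": [{
            "eventName": "s3:ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "reports"},
                "object": {"key": "2023/december+newsletter.pdf"},
            },
        }]
    })
    print(summarize_event(sample))
```

A real consumer would branch on the event name, for instance treating object-created and object-removed events differently.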

While AI/ML and databases are the cool new kids in town when it comes to object storage, backup remains a perennial favorite. Matt Sarrel takes us through the backup process with Restic. There are some mind-blowing stats in the post but the one that really hit home is that ninety-six percent of companies do not back up user workstations. That means your organization is almost certainly at risk. Uptime matters. Data protection matters and replication does not equal backup. When an entire node fails it often takes several minutes to get it back online once reprovisioned. AJ Jambu takes us through how to set up multiple LXMIN servers backing up to a multi-node, multi-drive MinIO cluster. The result is a backup of the node configuration that MinIO needs to run and operate in the cluster itself - not the data inside MinIO.

MinIO is frequently used to store logging, metrics and trace data. Quickwit is designed for sub-second search straight from object storage, allowing truly decoupled compute and storage. Quickwit and MinIO share a lot of the same principles. AJ shows us how to use MinIO as a storage provider and as a metadata store for Quickwit. The Internet loved this - you will too.

Liked AJ’s previous post on VMBroker? He is back with another one, this time about tenant and lab considerations.


New and Notable - Release Notes from November 2023:

We shipped over 70 features and bug fixes across 6 releases in the month of November, with the help of 3 new contributors (@DaniElectra, @adriangitvitz, @vicmunoz). Their contributions include relaxing enforcement of filename on PostPolicyKey and using the MINIO_BROWSER_SESSION_DURATION env variable to change the expiration of tokens created by an OpenID provider, which increases the availability of the object share link, among other things. Moreover, in our quest for observability, we added several new Grafana graphs exposing KMS metrics and the Erasure Set Tolerance value. We also fixed an issue to ensure that we do not keep creating new MarkOffline routines that never exit; this generally happens when a node is offline for a prolonged period, from days to months. We also added a handy way of figuring out the drive failure tolerance per erasure set, which takes the guesswork out of working it out manually when you have failed nodes in the cluster, which is pretty cool. Last but not least, we’ve updated the console to version 0.41.0.


Bits and Bytes:

Happy December! Consider this awesome content our holiday gift to you.

MinIO was popping on Medium this month. Let’s dive in.

MLOps engineer Zakariae Mkassi wrote a step-by-step guide to migrating local MLflow experiments to a centralized MLflow instance, creating a robust infrastructure for machine learning projects. He used MinIO as a centralized storage location for MLflow artifacts.

Alexander Kapincev hit us with two awesome MinIO pieces this month. He wrote a developer’s guide to integrating MinIO with Angular and Spring Boot. He walks through configuring MinIO using Docker, integrating it with a Spring Boot backend, and building an Angular frontend to manage books, including their cover images. He also wrote an easy guide to setting up MinIO with MicroK8s Kubernetes. Alex, thank you for being part of the community!

IT Consultant Alessandro Tinivelli tests Veeam’s new direct backup to object storage feature using MinIO.

System Engineer Lubomir Tobek dives into the details of MinIO in Docker, describing what it is, why we need it, and how to deploy it in a Docker environment on the Photon OS operating system.

Data Engineer Dogukan Ulu wrote a stellar piece titled “Data Engineering End-to-End Project,” using MinIO as object storage for streaming data.

Phew, thanks Medium folks. On to the next…

Senior HPC System Engineer Murad Bayoun wrote an article covering what MinIO is, its key features, common use cases, as well as the (tons and tons of) benefits. “Overall, MinIO is a powerful and versatile object storage solution that is well-suited for cloud-native environments. It is a high-performance, scalable, and reliable solution that can be used for a variety of use cases.” Murad gets it.

MinIO has also been killing it on LinkedIn recently. Senior System & DevOps Specialist Abdussamed KOÇAK wrote a LinkedIn article on deploying a distributed MinIO cluster on Kubernetes using a StatefulSet object.

It’s no secret that at MinIO we LOVE videos, hence our plethora of resources on our YouTube channel. Lead Data Engineer Soumil S. created an awesome video on how to use MinIO and Apache Hudi Delta Streamer with a hands-on lab. Check it out.

Staying on the video wave, YoanDev on YouTube created a video called FTP vs. S3 using MinIO. It’s in French! MinIO is global.

We’ve got another French resource for you. This one’s titled “Deploy a MinIO cluster for object storage under Proxmox.”

DineshReddy Kayithi, DevOps Engineer, wrote an article covering Kubernetes applications backup and restoration using Velero and MinIO.

Penetration Tester and Sysadmin Marco Fabbri wrote a piece titled “Direct to Windows Object Storage on premise with MinIO.” This guide has some great details in it. Give it a read.

In his blog post, Sudhakar Soni explores how to set up Active-Active Site Replication in MinIO, enabling smooth data replication between MinIO-1 and MinIO-2 clusters.

Laravel Developer Carlos Santiago walks you through a step-by-step guide to dockerize a Laravel project using MinIO, Apache, MySQL, and Mailhog.

Last but not least… Garima was quoted in a Forbes article covering the 20 best practices for using data center as a service facilities. “Strong encryption is a must-have - both in flight and at rest…” Check out the article for the full quote.

Happy Holidays, everyone. We’ll see you in the new year!

Regards,

Your friends at MinIO

