The January 2024 MinIO Newsletter

Welcome to the January 2024 MinIO newsletter. With the new year, we wanted to reflect on the growth of our little endeavor. We reached more than 500K readers in 2023, featured 180 Bits and Bytes stories, covered more than 145 blog posts and chronicled more than 108 releases. Pretty cool.

We had so much content that we changed formats mid-year to handle the surge (16 new blog posts in December alone). Folks seemed to like the new format, so that will be the structure going forward. While most folks love the updates, if you don’t, you can always opt out (at the bottom). Conversely, if this was forwarded to you, you can always opt in to stay current.


Bathe in the Modern Datalake:

We may be in peak winter, but the modern datalake is heating up. As the geologists in the crowd know, where there are tectonic shifts, there is heat, and that is what is keeping this area so hot. AI is a big driver of those shifts, but it has its own section below, so here we will keep the focus on the core architecture and its component parts.

The success of your AI strategy depends less on the model and more on the data and the infrastructure. This is why many enterprises will have to restart their AI initiatives in the middle of the year: they are too focused on the frontend and are not putting a proper foundation in place. Keith Pijanowski knows this, as both a practitioner and an architect. His piece on building an AI Datalake is must-read stuff, from the boardroom to the architect’s Slack channel. It emphasizes the importance of a scalable, efficient data infrastructure layer, which is fundamental to successfully deploying and operating AI applications.

Building on this thought leadership angle, Brenna Buuck shows how the success of managed service offerings like ClickHouse and MotherDuck rests on decoupling storage and compute. That separation lets storage and compute scale independently, effectively unlocking both infinite and workload-specific scale for each.

Vector databases are a key new offering in the database space. They enable efficient handling of complex, high-dimensional data in AI/ML workflows. MinIO is core to that world, and products like Milvus and Weaviate use MinIO under the hood. Brenna checks out one of the new kids on the block, LanceDB, and looks at how it supports various data types as well as explicit and implicit vectorization.

Brenna closes out with an excellent piece on MinIO and StarRocks. As we have noted, once object storage becomes primary storage, databases choose to run directly on top of it, provided your object store can deliver the required performance and, more importantly, performance at scale. Brenna provides a step-by-step tutorial for getting started with StarRocks using MinIO as primary storage. Bonus content can be found here.


The AI Corner:

We have four AI posts this month, and they are all worth your time. Keith Pijanowski took us on a whirlwind tour of the Ray framework this month. His first post, Distributed Data Processing with Ray Data and MinIO, showed how to process data in a distributed fashion; be sure to check it out if you are using MinIO to store datasets that can’t fit into memory. His next post, Distributed Training with Ray Train and MinIO, built on the first by showing how to code an ML pipeline with distributed processing and distributed training, and how to use Ray Train for checkpointing. His final post, Distributed Training and Experiment Tracking with Ray Train, MLflow, and MinIO, pulled everything together, using MLflow for hyperparameter logging and metric tracking within a distributed ML pipeline. The code samples for this series are reusable, and we hope they save you a lot of time.

Matt Sarrel joins the AI party with his piece on using Microsoft SQL Server 2022 and MinIO for AI programs. SQL Server’s external tables feature is a powerful one, and we expect more and more AI/ML applications to take advantage of it.


Executive Corner:

The Executive Corner was super successful in its debut. These are less technical posts. They look at macro issues of interest to the C-Suite and its direct reports. They tend to be opinionated and are designed to challenge the “conventional wisdom” of the data world.

Last month’s #1 read, for the second straight month, was The Unified Storage Narrative and Why It Is a Lie. That earns it another look. Some vendors would have you believe that they are best in class at everything: storage, databases, AI. We all know that is not true. The best-in-class tool is almost never a Swiss Army knife; it is something designed for a specific problem class. The idea of supporting S3, NFS, SMB, HDFS, iSCSI, FCoE, NVMeoF and FCP all at once is a false promise.

Moiz Kohari lit up LinkedIn with his take on the cloud operating model: how you can save money by going to the cloud, and how you can save money by returning from it. He expanded on that take in a blog titled Two Things Can Be True at the Same Time, which explains how both can be true; it just depends on where you are in the cloud lifecycle.

Finally, Jonathan Symonds weighed in on our AI year and what we might be missing. Specifically, the modern data infrastructure stack doesn’t need AI in the way that AI needs the modern data infrastructure stack. This often overlooked point will define the success of enterprise AI in 2024.


MinIO Internals:

We have lots of great MinIO-centric content this month. Let’s start with Sasha Wodtke’s excellent review of MinIO’s top organic posts from the year. Getting hot on Hacker News is really helpful. See which posts made it.

At re:Invent, AWS announced a new service, S3 Express One Zone, designed for even more speed. We break it down and explain why it is a complement to MinIO.

Most MinIO production deployments are run in airgapped environments. Why? Well, security for one. Learn how to optimize those deployments and still take advantage of all that MinIO has to offer.

Speaking of best practices, AJ scores again with a stellar post on Day 2 scaling. It delves into the considerations you need to take into account for long-term MinIO management. Must-read stuff for everyone running production deployments.

Klaus Post enters the chat with another gem this month, this time on how to handle hundreds of servers and long-running requests. The answer required us to look at small requests with small payloads, as well as long-running requests with small, streaming payloads that could not hold a connection open for the duration of the call. The solution ended up being WebSockets, since they provide A) two-way communication, B) binary messages, C) clean integration into existing connectivity and D) good performance. Check out the whole post here.

David Cannan debuted in December with his first post. It takes us on a journey through the nuances of Docker networking, unraveling how to efficiently bridge Docker containers with localhost environments so a Dockerized MinIO service can effectively communicate with a Flask application running on your host machine. Whether you're developing on a laptop or deploying on a global scale, these skills ensure your applications remain robust, scalable, and secure.
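The key idea behind bridging a container and your host is that `localhost` inside a container refers to the container itself, not to your machine. The sketch below is ours, not code from David’s post: a hypothetical helper, `host_service_url`, that builds the URL a process should use to reach a service (for example, a Flask app) listening on the host. It assumes Docker Desktop, where `host.docker.internal` resolves to the host automatically; on Linux, it assumes the container was started with `--add-host=host.docker.internal:host-gateway`. The `/.dockerenv` check is only a heuristic.

```python
import os
from typing import Optional


def running_in_docker() -> bool:
    # Heuristic: Docker usually creates /.dockerenv inside containers.
    # Other runtimes may not, so callers can pass in_container explicitly.
    return os.path.exists("/.dockerenv")


def host_service_url(port: int, path: str = "/", in_container: Optional[bool] = None) -> str:
    """Build the URL for a service (e.g. a Flask app) listening on the host machine.

    Inside a container, "localhost" is the container itself, so the host is
    reached through host.docker.internal instead (an assumption about your setup).
    """
    if in_container is None:
        in_container = running_in_docker()
    host = "host.docker.internal" if in_container else "localhost"
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{host}:{port}{path}"


if __name__ == "__main__":
    # Hypothetical Flask endpoint on host port 5000, as seen from inside a container.
    print(host_service_url(5000, "/minio-events", in_container=True))
```

The reverse direction needs no special name: a Flask app on the host reaches a MinIO container whose port is published (for example with `-p 9000:9000`) at `localhost:9000`.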


YouTube:

It’s no secret that we have an incredible motion graphics team here at MinIO, and they put their skills to work this month. Introducing our “What is MinIO” video: it’s aesthetic, simple, and informative (not to mention impressive). MinIO is a high-performance, S3-compatible object store built for large-scale AI/ML, data lake and database workloads. This video covers it all. Also, a huge shoutout to Jill Inapurapu, Software Engineer at MinIO, for guiding us through the voiceover. Team effort.

Subscribe to our YouTube channel here.


New and Notable - Release Notes from December 2023:

We shipped over 115 features and bug fixes across 8 releases in the month of December, with the help of 3 new contributors (@bestgopher, @opencmit2, @vanugrah).

This is a major refactor release: internode communication has been re-implemented to use WebSockets for most internal calls, enabling efficient scaling to hundreds or thousands of nodes. Because this is a larger change, we advise our users to follow our upgrade best practices. To learn more about the WebSockets grid implementation, please read the linked post.

Site replication now supports healing ILM configuration to all sites, allowing ILM settings across buckets to be centrally managed via any site. MinIO -> MinIO batch replication now uses a compressed archive by default to make more effective use of bandwidth.

MinIO now supports external caching of metadata calls such as HeadObject() in distributed memory, for faster lookups when you rely on slower media such as HDDs combined with a read-heavy workload pattern.

Last but not least, MinIO now supports starting the server with arguments and configuration supplied via a YAML file. This YAML configuration describes everything that can be defined in a MinIO setup, such as '--address', '--console-address', and other command-line arguments for the MinIO server. Official documentation will be updated; internal documentation on this feature is available here.


Bits and Bytes:

Wow, what an awesome start to 2024. Our Bits and Bytes section is STACKED this month. No wasting time. Let’s get into it.

First on the agenda are our wonderful Medium articles.

Software Engineer Pavlo Sharhan got in early this month with an in-depth tutorial on how to host your own S3 server at home. This article includes an informative section about some of the hidden costs of AWS S3. Don’t miss that.

Next up, Platform Engineer Paul Robu wrote about CockroachDB and MinIO. More specifically, his article walks through a simple example of backing up and restoring a virtual table in CockroachDB using MinIO storage on Kubernetes.

Huseyin Kafali, Data Engineer, demonstrates an end-to-end data streaming project that captures operational data changes and writes them to MinIO object storage.

QSS smart it revisits our recent collaboration on a game-changing data analytics transformation with Microsoft SQL Server 2022 and MinIO, reflecting on the key insights shared during this collaborative showcase. Here’s our take on it: https://blog.min.io/data-science-sql-server/.

Lately, we’re seeing more and more of our community articles being written in different languages. This means we are doing our job. Alexander Petrov writes about saving backup copies of Blobs as S3 objects using MinIO and Kubernetes. His post is originally in Russian, but if you have the Google Translate Chrome extension, you can easily read it in English. You’re killing it, Alex… can’t wait to see more from you.

Username ‘D’ on Medium keeps it easy breezy with a short and sweet tutorial on how to run a local MinIO store in under five minutes. Thanks for embodying MinIO’s key tenet of simplicity!

For our avid newsletter consumers, you may remember that last month’s newsletter included Data Engineer Dogukan Ulu’s Part 1 of his “Data Engineering End-to-End Project.” He’s back with Part 2! Stellar work here.

Data Engineer Burak Uğur hits us with an article all about MinIO, in Turkish.

Cloud Native Engineer Utku Mert uses a sample scenario to explain the zero downtime MinIO replication process.

Sarthak Sarbahi, Data Engineer at American Express, guides you through an analytics use case: analyzing semi-structured data with Spark SQL. He walks through the data engineering process, from pulling data from an API to loading the transformed data into a data lake (represented by MinIO).

Medium completed. Onto the rest.

DineshReddy Kayithi, DevOps Engineer, published a tutorial titled “MinIO Single Node Single Disk Deployment on Linux (CentOS/RHEL or Ubuntu) Using a Script.” His article has tons of photos to help you easily follow along throughout the process.

He also posted an article focusing on “OpenStack Deploy Using Minio (s3 Compatible Object storage) as Cinder Backup Driver and Testing the Cinder Volume Backup and Restore.”

Alex Woodie from Datanami released an article about the big data predictions for 2024, featuring our very own AB. AB weighs in with the following statement, “In 2024, we’ll see an enterprise explosion of truly unstructured data (audio, video, meeting recordings, talks, presentations) as AI applications take flight.” Check out the article to see the rest of what AB had to say.

Over on dev.to, open source advocate Mario Garcia covers MinIO object storage on Docker as part of his series on Percona Backup for MongoDB (PBM). His article shows how to configure MinIO to use it with PBM for storing the backup of your databases.

Platform Engineer Danilo Correa uses MinIO to compress files directly to S3 with PHP. And his article is in Portuguese!

We saw a couple of cool things on LinkedIn this month too. Senior DevOps Engineer Ray J. released some Go code for MinIO and Kubernetes that syncs game worlds from a bare-metal MinIO cluster to multiple cloud providers’ object storage. Check out his LinkedIn post about it.

On YouTube, Naresh Dulam created a tutorial on how to build an on-prem data lake using Apache Spark, Apache Hudi and MinIO.

Alex Merced, Developer Advocate at Dremio and friend of MinIO, created a step-by-step guide for beginners and pros alike showing how to use Dremio and MinIO with Docker Desktop, all without writing a single line of code. Thanks, Alex!


Happy New Year, everyone! It’s going to be a good one. 2024 here we come.


Regards,

Your friends at MinIO