Containerization - How we got here
A (Very) Brief Ancient History of Computing
In the beginning there was the mainframe - a singular computer of unparalleled power (at the time). Users would connect to the mainframe from dumb terminals and everything ran there. These were simple times.
Then computing evolved. No longer were dumb terminals the standard - everyone had their own usable computer, and those computers talked to more powerful machines that provided a service of some kind (servers) to their clients (PCs).
Everyone saw this and thought it was great - no more competing for mainframe time. The amount of software available grew, and systems began fracturing into specific areas of concern. Hardware was cheap, relatively speaking, so each service got its own server to run on. This was good because, while the hardware was relatively cheap, it was still fairly limited in terms of both CPU speed and RAM.
Then came the rapid advancement of the 1990s/2000s. More and more cores were added to CPU dies. More and more RAM became viable in a single slot. However, the applications (with some exceptions - I'm looking at you, Oracle and Exchange) weren't really taking full advantage of this advancement. You could order a beast of a machine and only end up using 20% of its capacity. Far smarter minds than mine saw that - and the waste of electricity going into these half-used machines - and saw an opportunity to make the old new again.
A New Paradigm
With hypervisors becoming the new hot thing... again (funny thing there - remember those mainframes earlier? They ran hypervisors too, WAY back in the 1960s) - the things that allow virtualization as we know it to happen - the scene rapidly changed. There are two types of hypervisors: Type 1, which runs on bare metal (think vSphere/ESXi, Xen, or Hyper-V), and Type 2, which runs on top of a host OS (VirtualBox, VMware Workstation, and all those little random apps you can install on your laptop - QEMU/KVM usually gets lumped in here too, though people argue about it). What they share is the ability to present virtual representations of physical hardware to a guest operating system, while making sure the requests sent to those virtual assets get handled by the physical assets under the hypervisor's control.

This lets you do neat things - like memory deduplication! You have 20 identical systems all loading the exact same thing into memory? Congrats - you now use roughly 1/20th the actual RAM (minus the overhead of keeping track of it), at least until what's in memory diverges. Granted, that's not a feature of every hypervisor, but it is something the technology makes possible.

Now we had powerful servers loaded with as many cores and as much RAM as possible, hosting 10, 15, 20 different guest operating systems, each running its own app. That little reverse proxy server that used to take up its own slot in a rack while using 3% of its CPU now shares a host with other servers soaking up another 70-80%. This allowed us to achieve a far greater density of compute.
Chasing more density
You'd think the above would be enough for people, but we still weren't content. We still had to spin up a full guest operating system for each server, and that started to seem just as wasteful - yet at the time we didn't really have anything to make it less so. Work began in earnest in the mid-2000s on ways to decouple processes from the operating system itself - ways to isolate them from one another and to control their resource usage - but it didn't really come to fruition until the mid-2010s. The technologies that made this possible were cgroups and namespaces.
CGROUPS, or control groups, are a Linux kernel feature that limits a process - or a whole group of processes - to a certain amount of IOPS, CPU time, RAM, network bandwidth, etc.
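To make that concrete, here's a minimal sketch - not from any particular project, just an illustration in Go - of what "putting a process in a cgroup" looks like at the filesystem level. It assumes a Linux box with cgroup v2 mounted at /sys/fs/cgroup, root privileges, and the cpu/memory controllers enabled for the parent group; the group name "demo" and the limits are arbitrary.

```go
// Sketch: capping a process with cgroup v2 by writing to /sys/fs/cgroup.
// Assumptions: Linux, cgroup v2 mounted at /sys/fs/cgroup, run as root, and the
// memory/cpu controllers enabled in the parent's cgroup.subtree_control.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	cg := "/sys/fs/cgroup/demo" // arbitrary example group
	must(os.MkdirAll(cg, 0o755))

	// Cap the group at 256 MiB of RAM...
	must(os.WriteFile(filepath.Join(cg, "memory.max"), []byte("268435456"), 0o644))
	// ...and at half of one CPU (50ms of runtime per 100ms period).
	must(os.WriteFile(filepath.Join(cg, "cpu.max"), []byte("50000 100000"), 0o644))

	// Move this process into the group; anything it spawns inherits the limits.
	must(os.WriteFile(filepath.Join(cg, "cgroup.procs"), []byte(fmt.Sprint(os.Getpid())), 0o644))

	fmt.Println("this process and its children are now resource-limited")
}
```

Container runtimes do essentially this - through more robust interfaces - for every container they start.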
Namespaces are basically compounds. Everything related to a specific namespace is restricted to that compound - it's their entire world. Think of the Truman Show: all a process can see is what's inside its namespace. Instead of systemd or init being process ID (PID) 1, for example, PID 1 is whatever process the namespace was created for.
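If you want to see that Truman Show effect for yourself, here's a minimal sketch in Go (Linux only, needs root) that drops a shell into fresh PID and UTS namespaces. Nothing about it is specific to any container runtime - it's just the raw kernel feature.

```go
// Sketch: start /bin/sh in its own PID and UTS namespaces (Linux only, run as root).
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// New PID namespace: the shell sees itself as PID 1 (`echo $$` prints 1).
		// New UTS namespace: it can change its hostname without touching the host's.
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWUTS,
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```

(One caveat: `ps` inside that shell will still show the host's processes, because /proc hasn't been remounted for the new namespace - container runtimes take care of that too.)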
The combination of the two allowed containerization to really happen, with Docker, containerd, and CRI-O being the main ones people are familiar with. We had Linux chroots, BSD jails, and Solaris Zones before that, sure - but containerization made things more standard. No longer were apps tied to the underlying OS. Need library x.y.z but the OS only has library a.b.c? No problem. In a container, the app doesn't even know lib a.b.c exists. Slap lib x.y.z in it and continue about your day. The container only needs those specific items the application needs to function. If it theoretically only needs the binary, two libraries, and a file in /etc for its config - then that's all you need in the container. This makes them very small. Perfect for density, as you aren't wasting disk space, RAM, and CPU on multiple VMs each running a full operating system and all that entails. While some containers include a minimal OS userland like Alpine or Red Hat's UBI for easy deployment of basic utilities, these still use the host's kernel.
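As an illustration of just how little a container can need, here's a toy service in Go - not from any real project. Built with CGO_ENABLED=0 it compiles to a single static binary, so an image containing it can be little more than that one file (plus maybe a config in /etc). The port and message are arbitrary.

```go
// Sketch: a tiny web service with no runtime dependencies beyond the kernel.
// Build it statically with:  CGO_ENABLED=0 go build -o hello .
package main

import (
	"fmt"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from a very small container")
	})
	// Listen on all interfaces; the container runtime maps the port outward.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		panic(err)
	}
}
```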
All of that matters for one big reason - deploying software. No longer do you have to worry about how a particular server is set up, or whether the multiple services running on it have conflicting library requirements. Instead of developers saying "Well - it worked on my dev box," you now have a packaged application that will work the same no matter where you're deploying it.
A Big New Problem
With the expansion of containers, three servers that might once have hosted 20-30 VMs may now host 300 containers, each providing its own service or function - from the tiers of a web app down to a single API. Managing these containers at scale - ensuring they have access to the namespaces they need, ensuring the ones that depend on each other can communicate, ensuring they all get deployed together - became a large issue. That led to the orchestration revolution. Docker was the big name at the time and put its efforts towards Docker Compose and later Docker Swarm - in my opinion a far simpler methodology to deal with. However, when it comes to bending an orchestrator to your will - let's face it - nothing is more capable than Kubernetes (K8s). There's a reason both of the major platforms for making containers easy to manage - Rancher and OpenShift - are built on Kubernetes (whether in its full form or in lighter-weight distributions like K3s).
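Most of the time you'd express this declaratively in YAML, but to give a flavor of what "bending Kubernetes to your will" looks like from code, here's a hedged sketch using the official Go client (client-go) to create a Deployment of three replicas. The names, namespace, image, and kubeconfig location are all made-up examples, not anything from a real environment.

```go
// Sketch: creating a Kubernetes Deployment programmatically with client-go.
// Assumes a kubeconfig at the default location (~/.kube/config).
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	// Load the local kubeconfig and build a typed clientset.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Three replicas of a single-container pod running nginx.
	labels := map[string]string{"app": "demo"}
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(3),
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "web", Image: "nginx:1.25"}},
				},
			},
		},
	}

	// The orchestrator takes it from here: scheduling, restarts, scaling.
	_, err = clientset.AppsV1().Deployments("default").Create(
		context.TODO(), deployment, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```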
Parting words and Recommended Reading
So that's how we got here. Containers are the way going forward - at least until something else old becomes new again. They've greatly simplified deployments from a dependency perspective - but now the pain point has shifted to the configuration side. YAML is the new king and you'll be seeing plenty of it as we continue along our journey through the other revolutions brought on in the last decade - Infrastructure as Code and Automated Configuration Management.
For additional reading I'll start with the 800lb gorilla in the room: What is a Hypervisor? | VMware Glossary
I really can't recommend this entire section of the Red Hat site enough: Containers explained: What they are and why you should care (redhat.com). Red Hat has been a major contributor to and proponent of containers. They provide one of the two most used platforms for deploying containers and making them easy to manage (the other being Rancher by SUSE).
For those that love actual books, my standing desk-reference recommendation is The Kubernetes Book by Nigel Poulton (ISBN 9798402153776).