Software Clusters for the Future
A watershed moment is when everything changes.
For the development of software applications, this happened in 2004 when a couple of significant events unfolded:
- The law of diminishing returns caught up with CPU clock speed, challenging the steady performance gains that had been taken for granted since Moore’s law was formulated in 1965. Intel moved away from its single-core roadmap for the first time, making it obvious that performance gains from single-core machines would be limited from then on. The following graph shows the stunted growth in clock speeds:
Courtesy: The Free Lunch Is Over
- The advent of the MapReduce architecture, and the unveiling of its implementation for search at Google, showed what was possible for distributed applications. Commodity hardware working in unison became a practical option, ready to be explored for problems that had not been solved before.
The paradigm of encapsulating computations and shipping them to data residing on different machines brought a fresh look at problems that operate on massive datasets.
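The idea of shipping computation to the data can be sketched in a few lines. This is only a single-process illustration, not Google's implementation: the `PARTITIONS` list is a hypothetical stand-in for data living on three machines, and the function names are made up for this example. Each partition is processed where it "lives", and only the small partial results are merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for lines of text stored on three different machines.
PARTITIONS = [
    ["the quick brown fox", "the lazy dog"],
    ["the fox jumps"],
    ["dog and fox"],
]


def map_partition(lines):
    """Map step: count words in one partition, next to the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts."""
    return left + right


def word_count(partitions):
    # In a real cluster each map_partition call would run on the node that
    # holds the partition; here the calls simply run in a loop (assumption).
    partials = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partials, Counter())


totals = word_count(PARTITIONS)
print(totals["fox"], totals["dog"])
```

The point of the design is that raw data never moves: only the function and the compact per-partition counts cross the (here imaginary) network.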
These events made the software community take notice. In fact, the thought now was that the whole data center could potentially become a single scaling domain for applications, as so eloquently described in this seminal document by folks at Google.
Subsequently, open source software built to scale horizontally started appearing on the horizon.
While these events were just the beginning, they opened the floodgates of innovation. New software paradigms emerged at an accelerating pace and were adopted quickly. Over the next few years, there was a surge in stateless applications built on stateful cluster components. The following are some of the building blocks of the modern software stack that appeared in open source, abstracting away the services needed to write a carrier-grade application:
- Notification Service: Kafka, RabbitMQ, ZeroMQ
- New Storage Paradigms: Cassandra, MongoDB, CouchDB, Riak, Neo4j
- Cache: Memcached, Redis, NCache
- Search Service: Elasticsearch
- RDBMS: CockroachDB, Clustrix
- Time-series Database: InfluxDB, Riak, OpenTSDB, Druid, Graphite
Even though readily available open source components made it easy to get applications that consume them up and running, operating clusters of these infrastructure elements remained a daunting task, which gave rise to the whole DevOps movement.
Most of these services are now available on the public cloud as managed services, which removes much of the friction of cluster management.
A Perfect Cluster Storm for Mass Consumption
We are at another watershed moment. Over the last couple of years, a perfect storm has been brewing again: the desire to operate applications built around clustered components, combined with the need to optimize the hardware (to eke out the last bit of performance), has necessitated new software innovations.
The convergence of the need for clusters and the need to operate them with less friction is evident; we are on the cusp of the next application paradigm, which is going to be built on top of this rapidly evolving ecosystem. The components that will lead the way are:
- Cluster Manager: Kubernetes, Mesos, Swarm
- Containers: Docker
- Serverless: OpenWhisk, Iron.io, Fission.io
These three components, stitched together with the right infrastructure and married to the right business logic to solve customers' automation problems, will drive the next phase of growth. The whole loop is explained rather nicely in the picture below (courtesy):
The future belongs to products that can be sliced and diced on a customer's assembly line with little to no intervention (to solve their use case), so that their assets can be automated to extract the maximum possible value. The next couple of years hold promise for software engineers to home in on the clusters for the future.
Note: Hiring engineers throughout the stack: UX, UI (Angular, React), distributed systems (Kubernetes, microservices, Kafka, etc.), and machine learning, to build an awesome team working in unison to offer delightful products for customers. If interested, send your resume to vibhu(dot)pratap(at)gmail(dot)com
Note++: Many thanks to my friends Romil Khansaheb & Sachin Rao for helping me refine some of the thoughts as well as Hari Harikrishnan for taking the time to have a discussion.