Clearing Cloud Confusions
Cloud conundrum
... foolish not to move to cloud, foolish to stay in cloud ...
Enterprises and startups alike go through a honeymoon period. It starts when they enjoy the hard and soft benefits of the cloud - agility, cost, security, and great best-of-breed features - while their cloud usage is low to moderate. Then, as scale goes up, costs climb quite fast...
...and all of a sudden, on-premise infrastructure looks very appealing.
This causes a major dent in the 'Cost of Revenue' metric - which directly impacts market cap, which in turn depends on free cash flows. If you're a private company, your VCs will apply pressure; if you're a listed company, Wall St. will not like it. As a result, a massive panic button gets pressed and a new initiative to repatriate back on-premise gets underway. For example, Dropbox did exactly this: over a two-year period (2015 to 2017) it saved $75M while seeing gross margins increase from 33% to 67%, which they noted was “primarily due to Infrastructure Optimization...”.
Knee-Jerk Repatriation
I think wholesale repatriation is a mistake for 99% of companies out there. We need to look at the hits to gross margin in totality, and I would suggest that instead of ROI we look at ROA, because it is not just about infrastructure cost savings. What about the additional impact of gaining new talent and skills, developing new processes, procuring new technology know-how, facilitating knowledge transfer, and the other softer operating model nuances that need to be conscientiously thought through? It's one thing for Dropbox, a digitally native, iconic Silicon Valley company, to hire and figure out an operating model - but what about a legacy enterprise?
Unless enterprises (and even scaled-up startups) take responsibility for usage - by improving chargeback efficiency, infrastructure optimization, utilization observability, and cost governance, and above all by looking at cloud through an ROA lens rather than an ROI lens - we will always fall back on repatriation as a knee-jerk strategy. I believe it will not be successful organizationally, because rolling back such a footprint is traumatic to an already traumatized workforce amidst the great resignation, COVID stresses, mental health struggles, burnout and so forth.
Educate CFOs on the end-to-end Cloud operating model
Unfortunately, companies that have scaled out are under immense pressure to repatriate very quickly to solve the gaping financial repercussions I've mentioned. But the alternative picture (life without cloud - managing your own infra) is never seriously considered down to the grassroots. I believe CFOs and business execs need to truly immerse themselves in studying the operating model 'with and without' cloud. They should learn from Jeff Bezos, who stood by Amazon's (lack of) performance for a dozen years because he had conviction and deep familiarity with the pros and cons of his initiatives. Unfortunately, CFOs today are only looking at the Wall St. angle. Simply put, if they just inserted a set of 'if we moved out of cloud' slides depicting the full additional repercussions - hiring new people, defining a new operating model, and other softer impacts - we could train Wall St. how to think and develop a certain patience. This would separate the gross-margin inflaters from truly sustainable businesses focusing on innovation, and eventually bring more authenticity and alignment between financial management and digital strategy.
Habit Changes are the sustainable answer
The onus should be on managers and leadership to institute first-class infrastructure KPIs that incentivize a change in the way things are done (in the same way SPIFFs exist for salespeople to upsell or cross-sell products more aggressively). This will empower makers (engineers) to take more responsibility for the costs associated with the products they're building. Tracking how an engineer saves cloud spend by optimizing their footprint and better managing infra utilization can make them eligible for a spot bonus - a tried-and-true method in many leading-edge Silicon Valley startups that use financial incentives to stay ahead of the curve.
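To make the incentive idea concrete, here is a minimal sketch of what such a savings-based KPI could look like. All team names, spend figures, and the bonus threshold are hypothetical, purely for illustration - real programs would draw from actual billing exports and HR policy.

```python
# Hypothetical sketch: tracking per-team cloud savings to drive a spot-bonus KPI.
# Figures and the threshold are illustrative, not from any real billing data.

BONUS_THRESHOLD = 0.15  # e.g., a >=15% month-over-month reduction qualifies


def savings_ratio(last_month_spend: float, this_month_spend: float) -> float:
    """Fraction of last month's spend that the team eliminated."""
    if last_month_spend <= 0:
        return 0.0
    return (last_month_spend - this_month_spend) / last_month_spend


def bonus_eligible(team_spend: dict) -> list:
    """Return teams whose optimization work crossed the bonus threshold."""
    return [team for team, (prev, curr) in team_spend.items()
            if savings_ratio(prev, curr) >= BONUS_THRESHOLD]


spend = {
    "payments": (120_000.0, 95_000.0),  # ~21% savings: eligible
    "search":   (80_000.0, 78_000.0),   # ~2.5% savings: not eligible
}
print(bonus_eligible(spend))  # ['payments']
```

The point is not the arithmetic but the visibility: once savings are attributed to the team that produced them, the incentive conversation becomes possible.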
When I hear about cloud cost analytics (i.e., FinOps) practices emerging, it is very encouraging - BUT we need to look at the context behind such initiatives and who is driving them. Are VPs of Engineering/CTOs/CIOs driving them, or is the CFO? Let technical leaders drive it (without pressure), while CFOs endorse and empower habit-changing incentives, in order to save millions within months (and I'm speaking from experience).
Become Cloud Smart from the get-go
Ok - so - I know I have been hard on CFOs for ...doing their job... but let's reframe towards a 'Cloud Smart' architecture, where the realities of current AND projected future costs (i.e., infra, maintenance, software, network, storage) and functional requirements (i.e., latency, data gravity, runtime constraints and so on) are taken into consideration. (Credit to Greg Ogle, who originally coined 'CloudSmart'.)
Ultimately it ends up in a multi-hybrid cloud footprint with globally distributed business processes amongst dozens of global applications, services and workloads ranging from custom applications to ERP, CRM, HRM, and other large on-premise systems and databases, to microservices on public cloud IaaS, SaaS providers, HCI, traditional servers and everything in between.
So the moral of the story: Think Hybrid Cloud, Think Cloud Smart. Do not put all your eggs in one basket, and if you have, it's never too late to hedge and invest in a hybrid Cloud Smart strategy. Of course, as you do so, you must understand the implications for infrastructure...
The Infrastructure that got you here, won’t get you there
This is my (highly scientific) take on how the infra world has evolved as applications have shifted to hybrid cloud powered modularity and API-enabled microservices...
The takeaway from the picture above is that a hybrid cloud footprint requires a sea change in storage and network infrastructure, which I don't believe has caught up with this evolution (unlike compute) - and I would like to share some thoughts about that now. This is not an exhaustive analysis of storage/network, but rather my opinion on the most overlooked aspects...
Network
Network Interconnectivity in an Edge Computing Age
Emergence of SD-WAN in an edgy world
Imagine composite, globally distributed workloads, (e.g., master-slave setups for databases, active-active global deployments of middleware/integration and API lifecycle software, enterprise applications, microservices and web products and so on), all needing to coordinate and talk to each other between Cloud to Cloud, On-Prem to On-Prem and On-Prem to Cloud connections across various runtimes on private and public IP spaces.
Let's take a scenario where 'Org A' uses a combination of AWS, GCP and Azure across a range of IPs (representing workload runtimes) in each respective cloud, and 'Org B' is heavily on-premise (private IP space). Now Org B wants to connect its workloads to Org A's seamlessly, almost as if they were on one network (without needing to leave it). This is possible by virtually extending Org B's private network: deploying an SD-WAN into Org A's network spaces in AWS, GCP and Azure allows for an effective overlay across both private and public spaces. It is going to be critical to do this over private connections (as opposed to over the internet) because they provide greater security, performance and reliability. This becomes even more powerful if you have SaaS vendors in place along with multiple colocations (i.e., remote offices, IoT factories etc.) - in which case one can deploy SD-WAN at the edge to virtually create a single network plane between various On-Prem, IaaS, PaaS and SaaS locations.
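One unglamorous but essential pre-flight step in stitching networks together like this is checking that the private IP spaces do not collide. The sketch below, with entirely made-up CIDR ranges, shows how that check could be automated before extending Org B's address space over an overlay:

```python
# Illustrative pre-flight check before stitching an SD-WAN overlay: the
# on-prem private IP space and each cloud VPC/VNet CIDR must not overlap,
# or the extended network plane will have routing conflicts.
# All CIDR ranges below are hypothetical.
import ipaddress

org_b_onprem = ipaddress.ip_network("10.0.0.0/12")  # Org B's private space

org_a_clouds = {
    "aws-us-west-2": ipaddress.ip_network("10.16.0.0/16"),
    "gcp-us-east1":  ipaddress.ip_network("10.32.0.0/16"),
    "azure-eastus":  ipaddress.ip_network("10.8.0.0/16"),  # collides with on-prem!
}


def conflicts(onprem, clouds):
    """Return the cloud networks whose CIDR overlaps the on-prem space."""
    return [name for name, net in clouds.items() if onprem.overlaps(net)]


print(conflicts(org_b_onprem, org_a_clouds))  # ['azure-eastus']
```

Catching the Azure collision up front means re-addressing one VNet, rather than debugging asymmetric routing after the overlay is live.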
Another benefit of using private piping is the opportunity to dramatically save on incremental data transfer costs across multiple cloud regions (e.g., across AWS West 1, West 2, and Azure East), not to mention egress charges as an additional incremental cost - all accumulating into a potentially large cloud data exchange bill...
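A rough back-of-the-envelope model makes the accumulation visible. The per-GB rates below are placeholders, not actual provider pricing (real interconnect costs are typically port fees amortized over traffic, and egress tiers vary), but the shape of the math holds:

```python
# Back-of-the-envelope sketch of how cross-region and egress transfer
# charges accumulate at scale. Per-GB rates are ASSUMED placeholders,
# not real provider pricing.

RATES_PER_GB = {
    "intra-cloud-cross-region": 0.02,
    "internet-egress":          0.09,
    "private-interconnect":     0.02,  # assumed port cost amortized per GB
}


def monthly_transfer_cost(gb_moved: float, path: str) -> float:
    """Dollar cost of moving gb_moved gigabytes over the given path."""
    return gb_moved * RATES_PER_GB[path]


gb = 500_000  # half a petabyte exchanged between clouds per month
for path in ("internet-egress", "private-interconnect"):
    print(path, monthly_transfer_cost(gb, path))
```

Even with these toy rates, the gap between public egress and a private interconnect at half a petabyte per month is tens of thousands of dollars, monthly and compounding.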
Moral of the story: treat network interconnectivity as a first-class issue, and plan for establishing private, secure and performant network interconnectivity between and across public clouds and on-premise data centers globally. A good example of a platform that helps organizations with this is Equinix.
Storage
Gartner has said that ultra-low-latency requirements will drive demand for edge data center infrastructure, where more than 50% of data will be generated and processed. So now the big question is how to solve for data and, subsequently, storage. When we think about storage in a hybrid cloud world, there are predominantly two types: object storage and block storage. (I know this well - hands on - from my days as a product manager for an OpenStack cloud.)
I want to shed a little light on a couple of innovations that I see in the storage world, as it relates to building distributed modular applications in a hybrid cloud world...
De-centralized Storage: network led storage fit for a hybrid world
There is an interesting new phenomenon occurring in the object storage world, made possible by harnessing a distributed network of tens of thousands of independent storage providers and enterprises with unused storage/bandwidth. Vendors such as Storj do exactly this: they have established a control plane and enterprise-grade storage where, for example, they take a file, encrypt it, break it up into fragments and distribute them across their storage network of over 11,000 nodes. Whenever a customer needs to retrieve it, the algorithm pieces the file back together from all the places (nodes) it was fragmented across, within a defined SLA interval, and sends it back in one piece.
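The fragment-and-reassemble flow can be sketched in a few lines. This toy version only splits, scatters, and integrity-checks; real systems like Storj additionally encrypt client-side and use erasure coding so that only a subset of fragments is needed to reconstruct the file:

```python
# Toy sketch of the fragment/reassemble flow described above. Real
# decentralized storage encrypts and erasure-codes; here we just split
# a blob into indexed fragments and verify integrity on reassembly.
import hashlib


def fragment(data: bytes, n: int) -> dict:
    """Split data into n indexed fragments (as if sent to n storage nodes)."""
    size = -(-len(data) // n)  # ceiling division
    return {i: data[i * size:(i + 1) * size] for i in range(n)}


def reassemble(fragments: dict, checksum: str) -> bytes:
    """Stitch fragments back in order and verify against the checksum."""
    data = b"".join(fragments[i] for i in sorted(fragments))
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError("integrity check failed")
    return data


blob = b"customer file destined for the decentralized storage network"
digest = hashlib.sha256(blob).hexdigest()
pieces = fragment(blob, 5)  # scattered across 5 hypothetical nodes
assert reassemble(pieces, digest) == blob
```

The hard engineering in real systems lives in what this sketch omits: repairing lost fragments, meeting retrieval SLAs, and keeping the node network honest.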
This is a win-win: storage providers (enterprises, individuals) develop new revenue sources by leasing out spare storage, while customers get economical, secure storage at the edge. Key to such plays are the data orchestration algorithms, quality and SLA policies/promises, and security. So far this emerging decentralized storage space is very promising, and I predict it will become a major mainstay for economical, secure, globally distributed and reliable object storage across AI/ML, multimedia streaming and data distribution requirements.
Traditional Block Storage is Broken ( readiness for Hybrid Cloud )
Let’s remind ourselves why enterprises bought the block storage systems in place today: it was for scale-up oriented applications, with a stable operating model of "once in a while" provisions and decommissions and a steady cadence of generally predictable maintenance. Now imagine hundreds, if not thousands, of containerized microservices around the world in various geographies, floating around an enterprise with storage getting provisioned and decommissioned at an exponentially higher rate, while constantly requiring patches, upgrades and policy changes at an unpredictable level of change. Not pretty.
Ultimately, developers will struggle to force-fit disaggregated cloud-native datasets into monolithic, shared-dependency storage systems. They will get pulled into discussions and meetings about shared data infrastructure. In addition, they will need to account for discrepancies between various storage APIs and for how storage systems' behaviors differ across on-premise and cloud infrastructures, which limits workload mobility. Moving from a 'dev' environment in one cloud to 'prod' on another cloud or on-prem requires rewiring storage - and eventually the onus will be on developers to explicitly manage complex IT landscapes, leading to further stress and burnout, because it's not their job!
So, as a Band-Aid, enterprises are moving to Kubernetes, in part to free themselves from lock-in to legacy system vendors and particular clouds, but also to empower developers to control infrastructure as they like. From a storage perspective, the Kubernetes Container Storage Interface (CSI) is supposed to make storage look and feel more ‘cloud native’ - it arose out of the need for traditional storage to play well with cloud-native stacks. As a result, all the traditional storage vendors are playing nice and ship Kubernetes CSI drivers, improving the perception that they are Kubernetes-native and, more importantly, making CFOs happy by leveraging heavy existing investments in block storage appliances to serve as the storage plane for k8s workloads. But in my opinion, in reality, this is lazy, because it ultimately creates a fundamental impedance mismatch in terms of process, operating model and architecture. Let me explain...
From an operating model perspective: DevOps, as we all know, is all about loosely coupled teams. Well, these teams can only be loosely coupled IF their systems are loosely coupled (Conway's law), and putting everyone's shared state into a shared box or cloud service breaks this paradigm. Instead of my 2-pizza team being in control, I am now arguing with other teams about whether we should buy storage system A, which is good at write-ahead logs and so forth, vs. storage system B, which handles my need for extraction better. So essentially what I'm saying is that CSIs have pushed the problem down to poor storage admins. If we are to truly free developers to innovate and unleash the actual intention and power of Kubernetes, a new block storage approach is required...
Container Attached Block Storage is the answer
While CSIs are a healthy step forward, until and unless container-native storage (distributed storage at the container level) is implemented - thereby giving every team their own mini storage system and putting them back in control - the promise of Kubernetes for storage orchestration will always be limited.
Let me explain - please stay with me and hold your breath. Architecturally, a Kubernetes pod (where applications run in containers) puts out a 'claim' for storage, called a Persistent Volume Claim (PVC). A PVC draws from a pre-defined set of templates or configurations defined by storage classes - another Kubernetes construct - where there can be a storage class for slow data, fast data, streaming data, batch data and so forth, each declaring the infra characteristics for the same.
Ok, enough technical speak... you can breathe again. The beauty of container attached storage is that it loosely couples PVCs to the worker nodes, allowing for granular control of the underlying storage and flexible scheduling - working within workflow constraints and accommodating policies when implementing these PVCs. This stands in opposition to centralized storage models, where the associated PVCs are tightly coupled to the worker nodes, which limits how they can be safely rescheduled and orchestrated per policies and workflow constraints, thereby defeating the purpose of being on Kubernetes in the first place! (And I haven't even begun to talk about cross-AZ redundancy, fault tolerance, backup/DR, replication etc.)
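For readers who have not seen these constructs, here is what the StorageClass/PVC pairing described above looks like, expressed as Python dicts mirroring the Kubernetes manifests. The class name, provisioner string, and sizes are hypothetical; in a real cluster the equivalent YAML would be applied with kubectl:

```python
# Sketch of the StorageClass/PVC constructs described above, as Python
# dicts mirroring Kubernetes manifests. The "fast-data" class, the
# provisioner string, and all sizes are HYPOTHETICAL examples.

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast-data"},
    "provisioner": "example.com/container-attached",  # hypothetical driver
    "volumeBindingMode": "WaitForFirstConsumer",  # bind where the pod lands
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "orders-db-data"},
    "spec": {
        "storageClassName": "fast-data",  # references the class by name
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "20Gi"}},
    },
}

# The pod's claim names the class; the scheduler plus the storage engine
# then decide where the volume actually materializes.
assert pvc["spec"]["storageClassName"] == storage_class["metadata"]["name"]
```

The `WaitForFirstConsumer` binding mode is one concrete lever here: it delays volume creation until the pod is scheduled, which is exactly the kind of placement flexibility that tight coupling to a central appliance takes away.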
Moral of the story: centralized storage vendors, despite their CSI conformance, do not align well with decentralized, loosely coupled architectures where data disaggregates at a fine level of granularity and control. Do not be fooled.
Container attached storage is a relatively untapped architecture because, as originally conceived, containers couldn't save state - they were designed for stateless workloads. They were just supposed to do their job and disappear. If they performed any operations involving data coming from or going to somewhere else, they were given the data by another process or service, and in turn handed the result off to some other process. Good container attached solutions are still broadly unavailable, despite best efforts from solutions like OpenEBS (a CNCF project), and others such as MinIO and Portworx, which was acquired by Pure Storage. I believe this is still a white space for a startup to truly innovate in - but not for the faint of heart, as storage is very, very difficult.
Summary
It is key to widen the aperture of cloud financial management beyond just FinOps (analyzing IaaS/PaaS costs in and of itself) and take into account the holistic operating model implications of hosting vs. outsourcing infrastructure. I believe doing so will dramatically decrease repatriation of workloads from public cloud to on-premise, and key to this is educating CFOs appropriately so they can stand behind cloud strategies. Inevitably, taking a hybrid cloud approach is critical to strike a balance - and while doing so, go in eyes wide open that you need to shift your approach to both storage and network as your cloud footprint grows.