Data, The Silver Lining of Cloud
To transform interactions with customers, employees, and partners, many enterprises are expanding their adoption of cloud at an accelerated pace. Cloud has proven itself during times of uncertainty with its resiliency, scalability, flexibility, and speed. Cloud underpins most of the new technological disruptions that have helped businesses be adaptive, giving them the ability to pivot quickly to a new opportunity, industry, customer base, or revenue stream. As per the Harvard Business Review Analytic Services survey [1], 83% of respondents say that cloud is very or extremely important to their organization's future strategy and growth. Gartner forecasts end-user spending on public cloud services to grow by 21.7% to reach $482 billion by the end of 2022 [2].
But while the potential benefits of adopting cloud are numerous, so are the challenges, such as security, costs, vendor lock-in, and skills availability, which need to be addressed.
How secure are my data and applications in the cloud?
Security has been a major concern for enterprises. 66% of IT professionals consider security to be a major challenge to cloud adoption [3]. The biggest concern enterprises have is storing their sensitive data with a third-party cloud services provider. The risk of damage to the business if data is lost, leaked, or exposed weighs heavily on the minds of decision makers. But the reality is that large public cloud services providers invest significantly in security, with top-of-the-line perimeter firewalls, intrusion detection systems, internal firewalls for individual applications and databases, data-at-rest encryption, and more, making their security more robust and reliable than most individual organizations could afford. As this becomes common knowledge, the perception of the cloud as insecure is slowly fading away.
Is public cloud really cheaper?
It seems intuitive that by sharing resources to smooth out peaks, paying only for what is used, and cutting upfront capital investment in deploying IT solutions, the economic value will be there. But almost every enterprise has already invested significantly in building data centers, and there is no separate "operating cost" for using on-premises computing and network infrastructure. Moving to cloud, on the other hand, requires no upfront investment, but there is the ongoing cost of using cloud resources and services, not to forget the cost of the transformation required to migrate applications and data to cloud. Sooner or later, there will come a time when the cost of migration and the accumulated operating costs exceed the initial investment plus the cost of maintaining the on-premises data centers. How soon that happens is always a matter of how, and how much, you use cloud resources. That said, many enterprises have maxed out their data center capacity and are not able to accommodate the increasing demand for storage and compute. Adopting cloud in such scenarios is an easier option than spinning up a new data center altogether.
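A back-of-the-envelope calculation makes this crossover point concrete. All figures in the sketch below are hypothetical, purely for illustration, and every organization's numbers will differ:

# Hypothetical break-even sketch: month at which cumulative cloud spend
# overtakes cumulative on-premises spend. All figures are made up.
dc_investment = 2_000_000    # capital already spent on the on-premises data center
on_prem_monthly = 35_000     # ongoing maintenance of the owned data center
migration_cost = 500_000     # one-time cost to migrate applications and data
cloud_monthly = 60_000       # recurring cloud resource and service charges

# Crossover month N solves:
#   migration_cost + cloud_monthly * N > dc_investment + on_prem_monthly * N
crossover = (dc_investment - migration_cost) / (cloud_monthly - on_prem_monthly)
print(f"Cloud spend overtakes on-premises spend after ~{crossover:.0f} months")  # ~60 months here

How you use cloud resources (rightsizing, committed-use discounts, turning off idle capacity) directly changes the cloud_monthly figure, and therefore how soon, or whether, this crossover happens.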
“If it’s not broke, why fix it?”
Moving existing legacy systems to cloud platforms is a difficult and high-cost process. Most legacy systems may need to be completely re-architected and rebuilt to use cloud resources and services optimally and effectively. But it is quite possible that the knowledge about these legacy systems is outdated, imperfect, or undocumented. Hence, the biggest challenge is gaining an in-depth understanding of their current state, which is important for a better migration outcome. When I was putting together a cloud migration strategy for one of my clients, the legacy estate included a set of applications that were built decades ago and never changed, for which only binary executables remained. Though they ran fine, no source code was available to understand the business logic of these applications.
Do I have the people with right skills?
One of the most pressing challenges in adopting cloud is talent availability. In PwC's 23rd Annual Global CEO Survey [4], 74% of respondents were concerned with a lack of availability of the right skills. Of those, 32% were "extremely concerned". The survey also found that organizations that focused on expanding their employees' skills were ahead of their peers in many ways and were more confident about their future.
With the advent of cloud, formulating an information technology (IT) strategy has suddenly become a lot more complex. CIOs and CTOs are expected to address frequent and often unpredictable variations in demand for IT resources quickly and, in many cases, without long-term commitments. And all of this must be delivered with predictably declining year-over-year costs. Cloud computing opens up significant benefits in time-to-market and the cost of IT services. But there are challenges and obstacles to overcome along the way. The strategy must be balanced and match the corporate appetite for risk versus reward. CIOs and CTOs are, therefore, on the lookout for the Silver Lining of cloud computing.
What does the cloud have to offer for data?
For a long time, enterprises have seen value in data, which has helped them make informed business decisions and develop their business.
In recent years, the phenomenal advancement in data technologies and the availability of affordable data storage and processing capabilities have enabled businesses to use data beyond business insights and decision support, to gain competitive intelligence on customers and demand.
This rapid evolution of data and the way companies are using it to change their business models is driving powerful change in the digital economy. With this significant new relationship between data and business strategy, data is now seen as an asset on the same scale as the traditional assets of an enterprise, like real estate, cash, intellectual property, and human resources.
Companies that realize the true value of their data and leverage it are seeing continued growth, gaining market share, and disrupting older legacy businesses that have been slower to adapt to these trends.
For many decades, people have been using data analysis and analytics techniques to support their decision-making processes. As long as data growth was linear, limited mostly to data collected from transactional systems, and did not vary much in structure, a combination of database management systems and visualization tools for applying statistical analytics techniques was enough. However, in the last two decades, the volume, variety, and speed with which data is generated have changed significantly. To extract value from this data, the analytics techniques applied have also become more advanced, such as predictive analytics and user behavior analytics; all of this together is termed Big Data. Organizations started to see value in using these advanced techniques and perceive big data to be critical for a wide spectrum of strategic corporate goals, from new revenue generation and new market development to enhancing the customer experience and improving enterprise-wide performance. Big data is definitely disruptive and potentially transformational. It has the ability to revolutionize business. In a survey conducted by Accenture, 93% of respondents feel big data is important for their organization, of which 59% feel it is extremely important. 79% of respondents agree that companies that don't embrace big data may lose their competitive advantage and might even go extinct [5].
Cloud Computing makes it easier and cheaper to get more value from Big Data
Storing ever-growing, huge data sets needs a lot of storage, and analyzing them needs far more compute power than most enterprises can easily afford. Using standard data center servers to analyze them would be either impossible or impractical due to the amount of time it would take, or they may even lack the capabilities needed for advanced analytics solutions.
Cloud computing has made it easy to rent state-of-the-art infrastructure and only pay for the time and power that you use. Additionally, cloud computing can provide fast load times, scalability, automated application deployment, multiple back-ups, and constantly updated hardware. Building this kind of always-on, always-growing, always-up-to-date capability in-house would be prohibitive for enterprises with tight budgets.
One of my clients was developing a real-time traffic information system to analyze and predict traffic patterns on busy roads using the mobile GPS locations of travelers. The model used to take about 190 hours (about 8 days) to train on a cluster of 8 HPE ProLiant DL380 G10 Intel Xeon servers (1 master, 6 workers, 1 parameter server). To make it run faster, the estimate was to add another 10 workers to the cluster. Instead, when we moved the processing to the Google Cloud (GCP) AI Platform using the PREMIUM_1 scale tier, we got access to NVIDIA Tesla K80 GPU machines and a 31-node cluster (1 master, 19 workers, 11 parameter servers), and the training for the same model completed in just 10 hours. The cost of this execution was only about $150 (this is a highly simplified version of the AI/ML solution; please contact the author for more details or for building an AI/ML solution for your scenario).
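For reference, submitting such a training job is essentially one API call. The sketch below uses the AI Platform Training jobs API and the PREMIUM_1 scale tier mentioned above; the project, bucket, and trainer package names are placeholders, and Vertex AI now offers an equivalent custom training API:

# Hypothetical sketch: submitting a distributed training job to AI Platform
# Training. Project, bucket, and package names are placeholders.
from googleapiclient import discovery

ml = discovery.build("ml", "v1")
job = {
    "jobId": "traffic_prediction_training_001",
    "trainingInput": {
        "scaleTier": "PREMIUM_1",  # many workers plus parameter servers
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        "pythonModule": "trainer.task",
        "region": "us-central1",
        "jobDir": "gs://my-bucket/traffic-model",
        "runtimeVersion": "2.1",
        "pythonVersion": "3.7",
    },
}
request = ml.projects().jobs().create(parent="projects/my-project", body=job)
response = request.execute()
print(response.get("state"))  # typically QUEUED right after submission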
If you need to deliver content to a range of devices in a matter of seconds, you will need a live-streaming solution. In most streaming scenarios, there is a certain level of volatility related to audience size. For an internal audience with somewhat predictable traffic, an on-premises deployment might be perfectly suitable. But if you have a much larger audience on a global scale with occasional peaks, you will have to plan capacity for those peaks, keeping that capacity idle most of the time.
Conversely, cloud-based deployments can scale your capacity up and down in minutes, while also cutting costs for hardware and maintenance.
Cloud provides cheaper alternatives to expensive data warehouse appliances
A data warehouse appliance is a combined hardware and software product designed specifically for big data analytical processing, delivering a high-performance data warehouse right out of the box with pre-configured redundancy and availability. Most DW appliances use massively parallel processing (MPP) architectures to provide high query performance and platform scalability. Teradata, Oracle Exadata, SAP HANA, and Netezza are some of the commonly used data warehouse appliances. At least, that was the case until now.
The biggest challenge with these systems is that they are expensive and need careful capacity planning, since scaling is achieved by expanding capacity, and the lead time to procure, the real-estate requirements, and the downtime add to the complexity. In today's world, with the ever-increasing requirements for storage and compute for analytic applications, and budget constraints making it difficult to scale up additional appliances to expand storage, analytical workload capacity, or disaster recovery, enterprises are looking for alternative solutions.
Cloud offers attractive alternatives to data warehouse appliances with better economics, such as pay-as-you-go, and better scale (elasticity and the ability to expand a cluster within minutes). Cloud offers multiple adoption possibilities: lift-and-shift from an on-premises DWA to a similar DW on cloud (for example, from a Teradata DWA to Teradata Vantage on cloud), embracing a cloud-native DW like Redshift on AWS or BigQuery on Google Cloud, or using cloud as a secondary platform for the purposes of disaster recovery.
Cloud encourages data-driven innovation at scale
Cloud has turned out to be much more than just rented, scalable infrastructure. It has proven to be a catalyst for business innovation, reinvention, and growth by helping businesses derive insights from real-time data, enabling real-time and event-driven decision-making, and allowing them to quickly capitalize on new opportunities. Cloud has encouraged innovation at scale by making it easy for broader teams to create new products, services, and experiences, while simultaneously benefiting from others' creativity.
In my opinion, cloud has accelerated innovation in the following ways:
Democratization of data
Data is the fuel that powers nearly all business innovation today. And cloud is the perfect platform to quickly tap into your data and create new insights using advanced data science. As per the Google Cloud/Harvard Business Review paper [6], 97% of industry leaders surveyed said democratizing access to data and analytics across the organization is important to business success.
Data democratization means making data accessible to the average non-technical user. One of the biggest barriers to data democratization has always been the lack of user-friendly tooling. An average user had to wait for the data to be available in the data warehouse and eventually in the reports.
Cloud provides radically simplified and intuitive tools empowering users to generate insights by leaning into the tools and skills they already have.
Data wrangling tools help users visually explore, clean, and prepare structured and unstructured data for analysis, reporting, and machine learning.
Users can drag and drop data from a variety of sources, structured or unstructured, and apply rich transformations like aggregation, joins, union, extraction, calculation, comparison, condition, merge, regular expressions, and more with a click of the mouse. They can visually identify data anomalies such as missing values, outliers, and duplicates, and easily correct them.
In my view, the data wrangling tools provided by most CSPs, like Dataprep on GCP or Glue DataBrew on AWS, are an important feature of cloud for enabling the democratization of data.
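To make the kinds of operations these tools offer concrete, here is what a few typical cleanup steps look like when written as code. This is a hypothetical pandas sketch with made-up file and column names; the wrangling tools above expose the same operations visually, with no code required:

# Hypothetical pandas sketch of common wrangling steps; visual tools such as
# Dataprep or DataBrew expose these same operations through point-and-click.
import pandas as pd

orders = pd.read_csv("orders.csv")        # placeholder source files
customers = pd.read_csv("customers.csv")

orders = orders.drop_duplicates()                        # remove duplicate rows
orders["amount"] = orders["amount"].fillna(0)            # fill missing values
orders = orders[orders["amount"] < orders["amount"].quantile(0.99)]  # trim outliers

# Join the two sources, then aggregate revenue per region
enriched = orders.merge(customers, on="customer_id", how="left")
revenue_by_region = enriched.groupby("region")["amount"].sum().reset_index()
print(revenue_by_region.head())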
Democratization of AI/ML
Most enterprises perceive Artificial Intelligence and Machine Learning (AI/ML) as rocket science and out of their reach. AI/ML has traditionally been the domain of experts and specialists like data scientists and, although invaluable, is a complex, expensive, and slow process for many enterprises. These barriers hinder the rapid adoption and absorption of AI/ML.
As per McKinsey [7], by 2030, companies that fully absorb AI tools across their enterprises could double their cash flow, while companies that don't could see a 20% decline.
For businesses to quickly adapt to changing market trends and stay ahead of the curve to capitalize on new revenue opportunities, they need a culture that facilitates experimentation and failure alongside operations and scale. Designing for scale is therefore essential to ensure speed-to-market. This ability to immediately begin experiments and build new products and services can be achieved by enabling a larger workforce, with deep business knowledge but limited technical capabilities, to experiment by themselves and innovate at scale.
Because advanced analytics and machine learning are available as a service in the cloud, virtually any user without any specialized skills can now access these capabilities.
In my opinion, the following services offered by cloud services providers help democratize AI:
Take advantage of ready-made AI: To jump-start an AI/ML initiative without much upfront investment, this is the best option offered by all major cloud services providers. These services enable people with limited machine learning expertise to build custom AI/ML models in minutes, using a no-code approach and a set of prebuilt models exposed via APIs. There are pre-built AI/ML models in areas like natural language processing, computer vision, translation, and knowledge mining. Examples include Vertex AI and AutoML by Google Cloud, a unified platform to help you build, deploy, and scale AI models; other examples are Azure AI and AWS AI Services. A minimal sketch of calling such a pre-built API follows.
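The sketch below illustrates "ready-made AI" by calling Google's pre-built sentiment model. It assumes the google-cloud-language client library is installed and the Cloud Natural Language API is enabled for your project; no data collection, labeling, or model training is needed on the caller's side:

# Minimal sketch of ready-made AI: one call to a pre-built sentiment model.
# Assumes google-cloud-language is installed and the API is enabled (setup not shown).
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

document = language_v1.Document(
    content="The new checkout flow is fast and painless.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# A single API call returns the sentiment of the text.
response = client.analyze_sentiment(request={"document": document})
sentiment = response.document_sentiment
print(f"score={sentiment.score:.2f}, magnitude={sentiment.magnitude:.2f}")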
Easy ML using SQL: If the availability of Python or R skills within your organization is hindering your speed-to-market for machine learning based solutions, then how about building ML models using SQL, a skill generally available across a larger employee base? The data warehouse solutions offered by CSPs today come with features that let you create and execute machine learning models using standard SQL queries, for example BigQuery ML on GCP and Redshift ML on AWS. This code snippet from the Google Cloud documentation [9] will help you understand how easy it is to build, evaluate, and predict using ML models created in SQL.
--Creating and training the ML model in BigQuery
#standardSQL
CREATE OR REPLACE MODEL `bqml_tutorial.penguins_model`
OPTIONS
  (model_type='linear_reg',
   input_label_cols=['body_mass_g']) AS
SELECT
  *
FROM
  `bigquery-public-data.ml_datasets.penguins`
WHERE
  body_mass_g IS NOT NULL

--Evaluating the ML model
#standardSQL
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.penguins_model`,
    (
    SELECT
      *
    FROM
      `bigquery-public-data.ml_datasets.penguins`
    WHERE
      body_mass_g IS NOT NULL))

--Predicting using (executing) the ML model
#standardSQL
SELECT
  *
FROM
  ML.PREDICT(MODEL `bqml_tutorial.penguins_model`,
    (
    SELECT
      *
    FROM
      `bigquery-public-data.ml_datasets.penguins`
    WHERE
      body_mass_g IS NOT NULL
      AND island = "Biscoe"))
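These statements can also be run programmatically. A minimal sketch using the BigQuery Python client (assuming the google-cloud-bigquery library is installed, default credentials are configured, and the bqml_tutorial dataset exists) would look like this:

# Minimal sketch: running the CREATE MODEL statement above from Python.
from google.cloud import bigquery

client = bigquery.Client()
create_model_sql = """
CREATE OR REPLACE MODEL `bqml_tutorial.penguins_model`
OPTIONS (model_type='linear_reg', input_label_cols=['body_mass_g']) AS
SELECT * FROM `bigquery-public-data.ml_datasets.penguins`
WHERE body_mass_g IS NOT NULL
"""
client.query(create_model_sql).result()  # blocks until model training completes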
Quick visualization: How about the ability for a general non-technical user to seamlessly create pixel-perfect dashboards in minutes, securely connecting to petabytes of data? Or the ability for everyone in the organization to explore data by creating interactive dashboards simply by asking questions in natural language? The cloud-native BI services offered today provide such features, which help business users interact easily with data. They no longer have to wait 2-3 days for the data to be processed through ETL and refreshed in reports. This helps business users make critical decisions in near real-time, as the data is generated or as events occur. Examples include Amazon QuickSight on AWS, Power BI on Azure, and Data Studio or Looker on GCP.
Cloud makes the geographic expansion of business easy
Businesses that are diversifying geographically may need to set up IT infrastructure across different locations due to regulatory reasons or latency requirements for local customers. Managing data centers across the globe can be very expensive.
Almost all the major cloud services providers have globally distributed data centers (AWS regions and availability zones, or Google Cloud regions and zones).
Edge computing can also be part of the answer. Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data. It improves response time and saves network bandwidth. All major CSPs provide globally distributed edge locations for low-latency content delivery.
Businesses that want to diversify globally, either physically or by expanding their customer base across countries, should consider using CSPs' data centers in the regions where they want to operate and/or take advantage of their edge locations for content distribution.
The cloud promises business continuity and disaster recovery
The cost of losing your data can be detrimental to your business. In fact, a study conducted by the British Chambers of Commerce found that 93% of businesses that suffer data loss for more than 10 days file for bankruptcy within one year. This is supported by The Diffusion Group, which found that 72% of businesses suffering a major data loss disappear within 24 months [8].
Eliminating downtime is critical for businesses that want to survive and remain competitive following a data loss incident. When developing a disaster recovery strategy, cloud can be a useful option. Cloud reduces the need to add and maintain redundant capacity in an on-premises data center just for the purposes of disaster recovery. Moreover, it certainly reduces the need for the additional, geographically diverse data center that would be required for geo-redundancy. There are many ways cloud can be used for the purposes of disaster recovery.
Typically, every data center needs to maintain a tape library for regular backups. Tape media management, media costs, third-party offsite contracts, and the sheer volume of data growth make tape backup challenging. All major cloud services providers offer a cheaper solution for data archival. Examples include Google Cloud's Coldline and Archive storage classes, and the AWS Storage Gateway service's Tape Gateway configuration, which gives you an alternative to physical backup tapes that fits seamlessly into an existing backup process.
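As an illustration, archiving a backup to Google Cloud takes only a few lines. The sketch below assumes the google-cloud-storage client library and uses hypothetical bucket and file names:

# Hypothetical sketch: writing a backup file to an Archive-class bucket.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-company-backup-archive")  # placeholder bucket name
bucket.storage_class = "ARCHIVE"                     # or "COLDLINE" for faster restores
client.create_bucket(bucket, location="us-central1")

blob = bucket.blob("backups/2022-06-30-full.tar.gz")
blob.upload_from_filename("/backups/2022-06-30-full.tar.gz")  # placeholder local path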
For more aggressive RTOs and RPOs, a standby disaster recovery environment can be set up on cloud using virtual machines and data replicated from on-premises to cloud. For one of our clients running the Cloudera Hadoop ecosystem in an on-premises data center with a 300-node cluster, we set up a disaster recovery standby environment on AWS using CloudFormation and regular data replication. When tested, we were able to stand up the DR environment and make it functional within hours. Choosing a geographically diverse availability zone automatically took care of the geo-redundancy requirements. When not in use, the only cost incurred was for data storage; no cost was incurred for the virtual machines and other resources (this is a highly simplified version of the DR solution; please contact the author for more details or for building a DR solution for your scenario).
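A simplified sketch of the mechanism is below. The stack, template, and parameter names are hypothetical; it simply shows how a pre-defined DR environment can be stood up on demand with the boto3 CloudFormation API:

# Hypothetical sketch: standing up a pre-defined DR stack on demand.
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-west-2")

response = cloudformation.create_stack(
    StackName="hadoop-dr-environment",                               # placeholder
    TemplateURL="https://s3.amazonaws.com/my-dr-templates/hadoop-dr.yaml",  # placeholder
    Parameters=[
        {"ParameterKey": "WorkerNodeCount", "ParameterValue": "300"},
        {"ParameterKey": "InstanceType", "ParameterValue": "r5.4xlarge"},
    ],
    Capabilities=["CAPABILITY_IAM"],
)

# Wait until the environment is fully provisioned before failing over to it.
waiter = cloudformation.get_waiter("stack_create_complete")
waiter.wait(StackName=response["StackId"])

Because the stack is only created when needed, the standby environment incurs no compute cost while idle, which is exactly the economic advantage described above.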
Conclusion
Cloud computing is a paradigm shift in the way companies deal with customizable and resourceful platforms to deploy software. It has been receiving increasing attention, partly due to its claimed financial and functional benefits. Today's CIOs and CTOs are therefore under pressure to drive cloud adoption and acquire this new cloud capability sooner, if not immediately. But, given the scale and scope of change required to exploit this opportunity fully, what constitutes a successful cloud implementation that actually captures that value is an important question.
The journey to cloud is not a straight line but a complex decision involving many parameters and unknowns. For instance, should you move internally hosted applications to SaaS, rewrite applications into more cloud-native forms such as PaaS, or lift and shift existing workloads from the data center to the public cloud as IaaS? Large enterprises have built many applications over decades which need to be re-architected to run efficiently, securely, and resiliently in the cloud. This increases the cost of migration, and, in some cases, applications might cost more to run in the cloud than before. The economics, skills, processes, and organizational changes required are complex and span many different parts of the business. Defining the cloud opportunity too narrowly, with siloed business initiatives, may not fetch the expected returns. That's because no two migrations are the same, and significant consideration should be given to how the organization will need to operate holistically in the cloud.
But this is now changing. When I talk to CIOs and CTOs today, they no longer look at cloud as just rented, scalable infrastructure but as a modern technology platform that enables business resiliency, agility, flexibility, and, more importantly, innovation.
As per the S&P Global Market Intelligence "Voice of the Enterprise 2021" survey [10], more than 95% of enterprises consider AI technology to be important to their digital transformation efforts.
The IT strategy now considers cloud an enabler of and catalyst for digital transformation, business innovation, and reinvention, helping organizations be disruptive, spring forward, and grow from where they are.
I think they have found Data & AI as the Silver Lining of cloud.
References
This content is provided for general information purposes and is not intended to be used in place of consultation with our professional advisors.
Copyright © 2022 Accenture. All rights reserved. Accenture and its logo are registered trademarks of Accenture (https://www.accenture.com).