Cloud vs. on Premise for Data Warehouse

The global cloud computing market was worth approximately $371.4 billion in 2020 and was expected to double by 2024. That is comparable to the GDP of a smaller EU country such as Austria.

For transactional systems (OLTP), web applications, and the like, it is tough to argue against the advantages of serverless frameworks such as AWS Lambda.

Despite all the famous stories (and fairy tales) about cloud migrations, why is the financial sector still so conservative, especially when it comes to analytical systems like data warehouses?

In the next few paragraphs, I will try to explain the motivations behind the resistance of the data domain in the financial industry.

Let me start from the other end. Why would anyone consider migrating a workload-heavy application to the cloud? The typical answers I have heard are low cost, better performance, and a better developer experience.

Better performance

One of the significant differences between OLTP and OLAP-like systems is how well they cope with separating the storage layer from the compute layer. The idea that you can scale the two components independently works well for a large number of short transactions that each touch very little data. However, it breaks down for large, data-intensive, long-running queries, because it introduces network latency issues whenever CPU, memory, and storage are not close to each other. For web apps, this penalty is negligible compared to the scalability benefit, precisely because their transactions are short-lived and read little data.

Cloud architecture - Separating compute and storage for elastic scalability

Separation of Compute Node and Storage

However, the MPP (massively parallel processing) database vendors (Teradata, Greenplum, Snowflake, etc.) all insist on architectures in which the compute nodes sit close to each other (same rack, same data center) and have dedicated local storage (SSDs or SAS disks). They recommend keeping only the so-called cold data in separate storage such as S3 on AWS, meaning data that is not used in daily or monthly processing, or that serves use cases where performance is not the number one priority.

MPP architecture - Shared Nothing Nodes with dedicated CPU, memory, and local disks
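The difference between the two workload types can be made concrete with a back-of-envelope estimate. The sketch below is only a simplified model: the function `read_seconds` is hypothetical, and the throughput and round-trip figures are illustrative assumptions, not measurements of any product.

```python
def read_seconds(data_gb, nodes, gb_per_s_per_node, round_trip_s=0.0):
    """Estimate read time: one network round trip plus a parallel scan."""
    return round_trip_s + data_gb / (nodes * gb_per_s_per_node)

# Every figure below is an illustrative assumption, not a measurement.
LOCAL_GB_S = 2.0      # per-node throughput from a local SSD
REMOTE_GB_S = 0.5     # per-node effective throughput from remote storage
ROUND_TRIP_S = 0.005  # extra latency to reach remote storage

# OLTP-style transaction: 1 MB read on a single node.
oltp_local = read_seconds(0.001, 1, LOCAL_GB_S)
oltp_remote = read_seconds(0.001, 1, REMOTE_GB_S, ROUND_TRIP_S)

# OLAP-style query: 10 TB scan spread across 16 nodes.
olap_local = read_seconds(10_000, 16, LOCAL_GB_S)
olap_remote = read_seconds(10_000, 16, REMOTE_GB_S, ROUND_TRIP_S)

print(f"OLTP penalty: {(oltp_remote - oltp_local) * 1000:.1f} ms")
print(f"OLAP penalty: {(olap_remote - olap_local) / 60:.1f} min")
```

Under these assumed numbers, the transaction pays a few milliseconds for remote storage, while the large scan pays roughly a quarter of an hour. The exact values do not matter; the point is that the penalty grows with the amount of data each query has to move.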

Low Cost

In my experience, the Nirvana of cloud computing, at least from a cost perspective, simply does not exist. I can enumerate the advantages of the cloud endlessly, but cost is not one of them. I have seen a vast number of misleading TCO calculations, and they all share the same issue: the number of IFs.

  • If you completely redesign your data warehouse, which was built over 20-40 years using hundreds of thousands of engineering hours.
  • If you train all of your users (thousands again) not to write queries that export too much data to Excel.
  • If you optimize all of your reports (tens of thousands again) to avoid the same problem.
  • If you are willing to commit to never returning (migrating back) to the on-prem world under any circumstances.
  • If you are willing to sacrifice performance to some extent, for the reasons described in the performance section above.
  • If your source applications have already been migrated to the cloud.
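The effect of these IFs on a TCO comparison can be sketched in a few lines. Everything here is hypothetical: the `Assumption` class, the `tco` function, and all cost figures are placeholders chosen for illustration, not data from any real migration.

```python
from dataclasses import dataclass

@dataclass
class Assumption:
    name: str
    cost_if_false: float  # extra cost (in millions) when the "if" does not hold

def tco(base_cost, assumptions, holds):
    """Base cost plus the penalty of every assumption that does not hold."""
    return base_cost + sum(
        a.cost_if_false for a in assumptions if not holds.get(a.name, False)
    )

# Placeholder numbers for illustration only (millions, same period).
ON_PREM = 10.0
CLOUD_BASE = 6.0
ASSUMPTIONS = [
    Assumption("warehouse redesigned", 3.0),
    Assumption("users trained", 1.0),
    Assumption("reports optimized", 2.0),
    Assumption("sources already in cloud", 1.5),
]

# The brochure case: every "if" is assumed to be true.
best_case = tco(CLOUD_BASE, ASSUMPTIONS, {a.name: True for a in ASSUMPTIONS})
# The pessimistic case: none of the "ifs" holds.
worst_case = tco(CLOUD_BASE, ASSUMPTIONS, {})

print(f"on-prem: {ON_PREM}, cloud best: {best_case}, cloud worst: {worst_case}")
```

With these placeholder values, the cloud looks cheaper than on-prem only while every assumption holds; once the IFs fail, the comparison flips. A real calculation would need actual cost estimates for each item.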

Security

The most popular argument among the big cloud providers is that they can hire the best security experts worldwide. Even so, here are some of the data breaches from the last couple of years:

  • Alibaba, November 2019. Impact: 1.1 billion pieces of user data.
  • LinkedIn, June 2021. Impact: 700 million users.
  • Facebook, April 2019. Impact: 533 million users.

These companies can continue operating without much of a bump despite the data breaches. For a financial institution, such an incident can mean bankruptcy within months. So compared to the current solution (complete network segregation, dedicated data centers, etc.), the answer “I have the best techies” is simply not good enough.

I have been an advocate and active supporter of cloud migrations since the beginning. In fact, I still work on them today. But in 2015 I forecast a negligible number of successful cloud migrations in the financial industry over the following ten years. Much as I wish otherwise, I doubt this has changed dramatically, for the reasons mentioned above.

If you want to have a successful analytical project in the cloud, please consider the following recommendations:

  • Tailor the use case. Pick only one report or one specific complex calculation that looks deliverable within three months, and expect it to take six to nine months in the end.
  • Avoid sensitive data, especially client data. Data that can be categorized as public or internal, but not confidential, is a good choice.
  • Select a vendor or product that recommends real MPP databases (no, Hadoop-based solutions are not among them). Be prepared to replicate your existing on-prem infrastructure in the cloud.
  • Focus on providing a better service instead of cost optimization.

Please tell me if your experience has been different, and let me know of any successful cloud implementations of real data warehouses, especially in the financial industry.

Ruskál Imre

EMEA Global Black Belt, Analytics and DW Specialist at Microsoft

3 years ago

No question :-)
