Planning to migrate your Data Warehouse to Azure? Select the right strategy!
By Robin Everaars and Lars Bouwens
The biggest risk when migrating to Azure is picking the wrong approach. In that case it's inevitable that the migration will fail.
Motivators for moving to Azure
Data volumes are growing non-stop and, with them, the use of data across businesses. This trend has accelerated in recent years.
Next to gaining a competitive edge, there are several other motivators for migrating to the Azure cloud.
The Data Warehouse: replace or move?
Existing data warehouses have evolved and expanded over the years and are critical for business decision-making processes. All kinds of other (external) systems are integrated and stakeholders are dependent on the data provided by the data warehouse.
Replacing or moving an existing Data Warehouse is complex and comes with risks. Therefore, the right experience, expertise, and a fitting Migration Strategy are essential.
In this article, we analyze and contrast the most common migration strategies, outlining their applications in various scenarios. This guidance will help you prepare a successful migration to Azure.
Why is a Migration Strategy important?
Just like a strategy helps organizations with cohesive planning and decision making, a Migration Strategy helps align on priorities and guide ongoing decisions. A strong Migration Strategy is crucial to the success of a Data Warehouse migration to Azure. Attention should be given to the reason for, or purpose of, the migration. Questions that must be answered include: What reason is there for this change? What value do we hope to achieve with migrating (for example, saving costs or new capabilities)?
An ideal Migration Strategy sets clear goals and critical success indicators and helps to determine relevant constraints like timelines, resource availability and budget.
What are proven Migration Strategies?
Microsoft advises three strategies when migrating to the cloud: Rehost (also known as Lift and Shift), Refactor, and Rebuild.
How to select the right Migration Strategy
The biggest risk when migrating to Azure is picking the wrong approach. In that case, it's inevitable that the migration will fail in the short or long term.
To pick the right strategy, we compare the options based on four criteria: ease of implementation, efficiency gains, new capabilities, and restrictions.
Strategy 1: Rehost
Rehost is a tempting strategy to select. However, managing expectations is vital when choosing this option.
Ease of implementation
Generally, this strategy requires the least effort and will thus have the shortest lead time and the lowest implementation costs. There can be extra complexity if not all data sources or data consumers are migrated to Azure. Setting up secure connectivity will, for example, require configuring ExpressRoute or a VPN gateway to connect to a virtual network (VNet) in Azure.
Efficiency gains
Efficiency gains depend heavily on your current landscape. Is the current Data Warehouse running on another (private) cloud or on your own data center? What’s the level of utilization (using for example virtualization) and what is the effort in maintenance? Are the current tasks like backups, upgrades and (security) updates time consuming or expensive?
Moving to Infrastructure as a Service (IaaS) means that Azure takes care of the hardware and virtualization layer. This saves time and costs in server administration. Costs can also be saved by scaling and pausing services. The impact is high for solutions with time-windowed workloads, like nightly ETL processes and 9-to-5 data analysis.
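To make the savings from time-windowed workloads concrete, here is a back-of-the-envelope calculation. The hourly rate is a made-up placeholder, not an actual Azure price; the point is the ratio between always-on and windowed running hours.

```python
# Illustrative cost comparison for a service that can be paused.
# The hourly rate is a placeholder, not an Azure price.
HOURLY_RATE = 2.0  # currency units per running hour

# Always-on: 24 hours a day, 7 days a week
always_on_hours = 24 * 7

# Windowed: a 3-hour nightly ETL window every day,
# plus 8 office hours of data analysis on weekdays
windowed_hours = 3 * 7 + 8 * 5

weekly_always_on = always_on_hours * HOURLY_RATE
weekly_windowed = windowed_hours * HOURLY_RATE
savings_pct = 100 * (1 - windowed_hours / always_on_hours)

print(f"Always-on: {weekly_always_on:.0f} per week")
print(f"Windowed:  {weekly_windowed:.0f} per week")
print(f"Savings:   {savings_pct:.0f}%")
```

With these example windows the service runs 61 of 168 weekly hours, so pausing it the rest of the time cuts the compute bill by roughly two thirds, regardless of the actual hourly rate.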
Infrastructure as a Service lets you run on newer hardware, but it has limitations in performance scalability and is less cost efficient. If you run a SQL Server instance, you will not be able to separate storage and compute and will be limited in both data volume and processing power. This is not the case when you use, for example, Azure SQL Database serverless, where storage and compute can be scaled up and down independently. An even better separation of storage and compute can be achieved using, for example, Delta Lake and Synapse Serverless, but this will require a different strategy.
New capabilities
Your current solution running on new hardware will not create new innovative capabilities or make efficient use of the cloud. It will not allow you to easily ingest data from APIs, store massive data volumes in a Data Lake, or create a data science solution on top of your data. If stakeholders have high expectations of what the future Data Warehouse will do, it's important to manage this beforehand and be clear about what the new platform will (and will not) offer.
Restrictions
Lift and Shift is only possible if your current solution can be hosted on the Azure infrastructure. If your current solution for example runs on deprecated software (old Windows or SQL Server versions), you will not be able to do a simple lift and shift.
In short: This strategy offers the lowest effort while maintaining current capabilities. Short lead times can be important when dealing with outdated or failing hardware, expired licenses, or ending support contracts. This is the wrong strategy if the organization requires new functionalities such as Advanced Analytics, expects a significant increase in data volumes, or if stakeholders expect a reduction in operational expenditure or significant performance benefits.
Strategy 2: Refactor
Refactoring strikes a balance between Lift and Shift and complete Rebuild but is not without risks.
Ease of implementation
A clearly defined scope is essential to come up with a feasible refactor plan. It will require an in-depth analysis of the current solution to determine which functionalities are suitable for refactoring. Refactoring has the biggest benefit if the current solution is already Microsoft based, or if you have access to the code of proprietary, black-box solutions. It is, for example, easier to refactor SSIS to Azure Data Factory or Synapse Pipelines, because of the SSIS integration runtime support, than to refactor Informatica PowerCenter data flows. If your current solution provides access to the code behind it, such as SQL or Python, it will also be more straightforward to refactor towards Azure cloud-based services.
Depending on the amount of work it takes to alter code and adjust components to move to new services, it might be more attractive to rebuild. If all existing ETL data flows depend heavily on a local file store, it is worthwhile to investigate whether rebuilding that component saves time. The only way to get this clear is by conducting an in-depth analysis.
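Such an analysis is typically done per component. As a purely illustrative sketch (the criteria come from the discussion above; the weights and field names are invented), a per-component triage could look like this:

```python
# Illustrative, not prescriptive: triage one ETL component using the
# criteria discussed above. Weights and field names are made up.
def suggest_approach(component: dict) -> str:
    """Return 'refactor' or 'rebuild' for one ETL component."""
    score = 0
    score += 2 if component.get("microsoft_based") else 0
    score += 2 if component.get("code_accessible") else 0
    score -= 3 if component.get("depends_on_local_file_store") else 0
    return "refactor" if score >= 2 else "rebuild"

inventory = [
    {"name": "ssis_load", "microsoft_based": True, "code_accessible": True},
    {"name": "file_drop", "depends_on_local_file_store": True},
]
for c in inventory:
    print(c["name"], "->", suggest_approach(c))
# ssis_load -> refactor
# file_drop -> rebuild
```

In practice the inventory and scoring would be far richer, but recording the decision criteria per component in this structured way keeps the refactor-versus-rebuild discussion objective.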
Efficiency gains
Refactoring will improve utilization of cloud capabilities compared to rehosting. It introduces Platform as a Service (PaaS) components that can be scaled or paused depending on the workload. Think about services such as Azure Data Factory or Azure Synapse Pipelines (for orchestration), Azure SQL Database and Azure Data Lake (for data storage), Azure Functions (for data retrieval via APIs), or Databricks (for ETL processes). Again, the performance and cost benefits depend on the implementation.
New capabilities
Refactoring will provide new functionality. By moving to PaaS services, new options for handling data become available, as well as the possibility to store larger data volumes and increase refresh frequencies. Azure Data Factory, for example, provides standard connectors for popular cloud applications and APIs. An Azure SQL database can be scaled more cost-effectively, and an Azure Function can run on a pay-per-use, per-minute basis to retrieve new data.
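Pay-per-use data retrieval typically relies on a watermark: each run fetches only records newer than the last successful run, so the function does minimal work and finishes quickly. A minimal, Azure-agnostic sketch of that logic (the record shape, the 'modified' field, and the fake API are assumptions for illustration):

```python
from datetime import datetime
from typing import Callable

def fetch_new_records(
    fetch_since: Callable[[datetime], list],
    last_watermark: datetime,
) -> tuple:
    """Retrieve only records newer than the last watermark.

    `fetch_since` stands in for the API call an Azure Function would
    make; each record is assumed to carry a 'modified' timestamp.
    Returns the new records and the advanced watermark.
    """
    records = fetch_since(last_watermark)
    if not records:
        return [], last_watermark  # nothing new; keep the old watermark
    new_watermark = max(r["modified"] for r in records)
    return records, new_watermark

# Example with a fake API returning two new records
def fake_api(since: datetime) -> list:
    return [
        {"id": 1, "modified": datetime(2024, 1, 2)},
        {"id": 2, "modified": datetime(2024, 1, 3)},
    ]

rows, wm = fetch_new_records(fake_api, datetime(2024, 1, 1))
print(len(rows), wm)  # 2 2024-01-03 00:00:00
```

The watermark itself would be persisted between runs, for example in a control table or blob, so that each invocation only pays for processing the delta.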
Restrictions
Refactoring is a detailed operation that needs to be executed with the help of experts that understand both the current landscape and the services that Azure offers. It’s crucial to have the right knowledge, development capacity, and know-how to refactor the current codebase.
In short: Refactoring will most likely take longer than a Lift and Shift of your current solution. A more complex and detailed operation should bring you more scalability, cost benefits, and new functionality. It's a strategy that works well for some cases (Microsoft-based or heavily code-based solutions) but will be frustrating for others (visual ETL tools, more exotic database storage, non-Azure PaaS services).
Strategy 3: Rebuild
This high-impact strategy can bring the most value to the table.
Ease of implementation
Rebuild provides the most flexibility. Current code will not be re-used, and new functionality is created from scratch. This starts with the challenge of determining the right solution. What (new) capabilities should the platform support? Is there for example a need for retrieving streaming data or to handle complex JSON structures? What is the expected self-service usage? Ease of implementation is also linked to the maturity level of the organization and the data analytics team. What amount of effort are you willing to invest in building up the knowledge of new ways of working, tools, technologies, and code languages?
Even though new services are usually easier to maintain and implement, the lack of knowledge and experience can be a big impediment in rebuilding existing solutions. When moving from SQL stored procedures to Databricks Notebooks for example, engineers will need to be able to write and understand Python and Spark. The same goes for business analysts that for example no longer query a SQL database but will connect directly to a Data Lake or Power BI dataset. It will take time and effort to make this work for your organization.
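To give a feel for the shift in skills involved: logic that is painful in a stored procedure, such as flattening a nested JSON document, is a few lines of Python. A minimal, library-free sketch (the document shape is invented for illustration):

```python
def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten a nested JSON-like dict into dot-separated keys."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{path}."))  # recurse into nesting
        else:
            flat[path] = value
    return flat

order = {"id": 42, "customer": {"name": "Acme", "address": {"city": "Utrecht"}}}
print(flatten(order))
# {'id': 42, 'customer.name': 'Acme', 'customer.address.city': 'Utrecht'}
```

Engineers comfortable with SQL will pick this up, but budgeting training time for Python, Spark, and the new tooling is part of a realistic Rebuild plan.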
Efficiency gains
The wide range of Azure services allows you to select the right tool for the job. Storage services such as Azure Data Lake are optimized for storing large datasets at low costs and compute services like Databricks and Fabric offer parallel processing to load data faster and achieve big performance gains. Scale on demand and auto pause resources can lead to significant cost savings at the same time.
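The parallel-processing gain rests on a simple principle: partition the data and process the partitions concurrently. A toy illustration with Python's standard library (engines like Databricks and Fabric apply the same pattern across many cores and machines; here it is just threads on one machine, so it shows the partitioning logic rather than a real speedup):

```python
from concurrent.futures import ThreadPoolExecutor

def transform(chunk: list) -> int:
    # Stand-in for an expensive per-partition transformation
    return sum(x * x for x in chunk)

data = list(range(1_000))
chunks = [data[i::4] for i in range(4)]  # split into 4 partitions

# Process the partitions concurrently and combine the partial results
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_total = sum(pool.map(transform, chunks))

serial_total = sum(x * x for x in data)
print(parallel_total == serial_total)  # True
```

The key design point carries over to any engine: the transformation must be decomposable into independent partitions whose partial results can be combined, which is exactly what well-partitioned Data Lake storage enables.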
New capabilities
Moving to the latest technology means you can profit from the newest features. Those features impact both the data platform (ingesting, storing, transforming) and data usage (self-service, reporting, advanced analytics).
Restrictions
Defining the right solution that meets the functional and governance requirements is a complex task. It requires the right skills and knowledge from an architectural point of view. Depending on the outcome there will be an impact on the development team and business users on how to work with the platform. Be sure to have the right set of resources, time, and budget to make this work.
In short: Best option when your organization wants to leverage the latest capabilities. Relevant when dealing with large data volumes, a lot of unstructured data, offering advanced analytics, and considering data as a competitive advantage. Will have the biggest impact on your organization.
Wrap up
Whatever Migration Strategy you choose, make sure to motivate, share, and discuss this strategy with your stakeholders.
A clear Data Warehouse Migration Strategy is critical for success. Proven strategies are Rehost for a quick transition, Refactor when most of the existing solution is reusable, or Rebuild when large gains are likely to be achieved. The input of stakeholders is key to deciding what strategy fits. Take enough time to understand the current data solution, constraints and limitations like team knowledge, development capacity, available time, and budget.
If your organization wants to benefit from the cloud (performance, cost reductions, new capabilities) but must deal with time constraints, it might be an option to combine strategies or go for a phased approach. You can start with Rehost and then, or simultaneously, Rebuild. Or you can Rehost some parts of the solution and Refactor others. Combining a "faster" and "slower" strategy can lessen time pressure. This allows teams working on more impactful changes to focus on quality without resorting to shortcuts, which benefits long-term success. Whatever you choose, make sure you motivate, share, and discuss your strategy.
Next steps
If you're thinking about migrating to Azure or if you’re already taking the first steps and need expert guidance, don't hesitate to get in touch with Lars Bouwens or Robin Everaars. We can provide guidance on both strategy and implementation and help our clients in building up the right knowledge in a team.
Get in touch via LinkedIn or Rockdata.nl