Migrate Legacy System Data to Dataverse: Automate Schema and Integration Flow Creation

Microsoft’s Dynamics 365 platform is widely used to implement complex and diverse line-of-business systems. Many of these implementations require the migration of data and functionality from a set of legacy systems and components.

Clients often want all legacy data, and the relationships between data elements, preserved in Dynamics after the implementation, both for historical data analysis and reporting and so that new features can be built on top of existing ones.

I recently had a chance to work on a project that required migrating a legacy Ruby on Rails system with a PostgreSQL backend to Dynamics. The database contained 30+ tables with about 3,500,000 records.

The unique challenge was to recreate the Postgres data schema in Dynamics quickly: the implementers of the legacy system had heavily denormalized several tables in Postgres, resulting in tables with 500-600 fields each. Re-creating this schema configuration in Dynamics by hand was not an option. Instead, I used Postgres metadata queries and a couple of tools to automate schema creation, and then used Microsoft’s SQL Server Data Tools (SSDT) with the KingswaySoft integration toolkit to implement the data migration and integration package.

Extracting schema from Postgres and recreating it in Dynamics

The XrmToolbox, an open-source collection of tools for Dynamics, has a handy plugin called Excel to CRM Bulk Attribute Creator that allows bulk creation, update, or deletion of attributes for an entity.

[Screenshot: Excel template used by the Excel to CRM Bulk Attribute Creator plugin]

To generate an Excel file in the format shown above (the template provides many more columns for attribute properties such as option set values, restrictions, format types, etc.), I used a metadata query to get the list of fields from a given Postgres table, along with some field properties, and converted them into the set of columns the plugin requires as input.

The basic query extracts the name and type of each field for a given table from the information_schema.columns view. The full query text is shown below, followed by its analysis:

SELECT
    case when data_type = 'int' or data_type = 'bigint' or column_name like '%id' then 'Whole Number'
         when data_type = 'integer' then
             case when column_name like '%amount%' or column_name like '%value%' or column_name like '%cost%' then 'Currency' else 'Whole Number' end
         when data_type = 'date' or data_type = 'datetime' or data_type = 'datetime2' or data_type like 'timestamp%' or column_name like '% at' or column_name like '% Date' then 'Date and Time'
         when data_type = 'bit' or data_type = 'boolean' or column_name like '%?' then 'Boolean'
         when data_type = 'float' then 'Floating Point Number'
         when data_type = 'double precision' then 'Currency'
         when data_type = 'text' then 'Multiple Line of Text'
         when data_type = 'character varying' then 'Single Line of Text'
         else 'Multiple Line of Text' end as AttributeType,
    INITCAP(replace(column_name, '_', ' ')) AS DisplayName,
    'fn_' || replace(column_name, '_', '') as AttributeSchemaName,
    INITCAP(replace(column_name, '_', ' ')) AS Description,
    'None' as RequiredLevel, 'Auto' as IMEMode, 'Yes' as AuditEnable, 'Yes' as IsValidForAdvancedEdit,
    '' as StringFormat, case when data_type = 'text' then 10000 else 200 end as StringLength,
    '' as OptionSetValues, '' as DefaultValue, 'Date and Time' as DateTypeFormat,
    '' as IntegerFormat, '-100000000' as IntegerMinimumVal, '100000000' as IntegerMaximumVal,
    '2' as FloatingNumberPrecision, '-100000000' as FloatMinimumVal, '100000000' as FloatMaximumVal,
    '3' as DecimalPrecision, '-2147483648' as DecimalMinimumVal, '2147483647' as DecimalMaximumVal,
    '2' as CurrencyPrecision, '-21474836480' as CurrencyMinimumVal, '21474836470' as CurrencyMaximumVal
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'people'
  and column_name not like '%deprecated%'
order by column_name;

The query discards all fields marked as deprecated in the legacy system and then inspects the PostgreSQL type of each field to infer the desired Dynamics attribute type. For example, if the Postgres type is integer, the query maps it to Whole Number unless the field name contains ‘amount’, ‘value’, or ‘cost’, in which case the matching Dynamics attribute should be created as Currency.

This logic depends heavily on the conventions used by the legacy database designers. The database I was working with invariably terminated field names that logically represented bit values with a question mark, while being somewhat sloppy about the actual column types, so these logical bit fields sometimes had a text type. For example, a text column whose name ends in a question mark is still mapped to a Boolean attribute because of the column_name like '%?' check.

The query also creates display names for attributes by replacing underscores in the Postgres field name with spaces and using the INITCAP function to convert the text to proper case.
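For illustration, this is what the transformation produces for a hypothetical column named inspection_date (the column name is just an example):

SELECT
    INITCAP(replace('inspection_date', '_', ' ')) AS display_name,   -- 'Inspection Date'
    'fn_' || replace('inspection_date', '_', '') AS schema_name;     -- 'fn_inspectiondate'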

The query is executed for each table we need to migrate. Its output, shown below in pgAdmin (a PostgreSQL database management and query tool), is copied into the Excel template file generated by Excel to CRM Bulk Attribute Creator, and the plugin is then used to add all of the legacy attributes. (I chose to create each entity itself manually in Dynamics first, without any custom attributes beyond its name.)

Note: XrmToolbox contains another, similar plugin, Attribute Editor, which lets a user generate an Excel template for an entity, modify it, and use it as input for bulk attribute modifications. I chose not to use it because it lacks support for some attribute properties: it does not expose properties such as maximum text length or boundaries for numeric values in its Excel template. Instead, the code of its attribute model classes hardcodes these values. For example, this line in StringAttribute.cs forces every text field to a maximum length of 100:

namespace Colso.Xrm.AttributeEditor.AppCode.AttributeTypes
{
    class StringAttribute : AttributeMetadataBase
    {
        protected override void AddAdditionalMetadata(StringAttributeMetadata attribute)
        {
            attribute.MaxLength = 100;
        }
    }
}

As a result of this design choice, a create or even an update of an attribute can change some of its properties to an undesired value. You can easily fork the GitHub project for this plugin and either extend the template model or add other options for handling attribute properties.

Setting up Dynamics entities for integration

Referential integrity in the source Postgres database is implemented via numeric primary and foreign keys. To faithfully recreate the legacy relationships among the new entities in Dynamics, and to be able to validate the completeness of the migration/integration steps, I retained all these numeric primary and foreign key values as numeric fields on the Dynamics entities and used each legacy primary key to create a related entity Key in Dynamics.

[Screenshot: the legacy ID field configured as an entity key in Dynamics]

Using the legacy ID as a Dynamics key allows us to guarantee the uniqueness of each record in Dynamics and to use it as a search attribute in integration steps to connect source Postgres data with target Dynamics data. It also allows implementation of post-integration data validation steps. For example, to verify the completeness of a given step, we can run a query matching source to target data based on the same key value present in both systems, and ascertain that each record has a 1:1 relation between the source and target systems.
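As a minimal sketch of such a check, assume the Dynamics records (or at least their retained legacy keys) have been exported to a staging table named dynamics_people with a legacy_id column; the staging table and column names are assumptions made purely for illustration:

-- Source records that never made it into Dynamics (any row returned is a gap)
SELECT p.id
FROM people p
LEFT JOIN dynamics_people d ON d.legacy_id = p.id
WHERE d.legacy_id IS NULL;

-- Target-side duplicates: more than one Dynamics record sharing the same legacy key
SELECT legacy_id, count(*)
FROM dynamics_people
GROUP BY legacy_id
HAVING count(*) > 1;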

The entity key ensures that neither an integration step nor erroneous user input can create a duplicate record. Attempting to insert a duplicate key results in an error:

[Screenshot: duplicate record key error raised by Dynamics]

Implementing the data migration project with SSDT and the KingswaySoft adapter

The Integration Services bundled with Microsoft SQL Server Data Tools (SSDT) provide a highly versatile platform for building complex integration scenarios and implementing them in Visual Studio. While it comes with a plethora of connectors for various systems and data types, connectors for Dynamics 365 and PostgreSQL are not included. I used the KingswaySoft integration toolkit for SSDT to provide connectivity to Dynamics data and the CData PostgreSQL ADO.NET Provider to connect to PostgreSQL.

Note: PostgreSQL comes with its own free ODBC driver, but I ran into numerous issues trying to use it in my integration project. With very limited time for the project, I decided to use the CData provider instead.

Both connectors allow a developer to configure a connection in an Integration Services project and use a query against that connection as a data source, or use the connection as a data destination.

Once the query against the Postgres data source is set up and connected to a Dynamics destination, Visual Studio provides a standard mapping dialog to map source fields to destination fields.

[Screenshot: SSDT column mapping dialog between the Postgres source and the Dynamics destination]

The mapping UI also allows us to resolve Dynamics lookups. For example, in the screenshot above the Postgres id field is mapped to the Dynamics id field, which is configured as an entity key and is used to decide between a Create and an Update operation. The Dynamics connector provides an Upsert operation that automatically executes the correct action based on key resolution.

While I could set up the mappings manually as shown above, Integration Services will automatically map source fields to destination fields as long as the field names match. This is a great feature when you have to deal with multiple tables containing hundreds of fields each; it saves a ton of time!

To enable this auto-mapping, I wrote another metadata query that reads a Postgres table definition and converts it into a SELECT statement whose field names are transformed to match the Dynamics attribute schema names.

SELECT
    -- DATE FIELDS ONLY
    (SELECT 'SELECT ' || string_agg('CASE when ' || column_name || ' < ''1/1/1901'' then null else '
                || column_name || ' end AS ' || 'fn_' || replace(column_name, '_', ''), ', ')
     FROM information_schema.columns
     WHERE table_schema = 'public'
       and table_name = 'inspections' and column_name not like '%deprecated%'
       and (data_type = 'date' or data_type = 'datetime' or data_type = 'datetime2'))
    || ', ' ||
    -- DO NOT INCLUDE DATE FIELDS
    (SELECT string_agg(column_name || ' AS ' || 'fn_' || replace(column_name, '_', ''), ', ') || ' FROM inspections;'
     FROM information_schema.columns
     WHERE table_schema = 'public'
       and table_name = 'inspections' and column_name not like '%deprecated%'
       and (data_type <> 'date' and data_type <> 'datetime' and data_type <> 'datetime2'));

The query uses the STRING_AGG function to concatenate field names while slightly changing each name to match the Dynamics schema name conventions: we need to add the schema prefix to each field and drop the underscores. It also processes date fields separately to exclude date values Dynamics doesn’t support, and it ignores deprecated fields.
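To illustrate the shape of the generated statement: for a hypothetical inspections table with a date column inspected_at and non-date columns id and notes (the column names are illustrative only), the output would look roughly like this:

SELECT CASE when inspected_at < '1/1/1901' then null else inspected_at end AS fn_inspectedat,
    id AS fn_id, notes AS fn_notes FROM inspections;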

The integration package consists of several sequential data flows: when the currently executing flow ends, the next flow begins, and any error stops the integration run. The flows are organized according to the level of derivation and data dependency: Level 0, for example, contains data elements that do not depend on anything else, Level 1 entities depend only on Level 0 entities, and so on.

Within each flow, multiple integration tasks execute in parallel. This level of concurrency proved to be sufficient for my project.

At the end of a data task’s successful execution, the count of processed rows may be viewed in the Visual Studio window; it can also be queried from Dynamics 365 and compared with the PostgreSQL record count:

[Screenshot: processed row counts reported in Visual Studio after a data flow run]
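For the PostgreSQL side of that comparison, a simple aggregate gives the numbers to check against; a minimal sketch, with an illustrative table name:

-- Source-side totals to compare against the row counts reported by the data flow
-- and against the record count queried from Dynamics 365
SELECT count(*) AS source_count, min(id) AS min_legacy_id, max(id) AS max_legacy_id
FROM inspections;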

Conclusion

Metadata discovery and custom SSDT connectors allow us to quickly implement fairly complex data migration scenarios.

