Migrate Legacy System Data to Dataverse: Automate Schema and Integration Flow Creation

Microsoft’s Dynamics 365 platform is widely used to implement complex and diverse line-of-business systems. Many of these implementations require the migration of data and functionality from a set of legacy systems and components.

Clients often want all legacy data, and the relationships between data elements, preserved in Dynamics after the implementation, both for historical data analysis and reporting and so that new features can be built on top of existing ones.

I recently had a chance to work on a project that required migrating a legacy Ruby on Rails system with a PostgreSQL backend to Dynamics. The database contained 30+ tables with about 3,500,000 records.

The unique challenge was to recreate the Postgres data schema in Dynamics quickly: the implementers of the legacy system had heavily denormalized several tables in Postgres, resulting in tables with 500-600 fields each. Re-creating this schema configuration in Dynamics by hand was not an option. Instead, I used Postgres metadata queries and a couple of tools to automate schema creation, and then used Microsoft’s SQL Server Data Tools (SSDT) with the KingswaySoft integration toolkit to implement the data migration and integration package.

Extracting schema from Postgres and recreating it in Dynamics

The XrmToolbox, an open-source collection of tools for Dynamics, has a handy plugin called Excel to CRM Bulk Attribute Creator that allows bulk creation, update, or deletion of attributes for an entity.

[Screenshot: Excel template used by the Excel to CRM Bulk Attribute Creator plugin]

To generate an Excel file in the format shown above (the template provides many more columns for attribute properties such as option set values, restrictions, format types, etc.), I used a metadata query to get the list of fields from a given Postgres table, along with some field properties, and converted them into the set of columns the plugin requires as input.

The basic query extracts the name and type of each field for a given table from the information_schema.columns view. The full query text is shown below, followed by its analysis:

SELECT
    case when data_type = 'int' or data_type = 'bigint' or column_name like '%id' then 'Whole Number'
         when data_type = 'integer' then
             case when column_name like '%amount%' or column_name like '%value%' or column_name like '%cost%' then 'Currency' else 'Whole Number' end
         when data_type = 'date' or data_type = 'datetime' or data_type = 'datetime2' or data_type like 'timestamp%' or column_name like '% at' or column_name like '% Date' then 'Date and Time'
         when data_type = 'bit' or data_type = 'boolean' or column_name like '%?' then 'Boolean'
         when data_type = 'float' then 'Floating Point Number'
         when data_type = 'double precision' then 'Currency'
         when data_type = 'text' then 'Multiple Line of Text'
         when data_type = 'character varying' then 'Single Line of Text'
         else 'Multiple Line of Text' end as AttributeType,
    INITCAP(replace(column_name, '_', ' ')) AS DisplayName,
    'fn_' || replace(column_name, '_', '') as AttributeSchemaName,
    INITCAP(replace(column_name, '_', ' ')) AS Description,
    'None' as RequiredLevel, 'Auto' as IMEMode, 'Yes' as AuditEnable, 'Yes' as IsValidForAdvancedEdit,
    '' as StringFormat, case when data_type = 'text' then 10000 else 200 end as StringLength,
    '' as OptionSetValues, '' as DefaultValue, 'Date and Time' as DateTypeFormat,
    '' as IntegerFormat, '-100000000' as IntegerMinimumVal, '100000000' as IntegerMaximumVal,
    '2' as FloatingNumberPrecision, '-100000000' as FloatMinimumVal, '100000000' as FloatMaximumVal,
    '3' as DecimalPrecision, '-2147483648' as DecimalMinimumVal, '2147483647' as DecimalMaximumVal,
    '2' as CurrencyPrecision, '-21474836480' as CurrencyMinimumVal, '21474836470' as CurrencyMaximumVal
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'people'
  and column_name not like '%deprecated%'
order by column_name;

The query discards all fields marked as deprecated in the legacy system and then inspects the PostgreSQL type of each field to infer the desired Dynamics attribute type. For example, if the Postgres type is integer, the query maps it to Whole Number unless the field name contains ‘amount’, ‘value’, or ‘cost’, in which case the matching Dynamics attribute should be created as Currency.

This logic depends heavily on the conventions used by the legacy database designers. The database I was working with invariably terminated field names that logically represented bit values with a question mark, while being somewhat sloppy about the actual column types, so these logical bit fields sometimes had a text type. For example, a text column whose name ends in a question mark is still mapped to a Boolean attribute because of the column_name like '%?' check.

The query also creates display names for attributes by replacing underscores in the Postgres field name with spaces and using the INITCAP function to convert the text to proper case.
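For illustration, this is what the transformation produces for a hypothetical column named inspection_date (the column name is just an example):

SELECT
    INITCAP(replace('inspection_date', '_', ' ')) AS display_name,   -- 'Inspection Date'
    'fn_' || replace('inspection_date', '_', '') AS schema_name;     -- 'fn_inspectiondate'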

The query is executed for each table we need to migrate. Its output, shown below in pgAdmin (a PostgreSQL database management and query tool), is copied into the Excel template file generated by Excel to CRM Bulk Attribute Creator, and the plugin is then used to add all of the legacy attributes. (I chose to create each entity itself manually in Dynamics first, without any custom attributes beyond its name.)

Note: XrmToolbox contains another, similar plugin, Attribute Editor, which lets a user generate an Excel template for an entity, modify it, and use it as input for bulk attribute modifications. I chose not to use it because it lacks support for some attribute properties: it does not expose properties such as maximum text length or boundaries for numeric values in its Excel template. Instead, the code of its attribute model classes hardcodes these values. For example, this line in StringAttribute.cs forces every text field to a maximum length of 100:

namespace Colso.Xrm.AttributeEditor.AppCode.AttributeTypes
{
    class StringAttribute : AttributeMetadataBase
    {
        protected override void AddAdditionalMetadata(StringAttributeMetadata attribute)
        {
            attribute.MaxLength = 100;
        }
    }
}

As a result of this design choice, a create or even an update of an attribute can change some of its properties to an undesired value. You can easily fork the GitHub project for this plugin and either extend the template model or add other options for handling attribute properties.

Setting up Dynamics entities for integration

Referential integrity in the source Postgres database is implemented via numeric primary and foreign keys. To faithfully recreate the legacy relationships among the new entities in Dynamics, and to be able to validate the completeness of the migration/integration steps, I retained all these numeric primary and foreign key values as numeric fields on the Dynamics entities and used each legacy primary key to create a related entity Key in Dynamics.

[Screenshot: the legacy ID field configured as an entity key in Dynamics]

Using the legacy ID as a Dynamics key allows us to guarantee the uniqueness of each record in Dynamics and to use it as a search attribute in integration steps to connect source Postgres data with target Dynamics data. It also allows implementation of post-integration data validation steps. For example, to verify the completeness of a given step, we can run a query matching source to target data based on the same key value present in both systems, and ascertain that each record has a 1:1 relation between the source and target systems.
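As a minimal sketch of such a check, assume the Dynamics records (or at least their retained legacy keys) have been exported to a staging table named dynamics_people with a legacy_id column; the staging table and column names are assumptions made purely for illustration:

-- Source records that never made it into Dynamics (any row returned is a gap)
SELECT p.id
FROM people p
LEFT JOIN dynamics_people d ON d.legacy_id = p.id
WHERE d.legacy_id IS NULL;

-- Target-side duplicates: more than one Dynamics record sharing the same legacy key
SELECT legacy_id, count(*)
FROM dynamics_people
GROUP BY legacy_id
HAVING count(*) > 1;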

The entity key ensures that neither an integration step nor erroneous user input can create a duplicate record. Attempting to insert a duplicate key results in an error:

[Screenshot: duplicate record key error raised by Dynamics]

Implementing the data migration project with SSDT and the KingswaySoft adapter

The Integration Services bundled with Microsoft SQL Server Data Tools (SSDT) provide a highly versatile platform for building complex integration scenarios and implementing them in Visual Studio. While it comes with a plethora of connectors for various systems and data types, connectors for Dynamics 365 and PostgreSQL are not included. I used the KingswaySoft integration toolkit for SSDT to provide connectivity to Dynamics data and the CData PostgreSQL ADO.NET Provider to connect to PostgreSQL.

Note: PostgreSQL comes with its own free ODBC driver, but I ran into numerous issues trying to use it in my integration project. With very limited time for the project, I decided to use the CData provider instead.

Both connectors allow a developer to configure a connection in an Integration Services project and use a query against that connection as a data source, or use the connection as a data destination.

Once the query against the Postgres data source is set up and connected to a Dynamics destination, Visual Studio provides a standard mapping dialog to map source fields to destination fields.

[Screenshot: SSDT column mapping dialog between the Postgres source and the Dynamics destination]

The mapping UI also allows us to resolve Dynamics lookups. For example, in the screenshot above the Postgres id field is mapped to the Dynamics id field, which is configured as an entity key and is used to decide between a Create and an Update operation. The Dynamics connector provides an Upsert operation that automatically executes the correct action based on key resolution.

While I could set up the mappings manually as shown above, Integration Services will automatically map source fields to destination fields as long as the field names match. This is a great feature when you have to deal with multiple tables containing hundreds of fields each; it saves a ton of time!

To enable this auto-mapping, I wrote another metadata query that reads a Postgres table definition and converts it into a SELECT statement whose field names are transformed to match the Dynamics attribute schema names.

SELECT
    -- DATE FIELDS ONLY
    (SELECT 'SELECT ' || string_agg('CASE when ' || column_name || ' < ''1/1/1901'' then null else '
                || column_name || ' end AS ' || 'fn_' || replace(column_name, '_', ''), ', ')
     FROM information_schema.columns
     WHERE table_schema = 'public'
       and table_name = 'inspections' and column_name not like '%deprecated%'
       and (data_type = 'date' or data_type = 'datetime' or data_type = 'datetime2'))
    || ', ' ||
    -- DO NOT INCLUDE DATE FIELDS
    (SELECT string_agg(column_name || ' AS ' || 'fn_' || replace(column_name, '_', ''), ', ') || ' FROM inspections;'
     FROM information_schema.columns
     WHERE table_schema = 'public'
       and table_name = 'inspections' and column_name not like '%deprecated%'
       and (data_type <> 'date' and data_type <> 'datetime' and data_type <> 'datetime2'));

The query uses the STRING_AGG function to concatenate field names while slightly changing each name to match the Dynamics schema name conventions: we need to add the schema prefix to each field and drop the underscores. It also processes date fields separately to exclude date values Dynamics doesn’t support, and it ignores deprecated fields.
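To illustrate the shape of the generated statement: for a hypothetical inspections table with a date column inspected_at and non-date columns id and notes (the column names are illustrative only), the output would look roughly like this:

SELECT CASE when inspected_at < '1/1/1901' then null else inspected_at end AS fn_inspectedat,
    id AS fn_id, notes AS fn_notes FROM inspections;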

The integration package consists of several sequential data flows: when the currently executing flow ends, the next flow begins, and any error stops the integration run. The flows are organized according to the level of derivation and data dependency: Level 0, for example, contains data elements that do not depend on anything else, Level 1 entities depend only on Level 0 entities, and so on.

Within each flow, multiple integration tasks execute in parallel. This level of concurrency proved to be sufficient for my project.

At the end of a data task’s successful execution, the count of processed rows may be viewed in the Visual Studio window; it can also be queried from Dynamics 365 and compared with the PostgreSQL record count:

[Screenshot: processed row counts reported in Visual Studio after a data flow run]
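For the PostgreSQL side of that comparison, a simple aggregate gives the numbers to check against; a minimal sketch, with an illustrative table name:

-- Source-side totals to compare against the row counts reported by the data flow
-- and against the record count queried from Dynamics 365
SELECT count(*) AS source_count, min(id) AS min_legacy_id, max(id) AS max_legacy_id
FROM inspections;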

Conclusion

Metadata discovery and custom SSDT connectors allow us to quickly implement fairly complex data migration scenarios.

