Drupal Migration Kick Start and Cheat Sheet

Drupal migrations! They can be complicated, and even once you get started with them, it’s easy to get confused or lost, or to feel like you’re swimming upstream in a sea of microtasks and options. As a Software Architect at Phase2, I’ve collected a set of resources to help you develop and run content migrations for Drupal 10 and 11.

Drupal core includes a robust Migrate API

Start here to get oriented to what’s included in Drupal core. It’s a lot. You don’t have to read it all right away. Just know that these resources exist and are valuable to return to in the future.

  • The Drupal Migrate API overview is the official home of technical documentation on migrating data from various source systems to Drupal, detailing the Extract-Transform-Load (ETL) process, migration plugins, and tools for executing migrations. It's a key resource for developers looking to migrate content into Drupal. For in-depth information, start here: https://www.drupal.org/docs/drupal-apis/migrate-api/migrate-api-overview
  • The migrate module included in Drupal core provides the mechanisms needed to import content, including quite a few source, destination, and process plugins. Some process plugins, like concat and migration_lookup, you’ll use over and over. Others are used less frequently, but it’s good to know that they are available in core, like the route and make_unique_entity_field process plugins. Core migrate module: https://git.drupalcode.org/project/drupal/-/tree/11.0.x/core/modules/migrate

Equipped with that introduction to Drupal core’s migration capabilities, let’s dive into the cheat sheet to help you with your Drupal migration development.

1. Essential contrib

There are several contributed modules that greatly enhance the process of developing and running migrations. Here they are, along with why I find each one helpful.

  • The migrate_plus module provides additional migration plugins and several working example migrations. It also lets you define migrations as standard config and use migration groups for shared properties.
  • The migrate_tools module primarily provides drush commands for running and managing migrations. For example, drush ms node_article shows the status of the node_article migration.
  • The migrate_devel module adds some tools for inspecting data as it comes through migration runs. The --migrate-debug option is one example, which shows the entire source data and transformed destination data for each row.

2. The “Again” script

As you go about developing migrations, you’ll find yourself repeating a whole lot of steps. Import 1. Import 10. Add mappings for a few more fields. Roll back. Import some more. Putting the set of commands that you’re repeating into a simple script file saves a lot of time. And putting set -x at the beginning makes sure that the terminal output still shows what commands are being run.

  • Make a custom again.sh script. Or do-it.sh or you-got-this.sh. I put mine in a root level stuff folder, which I have ignored in git. Into it, put the stack of commands that you’re frequently rerunning as you work on a migration. Here’s the current contents of mine:

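#!/usr/bin/env bash
# Echo each command before it runs.
set -x
MIGRATION=example_location
# Roll back, reload the edited config, clear caches, then import one row.
drush mr "$MIGRATION"
drush cim --partial --source=modules/custom/example_import/config/install/ -y
drush cr
drush mim "$MIGRATION" --limit=1 --migrate-debug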

3. Find available migration plugins, especially process plugins, and their configuration options

One of the key tasks of developing migrations is transforming the source data so that it matches the destination Drupal fields. That’s what process plugins are for. They are provided by core, contrib modules, and custom development unique to your site. Here’s a good way to find all of the process plugins currently enabled and available on your site.


Screenshot of Plugin Info in the Devel menu.


Screenshot of a list of all migration plugin types. This is the Plugin Info page.

Click on the Plugins button beside plugin.manager.migrate.process and you'll see all the available process plugins currently enabled on your site.

Screenshot of the first several listed process plugins.

Generally how to use these pages:

  • Search the list for the partial strings migrat or proces (they match migrate/migration and process/processor), then click the Plugins button.
  • Discover options for a given plugin by copying its class name and using it to find the corresponding file in your editor. There is usually good, concise documentation in a docblock in the class file.

4. Find drush options

There are many migration options in drush. Knowing what they are and how to find details about them will speed your migration development process.

  • drush list lists every available drush command. To narrow the output to migration commands, run drush list | grep migrat; the partial string matches both “migrate” and “migration”.
  • drush help [migration-command] will provide details about the given command. Example: drush help migrate-import will show example commands, plus all available options, including --update, --limit, and --sync.

5. Development workflow

Here’s my typical development workflow, involving editing config files, importing config, testing changes, and capturing final config.

  • Step 1: Edit migration files inside a custom migration module, such as modules/custom/mysite_migration/config/install/migrate_plus.migration.node_article.yml
  • Step 2: Import your edited config using --partial such as drush cim --partial --source=modules/custom/mysite_migration/config/install/ -y
  • Step 3: Test your changes using rollbacks, imports, updates, etc. Edit config files, reimport, test until it’s good. Use your again.sh script here.
  • Step 4: Once it’s doing what you want, export config like usual, using drush cex -y.
  • End result: migration config will exist in the custom module directory AND the default config directory. The custom module directory is only for organizing files and aiding in development. The site’s main default config directory is the one that is actually in control of the running site.
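
As a starting point for Step 1, a minimal migration config file might look like the sketch below. It uses core’s embedded_data source plugin so that it can run without any external data. The migration id example_article_embedded, the file name, and the sample rows are assumptions made up for illustration.

```yaml
# Hypothetical file:
# modules/custom/mysite_migration/config/install/migrate_plus.migration.example_article_embedded.yml
id: example_article_embedded
label: 'Example articles from embedded data'
source:
  # Core source plugin that reads rows straight from this file.
  plugin: embedded_data
  data_rows:
    - id: 1
      title: 'First example article'
    - id: 2
      title: 'Second example article'
  ids:
    id:
      type: integer
process:
  # 1:1 mapping from the source "title" property to the node title.
  title: title
destination:
  plugin: 'entity:node'
  default_bundle: article
```

After importing this config with the drush cim --partial command from Step 2, running drush mim example_article_embedded should create two article nodes, assuming the article content type exists on the site.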

6. Migration Groups

A migration group can be used for setting configuration that is common across several individual migrations.

In this example, there is a migration group that is used for all terms, and imports from a JSON source.

migrate_plus.migration_group.term.yml

id: term
label: Terms
shared_configuration:
  source:
    plugin: url
    data_fetcher_plugin: http
    data_parser_plugin: json
    urls:
      - 'https://www.example.com'
    track_changes: true
    # Set skip_count to true if the paging through to collect the count starts
    # taking annoyingly long.
    #skip_count: true
    pager:
      # The "cursor" pager type uses a token style value from the JSON document
      # to use as a param value in the next page request.
      type: cursor
      # URL param key for next page.
      key: pageToken
      # Path in the JSON to the next-page token. Specifics are TBD.
      selector: response/nextPageToken
    item_selector: response/docs
    fields:
      0:
        name: id
        label: 'Unique identifier'
        selector: '/id'
    ids:
      id:
        type: string
  destination:
    plugin: 'entity:taxonomy_term'
    # Set default_bundle in individual migration
    #default_bundle: example        

Migrations identify which group they’re part of, and add to the configuration.

migrate_plus.migration.term_condition.yml

id: term_condition
label: Conditions from JSON
# Identify the group to inherit config from.
migration_group: term
# Add config in addition to what is in the group shared_configuration.
source:
  fields:
    1:
      name: name
      label: Name
      selector: name
    2:
      name: synonyms
      label: Synonyms
      selector: taxonomy_synonyms
destination:
  default_bundle: condition
process:
  name: name
  field_original_id: id
  field_synonyms: synonyms        
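
To make the inheritance concrete, here is roughly what the term_condition migration looks like once the group’s shared_configuration has been merged in. This is a hand-written sketch, not output from a tool. It assumes that array values are merged recursively, with the migration’s values winning on key collisions, which would explain why the fields are keyed 0, 1, and 2 instead of being written as plain lists.

```yaml
# Approximate effective configuration of term_condition (illustrative sketch).
id: term_condition
label: Conditions from JSON
migration_group: term
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    - 'https://www.example.com'
  track_changes: true
  pager:
    type: cursor
    key: pageToken
    selector: response/nextPageToken
  item_selector: response/docs
  fields:
    # From the group.
    0:
      name: id
      label: 'Unique identifier'
      selector: '/id'
    # From the migration.
    1:
      name: name
      label: Name
      selector: name
    2:
      name: synonyms
      label: Synonyms
      selector: taxonomy_synonyms
  ids:
    id:
      type: string
destination:
  plugin: 'entity:taxonomy_term'
  default_bundle: condition
process:
  name: name
  field_original_id: id
  field_synonyms: synonyms
```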

Important update, from Benji Fisher: The migrate_tools module now enables shared configuration (https://www.drupal.org/node/3263258), similar to how migration groups work. The most notable difference to me is that with shared configuration, you can include multiple configuration sets in a single migration. This could be very useful, for example, if all content types use a common set of field mappings, like title, published, etc., and a group of other content types additionally share some common field mappings. Migration groups still work, but shared configuration seems more flexible.

7. Process Examples

The process section of migration configuration is where most of the work is. This is where data is transformed from the incoming source and mapped to Drupal fields. Here are some examples that show essential techniques.

Simple 1:1 field import. Source data format matches exactly the destination.

In this example, the value of the source created property (an integer timestamp) should be imported verbatim into the destination field. So the process mapping can be this one simple line:

created: created        

That’s exactly the same as explicitly using the get process plugin:

created:
  plugin: get
  source: created        

Simple chained process

In this example, we need to alter the data slightly, before it is ready for the destination, because the source data sometimes has leading or trailing whitespace.

title:
  - plugin: get
    source: title
  - plugin: callback
    callable: trim        
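
Chains can be longer than two steps. As a sketch, assuming you want a fallback when the trimmed title turns out to be empty, core’s default_value plugin can be appended to the same pipeline. The fallback text 'Untitled' is an assumption chosen for illustration.

```yaml
title:
  - plugin: get
    source: title
  # Strip leading and trailing whitespace.
  - plugin: callback
    callable: trim
  # If the trimmed value is empty, fall back to a placeholder title.
  - plugin: default_value
    default_value: 'Untitled'
```

Each plugin receives the output of the one before it, so the order matters: placing default_value before trim would let a whitespace-only title slip through as an empty string.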

Entity Generate / Entity Lookup

In this example, terms are generated in the Search Categories vocabulary, but only if they do not yet exist. Entity Generate extends Entity Lookup, which is how it first determines whether the desired entity exists before creating one.

featured_data:
  -
    plugin: static_map
    source: field_featured
    map:
      '1': 'Featured'
      '2': 'Secondary'
      '3': 'On Deck'
    # default_value is a plugin setting, not a map entry, so it sits beside map.
    default_value: null
  -
    plugin: skip_on_empty
    method: process
  -
    plugin: entity_generate
    entity_type: taxonomy_term
    # Taxonomy uses "vid" (vocabulary id) as bundle keys. Nodes use "type".
    bundle_key: vid
    bundle: search_categories
    # Taxonomy uses "name" as the primary value. Nodes use "title".
    value_key: name        

By the time the entity_generate plugin is reached in this example, the value will be one of the mapped labels: “Featured”, “Secondary”, or “On Deck”. The entity_generate plugin creates a term with that name in the defined bundle only if it does not yet exist, and returns the ID of the term.

One way to explore the bundle_key and value_key settings is to look in the database at the taxonomy_term_field_data and node_field_data tables and see what the column names are.
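
When missing terms should not be created, the entity_lookup plugin from migrate_plus can be used on its own with the same keys. A minimal sketch, assuming a hypothetical destination field field_category and a hypothetical source property category_name:

```yaml
# field_category and category_name are made-up names for illustration.
field_category:
  plugin: entity_lookup
  source: category_name
  entity_type: taxonomy_term
  bundle_key: vid
  bundle: search_categories
  value_key: name
  # Match "featured" and "Featured" alike.
  ignore_case: true
```

If no matching term exists, the lookup is expected to return an empty value, so the field is left unset and no new term is created.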

Migration Lookup

Use this when one migration needs to set values in a reference field, referencing content that was imported by a different migration. In this example, the current migration needs to reference content created by a migration named “taxonomy_term_topic”.

field_associated_topic:
  -
    plugin: migration_lookup
    source: field_assoc_topics
    no_stub: true
    migration: taxonomy_term_topic        

field_associated_topic is a term reference field, and its values need to be set based on the term IDs from a term migration.

The source is an ID: the original ID of a term.

The migration is the migration that brought in these terms.

The result is the ID in the new site of the imported term.
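
The migration key also accepts a list, which helps when the referenced content was imported by more than one migration. A sketch, assuming a second, hypothetical term migration named taxonomy_term_subtopic:

```yaml
field_associated_topic:
  -
    plugin: migration_lookup
    source: field_assoc_topics
    # Do not create placeholder (stub) terms for IDs that are not found.
    no_stub: true
    migration:
      - taxonomy_term_topic
      # Hypothetical second migration, for illustration only.
      - taxonomy_term_subtopic
```

With no_stub set to true, a source ID that none of the listed migrations imported yields no value instead of a stub entity.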

Media and files import

A common way to import media into Drupal is to use two migrations: one for the basic files (the actual JPGs, PDFs, etc), and one for the media items (for the metadata like a friendly name or tags). A migration lookup is used in the media migration to reference the files from the files migration.

Content migrations that reference this media through media reference fields can use a migration lookup to retain that connection.

In the following example, the source CSV has columns for image_url and name (name of article). And the articles will have image references intact.

File migration.

id: example_file_image
label: Images from CSV
source:
  plugin: csv
  path: '/var/www/files/example.csv'
  ids:
    - primary_key
destination:
  plugin: 'entity:file'
process:
  status:
    plugin: default_value
    default_value: 1
  _destination_uri:
    - plugin: skip_on_empty
      source: image_url
      method: row
    # Custom plugin that sets the destination URL.
    - plugin: prepare_destination_image_url
  uri:
    - plugin: skip_on_404
      method: row
      source: image_url
    - plugin: file_copy
      file_exists: 'use existing'
      source:
        - image_url
        - '@_destination_uri'        
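
The prepare_destination_image_url plugin above is custom, and its code is not shown. As a rough alternative using only core plugins, the destination URI could be assembled from a constant directory and the file name taken from the source URL. The constants/destination_dir value and the _filename pseudo-field are assumptions for illustration.

```yaml
source:
  constants:
    # Hypothetical target directory in the public file system.
    destination_dir: 'public://migrated-images'
process:
  # Extract the file name portion of the source URL.
  _filename:
    plugin: callback
    callable: basename
    source: image_url
  # Join directory and file name into the destination URI.
  _destination_uri:
    plugin: concat
    source:
      - constants/destination_dir
      - '@_filename'
    delimiter: /
```

This sketch does not handle query strings or duplicate file names; a custom plugin is a reasonable choice when that kind of cleanup is needed.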

Media migration, looks up files from the file migration.

id: example_media_image
label: Image media from CSV
migration_dependencies:
  required:
    - example_file_image
source:
  constants:
    image_alt_prefix: Image for
    image_name_suffix: example image
  plugin: csv
  path: '/var/www/files/example.csv'
  ids:
    - primary_key
destination:
  plugin: 'entity:media'
process:
  uid:
    plugin: default_value
    default_value: 1
  name:
    plugin: concat
    source:
      - name
      - constants/image_name_suffix
    delimiter: ' '
  status:
    plugin: default_value
    default_value: 1
  bundle:
    plugin: default_value
    default_value: image
  field_media_image/alt:
    plugin: concat
    source:
      - constants/image_alt_prefix
      - name
    delimiter: ' '
  field_media_image/target_id:
    - plugin: skip_on_empty
      method: row
      source: image_url
    - plugin: migration_lookup
      source: primary_key
      migration: example_file_image        

Node migration, looks up image media from the media migration.

id: node_article
label: Articles from CSV
migration_dependencies:
  required:
    - example_media_image
source:
  plugin: csv
  path: '/var/www/files/example.csv'
  ids:
    - primary_key
  track_changes: true
destination:
  plugin: 'entity:node'
  default_bundle: article
process:
  # Truncated for brevity.
  field_image/target_id:
    - plugin: skip_on_empty
      source: image_url
      method: process
    - plugin: migration_lookup
      source: primary_key
      migration: example_media_image

Wrapping up

Okay, that was a stack of techniques and info I find myself returning to repeatedly. Do you have any migration shortcuts you could share? I’d love to hear them. I hope this was helpful to you! Please let me know your thoughts and if you might find more posts like this useful in your work. Have a (mi)great day!

Bonus

Bonus tip from Benji Fisher: You will end up with specific and unique questions and use cases. A great place to get answers and discuss issues is the #migration channel in Drupal Slack.


#Drupal #DrupalMigration #Drupal10 #Drupal11

Danny Englander

Senior Drupal engineer, artificial intelligence enthusiast, and Acquia Certified Developer

2 months ago

Well done David!

Benji Fisher

Drupal developer and core contributor

2 months ago

Thanks for helping to publicize this information. I suggest two changes: 1. Instead of using migration groups to share configuration, use the newer mechanism from the migrate_tools module: https://www.drupal.org/node/3263258 2. After getting started, people new to the Migrate API (and people who are not so new) will have specific questions. The best place to get help is the #migrate channel in Drupal Slack.

Mads Nørgaard

Tech lead at Novicell with 10+ years Drupal experience

3 months ago

Went through it and it looks like solid tips on the most common pitfalls. Large data sets and long feedback loops can be a killer, so it’s good that this covers them, and the use of Devel is also pretty insightful.

Sebin A. Jacob

Editor In Chief at The Drop Times

3 months ago

Anish Anilkumar, you were looking for some migration resources, right?

Chris Kelly

Drupal consultant and core/module contributor. Contact me today for a quote. From site architecture and migrations to module development and debugging, I'll get it done for you either alone or as part of my or your team.

3 months ago

Also, writing custom migration code isn't that difficult and in many cases it's easier & more maintainable than trying to have the same impact using YAML and contrib plugins. Custom code can also be used to provide better debugging so you can figure out what's wrong with the YAML etc.
