How to Migrate a Legacy SCM
Intro
As the first part made clear, the complete migration that is so often demanded is a myth. Given the differences between SCM systems and the cost of migration, even unlimited time and staff would yield an approximation at best.
So what does a pragmatic, cost-effective approach look like that does not neglect compliance? We will stick with the use case of replacing Subversion with Git, but the approach can be generalized to any conceivable combination of systems.
Step 1 – Analysis and Triage
The first and most important preparation for everything that follows is a precise examination of the existing data. Beyond pure volume figures, this includes information such as links between repositories and the type and extent of so-called binaries. Given the typical number of repositories, a manual examination is not feasible, so we use an analysis script from Openpool. The Openpool license allows the script to be modified and extended. The result is a precise overview of all data relevant to a migration.
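The Openpool script itself is not reproduced here. As a rough illustration of what such an analysis collects, the following Python sketch gathers the relevant metrics per repository; the repository location, the binary-suffix heuristic, and the use of the stock svnlook/svn command-line tools are assumptions, not the Openpool implementation.

# Sketch of an analysis pass over local Subversion repositories.
# Assumption: direct file access to the repositories and the stock
# svnlook/svn CLI tools; the suffix list is a crude binary heuristic.
import subprocess
from pathlib import Path

BINARY_SUFFIXES = {".jar", ".zip", ".dll", ".so", ".exe", ".pdf", ".png"}

def run(*cmd: str) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

def analyze(repo: Path) -> dict:
    # Date of the last change (youngest revision).
    last_change = run("svnlook", "date", str(repo)).strip()
    # All paths in HEAD, used for the object count and the binary share.
    paths = [p for p in run("svnlook", "tree", "--full-paths", str(repo)).splitlines() if p]
    binaries = [p for p in paths if Path(p).suffix.lower() in BINARY_SUFFIXES]
    # Externals indicate dependencies on other repositories.
    externals = run("svn", "propget", "svn:externals", "-R", f"file://{repo.resolve()}").strip()
    return {
        "repo": repo.name,
        "last_change": last_change,
        "objects": len(paths),
        "binary_share": len(binaries) / max(len(paths), 1),
        "has_externals": bool(externals),
    }

if __name__ == "__main__":
    for repo in Path("/srv/svn").iterdir():  # assumption: one directory per repository
        print(analyze(repo))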
Triage then divides the data into three groups (a classification sketch follows the list):
1. Instant archiving
The date of the last change is usually the deciding criterion, e.g. more than 5 or 10 years ago. This data has demonstrably not changed for a long time and can therefore be archived immediately.
2. Independent repositories
The criterion for this category is that there are no externals to other repositories. This data is migrated automatically.
3. Dependent repositories or a high share of binaries
Data in this group has externals to other repositories or a high share of binaries (more than 30-50% of all objects). This group requires a manual, case-by-case migration.
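Applied to metrics like those above, the triage itself reduces to a simple decision rule. A minimal sketch in Python, using the thresholds named in the list (both are, of course, tunable; parsing the svnlook date string into a datetime is omitted for brevity):

from datetime import datetime, timedelta, timezone

# Decision rule over the metrics gathered in step 1.
def triage(last_change: datetime, has_externals: bool, binary_share: float) -> str:
    if datetime.now(timezone.utc) - last_change > timedelta(days=5 * 365):
        return "group 1: instant archiving"          # unchanged for more than 5 years
    if has_externals or binary_share > 0.30:
        return "group 3: manual, special migration"  # dependencies or many binaries
    return "group 2: automated migration"            # independent repository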
Step 2 – Archiving
Archiving consists of two steps.
With the help of Sourcepool, another tool from Openpool, individual revisions (e.g. tagged versions) are stored in a normal, hierarchically organized file system:
/<repo name>
    /<branch or tag>
        /<root directory and all subdirectories>
This file system can later be used to access specific versions (e.g. those that went into production) without the legacy SCM system. All binaries are stored in the file system as well, so from the branch/tag level downward the structure matches that of a Subversion workspace.
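Sourcepool itself is not shown here, but the underlying mechanism can be illustrated with the plain svn client: the sketch below exports every tag of a repository into exactly this layout. The standard trunk/branches/tags layout, the archive mount point, and the repository URL are assumptions.

# Sketch: export every tag of a repository into the archive layout
# /<repo name>/<tag>/<tree>. svn export writes a plain tree without
# .svn metadata, i.e. exactly the workspace-like structure described above.
import subprocess
from pathlib import Path

ARCHIVE_ROOT = Path("/archive")  # assumption: archive file system mount

def archive_tags(repo_url: str, repo_name: str) -> None:
    listing = subprocess.run(
        ["svn", "ls", f"{repo_url}/tags"],
        capture_output=True, text=True, check=True,
    ).stdout
    for tag in (line.rstrip("/") for line in listing.splitlines() if line):
        target = ARCHIVE_ROOT / repo_name / tag
        target.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["svn", "export", f"{repo_url}/tags/{tag}", str(target)], check=True)

archive_tags("https://svn.example.com/repos/foo", "foo")  # placeholder URL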
The second step is also quick and easy. The repository to be archived is set to read-only, the information about the repository is removed from the server, and the repository's data directory is saved as a tarball/zip archive.
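A minimal sketch of this second step; a rejecting pre-commit hook is one common way to enforce read-only on a Subversion repository, and all paths are placeholders.

# Sketch: freeze a repository and pack its data directory.
import shutil, stat
from pathlib import Path

def freeze_and_pack(repo: Path, out_dir: Path) -> Path:
    # 1. Reject all future commits via the pre-commit hook.
    hook = repo / "hooks" / "pre-commit"
    hook.write_text('#!/bin/sh\necho "repository archived, read-only" >&2\nexit 1\n')
    hook.chmod(hook.stat().st_mode | stat.S_IEXEC)
    # 2. Pack the whole data directory into a gzip'd tarball.
    out_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.make_archive(str(out_dir / repo.name), "gztar", root_dir=str(repo)))

freeze_and_pack(Path("/srv/svn/foo"), Path("/archive/tarballs"))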
Step 3 – Automated Migration
Despite the automation, some manual preparatory work is necessary, and it often takes more time than the actual migration. In the best case, you can define the number of revisions to be migrated centrally; if so, congratulations!
In most cases, however, discussions with the project teams and individual agreements about the scope are necessary. It must also be agreed with the teams what happens to the binaries that are excluded from the transfer during the migration, since they had good reasons to store them in an audit-proof manner in the first place.
After that, the procedure is simple. You hand the repositories and the agreed history range over to Blackbird, another script from the Openpool family. The result is a Git repository with the desired history, plus a tarball/zip archive with all binaries, each recorded with its path in the repository's internal directory tree.
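Blackbird is Openpool-specific and not shown here. The core step it automates corresponds roughly to a git svn clone over the agreed revision range, as in the following sketch; URL, starting revision, and authors file are placeholders, and the separate extraction of binaries is omitted.

# Sketch: migrate one repository with the agreed history range.
import subprocess

def migrate(svn_url: str, dest: str, first_rev: int, authors_file: str) -> None:
    subprocess.run(
        [
            "git", "svn", "clone",
            "--stdlayout",                     # standard trunk/branches/tags layout
            f"--revision={first_rev}:HEAD",    # agreed scope of the history
            f"--authors-file={authors_file}",  # maps SVN users to Git identities
            svn_url, dest,
        ],
        check=True,
    )

migrate("https://svn.example.com/repos/foo", "foo-git", 1500, "authors.txt")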
Step 4 – Handle special cases manually
One piece of good news first: only the preparatory work is manual, while the implementation is supported by the tools already mentioned.
Let's start with the simpler case, a high share of binaries. A conversation is usually enough to understand why binaries were placed in Subversion. The following distinction can serve as a guideline:
- Executable programs, libraries, frameworks, etc. should be stored in a binary management system such as Nexus or Artifactory (see the upload sketch after this list).
- All forms of documents and graphics should be stored in a document management system, but can also be stored in a separate Git repository.
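For the first category, moving the extracted binaries can be scripted against the upload API of the target system. A minimal sketch for Artifactory's PUT-based deploy endpoint; host, repository key, credentials, and the source directory are placeholders, and Nexus offers a comparable upload API.

# Sketch: upload extracted binaries via Artifactory's deploy endpoint
# (PUT /artifactory/<repo-key>/<path>).
from pathlib import Path
import requests

BASE = "https://artifactory.example.com/artifactory/legacy-binaries"  # placeholder
SRC = Path("extracted-binaries")                                      # placeholder

def deploy(file: Path, target_path: str, auth: tuple[str, str]) -> None:
    with file.open("rb") as fh:
        resp = requests.put(f"{BASE}/{target_path}", data=fh, auth=auth)
    resp.raise_for_status()

for f in SRC.rglob("*"):
    if f.is_file():
        deploy(f, f.relative_to(SRC).as_posix(), ("user", "api-token"))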
The discussions about dependencies between repositories are usually more complex. For us, the starting point is always whether a module/component/subsystem is used more than once. If so, it is migrated to a separate Git repository; if not, it remains part of the shared repository or is transferred there before the migration.
Following the migration, parts that are used multiple times are manually integrated into the logically higher-level repository as submodules or as a subtree, depending on your preference.
The aforementioned tools help with the transfer of data and history.
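The integration itself uses standard Git means. A short sketch of both variants, to be run inside the parent repository; the component URL and target paths are placeholders.

# Sketch: attach a shared component to a parent repository,
# either as a submodule or as a subtree.
import subprocess

SHARED_URL = "https://git.example.com/shared-component.git"  # placeholder

def add_submodule(path: str) -> None:
    subprocess.run(["git", "submodule", "add", SHARED_URL, path], check=True)

def add_subtree(prefix: str, branch: str = "main") -> None:
    # --squash keeps the parent history compact; drop it to import the full history.
    subprocess.run(
        ["git", "subtree", "add", f"--prefix={prefix}", SHARED_URL, branch, "--squash"],
        check=True,
    )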
One last task remains: as soon as the last of the linked repositories has been transferred to Git, the steps listed under step 2 should be carried out for all repositories!
However, step 2 should only be started after the data owners have given their approval for the result of the migration.
Once all repositories have been transferred to Git, you have reached the following intermediate state:
- All Subversion repositories are archived in compressed form; important versions can be accessed without additional tools.
- The Git repositories exist, but still need to be integrated into the Git environment so that access control takes effect in accordance with company policy.
- The Git repositories have been cleaned of binaries. These are available as a tarball/zip archive; the individual elements still have to be moved to their future storage locations or archived as a whole.
The last part is about dealing with the archived data of the legacy SCM system.