How to handle legacy data

Let's assume that the previous repositories have been converted and the projects are fully operational in the new SCM environment. Congratulations!

Many companies now take a simple approach: the previous environment is declared a museum and continues to operate. In the best case, licenses that are no longer needed are canceled and access is restricted by revoking permissions. This is usually driven by the fear of not being able to react quickly enough if access to the complete data (especially non-archived intermediate states) should become necessary.

However, this supposed security has a price and involves risks.

• The costs for the infrastructure and, for commercial solutions, the licenses continue to accrue.

• The know-how for administration and problem solving must be maintained. This puts a strain on the staff responsible for it.

• Such systems are often not maintained like production systems; upgrades and patches are no longer installed. Hardware and operating systems also age. This creates security risks for operations.

• The know-how for working with the old system must remain available both on the developer side and in IT support.

Depending on the size of the environment, the costs incurred can add up to a five- to seven-figure amount per year. Even where the budget is available, this money could be spent more sensibly. It is therefore important to actively drive the shutdown of legacy systems.

Preparation

The first step is to cut off access to the old environment. Whether this is done by withdrawing rights or by changing the network is irrelevant. What matters is that existing users can no longer actively work with the data. The repositories are then permanently archived in their original format in accordance with company guidelines.

It is important to carry out recovery tests after archiving. This ensures that the archives are readable and that all steps needed to put a repository back into operation from the archive are documented.
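One part of such a recovery test can be automated. The following is a minimal sketch, assuming the repositories were archived as tarballs and that a SHA-256 checksum was recorded at archiving time. The function names are illustrative and not part of any tool mentioned here. The sketch only proves that the archive is intact and readable. It does not replace a full restore into a running system.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive: Path, expected_sha256: str) -> int:
    """Compare the checksum, then read every regular file in the tarball.

    Returns the number of regular files that could be read completely.
    """
    if sha256_of(archive) != expected_sha256:
        raise ValueError(f"checksum mismatch for {archive}")
    files_read = 0
    with tarfile.open(archive, "r:*") as tar:
        for member in tar:
            if member.isfile():
                # Reading the member to the end surfaces truncated or corrupt data.
                stream = tar.extractfile(member)
                while stream.read(1 << 20):
                    pass
                files_read += 1
    return files_read
```

Running this check periodically against the depot, and not only once after archiving, would also detect storage degradation over time.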

Building a museum environment

Here we again use Subversion as the blueprint for the setup, and a real museum as the model. No museum has everything it owns on display. As a rule, the number of exhibits stored in the depot far exceeds the number of items shown.

Instead of simply leaving everything in the SCM Museum online, it makes more sense to first empty it, store everything in a depot and “issue” the requested repositories when necessary.

What do you need for this? A stack of containers, cheap disk space and robot workers. Sequentially:

• Subversion, Nexus and Jenkins (we stay in the open source area) are installed and configured in containers.

• Each repository is packed as a tarball and stored in a Nexus repository.

• A Jenkins pipeline is built that fetches a repository from the Nexus repository, unpacks it and brings it online in the Subversion container.

• A second Jenkins pipeline is built that waits for X days and then automatically deletes the Subversion repository.
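The fetch-and-unpack step of the first pipeline could be delegated to a small script. The following is a minimal sketch, not a definitive implementation. The Nexus host, the repository name `scm-depot` and the path layout `svn/<name>.tar.gz` are hypothetical, and it assumes that the Subversion container serves every repository found below one parent directory.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Hypothetical values: adjust to the local Nexus container.
NEXUS_BASE = "http://nexus.example.internal:8081"
NEXUS_REPO = "scm-depot"


def tarball_url(repo_name: str) -> str:
    """Build the download URL of a repository tarball (layout is an assumption)."""
    return f"{NEXUS_BASE}/repository/{NEXUS_REPO}/svn/{repo_name}.tar.gz"


def fetch(url: str, target: Path) -> Path:
    """Download url to target and return target."""
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


def unpack(archive: Path, svn_root: Path) -> Path:
    """Unpack a tarball below svn_root, refusing member paths that point outside it."""
    root = svn_root.resolve()
    with tarfile.open(archive, "r:*") as tar:
        for member in tar.getmembers():
            destination = (root / member.name).resolve()
            if destination != root and root not in destination.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    return root


def exhibit(repo_name: str, svn_root: Path, workdir: Path) -> Path:
    """Fetch one repository from the depot and place it below svn_root."""
    archive = fetch(tarball_url(repo_name), workdir / f"{repo_name}.tar.gz")
    unpack(archive, svn_root)
    return svn_root / repo_name
```

A pipeline stage would then call `exhibit` with the repository name entered by the user. Authentication against Nexus and the creation of the user's access data are left out of this sketch.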

Using the museum is then very easy. The user logs in to the Jenkins instance (which runs on almost any hardware, even older machines with little memory and CPU power), starts a personal “exhibition” by entering the repository name and, if the pipeline runs successfully, receives the access data. The user can then explore the history at will. At the same time, a certain amount of time pressure remains, because the “exhibition” ends after a defined period.
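The time limit can be sketched as follows. Instead of a pipeline that waits for X days, this variant assumes a cleanup job that runs periodically, which is a substitution on our part. It also assumes that the fetch pipeline writes a timestamp marker into each repository when it goes online. The marker file name `.exhibited-at` is hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

MARKER = ".exhibited-at"  # hypothetical marker file name
SECONDS_PER_DAY = 86400


def mark_opened(repo_dir: Path, opened_at: Optional[float] = None) -> None:
    """Record when a repository went online (called by the fetch pipeline)."""
    stamp = time.time() if opened_at is None else opened_at
    (repo_dir / MARKER).write_text(str(stamp))


def close_expired(svn_root: Path, max_age_days: float,
                  now: Optional[float] = None) -> List[str]:
    """Delete repositories whose marker is older than max_age_days.

    Directories without a marker are not managed by this job and are left alone.
    Returns the names of the deleted repositories.
    """
    now = time.time() if now is None else now
    removed = []
    for repo_dir in sorted(svn_root.iterdir()):
        marker = repo_dir / MARKER
        if not marker.is_file():
            continue
        opened_at = float(marker.read_text())
        if now - opened_at > max_age_days * SECONDS_PER_DAY:
            shutil.rmtree(repo_dir)
            removed.append(repo_dir.name)
    return removed
```

An explicit marker is used because file modification times restored from a tarball reflect the archived state, not the moment the repository went online.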

The advantage of the approach is, on the one hand, that both data and systems only become active when there is a specific reason, and on the other hand, that users gain access in self-service mode without tying up additional IT resources.

Another aspect of the museum approach is that it puts a small threshold in front of the data. Even though the user can gain access with only a few clicks, it still requires a deliberate action. The number of exhibitions started therefore shows much more accurately how much demand for the museum actually exists.

Tool-independent archiving

Since the environment exists with limited access but is fully functional, all compliance use cases can be covered with such a museum approach. However, most requirements can also be met more easily and cost-effectively. If you do not need the complete history and all intermediate statuses, tool-independent archiving of the data under version control is the cheapest and easiest approach.

The idea is to export the required versions from the proprietary repositories into an easy-to-understand directory structure. For later access, a simple search interface is sufficient; all other actions can be carried out using standard operating system tools.

How does this work in detail? You need a client system with access to the legacy repositories and Sourcepool (see the OpenPool projects). The export can be scripted and automated using the Sourcepool CLI. Sourcepool consists of connectors that read the legacy data and connectors that write the data to the target format (as of April 2024: a plain file system hierarchy and Git).

First, it is determined which configurations should be read from the legacy system, e.g. only tagged versions, major releases, etc. These are then stored in the following directory structure:

<SCM System>/<Repository>/<Version Identifier>/Sources

Thanks to the simple directory hierarchy, versions can be easily found and indexed to simplify the search. Cheap storage is sufficient for the archive itself, which reduces infrastructure costs. And if configurations were forgotten during the actual migration, they can be loaded from the archive into a new version control system using Sourcepool.
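To illustrate how little tooling the layout needs, here is a minimal sketch of an index builder in Python. It relies only on the directory structure shown above and does not use Sourcepool. The function names are illustrative.

```python
from pathlib import Path
from typing import Dict, List


def version_dir(archive_root: Path, scm_system: str,
                repository: str, version: str) -> Path:
    """Return the Sources directory for one archived version."""
    return archive_root / scm_system / repository / version / "Sources"


def build_index(archive_root: Path) -> Dict[str, List[str]]:
    """Map '<SCM System>/<Repository>' to the sorted list of archived versions."""
    index: Dict[str, List[str]] = {}
    for sources in archive_root.glob("*/*/*/Sources"):
        if not sources.is_dir():
            continue
        version = sources.parent
        repository = version.parent
        scm_system = repository.parent
        key = f"{scm_system.name}/{repository.name}"
        index.setdefault(key, []).append(version.name)
    for versions in index.values():
        versions.sort()  # plain string order, not semantic version order
    return index
```

The resulting dictionary can be written to a JSON file or fed into whatever search interface is used. Note that the sort is lexical, so `10.0` sorts before `2.0`.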

Further details and example implementations for Subversion will be available via OpenPool from Q3/2024.

In general, the procedure described can also be transferred to other SCM systems.

Finally, an important note. All approaches presented are blueprints, i.e. standardized approaches. They should (and have to) be adapted to the specific situation in the respective environment, depending on compliance requirements and the requirements of the projects.
