Challenges of database automation
In recent years we have witnessed the rise of Agile methodologies in software development, alongside the adoption of DevOps tools as an integral part of the development process. Cross-checking several surveys from recent years suggests that 60-65% of organizations have already adopted Continuous Delivery processes, and a majority of those that have not yet done so say they wish to.
Implementing continuous delivery methodologies in the development lifecycle delivers many advantages to an organization. It enables organizations to rapidly, efficiently, and reliably bring updates, service improvements, software patches, and new versions to market and stay a step ahead of the competition. Development teams gain faster delivery of improvements and fixes.
The problems arise when we look at the database side of the development process. Recent surveys and studies show that most organizations that have adopted Continuous Delivery for application code have not even begun to bring the database into the Continuous Delivery process, and those that have still rely on manual steps along the way.
So we see a definite gap between the application side, which has incorporated automation, continuous delivery, and continuous integration into its development lifecycle, and the database side, whose development is still riddled with manual steps.
Organizations have difficulty balancing the two sides: automation for application code alongside manual steps for database changes. While the database should be the strongest link in the development lifecycle (all of our data sits in the database, and errors there can have catastrophic results), it is still developed with manual steps, direct and undocumented changes, and no version control. These are behaviors and processes that have not been tolerated on the application side for a long time.
A few years ago, a new player appeared in the field of DevOps for the database, aiming to solve development and change management issues while bringing all the benefits of DevOps practices to database development.
DBmaestro Teamwork supports agile development methodologies, continuous delivery, continuous integration, and DevOps for the database. Teamwork streamlines the management of development processes and enforces change management and deployment policies while strengthening collaboration between developers and across teams.
In this article I will list several of the main issues troubling database developers and DBAs striving to incorporate automation into database change deployments, and show how DBmaestro Teamwork solves them to provide a true safety net for database automation.
The first aspect I will examine is version control. When I look at application code development today, I expect to find developers working with one of the many version control tools prevalent in the industry, such as TFS, Subversion, or Git. If I want to make a change to Java or C# code, I check the code out, modify it, and check it back in, and I get a closed, orderly development process. If I did not first check out the desired code and only modified it locally, then I have not actually changed anything: my change was not registered and nothing was inserted into the build. The only way to affect the next build is to go through the orderly process and perform a check-out before modifying my code.
But when I talk with colleagues at enterprises that have incorporated DevOps methods on the database side, I encounter a version control system operating side by side with the database runtime engine. Many of those enterprises implemented what appeared to be a solid work process for deploying DB changes.
They save the DB scripts in the VC system. When a DBA wishes to modify a DB object, he takes the relevant script from the VC repository to his runtime environment (the database) and performs the required modifications: fixes, new features, and so on. When he is done, he returns the newly modified script to the VC repository. It appears we have met our goal and have in our hands a solid, closed, and orderly work methodology.
But if we examine this process closely, we see that we have two separate entities with no correlation between them. We do not have a single source of truth, and our "automatic" process is riddled with manual, unsupervised steps.
What prevents two different developers from modifying the same procedure simultaneously? What prevents one of them from overriding the other's changes? And wherever manual steps are involved, there is always a risk of human error (did we remember to return the modified script to the VC system? did we validate that the DB is synchronized with the latest version saved in VC?).
DBmaestro takes a different approach. Instead of working with two separate tools, one for version control and one for the runtime, the solution implements the same approach we are used to seeing with application code: a single source of truth, achieved by embedding version control in the runtime environment.
If I want to modify a DB object, whether a procedure, an index, a table, or metadata, I first need to check the object out. Then I can perform the required modification, run the required tests, and check it back in when I am done. This process is enforced from every interface, whether the change comes from the command line, from one IDE or another, or from any other source. This way, I have a closed loop as my work process and a single source of truth for the database, instead of two separate entities.
Using this method, I have solved many of the issues that arise from unsupervised, undocumented manual steps. Since every change has to go through the check-out/check-in process, I eliminate undocumented changes in the DB: I have to declare and document each modification I perform. And once a change is documented, I can correlate it with task-based development, tying it to a business process, a work item, or a trouble ticket.
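To make the idea concrete, here is a minimal sketch in Python of the kind of enforcement such a tool applies. The class, the in-memory lock store, and the object names are purely illustrative assumptions for this article; they are not DBmaestro's actual implementation or API.

```python
# Hypothetical sketch: enforced check-out/check-in for DB objects.
# The registry and lock store are illustrative, not DBmaestro's actual API.

class ObjectLockedError(Exception):
    pass

class DbObjectRegistry:
    """Tracks which DB objects are checked out, by whom, and for which task."""

    def __init__(self):
        self._locks = {}  # object name -> (user, work_item)

    def check_out(self, obj_name, user, work_item):
        """Refuse the check-out if another user already holds the object."""
        holder = self._locks.get(obj_name)
        if holder and holder[0] != user:
            raise ObjectLockedError(
                f"{obj_name} is checked out by {holder[0]} for {holder[1]}")
        self._locks[obj_name] = (user, work_item)

    def check_in(self, obj_name, user):
        """Only the user who checked the object out may check it back in."""
        holder = self._locks.get(obj_name)
        if not holder or holder[0] != user:
            raise ObjectLockedError(f"{obj_name} is not checked out by {user}")
        del self._locks[obj_name]
        # In a real tool, the new object version would be committed here,
        # tagged with the work item for task-based traceability.

registry = DbObjectRegistry()
registry.check_out("dbo.GetOrders", user="alice", work_item="TICKET-123")
# registry.check_out("dbo.GetOrders", "bob", "TICKET-456")  # ObjectLockedError
registry.check_in("dbo.GetOrders", user="alice")
```

Because every interface funnels through the same registry, an undocumented change is simply impossible: there is no path to the database that bypasses the lock and the work-item tag.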
Now, let's examine configuration drift. A database development landscape consists of several environments: Dev, QA, Pre-PRD, PRD, and so on. Contrary to application code development, each environment must hold the entire database configuration: the full DB schema, procedures, indexes, metadata, and so on.
This means we can all too easily end up with inconsistent configurations between environments. For example, my team develops a change script. We have just finished testing it in QA and plan to move it to the Staging environment after the weekend.
During the weekend, a performance issue forced the on-call DBA to rush from home to solve it. As a result, he introduced a hot fix straight to the Staging environment and from there to production. When my team gets back to work the following week, unbeknownst to us, the configuration of the database in Staging is no longer what our script expects to find. Our already-tested script is out of date.
Best case, we catch it when we move the script to Staging and need to back-track and modify the script accordingly. Worst case, we do not notice the configuration drift, since most of the testing was already done in QA, and the bug continues straight on to production. Over 60% of organizations report that they have to back-track and repair scripts all the time. Such work practices hinder the move to automation and continuous delivery.
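The underlying check is simple to picture. Here is an illustrative sketch, under the assumption that each environment's schema can be snapshotted as a mapping from object name to its DDL; the snapshot format and the example index are hypothetical, not a real tool's output.

```python
# Illustrative drift check: hash each object's definition per environment
# and flag objects whose definitions differ. Snapshots here are plain
# dicts; a real tool would read them from the live databases.
import hashlib

def fingerprint(ddl: str) -> str:
    """Stable hash of an object's whitespace- and case-normalized DDL."""
    return hashlib.sha256(" ".join(ddl.split()).lower().encode()).hexdigest()

def detect_drift(expected: dict, actual: dict) -> list:
    """Return objects missing from or differing in the target snapshot."""
    drift = []
    for name, ddl in expected.items():
        if name not in actual:
            drift.append((name, "missing in target"))
        elif fingerprint(ddl) != fingerprint(actual[name]):
            drift.append((name, "definition differs"))
    for name in actual.keys() - expected.keys():
        drift.append((name, "unexpected object in target"))
    return drift

qa = {"orders_idx": "CREATE INDEX orders_idx ON orders(customer_id)"}
staging = {"orders_idx": "CREATE INDEX orders_idx ON orders(customer_id, status)"}
print(detect_drift(qa, staging))  # -> [('orders_idx', 'definition differs')]
```

Run before every promotion, a check like this would have caught the weekend hot fix before the out-of-date script reached Staging.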
In the past, enterprises employed Compare-and-Sync tools as part of database change management. These are impressive tools that can identify for the user the differences between a source environment and a target environment.
But all of these tools share a common drawback: they can show the user what is different, but is it safe to use them in automated processes? Let's examine two opposing scenarios where we are trying to move changes from QA (source) to PRD (target).
The Compare-and-Sync tool identifies that we have an index in QA that is absent from PRD. The tool would automatically suggest adding the index to PRD. But is it really safe to accept the recommendation at face value? Who says the index was deliberately added in QA? How can we know the index was not removed from PRD during a hot fix?
Now let's look at the opposite case. This time, the tool identifies an index that exists in PRD but not in QA. The tool would suggest removing the index from PRD. As in the previous example, a human needs to intervene and check whether the index was dropped as part of the change we are implementing, or whether it was added in PRD after our script was created. More than 70% of enterprises report that they have to insert manual steps while using these tools in order to decide on the next course of action.
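The limitation is easy to see in code: a two-way diff sees only the two end states, not the history that produced them. The helper below is a hypothetical illustration, not any vendor's actual comparison logic.

```python
# Illustrative two-way compare: it sees only end states, never history.
def two_way_diff(source: set, target: set):
    """Suggest actions based solely on source/target membership."""
    actions = []
    for obj in source - target:
        actions.append(f"CREATE {obj} in target")  # added in source, or
    for obj in target - source:                    # dropped in target?
        actions.append(f"DROP {obj} from target")  # the diff cannot tell.
    return actions

# History A: the index was added in QA as part of the change.
# History B: the index was dropped from PRD in a hot fix.
# Both histories produce identical environment states, so the tool's
# suggestion is the same either way:
print(two_way_diff(source={"orders_idx"}, target=set()))
# -> ['CREATE orders_idx in target']
```

Since two very different histories yield the same diff, the tool's recommendation can never be trusted blindly, and a human must step in.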
DBmaestro Teamwork takes this to the next level. Instead of after-the-fact damage control, the solution performs impact analysis using a 3-way, baseline-aware comparison. First, we need to be able to determine the origin of each change the tool identifies. So instead of examining only a source and a target, we add a baseline to the mix. The baseline can be any good reference point, such as the previous version or the last label.
In this way, DBmaestro enables real database build and merge automation by identifying which changes are safe to deploy, providing a robust safety net against out-of-process updates. The analysis reduces to four cases (sketched in code after the list):
- If the Source, the Target, and the Baseline are the same – No change was introduced and no action is needed.
- If the Source differs from the Baseline, and the Target is the same as the Baseline – An update was introduced to the source environment and no conflict is recognized in the target environment so we can safely deploy the changes from the source to the target.
- If the Target differs from the Baseline, and the Source does not – No modification was made to the source, but a change was applied to the target (a second team introduced its update, for example), so we need to protect the changes in the target environment and not allow an older version from the source to override them.
- If both the Target and the Source differ from the Baseline – Teamwork recognizes a conflict and suggests a merge script to solve it.
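Here is a minimal sketch of that decision logic, assuming each environment's state for a given object can be reduced to a comparable snapshot of its definition. The function, return labels, and sample DDL strings are illustrative assumptions, not Teamwork's actual implementation.

```python
# Hypothetical 3-way, baseline-aware decision for a single DB object.
# source, target, and baseline are comparable snapshots of the object's
# definition (e.g. normalized DDL); None means the object is absent.

def three_way_decision(source, target, baseline) -> str:
    source_changed = source != baseline
    target_changed = target != baseline

    if not source_changed and not target_changed:
        return "NO_ACTION"        # nothing changed anywhere
    if source_changed and not target_changed:
        return "DEPLOY_SOURCE"    # safe: only the source moved forward
    if target_changed and not source_changed:
        return "PROTECT_TARGET"   # out-of-process change in target; keep it
    return "CONFLICT_MERGE"       # both sides changed; a merge is needed

baseline = "CREATE INDEX orders_idx ON orders(customer_id)"
print(three_way_decision(baseline, baseline, baseline))      # NO_ACTION
print(three_way_decision("v2 DDL", baseline, baseline))      # DEPLOY_SOURCE
print(three_way_decision(baseline, None, baseline))          # PROTECT_TARGET
print(three_way_decision("v2 DDL", "hotfix DDL", baseline))  # CONFLICT_MERGE
```

The baseline is what resolves the ambiguity of the two-way compare above: with a known reference point, "added in the source" and "dropped in the target" become distinguishable cases with different safe actions.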
DBmaestro Teamwork combines all of this: a Roles & Responsibilities feature that lets an organization enforce change management policies down to the DB-object level per user, integration with the relevant IDEs, integration with task-based development and release automation tools, built-in version control, and 3-way baseline-aware analysis. Together, these bring true continuous delivery, deployment automation, and DevOps practices to the database.
You are welcome to contact me for more information at
Shachar.Furman@Ness-Tech.co.il