Digital Transformation - Modern DB
Tech debt also lives in the database! I did an MRP transformation a few years ago, and one of the shocking things was the huge amount of totally invalid data that should never have gotten into the database. Of course, I'm a firm believer in testing with a copy of full production data, to find the sediment that will cause issues when things go to production. So how do you actually get a database transformed from the old to the new?
First, where are we coming from? What db is in use, and what is the schema for its data? A test environment will need to be created with the original data, and both the schema and the data need to be extracted. In my case, the db was proprietary, so I had to contact the company and get a library that would let me access the data. Since the db was "ancient", there was no modern support for it, so I had to write a minimal driver for it. From there we need to transform the data into a new db, preferably something more current.
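Go's database/sql package makes the "minimal driver" approach fairly painless: implement a handful of interfaces and the rest of your tooling can talk to the old store like any other database. Here is a rough, read-only sketch of what that can look like; the vendor* functions are hypothetical placeholders for whatever access library the vendor supplies, not a real API:

```go
// Package legacydb is a minimal, read-only sketch of a database/sql driver
// for an old proprietary store. The vendor* functions stand in for the
// access library the vendor supplied; they are hypothetical placeholders.
package legacydb

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"io"
)

func init() {
	// Makes the driver usable as sql.Open("legacydb", dsn).
	sql.Register("legacydb", &Driver{})
}

type Driver struct{}

func (Driver) Open(dsn string) (driver.Conn, error) {
	h, err := vendorOpen(dsn)
	if err != nil {
		return nil, err
	}
	return &conn{h: h}, nil
}

type conn struct{ h vendorHandle }

func (c *conn) Prepare(query string) (driver.Stmt, error) { return &stmt{c: c, q: query}, nil }
func (c *conn) Close() error                              { return vendorClose(c.h) }
func (c *conn) Begin() (driver.Tx, error)                 { return nil, errors.New("read-only driver") }

type stmt struct {
	c *conn
	q string
}

func (s *stmt) Close() error  { return nil }
func (s *stmt) NumInput() int { return -1 } // the old engine cannot report parameter counts

func (s *stmt) Exec(args []driver.Value) (driver.Result, error) {
	return nil, errors.New("read-only driver")
}

func (s *stmt) Query(args []driver.Value) (driver.Rows, error) {
	cols, data, err := vendorQuery(s.c.h, s.q)
	if err != nil {
		return nil, err
	}
	return &rows{cols: cols, data: data}, nil
}

type rows struct {
	cols []string
	data [][]driver.Value
	i    int
}

func (r *rows) Columns() []string { return r.cols }
func (r *rows) Close() error      { return nil }

func (r *rows) Next(dest []driver.Value) error {
	if r.i >= len(r.data) {
		return io.EOF
	}
	copy(dest, r.data[r.i])
	r.i++
	return nil
}

// Hypothetical shims around the vendor's access library.
type vendorHandle struct{}

func vendorOpen(dsn string) (vendorHandle, error) { return vendorHandle{}, nil }
func vendorClose(h vendorHandle) error            { return nil }
func vendorQuery(h vendorHandle, q string) ([]string, [][]driver.Value, error) {
	return nil, nil, nil
}
```

Even a driver this minimal is enough for extraction work, because the migration only ever needs to read from the old store.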
One of the concepts that I love for data migration/transformation is the pairing of an ORM and migrations. This originally comes from Ruby on Rails, and it works really well. An Object-Relational Mapper gives standard semantics to the underlying db and allows your code to be independent of it. Literally, you could use SQLite for test and Oracle or Postgres for production. In today's environment I would use golang and a framework called Buffalo, which includes "fizz". Fizz is a Domain Specific Language for migrating databases. It tries to be as database-agnostic as possible and to simplify the conversion, change, and modification of data. Effectively it gives us an easy way of "scripting" database transformations, as the small example below shows.
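To give a feel for the DSL, here is a small fizz migration sketch; the customers table and its columns are purely illustrative, not taken from the real MRP schema:

```
create_table("customers") {
	t.Column("id", "integer", {"primary": true})
	t.Column("name", "string", {})
	t.Column("credit_limit", "decimal", {"null": true})
}

add_index("customers", "name", {})
```

The same file can then be applied to SQLite, Postgres, or any other engine the framework supports, which is exactly the independence we want.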
The next problem was getting all the tables and everything else into the "fizz" files we need to recreate the schema in the new system. In my case, I was transforming a production application that had been in use for more than 10 years and had a large number of tables. So for this component I would write a bit of golang that reads the schema of the original db and generates fizz files that allow the db to be recreated in any db supported by the framework. This also future-proofs our db choices: once we have our fizz files, we can change the target db with little to no pain. A sketch of such a generator follows.
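This is a rough sketch of that generator. It assumes the minimal driver from earlier is registered as "legacydb" and that the old engine exposes a system.columns catalog; both are assumptions for illustration, and a real run would use whatever catalog the vendor library actually provides:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "example.com/legacydb" // hypothetical import path for the minimal driver
)

// Very small mapping from legacy column types to fizz column types.
var typeMap = map[string]string{
	"CHAR":    "string",
	"NUMERIC": "decimal",
	"INT":     "integer",
	"DATE":    "timestamp",
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: schema2fizz <dsn>")
	}
	db, err := sql.Open("legacydb", os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// The catalog query is an assumption; adjust it to the real engine.
	rows, err := db.Query(`SELECT table_name, column_name, data_type FROM system.columns ORDER BY table_name`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	// Group columns by table, preserving catalog order.
	tables := map[string][]string{}
	var order []string
	for rows.Next() {
		var tbl, col, typ string
		if err := rows.Scan(&tbl, &col, &typ); err != nil {
			log.Fatal(err)
		}
		if _, ok := tables[tbl]; !ok {
			order = append(order, tbl)
		}
		fizzType := typeMap[typ]
		if fizzType == "" {
			fizzType = "string" // fall back to string for anything unrecognised
		}
		tables[tbl] = append(tables[tbl], fmt.Sprintf("\tt.Column(%q, %q, {})", col, fizzType))
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}

	// Emit one fizz create_table block per legacy table.
	for _, tbl := range order {
		fmt.Printf("create_table(%q) {\n", tbl)
		for _, line := range tables[tbl] {
			fmt.Println(line)
		}
		fmt.Println("}")
	}
}
```

In practice you would write each table's block to its own numbered migration file, but the core idea is just this: read the old catalog once and let fizz carry the schema forward.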
The next choice is: what db do we use? The intention is to use something that works in a Kubernetes or OpenShift environment, can scale horizontally and vertically, and supports a modern cloud native environment. It should also take advantage of today's SSD and memory sizes and really handle the level of performance that is needed. Traditional solutions just do not offer the level of scalability needed for modern applications, and will incur a huge amount of technical debt right from the start. So the better solution is to choose a cloud native database as our target. NuoDB is already integrated with OpenShift, and gives us SQL with ACID-compliant transactions, full redundancy, and scalability. So in my 2020 technical vision, using NuoDB really solves a huge number of problems.
So now we have the migration tools, the source, and the target, but there are a few more points. One of the most important is to make sure we can run the migration multiple times without corrupting data. In my own experience, migrations can take a huge amount of time to run, and you will often want or have to restart them due to bad input data. So in designing your transformation, error recovery and restart are very important elements; the sketch below shows one way to make a table copy restartable.
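One simple pattern is to checkpoint progress in the target db inside the same transaction as each batch of copied rows, so a rerun resumes at the checkpoint instead of duplicating data. The customers and migration_progress tables, driver names, and DSNs below are all illustrative assumptions, and the target is shown as Postgres via lib/pq only to keep the sketch self-contained; the same pattern applies to NuoDB or any other SQL target:

```go
package main

import (
	"database/sql"
	"log"

	_ "example.com/legacydb" // hypothetical legacy driver from earlier
	_ "github.com/lib/pq"    // target shown as Postgres purely for the sketch
)

const batchSize = 1000

func main() {
	src, err := sql.Open("legacydb", "file:/data/mrp.dat")
	if err != nil {
		log.Fatal(err)
	}
	dst, err := sql.Open("postgres", "postgres://localhost/newdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Where did the last run stop? Zero means we are starting fresh.
	var lastID int64
	if err := dst.QueryRow(
		`SELECT COALESCE(MAX(last_id), 0) FROM migration_progress WHERE table_name = 'customers'`,
	).Scan(&lastID); err != nil {
		log.Fatal(err)
	}

	for {
		// A real job would also LIMIT the source query; the syntax depends on the old engine.
		rows, err := src.Query(`SELECT id, name FROM customers WHERE id > ? ORDER BY id`, lastID)
		if err != nil {
			log.Fatal(err)
		}

		tx, err := dst.Begin()
		if err != nil {
			log.Fatal(err)
		}
		n := 0
		for n < batchSize && rows.Next() {
			var id int64
			var name string
			if err := rows.Scan(&id, &name); err != nil {
				log.Fatal(err)
			}
			if _, err := tx.Exec(`INSERT INTO customers (id, name) VALUES ($1, $2)`, id, name); err != nil {
				log.Fatal(err)
			}
			lastID = id
			n++
		}
		rows.Close()

		if n == 0 {
			tx.Rollback()
			break // nothing left to copy
		}
		// Commit the batch and the checkpoint together, so a crash never
		// leaves copied rows without a matching checkpoint.
		if _, err := tx.Exec(
			`INSERT INTO migration_progress (table_name, last_id) VALUES ('customers', $1)`, lastID); err != nil {
			log.Fatal(err)
		}
		if err := tx.Commit(); err != nil {
			log.Fatal(err)
		}
	}
}
```

If the job dies halfway through, running it again simply picks up from the last committed checkpoint, which is exactly the "run it multiple times without corruption" property we want.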
During this transformation, the other area to be concerned with is matching the hardware runtime environment of the new infrastructure to current industry standards. While this could be a long article by itself, modern db's can use huge RAM footprints and PCI Express based SSDs to offer huge performance. In an OpenShift environment it may be worthwhile to have nodes that are configured for databases and labeled for that application, so your production db pods are scheduled onto them and run significantly better.
The DB is one of the important layers in moving to a 2020 cloud native environment. Looking at the APIs, the framework, the total flow of the app, and where it runs are additional parts of the total solution. Making sure we get the most done in the least time (Agile/Scrum) is equally important.
I will explore these elements further in coming articles.