RDBMS TO MONGODB MIGRATION
Ashutosh Bhardwaj
CISO at Choice | Leading Cybersecurity Innovation & Ensuring Business Resilience
With more and more enterprises moving to MongoDB, this article explains the RDBMS to MongoDB migration steps recommended by the MongoDB community.
INTRODUCTION
The relational database has been the foundation of enterprise data management for over thirty years.
But the way we build and run applications today, coupled with unrelenting growth in new data sources and user loads, is pushing relational databases beyond their limits.
This can inhibit business agility, limit scalability and strain budgets, compelling more and more organisations to migrate to alternatives such as MongoDB and other NoSQL databases.
Enterprises from a variety of industries have migrated successfully from relational database management systems (RDBMS) to MongoDB for myriad applications.
This article is written for project teams that want to know how to migrate from an RDBMS to MongoDB.
A STEP-BY-STEP MIGRATION ROADMAP.
Organising for Success
Before considering technologies and architecture, a key to success is involving all key stakeholders for the application, including the line of business, developers, data architects, DBAs and systems administrators. In some organisations, these roles may be combined.
The project team should work together to define business and technical objectives, timelines and responsibilities, meeting regularly to monitor progress and address any issues.
Schema Design
The most fundamental change in migrating from a relational database to MongoDB is the way in which the data is modelled.
As with any data modelling exercise, each use case will be different, but there are some general considerations that apply to most schema migration projects.
Schema design requires a change in perspective for data architects, developers and DBAs:
- From the legacy relational data model that flattens data into rigid 2-dimensional tabular structures of rows and columns.
- To a rich and dynamic document data model with embedded sub-documents and arrays.
This process involves the following steps:
MIGRATION FROM RIGID TABLES TO FLEXIBLE AND DYNAMIC BSON DOCUMENTS
MongoDB stores JSON documents in a binary representation called BSON (Binary JSON).
JOINING COLLECTIONS FOR DATA ANALYTICS
While it does not offer as rich a set of join operations as some RDBMSs, the $lookup aggregation stage provides a left outer equi-join, which is convenient for a selection of analytics use cases. A left outer equi-join matches and embeds documents from the “right” collection in documents from the “left” collection.
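To make the join semantics concrete, the sketch below writes a $lookup stage as a Python dictionary and then emulates the left outer equi-join in plain Python. The collection and field names (orders, customers, customer_id) are hypothetical, and the emulation only illustrates the shape of the output, not how the server executes the stage.

```python
# Hypothetical $lookup stage: join "orders" (left) to "customers" (right).
lookup_stage = {
    "$lookup": {
        "from": "customers",          # the "right" collection
        "localField": "customer_id",  # field in the "left" documents
        "foreignField": "_id",        # field in the "right" documents
        "as": "customer",             # array field that receives the matches
    }
}


def emulate_lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    """Pure-Python illustration of a left outer equi-join."""
    joined = []
    for left in left_docs:
        matches = [r for r in right_docs
                   if r.get(foreign_field) == left.get(local_field)]
        # Every left document is kept, even when nothing matches.
        joined.append({**left, as_field: matches})
    return joined


orders = [
    {"_id": 1, "customer_id": "c1", "total": 40},
    {"_id": 2, "customer_id": "c9", "total": 15},  # no matching customer
]
customers = [{"_id": "c1", "name": "Asha"}]

spec = lookup_stage["$lookup"]
result = emulate_lookup(orders, customers,
                        spec["localField"], spec["foreignField"], spec["as"])
for doc in result:
    print(doc)
```

The second order has no matching customer, yet it still appears in the result with an empty array. That is the "left outer" part of the join.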
DEFINING THE DOCUMENT SCHEMA
As a first step, the project team should document the operations performed on the application’s data, comparing:
- How these are currently implemented by the relational database
- How MongoDB could implement them.
MODELLING RELATIONSHIPS WITH EMBEDDING AND REFERENCING
EMBEDDING
Data with a 1:1 or 1:many relationship (where the "many" objects always appear with, or are viewed in the context of, their parent documents) are natural candidates for embedding within a single document.
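As a minimal sketch of embedding, the snippet below folds the rows of a child table into an array inside each parent document. The person and car tables, their columns and the embed_children helper are all hypothetical; a real migration would choose field names to suit the application.

```python
# Rows as they might come from two hypothetical relational tables.
person_rows = [{"pers_id": 1, "name": "Paul Miller", "city": "London"}]
car_rows = [
    {"car_id": 101, "pers_id": 1, "model": "Bentley", "year": 1973},
    {"car_id": 102, "pers_id": 1, "model": "Rolls Royce", "year": 1965},
]


def embed_children(parents, children, key, as_field):
    """Return one document per parent with its child rows embedded as an array."""
    docs = []
    for parent in parents:
        embedded = [
            {k: v for k, v in child.items() if k != key}  # drop the foreign key
            for child in children
            if child[key] == parent[key]
        ]
        docs.append({**parent, as_field: embedded})
    return docs


person_docs = embed_children(person_rows, car_rows, key="pers_id", as_field="cars")
print(person_docs[0])
```

The foreign key disappears from the embedded sub-documents because the relationship is now expressed by containment, so a single read returns the person together with their cars.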
REFERENCING
Referencing should be used:
- When embedding would not provide sufficient read performance advantages to outweigh the implications of data duplication
- Where the object is referenced from many different sources
- To represent complex many-to-many relationships
- To model large, hierarchical data sets.
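The following sketch illustrates referencing for a many-to-many relationship. The products and categories collections, the category_ids field and both helper functions are hypothetical; in a live system each helper would correspond to a query against the referenced collection rather than an in-memory scan.

```python
# Hypothetical collections: each product stores the _ids of its categories
# instead of embedding full copies of them.
categories = {
    "cat1": {"_id": "cat1", "name": "Laptops"},
    "cat2": {"_id": "cat2", "name": "Business"},
}
products = [
    {"_id": "p1", "name": "UltraBook 13", "category_ids": ["cat1", "cat2"]},
    {"_id": "p2", "name": "Docking Station", "category_ids": ["cat2"]},
]


def resolve_categories(product, categories_by_id):
    """Follow the references, as the application would with a second query."""
    return [categories_by_id[cid]["name"] for cid in product["category_ids"]]


def products_in_category(category_id, product_docs):
    """Traverse the many-to-many relationship from the other side."""
    return [p["name"] for p in product_docs if category_id in p["category_ids"]]


names = resolve_categories(products[0], categories)
business = products_in_category("cat2", products)
print(names, business)
```

Because each category is stored once and only its identifier is repeated, renaming a category touches a single document, at the cost of an extra lookup when reading a product.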
INDEXING
Indexes in MongoDB largely correspond to indexes in a relational database. MongoDB uses B-tree indexes and natively supports secondary indexes, so indexing will be immediately familiar to those coming from a SQL background.
APPLICATION INTEGRATION
With the schema designed, the project can move towards integrating the application with the database using MongoDB drivers and tools.
MONGODB DRIVERS AND THE API
MongoDB has idiomatic drivers for the most popular languages, including eleven developed and supported by MongoDB (e.g., Java, Python, .NET, PHP) and more than 30 community-supported drivers.
MAPPING SQL TO MONGODB SYNTAX
For developers familiar with SQL, it is useful to understand how core SQL statements such as CREATE, ALTER, INSERT, SELECT, UPDATE and DELETE map to the MongoDB API. The MongoDB documentation includes a SQL to MongoDB mapping chart with examples to assist in the transition to MongoDB Query Language structure and semantics.
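As a rough illustration of that mapping, the sketch below pairs a few SQL statements with the PyMongo-style method and arguments that play the same role. The users table and its columns are hypothetical, and the snippet only builds and prints the argument documents; it does not connect to a database.

```python
# Each entry pairs a SQL statement with the PyMongo-style method name and
# the arguments that play the same role. Nothing here talks to a server.
SQL_TO_MONGODB = [
    ("CREATE TABLE users (user_id VARCHAR(30), age INT)",
     "db.create_collection", ["users"]),
    # Collections do not enforce a fixed structure, so adding a field is
    # just an update; None is a placeholder value for this illustration.
    ("ALTER TABLE users ADD join_date DATETIME",
     "db.users.update_many", [{}, {"$set": {"join_date": None}}]),
    ("INSERT INTO users (user_id, age) VALUES ('bcd001', 45)",
     "db.users.insert_one", [{"user_id": "bcd001", "age": 45}]),
    ("SELECT user_id FROM users WHERE age > 25",
     "db.users.find", [{"age": {"$gt": 25}}, {"user_id": 1, "_id": 0}]),
    ("UPDATE users SET status = 'C' WHERE age > 25",
     "db.users.update_many", [{"age": {"$gt": 25}}, {"$set": {"status": "C"}}]),
    ("DELETE FROM users WHERE status = 'D'",
     "db.users.delete_many", [{"status": "D"}]),
    ("CREATE INDEX idx_user_id ON users (user_id)",
     "db.users.create_index", [[("user_id", 1)]]),
]

for sql, method, args in SQL_TO_MONGODB:
    rendered = ", ".join(repr(a) for a in args)
    print(f"{sql}\n    -> {method}({rendered})")
```

Note that the WHERE clause becomes a filter document and the column list becomes a projection document, which is the main structural shift when moving from SQL strings to the MongoDB API.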
Depending on the needs of the project, application integration can also involve the following topics, which we will discuss separately in a later series of articles:
- MongoDB Aggregation Framework
- Business Intelligence Integration – MongoDB Connector for BI
- Atomicity in MongoDB
- Maintaining Strong Consistency
- Implementing Validation & Constraints
- Enforcing Constraints With Indexes
MIGRATING DATA TO MONGODB
Project teams have multiple options for importing data from existing relational databases to MongoDB. The tool of choice should depend on the stage of the project and the existing environment.
Many users create their own scripts, which transform source data into a hierarchical JSON structure that can be imported into MongoDB using the mongoimport tool.
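A minimal sketch of such a script is shown below. An in-memory SQLite database stands in for the source RDBMS, and the customer and orders tables, the export_customers helper and the output field names are all hypothetical. The script emits one JSON document per line, which is the default input format that mongoimport reads.

```python
import json
import sqlite3

# An in-memory SQLite database stands in for the source RDBMS.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 40.0), (11, 1, 15.5), (12, 2, 99.0);
""")


def export_customers(connection):
    """Yield one hierarchical document per customer with orders embedded."""
    for cust in connection.execute("SELECT id, name FROM customer ORDER BY id"):
        orders = connection.execute(
            "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
            (cust["id"],),
        )
        yield {
            "_id": cust["id"],
            "name": cust["name"],
            "orders": [{"order_id": o["id"], "total": o["total"]} for o in orders],
        }


# One JSON document per line, ready to be saved to a file for mongoimport.
json_lines = [json.dumps(doc) for doc in export_customers(conn)]
for line in json_lines:
    print(line)
```

Saved to a file such as customers.json, the output could then be loaded with a command along the lines of `mongoimport --db shop --collection customers --file customers.json`, where the database and collection names are again placeholders.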
Extract, Transform, Load (ETL) tools are also commonly used when migrating data from relational databases to MongoDB. A number of ETL vendors, including Informatica, Pentaho and Talend, have developed MongoDB connectors that enable a workflow in which data is extracted from the source database, transformed into the target MongoDB schema, staged, and then loaded into document collections.
Many migrations involve running the existing RDBMS in parallel with the new MongoDB database, incrementally transferring production data:
- As records are retrieved from the RDBMS, the application writes them back out to MongoDB in the required document schema.
- Consistency checkers, for example using MD5 checksums, can be used to validate the migrated data.
- All newly created or updated data is written to MongoDB only.
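The consistency check mentioned above could be sketched as follows. This is one possible approach, not a prescribed tool: it assumes source records have already been transformed into the target document shape, and the doc_checksum and find_mismatches helpers are hypothetical names. Each document is serialised to canonical JSON with sorted keys so that field order does not affect the MD5 digest.

```python
import hashlib
import json


def doc_checksum(doc):
    """MD5 of a canonical JSON rendering (sorted keys, fixed separators)."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def find_mismatches(source_docs, target_docs, key="_id"):
    """Return keys that are missing from the target or whose checksums differ."""
    target_sums = {d[key]: doc_checksum(d) for d in target_docs}
    return [
        d[key] for d in source_docs
        if target_sums.get(d[key]) != doc_checksum(d)
    ]


# Source records (already in document form) and what landed in the target.
source = [
    {"_id": 1, "name": "Asha", "total": 40},
    {"_id": 2, "name": "Ravi", "total": 99},
]
target = [
    {"total": 40, "name": "Asha", "_id": 1},  # same data, different field order
    {"_id": 2, "name": "Ravi", "total": 90},  # value changed during migration
]

mismatches = find_mismatches(source, target)
print(mismatches)
```

MD5 is used here purely as a fast fingerprint for detecting accidental differences, not as a security measure.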
OPERATIONAL AGILITY AT SCALE
The considerations discussed thus far fall into the domain of the data architects, developers and DBAs. However, no matter how elegant the data model or how efficient the indexes, none of this matters if the database fails to perform reliably at scale or cannot be managed efficiently.
The final set of considerations in migration planning should focus on operational issues.
The MongoDB Operations Best Practices guide covers:
- Management, monitoring and backup with MongoDB Ops Manager or MongoDB Cloud Manager, which is the best way to run MongoDB within your own data center or public cloud, along with tools such as mongotop, mongostat and mongodump.
- High availability with MongoDB Replica Sets, providing self-healing recovery from failures and supporting scheduled maintenance with no downtime.
- Scalability using MongoDB auto-sharding (partitioning) across clusters of commodity servers, with application transparency.
- Hardware selection with optimal configurations for memory, disk and CPU.
- Security, including LDAP, Kerberos and x.509 authentication, field-level access controls, user-defined roles, auditing, encryption of data in flight and at rest, and defense-in-depth strategies to protect the database.
CONCLUSION
Following the best practices outlined in this article can help project teams reduce the time and risk of database migrations, while enabling them to take advantage of the benefits of MongoDB and the document model. In doing so, they can quickly start to realize a more agile, scalable and cost-effective infrastructure, innovating on applications that were never before possible.