Choosing the Right Database for Your Business Needs: Relational vs. Non-Relational vs. Graph
Peter Tylee
Professionally delivered software projects. On-time. On-budget. Above expectations. We solve your software problems ? Fully-managed ? Outsourced ? Project Delivery Specialists.
There are more than three types of databases. The more common ones are hierarchical databases, network databases, and object-oriented databases. But in this blog post, we will take a closer look at other types of databases that have been emerging such as relational, non-relational, and graph databases and their differences and how they can be used.
Databases used to store and manage data include relational databases, non-relational databases, and graph databases. They do, however, have their own distinct features and characteristics that make them better suited for specific applications.
Relational Databases
For starters, when you speak of relational databases, ultimately think of it in terms of data being stored in tables. Those tables have columns and rows, very similar to an Excel spreadsheet, and the data can be linked between those tables; a column in one table can point to other data in another table, and that's why they're called relational databases. They are also known as SQL databases as they are the most traditional type of database. Some examples of popular relational databases include MySQL, Oracle, and Microsoft SQL Server. Major companies that use relational databases include Bank of America, eBay, and FedEx.
One of the main advantages of relational databases is that they are highly structured and offer a consistent, predictable way of storing and accessing data. This makes them easy to use and understand, and they are well-suited for applications that require a high level of data integrity and consistency, such as financial systems and customer relationship management (CRM) systems.?
However, relational databases can be inflexible when it comes to storing data that doesn't fit neatly into a table structure, and they can be difficult to scale as the volume of data grows. They also require a lot of upfront design and planning, which can be time-consuming and inflexible. When you think of a relational database, you can think of it as having limited horizontal scalability. It tends to have slow table joints for very complex joints and very large amounts of data.? Other considerations for relational databases tend to be that they have a rigid schema. They are very high performance for transactions, and they tend not to be particularly good for deep analytics.
Non-relational Databases
Moving on to another type of database, the non-relational databases or also known as NoSQL databases, these are newer types of databases that were developed in the mid-2000s in response to the limitations of its predecessor. Unlike relational databases, which are structured around tables and rows, non-relational databases are more flexible and can store data in a variety of formats, including key-value pairs, documents, and graphs.? Some examples of popular non-relational databases include MongoDB, Cassandra, and Couchbase. Major companies that use non-relational databases include Facebook, Google, and Amazon.
One of the main advantages of non-relational databases is their flexibility and scalability. When you think about non-relational databases, large amounts of data goes along with it, and what's happening is that you're doing sort of multiple scans over a very, very large table to be able to index tables and find the data that you're looking for. These types of databases are designed to handle large volumes of unstructured data and can be easily scaled up or down as the needs of the application change.
However, non-relational databases can be less predictable and consistent than relational databases, and they may not offer the same level of data integrity. What you tend to get is the possibility of having a very loosely defined schema and you can ultimately dump whatever data you want in there. It's high performance for simple transactions but it tends to be poor for deep analysis. They can also be more difficult to work with and query than relational databases, as they do not use a standardised language like SQL.
Graph Databases
Graph database, on the other hand, is the new kid on the block. As the name suggests, these are databases that are designed to store and manage data in the form of graphs. A graph is a collection of nodes and edges, where nodes represent entities (such as people, places, or things) and edges represent relationships between these entities. Some examples of popular graph databases include Neo4j and Amazon Neptune. Major companies that use graph databases include eBay, Airbus, and NASA.
There had been a number of generations for this type of database. Putting into perspective, the first and second generations were not really satisfactory based on experience. And sometimes some of the critical articles about graph databases that you'll find online are referring to version one, like generation one graph databases, and they were not native graphs so they were not storing things as graphs actually do on the persistence layer. There are about 12 other types that have turned up, which we could call the Cambrian explosion of graph databases, because we're just getting so many. However, we could also suspect that not all of them will survive long term. Hence, if you are thinking of picking one of the particular graph databases to use for a project, then longevity of that specific implementation might be a big consideration.
Nonetheless, when you think graph databases, you don't need to do table joins to be able to extract the data. They can have flexible schemas and can have high performance with complex transactions and you can get high performance deep analytics. One of the main advantages of graph databases is their ability to represent complex relationships and connections between data in a natural and intuitive way. They are well-suited for applications that require fast querying and analysis of relationships, such as social networks and recommendation engines.
If you use Facebook, Instagram, Twitter, or even Google Search, which uses Google's knowledge graph, you are using some form of graph database. These databases have the ability to handle large amounts of data and can scale up and out to meet global demands.
Three Considerations in Choosing the Right Database
There are three major factors to consider when selecting the right database. The first is horizontal scalability, or the database's ability to handle increased workload by adding more nodes. The second factor is the schema's flexibility, or the database's structure. A flexible schema allows for greater flexibility in data storage and organisation. The third factor to consider is the database's ability to perform deep analytics, or to analyse large amounts of data to uncover patterns and trends. You can choose the best database for your application or project by carefully evaluating these three factors.
Horizontal Scalability
What exactly is horizontal scalability? The best example of this is launching a global project. Assume you have users in Australia and users in the United States; if you have a single database, you'll have to choose some group of users who will receive poor performance. If you host the database in Australia, then when people in the United States try to access the application, even if they have a local application server, the performance won't be great because it's talking to a database in Australia.
If you replicate that database and you host a copy over in the United States, then you can get high performance reaccess in both locations. However, you're really only going to be writing to one of those databases. So, if you have the database in Australia as the one that you're writing to, then you get great performance in Australia for writing the database, but not great performance in the United States.
领英推荐
Although sharding can help to a certain extent, it's important to note that it can be complex and may not always be successful. If you're planning on scaling a relational database to a global level, be aware that you may need to invest significant effort in achieving horizontal scalability. Just a word of caution.
Now, some cloud service providers have resolved a significant portion of that problem for you, and you can use things off the shelf that can handle sub-10 millisecond speeds at global scale. As you may have noticed, Microsoft promotes sub-10 milliseconds at the global scale for the Cosmos database. However, be prepared to pay for it. At least based on experience, cosmos is not particularly cheap. Ergo, if you are thinking about horizontal scalability and if you require it, perhaps don't go relational.
Flexible Schema
?Schema flexibility is another consideration on the list. Now, if you have a situation where you have a rapidly evolving schema, then it may not make sense to store the data in a relational database. Startups that frequently work in a fast-paced environment frequently pivot, but they also don't have a complete understanding of what it is that they need to model because business requirements change. So being able to plan ahead of time in terms of schema flexibility is an advantage
If you work in a fast-changing environment and require a flexible schema, you should consider using a database other than a relational database. While it is possible to change the schema of a relational database, translating the data to the new schema can be difficult. It is also not recommended to store multiple versions of a schema or duplicate tables in the same database.
On the other hand, say a non-relational database has a flexible schema, such as graph databases, but let's ignore them for the time being. Finally, you can persist, say, the beginning of the year, a specific version of the schema in the database. You can then change the schema later on and then persist into the same database without having to change anything, hence, a new version of the schema. And as long as the application code can handle both sets of data, then you're good. You don't actually have to do anything else in the data.
Ultimately, something to think about if you're going to have a rapidly evolving schema, maybe look at a NoSQL database.
Deep Analytics
The third point is all about deep analytics and reporting. We've had a number of different projects that have had deep analytics requirements or complex reporting requirements and yes, you can do it with any of the database technologies.?
Depending upon how much effort you wanna put into it, you can do pretty much anything with any database. It's just that you're going to run into issues and difficulties, and perhaps, have regret later on.
We recently encountered a complex reporting requirement in the agricultural sector. The data for this requirement was stored in a NoSQL database, which allowed us to access it easily. However, the requirements changed, and we needed to generate reports that involved data from multiple document types within the database. This posed a problem because NoSQL databases do not support table joins, which are necessary to combine data from multiple tables. As a result, we were only able to generate the reports that we needed by manually fetching data from the relevant documents and combining it ourselves.
Relational databases, also known as SQL databases, are capable of generating reports. The complexity of the reports that can be created depends on the data stored in the database and the ability of the database execution engine to optimise the queries used to access that data. If the data is very complex and the database has a large number of rows, it may be difficult to generate the desired reports within a reasonable amount of time, even with a powerful and sophisticated database engine.
That is probably why some of the large data tech companies don't use it anymore. Although there are still companies that are successfully using a denormalized relational database, to do reporting, and that can work well. Let's say it is good for transactional purposes, database design for relational databases try and achieve something called third normal form, which is simply a way of saying, do not duplicate any data in this database.
When storing data, it is important to avoid storing multiple copies of the same information. This can be thought of as similar to having a column in an Excel spreadsheet with repeated values, such as "cat" and "dog". To "normalise" this data, it can be moved to a separate table with just two entries, "cat" and "dog", and the original table can refer to this data. This process, called "normalising" the database, is beneficial for transactions and storage, and is a common goal in database design.
However, on the flip side of things, it's not great? for reporting purposes, because then you've got two tables and you have to join them. Now for a trivial example like that, it's not going to be a problem. Modern database execution engines can handle that without any problem at all. The difficulty is that if you have billions of records in this database, then suddenly it's not trivial anymore and it does require a lot of work and consequently, it can bring a reporting application down to a sneeze and it won't be able to cope.?
If you store denormalized data in the original table, you may still be able to use it for reporting purposes. The design of the database should be based on the intended use of the data. Denormalized data is not as helpful for deep analytics, but it can be useful for reporting.
How about graph databases? Well, they are good at horizontal scaling and vertical scaling. They can cope with schema flexibility if you need changes, and they are good at deep analytics. In fact, their strength is deep analytics and reporting. And that is a reason why the really big tech players in town are using graph databases for those purposes.
While graph databases have their benefits, there are also considerations to keep in mind. One of these is the possibility that the database provider may cease operations, which could cause issues for applications that rely on it. There may also be a consolidation phase in the future, with some graph databases not surviving. It is important to choose a stable and reliable database provider to avoid potential issues with the application.
This is like the end of life already. So that's a risk and something to definitely take into consideration. But for the most part, there are a lot of positives about graph databases. There aren't as many developers that are familiar with it and because it's a newer database technology, we don't have decades worth of experience the same way that we do with the other types.
Although there are potential risks to consider when using a graph database, the technology has advanced to the point where it is now ready for use in production scenarios. In the past, it may not have been recommended for complex projects, but that is no longer the case.
And the reason for this is that they have been battle-tested by the city's big tech players. And technology has advanced significantly in the third generation. Now, if none of the things we've just discussed are relevant to your application, I'd probably avoid it because it will be slower to develop simply because the tooling around it isn't as advanced as some, say, relational databases or NoSQL databases. But as to where it makes sense, where you have some of these requirements, such as big data, deep analytics, or horizontal global scaling, it is definitely worth looking into.
Conclusion
In conclusion, while relational, non-relational, and graph databases are all useful tools for storing and managing data, each is best suited for a different type of application. Relational databases are ideal for applications requiring a high level of structure and consistency, whereas non-relational databases are more adaptable and scalable. Graph databases are ideal for applications that require complex relationships and connections to be represented. Finally, the best database will be determined by the application's specific needs and goals.
Head of Business Transformation | Quema | Building scalable and secure IT infrastructures and allocating dedicated IT engineers from our team
2 年Peter, thanks for sharing!