Mongo VS Postgres

Dear everybody.

I am (too) often asked “Which database is better, Postgres or Mongo?” and my reply is always that it is the wrong question. The right question is “Which database is better for our needs?” So the idea behind this article is to revisit the main points by which you may decide which database to pick, and give my take on each.

These are ONLY my own ideas and experience with both databases. You will not find any TPC/performance tests here. I have probably also missed entire topics, and if so please mention it in the comments.

After reading this article I may sound like a Postgres fanboy (guilty as charged), but I honestly tried to be as even and impartial as possible. I do not gain anything from this article, and nobody from Mongo or Postgres reviewed it. Put shortly, I don't owe anything to anyone.

As it is a big subject, I will publish it in 2 parts. The first part includes:

  • ACID vs BASE
  • Data types
  • JSON support
  • Speed for small updates/queries (OLTP)
  • Speed for batch processing (large updates and queries)
  • Database design (ERD)
  • SQL support
  • Database limitations

The second part includes:

  • Extensions
  • Indexing
  • Backup and recovery
  • Practical max size per node
  • Sharding
  • Monitoring
  • Connection pooling
  • Proxies
  • Development and administration tools

So let’s begin!

ACID vs BASE

First and foremost, Postgres is ACID while Mongo is BASE. ACID stands for Atomicity, Consistency, Isolation, Durability and refers to a standard set of properties that guarantee database transactions are processed reliably. BASE stands for Basically Available, Soft state, Eventually consistent. BASE databases (mostly NoSQL databases such as Mongo) are not transaction safe like ACID databases, but compensate with better performance in some scenarios. Mongo is making progress towards becoming more ACID (as explained nicely here - https://www.mongodb.com/basics/acid-transactions) but is still behind Postgres in that matter. For example, Mongo now has transactions, but creating/dropping a collection is not part of the transaction (according to my tests, as there is no reference to that in the Mongo documentation). With Postgres however:

BEGIN;

CREATE TABLE MANAGERS AS SELECT * FROM EMP WHERE manager = 'YES';

DELETE FROM EMP WHERE manager = 'YES';

COMMIT;

The CREATE TABLE MANAGERS statement is still part of the transaction. If the create statement fails, the entire transaction can be rolled back.

In fact, Postgres is even more ACID than Oracle, for example. When you issue a DDL command (data definition language, such as creating a table) in Oracle, it automatically commits the current transaction, which makes transaction handling messy.
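To make the transactional DDL point concrete, here is a minimal sketch (the table name is made up):

BEGIN;
CREATE TABLE managers_backup (id int, name text);
-- suppose a later statement fails, or we simply change our mind
ROLLBACK;
-- the table is gone: the CREATE TABLE was rolled back together with the transaction
SELECT * FROM managers_backup;   -- ERROR: relation "managers_backup" does not exist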

So Mongo does support multi-document transactions, but the documentation says “Limit each transaction to 1,000 documents”, which makes Mongo's multi-document transactions useless in some common scenarios, for example loading a large collection. ACID databases are preferred for applications where accuracy of the data is most important, for example banking, trading, and healthcare.

BASE databases are preferred in applications where performance is crucial and you are willing to sacrifice some accuracy, for example the advertising industry, fraud detection, prepaid phones, and monitoring systems.

You may argue that a fraud detection system is a classic case of a system that MUST use an ACID database, and to some extent you will be right. However, these systems rely on a high volume of incoming data (which Mongo is pretty good at) and then run many rules/checks on each incoming card transaction to identify a fraud attempt. The more checks you run, the higher the chance of identifying a fraud attempt. So there is the trade-off:

With Mongo you can run many mostly accurate queries.

With Postgres you can run fewer queries but with 100% accuracy.

My view on this subject: unless you know for a fact that your application is similar to the ones I listed as eligible for Mongo, use a general-purpose ACID database such as Postgres.


Data Types

Mongo “speaks” JSON. Data is received, stored and returned as a JSON document. Mongo has other data types (INT, DATE) but these are hardly data types; they are more a way to tell Mongo “save this value inside the JSON doc as a date”. You see, Mongo is schemaless. As such, you cannot define a “table” with a column of type date or int. So the way Mongo handles data types other than JSON is not ideal.

Postgres, on the other hand, supports JSON and several other data types. All the basic ones: int, numeric, text, date, bytea, and some more advanced ones:

Chkpass - a Postgres data type designed for storing encrypted passwords. Each password is automatically converted to encrypted form upon entry, and is always stored and fetched as encrypted. A typical query would be:

select 'user pass is OK' from myusers where stored_pass = '1234';

This checks whether the user passed the correct password. On the other hand, running the next query is useless, as Postgres will not show the actual content of the stored_pass column:

select stored_pass from myusers;
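A minimal setup sketch, assuming the chkpass contrib extension is available in your Postgres installation (the table name and data are made up):

CREATE EXTENSION chkpass;
CREATE TABLE myusers (username text, stored_pass chkpass);
INSERT INTO myusers VALUES ('buzz', '1234');   -- the password is encrypted on its way in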

inet - another Postgres data type that stores IP addresses and validates that the received value is in a valid IP format. And it's not a half-baked feature: both inet and cidr network types are supported, enabling queries such as:

SELECT '192.168.1.19'::inet << '192.168.1.0/24'::cidr;
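A small sketch of what that looks like in practice (the table is made up):

CREATE TABLE access_log (client_addr inet, visited_at timestamptz DEFAULT now());
INSERT INTO access_log (client_addr) VALUES ('192.168.1.19');   -- accepted
INSERT INTO access_log (client_addr) VALUES ('not an ip');      -- rejected: invalid input syntax for type inet
SELECT count(*) FROM access_log WHERE client_addr << '192.168.1.0/24'::cidr;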

Mongo is committed (no pun intended) to JSON while Postgres is not. JSON is cool and widely used today, but so were XML, ConfigObj and YAML. You can never predict when a data format will go out of fashion, so why commit to one?

JSON support

So we established that Postgres is better with other data types, but what about specific functionality around JSON? A word about JSON first: JSON stands for JavaScript Object Notation and is a lightweight format for storing and transporting data.

JSON is often used when data is sent from a server to a web page and is "self-describing" and easy to understand.

Both Postgres and Mongo support JSON.

With Mongo, the entire database is built around JSON. Data is stored and retrieved as JSON, and the Mongo API also “speaks” JSON. For example, a simple update in Mongo would be:

db.toys.updateOne({ toy_name: "Buzz Lightyear" }, { $set: { "can_do": ["fly"] } })

Here we supply two parameters in JSON format. The first parameter is a JSON document, { toy_name: "Buzz Lightyear" }, which locates the data that needs to be updated. The second parameter is also a JSON document, { $set: { "can_do": ["fly"] } }, which instructs Mongo to set the attribute "can_do" to an array holding the single ability "fly". Obviously Mongo has wider built-in support for JSON. This is because, as demonstrated before, Mongo is all about JSON while Postgres is not.

Both databases deliver functions and methods for querying and changing data, but Mongo delivers more in-depth options.

For example, let's examine the way Mongo and Postgres update a JSON document.

The list of Mongo update operators and modifiers includes:

$currentDate, $inc, $min, $max, $mul, $rename, $set, $setOnInsert, $unset, $, $[], $[<identifier>], $addToSet, $pop, $pull, $push, $pullAll, $each, $position and more…

while the Postgres option for updating a JSON field is simply: column=new_value

Or functions such as jsonb_set and jsonb_insert (the full list is here: https://www.postgresql.org/docs/current/functions-json.html).

A simple update in Mongo would be:

db.toys.updateOne({ toy_name: "Buzz Lightyear" }, { $set: { "can_do": ["fly"] } })

But what if "Buzz Lightyear" is now?

able to talk and dance?

Mongo:

db.toys.updateOne(
  { toy_name: "Buzz Lightyear" },
  { $push: { "can_do": "talk" } })

db.toys.updateOne(
  { toy_name: "Buzz Lightyear" },
  { $push: { "can_do": "dance" } })

But (unfortunately) he has lost his ability to fly:

db.toys.updateOne(
  { toy_name: "Buzz Lightyear" },
  { $pull: { "can_do": "fly" } })

Postgres:

Assuming the following JSON data structure represents our target field:

{"name": "Buzz Lightyear", "can do": ["run", "dance"]}

Inserting a New Value into the JSON Field:

update toys
set json_col = jsonb_insert(json_col, '{can do, -1}', '"fly"'::jsonb, true)
where toy_name = 'Buzz Lightyear';

Updating an Existing Value in the JSON Field:

update toys
set json_col = jsonb_set(json_col, '{can do, 0}', '"run"'::jsonb)
where toy_name = 'Buzz Lightyear';

Removing a value from the JSON field is a bit trickier; one way is to rebuild the array without the removed element:

update toys
set json_col = jsonb_set(json_col, '{can do}', (
    select jsonb_agg(value)
    from jsonb_array_elements(json_col->'can do') as value
    where value <> '"fly"'
))
where toy_name = 'Buzz Lightyear';

Updating a JSON doc is just one example showing that Mongo is overall more fluent with JSON. There is also Mongo's built-in JSON schema validator, which is much more convenient than the Postgres JSON validator (validate_json_schema), and Mongo's integration with Node.js, which is seamless compared to ORM tools on top of Postgres.
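For reference, a hedged sketch of what validation can look like on the Postgres side, assuming the third-party postgres-json-schema extension (which provides validate_json_schema(schema, data)) is installed; the table and schema here are illustrative:

CREATE TABLE toys (
    toy_name text,
    json_col jsonb,
    CHECK (validate_json_schema('{"type": "object", "required": ["can do"]}', json_col))
);
-- inserts whose json_col does not match the schema are rejected by the CHECK constraint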

So Mongo is better suited for working with JSON than Postgres. No doubt about that.

Speed for small updates/queries (OLTP)

OLTP stands for Online Transaction Processing. OLTP applications will typically use a primary key or a regular index to fetch a small set of rows while inserts and updates are on a small number of rows.

As a matter of fact, most systems you interact with daily (withdrawing cash at an ATM, paying your bills, buying a movie ticket) are classified as OLTP systems.

OLTP applications have a high volume of small transactions where data accuracy (transaction management) is critical. Here, the 2 main features I am looking for are effective indexing (for fast data access and updates) and transaction management.
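For illustration, here is roughly what those two features look like together in Postgres (the accounts table and values are made up):

CREATE TABLE accounts (account_id int PRIMARY KEY, customer_id int, balance numeric);
CREATE INDEX accounts_customer_idx ON accounts (customer_id);   -- fast lookups by customer

BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE account_id = 123;
UPDATE accounts SET balance = balance + 50 WHERE account_id = 456;
COMMIT;   -- either both rows change or neither does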

Mongo has great indexing options. You can index practically everything and do it effectively. This means data can be fetched and updated quickly.

You can find the same indexes in Postgres (and even more index types using extensions) and also full ACID transaction management. While both databases can deliver very high rates of transactions per second, Postgres is ACID and can deliver 100% data accuracy. I believe Postgres is the winner here.

Speed for batch systems (big updates/queries)

DWH/DCS/Big data systems involve large inserts and updates (data loading) in parallel with heavy queries on large amounts of data.

Postgres has several tools to manage DWH/DCS/Big data systems:

  • Data sharding (such as Citus DB) - splitting data across different machines in read and write mode.
  • Standby instances that can be queried.
  • Materialized views for refreshing table data from base/staging tables.
  • Foreign Data Wrappers (FDW) that allow you to query remote databases (not necessarily Postgres databases) for easy data integration.
  • Table partitioning, which allows you to effectively split your data across disks and even different machines (when used in conjunction with FDWs), achieving an effective poor man's database sharding (see the sketch after this list).
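As a sketch of the partitioning point above (table, columns and dates are made up):

CREATE TABLE events (
    event_time timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

INSERT INTO events VALUES ('2024-01-15', '{"type": "click"}');   -- routed to events_2024_01 automatically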

Mongo also has features that are handy for DWH/DCS/Big data systems:

  • Data sharding (simply called a "sharded cluster").
  • TTL collections - deleting old data without extra coding.
  • Capped collections that can grow up to a predefined size.
  • Replicas that can be queried.
  • Read preference - a rule-driven mechanism for deciding which node should be queried and when, effectively moving query load from the primary to the secondaries. This great feature is unmatched in Postgres and sadly is rarely used.

Mongo is great for dynamic queries of frequently written or read data. That’s because it is designed to work with data of all different types that are changing all the time, without requiring complex transactions between objects. You’re going to get good performance, even when running ad hoc queries on small subsets of fields, contained in documents with loads of fields.

On the other hand, Postgres delivers unmatched features such as foreign data wrappers, great partitioning options, effective data sharding (which does not consume as many resources as a Mongo sharded cluster), and parallel query execution.

But above all, Postgres is SQL while Mongo is not. I know there are SQL bridges for Mongo (discussed later) but they are hardly as effective or full-featured as Postgres's full ANSI compliance. The ability to use any JDBC tool with Postgres is a winner.

Database design (ERD)

First the obvious facts: Postgres is a SQL compliant relational database with JSON functionality.

Mongo is not relational and not SQL compliant and is also schema less.

Mongo is bound to JSON while in Postgres you can choose to use regular column data types and JSON in the same table. This “hybrid” nature of Postgres is a great advantage. For example, consider a products table, storing data about, well, products.

Every product has fixed columns such as ID, name, and category, but also many other bits of information that are unique to a specific product, for example how much RAM a laptop has. If the product is an orange, then the amount of RAM is not relevant. The table structure in Postgres can be:

ID as integer

Name as text

Category as integer

Extended_data as jsonb

The extended_data column will include all the pieces of information that are relevant to this product only. That way you gain a schemaless database design but still have a database design. The product fields ID, name and category are not expected to ever change; there is no scenario in which a product will not have these 3 fields, and the JSON field is still the place to add new attributes when needed. Having fixed columns makes reading the schema much easier, as the column name implies its content. Fixed columns also enable foreign keys, for better data validation and speedy table joins.
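A minimal sketch of such a hybrid table (all names and values are illustrative):

CREATE TABLE categories (id integer PRIMARY KEY, name text);

CREATE TABLE products (
    id            integer PRIMARY KEY,
    name          text NOT NULL,
    category      integer REFERENCES categories (id),   -- fixed columns allow foreign keys
    extended_data jsonb                                  -- product-specific attributes live here
);

INSERT INTO categories VALUES (1, 'laptops'), (2, 'fruit');
INSERT INTO products VALUES (1, 'ThinkPad', 1, '{"ram_gb": 16}'),
                            (2, 'Orange',   2, '{}');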

In Mongo there is no "table" structure, as the database is schemaless, but the documents will contain the same "columns" (in JSON these are called attributes).

The database being totally schemaless has advantages (quick development and deployment) but one big disadvantage: logical data corruption. For example, consider a Mongo collection as above, already containing lots of data under the attribute name "Extended_data". Now a novice programmer "creates" (by mistake) a new attribute "ExtendedData", simply by:

db.products.insertOne({ "id": 1, "name": "Buzz Lightyear", "category": "toys", "ExtendedData": "It can fly" })

This data is lost to all other readers, as they are looking for "Extended_data" rather than "ExtendedData".

This is in fact a severe case of data corruption which is extremely hard to find because it cannot be monitored.

With Mongo, as there are no foreign keys, it is common practice to break the normalization rules and include the parent data in the child collection. This improves performance, but at the cost of having the same data in many places.

Having a normalized schema design saves space (as each piece of information is saved once), improves performance (as you can write effective queries that scan smaller tables) and also improves reliability of the data (as there is a single source of truth for each piece of data). So with Postgres you can better design, document and manage the schema structure while getting better performance and reliability.

SQL support

SQL is by far the most widely supported and used query language.

Postgres is fully ANSI SQL compliant. This means you can use any SQL client tool to query Postgres data.

Mongo is not SQL compliant, but you can use SQL tools on top of Mongo in order to query data using SQL commands. Tools such as Studio 3T deliver some representation of collections as tables and JSON attributes as columns, and allow basic SQL queries. This is limited (not all SQL commands are supported) and hard to set up and maintain (you need to set up some Mongo-to-SQL bridge). Still, all the basic SQL constructs such as distinct, avg, sum, max, min, order by and group by are supported. More advanced features such as CTEs (common table expressions) and window functions are not.
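To illustrate what gets lost, here is the kind of query a SQL bridge typically cannot express but plain Postgres handles natively (the products table is made up):

WITH ranked AS (
    SELECT category, name, price,
           rank() OVER (PARTITION BY category ORDER BY price DESC) AS rnk
    FROM products
)
SELECT category, name, price
FROM ranked
WHERE rnk <= 3;   -- the three most expensive products per category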

You can also only query the database using select statements; updates, deletes and inserts are not possible through SQL bridges.

Don't count on these tools as the infrastructure for your data engineering system.

Tools such as Tableau and Power BI also deliver some capabilities for working with Mongo, but these are nowhere near the support they offer for an ANSI SQL database such as Postgres.

Bottom line - If you are going to have anything other than basic reports, you need Postgres.

Database Limitations

Obviously each database has its limitations regarding the max number of clients, max size of each data type and so on, so there is no point comparing all the different limitations of each database, except for 2 big shortcomings of Mongo:

Max connections can easily be exceeded. Mongo does not deliver a built-in or server-side connection pool (comparable to PgBouncer) and relies on connection pooling from the driver. The Postgres ecosystem delivers PgBouncer (and others such as Pgpool-II), which is an excellent server-side connection pooling solution. This means it's much easier to exhaust the allowed connections in Mongo than in Postgres.
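For reference, a minimal PgBouncer configuration sketch (pgbouncer.ini; the database name and limits are made up):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000    ; many client connections are accepted...
default_pool_size = 20    ; ...but funneled into a small pool of server connections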

The other big limitation of Mongo is the max document size. In Postgres the max jsonb size is 1GB, while in Mongo it's only 16MB. This means a single Mongo document cannot exceed 16MB, and in some cases you will have to split a single JSON document into several documents. This is not an extreme scenario and is easier to run into than you think. For example, assume your developer planned a collection named customer_activity which stores (obviously) the customers' activity. In most cases a smart developer will create a new document with the current timestamp for each new customer activity, but there is also the option of placing the entire activity of a single customer under one document (using the $push operator). This single document contains all the customer's activity and will become huge over time. Unfortunately, more developers than you might think choose this option, leading to huge documents in a matter of weeks or months. So if you are going to plan your application badly, do it in Postgres, as it is often more forgiving.




That’s the end of part one. Part 2 will be posted soon.

Until then please like, share and let me know what you think about the matter.
