Scaling PostgreSQL Database

Database scalability is one of the most crucial aspects of software solutions design and development, as well as the operational aspects in terms of properly designing and configuring a scalable infrastructure that operates the database engine.

In this article we will cover both aspects in brief but somewhat detailed.

Database Software Scalability

Basically, there are three concepts that cover enhancing the scalability of a PostgreSQL database at the software development level.

These three concepts apply not only to PostgreSQL but also to other relational database management systems (RDBMS) such as Oracle, MS-SQL, MySQL, and MariaDB.

Table Indexing

Indexing is the concept of building partial tables that hold columns used in a search criteria that is used to build the where close in a select statement in an SQL based database.

For example, if you have a table of a user that holds the following entity structure:


If you look at the table, you will see that some columns are designated as either a primary key (Id) or an index key (UserName and UserStatus).

When building this table at the RDBMS system level, the RDBMS also build three other related tables one for each of these indexes.

It would be a waste of storage of you are assigning a column as an index key if it is never used in a where statement when selecting from that table. On the other hand if you do not do so and that column is frequently used in select statements; the performance of the database shall degrade by time when the database table gets populated with thousands and millions of records.

In some cases, you frequently use multiple columns in a where statement such as trying to select user records from the table where usernames start with say “A” and their status is Active (1). In this case the RDBMS will need to load both indexes to search for the requested set of records. If you add another index that holds both username and user status columns; that should enhance performance of your database.

You should be careful when adding extra indexes as they require more data storage to be considered.

Partitioning

One of the most used techniques in huge databases when records count in millions is partitioning.

Partitioning is an RDBMS feature where the RDBMS splits a main table into multiple table accessed using some mathematical criterion such as date range, year, or calculated hash.

This allows a program to provide the criterion along with the select statement so that the proper partition is accessed.

Only use partitioning for huge databases and when you can predict enormous growth of data. For example, while we usually keep online frequently accessed records in a separate database for say 12 months, archived data for long-term periods in years is usually designed to use partitions.

Even with such a case if the data is not predicted to grow so huge, there is no need to add the overhead of partitioning unless you know for a fact that your data will grow huge.

Caching

RDBMS engines usually access data within disk tablespaces. The speed of retrieving data is highly dependent on disk speed. For example, NVMe SSD drives are much better than SATA SSD drives who’s in turn are better than old IDE drives.

Yet, all those types of disk drives are never better than random access memory (RAM). Engines like Redis use in memory cache to store datasets that are resulting from frequently accessed data records based on a specific select statement construction.

Adding a memory cache like Redis to your software design and operational environment shall tremendously boost your RDBMS performance.

Database Hardware Infrastructure Scalability

As we mentioned above one of the main hardware components that an RDBMS performance is affected by is the type of storage used and that is usually measured in Read/Write Input/Output Operations Per Second (IOPS). While disk drive storage is still struggling within the boundaries of millisecond access, RAM has reached a nanosecond permeance. But RAM is not considered a permanent storage as it will lose all data stored if power supply is brought offline.

Unless you could build an always online RAM based storage, you should be obliged to continue relying on SSD disk storage.

Vertical Scalability

RDBMS engines are known to be vertically scalable. This means that you add more CPU power and more RAM memory to get better performance. Adding more server nodes in a horizontally scalable model may not be supported by all RDBMS engines, specifically speaking not in PostgreSQL.

Locally Attached vs Shared Storage

The best performance is achieved using locally attached highly performing disk drives with least access time. This is today called NVMe.

Providing cache at the disk controller level adds another layer of performance boosting.

Using shared storage such as network access storage (NAS) or storage area network (SAN) adds another factor affecting performance that is the network latency and network speed.

Although – in some RDBMS engines – such type of storage is used to provide horizontal scalability allowing multiple database serves to be used to serve application requests at the same time; first, not all engines support this concept, and second it adds the networking overhead.

Cache Configuration

If a cache server like Redis is used it should be configured on its own server cluster with an adequate amount of RAM memory to serve frequently used select statement.


要查看或添加评论,请登录

Mahmoud Zamel的更多文章

  • Determining Customer Story Complexity

    Determining Customer Story Complexity

    As we are using customer stories as an indicator of team productivity, we figure out that each story contribution is…

  • Customer Stories Factor in Productivity Measurements

    Customer Stories Factor in Productivity Measurements

    Customer stories can be used as a metric to measure software team productivity. When a sprint starts with planning…

  • Measuring Software Team Productivity

    Measuring Software Team Productivity

    Productivity of a software development team is a bit tricky to measure without falling int one or more fallacy beliefs.…

    1 条评论
  • Playing Multiple Roles During Project Execution

    Playing Multiple Roles During Project Execution

    In the far past I had the chance to play multiple roles during execution of an undertaken project. I did that in so…

    2 条评论
  • Planning vs. Procrastination

    Planning vs. Procrastination

    Should you plan your daily activities? Should you plan your month, your year, and your life? Planning is a crucial…

  • The Journey from Business Requirements to Production Code

    The Journey from Business Requirements to Production Code

    I am an advocate of using agile processes to tackle undertaken projects to implement custom code solutions. But agile…

  • The Software Bug Nightmare - September of 2001

    The Software Bug Nightmare - September of 2001

    The first recorded instance of a bug causing a technical malfunction occurred in 1947 when engineers working on the…

  • Converting from Monolith to Microservice Architecture

    Converting from Monolith to Microservice Architecture

    Monolith applications may suffer from multiple problems and issues that can be resolved using microservices…

  • The Gift of Fast Failing

    The Gift of Fast Failing

    We fail more times than those we do succeed! Failing fast is a gift given to those who know the value of failing fast…

  • Colors and Shapes Matching in Technical Drawings

    Colors and Shapes Matching in Technical Drawings

    I am an artist even when it comes to producing a technical drawing. Like painters and graphics designers do with wall…

社区洞察