Scaling a PostgreSQL Database
Database scalability is one of the most crucial aspects of software design and development, and also of operations, where it means properly designing and configuring a scalable infrastructure to run the database engine.
In this article we will cover both aspects briefly, but with enough detail to be useful in practice.
Database Software Scalability
Broadly, there are three techniques for improving the scalability of a PostgreSQL database at the software development level: indexing, partitioning, and caching.
These three concepts apply not only to PostgreSQL but also to other relational database management systems (RDBMS) such as Oracle, MS-SQL, MySQL, and MariaDB.
Table Indexing
Indexing is the technique of building auxiliary lookup structures that hold the columns used as search criteria, that is, the columns that appear in the WHERE clause of a SELECT statement in an SQL-based database.
For example, suppose you have a users table with the following entity structure:
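As a minimal illustrative sketch, such a table might be defined as follows; the Id, UserName, and UserStatus columns come from the discussion below, while the Email column and the exact type choices are assumptions made for the example:

    -- Illustrative users table: Id is the primary key,
    -- UserName and UserStatus are index keys.
    CREATE TABLE users (
        Id         BIGSERIAL PRIMARY KEY,   -- the primary key automatically gets a unique index
        UserName   VARCHAR(100) NOT NULL,
        Email      VARCHAR(255),
        UserStatus SMALLINT NOT NULL        -- e.g. 1 = Active, 0 = Inactive
    );

    -- Secondary indexes on the columns used in WHERE clauses.
    CREATE INDEX idx_users_username   ON users (UserName);
    CREATE INDEX idx_users_userstatus ON users (UserStatus);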
If you look at the table, you will see that some columns are designated as either a primary key (Id) or an index key (UserName and UserStatus).
When you create this table, the RDBMS also builds three additional index structures, one for each of these keys.
Assigning an index to a column that is never used in a WHERE clause when selecting from that table is a waste of storage. On the other hand, if you leave a frequently filtered column unindexed, the performance of the database will degrade over time as the table grows to thousands and then millions of records.
In some cases, you frequently filter on multiple columns in the same WHERE clause, for example selecting user records whose usernames start with, say, "A" and whose status is Active (1). In this case the RDBMS needs to load both single-column indexes to search for the requested set of records. Adding a composite index that holds both the username and user status columns should improve the performance of such queries.
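A sketch of such a composite index on the illustrative users table above, together with the query it serves (the index name and the pattern operator class are choices made for this example):

    -- Composite index covering both columns used in the WHERE clause.
    -- varchar_pattern_ops lets the index serve prefix searches (LIKE 'A%')
    -- regardless of the database collation.
    CREATE INDEX idx_users_username_status
        ON users (UserName varchar_pattern_ops, UserStatus);

    -- Finds active users whose names start with "A" using the composite index.
    SELECT Id, UserName
    FROM users
    WHERE UserName LIKE 'A%'
      AND UserStatus = 1;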
You should be careful when adding extra indexes, however, as each one consumes additional storage and has to be maintained on every insert and update.
Partitioning
One of the most widely used techniques for huge databases, where records are counted in the millions, is partitioning.
Partitioning is an RDBMS feature that splits a main table into multiple smaller tables accessed through some partitioning criterion such as a date range, a year, or a calculated hash.
This allows a program to provide the criterion in its SELECT statement so that only the relevant partition is accessed.
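A minimal sketch of range partitioning in PostgreSQL (version 10 or later), using a hypothetical archived_orders table partitioned by year; the table and column names are assumptions for illustration:

    -- Parent table partitioned by a date column.
    CREATE TABLE archived_orders (
        id          BIGINT        NOT NULL,
        customer_id BIGINT        NOT NULL,
        created_at  DATE          NOT NULL,
        total       NUMERIC(12,2)
    ) PARTITION BY RANGE (created_at);

    -- One partition per year of archived data.
    CREATE TABLE archived_orders_2023 PARTITION OF archived_orders
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
    CREATE TABLE archived_orders_2024 PARTITION OF archived_orders
        FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

    -- Because created_at appears in the WHERE clause, the planner prunes
    -- every partition except archived_orders_2024.
    SELECT id, total
    FROM archived_orders
    WHERE created_at >= DATE '2024-01-01'
      AND created_at <  DATE '2024-07-01';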
Only use partitioning for huge databases where you can predict enormous data growth. For example, frequently accessed online records are usually kept in a separate database covering, say, the most recent 12 months, while archived data spanning years is usually designed around partitions.
Even in such a case, if the data is not expected to grow that large, there is no need to add the overhead of partitioning; adopt it only when you know for a fact that your data will grow huge.
Caching
RDBMS engines usually access data stored in tablespaces on disk, so the speed of retrieving data depends heavily on disk speed. For example, NVMe SSDs are much faster than SATA SSDs, which in turn are faster than old IDE drives.
Yet none of these disk types can match random access memory (RAM). Engines such as Redis keep an in-memory cache that stores the result sets of frequently executed SELECT statements.
Adding an in-memory cache such as Redis to your software design and operational environment can tremendously boost your RDBMS performance.
Database Hardware Infrastructure Scalability
As mentioned above, one of the main hardware factors affecting RDBMS performance is the type of storage used, usually measured in read/write input/output operations per second (IOPS). While disk storage access times are still measured in microseconds to milliseconds, RAM operates at nanosecond latencies. RAM, however, is not permanent storage: it loses all of its data if the power supply goes offline.
Unless you can build RAM-based storage that is guaranteed to stay powered at all times, you will have to keep relying on SSD disk storage.
Vertical Scalability
RDBMS engines are known to be vertically scalable: you add more CPU power and more RAM to a single server to get better performance. Scaling out horizontally by adding more server nodes is not supported by all RDBMS engines; PostgreSQL in particular has no built-in multi-master write scaling, although read replicas can be used to spread read load.
Locally Attached vs Shared Storage
The best performance is achieved using locally attached, high-performance disk drives with the lowest access times; today that means NVMe drives.
Providing a cache at the disk controller level adds another layer of performance.
Using shared storage such as network-attached storage (NAS) or a storage area network (SAN) introduces another factor affecting performance: network latency and network speed.
Although in some RDBMS engines this type of storage is used to provide horizontal scalability, allowing multiple database servers to serve application requests at the same time, not all engines support this concept, and it adds the networking overhead.
Cache Configuration
If a cache server such as Redis is used, it should be configured on its own server or cluster with an adequate amount of RAM to hold the result sets of the most frequently used SELECT statements.