[Postgres] How to speed up bulk load
Nikolay Samokhvalov
I post a new PostgreSQL "howto" article every day. Join me in this journey – subscribe here or on X, provide feedback, share!
If you need to load a lot of data, here are some tips that can help you do it faster.
1) COPY
Use COPY to load data; it is optimized for bulk loads.
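A minimal sketch (table and file names are hypothetical):

    -- server-side load; the file must be readable by the server process
    COPY events FROM '/data/events.csv' (FORMAT csv, HEADER);
    -- or client-side via psql, reading the file from the client machine:
    -- \copy events from '/data/events.csv' (format csv, header)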
2) Less frequent checkpoints
Consider increasing max_wal_size and checkpoint_timeout temporarily.
Changing them does not require a restart.
Increased values lead to longer recovery time in case of a failure, but the benefit is that checkpoints occur less often, therefore: less checkpoint IO competes with the load, and there are fewer full-page writes (each page is written to the WAL in its entirety the first time it is modified after a checkpoint).
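A sketch of a temporary change (the values are illustrative assumptions, not recommendations):

    ALTER SYSTEM SET max_wal_size = '100GB';
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    SELECT pg_reload_conf();  -- both settings are applied without a restart
    -- after the load, revert:
    -- ALTER SYSTEM RESET max_wal_size; ALTER SYSTEM RESET checkpoint_timeout; SELECT pg_reload_conf();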
3) Larger buffer pool
Increase shared_buffers, if you can (unlike the settings above, changing it requires a restart).
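For example (the value is an assumption to adjust for your machine):

    ALTER SYSTEM SET shared_buffers = '16GB';
    -- takes effect only after the server is restarted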
4) No (or fewer) indexes
If the load happens into a new table, create indexes after the data load. When loading into an existing table, avoid over-indexing: every additional index will significantly slow down the load.
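A sketch of the "indexes after load" order (names are hypothetical):

    CREATE TABLE events (id bigint, payload text);
    COPY events FROM '/data/events.csv' (FORMAT csv);
    -- build indexes only once the data is in place:
    CREATE INDEX ON events (id);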
5) No (or fewer) FKs and triggers
Similarly to indexes, foreign key constraints and triggers may significantly slow down data load – consider (re)creating them after the bulk load.
Triggers can be disabled via ALTER TABLE … DISABLE TRIGGER ALL – however, if triggers support some consistency checks, you need to make sure that those checks are not violated (e.g., run additional checks after the data load). FKs are implemented via implicit system triggers, and ALTER TABLE … DISABLE TRIGGER ALL disables them too – loading data in this state should be done with care.
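A sketch, with a hypothetical table name (keep in mind that FK enforcement is switched off as well):

    ALTER TABLE events DISABLE TRIGGER ALL;
    -- ... bulk load ...
    ALTER TABLE events ENABLE TRIGGER ALL;
    -- re-enabling does not re-check already loaded rows; run your own consistency checks afterwards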
6) Avoiding WAL writes
If this is a new table, consider completely avoiding WAL writes during the data load. Two options (both have limitations and require understanding that data can be lost if a crash happens):
- wal_level = 'minimal' (this also requires max_wal_senders = 0 and a server restart): with this setting, COPY into a table created or truncated in the same transaction skips WAL writes for the data;
- an unlogged table: CREATE UNLOGGED TABLE is never WAL-logged, but its contents are truncated during crash recovery; it can later be converted via ALTER TABLE … SET LOGGED, although that conversion writes the entire table to the WAL (see the sketch below).
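A sketch of the unlogged-table option (names are hypothetical):

    CREATE UNLOGGED TABLE events_load (LIKE events);
    COPY events_load FROM '/data/events.csv' (FORMAT csv);
    -- converting to a regular table WAL-logs its entire contents:
    ALTER TABLE events_load SET LOGGED;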
7) Parallelization
Consider parallelization. This may or may not speed up the process, depending on the bottlenecks of the single-threaded process (e.g., if a single-threaded load saturates disk IO, parallelization won't help). Two options:
- split the input into chunks and load them in several sessions running in parallel, each with its own COPY (see the sketch below);
- when loading from a dump, use the directory format and parallel jobs: pg_dump -Fd … -j N and pg_restore -j N.
If you use TimescaleDB, consider timescaledb-parallel-copy.
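A sketch of the chunked approach, as SQL executed from two separate connections (file names are hypothetical):

    -- session 1:
    COPY events FROM '/data/events_part_01.csv' (FORMAT csv);
    -- session 2, running concurrently in its own connection:
    COPY events FROM '/data/events_part_02.csv' (FORMAT csv);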
Last but not least: after a massive data load, don't forget to run ANALYZE.
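For example (the table name is hypothetical; a bare ANALYZE processes the whole database):

    ANALYZE events;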
Marathon progress: ▓░░░░░░░░░░░░░░░░░░░ 8.77%
This series is also available in Markdown format: https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos