From File to Database - PostgreSQL Import CNPJ

The Brazilian government publishes CNPJ (National Register of Legal Entities) data on a monthly basis. For November 2024, there are 35 compressed files in ‘.zip’ format (download link at the end of the text).

One of our clients asked us to improve their application, which keeps a local database up to date with CNPJ data.

Our client's database serves other internal applications and has to stay on Microsoft SQL Server. We kept the existing code and made some improvements, which shortened the data update time, but the update still took a few hours - not exactly a problem for data refreshed every four or five weeks.

Even though we had delivered the solution to the client, the difficulty of the import, both in file size and in import time, was the subject of several discussions within the Necto team. So we decided to carry out further tests.

Necto has been using PostgreSQL as its database for years, and it's our first choice whenever possible. It's a robust, open-source database that doesn't fall short of other database management systems (DBMSs) on the market. That's why we chose PostgreSQL for our tests of importing the CNPJ data.

CNPJ data is made available as CSV files compressed in ZIP format. Today (2024), there are 35 ZIP files containing around 170 million records, of which around 60 million are unique CNPJs.

With the CSV files in hand, we load them using PostgreSQL's COPY command.
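As a rough illustration of that step, here is a minimal Python sketch that streams each CSV inside a ZIP archive into a table through COPY ... FROM STDIN. It is not the code from the full article. Several details are assumptions: the table and column names (`empresas`, `cnpj_basico`, `razao_social`) are hypothetical, the files are assumed to be semicolon-delimited and Latin-1 encoded, and `conn` is assumed to be a psycopg 3 connection (any object exposing `cursor()`, `cursor.copy(sql)` and `commit()` in the same way would do).

```python
import zipfile
from typing import Iterable

CHUNK_SIZE = 1 << 20  # 1 MiB per write; an arbitrary choice for this sketch


def build_copy_sql(table: str, columns: Iterable[str]) -> str:
    """Build a COPY ... FROM STDIN statement for a semicolon-delimited CSV.

    Assumption: delimiter ';', double-quote quoting and LATIN1 encoding.
    Table and column names are interpolated as-is, so pass trusted names only.
    """
    cols = ", ".join(columns)
    return (
        f"COPY {table} ({cols}) FROM STDIN "
        "WITH (FORMAT csv, DELIMITER ';', QUOTE '\"', ENCODING 'LATIN1')"
    )


def copy_zip_into_table(conn, zip_path: str, table: str, columns: Iterable[str]) -> int:
    """Stream every member of a ZIP archive into `table`; return bytes sent.

    `conn` is assumed to behave like a psycopg 3 connection.
    The archive is never unpacked to disk: each member is read in chunks
    and written straight into the COPY stream.
    """
    sql = build_copy_sql(table, list(columns))
    sent = 0
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            with archive.open(name) as member, conn.cursor() as cur:
                with cur.copy(sql) as copy:
                    while chunk := member.read(CHUNK_SIZE):
                        copy.write(chunk)
                        sent += len(chunk)
    conn.commit()  # a single commit after all members have been loaded
    return sent
```

With psycopg 3 installed, a call might look like `copy_zip_into_table(psycopg.connect(dsn), "some_file.zip", "empresas", ["cnpj_basico", "razao_social"])`, where `dsn` and the file name are placeholders. Streaming the raw bytes and letting the server decode them via the ENCODING option avoids re-encoding the data on the client.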

See the complete article with code and repository at

https://necto.com.br/articles-list/postgresql-setup-for-a-massive-import-160000000-rows-in-minutes/

