Difference Between Parquet and CSV

Difference Between Parquet and CSV

Difference Between Parquet and CSV

CSV is a simple and widely spread format that is used by many tools such as Excel, Google Sheets, and numerous others that can generate CSV files. Even though the CSV files are the default format for data processing pipelines it has some disadvantages:

  • Amazon Athena and Spectrum will charge based on the amount of data scanned per query.
  • Google and Amazon will charge you according to the amount of data stored on GS/S3.
  • Google Dataproc?charges?are time-based.

Parquet has helped its users reduce storage requirements by at least one-third on large datasets, in addition, it greatly improved scan and deserialization time, hence the overall costs.

The following table compares the savings as well as the speedup obtained by converting data into Parquet from CSV.

No alt text provided for this image


要查看或添加评论,请登录

社区洞察

其他会员也浏览了