Data compression in Hive – An Introduction to Hadoop Data Compression

Data compression is a technique that encodes data so that it can be represented with fewer bits on disk, reducing the size of data files. The Hadoop framework is built for large-scale (Big Data) processing, which involves large numbers of data files stored on HDFS or other supported file systems, so compression can significantly reduce storage requirements as well as the amount of data transferred between mappers and reducers, which usually travels over the network. In Hadoop, data compression can be implemented using Hive or any other MapReduce component. In this post, we discuss the widely used HiveQL data compression formats, or codecs (compressor/decompressor schemes). To read more, please visit this post:
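
As a quick, minimal sketch of what this looks like in practice (the table name below is made up for illustration; the codec classes and properties are the standard Hadoop/Hive ones):

-- Compress intermediate map output shuffled between mappers and reducers
SET hive.exec.compress.intermediate=true;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Compress the final output written by the query/job
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- Columnar formats carry their own codec property, e.g. for an ORC table:
CREATE TABLE sales_orc (id INT, amount DOUBLE)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");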

https://www.dhirubhai.net/feed/update/urn:li:activity:6790856737600741376

https://bit.ly/3dDGwOB

#hive #datacompression #sqlrelease #hadoop
