Columnar file format
Chetesh Bhagat
Big Data Developer at EPAM Systems Specializing in Spark, Scala, Hive, Hadoop, and AWS Solutions
RC File (Record Columnar):
Behavior: These are flat files consisting of binary key-value pairs. Rows are first partitioned horizontally into row groups, and each row group is then stored column by column.
Read/Write: RC was developed for faster reads, at the cost of write performance.
Compression: Provides significant block compression and can achieve a high compression ratio.
Splittable: Yes
Schema evolution: Not supported; the format was designed mainly for fast reads.
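To make the row-columnar idea concrete, here is a minimal sketch in plain Python. It is only an illustration of the layout, not the actual RCFile on-disk format: the helper names `to_row_groups` and `read_column`, the sample rows, and the group size are all assumptions made for this example.

```python
from typing import Any, Dict, List


def to_row_groups(rows: List[Dict[str, Any]], group_size: int) -> List[Dict[str, List[Any]]]:
    """Split rows horizontally into groups, then lay out each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Inside one row group, all values of the same column sit together.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        groups.append(columns)
    return groups


def read_column(groups: List[Dict[str, List[Any]]], name: str) -> List[Any]:
    """Read one column across all row groups without touching the other columns."""
    return [value for group in groups for value in group[name]]


# Hypothetical sample data, two rows per group.
rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 3, "city": "Pune"},
]
groups = to_row_groups(rows, group_size=2)
cities = read_column(groups, "city")
```

A reader that needs only `city` skips the `id` values entirely, which is where the faster reads come from. Writes are slower because rows have to be buffered and regrouped before they can be stored.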
---------------------------------------------------------------------------------------------------------
Parquet file:
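Behavior: Parquet stores nested data structures in a flat columnar format.
Read/Write: Optimized for fast reads, especially for queries that touch only a subset of columns.
Compression: Supports compression, most commonly with the Snappy algorithm.
Splittable: Parquet files are conditionally splittable: splits fall on row-group boundaries, so a file with a single row group cannot be divided further.
Schema evolution: Limited schema evolution, for example adding new columns.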
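As a rough picture of what "nested data in a flat columnar format" means, the sketch below flattens nested records into one list of values per dotted column path. It is a simplification and not Parquet's real encoding, which uses repetition and definition levels to handle repeated and optional fields. The function names `flatten` and `to_columns` and the sample records are assumptions made for this example.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Map a nested record to flat dotted paths, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Store one list per column path; a record missing a path contributes None."""
    flat_rows = [flatten(record) for record in records]
    paths: List[str] = []
    for row in flat_rows:
        for path in row:
            if path not in paths:
                paths.append(path)
    return {path: [row.get(path) for row in flat_rows] for path in paths}


# Hypothetical sample data: the second record has no zip field.
records = [
    {"name": "a", "address": {"city": "Pune", "zip": "411001"}},
    {"name": "b", "address": {"city": "Delhi"}},
]
columns = to_columns(records)
```

The missing `address.zip` value in the second record becomes `None`. This loosely mirrors how a column added later reads as null for older data, which is the kind of limited schema evolution described above.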
# Bigdata