Handling Large Dataset - PySpark Part 2
Mohan Sivaraman
Senior Software Development Engineer specializing in Python and Data Science at Comcast Technology Solutions
Python PySpark:
Dataset Link:
Representation:
In the previous part, we discussed 10 things that PySpark helps us achieve. We will now analyze whether all 10 points were covered by the program above.
Distributed Data Processing
In-Memory Computation
Fault Tolerance
Optimized Execution with DAGs
Support for Multiple Data Formats
Seamless Integration with the Hadoop Ecosystem
Scalability for Big Data
High-Level Abstractions
Machine Learning Integration
Built-In Fault Recovery