Spark - Read and Write back to same S3 location

When you read from and write back to the same S3 location in Spark, the job fails.


The reason this causes a problem is that you are reading from and writing to the same path that you are trying to overwrite. This is a standard Spark issue.

When you read data from a location and write back to it with mode 'overwrite', the write is an action on the DataFrame. When Spark executes this action, its plan first deletes the target path and then tries to read from that path, which is now empty; hence the error.
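A minimal sketch of the failing pattern, assuming a Parquet dataset; the bucket name, path, and transformation are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("same-path-overwrite").getOrCreate()

# Read from an S3 location (bucket and path are hypothetical).
df = spark.read.parquet("s3://my-bucket/dataset1/")

# Apply a transformation -- only recorded in the plan, not executed yet.
df_cleaned = df.dropDuplicates()

# Writing back to the SAME path with mode("overwrite") fails:
# Spark deletes s3://my-bucket/dataset1/ before the lazy read has actually
# materialized the data, so the job errors out (typically with a
# FileNotFoundException / "path does not exist" style error).
df_cleaned.write.mode("overwrite").parquet("s3://my-bucket/dataset1/")
```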

Spark evaluates transformations on a DataFrame lazily; they are executed only when an action is called. It builds a DAG that keeps track of all the transformations to be applied to the DataFrame.
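A small sketch of that lazy behaviour; the path and the "status" column are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

# Nothing is read yet: both the read and the filter are just recorded in the DAG.
df = spark.read.parquet("s3://my-bucket/dataset1/")    # hypothetical path
active = df.filter(df["status"] == "active")           # hypothetical column

# Only this action triggers execution of the plan: Spark now scans the S3 files.
print(active.count())
```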

A possible workaround is to write to a temporary location first and then, using that as the source, overwrite the original dataset location.
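A sketch of that workaround, again with hypothetical bucket/path names and a placeholder transformation:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("same-path-workaround").getOrCreate()

source_path = "s3://my-bucket/dataset1/"      # hypothetical source/target location
temp_path = "s3://my-bucket/dataset1_tmp/"    # hypothetical temporary location

# 1. Read the original data and apply transformations.
df = spark.read.parquet(source_path)
df_cleaned = df.dropDuplicates()

# 2. Materialize the result at the temporary location first.
df_cleaned.write.mode("overwrite").parquet(temp_path)

# 3. Re-read from the temp location and overwrite the original path.
#    The overwrite now deletes source_path only after the data has already
#    been persisted elsewhere, so nothing is lost.
spark.read.parquet(temp_path).write.mode("overwrite").parquet(source_path)
```

The temporary location can be cleaned up afterwards once the final write has succeeded.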



