Spark - Read and Write back to same S3 location
When you read from and write to the same S3 location in a Spark job, the job fails.
The reason this causes a problem is that you are reading from the same path you are trying to overwrite. This is a standard Spark issue, not something specific to S3.
When you read data from a location and write back to it with mode "overwrite", the write is the action that triggers execution of the DataFrame. As part of that execution plan, Spark first deletes the target path and only then tries to read from it; by that point the path is already empty, hence the error.
This happens because Spark evaluates DataFrame transformations lazily: they run only when an action is called. Spark builds a DAG that records all the transformations to be applied to the DataFrame, and nothing is actually read until an action triggers the plan.
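A minimal PySpark sketch of the failing pattern (the bucket path and column name are hypothetical, chosen only for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("same-path-overwrite").getOrCreate()

# Hypothetical S3 path used only for illustration.
path = "s3://my-bucket/dataset1/"

# Lazy: no data is read here, Spark only records the plan.
df = spark.read.parquet(path)
active = df.filter(df["status"] == "active")  # hypothetical column

# Action: with mode "overwrite", Spark deletes `path` before the
# lazy read above ever runs, so the source files are gone and the
# job fails (typically with a FileNotFoundException).
active.write.mode("overwrite").parquet(path)
```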
A possible workaround is to write to a temporary location first, and then, using that temp data as the source, overwrite the target (dataset2) location.
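A sketch of that workaround, continuing the example above (again, the paths are hypothetical):

```python
tmp_path = "s3://my-bucket/dataset1_tmp/"
dataset2_path = "s3://my-bucket/dataset2/"

# Step 1: materialize the transformed data in a temp location.
# This action reads the original source while it still exists.
active.write.mode("overwrite").parquet(tmp_path)

# Step 2: read the temp data back and overwrite the target location.
spark.read.parquet(tmp_path).write.mode("overwrite").parquet(dataset2_path)
```

This costs an extra write and read, but the source path is never deleted before it has been fully read, so the overwrite is safe. The temp location can be cleaned up once the final write succeeds.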