Spark Windows Temp Cleanup
One of the more frustrating features of today's open source landscape is projects that pay zero attention to the issues that can easily arise on Windows, thanks to basic differences in how Linux and Windows handle file locks.
Windows is a fragile little petal when it comes to deleting files that are still held open by a running process: it will generally refuse, with an error or exception.
Linux is different: the kernel keeps a reference count for each open file, so it will happily mark the file for deletion on request, but it will not actually remove it until no process still references it.
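If you want to see the Windows side of that behaviour for yourself, here is a minimal PowerShell demonstration; the file name is purely illustrative.

```powershell
# While this process still holds the file open, Windows refuses to delete it.
$path = Join-Path $env:TEMP "lock-demo.txt"   # hypothetical file, just for the demo
$stream = [System.IO.File]::Open($path, 'OpenOrCreate', 'ReadWrite', 'None')
try {
    Remove-Item $path -ErrorAction Stop       # refused: the file is in use
} catch {
    Write-Host "Delete refused: $($_.Exception.Message)"
} finally {
    $stream.Close()                           # release the handle ...
}
Remove-Item $path                             # ... and now the delete succeeds
```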
Working around these differences can be a nightmare, and there is a lot of Java (and Scala) code out there which ignores the problem with the all-purpose cop-out:
Windows is not supported
That can be just a tad irritating to anybody working in an environment where they have few options but to use Windows and are puzzled as to why an ostensibly cross-platform language like Java (or Scala) simply does not work as advertised.
Not the first time this has happened...
The same business of locked files producing deadlocks or steadily piling up disk garbage has happened before (think of the so-called web-application servers of the early 2000s).
What most people need is an honest hack that is good for the likely time frame over which a permanent fix might arrive (think geological time).
Fortunately, you can do this in PowerShell.
The script uses the magic of PowerShell to wrap the call to "spark-shell" and take note of the process identifier for the resulting process. Then it twiddles its thumbs while you get on with whatever you are doing. Then ... take out the trash!
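Here is a minimal sketch along those lines. It assumes spark-shell.cmd is on the PATH and that Spark's leftover scratch directories land under %TEMP% with a spark- prefix; adjust both for your installation.

```powershell
# Sketch: wrap spark-shell, remember its process id, wait for it to exit,
# then sweep the leftover spark-* temp directories it could not delete itself.
# "spark-shell.cmd" and the "spark-*" filter are assumptions - adjust to your setup.

$tempRoot = $env:TEMP

# Launch the shell in this console and keep a handle on the resulting process.
$proc = Start-Process -FilePath "spark-shell.cmd" -NoNewWindow -PassThru

# Twiddle thumbs until you quit the shell.
Wait-Process -Id $proc.Id

# The JVM is gone, so Windows has released its locks and the trash can go out.
Get-ChildItem -Path $tempRoot -Directory -Filter "spark-*" |
    Remove-Item -Recurse -Force -ErrorAction SilentlyContinue
```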
Okay, so it is not complicated.
However, in this world you could die waiting for buggy software to get fixed. This problem has been an open issue for Spark for - well - forever.
Addendum
For what it is worth, the problem outlined above is actually caused by a bug in the Scala language run-time - specifically, in the REPL.
You can fix that problem, when running the Scala REPL on Windows, by adapting the above PowerShell script to replace "spark-shell" with "scala", as sketched below.
That way you can move on.
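For completeness, the adapted version might look like this; the launcher name (scala.bat) and the temp-directory pattern to sweep are assumptions you should verify against what your REPL version actually leaves under %TEMP%.

```powershell
# Same trick for the plain Scala REPL: only the launcher changes.
$proc = Start-Process -FilePath "scala.bat" -NoNewWindow -PassThru
Wait-Process -Id $proc.Id

# The filter below is a guess - check your own %TEMP% for the REPL's leftovers.
Get-ChildItem -Path $env:TEMP -Directory -Filter "scala*" |
    Remove-Item -Recurse -Force -ErrorAction SilentlyContinue
```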
There are way too many geniuses involved for this to be fixed this side of the end of time.