Why do Java parallel streams perform poorly?
Java 8 brought us lots of cool new features such as lambda expressions, new date/time APIs, streams, functional interfaces, etc. I have been using Java 8 for quite some time now. One topic that usually comes up in tech meetings is when to use serial streams and when to use parallel streams. I have read some articles on why parallel streams sometimes perform worse than serial streams. Let's talk about this.
Parallel streams use the FORK/JOIN framework, introduced in Java 7, which creates a thread pool sized to the number of cores in the system.
The FORK operation splits a task into smaller tasks. The smaller tasks are split again until a task cannot be split any further.
The JOIN operation collects the results of the split tasks and merges them into the final result.
Since this is a multi-threaded system, there is overhead in managing the threads, allocating tasks, and context switching.
Parallel streams work exactly like this; the diagram above illustrates the process.
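As a minimal sketch of this fork/join behaviour (the class and method names are mine, not from any library), the snippet below sums a range in parallel: the range is forked into subtasks on the common ForkJoinPool and the partial sums are joined back together.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelSum {
    // Sum 1..1_000_000 in parallel: the range is split into chunks,
    // each chunk is summed on a common-pool worker, and the partial
    // sums are combined (joined) into the final result.
    static long parallelSum() {
        return IntStream.rangeClosed(1, 1_000_000)
                .parallel()
                .asLongStream()
                .sum();
    }

    public static void main(String[] args) {
        // The common pool's size is derived from the number of cores.
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("Sum: " + parallelSum()); // 500000500000
    }
}
```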
Before using a parallel stream on any source, we should always analyse how hard it is to split that source, i.e. how costly it is to jump to the middle element of the stream.
Say, for example, we are splitting a stream built from a collection:
Array: easy to split in half (using the index)
ArrayList: easy again, using the index
HashSet/TreeSet: can be split in half with moderate effort
LinkedList: HARD to split in half, because we must traverse the first half to find the split point
In some cases it may be worth converting a LinkedList to an array first and then using a parallel stream.
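A quick sketch of that workaround (the helper name `sumViaArray` is mine): copy the LinkedList into an array-backed source once, so the splitter gets O(1) access to the middle element instead of traversing links.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class LinkedListSplit {
    // A LinkedList splits poorly, so we pay one O(n) copy up front
    // and then stream over the array, which splits in O(1) by index.
    static int sumViaArray(List<Integer> linked) {
        Integer[] arr = linked.toArray(new Integer[0]);
        return Arrays.stream(arr)
                .parallel()
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>();
        for (int i = 1; i <= 100; i++) linked.add(i);
        System.out.println(sumViaArray(linked)); // 5050
    }
}
```

Whether the copy pays off depends on the list size and the cost of the per-element work; it is worth measuring rather than assuming.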
Now let's analyse the different types of operations and how they perform in parallel streams.
Parallel Friendly Operations:
These operations perform well in parallel streams, e.g.
-> filter
-> map
-> flatMap
These operations work on each element independently, so the partial results only need to be concatenated.
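To illustrate (the class and method names here are just for the example), filter and map each look at one element at a time, so every chunk can be processed on its own and the per-chunk results simply concatenated in encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatelessOps {
    // filter and map are stateless per-element operations: each chunk
    // is filtered and mapped independently, then the chunk results are
    // concatenated, preserving the original encounter order.
    static List<Integer> evenSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(10)); // [4, 16, 36, 64, 100]
    }
}
```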
Parallel Unfriendly Operations:
These are not very efficient in parallel streams, e.g.
-> limit(n): needs to know how many elements have already been consumed, to decide whether the current element should be included or not
-> takeWhile(predicate) (Java 9): needs to know whether the predicate was already violated by a previous element
-> dropWhile(predicate) (Java 9): the mirror image of takeWhile; needs to know whether a previous element already failed the predicate
Their result depends on the past, so different chunks of elements cannot be processed independently.
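A small sketch of the limit(n) case (class and method names are illustrative): on an ordered parallel stream, limit(3) must still return the first three elements, which forces the chunks to coordinate on how many elements have been taken so far.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitCost {
    // limit(3) on an ordered parallel stream is still correct, but the
    // chunks must coordinate so that only the FIRST three elements of
    // the encounter order survive, which is cross-chunk bookkeeping.
    static List<Integer> firstThree() {
        return IntStream.rangeClosed(1, 1_000)
                .parallel()
                .limit(3)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThree()); // [1, 2, 3]
    }
}
```

If any three elements would do, relaxing the ordering (e.g. via unordered()) makes limit much cheaper, because no chunk has to wait to learn whether its elements come "first".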
Parallel Unfriendly Intermediate Operations:
Operations like sorted() and distinct() can work on different chunks of data, but the merge step requires some reprocessing.
sorted(): the merge step needs to sort the combined data from the different chunks again
distinct(): same as sorted(), merging needs some reprocessing. It can be sped up by making the stream unordered
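A sketch of that speed-up (the class and method names are mine): when we only need the set of distinct values, declaring the stream unordered lets distinct() skip the work of preserving encounter order across chunks.

```java
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnorderedDistinct {
    // Mapping i -> i % 10 produces many duplicates of the values 0..9.
    // unordered() before distinct() tells the pipeline we do not care
    // which duplicate survives, so chunks can deduplicate more freely.
    static Set<Integer> distinctMod(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i % 10)
                .boxed()
                .unordered()
                .distinct()
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(distinctMod(1_000).size()); // 10
    }
}
```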
Standard terminal operations are parallel friendly provided their functional arguments are stateless,
e.g. forEach, count, allMatch, reduce, sum, max, min
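For instance (illustrative names again), reduce with a stateless, associative accumulator combines the partial results from each chunk without any shared mutable state:

```java
import java.util.stream.IntStream;

public class StatelessReduce {
    // (a, b) -> a * b is associative and touches no external state,
    // so each chunk computes its own partial product and the partial
    // products are multiplied together at the join step.
    static int product(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(product(5)); // 120
    }
}
```

A non-associative accumulator (such as subtraction) would give different results depending on how the chunks happen to split, which is exactly why statelessness and associativity matter here.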
Collectors:
Standard collectors are parallel friendly, e.g. toList, toSet, etc.
Grouping collectors are relatively efficient, e.g. toMap, groupingBy, although merging the per-chunk maps adds some overhead.
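As a sketch of avoiding that merge overhead (the class and method names are mine): groupingBy builds one map per chunk and merges them, while groupingByConcurrent, on an unordered parallel stream, writes into a single shared ConcurrentMap instead.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class GroupingExample {
    // groupingByConcurrent accumulates all chunks into one shared
    // ConcurrentMap, skipping the per-chunk map merge that plain
    // groupingBy would perform. Counts are order-independent, so
    // giving up encounter order costs us nothing here.
    static ConcurrentMap<Boolean, Long> countByParity(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(Collectors.groupingByConcurrent(
                        i -> i % 2 == 0, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByParity(10)); // {false=5, true=5}
    }
}
```

Note that the concurrent variant does not preserve encounter order within groups, so it fits order-insensitive downstreams like counting() better than order-sensitive ones.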
So now we have an idea that different sources and different operations vary in their parallel behaviour. If parallel streams are performing poorly for you, it could be down to inappropriate use, i.e. running parallel streams over hard-to-split sources or through parallel-unfriendly operations.