Why do Java parallel streams perform poorly?
Parallel Streams task splitting


Java 8 brought us lots of cool new features such as lambda expressions, a new date/time API, streams, and functional interfaces. I have been using Java 8 for quite some time now, and one topic that usually comes up in tech meetings is when to use serial streams versus parallel streams. I have read a few articles on why parallel streams sometimes perform worse than serial streams. Let's talk about this.

Parallel streams use the Fork/Join framework, introduced in Java 7, which runs tasks on a shared thread pool sized to the number of CPU cores in the system.
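As a quick sanity check (a minimal sketch of my own, not part of the original article), you can print the number of cores alongside the parallelism of the common Fork/Join pool that backs parallel streams:

import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        // The common Fork/Join pool backs parallel streams by default.
        // Its default parallelism is roughly one worker per core
        // (the thread that submits the work also pitches in).
        System.out.println("Available cores : " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool size: " + ForkJoinPool.commonPool().getParallelism());
    }
}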

The FORK operation splits a task into smaller tasks. The smaller tasks are split again until a task cannot be split any further.

The JOIN operation collects the results of each split task and merges them to produce the final result.

Now, since this is a multi-threaded system, there is overhead in managing the threads, allocating tasks, and context switching.

Parallel streams work exactly like this; the task-splitting diagram above illustrates the flow.
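To make the FORK and JOIN steps concrete, here is a minimal sketch (my own illustration, with an arbitrary threshold) of a RecursiveTask that sums an array by repeatedly splitting it in half:

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;  // illustrative cut-off, not tuned
    private final long[] numbers;
    private final int start, end;

    public SumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {          // small enough: compute directly
            long sum = 0;
            for (int i = start; i < end; i++) sum += numbers[i];
            return sum;
        }
        int mid = (start + end) / 2;
        SumTask left = new SumTask(numbers, start, mid);
        SumTask right = new SumTask(numbers, mid, end);
        left.fork();                             // FORK: run the left half asynchronously
        return right.compute() + left.join();    // JOIN: combine the partial results
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        Arrays.fill(data, 1L);
        long total = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total);               // prints 1000000
    }
}

A parallel stream does the same kind of splitting and merging for you behind the scenes, which is exactly where the thread-management overhead comes from.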

Before using a parallel stream for any source, we should always analyse how hard it is to jump to the middle element of the stream.

Say, for example, we are splitting a stream built from a collection:

Array: easy to split in half (by index)

ArrayList: easy again, also by index

HashSet/TreeSet: can be split in half with moderate effort

LinkedList: HARD to split in half, because we must traverse the first half to find the split point

In some cases, it may be worth converting the LinkedList to an array and then using a parallel stream on the array, as the sketch below shows.
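Here is a minimal sketch of that idea (my own example, with made-up data): copy the LinkedList into an array once, then run the parallel pipeline over the array, which can be split in half by index instead of by node traversal.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.IntStream;

public class LinkedListSplit {
    public static void main(String[] args) {
        List<Integer> linked = new LinkedList<>();
        IntStream.rangeClosed(1, 1_000_000).forEach(linked::add);

        // Splitting the LinkedList itself means walking node by node to find
        // the middle; the array below can be split by index instantly.
        Integer[] asArray = linked.toArray(new Integer[0]);
        long evens = Arrays.stream(asArray)
                           .parallel()
                           .filter(n -> n % 2 == 0)
                           .count();
        System.out.println(evens); // 500000
    }
}

Whether the one-off copy pays for itself depends on the size of the list and the cost of the per-element work, so it is worth measuring rather than assuming.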

Now let's analyse the different types of operations and how they perform on parallel streams.

Parallel Friendly Operations:

These operations perform well in parallel streams, e.g.

-> filter

-> map

-> flatMap

These operations work on each element independently, so the partial results only need to be concatenated.
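A small sketch (my own example) of such a pipeline: each chunk can filter and map its own elements, and the partial lists are simply stitched together at the end.

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelFriendlyOps {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon");

        // filter and map look at one element at a time, so each chunk is
        // processed independently and the partial results are concatenated.
        List<Integer> lengths = words.parallelStream()
                                     .filter(w -> w.length() > 4)
                                     .map(String::length)
                                     .collect(Collectors.toList());
        System.out.println(lengths); // [5, 5, 5, 7]
    }
}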

Parallel Unfriendly Operations:

These are not very efficient in parallel streams, e.g.

-> limit(n): it needs to know how many elements have already been consumed to decide whether the current element should be kept or not

-> takeWhile(predicate) (Java 9): it needs to know whether the predicate was already violated by a previous element

-> dropWhile(predicate) (Java 9): same issue, it depends on whether earlier elements have already been dropped

Their result depends on what happened earlier in the stream, so different chunks of elements cannot be processed independently.
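A minimal sketch (my own example) of how encounter order makes limit(n) expensive in parallel, and how unordered() relaxes it when any n elements will do:

import java.util.stream.IntStream;

public class ParallelUnfriendlyOps {
    public static void main(String[] args) {
        // limit(10) must respect encounter order, so the chunks cannot each
        // just grab ten elements: the framework has to coordinate which
        // elements came "first", which costs buffering and synchronisation.
        long firstTen = IntStream.rangeClosed(1, 1_000_000)
                                 .parallel()
                                 .limit(10)
                                 .sum();
        System.out.println(firstTen); // 55: the first ten values in order

        // If any ten elements are acceptable, dropping the ordering
        // constraint lets the chunks contribute without that coordination.
        long anyTen = IntStream.rangeClosed(1, 1_000_000)
                               .parallel()
                               .unordered()
                               .limit(10)
                               .sum();
        System.out.println(anyTen); // sum of some ten values, not necessarily 1..10
    }
}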

Parallel Unfriendly Intermediate Operations: 

Operations like sorted() and distinct() can work on different chunks of data, but merging the chunks requires some reprocessing.

sorted(): the merge step still has to combine the sorted data from the different chunks into one ordered result

distinct(): same story, the merge step needs some reprocessing to remove duplicates across chunks. It can be sped up by making the stream unordered.
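A small sketch (my own example, with random data) of that unordering trick for distinct():

import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class StatefulIntermediateOps {
    public static void main(String[] args) {
        int[] data = ThreadLocalRandom.current().ints(1_000_000, 0, 1_000).toArray();

        // On an ordered parallel stream, distinct() must keep the first
        // occurrence of each value in encounter order, which forces extra
        // coordination when the per-chunk results are merged.
        long orderedCount = IntStream.of(data).parallel().distinct().count();

        // If we only care about the set of values, unordered() removes that
        // constraint and allows a cheaper merge strategy.
        long unorderedCount = IntStream.of(data).parallel().unordered().distinct().count();

        System.out.println(orderedCount + " " + unorderedCount); // both 1000 (almost surely)
    }
}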

Standard terminal operations are parallel friendly, provided the functional arguments passed to them are stateless.

e.g. forEach, count, allMatch, reduce, sum, max, min
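The "stateless" part matters. Here is a sketch of my own showing a stateless reduction that parallelises cleanly next to a stateful lambda that does not:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatelessTerminalOps {
    public static void main(String[] args) {
        // Stateless reduction: each chunk is reduced on its own and the
        // partial results are combined, so parallel and serial agree.
        int sum = IntStream.rangeClosed(1, 1_000).parallel().sum();
        System.out.println(sum); // 500500

        // Anti-pattern: a stateful lambda mutating shared state from forEach.
        // Several worker threads update the ArrayList at once, so the result
        // is unpredictable (missing elements, or even an exception).
        List<Integer> shared = new ArrayList<>();
        IntStream.rangeClosed(1, 1_000).parallel().forEach(shared::add); // don't do this
        System.out.println(shared.size()); // often not 1000

        // The safe version lets the stream do the accumulation for us.
        List<Integer> collected = IntStream.rangeClosed(1, 1_000)
                                           .parallel()
                                           .boxed()
                                           .collect(Collectors.toList());
        System.out.println(collected.size()); // always 1000
    }
}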

Collectors:

Standard collectors are parallel friendly: e.g. toList, toSet etc.

Grouping collectors such as toMap and groupingBy are only relatively efficient, because merging one map per chunk adds overhead; the concurrent variants (toConcurrentMap, groupingByConcurrent) help when encounter order does not matter, as in the sketch below.
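A minimal sketch (my own example) contrasting a plain collector with a concurrent grouping collector:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelCollectors {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");

        // toList combines the per-chunk containers cheaply, so it stays
        // parallel friendly.
        List<Integer> lengths = words.parallelStream()
                                     .map(String::length)
                                     .collect(Collectors.toList());

        // groupingBy would build one map per chunk and merge them; when the
        // order inside each group does not matter, groupingByConcurrent
        // writes into a single ConcurrentMap and skips the merge step.
        Map<Character, List<String>> byFirstLetter =
                words.parallelStream()
                     .collect(Collectors.groupingByConcurrent(w -> w.charAt(0)));

        System.out.println(lengths);
        System.out.println(byFirstLetter);
    }
}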

So now we have an idea that different sources and different operations vary in their parallel behaviour. If parallel streams are performing poorly for you, it could be down to inappropriate use, i.e. running parallel streams over sources that are hard to split or through parallel-unfriendly operations.
