Java Puzzle 2 - Stream API
This stream code below will surprise you
Before I set the ball rolling, let’s explain a few basics of the Stream API.
All intermediate operations of the Stream API are executed lazily. This means they need a terminal operation to trigger them: without a terminal operation, none of the intermediate operations in the pipeline would be executed. An intermediate operation performs an action on the elements of the stream and returns another stream. A terminal operation consumes the stream, meaning that once the terminal operation completes, the stream cannot be used again. Hence the terminal operation must appear last in the chain.
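A minimal sketch of both points, laziness and single use. The class name, method names and the list of numbers are my own illustration, not from the original snippet; an AtomicInteger simply counts how often peek() runs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazinessDemo {

    // Returns how many times peek() ran before and after a terminal operation.
    static int[] peekCallsBeforeAndAfterTerminal(List<Integer> numbers) {
        AtomicInteger peekCalls = new AtomicInteger();

        // Building the pipeline only records the operations; nothing runs yet.
        Stream<Integer> pipeline = numbers.stream()
                .map(n -> n * n)
                .peek(n -> peekCalls.incrementAndGet());
        int before = peekCalls.get();

        // forEach is a terminal operation that must visit every element,
        // so the intermediate operations now execute.
        pipeline.forEach(n -> { });
        int after = peekCalls.get();

        return new int[] { before, after };
    }

    // A stream is consumed by its terminal operation and cannot be reused.
    static boolean reuseFails(List<Integer> numbers) {
        Stream<Integer> stream = numbers.stream();
        stream.forEach(n -> { });
        try {
            stream.forEach(n -> { });
            return false;
        } catch (IllegalStateException e) {
            return true; // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        int[] calls = peekCallsBeforeAndAfterTerminal(numbers);
        System.out.println("peek calls before terminal op: " + calls[0]); // 0
        System.out.println("peek calls after terminal op:  " + calls[1]); // 5
        System.out.println("reusing the stream fails: " + reuseFails(numbers)); // true
    }
}
```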
From the code snippet above, limit(4) is an intermediate operation that returns a stream allowing only the first 4 elements to pass down the pipeline. map() is another intermediate operation that transforms each element, possibly into a different type. peek() is yet another intermediate operation, intended mainly to support debugging, where you want to see elements as they flow past a certain point in a pipeline. count() is one of the many terminal operations in the Stream API. Having said that, the call to count() at the end of each pipeline (lines 11-15 and 20-23) should trigger the execution of all intermediate operations in that pipeline, peek() included. This implies that peek() should run and print the elements of the stream using the method reference specified. So at this point we feel sure that the elements will be printed by peek() in both streams.
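The original snippet is not reproduced in this text, so here is a hypothetical reconstruction of the two pipelines, assuming a List<Integer> source holding 1 to 10 and System.out::println as the peek() action. The class name, method names and data are my own assumptions. Only the returned counts are guaranteed; whether peek() prints anything is left to the implementation.

```java
import java.util.Arrays;
import java.util.List;

public class PeekCountPuzzle {

    // Hypothetical source list; the article's actual data is not shown.
    static final List<Integer> NUMBERS = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    // First pipeline: keep 4 elements, square them, peek, then count.
    static long firstPipeline(List<Integer> numbers) {
        return numbers.stream()
                .limit(4)
                .map(n -> n * n)
                .peek(System.out::println) // may be skipped by count(); implementation-dependent
                .count();
    }

    // Second pipeline: square every element, peek, then count.
    static long secondPipeline(List<Integer> numbers) {
        return numbers.stream()
                .map(n -> n * n)
                .peek(System.out::println) // may be skipped by count(); implementation-dependent
                .count();
    }

    public static void main(String[] args) {
        System.out.println("first count: " + firstPipeline(NUMBERS));   // 4
        System.out.println("second count: " + secondPipeline(NUMBERS)); // 10
    }
}
```

Running main() on your own JDK shows which of the two peek() calls actually print.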
Let’s see the output.
The first stream pipeline (lines 11-15)
It begins with limit(4), which truncates the stream to its first 4 elements. Next, map() squares the numbers. peek() then applies the specified action to each of the 4 elements using a method reference, which in this case just prints them. That’s fine and no rocket science.
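To see limit(4), map() and peek() behave exactly as described, here is a sketch that swaps count() for collect(), a terminal operation that needs every element and therefore always runs the pipeline. The source list, the class name and the `peeked` list that records what peek() observes are my own illustrative choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LimitMapPeekDemo {

    // Returns the squares of the first four elements and records every value peek() sees.
    static List<Integer> firstFourSquares(List<Integer> numbers, List<Integer> peeked) {
        return numbers.stream()
                .limit(4)                       // keep only the first 4 elements
                .map(n -> n * n)                // square each of them
                .peek(peeked::add)              // observe each squared value as it flows past
                .collect(Collectors.toList());  // collect() needs the elements, so the pipeline runs
    }

    public static void main(String[] args) {
        List<Integer> peeked = new ArrayList<>();
        List<Integer> squares =
                firstFourSquares(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), peeked);
        System.out.println("collected: " + squares); // [1, 4, 9, 16]
        System.out.println("peeked:    " + peeked);  // [1, 4, 9, 16]
    }
}
```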
The second stream pipeline (lines 20-23)
It does not begin by limiting the elements of the stream. It first calls map(), which squares the numbers, and then peek() executes the specified action on each element, which again just prints it. That’s fine and no rocket science either. But wait a minute! The second pipeline is the one misbehaving: it does not print the elements at all, as we expected it to. What the heck is happening here?
Please pause for a moment and run the code to observe the results yourself. You should observe that the outputs of the two streams differ (the exact behavior depends on your JDK version)!
Please sit down and relax. Let’s decipher this mystery.
But before that, let me ask you this question:
What difference does limit(4) make to the pipeline that changes the output? My understanding is that limit(4) is just another intermediate operation that truncates the stream to 4 elements, so in my opinion it should not be the thing that affects the output. In fact, what I would expect is that the first stream prints 4 elements, owing to the limit(4) call in the pipeline, and the second stream prints all the elements of the stream. But that is not happening!
Note that some intermediate operations have no effect on the result of the terminal operation count(). This concept is paramount to understanding what is happening here. Let me ask you another question: across both streams, which operation is certain to affect the count of elements? You will agree with me that it’s limit(4), right?
Because limit(4) reduces the elements in the stream pipeline to 4, it certainly affects the count. Having established that, you should also agree that map() and peek() do not affect the output of count(): map() turns each element into exactly one new element, and peek() merely observes each element, so neither changes how many elements flow through. Printing is a side effect of peek(), but it has no bearing on the count, so any work these two operations perform is of no use to count(). It’s like Man United winning matches in the Carabao Cup: that has no impact on their points tally in the EPL. I hope this analogy helps you understand the point I am trying to illustrate. Well, I am not insinuating that Arsenal will win the EPL this season after their sensational, consistent performance. I know that at Old Trafford I would be persecuted for that, as it is a serious offence that undermines their title hopes! Excuse me for veering off the lane.
So, now, back to the point.
Having identified the operations that do not affect the count, we can all agree that a smart implementation of the Stream interface need not execute those intermediate operations, since running them would only slow down the calculation of the result. Ideally, it should not execute map() and peek() when it is not essential to do so, and that is exactly what we see in the second stream pipeline. In short, if an implementation is capable of computing the count directly from the source, after determining that the operations in the pipeline have no impact on the result wanted by the terminal operation count(), it is at liberty to do so. But you may also argue that map() and peek() in the first pipeline (lines 11-15) do not affect the result either, right? Which means they should not be triggered there either. But we do agree that limit() changes the count, so it must be taken into account, which makes that determination harder. So the prerogative here lies with the underlying implementation to choose what to do. If you have really good platform developers, they can write the more complex code needed to make such determinations.
Before I draw the curtain on this article, let’s quickly review what the JavaDoc for count() says:
"An implementation may choose to not execute the stream pipeline (either sequentially or in parallel) if it is capable of computing the count directly from the stream source. In such cases no source elements will be traversed and no intermediate operations will be evaluated. Behavioral parameters with side-effects, which are strongly discouraged except for harmless cases such as debugging, may be affected. [...]
The number of elements covered by the stream source, a List, is known and the intermediate operation, peek, does not inject into or remove elements from the stream (as may be the case for flatMap or filter operations). Thus the count is the size of the List and there is no need to execute the pipeline and, as a side-effect, print out the list elements."
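A sketch modeled on the example in the count() JavaDoc, a List source of four strings. The first method leaves the implementation free to skip peek(); the second adds a filter() that keeps every element, which, as the JavaDoc hints, means the size is no longer known up front, so count() has to traverse the pipeline. The class and method names are my own, and counting peek() calls with an AtomicInteger is just one way to observe whether the pipeline ran.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CountShortcutDemo {

    // peek() may be skipped: the List size is known and peek does not change it.
    static int peekCallsWithoutFilter(List<String> letters) {
        AtomicInteger peekCalls = new AtomicInteger();
        long count = letters.stream()
                .peek(s -> peekCalls.incrementAndGet())
                .count();
        System.out.println("count = " + count);
        return peekCalls.get(); // 0 or letters.size(), depending on the JDK
    }

    // filter() may remove elements, so the size is no longer known in advance
    // and count() must traverse the pipeline, running peek() for every element.
    static int peekCallsWithFilter(List<String> letters) {
        AtomicInteger peekCalls = new AtomicInteger();
        long count = letters.stream()
                .filter(s -> true)
                .peek(s -> peekCalls.incrementAndGet())
                .count();
        System.out.println("count = " + count);
        return peekCalls.get();
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("A", "B", "C", "D");
        System.out.println("peek calls without filter: " + peekCallsWithoutFilter(letters));
        System.out.println("peek calls with filter:    " + peekCallsWithFilter(letters));
    }
}
```

The takeaway is not to rely on peek() for side effects you actually need; a terminal operation such as forEach() or collect() has to visit the elements.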
I talk more about Streams on my channel, and about how peek() works specifically in the video below. Please kindly subscribe to the channel if you benefited from this article. That's the best way you can support me. Thank you in advance for your support.
https://www.youtube.com/watch?v=Mqp7GFcCe_c&list=PLUI8bjgSqiUtepa5IlPD0apYHlCUhlVgY&index=64