java - Stream.skip behavior with unordered terminal operation -
i've read this , this questions, still doubt whether observed behavior of stream.skip intended jdk authors.
let's have simple input of numbers 1..20:
list<integer> input = intstream.rangeclosed(1, 20).boxed().collect(collectors.tolist()); now let's create parallel stream, combine unordered() skip() in different ways , collect result:
system.out.println("skip-skip-unordered-tolist: " + input.parallelstream().filter(x -> x > 0) .skip(1) .skip(1) .unordered() .collect(collectors.tolist())); system.out.println("skip-unordered-skip-tolist: " + input.parallelstream().filter(x -> x > 0) .skip(1) .unordered() .skip(1) .collect(collectors.tolist())); system.out.println("unordered-skip-skip-tolist: " + input.parallelstream().filter(x -> x > 0) .unordered() .skip(1) .skip(1) .collect(collectors.tolist())); filtering step nothing here, adds more difficulty stream engine: not know exact size of output, optimizations turned off. have following results:
skip-skip-unordered-tolist: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] // absent values: 1, 2 skip-unordered-skip-tolist: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20] // absent values: 1, 15 unordered-skip-skip-tolist: [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20] // absent values: 7, 18 the results fine, works expected. in first case asked skip first 2 elements, collect list in no particular order. in second case asked skip first element, turn unordered , skip 1 more element (i don't care one). in third case turned unordered mode first, skip 2 arbitrary elements.
let's skip 1 element , collect custom collection in unordered mode. our custom collection hashset:
system.out.println("skip-tocollection: " + input.parallelstream().filter(x -> x > 0) .skip(1) .unordered() .collect(collectors.tocollection(hashset::new))); the output satisfactory:
skip-tocollection: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20] // 1 skipped so in general expect long stream ordered, skip() skips first elements, otherwise skips arbitrary ones.
however let's use equivalent unordered terminal operation collect(collectors.toset()):
system.out.println("skip-toset: " + input.parallelstream().filter(x -> x > 0) .skip(1) .unordered() .collect(collectors.toset())); now output is:
skip-toset: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20] // 13 skipped the same result can achieved other unordered terminal operation (like foreach, findany, anymatch, etc.). removing unordered() step in case changes nothing. seems while unordered() step correctly makes stream unordered starting current operation, unordered terminal operation makes whole stream unordered starting beginning despite can affect result if skip() used. seems misleading me: expect using unordered collector same turning stream unordered mode just before terminal operation , using equivalent ordered collector.
so questions are:
- is behavior intended or it's bug?
- if yes documented somewhere? i've read stream.skip() documentation: not unordered terminal operations. characteristics.unordered documentation not comprehend , not ordering lost whole stream. finally, ordering section in package summary not cover case either. i'm missing something?
- if it's intended unordered terminal operation makes whole stream unordered, why
unordered()step makes unordered since point? can rely on behavior? or lucky first tests work nicely?
recall goal of stream flags (ordered, sorted, sized, distinct) enable operations avoid doing unnecessary work. examples of optimizations involve stream flags are:
- if know stream sorted,
sorted()no-op; - if know size of stream, can pre-allocate correct-sized array in
toarray(), avoiding copy; - if know input has no meaningful encounter order, need not take steps preserve encounter order.
each stage of pipeline has set of stream flags. intermediate operations can inject, preserve, or clear stream flags. example, filtering preserves sorted-ness / distinct-ness not sized-ness; mapping preserves sized-ness not sorted-ness or distinct-ness. sorting injects sorted-ness. treatment of flags intermediate operations straightforward, because decisions local.
the treatment of flags terminal operations more subtle. ordered relevant flag terminal ops. , if terminal op unordered, back-propagate unordered-ness.
why this? well, consider pipeline:
set.stream() .sorted() .foreach(system.out::println); since foreach not constrained operate in order, work of sorting list wasted effort. back-propagate information (until hit short-circuiting operation, such limit), not lose optimization opportunity. similarly, can use optimized implementation of distinct on unordered streams.
is behavior intended or it's bug?
yes :) back-propagation intended, useful optimization should not produce incorrect results. however, bug part propagating past previous skip, shouldn't. back-propagation of unordered flag overly aggressive, , that's bug. we'll post bug.
if yes documented somewhere?
it should implementation detail; if correctly implemented, wouldn't notice (except streams faster.)
Comments
Post a Comment