-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tuple.collectors() should be able to share state #344
Comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R1> sf1 could be declared as a sub-class of BiFunction, like: @FunctionalInterface
public interface Tuple2ListMapping<T,U,R>
extends BiFunction<List<Tuple2<T, U>>, Function<? super T, ? extends U>, ? extends R> {} allowing much shorter column declarations, like Tuple2ListMapping<T,U,R> sf1 I don't think that column source sharing needs to be complex, because id binding is simpler and is probably safe than introspection guesses:
Something like this may work (not tested yet): public class IdCollector<T,A,R,RR> implements Collector<T,A,RR> {
final Collector<T,A,R> baseCollector;
final Function<R,RR> primaryFinisher;
private R r; // baseCollector.finisher().apply(a) result.
public IdCollector(final Collector<T, A, R> baseCollector, final Function<R, RR> primaryFinisher) {
this.baseCollector= baseCollector;
this.primaryFinisher= primaryFinisher;
}
// Assumes that created secondary collector finisher() will always be called after primary collector finisher()
public Collector<T,A,RR> secondary(final Function<R,RR> secondaryFinisher) {
return Collector.of(
()->null,
(a,t) -> {},
(a1,a2) -> null,
a -> secondaryFinisher.apply(r));
}
@Override
public Supplier<A> supplier() {
return baseCollector.supplier();
}
@Override
public BiConsumer<A, T> accumulator() {
return baseCollector.accumulator();
}
@Override
public BinaryOperator<A> combiner() {
return baseCollector.combiner();
}
@Override
public Function<A, RR> finisher() {
return baseCollector.finisher()
.andThen(r -> primaryFinisher.apply(this.r = r));
}
@Override
public Set<Characteristics> characteristics() {
return baseCollector.characteristics();
}
} |
For *By columns this would be most efficient if the T to U keyExtractor function and keyComparator are both reference equal for all columns, to allow sharing a collected, the pre-sorted List, sorted by the same U values, otherwise the list would need to be copied and resorted for each column. It looks more memory efficient to base the *By functionality off a list of the T value, then using keyComparator with the keyExtractor e.g via this method in Comparator (Java 1.8+) public static <T, U> Comparator<T> comparing(
Function<? super T, ? extends U> keyExtractor,
Comparator<? super U> keyComparator) Likewise if the List collector based sourced columns Comparator must be reference equal for all columns, otherwise the list would need to be copied and resorted for each column. |
See my latest Sharable Collector code, with support for reference and mapped id based results sharing at https://github.com/rwperrott/sharable-collector |
@rwperrott I appreciate your efforts, thank you very much. To make sure expectations are set right, I currently cannot invest much time in jOOL. This includes reviewing your various messages and code (as this is by no means a trivial task). At some point in the future, I'll be able to allocate some dedicate time to see what's currently open and missing, but I cannot make any promises. |
@lukaseder Thanks. I expected you were busy. given prior feedback. I also developed this speculative solution because it's a fun exercise and I think it'll be useful for my projects too. |
The current
Tuple.collectors()
implementation is a canonical one, where collectors are completely independent of one another. It would be better if they could share state in case they're related. For example, when calculating a set of percentiles:It would be great if the 5 collectors could somehow share their internal state, which is a sorted list of all values in the stream. Executing 1 sort instead of 5 is definitely preferrable.
This can be implemented in 2 ways:
The text was updated successfully, but these errors were encountered: