Support for Intersection of multiple Sets at the same time #2023
It might be cleaner to turn
While that does seem pretty nice from an API perspective, iterators are meant to be iterated over only once. Given an arbitrary iterator, we'd need to create a new iterator with a buffer into which we collect all items, just to be able to perform a nested loop join. This can be heavily optimized with specialization for sets, but it is very heavy and slow for most other cases. So, in my opinion, it does not fit the nearly zero-cost iterator operations we currently have.
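A minimal sketch of the buffering problem described above, using a hypothetical helper `intersect_iters` (not a std API). Because `other` can only be traversed once, it has to be collected into a `HashSet` before any element of `items` can be checked against it. The sketch swaps the nested loop join for a hash lookup, but the buffer is still unavoidable.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper (not a std API): keep the elements of `items` that also
/// occur in `other`, where both are arbitrary, single-pass iterables.
fn intersect_iters<T, A, B>(items: A, other: B) -> Vec<T>
where
    T: Eq + Hash,
    A: IntoIterator<Item = T>,
    B: IntoIterator<Item = T>,
{
    // `other` can only be walked once, so it must first be drained into a
    // buffer that supports repeated membership tests. This allocation is the
    // cost a general iterator-level intersection cannot avoid.
    let buffer: HashSet<T> = other.into_iter().collect();
    // Duplicates in `items` are kept; only membership in `other` is checked.
    items.into_iter().filter(|x| buffer.contains(x)).collect()
}

fn main() {
    let evens = (0..20).filter(|n| n % 2 == 0);
    let threes = (0..20).filter(|n| n % 3 == 0);
    println!("{:?}", intersect_iters(evens, threes)); // [0, 6, 12, 18]
}
```

When both inputs are already sets, this `collect` into `buffer` is pure overhead, which is why specializing for sets would matter.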
Intersection speed can be optimized by using the same Add
Just FYI, this speedy intersection is completely trivial for Cuckoo tables, and
I guess currently the way to do it is with:

```rust
use std::collections::HashSet;

fn main() {
    let mut sets: Vec<HashSet<char>> = Vec::new();
    sets.push(['a', 'b', 'c', 'd'].iter().cloned().collect());
    sets.push(['c', 'd'].iter().cloned().collect());
    sets.push(['d', 'a'].iter().cloned().collect());

    // Union: fold every set into an accumulator that starts out empty.
    let union = sets
        .iter()
        .fold(HashSet::new(), |acc, hs| {
            acc.union(hs).cloned().collect()
        });

    // Intersection: start from a clone of the first set (panics if `sets`
    // is empty) and intersect it with each remaining set in turn.
    let intersection = sets
        .iter()
        .skip(1)
        .fold(sets[0].clone(), |acc, hs| {
            acc.intersection(hs).cloned().collect()
        });

    println!("Union: {:?}", union);
    println!("Intersection: {:?}", intersection);
}
```

Or is there a better way?
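One possible answer to the question above, assuming the goal is just to avoid allocating a fresh set at every fold step: clone the first set once and shrink it in place with `HashSet::retain`. The name `intersect_all` is only illustrative.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Illustrative helper: intersection of all `sets`. Clones the first set once
/// and then removes, in place, every element missing from a later set.
/// Returns an empty set when `sets` is empty.
fn intersect_all<T: Eq + Hash + Clone>(sets: &[HashSet<T>]) -> HashSet<T> {
    let (first, rest) = match sets.split_first() {
        Some(split) => split,
        None => return HashSet::new(),
    };
    let mut acc = first.clone();
    for set in rest {
        acc.retain(|x| set.contains(x));
        if acc.is_empty() {
            break; // nothing can be added back, so stop early
        }
    }
    acc
}

fn main() {
    let sets: Vec<HashSet<char>> = vec![
        ['a', 'b', 'c', 'd'].iter().cloned().collect(),
        ['c', 'd'].iter().cloned().collect(),
        ['d', 'a'].iter().cloned().collect(),
    ];
    println!("Intersection: {:?}", intersect_all(&sets)); // Intersection: {'d'}
}
```

Starting from the smallest set instead of `sets[0]` would cut the number of `retain` checks further; that is left out to keep the sketch short.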
@Stannislav heh, doing Advent of Code? 😛 This is what I did:

```rust
let mut lines = group
    .lines()
    .map(|line| line.chars().collect::<HashSet<_>>());
let mut s = lines.next().unwrap();
for line in lines {
    s = s.intersection(&line).copied().collect();
}
```

I agree it's ugly though. I think the unstable
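On current stable Rust, one way to drop the `unwrap()` and the mutable loop is `Iterator::reduce` (stabilized in Rust 1.51) together with the `&` operator, which std implements for `&HashSet` as set intersection. A minimal sketch; the helper name `common_chars` is chosen here for illustration:

```rust
use std::collections::HashSet;

/// Characters that appear on every line of `group`, or `None` if `group`
/// has no lines. `&a & &b` builds a new set holding the intersection.
fn common_chars(group: &str) -> Option<HashSet<char>> {
    group
        .lines()
        .map(|line| line.chars().collect::<HashSet<char>>())
        .reduce(|acc, set| &acc & &set)
}

fn main() {
    let group = "abcd\ncd\nda";
    println!("{:?}", common_chars(group)); // Some({'d'})
}
```

This still allocates one new set per line, so it improves readability rather than performance.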
FWIW, I think what we're missing is iterator adaptors that work on already-sorted data. It should be possible to implement efficient N-way intersection on sorted iterators without introducing a HashSet at all.
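As a rough illustration of that idea, N-way intersection of sorted, duplicate-free iterators can be done by repeatedly advancing every input up to the largest current head. The helper name `intersect_sorted` is hypothetical, and this lockstep strategy is just one simple option, not an existing std adaptor.

```rust
use std::collections::BTreeSet;
use std::iter::Peekable;

/// Hypothetical helper (not a std API): N-way intersection of iterators that
/// each yield strictly ascending values, e.g. `BTreeSet::iter()`. It walks the
/// inputs in lockstep and never builds an intermediate set.
fn intersect_sorted<T, I>(inputs: Vec<I>) -> Vec<T>
where
    T: Ord + Clone,
    I: Iterator<Item = T>,
{
    let mut iters: Vec<Peekable<I>> = inputs.into_iter().map(|it| it.peekable()).collect();
    let mut out = Vec::new();
    if iters.is_empty() {
        return out;
    }
    loop {
        // The largest current head is the smallest value that could still be
        // in every input. If any input is exhausted, no further match exists.
        let mut target: Option<T> = None;
        for it in iters.iter_mut() {
            match it.peek() {
                None => return out,
                Some(v) => {
                    if target.as_ref().map_or(true, |t| v > t) {
                        target = Some(v.clone());
                    }
                }
            }
        }
        let target = target.expect("at least one input");
        // Skip every input forward to `target` and count how many contain it.
        let mut hits = 0;
        for it in iters.iter_mut() {
            while it.peek().map_or(false, |v| *v < target) {
                it.next();
            }
            if it.peek() == Some(&target) {
                hits += 1;
            }
        }
        // All inputs agree: emit `target` and step every input past it.
        if hits == iters.len() {
            for it in iters.iter_mut() {
                it.next();
            }
            out.push(target);
        }
    }
}

fn main() {
    let a: BTreeSet<u32> = [1, 3, 5, 7, 9].iter().copied().collect();
    let b: BTreeSet<u32> = [3, 4, 5, 9, 10].iter().copied().collect();
    let c: BTreeSet<u32> = [0, 3, 9, 12].iter().copied().collect();
    let common = intersect_sorted(vec![a.iter(), b.iter(), c.iter()]);
    println!("{:?}", common); // [3, 9]
}
```

Each input is only ever moved forward, and no `HashSet` is allocated; the trade-off is that the inputs must already be sorted.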
Ha, got me 🎄! Here's my solution. Anyway, regarding the actual topic, I saw there's an external Edit: just found the
…g a fold is noticeably slower than maintaining a mutable map of seen values. See also rust-lang/rfcs#2023
Currently, the `intersection` method of `HashSet` and `BTreeSet` only supports an intersection between two sets. If you want to intersect multiple sets, for example three, you need to compute the intersection of the first two, collect it into another set, and then compute the intersection of that intermediate result with the third set. This is costly in computation, memory usage, and lines of code.
I'd like to suggest the following:

1. Change the `Intersection` structs to contain a list of other sets to intersect with, instead of a reference to a single other set. This applies to `std::collections::btree_set::Intersection` as well as `std::collections::hash_set::Intersection`.
2. Add `.intersect_many` (or `intersect_multiple`) on `HashSet` and `BTreeSet`, which takes a list of other sets to intersect with the current one.
3. Add `intersect` and `intersect_many` (or `intersect_multiple`) to the `Intersection` structs as well. These can only be called if the `Intersection` hasn't been iterated over yet; otherwise, they panic.

If (3.) is implemented, the "list of sets to intersect with" will need to be a `Vec` so that it can grow after creation. For performance reasons, the third suggestion could be dropped, and the "list" could instead be a reference to the provided slice.

The current implementation of `{Hash,BTree}Set::intersection` would need to be changed to pass `&[other]` instead of `other`.

While this request should be relatively easy to implement for `HashSet` (`self.others.iter().all(|set| set.contains(elt))`), implementing it for `BTreeSet` could require considerably more code.

Unresolved Questions:

- `intersect_many` vs `intersect_multiple` vs ???
- … `Intersection` struct) in favor of using an allocation-free slice reference instead of a `Vec`?
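To make suggestions (1.) and (2.) and the slice-reference variant from the last unresolved question more concrete, here is a rough sketch written outside of std. `MultiIntersection` and the `IntersectMany` trait are hypothetical names. The element test is the `self.others.iter().all(|set| set.contains(elt))` check from the proposal, and nothing here reflects how std's `hash_set::Intersection` is actually implemented.

```rust
use std::collections::hash_set;
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical analogue of `hash_set::Intersection` that checks each element
/// of `iter` against a borrowed slice of other sets. Because it only borrows
/// the slice, it needs no allocation, but it also cannot grow later.
struct MultiIntersection<'a, T> {
    iter: hash_set::Iter<'a, T>,
    others: &'a [&'a HashSet<T>],
}

impl<'a, T: Eq + Hash> Iterator for MultiIntersection<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        // Copy the slice reference out so the closure does not borrow `self`
        // while `self.iter` is being advanced.
        let others = self.others;
        // Keep `elt` only if every other set contains it.
        self.iter.find(|elt| others.iter().all(|set| set.contains(*elt)))
    }
}

/// Hypothetical extension trait; `intersect_many` is one of the names floated
/// in the issue, not an existing std method.
trait IntersectMany<T> {
    fn intersect_many<'a>(&'a self, others: &'a [&'a HashSet<T>]) -> MultiIntersection<'a, T>;
}

impl<T: Eq + Hash> IntersectMany<T> for HashSet<T> {
    fn intersect_many<'a>(&'a self, others: &'a [&'a HashSet<T>]) -> MultiIntersection<'a, T> {
        MultiIntersection { iter: self.iter(), others }
    }
}

fn main() {
    let a: HashSet<char> = "abcd".chars().collect();
    let b: HashSet<char> = "cd".chars().collect();
    let c: HashSet<char> = "da".chars().collect();
    let others = [&b, &c];
    let common: HashSet<&char> = a.intersect_many(&others).collect();
    println!("{:?}", common); // {'d'}
}
```

With a single other set, `a.intersect_many(&[&b])` yields the same elements as `a.intersection(&b)`, in line with the idea of having the existing `intersection` pass `&[other]`.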