-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cartesian power. #486
base: master
Are you sure you want to change the base?
Cartesian power. #486
Conversation
Some quick answers:
Yes, please! The smaller the PR, the better, since I need to review every changed line before merging. We're really not that particular about formatting. Just indent with four spaces, and I'm sure it'll be fine.
You don't need to add anything! I'll need a little more time before I can answer your other questions. |
Hi there. Foremost, thank you for your upfront effort to clarify the modalities. I hope I do not miss anything, but what is the difference to If it is supposed to do the same, you still might look into the existing implementation, and e.g. see if it has a decent |
@jswrenn Okay, I'll reverse that so it'll be easier for you to read. Don't get tired scrolling meanwhile :) @phimuemue The difference is order, |
Thanks - Too early to read code... |
@jswrenn I have reversed the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, thanks for the ideas in this PR.
When we do this, we should possibly ask ourselves inhowfar Power
is similar to CombinationsWithReplacement
. (To be honest, it seems quite similar to me, so I would expect similar inner workings, too. As always, other thought-out opinions are welcome.)
- Should we go with
Vec
or withLazyBuffer
? - Should we have a
Vec<I>
, iterating on demand to get the next element, or should we go with the index-based approach like inCombinationsWithReplacement
?
src/adaptors/mod.rs
Outdated
Some(res) | ||
} | ||
// Check trivial case last as it should be less frequent. | ||
Power::Empty => None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pow==0
/Power::Empty
, shouldn't the iterator should yield a single element (namely the empty vector)? (Instead of yielding no elements at all.)
This would ensure that it.cartesian_power(n)
yields it.count().pow(n)
elements. (We had a similar case back then where (I think it was) combinations had a special case for n==0
, which lead to inconsitencies.)
It would possibly also help in getting rid of some special casings (i.e. possible eliminate Power
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phimuemue Well, I had this vague feeling I was neglicting something sneaky here. Thank you for pointing it out.
The sneaky stuff is that cartesian_product("abc", 0)
should not yield the same as cartesian_product("", 3)
. And I bet that your criterion it.count().pow(n)
is the right thing to consider. If we agree that:
2 | 1 | 0 | |
---|---|---|---|
2 | 2^2=4 | 2^1=2 | 2^0=1 |
1 | 1^2=1 | 1^1=1 | 1^0=1 |
0 | 0^2=0 | 0^1=0 | 0^0=1? |
Then the adaptor should yield:
2 | 1 | 0 | |
---|---|---|---|
"ab" | ["aa", "ab", "ba", "bb"] |
["a", "b"] |
[""] |
"a" | ["aa"] |
["a"] |
[""] |
"" | [] |
[] |
[""] ? |
So there is not 1 degenerated case, but 2 or 3, depending on the convention we pick for the 0^0
situation. I propose we stick to the "combinatorics" convention that 0^0=1
, so there are only 2 corner cases. The only other solution I can think of is erroring or panicking instead when encountering cartesian_product("", 0)
.
I'll update the PR to better accomodate these degenerated situations :)
[from the future] see 9027ef0
Here's another way to look at it: the |
@phimuemue I think it is similar in its signature, because they both yield iterables whose length is eventually decided by the user. I've tried to inspire from the This said, I doubt it implies that the inner workings should resemble each other. I could think of a very inefficient implementation of |
I'm not exactly sure what you mean, but this is sure something I would like to improve. My most flexible idea so far would be to make the user provide any The other option I like is what @jswrenn suggests:
If you mean that the user would do something like |
If we want the type-based approach, I could imagine it is possible to do it right now - with an upper bound on the array length. However, the type-based approach would prevent us from specifying The following probably only applies if we go with the dynamic/runtime approach:
Moreover, Please note that the existing implementations may not be better than your suggestion. All I advocate for is uniform behavior. Still, the existing implementation at least avoids |
@phimuemue Okay, I'll have a deeper look into I agree about the runtime
I disagree, because Regarding runtime pow valuation, I'm okay to consider fixed-sized arrays up to 32 (or maybe more?). I also agree that uniform behaviour is a desirable thing. Before I understand
For instance, I can think of an alternate implementation of |
These are good questions indeed. I do not know if we have specific recommendations. I guess we are usually open to discuss the various tradeoffs - if we do not have any precedence. If we, however, have precedence, I strongly tend to follow existing patterns. It enforces uniformity, and builds upon possibly good decisions already made by others. If an existing design turns out to be bad, I tend to first fix that, and then build the new thing. In various past situations this technique simplified the code base quite a bit. In this case, and if we go with the runtime approach, this might even mean looking at But maybe wait for @jswrenn's opinion on all this. |
@jswrenn Friendly ping. I suspect you're maybe busy with safe transmute or other stuff. Feel free to ignore me if it's the case, there's no rush here :) |
How would this implementation differ/be better than something like this: use std::fmt::Debug;
use itertools::Itertools;
fn cartesian_power<const N: usize, T, I>(iterator: I) -> impl Iterator<Item = [T; N]>
where
T: Debug + Clone,
I: Iterator<Item = T> + Clone,
{
(0..N)
.map(|_i| iterator.clone())
.multi_cartesian_product()
.map(|vec| TryInto::<[T; N]>::try_into(vec).unwrap())
}
macro_rules! cartesian_power {
($iterator:expr, $count:expr) => {
$crate::cartesian_power::<$count, _, _>($iterator)
};
}
fn main() {
for [x, y, z] in cartesian_power::<3, _, _>(0..4) {
println!("({x},{y},{z})");
}
for [a, b, c, d, e] in cartesian_power!(0..4, 5) {
println!("({a},{b},{c},{d},{e})");
}
} |
@DusterTheFirst I guess it would differ in one or several of the tradeoffs listed here, but IIRC we're still waiting on @jswrenn input here, are we? There was questionning about the (existence of a) general implementation policy in |
Hi @iago-lito , I like the idea of having a cartesian power. But for this to be included, I think it should be somehow more performant than |
17271f3
to
8957f70
Compare
009b594
to
8957f70
Compare
5b51535
to
47f44a6
Compare
6273611
to
8631037
Compare
ac30a04
to
bed8bdf
Compare
30cbc0d
to
b68611d
Compare
b68611d
to
3efe36c
Compare
a81cfcd
to
e3ded56
Compare
e3ded56
to
014b99e
Compare
014b99e
to
ba667dc
Compare
ba667dc
to
2d52e1a
Compare
2d52e1a
to
aa75f69
Compare
Hi @jswrenn, @phimuemue @Philippe-Cholet. It's been long due, but I'm finally happy with my implementation, and I feel ready for review :) Due to the non-streaming nature of This PR now features efficient Happy to hear your feedback! |
360f14b
to
1cfbd3a
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #486 +/- ##
==========================================
- Coverage 94.38% 94.36% -0.03%
==========================================
Files 48 50 +2
Lines 6665 7047 +382
==========================================
+ Hits 6291 6650 +359
- Misses 374 397 +23 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there, wow, thanks for your continued efforts.
Aside from failing CI, I think this (huge) PR has issues when it comes to merging:
- The PR is hard for me to review - probably harder than necessary. Please strip the PR down to implementing an iterator supporting
next
. Other parts can be built on top of it later. - Please make the iterator non-cycling - users do not expect it.
Combinations
andCombinationsWithReplacement
both are more or less this:Please adopt this pattern.indices: Box<[usize]>, pool: LazyBuffer<I>, first: bool,
- Please do not unnecessarily introduce functions. (In particular, keep - for now -
increment_indices
withinnext
.) (I know functions have their merits, but these here are non-trivially manipulating state and it's easier to keep track of what's going on when they are local.) - The tests contain lots of infrastructure - can we go without it? Maybe exploit python's
itertools
and have it generate some test cases? Or comparingcartesian_power
against plain nested loops?
tl;dr: If we should merge this, I'd prefer smaller, easily reviewable and verifiable PRs, and I want to re-use established patterns.
As always, I'm more than happy to accept substantiated technical arguments, but I'm wary that we want to maintain it in its current state.
/// assert_eq!(it.next(), Some(vec!['a', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'b'])); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'c'])); // Underlying a"abc".chars()` consumed. | ||
/// assert_eq!(it.next(), Some(vec!['b', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['b', 'b'])); | ||
/// assert_eq!(it.next(), Some(vec!['b', 'c'])); | ||
/// assert_eq!(it.next(), Some(vec!['c', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['c', 'b'])); | ||
/// assert_eq!(it.next(), Some(vec!['c', 'c'])); | ||
/// assert_eq!(it.next(), None); // Cartesian product exhausted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we shorten this? Something like:
assert_eq!("abc".cartesian_power(2), vec![aa, ab, ac, ba, bb, bc, ... ])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, my point was to illustrate the cycling, but we'd better not document it.
/// you may try the underlying streaming | ||
/// [`CartesianPowerIndices`] instead. | ||
/// | ||
/// The resulting iterator is "cycling", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cycling-by-default is not in line with what we usually do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that it does make sense in the case of cartesian power and also flows rather naturally from the imlementation. But maybe we should hide it?
/// let mut it = "ab".chars().cartesian_power(3); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'a', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'a', 'b'])); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'b', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'b', 'b'])); | ||
/// assert_eq!(it.next(), Some(vec!['b', 'a', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['b', 'a', 'b'])); | ||
/// assert_eq!(it.next(), Some(vec!['b', 'b', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['b', 'b', 'b'])); | ||
/// assert_eq!(it.next(), None); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'a', 'a'])); | ||
/// assert_eq!(it.next(), Some(vec!['a', 'a', 'b'])); | ||
/// // ... | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we shorten this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's easier to write with a while
loop if we don't want to illustrate cycling.
/// If this is too much allocation for you, | ||
/// you may try the underlying streaming | ||
/// [`CartesianPowerIndices`] instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not expose the inner iterator for now. It enshrines API that we'd have to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concern about the API. What would be the best way to offer the feature without the heavy allocation-on-every-next-call stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly having a dedicated iterator borrowing a mut [usize]
. Or even a specialized implementation that unrolls to nested loops.
itertools
' combinatorics primitives err on the side "easy to use", paying the price of being suboptimal for extreme use cases. But for these extreme use cases, you're probably better off using specialized algorithms anyway.
Thus, don't offer the functionality (at least not in the first iteration).
let mut total_n = Vec::new(); | ||
for &(n, (exp, e_hint)) in expected { | ||
total_n.push(n); | ||
let ctx = || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we usually use that detailed error reporting - it is not needed in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. I really was useful to me when debugging my implementation. But I can remove it now. Lemme clean that up.
Hello @phimuemue and thank you for feedback :)
I understand. I has also been hard for me to write because I wanted the adaptor to not eagerly call
I implemented cycling because it is "free" in terms of state size + runtime calculation. We are not commited to yielding
This being said, I must admit that I have no real-world use case in mind. I just figured it would be expected that an endless odometer would not reallocate on every overflow, or require
Sorry I did not understand that. What is it you want to maintain in its current state? I'll make sure to keep you updated as I make progress. But it'll take time of course. Stay tuned :) |
I'm not sure this is even needed. I think I'm familiar with enumerating the combination's indices.
Please keep this PR to only contain
I stand by my point. Please make it non cycling. Cycling would confuse users.
You can go with
Just to be clear: Please do not define
Maybe the python script to generate tests is short enough to be copy-pasted into the
I'm unsure whether we (the itertools maintainers) would want to maintain the implementation as proposed by this PR. (That's why I requested the changes.) |
Hello, this is my first contribution to a rust library, can somebody guide me a little out here?
The idea is to complete the
cartesian_product
logic with acartesian_power
adaptor. It scrolls the cartesian product of an iterator with itselfpow
times. For instance with the setabc
andpow=4
, it yields:Put it another way, it's supposedly equivalent to
iproduct!("abc", "abc", "abc", "abc")
except for the technical types quirks.I have a few unresolved questions, which I need help with:
Dummy stuff first: the diff is big because I thought it was okay to hit the
rustfmt
button, was I wrong? Should I reverse formatting?I cannot get the
size_hint
test to pass because it just.. times out, although I'm unsure why. How am I supposed to correctly test it?What am I supposed to add in the
bench
files?Deepest questions last:
How could the
Self::Item
type be more generic in this case? I have chosenVec
to start with, but the problem is that the expected returned length actually depends on thepow
argument.Vec
) a type parameter? If yes, how?This is it, thank you for your patience, I'd be happy to read your feedback :)