-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Permutations #292
Permutations #292
Conversation
Just to say, I'd really like a "permutation" function, so I'd like to see this PR, or something like it, merged. |
@bluss Have you had a chance to look this over? This would scratch an itch for me. |
Sorry that I haven't had time. This looks good. The name sounds good to me, but probably explain to the users what a k-permutation means in the doc. If you think something is a good addition, you can always help by jumping in and reviewing |
It's hard to compete with premutohedron performancewise maybe, since that one works with modifications in place, mainly. |
Is there anything else you need to move this forward? Or anyone you could nominate to review? |
Is this PR dead? 😟 |
@barabadzhi it seems like all PRs are on hold, @bluss is not on very often. |
Agreed for keeping I will have a look at re-using We could actually make permutations useful for infinite iterators by changing the order of permutations in a similar manner to #330. In fact, It might be a fairly simple case of flat-mapping the proposed iterator in that issue with sub-permutations where |
Note #340, which was modified to use |
Infinite iterators are now handled in a similar manner to the combinations adaptors. If iteration is stopped and restarted, the previously-collected buffer is reused, rather than running through the iterator again. This means side-effects will only occur on the first run-through. Let me know what you think of this. |
Sorry for being slow in getting to this; it's been an exceptionally busy month. I think my only remaining reservation for merging is that the history is very messy. Could you squash this down to just one commit that's rebased on master? |
abfcc43
to
80aa334
Compare
80aa334
to
2a03a03
Compare
Squashed this work onto master. I also updated some of the documentation - that change can be seen here. Thank you for reviewing this! |
2a03a03
to
88834c9
Compare
This looks great! bors r+ |
292: Permutations r=jswrenn a=tobz1000 Fixes #285. This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered. ## Method name, _k_-less form The adaptor function is called `permutations`, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably be `k_permutations`. Perhaps two adaptor functions would be better: `.k_permutations(k: usize)` and `.permutations()` (which just sets `k = vals.len()`)? ## Item value ordering/distinctness Input items are not inspected in any way; they are treated purely by their initial index. This means that: 1. Permutations are yielded in lexicographical order of the index, not by any `PartialOrd` implementation of the items. 2. Identical items will not be detected, and will result in some identical permutations. __1__ can be achieved by the user by collecting-and-sorting their input. __2__ is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option? ## Permutations from source buffer/indices In addition to the iterator adaptor, I've added `Permutations::from_vals`, to create directly from a `Vec` or a slice. This saves some cloning compared to using `source.[into_]iter().permutations(k)`. There is also `Permutations::new(n, k)`, which is functionally equivalent to `(0..n).permutations(k)`, but a little faster (about 0.6x the run time). But perhaps you would consider these unnecessary additions to the API? ## `PermutationSource` trait These different implementations (from `Vec`/slice/just indices) are achieved with the trait `PermutationSource`. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...) On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR. For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but `Vec`s and slices are unaffected (or even a little faster...). ## `Combinations` changes I've made small changes to the `Combinations` documentation to match `Permutations` - most significantly, replacing the letter `n` with `k`. I've also changed all uses of the variable `n` to `k` in the implementation... but maybe this is considered commit noise :) There are some other potential changes which would bring `Combinations` and `Permutations` more in line with one another. As mentioned above, `Combinations` might benefit from using a `_____Source` trait, to allow direct memory buffers. My `Permutations` implementations doesn't make use of `LazyBuffer`. Perhaps the buffer is useful for iterators which never complete, and have a very small `k` value. Has any benchmarking been done? Either way, it makes sense for both adaptors to either use or not use `LazyBuffer`. Maybe it could be integrated into the `_______Source` trait if it's useful? --- Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!) Co-authored-by: Toby Dimmick <tobydimmick@pm.me>
Build succeeded |
Fixes #285.
This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered.
Method name, k-less form
The adaptor function is called
permutations
, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably bek_permutations
. Perhaps two adaptor functions would be better:.k_permutations(k: usize)
and.permutations()
(which just setsk = vals.len()
)?Item value ordering/distinctness
Input items are not inspected in any way; they are treated purely by their initial index. This means that:
PartialOrd
implementation of the items.1 can be achieved by the user by collecting-and-sorting their input.
2 is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option?
Permutations from source buffer/indices
In addition to the iterator adaptor, I've added
Permutations::from_vals
, to create directly from aVec
or a slice. This saves some cloning compared to usingsource.[into_]iter().permutations(k)
.There is also
Permutations::new(n, k)
, which is functionally equivalent to(0..n).permutations(k)
, but a little faster (about 0.6x the run time).But perhaps you would consider these unnecessary additions to the API?
PermutationSource
traitThese different implementations (from
Vec
/slice/just indices) are achieved with the traitPermutationSource
. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...)On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR.
For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but
Vec
s and slices are unaffected (or even a little faster...).Combinations
changesI've made small changes to the
Combinations
documentation to matchPermutations
- most significantly, replacing the lettern
withk
. I've also changed all uses of the variablen
tok
in the implementation... but maybe this is considered commit noise :)There are some other potential changes which would bring
Combinations
andPermutations
more in line with one another.As mentioned above,
Combinations
might benefit from using a_____Source
trait, to allow direct memory buffers.My
Permutations
implementations doesn't make use ofLazyBuffer
. Perhaps the buffer is useful for iterators which never complete, and have a very smallk
value. Has any benchmarking been done?Either way, it makes sense for both adaptors to either use or not use
LazyBuffer
. Maybe it could be integrated into the_______Source
trait if it's useful?Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!)