Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permutations #292

Merged
merged 1 commit into from
Sep 17, 2019
Merged

Permutations #292

merged 1 commit into from
Sep 17, 2019

Conversation

tobz1000
Copy link
Contributor

Fixes #285.

This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered.

Method name, k-less form

The adaptor function is called permutations, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably be k_permutations. Perhaps two adaptor functions would be better: .k_permutations(k: usize) and .permutations() (which just sets k = vals.len())?

Item value ordering/distinctness

Input items are not inspected in any way; they are treated purely by their initial index. This means that:

  1. Permutations are yielded in lexicographical order of the index, not by any PartialOrd implementation of the items.
  2. Identical items will not be detected, and will result in some identical permutations.

1 can be achieved by the user by collecting-and-sorting their input.

2 is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option?

Permutations from source buffer/indices

In addition to the iterator adaptor, I've added Permutations::from_vals, to create directly from a Vec or a slice. This saves some cloning compared to using source.[into_]iter().permutations(k).

There is also Permutations::new(n, k), which is functionally equivalent to (0..n).permutations(k), but a little faster (about 0.6x the run time).

But perhaps you would consider these unnecessary additions to the API?

PermutationSource trait

These different implementations (from Vec/slice/just indices) are achieved with the trait PermutationSource. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...)

On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR.

For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but Vecs and slices are unaffected (or even a little faster...).

Combinations changes

I've made small changes to the Combinations documentation to match Permutations - most significantly, replacing the letter n with k. I've also changed all uses of the variable n to k in the implementation... but maybe this is considered commit noise :)

There are some other potential changes which would bring Combinations and Permutations more in line with one another.

As mentioned above, Combinations might benefit from using a _____Source trait, to allow direct memory buffers.

My Permutations implementations doesn't make use of LazyBuffer. Perhaps the buffer is useful for iterators which never complete, and have a very small k value. Has any benchmarking been done?

Either way, it makes sense for both adaptors to either use or not use LazyBuffer. Maybe it could be integrated into the _______Source trait if it's useful?


Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!)

@ChrisJefferson
Copy link

Just to say, I'd really like a "permutation" function, so I'd like to see this PR, or something like it, merged.

@estk
Copy link

estk commented Oct 3, 2018

@bluss Have you had a chance to look this over? This would scratch an itch for me.
I see that you're the owner of https://github.com/bluss/permutohedron, and honestly folding that in to itertools would be great too.

@bluss
Copy link
Member

bluss commented Nov 24, 2018

Sorry that I haven't had time. This looks good. The name sounds good to me, but probably explain to the users what a k-permutation means in the doc.

If you think something is a good addition, you can always help by jumping in and reviewing

@bluss
Copy link
Member

bluss commented Nov 24, 2018

It's hard to compete with premutohedron performancewise maybe, since that one works with modifications in place, mainly.

@tobz1000
Copy link
Contributor Author

tobz1000 commented Feb 8, 2019

Is there anything else you need to move this forward? Or anyone you could nominate to review?

@barabadzhi
Copy link

Is this PR dead? 😟
I badly seek permutations functionality similar to the one we have in Python.
Any merge plans? 🤔

@Avi-D-coder
Copy link
Contributor

@barabadzhi it seems like all PRs are on hold, @bluss is not on very often.

@jswrenn jswrenn self-assigned this Jul 18, 2019
@tobz1000
Copy link
Contributor Author

tobz1000 commented Aug 5, 2019

Agreed for keeping PermutationSource private.

I will have a look at re-using LazyBuffer, or otherwise not collecting the source iterator.

We could actually make permutations useful for infinite iterators by changing the order of permutations in a similar manner to #330. In fact, It might be a fairly simple case of flat-mapping the proposed iterator in that issue with sub-permutations where k == n (although maybe not the most performant way).

@jswrenn
Copy link
Member

jswrenn commented Aug 8, 2019

Note #340, which was modified to use LazyBuffer.

src/permutations.rs Outdated Show resolved Hide resolved
@tobz1000
Copy link
Contributor Author

Infinite iterators are now handled in a similar manner to the combinations adaptors.

If iteration is stopped and restarted, the previously-collected buffer is reused, rather than running through the iterator again. This means side-effects will only occur on the first run-through. Let me know what you think of this.

src/lib.rs Outdated Show resolved Hide resolved
@jswrenn
Copy link
Member

jswrenn commented Sep 16, 2019

Sorry for being slow in getting to this; it's been an exceptionally busy month. I think my only remaining reservation for merging is that the history is very messy. Could you squash this down to just one commit that's rebased on master?

src/lib.rs Outdated Show resolved Hide resolved
@tobz1000
Copy link
Contributor Author

Squashed this work onto master. I also updated some of the documentation - that change can be seen here.

Thank you for reviewing this!

src/lib.rs Outdated Show resolved Hide resolved
@jswrenn
Copy link
Member

jswrenn commented Sep 17, 2019

This looks great!

bors r+

bors bot added a commit that referenced this pull request Sep 17, 2019
292: Permutations r=jswrenn a=tobz1000

Fixes #285.

This implementation is based on the Python implementation specified in the issue. There are quite a few points which should be considered.

## Method name, _k_-less form

The adaptor function is called `permutations`, for user familiarity. However, since the size of the output can be specified, a more mathematically accurate name would probably be `k_permutations`. Perhaps two adaptor functions would be better: `.k_permutations(k: usize)` and `.permutations()` (which just sets `k = vals.len()`)?

## Item value ordering/distinctness

Input items are not inspected in any way; they are treated purely by their initial index. This means that:

  1. Permutations are yielded in lexicographical order of the index, not by any `PartialOrd` implementation of the items.
  2. Identical items will not be detected, and will result in some identical permutations.

__1__ can be achieved by the user by collecting-and-sorting their input.

__2__ is a little trickier, but can be achieved by filtering the output. However, I think there is a more efficient algorithm to avoid duplicated. Maybe we should provide this option?

## Permutations from source buffer/indices

In addition to the iterator adaptor, I've added `Permutations::from_vals`, to create directly from a `Vec` or a slice. This saves some cloning compared to using `source.[into_]iter().permutations(k)`.

There is also `Permutations::new(n, k)`, which is functionally equivalent to `(0..n).permutations(k)`, but a little faster (about 0.6x the run time).

But perhaps you would consider these unnecessary additions to the API?

## `PermutationSource` trait

These different implementations (from `Vec`/slice/just indices) are achieved with the trait `PermutationSource`. It's visible to the user to implement for other structures if they wish, but this is probably of limited value. There's not much harm in allowing the user to access it, but again, maybe it's just API bloat. (or a future breaking change when it's removed/changed...)

On the other hand, perhaps there are other places in the library which could benefit from taking a source generically? Any adaptor where input elements are used more than once will need to store them, and it might be more efficient to allow users to supply the memory directly. This could be done in another PR.

For completeness, I also made full implementations of the three variations, without the trait, for benchmarking. The pure-indices implementation is about 10% slower using the trait, but `Vec`s and slices are unaffected (or even a little faster...).

## `Combinations` changes

I've made small changes to the `Combinations` documentation to match `Permutations` - most significantly, replacing the letter `n` with `k`. I've also changed all uses of the variable `n` to `k` in the implementation... but maybe this is considered commit noise :)

There are some other potential changes which would bring `Combinations` and `Permutations` more in line with one another.

As mentioned above, `Combinations` might benefit from using a `_____Source` trait, to allow direct memory buffers.

My `Permutations` implementations doesn't make use of `LazyBuffer`. Perhaps the buffer is useful for iterators which never complete, and have a very small `k` value. Has any benchmarking been done?

Either way, it makes sense for both adaptors to either use or not use `LazyBuffer`. Maybe it could be integrated into the `_______Source` trait if it's useful?

---

Let me know what you think when you have the chance. (Sorry for submitting two reasonably big PRs at once. I hope you get a chance to go through it all eventually!)

Co-authored-by: Toby Dimmick <tobydimmick@pm.me>
@bors
Copy link
Contributor

bors bot commented Sep 17, 2019

Build succeeded

@bors bors bot merged commit 88834c9 into rust-itertools:master Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Permutation
7 participants