-
-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional methods to itertools #3992
Conversation
Hi @rhagenson, The changelog - added label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the The basic format of the release notes (using markdown) should be:
Thanks. |
Only missing method left from the original ones commented out is: fun ref dedup[H: HashFunction[A] val = HashIs[A]](): Iter[A!]^ =>
"""
Return an iterator that removes duplicates from consecutive identical
elements. Equality is determined by the HashFunction `H`.
## Example
```pony
Iter[I64]([as I64: 1; 1; 2; 3; 3; 2; 2].values())
.dedup()
```
`1 2 3 2`
""" I tried implementing this one first, but could not get it correct. I think I can use a similar structure to what I have already by having a |
My plan on this is: 1) to take another look at what methods I have here already to see if I can clean anything up, 2) look at other similar libraries (e.g., the ones in the Zulip thread) for related methods we may want to add. My initial look at other libraries yielded possible |
I am foregoing implementing a let iter = Iter[USize](Range(0, 4)).interleave(Range(4, 6)) // 0 4 1 5 2 3
iter.sort(2) // 0 1 4 2 3 5
/*
Steps:
1. Create buffer of size 2, load values 0 4 in buffer
2. Yield 0
3. 4 1 in buffer
4. Yield 1
5. 4 5 in buffer
6. Yield 4
7. 5 2 in buffer
8. Yield 2
9. 5 3 in buffer
10. Yield 3
11. 5 _ in buffer
12. Yield 5
/* This seems like a "bad" idea as it breaks how I would expect sort to work, but is the idea I had for making it work under the possible infinite iterator condition. |
packages/itertools/iter.pony
Outdated
let cur_hash = H.hash(v) | ||
match _prev_hash | ||
| let x: USize => | ||
if x == cur_hash then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it seems somewhat "off" that we are only using H.hash
and not also using H.eq
to confirm true value equality (instead of just hash-equivalence).
Some hash functions may have a non-negligible collision rate for non-identical elements, and because HashFunction
has both a hash
method and an eq
method, users may be relying/assuming that the H.eq
function will be used to check for true value equivalence after the H.hash
equivalence has been checked, so that the overall check is resilient to false collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to look at the source again around HashFunction, but I see what you mean and will make the necessary change(s) to use H.eq where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jemc Please check this again and make suggestions for improvement. In order to use H.eq
I needed to keep both the previous hash and previous value. I initially tried to solve this with a var _prev: ((A!, USize) | None) = None
but unpacking the tuple was causing me issues with ponyc insisting _iter.next()?
having a return type of (A #any ! | None val^)
so I split the value and hash into separate variables. I presume based on your comment here and the documentation around H.eq
that the order of equality should be checking the hash match then using H.eq
but am not 100% certain it should not be the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this code is a bit messy right now with cur_{value,hash}
, _prev_{value,hash}
, and prev_value
so if you have advice on making these names less redundant or the code less repetitive, please let me know!
@SeanTAllen I made the changes you suggested in release notes. |
@rhagenson awsome. |
Previously there were methods for
Iter
withinitertools
which, while originally planned, where not implemented. These methods have now been implemented.The added methods are:
dedup
which removes local duplicates from consecutive identical elements via a provided hash functioninterleave
which alternates values between two iterators until both run outinterleave_shortest
which alternates values between two iterators until one of them runs outintersperse
which yields a given value after everyn
elements of the iteratorstep_by
which yields everyn
th element of the iteratorunique
which removes global duplicates of identical elements via a provided hash function