Add additional methods to itertools #3992

rhagenson · 2022-02-03T02:43:23Z

Several methods for `Iter` in itertools were originally planned but never implemented. These methods have now been implemented.

The added methods are:

- `dedup`, which removes local duplicates (runs of consecutive identical elements) using a provided hash function
- `interleave`, which alternates values between two iterators until both run out
- `interleave_shortest`, which alternates values between two iterators until one of them runs out
- `intersperse`, which yields a given value after every n elements of the iterator
- `step_by`, which yields every nth element of the iterator
- `unique`, which removes global duplicates (identical elements anywhere in the iterator) using a provided hash function
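The difference between local (`dedup`) and global (`unique`) duplicates is easiest to see on one input. The following is a minimal sketch, not the merged implementation: `UniqueI64` is a hypothetical name, it handles only `I64`, and it uses `collections.Set` (structural equality) instead of a caller-supplied `HashFunction`. It pulls from the source lazily, so it can also sit on top of an infinite iterator.

```pony
use "collections"

class UniqueI64 is Iterator[I64]
  """
  Hypothetical sketch, not the itertools API: yields each I64 value the
  first time it appears and skips every later repeat.
  """
  let _iter: Iterator[I64]
  let _seen: Set[I64] = Set[I64]    // every value yielded so far
  var _ahead: (I64 | None) = None   // next unseen value, if one is buffered

  new create(iter: Iterator[I64]) =>
    _iter = iter

  fun ref _fill() =>
    // Pull from the source until an unseen value turns up or it runs dry.
    while (_ahead is None) and _iter.has_next() do
      try
        let v = _iter.next()?
        if not _seen.contains(v) then
          _seen.set(v)
          _ahead = v
        end
      end
    end

  fun ref has_next(): Bool =>
    _fill()
    _ahead isnt None

  fun ref next(): I64 ? =>
    _fill()
    match _ahead
    | let v: I64 =>
      _ahead = None
      v
    else
      error
    end

actor Main
  new create(env: Env) =>
    for v in UniqueI64([as I64: 1; 1; 2; 3; 3; 2; 2].values()) do
      env.out.print(v.string())
    end
```

On the input used in the `dedup` docstring this prints 1, 2 and 3, whereas `dedup` keeps the second 2 because it only collapses consecutive repeats. Two trade-offs of this design: the seen-set grows with the number of distinct values, and on an infinite iterator that never produces a new value `has_next()` will keep pulling forever.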

ponylang-main · 2022-02-03T02:43:39Z

Hi @rhagenson,

The `changelog - added` label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 3992.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

```markdown
## Title

End user description of changes, why it's important,
problems it solves etc.

If a breaking change, make sure to include 1 or more
examples of what code would look like prior to this change
and how to update it to work after this change.
```

Thanks.

rhagenson · 2022-02-03T03:20:49Z

The only method still missing from the original commented-out ones is:

````pony
fun ref dedup[H: HashFunction[A] val = HashIs[A]](): Iter[A!]^ =>
    """
    Return an iterator that removes duplicates from consecutive identical
    elements. Equality is determined by the HashFunction `H`.

    ## Example

    ```pony
    Iter[I64]([as I64: 1; 1; 2; 3; 3; 2; 2].values())
      .dedup()
    ```
    `1 2 3 2`
    """
````

I tried implementing this one first, but could not get it correct. I think I can reuse the structure I already have by keeping a `_prev_hash` variable to compare against, but then `has_next()` can give the wrong result. Logically, `has_next()` should report whether there is another deduplicated value, but as I currently have it written it reports whether the underlying iterator has another, possibly duplicate, value. I expect there is a way to lag only one value behind the underlying iterator, so I do not want to traverse all the remaining values just to fix this `has_next()` problem.
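One way to get that one-value lag is to buffer the next *distinct* value: `has_next()` first skips copies of the last value handed out, so its answer refers to deduplicated values and nothing beyond the next distinct value is read. This is a hypothetical sketch, not the code that was merged: `DedupI64` is an illustrative name, it handles only `I64`, and it compares with `!=` rather than a `HashFunction`.

```pony
class DedupI64 is Iterator[I64]
  """
  Hypothetical sketch, not the itertools API: yields I64 values with runs
  of consecutive repeats collapsed, buffering at most one value beyond
  what has been yielded.
  """
  let _iter: Iterator[I64]
  var _last: (I64 | None) = None   // last value handed out by next()
  var _ahead: (I64 | None) = None  // buffered next distinct value, if any

  new create(iter: Iterator[I64]) =>
    _iter = iter

  fun ref _fill() =>
    // Skip copies of the last yielded value until a new one is buffered
    // or the source is exhausted.
    while (_ahead is None) and _iter.has_next() do
      try
        let v = _iter.next()?
        match _last
        | let last: I64 => if v != last then _ahead = v end
        else
          _ahead = v // nothing yielded yet, so v cannot be a repeat
        end
      end
    end

  fun ref has_next(): Bool =>
    _fill()
    _ahead isnt None

  fun ref next(): I64 ? =>
    _fill()
    match _ahead
    | let v: I64 =>
      _ahead = None
      _last = v
      v
    else
      error
    end

actor Main
  new create(env: Env) =>
    for v in DedupI64([as I64: 1; 1; 2; 3; 3; 2; 2].values()) do
      env.out.print(v.string())
    end
```

This prints 1, 2, 3, 2, matching the docstring example. A generic version would keep the previous value (and, if desired, its hash) in the same place as `_last`.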

rhagenson · 2022-02-03T04:39:58Z

My plan on this is: 1) to take another look at what methods I have here already to see if I can clean anything up, 2) look at other similar libraries (e.g., the ones in the Zulip thread) for related methods we may want to add.

My initial look at other libraries yielded possible step_by and sort methods.

packages/itertools/iter.pony

rhagenson · 2022-02-04T03:28:56Z

I am forgoing a sort method because I cannot determine the "best" way to make one work using the existing Sort primitive when the iterator may be infinite. My initial thought was a buffered sort that reads in n elements, keeps those n elements sorted, and yields the lowest or highest based on user preference.

```pony
let iter = Iter[USize](Range(0, 4)).interleave(Range(4, 6))  // 0 4 1 5 2 3
iter.sort(2)  // proposed only, not added by this PR: 0 1 4 2 3 5
/*
Steps:
1. Create buffer of size 2, load values 0 4 in buffer
2. Yield 0
3. 4 1 in buffer
4. Yield 1
5. 4 5 in buffer
6. Yield 4
7. 5 2 in buffer
8. Yield 2
9. 5 3 in buffer
10. Yield 3
11. 5 _ in buffer
12. Yield 5
*/
```

This seems like a "bad" idea because it breaks how I would expect sort to work (the output is only ordered within the buffer window, so 2 comes after 4 above), but it is the idea I had for making sort work when the iterator may be infinite.
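For reference, the buffered idea above can be sketched as an iterator that keeps at most `n` values and always yields the smallest one. This is a hypothetical illustration only: `BufferedSortUSize` is an invented name, the PR adds no sort method, only the "lowest first" preference is shown, and the linear scan costs O(n) per element.

```pony
class BufferedSortUSize is Iterator[USize]
  """
  Hypothetical sketch of the rejected buffered sort: keeps at most `n`
  values buffered and always yields the smallest buffered value, so the
  output is sorted only within that window.
  """
  let _iter: Iterator[USize]
  let _n: USize
  let _buf: Array[USize] = Array[USize]

  new create(iter: Iterator[USize], n: USize) =>
    _iter = iter
    _n = if n == 0 then 1 else n end   // treat a zero-size buffer as 1

  fun ref _fill() =>
    // Top the buffer back up to `n` values while the source has more.
    while (_buf.size() < _n) and _iter.has_next() do
      try _buf.push(_iter.next()?) end
    end

  fun ref has_next(): Bool =>
    _fill()
    _buf.size() > 0

  fun ref next(): USize ? =>
    _fill()
    // Linear scan for the smallest buffered value.
    var min_i: USize = 0
    var i: USize = 1
    while i < _buf.size() do
      if _buf(i)? < _buf(min_i)? then min_i = i end
      i = i + 1
    end
    _buf.delete(min_i)? // errors when the buffer is empty

actor Main
  new create(env: Env) =>
    // Same order as Iter[USize](Range(0, 4)).interleave(Range(4, 6)).
    let source = [as USize: 0; 4; 1; 5; 2; 3].values()
    for v in BufferedSortUSize(source, 2) do
      env.out.print(v.string())
    end
```

With a buffer of 2 this reproduces the trace above (0 1 4 2 3 5), which shows the problem: the result is not a full sort unless `n` is at least the length of the input.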

jemc · 2022-02-04T21:22:41Z

packages/itertools/iter.pony

```diff
+          let cur_hash = H.hash(v)
+          match _prev_hash
+          | let x: USize =>
+            if x == cur_hash then
```

To me it seems somewhat "off" that we are only using H.hash and not also using H.eq to confirm true value equality (instead of just hash-equivalence).

Some hash functions may have a non-negligible collision rate for non-identical elements. Because HashFunction has both a hash method and an eq method, users may be assuming that H.eq will be used to check for true value equivalence after H.hash equivalence has been checked, so that the overall check is resilient to false collisions.
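A small illustration of why the hash comparison alone is not enough: with a deliberately bad hash function, every pair of values looks like a duplicate unless `eq` is also consulted. `ConstantHash`, `HashOnlySame` and `HashThenEqSame` are hypothetical names for this sketch. Assuming a consistent hash function (equal values hash equally), checking the hash first and then `eq` gives the same answer as `eq` alone; putting the (possibly cached) hash comparison first lets `and` skip `eq` whenever the hashes already differ.

```pony
use "collections"

primitive ConstantHash is HashFunction[I64]
  // Hypothetical, deliberately bad hash: every I64 collides.
  fun hash(x: I64): USize => 0
  fun eq(x: I64, y: I64): Bool => x == y

primitive HashOnlySame
  // Treats a hash match as equality, so collisions become false duplicates.
  fun apply(x: I64, y: I64): Bool =>
    ConstantHash.hash(x) == ConstantHash.hash(y)

primitive HashThenEqSame
  // Compares hashes first; `and` short-circuits, so eq runs only on a match.
  fun apply(x: I64, y: I64): Bool =>
    (ConstantHash.hash(x) == ConstantHash.hash(y)) and ConstantHash.eq(x, y)

actor Main
  new create(env: Env) =>
    env.out.print(HashOnlySame(1, 2).string())
    env.out.print(HashThenEqSame(1, 2).string())
    env.out.print(HashThenEqSame(2, 2).string())
```

The hash-only check reports 1 and 2 as equal (`true`), while the hash-then-eq check correctly reports `false` for them and `true` for 2 and 2.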

I had to look at the source again around HashFunction, but I see what you mean and will make the necessary change(s) to use H.eq where needed.

@jemc Please check this again and suggest improvements. To use `H.eq` I needed to keep both the previous hash and the previous value. I initially tried to solve this with a single `var _prev: ((A!, USize) | None) = None`, but unpacking the tuple caused me issues, with ponyc insisting that `_iter.next()?` has a return type of `(A #any ! | None val^)`, so I split the value and hash into separate variables. Based on your comment here and the documentation around `H.eq`, I presume the check should compare the hashes first and then use `H.eq`, but I am not 100% certain it should not be the other way around.

I also think this code is a bit messy right now with cur_{value,hash}, _prev_{value,hash}, and prev_value so if you have advice on making these names less redundant or the code less repetitive, please let me know!

packages/itertools/iter.pony

.release-notes/3992.md

rhagenson · 2022-02-05T20:35:07Z

@SeanTAllen I made the changes you suggested in release notes.

SeanTAllen · 2022-02-05T20:36:57Z

@rhagenson awesome.

Interleaving methods

d9a29e4

rhagenson added the `do not merge` (This PR should not be merged at this time) and `changelog - added` (Automatically add "Added" CHANGELOG entry on merge) labels Feb 3, 2022

ponylang-main added the `discuss during sync` (Should be discussed during an upcoming sync) label Feb 3, 2022

rhagenson added 2 commits February 2, 2022 20:58

Unique method

644c009

Intersperse method

e792b64

Dedup method

51adda5

ergl reviewed Feb 3, 2022

packages/itertools/iter.pony (outdated, resolved)

rhagenson added 3 commits February 3, 2022 15:06

Make unique() work with infinite iter

15dc8ec

Do not change function signature by using consume

457674a

step_by method

61f9c55

Add release notes

adc61d3

rhagenson removed the `do not merge` label Feb 4, 2022

rhagenson requested a review from a team February 4, 2022 03:40

jemc reviewed Feb 4, 2022

packages/itertools/iter.pony (outdated, resolved)

jemc reviewed Feb 4, 2022

packages/itertools/iter.pony (outdated, resolved)

jemc reviewed Feb 4, 2022

packages/itertools/iter.pony (outdated, resolved)

jemc reviewed Feb 4, 2022

packages/itertools/iter.pony (outdated, resolved)

SeanTAllen changed the title from "Implement missing itertools methods" to "Add additional methods to itertools" Feb 5, 2022

SeanTAllen requested changes Feb 5, 2022

.release-notes/3992.md (outdated, resolved)

.release-notes/3992.md (outdated, resolved)

rhagenson added 3 commits February 5, 2022 14:07

Use not keyword

f9ae14f

Clarify release notes

e6555cf

Use H.eq as well as compare H.hash values

47a62ab

SeanTAllen removed the `discuss during sync` label Feb 8, 2022

jemc approved these changes Feb 8, 2022

ponylang-main added the `discuss during sync` label Feb 8, 2022

SeanTAllen merged commit 786d21c into main Feb 8, 2022

SeanTAllen deleted the itertools-additions branch February 8, 2022 19:56

ponylang-main removed the `discuss during sync` label Feb 8, 2022

github-actions bot pushed a commit that referenced this pull request Feb 8, 2022

Updates release notes for PR #3992

626bc74

github-actions bot pushed a commit that referenced this pull request Feb 8, 2022

Update CHANGELOG for PR #3992

416eda6

rhagenson commented Feb 3, 2022 (edited by SeanTAllen)

ponylang-main commented Feb 3, 2022

rhagenson commented Feb 3, 2022

rhagenson commented Feb 3, 2022

rhagenson commented Feb 4, 2022

jemc Feb 4, 2022 (edited)

rhagenson Feb 4, 2022

rhagenson Feb 5, 2022

rhagenson Feb 5, 2022

rhagenson commented Feb 5, 2022

SeanTAllen commented Feb 5, 2022
