Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Define API invariance for keys, values, getindex and pairs #36175

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tkf
Copy link
Member

@tkf tkf commented Jun 7, 2020

There are discussions for extending keys #34678 and pairs #34851 API. Rather than extending the implementation purely for convenience, I suggest to document the expected relationships between the functions. This clarifies what to expect from these APIs without looking at each implementation. Otherwise, it is difficult to use these functions in generic code.

Concretely, for keys, values, and getindex I suggest:

It is highly recommended that each collection that implements keys to satisfy the following relationship between values and getindex:

isequal(collect(values(collection)), [collection[k] for k in keys(collection)])

For pairs:

It is highly recommended that each implementation of pairs satisfies the following relationships:

isequal(map(first, paris(collection)), keys(collection))
isequal(map(last, paris(collection)), values(collection))

The fallback generic implementation satisfy this invariance.

Since it is a breaking change to impose more strict API requirements for already existing and used API, I only used a very loose phrasing "It is highly recommended." It is up to discussion if upgrading this to "must" is a minor change.

close #34851

@tkf tkf added the collections Data structures holding multiple items, e.g. sets label Jun 7, 2020
@tkf tkf requested review from andyferris and mbauman June 7, 2020 00:07
@tkf
Copy link
Member Author

tkf commented Jun 7, 2020

A case study: Base.Generator

With the API proposed, some care must be taken when defining keys, values and getindex for Base.Generator (#34678). This is because it's very easy to use stateful mapping with the generator expression:

collection = let counter = 0
    (
        begin
            counter += 1
            (counter, x)
        end for x in 1:10
    )
end

One possible solution is to clarify the condition in which keys etc. for Base.Generator are usable:

The generator object is "indexable" (i.e., getindex, keys, values, and pairs behave consistently) if the input collection is also indexable and the mapping is stateless in the sense that the same input produces the same output.

(This can be added in the docstring of Iterators.map #34352).

It might be useful to define and mention axes and its related methods for Base.Generator.

I think it makes sense to define more methods for Base.Generator assuming the purity of the mapping function and then explain on which conditions certain interface can be used consistently. For example, such "narrow contract" API is required for supporting parallelism with the generator expressions via Transducers.jl/SplittablesBase.jl.

(Somewhat similar discussion with Broadcasted: #36081)

@rapus95
Copy link
Contributor

rapus95 commented Jun 7, 2020

While clarification is always good(!) I think that the current understanding of Generators expects them to only depend on their input arguments (and no hidden state) since, strictly speaking, there are no guarantees/claims about the order in which it'll be iterated.

Generator(f, iter)

  Given a function f and an iterator iter, construct an iterator that yields the values of f applied to the elements
  of iter.

@tkf
Copy link
Member Author

tkf commented Jun 7, 2020

Thanks, that's a good point. Strictly speaking, I don't think Base.Generator's docstring is a public API because it's not included in the documentation (discussion about this in #35715). But at least it indicates that the intended use-case is the pure mapping (which is great). OTOH, something like mean(exp(randn()) for _ in 1:10) is so innocent-looking and I don't think it's reasonable to "ban" this kind of use-cases.

@rapus95
Copy link
Contributor

rapus95 commented Jun 7, 2020

To clarify, in the example you gave most recently, I think having keys defined is definitively good/important for convenience (even though, whatever key you'll get won't help you predict the result), but as the function passed to the Generator is not pure, the invariance you suggested isn't expected to hold (that's why the key isn't worth anything). That way we keep the Generator as a very convenient construct for stateful iterators while still increasing the overall consistency in the stateless case.

Edit: We also might consider to change the isequal in your invariance to be a "set-equal" to not to enforce any particular order on iterators. If the implementation follows a given order (like for Dict) then it'll be noted in the functions definition.

Edit2: in the case of collections and iterators (not dicts) values should always evaluate to the object itself I guess. Would you expect anything else?


```julia
isequal(map(first, paris(collection)), keys(collection))
isequal(map(last, paris(collection)), values(collection))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, that currently neither Array nor Dict satisfies these relationships, because map is not defined for the return types of pairs:

julia> coll = [10, 20];
julia> map(first, pairs(coll))
ERROR: map is not defined on dictionaries

@KlausC
Copy link
Contributor

KlausC commented Jun 10, 2020

strictly speaking, there are no guarantees/claims about the order in which it'll be iterated.

IMO the claim, that the order of the generated iterator follows the order of the base iterator is so natural, that it is expected by most users. In my understanding it is implied in the description.

That would be consistent with the documentation of map and Array comprehension, which also do not explicitly mention, that the order is preserved. I would always expect these invariants for all iterables:

collect(f(x) for x in itr) == collect(Generator(f, itr)) == map(f, itr) == [f(x) for x in itr]

the following relationships:

```julia
isequal(map(first, paris(collection)), keys(collection))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🇫🇷

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define pairs(A) = enumerate(A)
4 participants