RFC/WIP: creation of mapdict method to modify Dict values #31223
API question: Should we really return a different dict if the types don't fit? I'd prefer Can we improve flexibility by passing an additional (keyword? Valtype?) argument Constant propagation should be able to handle that, i.e. one would need no source-code duplication and would get no runtime overhead inside the loop. |
I like this. I'm wondering if a more specific name would be better? I can imagine a different function which maps (renames) keys instead of values, so maybe call this one |
Renaming/changing keys cannot be reasonably done inplace without temporaries (because of the possibility of needing a rehash). However, the function That is potentially a heavy downside, so one could do something like
|
I agree with liking the functionality but preferring a name that indicates mapping over values. The names |
Alternatively, since |
I like Is everyone okay with the mutating and type widening behavior? |
No. |
It's not actually possible to widen the type of an existing Dict, so type-widening isn't really a question. |
I might be in the minority but I think that the difference for something like a The other datatype that can also fit that bill is SparseArrays, and I would argue that something like a mutating @StefanKarpinski as for type widening, if we end up not allowing |
The type system doesn't care that you think this is a valid use case. It cannot be done. |
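The point about the fixed type can be checked directly. The following is a minimal sketch with made-up keys and values: it shows that a Dict{String,Int} rejects a value it cannot convert, and that "widening" in practice means constructing a new dictionary with a wider value type.

```julia
d = Dict("a" => 1, "b" => 2)      # inferred as Dict{String,Int}

# Storing a value that cannot be converted to Int fails; `d` is left unchanged.
threw = try
    d["a"] = :one
    false
catch
    true
end

# A wider value type requires a new dictionary; the original keeps its type.
wide = Dict{String,Any}(d)
wide["a"] = :one
```

Here `wide` is a separate object, so any code still holding `d` continues to see a Dict{String,Int}.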
So no mutation then? I am okay with that. But on the copying version should we not allow the type to be changed? |
@ndinsmore: The question is not about valid use cases (there are many of those), but what should/could be specialized for mapvalues(f, dict) = Dict(zip(keys(dict), map(f, values(dict)))) can be improved by not rebuilding the table, and similarly The point of providing these methods is that they can be faster and allocate less, not to support some use case (since they are easy to do already, just suboptimal). The in-place version cannot change the type (it should error when trying to do so, and this should be tested), while the "functional" version should work generally, and figure out the result type like
julia> d = Dict(string(i) => i for i in 1:5)
Dict{String,Int64} with 5 entries:
"4" => 4
"1" => 1
"5" => 5
"2" => 2
"3" => 3
julia> mapvalues(v -> Symbol(string(v)), d)
Dict{String,Symbol} with 5 entries:
"4" => Symbol("4")
"1" => Symbol("1")
"5" => Symbol("5")
"2" => Symbol("2")
"3" => Symbol("3") |
That's not what I said. The values in a dict can change but the type of a dict cannot. You can write |
Yes, my point of view is that it would be easiest to do data manipulation if dictionaries and arrays followed the same interface. However at least for the moment I wouldn't like to break the link between iteration and Other related data manipulation functions could include I wonder if it might be more productive then to implement a more "interesting" container for |
If I understood you right, then I really like your idea: All higher-order functions on dictionaries would operate on If people want to write their kernels in a way that does not get to operate on the key-value pair, we define corresponding higher-order functions on Out-of-place filter and map that does not change keys should still be done with zero hash evaluations (unless the dict could shrink by a very large factor). There could be an argument for returning a |
I agree that |
That is nice but how would you write the non-mutating version? |
It would be great if we could come up with a solution that makes sense for |
I think it would be better to leave sparse arrays out of this PR and focus on |
Indeed we can't change the behavior of |
I think the current semantics of In contrast, The main value of this function is as a performance optimization in the in-place case, anyway; in a typical context where you are willing to make a copy of the dictionary, I'm guessing that the performance cost of re-hashing the keys is not such a big deal. |
What about if we used |
A It is also unclear what is meant by
, would the two dicts share structure? I can't imagine that being useful. In any case, merely copying keys, tables, and associated information should be very cheap. It is building the hash table that is expensive. |
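One way to read the remark about cheap copying is the sketch below: when the value type does not change, an out-of-place map can copy the dictionary and then rewrite the values of the copy, so the keys are never reinserted one by one. The helper name `mapvalues_sametype` is hypothetical, and the sketch assumes Julia 1.2 or later, where `map!(f, values(dict))` is available.

```julia
# Hypothetical helper: out-of-place map over values, value type unchanged.
function mapvalues_sametype(f, d::Dict)
    out = copy(d)            # independent copy; the original is not touched
    map!(f, values(out))     # rewrite the copy's values in place
    return out               # map! returns the value iterator, so return the dict explicitly
end

d = Dict("a" => 1, "b" => 2)
r = mapvalues_sametype(v -> v * 10, d)
```

The two dictionaries share no structure here: `r` holds the mapped values while `d` is unchanged.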
Co-Authored-By: ndinsmore <45537276+ndinsmore@users.noreply.github.com>
Looks good to me. Should be squashed when merging. |
I believe this was your first PR to Julia? Great work, and welcome as a contributor. |
Yes, this is a really nice first PR. Thank you! |
This is the initial pass at the inclusion of the methods mapdict & mapdict!, which allow the modification of Dict values without the overhead of using the Dict keys to access & store the values.
The need for this method was discussed at:
https://discourse.julialang.org/t/fast-dict-value-modification-by-accessing-dict-vals
At the recommendation of @tpapp this was built out for Base.
An example of the use would be as follows:
Currently this implementation handles most of the function cases that were brought up:
1.) Functions which return the same type as the input
2.) Functions that change the value type but are type-safe
3.) Functions which are not type-safe.
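One reading of these three cases is sketched below with an out-of-place helper. The name `mapvalues_sketch` and the sample data are hypothetical, and this is not the PR's implementation: it rebuilds the dictionary from a generator and lets Julia work out the resulting value type.

```julia
# Hypothetical helper for illustration only; it rehashes every key.
mapvalues_sketch(f, d::AbstractDict) = Dict(k => f(v) for (k, v) in d)

d = Dict("a" => 1, "b" => 2, "c" => 3)

# 1.) The function returns the same type as its input.
same = mapvalues_sketch(v -> v + 1, d)

# 2.) The function changes the value type, but always to the same type.
changed = mapvalues_sketch(v -> v / 2, d)

# 3.) The return type depends on the value, so the result needs a wider value type.
mixed = mapvalues_sketch(v -> isodd(v) ? v : string(v), d)
```

Only case 1 can be done in place on the original dictionary; cases 2 and 3 need a new dictionary because the value type of an existing Dict is fixed.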
While the function is currently only implemented for Dict, it would be simple to add a naive fallback for AbstractDict using keys.
If there is interest in including this in Base, tests would also be built out.
Benchmarking results for essentially mapdict!(v -> v > row ? v-1 : v, D), where row is set to change half the values.
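The benchmarked operation can be sketched as follows. The data and the value of `row` are made up, and the value-based update uses `map!(f, values(dict))`, which assumes Julia 1.2 or later, in place of the mapdict! name used in this description. The key-based loop is the baseline that hashes every key again.

```julia
# Hypothetical data: values 1..10 stored under integer keys.
D = Dict(i => i for i in 1:10)
row = 5    # values above `row` are decremented, so half of them change

# Key-based update: every entry is looked up and stored through the hash table.
D_keys = copy(D)
for k in collect(keys(D_keys))
    v = D_keys[k]
    D_keys[k] = v > row ? v - 1 : v
end

# Value-based update: the stored values are rewritten without using the keys.
D_vals = copy(D)
map!(v -> v > row ? v - 1 : v, values(D_vals))
```

Both updates produce the same dictionary; the difference lies only in how much hashing work is done along the way.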