-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Proposal: Add AddOrUpdate to Dictionary<K,V> #26848
Comments
Here's a link to the access-the-internals hack that @AnthonyLloyd used to get the perf benefits of AddOrUpdate https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/knucleotide-csharpcore-9.html |
@AnthonyLloyd Could you give an indication of the degree of the perf benefits available here? |
How is this different to just setting via the indexer? |
Thats what I was wondering too :) |
It's roughly twice as fast because you need to effectively do the same operation twice. (The Func on AddOrUpdate may bring this back though) You need to do the following: dictionary[key] = dictionary.TryGetValue(key, out var i) ? i + 1 : 1; but you get a perf improvement if you do the following: public class Wrapper
{
public int V;
public Wrapper(int v) { V = v; }
}
if (dictionary.TryGetValue(key, out var w))
w.V++;
else
dictionary.Add(key, new Wrapper(1)); |
@AnthonyLloyd it would be useful to post the API shape you expect - see API review process for details and example. |
May be |
@AnthonyLloyd so the motivation is, setting a value that may depend on any existing value? And this new API would allow this to be done with only one key lookup? If so that makes sense, although there is a high bar for touching Dictionary as it gets instantiated many times (for many different value types) so every extra method bloats our assemblies significantly. We would need good evidence that it is worth it. Would this eliminate the need for the reflection in your latest KNucleotide submission (which makes it very hard to read 😃 )? Would it give better perf for that scenario, or comparable perf? For API review, the top post would need to be in the format described at https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md |
By the way @AnthonyLloyd , many thanks for your work on the C# submissions to CLBG. |
Yes it's an update based on a prior value. It would simplify the KNucleotide code and would possibly have almost comparable perf. I'm not sure of other evidence out there. Personally in most systems I have added this as an extension method using the TryGetValue code above. I've also seen it often discussed on stackoverflow. As you say Dictionary is used a lot hence performance should be carefully considered but any improvements could possibly have a large impact for people. |
Can you share some top links demonstrating how popular and upvoted it is? |
It's been discussed in a few issues already: Just a quick google gives a few hits: https://stackoverflow.com/questions/46655665/update-dictionary-value-and-get-the-previous-one http://www.nilzorblog.com/2013/01/improving-generic-c-dictionary-with.html I think most sizable systems add these kind of extension methods. The question is is it worth bringing them into Dictionary for perf reasons. |
First, I think that the basic use case (A table where you want to accumulation (typically increment, but you may be adding to a list or doing some other accumulation), is pretty common (I have run into it myself on numerous occasions and have always lamented the fact that that I either have to look up the key twice (once to get the value and once to update it), or create some sort of wrapper (this is what I do when I care about perf). I note that Concurrent Dictionary already has an AddOrUpdate method. (They needed it for atomicity). So the existing proposal is reasonable as it 'follows suit'. However the CurrnetDictionary API was created before we had the ref-return capability in C#, Now that we have it I think the @omariom idea of
Is better (simpler, more efficent, no delgate in the API, covers the importnat scenarios). The caller can do the creation or update as needed, but in common scnearios (e.g. when TValue is an int or other numeric type), nothing special is needed. If you wanted to have an overload to supply a factory to create TValue I would not have a problem with that (since I think it will be the rare case, and avoids testing logic after the call). Thus I am conflicted. What I can say is I believe in the scenario (we should do something). I guess my best case would be ad GetOrAdd to both DIctionary and ConcurrentDictionary. To address @danmosemsft point. It is definately true that Dictionary<K, V> is instantiated alot, but the cost of an additional unused method on this instantiation is mostly pay-for-play (you do have to reserve a slot for the method (probably a pointer and view bytes), but otherwise it is pay for play. But in general I like it. |
@vancem From the discussion at the end of https://github.com/dotnet/corefx/issues/25189, it seems to me it's not clear yet that having Also, how would |
If a delegate is provided, it may run multiple times, but the dictionary guarantees that only one will be published/returned and in a way that maintains the consistency of the data structure. |
@stephentoub Right, but I was assuming @vancem was suggesting adding |
I don't understand the question. It already has both AddOrUpdate: |
@stephentoub I didn't realize a method called @vancem said:
My question is: what is the API that would be added to |
I don't know; I don't think there's anything to add there for this. Whether or not we add a ref-returning method to Dictionary, we definitely should not add such a method to ConcurrentDictionary. |
To compare ref returning method with what we normally do for the case when value is collection. if (!dict.TryGetValue(key, out var values))
{
dict.Add(key, values = new List<int>());
}
values.Add(value); ref var values = ref dict.GetOrAdd(key);
if (values == null)
{
values = new List<int>();
}
values.Add(value); The last lines can be replaced with (values = values ?? new List<int>()).Add(5); but at the cost of additional read and write every time (writing reference values to the heap has it own cost). This issue could be cured by null coalescing assignment.. (values ??= new List<int>()).Add(5); |
Given the issue with the byref being silently invalidated by updates to the dictionary, is pretty ugly. Also given that ConcurrentDictionary has exactly the AddOrUpdate method, already we should stick with that rather than a ref-returning version. |
@vancem tho this is primarily for performance since we can add extension methods to get the API. |
The original proposal was for AddOrUpdate taking an add value and an update delegate. Is GetOrAdd also proposed - @AnthonyLloyd I guess it is potentially useful but just doesn't happen to come up in kNucleotide? I agree that AddOrUpdate taking an add delegate is more marginal (because it is only useful when the value cannot be trivially set) and I suggest we not take that further (in this issue) |
That is true for JIT'd environments, but is it also true for AOT? I had assumed @jkotas was pushing back on any addition to the Dictionary class, not just growth in existing methods, for this reason. |
Yes I'd say this should be thought of much wider. I like what @vancem said about it being an accumulation. |
GetOrAdd is far more useful than AddOrUpdate, Please don't lose GetOrAdd. |
@dotnetchris @ufcpp We have RefDictionary (dotnet/corefxlab#2458) which shows very good perf. Around 50% fast as thought above. |
@AnthonyLloyd are you bringing up RefDictionary because RefDictionary is built such that you can use the indexer to check for existence safely? And that GetOrAdd only becomes relevant to thread safety concerns of ConcurrentDictionary? |
It looks like |
@AnthonyLloyd That's a much better solution. Thank you all for stepping back and actually curing the disease instead of just making the medicine taste better. |
Thinking about this further, this also makes DictionarySlim support simple concurrent writes very easily since you can lock on the instance that is returned from update. |
Add the AddOrUpdate method that is on ConcurrentDictionary to Dictionary. This would improve performance in use cases such as counting keys.
K-Nucleotide in the benchmarksgame does this and has an ugly workaround currently.
Proposed API
I'm proposing some kind of Update method for
Dictionary<K,V>
which could up to double the performance of this operation.After looking at
ConcurrentDictionary<K,V>
I think it could be the same as AddOrUpdate:Example
The usage example for counting keys would look like:
Discussion
there are other common cases where the value in the dictionary is another collection such as a
List<T>
which would need the other AddOrUpdate onConcurrentDictionary<K,V>
:since this request is for performance the second function may not be so relevant and would require further evaluation.
Other options could be:
GetOrAdd(key)++
but may only be useful for value types?The text was updated successfully, but these errors were encountered: