-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Dictionary<TKey, TValue>.TryAdd(TKey, TValue) #14676
Comments
I wonder whether other functions from For example if you need something like List<T> values;
if (!dict.TryGetValue(key, out values))
{
values = new List<T>();
dict.Add(key, values);
}
// use values here Or you could simplify this and also potentially make it more efficient into: var values = dict.GetOrAdd(key, _ => new List<T>()); I'm focusing on this being simpler code, because it's not clear to me whether it would actually be more efficient Though this specific example might become moot if Regarding |
@stephentoub I suspect a lot of people would appreciate this feature. It's conceptually more intuitive. Do you have a sense of the performance benefit? How many times would you need execute this pattern for it to matter? |
This is better. The entire test and set model most developers use with What about methods like I'd be willing to help code on weekends |
@whoisj It still wouldn't be thread-safe. The whole I think that you're assuming that since it's a single method call, it has to be atomic. But it doesn't have to be. If you want thread-safe dictionary, just use |
@svick touche. Yes "atomic" not "safe". |
I have definitely seen and written a lot of code doing TryAdd. |
For reference, this is what I was imagining in terms of the change:
It depends on the type of the key being used, as the primary costs involved in a lookup are the computation of the hash code and the comparison of the added key against all of the key entries in the same target bucket. For a very simple key, e.g. an Int32, the improvement seems to be around 15%. For a complex key, the improvement approaches 50%, which is the theoretical maximum you'd expect from removing one of the two lookup operations. |
So benefits and no tradeoffs? Why is there even a discussion? |
There's always a cost to new APIs. Ongoing maintenance, testing, documentation, extra code means larger binaries, more surface area means more choice which can lead to customer confusion, etc. etc. |
@stephentoub +1 for this change. As a separate comment, reading the line of your diff stephentoub/coreclr@b3b0ba4#diff-333b4ea3107f8853ce3c9c3d99182a7bR196 Reminds me why I like to specify the argument's name when passing bool literals: we don't know what true means by just reading the line. I haven't seen a DO/DON'T guideline for this, it's not mentioned in the coding style document either. Is there an informal guideline for named args? |
This is very concerning, because the rationale is completely wrong. Like all operations capable of altering a if (!dictionary.Contains(key))
dictionary[key] = value; ¹ The performance benefit comes from only needing to look up the hash bucket once in the case an addition is required. |
I think the
Unlike TValue existing;
if (!dictionary.TryGetValue(key, out existing))
{
dictionary[key] = value;
existing = value;
} Note that while if (!dictionary.ContainsKey(key))
{
dictionary[key] = value;
// Some operation here which should not execute if the dictionary was unchanged
} |
What is TryAddValue?
That's exactly the scenario for which I was proposing it. If you look around the corefx codebase, for example, that pattern shows up a lot, and rarely in such cases does the code need the current value from the collection.
I have no problem with a GetOrAdd method being added to |
👍 for |
I meant Would you be opposed to expanding this proposal to include the other conditional update methods? I've positioned them in what I see as their overall level of versatility. I don't have anything against
|
I don't have a problem with it, but each API should be considered individually, not en mass. The key for me is that TryAdd is a primitive operation (add this value, failing if it already exists); it's exactly like the existing Add, except with a Boolean used for the failure mode instead of an exception, such that Add can easily be built on top of TryAdd (as it is in my example implementation). You can approximate TryAdd as a compound operation using other primitive operations (find + add to avoid the exception), but that's just a workaround. The other operations are all truly compound operations built in terms of the primitives, with most of their names highlighting their compound nature, e.g. pseudo-code: AddOrUpdate: public TValue AddOrUpdate<TArg>(TKey key, Func<TKey, TArg, TValue> addFactory, Func<TKey, TValue, TArg, TValue> updateFactory, TArg factoryArg)
{
TValue value;
return (this[key] = TryGetValue(key, out value) ?
updateFactory(key, value, factoryArg) :
addFactory(key, factoryArg));
} GetOrAdd: public TValue GetOrAdd<TArg>(TKey key, Func<TKey, TArg, TValue> factory, TArg factoryArg)
{
TValue value;
if (!TryGetValue(key, out value))
{
value = factory(key, factoryArg);
Debug.Assert(TryAdd(key, value));
}
return value;
} TryUpdate: public bool TryUpdate(TKey key, TValue newValue, TValue oldValue)
{
TValue existingValue;
if (!TryGetValue(key, out existingValue) || !EquaityComparer<TValue>.Default.Equals(existingValue, oldValue))
return false;
this[key] = newValue;
return true;
} Are these useful? Of course, they implement commonly used patterns. And they should be implementable more efficiently internal to the dictionary rather than on top of it, due to being able to avoid multiple lookups. They're also more complex, take delegates that can easily lead to misuse causing perf problems in the consuming code, etc. I was quick to add TryAdd and TryUpdate to ConcurrentDictionary because for ConcurrentDictionary they're both primitive operations, where the dictionary can ensure they're implemented atomically with regards to other operations on the collection in a way that consuming code can't. And we added GetOrAdd and AddOrUpdate because they are common patterns and the code required to implement them in a concurrency-friendly way is more than what's shown above for Dictionary. My point is simply that I'd be happy to have these helpers as members of Dictionary, but not instead of TryAdd; TryAdd is a fundamental operation for the dictionary.
Sure, TryAdd would not be a substitute when one of these more complicated operations is needed. Just as these more complex operations are not a good substitute when just TryAdd is needed for the (common) pattern called out earlier in this thread. The closest would be GetOrAdd, but it does unnecessary additional work, and if you need to make a decision based on whether the value was added or not, you can't easily do so with a signature like that shown... you'd need to complicate the GetOrAdd API to also return information about which path was taken, at an added expense. |
I did correct my mis-statement, it is not thread-safe but it is, however, atomic. If the implementation of That said, this seems like a no brainier for addition to the library. I agree with @svick that there are likely other complex-atomic entry points in to the set that would bring value as well. I also agree with @stephentoub that this pull request should remained focused on delivering |
@whoisj atomic is used to refer to the behavior of an operation with respect to other threads. |
As for the primitives, I feel it's unfortunate that TValue value;
if (cache.TryGetValue(key, out value))
{
cache.Add(key, GetNewValue(key));
} And while I could use the concurrent for the primitive, for a single threaded job it's overkill to use that collection. |
GetOrAdd should definitely be added to IDictionary and implemented across the collections. I find GetOrAdd to be so valuable I'll use the ConcurrentDictionary just to have it. |
The concurrent dictionary operations that take a |
|
That though would be a breaking change in terms of every single implementation of the interface outside of corefx. On a related note, since Dictionary is in mscorlib and used there a lot, should this be under coreclr? |
In cognitive terms, there could be a bigger change to add some of the methods that ConcurrentDictionary has that Dictionary currently does not, than to add them all. It's easy to get "Dictionary now has all the methods ConcurrentDictionary has". In terms of implementation, https://github.com/hackcraft/coreclr/commit/1a982053ae39ee0e418d370fdee3497ecefd0e5a takes a stab at that. |
A little late, but +1 for |
We did a quick review on this. The only open question is whether that's the only type of API that is essentially borrowed from However, we should probably add similar APIs to Otherwise, no concerns from my side. |
@terrajobst does this mean you can tag this as "feature approved"? |
Is there a formal specification of what is being proposed? I might grab this if I have time but it seems unclear from the discussion exactly which out of |
@safern can you please talk to @terrajobst to get explicit what the API board actually approved. |
@danmosemsft yes, I'll talk to @terrajobst and will post the API approved so that @jamesqo can grab this! |
Then we should open an issue to use it. The crude regex
|
I think I found like 50+ last year when I grepped the repo for |
I talked to @terrajobst and the approved API was: public bool TryAdd(TKey key, TValue value) If we would like /cc @jamesqo @danmosemsft |
Anyone interested in taking this one, so we cna get it in for 2.0? Seems like it shouldn't take too long. |
@sepidehms as you have a few cycles, do you want to code this up, since IMO it would be nice to squeeze this into 2.0? |
@danmosemsft I'm taking care of it right now. |
I'm not seeing |
This will probably be available in .NET Standard vNext. |
@ViktorHofer vNext = 2.2 or 3.0? |
Not decided yet. |
@IanKemp follow |
I recently had a use case for a TryAdd where I needed to update an existing value in the dictionary in case the key already existed. Thought I'd share the solution since the proposals here don't cover this:
The TryAdd method:
The changes to the Insert method (copied from Dictionary.cs, mostly):
|
BTW, I recommend that you open a new issue if you want to discuss, typically posts on closed issues are overlooked. |
That's a nice trick with the ref returns Dan! But it has an issue, how do you distinguish between a missing key and one that is in the DictionarySlim but has a default value? Look at this test:
Also where are the indexers? |
@kdg3737 as you point out, There are no indexers because they are purely convenience members and this is "slim". There is value in keeping a generic type small because its code is generated separately for every differently sized type that it is instantiated over. For The type is in corefxlab because the API is not final. If you have feedback on the API, please do open an issue there. You may want to look at the discussion in the original PR first: dotnet/corefxlab#2458 If the type proves to be widely useful, and we are comfortable with the API, it is may be a candidate for moving into the product. |
That makes sense for your 'slim' Dictionary, but the topic here is about the regular Dictionary which as-still-is requires a double lookup for a contains-add. The smallest devices we run .net/mono on have 4GB ram these days so it's unlikely we'd ever need or want this 'slim' variant, even with hundreds of different jitted Dictionary types. I'm way more interested in saving CPU cycles (and development time, I do use dictionaries pretty much everywhere) than saving a couple of kilobytes volatile memory which I have in abundance anyway and are consuming power no matter if I use them or not. Functionality beats memory-efficiency, no indexers and the issue I mentioned above are a no-go for me and I expect many developers would agree. Of course it is nice to see MS care about memory usage and to have this DictionarySlim option. This actually reminds me of another Dictionary I once wrote for large string-keys that would be hashed before putting them in the Dictionary to save on memory (losing IEnumerable in the process). Anyways, I feel I've said what I wanted to, thanks for this conversation and good luck with your work on corefxlab. |
@kdg3737 I see, I was confused because your example was Because of the "code bloat" mentioned in my comment above, we're careful about adding more code into Dictionary, but we can talk about it. |
It's very common to see code that does this:
This forces the dictionary to lookup the key twice when the key isn't yet in the dictionary.
It'd be great if we had an additional method on the type:
Functionally it would behave exactly as if it were implemented like:
but the implementation internally would be done more efficiently, essentially implemented by calling a variant of Insert but that returns false instead of throwing when a duplicate key is found.
The text was updated successfully, but these errors were encountered: