Proposal: Provide specialized collections aka Power Collections #26870

ViktorHofer · 2018-07-19T15:35:51Z

Proposal

We are hitting limitations with our in-built collections all over the places, either performance based (i.e. allocation) or feature based). After talking with @jkotas offline we agreed that it would be ideal if we would provide commonly used specialized collections (like a Dictionary<string, string> or a PriorityQueue) as a source based nuget package. That would allow us to provide basic enhanced functionality over the general purpose inbox types but still the ability to tweak them on demands.

Considerations:

What kind of collection types would benefit from specialization?
Where do we see gaps to our competition (again, performance or feature wise)?
Where would such a package be hosted and how would it be deployed? (corefx vs community corefx repo)
Is a source based nuget package feasible and are we easily able to use it inside corefx?

Example

Currently working in Regex to avoid allocation I stumbled upon places where Unicode categories are statically stored in a Dictionary<string, string>. Trying to get the value out of it involves allocation because there is currently no way to pass a ReadOnlySpan as a key in to get the value. Providing a specialized version would allow us to add an additional overload for ReadOnlySpan.

References

Dictionary/List ref api additions: dotnet/coreclr#17405 (comment)
PriorityQueue: https://github.com/dotnet/corefx/issues/574#issuecomment-308491122 & dotnet/corefxlab#1850
OrderedDictionary<,>: https://github.com/dotnet/corefx/issues/26634

cc @jkotas @karelz @benaadams @danmosemsft

danmoseley · 2018-07-19T16:37:57Z

Perhaps we could have a Dictionary implementation specialized for keys of type int -- that would be used by k-nucleotide (given they accept use of Long2IntOpenHashMap by Java 😕 ).

Why do you propose a source based package?

@stephentoub

danmoseley · 2018-07-19T16:39:41Z

cc @AnthonyLloyd for that.

prajaybasu · 2018-07-19T17:07:31Z

I hope adding OrderedDictionary<TKey, TValue> is also being considered. (Related: #24826)

ViktorHofer · 2018-07-19T17:19:21Z

OrderedDictionary<TKey, TValue> would also be nice. (Related: #24826)

Thanks I added it to the links.

Why do you propose a source based package?

One of the requirements for such a package would be that we can change it frequently for our inbox stuff (i.e. adding new apis that accept types that we just added in master) but still have a stable version that ideally targets netstandard. We did the same with Span by cross-compiling for netstandard and netcoreapp. That is definitely an approach we can think about but during my discussion with @jkotas he pointed out that source based nuget packages are more versatile as consumer will always want to fine-tune their collections which is easily possible with that approach.

AnthonyLloyd · 2018-07-19T18:25:50Z

Possibly:

A counting Dictionary<K,int> (or even Dictionary<int,int>) with Increment(K key) and Decrement(K key). For reference counting etc.
An aggregating Dictionary<K,V,R>(Func<V,R> seed, Func<R,V,R> accum) with Add(K v, V v).

Possibly 2) could be used in Enumerable.GroupBy.

benaadams · 2018-07-19T19:53:40Z

A fast Type Dictionary<Type, ...> e.g. https://github.com/benaadams/Ben.TypeDictionary

For DI etc

AnthonyLloyd · 2018-07-20T08:20:15Z

@benaadams I always assumed the following through some magic had good perf

static class TypeHolder<T>
{
    public static string Value;
}

TypeHolder<int>.Value = "int";
TypeHolder<string>.Value = "string";

benaadams · 2018-07-20T10:19:06Z

@AnthonyLloyd alas the generic is a process wide singleton (can be only one); and doesn't work well with GC

benaadams · 2018-07-20T11:24:42Z

Where would such a package be hosted and how would it be deployed?

corefxlab, myget; graduate to corefx, nuget

Being in corefx doesn't mean it can't be a nuget only package

community corefx repo

Only if MS doesn't want to support it going forward and/or wants to hand commit/management rights over to non-MS. Don't think that necessarily helps the .NET OSS community as now you have an officially sanctioned OSS section; where does that leave regular repos?

shaggygi · 2018-07-22T12:26:56Z

I still think it would be beneficial to have Immutable Parent/Child Collection, but understand it is not specifically related to this thread.

@ViktorHofer, @jkotas to expand on what I'm looking for, it appears @sharwell was working on something similar here... https://github.com/tunnelvisionlabs/dotnet-trees

Few links:
https://stackoverflow.com/questions/9304404/how-to-create-a-completely-immutable-tree-hierarchy-construction-chicken-and-eg
http://concurrencyfreaks.blogspot.com/2013/10/immutable-data-structures-are-not-as.html

AndyAyersMS · 2018-07-22T21:21:51Z

For fast dictionaries consider different hash -> bucket mapping strategies. The current dictionary's modulo a prime is slow (about 1/3 the cycle cost for a successful TKey=int lookup these days).

Custom vs/default comparators should be supported via subtyping or parameterization and not by options passed to the constructor, so the jit doesn't need heroics to inline comparators. If via subyping, the fast dictionary classes should be sealed.

For types where hash recomputation is cheap (ints), avoid storing the hash.

.Net settled long ago on the parallel array setup but different approaches may have appeal, eg a chunked array setup that is able to avoid LOH and should be cheaper to grow. Maybe also consider approaches that allow some mutate on read so if there are collisions frequently accessed elements can be made cheaper to find by migrating them earlier in the collision chains.

ViktorHofer · 2018-07-23T17:43:00Z

Thanks Andy, whoever is going to implement a fast dictionary will benefit a lot from your info.

I propose we talk about this in the Design/API review meeting. If we agree to provide these collections either inbox or as a nuget package I believe we could implement them (together with community members who are interested) fairly quick.

Zhentar · 2018-07-24T16:43:00Z

There are two specialized hash maps I'd like to see.

Fast ulong keyed map. Basically, applying all the optimizations Andy mentioned. My use case is string atomization while parsing Utf8 streams; when I've got a token I need to stringify, I hash the raw bytes with a high quality 64-bit hash function so that the risk of collision is low enough (even accounting for the birthday paradox) that I can treat hash equality as value equality. I'd also prefer an open addressing hash table algorithm that only takes one cache miss on random lookups ( unfortunately this last part is incompatible with a fast K-Nuc dictionary mentioned in Dictionary CQ coreclr#15419 because K-Nuc benefits tremendously from coalesced chaining preserving insertion order locality)
string keyed map which assumes reference equality for equal values. As long as my strings are atomized it would be nice to get an extra bit of efficiency out of dictionaries using them as keys.

danmoseley · 2018-07-24T23:32:05Z

Last time we discussed this it got wedged on the discussion of whether there should be a new repo. @joshfree @weshaggard is corefxlab a suitable place to put experimental code that has some infrastructure (CI) and the ability to package for nuget and might graduate to corefx later?
@karelz

ViktorHofer · 2018-07-24T23:48:56Z

If we ship this as a normal nuget package (not source based) this should live in corefx as we want to use it internally in master. Also as a buid from source requirement I believe we must live build netcoreapp asset.

joshfree · 2018-07-25T00:01:02Z

@danmosemsft yes corefxlab is the appropriate place for something like this to begin while we’re unsure if it should be promoted to corefx or not.

mikedn · 2018-07-25T17:18:28Z

string keyed map which assumes reference equality for equal values

How's that different from a normal dictionary using an appropriate key comparer? (apart from the cost of interface calls that the normal dictionary incurs).

Zhentar · 2018-07-25T17:27:37Z

(apart from the cost of interface calls that the normal dictionary incurs).

In a sufficiently tight loop, that's enough :) (combined with the assumption that it's worthwhile to store both the key and the hash code and check for the equality of both)

danmoseley · 2018-07-30T16:51:39Z

Here's what we are going to do. We do want such a package for extended collections. I will move this issue to corefxlab. We will create a package with a name something like System.Collections.Extensions (or System.Collections.Experimental) and put types in a temporary namespace (eg System.Collections.Extensions.Experimental) so we have the option of moving some into the BCL if they prove worthy. This will publish to myget to start.

The goal of this is to develop useful collections, but it would be nice if the kNucleotide maintainer would allow it also.

danmoseley · 2018-07-30T16:51:55Z

Issue moved to dotnet/corefxlab dotnet/runtime#14870 via ZenHub

danmoseley closed this as completed Jul 30, 2018

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 3.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Provide specialized collections aka Power Collections #26870

Proposal: Provide specialized collections aka Power Collections #26870

ViktorHofer commented Jul 19, 2018

danmoseley commented Jul 19, 2018 •

edited

Loading

danmoseley commented Jul 19, 2018

prajaybasu commented Jul 19, 2018 •

edited

Loading

ViktorHofer commented Jul 19, 2018

AnthonyLloyd commented Jul 19, 2018 •

edited

Loading

benaadams commented Jul 19, 2018 •

edited

Loading

AnthonyLloyd commented Jul 20, 2018

benaadams commented Jul 20, 2018

benaadams commented Jul 20, 2018 •

edited

Loading

shaggygi commented Jul 22, 2018 •

edited

Loading

AndyAyersMS commented Jul 22, 2018

ViktorHofer commented Jul 23, 2018

Zhentar commented Jul 24, 2018

danmoseley commented Jul 24, 2018

ViktorHofer commented Jul 24, 2018

joshfree commented Jul 25, 2018

mikedn commented Jul 25, 2018

Zhentar commented Jul 25, 2018 •

edited

Loading

danmoseley commented Jul 30, 2018

danmoseley commented Jul 30, 2018

Proposal: Provide specialized collections aka Power Collections #26870

Proposal: Provide specialized collections aka Power Collections #26870

Comments

ViktorHofer commented Jul 19, 2018

Proposal

Example

References

danmoseley commented Jul 19, 2018 • edited Loading

danmoseley commented Jul 19, 2018

prajaybasu commented Jul 19, 2018 • edited Loading

ViktorHofer commented Jul 19, 2018

AnthonyLloyd commented Jul 19, 2018 • edited Loading

benaadams commented Jul 19, 2018 • edited Loading

AnthonyLloyd commented Jul 20, 2018

benaadams commented Jul 20, 2018

benaadams commented Jul 20, 2018 • edited Loading

shaggygi commented Jul 22, 2018 • edited Loading

AndyAyersMS commented Jul 22, 2018

ViktorHofer commented Jul 23, 2018

Zhentar commented Jul 24, 2018

danmoseley commented Jul 24, 2018

ViktorHofer commented Jul 24, 2018

joshfree commented Jul 25, 2018

mikedn commented Jul 25, 2018

Zhentar commented Jul 25, 2018 • edited Loading

danmoseley commented Jul 30, 2018

danmoseley commented Jul 30, 2018

danmoseley commented Jul 19, 2018 •

edited

Loading

prajaybasu commented Jul 19, 2018 •

edited

Loading

AnthonyLloyd commented Jul 19, 2018 •

edited

Loading

benaadams commented Jul 19, 2018 •

edited

Loading

benaadams commented Jul 20, 2018 •

edited

Loading

shaggygi commented Jul 22, 2018 •

edited

Loading

Zhentar commented Jul 25, 2018 •

edited

Loading