Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: LinkedDictionary #64829

Open
alexzaitzev opened this issue Feb 4, 2022 · 20 comments
Open

[API Proposal]: LinkedDictionary #64829

alexzaitzev opened this issue Feb 4, 2022 · 20 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Milestone

Comments

@alexzaitzev
Copy link

alexzaitzev commented Feb 4, 2022

Background and motivation

This data container is an analogue of LinkedHashMap implementation in Java, which is missing in C#. In the latest release was finally added PriorityQueue which was in Java since 2004. Bridging the gap between APIs of C# and Java will make possible quickly migrate the code from one language to another.

Dictionary with an opportunity to traversal the items in the addition order can be super useful in such scenarios as cache with LRU replacement policy or another cases where we want to maintain insertion order.

This is not a generic OrderedDictionary, because we don't want to alter the order, just traversal in the addition order and be able to read and delete the last/first items.

API Proposal

namespace System.Collections.Generic
{
    public class LinkedDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary where TKey : notnull
    {
        public LinkedDictionary();
        public LinkedDictionary(int capacity);
        public LinkedDictionary(IEqualityComparer<TKey> comparer);
        public LinkedDictionary(int capacity, IEqualityComparer<TKey> comparer);
        public LinkedDictionary(IDictionary<TKey, TValue> dictionary);
        public LinkedDictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer);
        public LinkedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection);
        public LinkedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey> comparer);
        public TValue this[TKey key] { get; set; }
        public object this[object key] { get; set; }
        public ICollection<TKey> Keys { get; }
        public ICollection<TValue> Values { get; }
        public int Count { get; }
        public bool IsReadOnly { get; }
        public bool IsFixedSize { get; }
        public bool IsSynchronized { get; }
        public object SyncRoot { get; }
        ICollection IDictionary.Keys { get; }
        ICollection IDictionary.Values { get; }
        public void Add(TKey key, TValue value);
        public void Add(KeyValuePair<TKey, TValue> item);
        public void Add(object key, object value);
        public void Clear();
        public bool Contains(KeyValuePair<TKey, TValue> item);
        public bool Contains(object key);
        public bool ContainsKey(TKey key);
        public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex);
        public void CopyTo(Array array, int index);
        public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator();
        public void Remove(TKey key);
        public bool Remove(KeyValuePair<TKey, TValue> item);
        public void Remove(object key);
        public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value);
        IEnumerator IEnumerable.GetEnumerator();
        IDictionaryEnumerator IDictionary.GetEnumerator();
        bool IDictionary<TKey, TValue>.Remove(TKey key);

        // New APIs 
        public KeyValuePair<TKey, TValue>GetLast(); // O(1)
        public KeyValuePair<TKey, TValue>GetFirst(); // O(1)
        public bool RemoveLast(); // O(1)
        public bool RemoveFirst(); // O(1)
    }
}

API Usage

var ld = new LinkedDictionary<int, string>();
ld.Add(3, "three");
ld.Add(2, "two");
ld.Add(1, "one");

Console.WriteLine(ld.GetFirst().Key);
Console.WriteLine(ld.GetLast().Key);

foreach (var item in ld)
    Console.WriteLine(item.Key);

ld.RemoveFirst();
ld.RenoveLast();

Console.WriteLine(ld.GetFirst().Key);
Console.WriteLine(ld.GetLast().Key);

/* This code example produces the following output:
3
1
3
2
1
2
2
*/

Alternative Designs

No response

Risks

No risks to the existing API is expecting. Following SemVer it's a minor API update, because it's an addition of the new API and not changing/deleting existing ones.

@alexzaitzev alexzaitzev added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 4, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Feb 4, 2022
@ghost
Copy link

ghost commented Feb 4, 2022

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

This data container is an analogue of LinkedHashMap implementation in Java, which is missing in C#. In the latest release was finally added PriorityQueue which was in Java since 2004. Bridging the gap between APIs of C# and Java will make possible quickly migrate the code from one language to another.

Dictionary with an opportunity to traversal the items in the addition order can be super useful in such scenarios as cache with LRU replacement policy or another cases where we want to maintain insertion order.

This is not a generic OrderedDictionary, because we don't want to alter the order, just traversal in the addition order and be able to read and delete the last/first items.

API Proposal

namespace System.Collections.Generic
{
    public class LinkedDictionary<TKey, TValue> : IDictionary<TKey, TValue>, IDictionary where TKey : notnull
    {
        public LinkedDictionary();
        public LinkedDictionary(int capacity);
        public LinkedDictionary(IEqualityComparer<TKey> comparer);
        public LinkedDictionary(int capacity, IEqualityComparer<TKey> comparer);
        public LinkedDictionary(IDictionary<TKey, TValue> dictionary);
        public LinkedDictionary(IDictionary<TKey, TValue> dictionary, IEqualityComparer<TKey> comparer);
        public LinkedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection);
        public LinkedDictionary(IEnumerable<KeyValuePair<TKey, TValue>> collection, IEqualityComparer<TKey> comparer);
        public TValue this[TKey key] { get; set; }
        public object this[object key] { get; set; }
        public ICollection<TKey> Keys { get; }
        public ICollection<TValue> Values { get; }
        public int Count { get; }
        public bool IsReadOnly { get; }
        public bool IsFixedSize { get; }
        public bool IsSynchronized { get; }
        public object SyncRoot { get; }
        ICollection IDictionary.Keys { get; }
        ICollection IDictionary.Values { get; }
        public void Add(TKey key, TValue value);
        public void Add(KeyValuePair<TKey, TValue> item);
        public void Add(object key, object value);
        public void Clear();
        public bool Contains(KeyValuePair<TKey, TValue> item);
        public bool Contains(object key);
        public bool ContainsKey(TKey key);
        public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex);
        public void CopyTo(Array array, int index);
        public TValue GetEldest();
        public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator();
        public void Remove(TKey key);
        public bool Remove(KeyValuePair<TKey, TValue> item);
        public void Remove(object key);
        public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value);
        IEnumerator IEnumerable.GetEnumerator();
        IDictionaryEnumerator IDictionary.GetEnumerator();
        bool IDictionary<TKey, TValue>.Remove(TKey key);

        // New APIs 
        public KeyValuePair<TKey, TValue>GetLast(); // O(1)
        public KeyValuePair<TKey, TValue>GetFirst(); // O(1)
        public bool RemoveLast(); // O(1)
        public bool RemoveFirst(); // O(1)
    }
}

API Usage

var ld = new LinkedDictionary<int, string>();
ld.Add(3, "three");
ld.Add(2, "two");
ld.Add(1, "one");

Console.WriteLine(ld.GetFirst().Key);
Console.WriteLine(ld.GetLast().Key);

foreach (var item in ld)
    Console.WriteLine(item.Key);

ld.RemoveFirst();
ld.RenoveLast();

Console.WriteLine(ld.GetFirst().Key);
Console.WriteLine(ld.GetLast().Key);

/* This code example produces the following output:
3
1
3
2
1
2
2
*/

Alternative Designs

No response

Risks

No risks to the existing API is expecting. Following SemVer it's a minor API update, because it's an addition of the new API and not changing/deleting existing ones.

Author: alexzaitzev
Assignees: -
Labels:

api-suggestion, area-System.Collections, untriaged

Milestone: -

@theodorzoulias
Copy link
Contributor

Your suggestion reminds me of a class named OrderedDictionary<TKey, TValue> in the OndrejPetrzilka/Rock.Collections repository. That class doesn't have GetFirst/GetLast/RemoveFirst/RemoveLast methods, but it does include these:

public bool MoveFirst(TKey key);
public bool MoveLast(TKey key);
public bool MoveBefore(TKey keyToMove, TKey mark);
public bool MoveAfter(TKey keyToMove, TKey mark);
public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator(TKey startingElement);
public IEnumerator<KeyValuePair<TKey, TValue>> GetReverseEnumerator();
public IEnumerator<KeyValuePair<TKey, TValue>> GetReverseEnumerator(TKey startingElement);

Do you think that these APIs go beyond the scope of your suggestion?

@alexzaitzev
Copy link
Author

Yes, the idea was to leave the original experience of using dictionary, but add defined enumeration order (insertion order) with minimal safe API (by analogy with HTTP safe methods: which are not alter the state). Remove first or last item of course are not safe but I consider them as kind of safe, because they do no change insertion order.

Another point is to add this as a part of a framework, so noone needs to reinvent the wheel (as I did when I needed such data structure). I'd be happy if .NET has both LinkedDictionary<TKey, TValue> and OrderedDictionary<TKey, TValue> out of the box without the need to use external libs.

Last but not least is a consistent with Java API (which is obviously richer) to make ramp-up to C# faster and smoother for Java guys/projects.

But I am happy to discuss any critique/suggestions about proposed API.

@theodorzoulias
Copy link
Contributor

@danmoseley personally I've never encountered a situation where accessing the elements of a dictionary by index, like the experimental OrderedDictionary<K,V> does, could be useful. On the other hand in many occasions I have appreciated that the normal Dictionary<K,V> preserves the insertion order, as long as there is no Remove intercepted between Adds. Having a dictionary type that publicly guarantees the preservation of insertion order, instead of being an implementation detail, would be nice. Although to be honest if this guarantee came with a performance price, I might still prefer to use a normal dictionary instead of the suggested LinkedDictionary<K,V>, and rely on its undocumented behavior.

@alexzaitzev
Copy link
Author

@theodorzoulias LinkedDictionary may be implemented in a way that doesn't hit performance, just a small memory overhead for storing links.

@theodorzoulias
Copy link
Contributor

@alexzaitzev from the home page of Rock.Collections:

Performance overhead: Add/Remove 15-20% slower

That dictionary has double-linked entries. Even with forward-only links, a ~10% slowdown should be expected.

@alexzaitzev
Copy link
Author

@theodorzoulias it was mentioned for OrderedHashSet and it also says that it may be relevant for OrderedDictionary, so it would be really interesting to measure, because AFAIU he measured it in the simplest and the most inaccurate way OrderedHashSetPerformanceTest.cs

@svick
Copy link
Contributor

svick commented Feb 7, 2022

For the record, the discussion about generic OrderedDictionary is at #24826.

@theodorzoulias
Copy link
Contributor

@svick that generic OrderedDictionary suggestion has a Remove method with O(N) complexity, because of the requirement of keeping track of indexes. The suggestion of this issue (LinkedDictionary) is about a non-index-aware dictionary, with O(1) Remove. It's easy to envision a usage scenario where the LinkedDictionary would be a good fit, and the OrderedDictionary would be completely inappropriate.

@eiriktsarpalis
Copy link
Member

I recently had the need to implement an LRU cache and might have benefited from one of these. I think the issue with the cited .NET implementations in this discussion (inlcuding the non-generic OrderedDictionary) is that they don't implement access order, and therefore can't be used as LRU caches on their own (at least not without wrapping the dictionary operations with your own logic, somewhat defeating the benefits of using a specialized class).

That being said, I don't believe there is a strong need to ship a inbox solution for a few reasons:

  1. It is fairly straightforward to roll your own class using some Dictionary/LinkedList combination. There are tons of examples on how to do this online.
  2. Every caching application has its own requirements depending on its access patterns:
    • Does it need to be thread-safe? Most caches I'm aware of do.
    • Are the added allocations incurred by linked lists acceptable? Should order be represented using a List<T> instead (and therefore incur an O(n) performance penalty on every access?)
    • Does the LRU cache need to be perfect? Can we minimize the amount of work required by making a good guess on what the least recently used entry is? A "LinkedDictionary" implies serialized access order which can be expensive to update in concurrent applications.

@alexzaitzev
Copy link
Author

@eiriktsarpalis thanks for your input, but we shouldn't be tied to just LRU cache example. Of course LinkedDictionary may be used for building it, but more important thing is to guarantee an insertion order, which may be beneficial in many more cases in addition to LRU cache. I believe LinkedHashMap was added exactly for that, so I don't see any cons in adding this class as Java did.

About thread safety: ConcurrentLinkedDictionary class can be added if needed, but I'd start with a not thread safe.

About using a List instead of LinkedList: we are not discussing implementation details right now, but I'd be happy to discuss them if we decide to add LinkedDictionary to the library. As for now I'd say that List has too big performance overhead, but I can implement both and compare.

@eiriktsarpalis
Copy link
Member

thanks for your input, but we shouldn't be tied to just LRU cache example. Of course LinkedDictionary may be used for building it, but more important thing is to guarantee an insertion order, which may be beneficial in many more cases in addition to LRU cache.

Could you provide other use cases? Adding new collections is expensive, so it's unlikely we would add one for the sake of achieving parity with the Java APIs. We need to see data on use cases and there also needs to be substantial demand for the feature.

About using a List instead of LinkedList: we are not discussing implementation details right now

My point is that either choice comes with its own set of trade-offs that are important depending on the use case. I don't believe there can be a one-size-fits-all solution here. I could be wrong of course, if we do come up with one design that demonstrably suits 80% of use cases for .NET codebases then we might consider it for inclusion.

@alexzaitzev
Copy link
Author

Could you provide other use cases? Adding new collections is expensive, so it's unlikely we would add one for the sake of achieving parity with the Java APIs. We need to see data on use cases and there also needs to be substantial demand for the feature.

Sure. Actually all use case where you need to maintain an order. For example, key-value pair often uses for storing configs, so having insertion order during config population/manipulation super handy. For printing let's say.

Also it may be used in system development as a FIFO with a key access.

Moreover I've seen some questions about it on SOF, so looks like people have some more examples.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Feb 10, 2022
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Feb 10, 2022
@eiriktsarpalis eiriktsarpalis added the wishlist Issue we would like to prioritize, but we can't commit we will get to it yet label Feb 10, 2022
@theodorzoulias
Copy link
Contributor

I think the issue with the cited .NET implementations in this discussion (including the non-generic OrderedDictionary) is that they don't implement access order, and therefore can't be used as LRU caches on their own (at least not without wrapping the dictionary operations with your own logic, somewhat defeating the benefits of using a specialized class).

@eiriktsarpalis such a dictionary would have a behavior deviating from most, if not all, collections in the standard libraries, that allow multiple concurrent readers without synchronization. For example, from the documentation of the Dictionary<K,V>:

A Dictionary<TKey,TValue> can support multiple readers concurrently, as long as the collection is not modified.

A dictionary that maintains access order would be modified by just reading it. This might be an argument for not introducing a collection with this behavior in the standard .NET libraries.

@eiriktsarpalis
Copy link
Member

@theodorzoulias agreed, which is why I don't think we have a very compelling use case to incorporate such a type in System.Collections.

@alexzaitzev
Copy link
Author

@theodorzoulias actually I proposed to maintain an insertion order - not an access one. Java's LinkedHashMap can switch between them, but in API I've posted there is no such an opportunity. In this case it's ok to put it in System.Collection, otherwise I agree that we need to put it in another place System.Collections.Specialized, I suppose.

@eiriktsarpalis what do you think?

@eiriktsarpalis
Copy link
Member

Like I said I don't believe we have compelling use cases to include this in System.Collections. I think this might be a better fit for a NuGet package.

@GSPP
Copy link

GSPP commented Feb 20, 2022

I have worked with a team that needed this in hundreds of places. What they actually did was to use normal Dictionary, unaware that it doesn't preserve order. It turns out, if you just add and have less than a certain number of items (~1000?) then Dictionary preserves order in practice. They were unaware of all of this. This is of course not an ideal strategy.

Accessing by index wasn't needed, but order of enumeration was needed. Ordering wasn't done by any kind of key but by insertion order. This proposal would fit nicely.

@alexrp
Copy link
Contributor

alexrp commented Jul 13, 2024

Where does this stand with #24826 done?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Projects
None yet
Development

No branches or pull requests

7 participants