Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a pairing heap to System.Collections.Specialized #28708

Closed
pgolebiowski opened this issue Feb 17, 2019 · 14 comments
Closed

Add a pairing heap to System.Collections.Specialized #28708

pgolebiowski opened this issue Feb 17, 2019 · 14 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Milestone

Comments

@pgolebiowski
Copy link
Contributor

About

Over four years ago, we started a conversation on GitHub about adding a priority queue to the framework. We didn't reach a consensus.

When we started diving deep, we learned that customer expectations varied so much that we were going back and forth and none solution to the problem appeared to be good enough to end up in the framework. I think we've been tackling a problem that was too big. Instead of focusing on providing "a priority queue" to the .General namespace, let's focus on providing "priority queue capabilities" to the .Specialized namespace — a small and smart data structure that could be used to satisfy various customer expectations.

Proposal

I believe that this would provide the right balance:

public class PairingHeap<TKey, TValue> : IEnumerable<PairingHeapNode<TKey, TValue>>
{
    public PairingHeap(IComparer<TKey> comparer);

    public PairingHeapNode<TKey, TValue> Root { get; }
    public int Count { get; }

    public PairingHeapNode<TKey, TValue> Add(TKey key, TValue value);
    public void Update(PairingHeapNode<TKey, TValue> node, TKey key);
    public void Remove(PairingHeapNode<TKey, TValue> node);
    public void Merge(PairingHeap<TKey, TValue> otherHeap);
    
    public IEnumerator<PairingHeapNode<TKey, TValue>> GetEnumerator();
}

public class PairingHeapNode<TKey, TValue>
{
    public TKey Key { get; }
    public TValue Value { get; }
}

Important

I think that for every reasonable X in "but why not X instead of PairingHeap<TKey, TValue>", X could be implemented with PairingHeap<TKey, TValue>.

Before you comment

  1. Read this thread — so that you have the context and we don't repeat the same points over and over.
  2. At least skim through this paper.
@Wraith2
Copy link
Contributor

Wraith2 commented Feb 18, 2019

I think you missed the definition for IHeapNode

@TimLovellSmith
Copy link
Contributor

@pgolebiowski

How are Update/Remove used? Do you have to remember 'node' from an earlier add or enumerate the nodes to find the one to update? Is it really meant to take a 'key' and not a 'value' parameter?

public void Update(PairingHeapNode<TKey, TValue> node, TKey key);

@pgolebiowski
Copy link
Contributor Author

pgolebiowski commented Feb 19, 2019

@Wraith2 Right, should be PairingHeapNode, thanks!

@TimLovellSmith Yes, you have to explicitly say which node you wish to modify. This is the only way to get O(log n) and to support a scenario in which there are duplicate keys with different values. When you start enumerating to find an element, you go to O(n), plus you cannot support duplicate keys (in case you would want to search by the key).

As for why it takes a key — because a heap uses keys for ordering.

@TimLovellSmith
Copy link
Contributor

@pgolebiowski
Ah, I see what I was misunderstanding... the key is the priority? I had been thinking of the value as the priority. That explains why you also need duplicate keys. It is the opposite arrangement of key and value when I was imagining a PriorityDictionary structure.

I think different choices of what to call a key have these different background stories:

POV a) 'key = priority' - priority is the most important thing, since its what the data structure works by - so that should be the key. Keys are not used to 'look up' anything however. Matches usage of word 'key' in literature discussing heaps etc.

POV b) 'key = data' - priority is just an additional datum (value), secondary to the real data you care about processing in prioritized order. Keys are also how you 'find' elements in the data structure if you want to implement an update. Matches usage of word 'key' in dictionary keys - if you think of priority queue as an associative map from keys to their priorities. Maybe only I think that way.

By the way I think you're not really correct in saying "This is the only way to get O(log n) and to support a scenario in which there are duplicate keys with different values." and what you're meaning is its the only way in which you can have multiple objects with same associated priorities.

I think you're overlooking ways to achieve O(log n) in the other world when you claim that otherwise 'its O(n)'. Are you really overlooking, or just dismissing them as complex or expensive? It is actually easy and O(1), though unfortunately not completely free. E.g. a supplementary dictionary of keys to heap nodes.

It could be redundant however if the keys (data objects, not priorities, so you might call them values) were themselves heap nodes... i.e. an 'intrusive' data structure. Would that be where you were thinking of going with IHeapNode?

@pgolebiowski
Copy link
Contributor Author

pgolebiowski commented Feb 19, 2019

Key is a computer science term used when referring to maintaining order in heaps. It's used even in the first paragraphs on Wikipedia when describing heaps. And you cannot* rearrange a heap (update/remove in n-ary or pairing) in O(1), if that's what you're implying.

*Imagine you have structs and duplicate everything. Plus doesn't feel elegant. Probably a better API for update and remove exists, but I don't have an idea yet.

@TimLovellSmith
Copy link
Contributor

TimLovellSmith commented Feb 19, 2019

No there is no need to rearrange the heap. I adopt your key terminology here, and introduce a new term 'Id'.

Suppose each element or 'value' in the heap has a unique Id. Then you can have a Dictionary<TId, PairingHeapNode> 'entries' inside the implementation of PairingHeap, and every time a value is added to the heap, you just add an entry: entries[item.Id] = newNode;

Then it is easily possible to implement bool Remove(TId id); and 'void Update(TId id, TKey key) that would work in the same O(log n) time as void Remove(PairingHeapNode node) and `void Update(PairingHeapNode node, TKey key)'.

The advantage of this kind of API is that it works in terms of whatever keys are already part of the application domain, instead of requiring the application designer to think in terms of PairingHeapNodes. It potentially saves the application designer from having to maintain such a dictionary themself. Although admittedly that is not such hard work. It can just be in a wrapper class.

The disadvantage was that it requires maintaining an 'entries' dictionary inside the PairingHeap.

So that concludes the explanation...

Actually the disadvantage is big enough that it can outweigh the usability advantages for performance-conscious use cases - especially since not everyone needs to use Update and Remove. So I think I am coming around to your proposal.

@TimLovellSmith
Copy link
Contributor

I have a feeling it might be easier to implement the 'Merge' API if it returns a PairingHeapNode<> instead of void. As the implemention would change one node/tree to be child of the other, but its not obvious in advance which.

@TimLovellSmith
Copy link
Contributor

TimLovellSmith commented Feb 19, 2019

I think there should also be a specific DeleteMin oriented API call because it can be implemented more optimally. e.g. public PairingHeapNode<TKey, TValue> node RemoveMin();..

For an empty heap, should RemoveMin throw or return null? I am thinking 'throw'. I didn't see an obvious precedent to base that decision on, but I'm thinking the API looks more similar in spirit to 'First()', than ''FirstOrDefault'().

There's an additional fringe benefit - the name 'RemoveMin' happens to make the sense of the heap (min heap or max heap?) explicit in the API. You could rename 'Root' to 'Min' or 'MinElement' for similar effect...

@TimLovellSmith
Copy link
Contributor

PS I came up with one doubt about using the paper to inform the selection of which heap implementation to use. the paper references, I have the idea they used C to implement their heap... so their evaluations probably don't let us predict impact on GC.

@Wraith2
Copy link
Contributor

Wraith2 commented Feb 20, 2019

Having watched several api reviews a question that comes up a lot is how the api is intended to be used, What patterns and sequences of calls you'd use. Any chance of some example usage to show the "right" way to use this?

@safern
Copy link
Member

safern commented May 23, 2019

@pgolebiowski thanks for your proposal. I'm with @Wraith2 in order to have smooth api review, would you mind adding some example usage for this API?

Also, regarding to PairingHeapNode<TKey, TValue> -- I know the intent of adding a new class for this, to tight it with its consumer class, but why not use KeyValuePair<TKey, TValue> which is already part of the framework? It has the same shape as PairingHeapNode<TKey, TValue>.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@Jlalond
Copy link
Contributor

Jlalond commented Apr 3, 2020

@pgolebiowski is there any movement on this as it's been roughly a year. I would be very interested in this being a thing.

@safern
Copy link
Member

safern commented Apr 3, 2020

cc: @layomia @eiriktsarpalis

@eiriktsarpalis eiriktsarpalis modified the milestones: 5.0.0, Future Jun 24, 2020
@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2020
@eiriktsarpalis
Copy link
Member

Duplicate of #44871

@eiriktsarpalis eiriktsarpalis marked this as a duplicate of #44871 Oct 26, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 25, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections
Projects
None yet
Development

No branches or pull requests

8 participants