Add a pairing heap to System.Collections.Specialized #28708

pgolebiowski · 2019-02-17T14:30:19Z

About

Over four years ago, we started a conversation on GitHub about adding a priority queue to the framework. We didn't reach a consensus.

When we started diving deep, we learned that customer expectations varied so much that we were going back and forth and none solution to the problem appeared to be good enough to end up in the framework. I think we've been tackling a problem that was too big. Instead of focusing on providing "a priority queue" to the .General namespace, let's focus on providing "priority queue capabilities" to the .Specialized namespace — a small and smart data structure that could be used to satisfy various customer expectations.

Proposal

I believe that this would provide the right balance:

public class PairingHeap<TKey, TValue> : IEnumerable<PairingHeapNode<TKey, TValue>>
{
    public PairingHeap(IComparer<TKey> comparer);

    public PairingHeapNode<TKey, TValue> Root { get; }
    public int Count { get; }

    public PairingHeapNode<TKey, TValue> Add(TKey key, TValue value);
    public void Update(PairingHeapNode<TKey, TValue> node, TKey key);
    public void Remove(PairingHeapNode<TKey, TValue> node);
    public void Merge(PairingHeap<TKey, TValue> otherHeap);
    
    public IEnumerator<PairingHeapNode<TKey, TValue>> GetEnumerator();
}

public class PairingHeapNode<TKey, TValue>
{
    public TKey Key { get; }
    public TValue Value { get; }
}

Important

I think that for every reasonable X in "but why not X instead of PairingHeap<TKey, TValue>", X could be implemented with PairingHeap<TKey, TValue>.

Before you comment

Read this thread — so that you have the context and we don't repeat the same points over and over.
At least skim through this paper.

The text was updated successfully, but these errors were encountered:

Wraith2 · 2019-02-18T00:35:49Z

I think you missed the definition for IHeapNode

TimLovellSmith · 2019-02-18T23:20:58Z

@pgolebiowski

How are Update/Remove used? Do you have to remember 'node' from an earlier add or enumerate the nodes to find the one to update? Is it really meant to take a 'key' and not a 'value' parameter?

public void Update(PairingHeapNode<TKey, TValue> node, TKey key);

pgolebiowski · 2019-02-19T01:32:22Z

@Wraith2 Right, should be PairingHeapNode, thanks!

@TimLovellSmith Yes, you have to explicitly say which node you wish to modify. This is the only way to get O(log n) and to support a scenario in which there are duplicate keys with different values. When you start enumerating to find an element, you go to O(n), plus you cannot support duplicate keys (in case you would want to search by the key).

As for why it takes a key — because a heap uses keys for ordering.

TimLovellSmith · 2019-02-19T04:55:51Z

@pgolebiowski
Ah, I see what I was misunderstanding... the key is the priority? I had been thinking of the value as the priority. That explains why you also need duplicate keys. It is the opposite arrangement of key and value when I was imagining a PriorityDictionary structure.

I think different choices of what to call a key have these different background stories:

POV a) 'key = priority' - priority is the most important thing, since its what the data structure works by - so that should be the key. Keys are not used to 'look up' anything however. Matches usage of word 'key' in literature discussing heaps etc.

POV b) 'key = data' - priority is just an additional datum (value), secondary to the real data you care about processing in prioritized order. Keys are also how you 'find' elements in the data structure if you want to implement an update. Matches usage of word 'key' in dictionary keys - if you think of priority queue as an associative map from keys to their priorities. Maybe only I think that way.

By the way I think you're not really correct in saying "This is the only way to get O(log n) and to support a scenario in which there are duplicate keys with different values." and what you're meaning is its the only way in which you can have multiple objects with same associated priorities.

I think you're overlooking ways to achieve O(log n) in the other world when you claim that otherwise 'its O(n)'. Are you really overlooking, or just dismissing them as complex or expensive? It is actually easy and O(1), though unfortunately not completely free. E.g. a supplementary dictionary of keys to heap nodes.

It could be redundant however if the keys (data objects, not priorities, so you might call them values) were themselves heap nodes... i.e. an 'intrusive' data structure. Would that be where you were thinking of going with IHeapNode?

pgolebiowski · 2019-02-19T08:29:58Z

Key is a computer science term used when referring to maintaining order in heaps. It's used even in the first paragraphs on Wikipedia when describing heaps. And you cannot* rearrange a heap (update/remove in n-ary or pairing) in O(1), if that's what you're implying.

*Imagine you have structs and duplicate everything. Plus doesn't feel elegant. Probably a better API for update and remove exists, but I don't have an idea yet.

TimLovellSmith · 2019-02-19T18:18:06Z

No there is no need to rearrange the heap. I adopt your key terminology here, and introduce a new term 'Id'.

Suppose each element or 'value' in the heap has a unique Id. Then you can have a Dictionary<TId, PairingHeapNode> 'entries' inside the implementation of PairingHeap, and every time a value is added to the heap, you just add an entry: entries[item.Id] = newNode;

Then it is easily possible to implement bool Remove(TId id); and 'void Update(TId id, TKey key) that would work in the same O(log n) time as void Remove(PairingHeapNode node) and `void Update(PairingHeapNode node, TKey key)'.

The advantage of this kind of API is that it works in terms of whatever keys are already part of the application domain, instead of requiring the application designer to think in terms of PairingHeapNodes. It potentially saves the application designer from having to maintain such a dictionary themself. Although admittedly that is not such hard work. It can just be in a wrapper class.

The disadvantage was that it requires maintaining an 'entries' dictionary inside the PairingHeap.

So that concludes the explanation...

Actually the disadvantage is big enough that it can outweigh the usability advantages for performance-conscious use cases - especially since not everyone needs to use Update and Remove. So I think I am coming around to your proposal.

TimLovellSmith · 2019-02-19T18:21:46Z

I have a feeling it might be easier to implement the 'Merge' API if it returns a PairingHeapNode<> instead of void. As the implemention would change one node/tree to be child of the other, but its not obvious in advance which.

TimLovellSmith · 2019-02-19T18:31:54Z

I think there should also be a specific DeleteMin oriented API call because it can be implemented more optimally. e.g. public PairingHeapNode<TKey, TValue> node RemoveMin();..

For an empty heap, should RemoveMin throw or return null? I am thinking 'throw'. I didn't see an obvious precedent to base that decision on, but I'm thinking the API looks more similar in spirit to 'First()', than ''FirstOrDefault'().

There's an additional fringe benefit - the name 'RemoveMin' happens to make the sense of the heap (min heap or max heap?) explicit in the API. You could rename 'Root' to 'Min' or 'MinElement' for similar effect...

TimLovellSmith · 2019-02-19T23:21:34Z

PS I came up with one doubt about using the paper to inform the selection of which heap implementation to use. the paper references, I have the idea they used C to implement their heap... so their evaluations probably don't let us predict impact on GC.

Wraith2 · 2019-02-20T01:41:15Z

Having watched several api reviews a question that comes up a lot is how the api is intended to be used, What patterns and sequences of calls you'd use. Any chance of some example usage to show the "right" way to use this?

safern · 2019-05-23T22:45:36Z

@pgolebiowski thanks for your proposal. I'm with @Wraith2 in order to have smooth api review, would you mind adding some example usage for this API?

Also, regarding to PairingHeapNode<TKey, TValue> -- I know the intent of adding a new class for this, to tight it with its consumer class, but why not use KeyValuePair<TKey, TValue> which is already part of the framework? It has the same shape as PairingHeapNode<TKey, TValue>.

Jlalond · 2020-04-03T17:25:10Z

@pgolebiowski is there any movement on this as it's been roughly a year. I would be very interested in this being a thing.

safern · 2020-04-03T17:26:25Z

cc: @layomia @eiriktsarpalis

eiriktsarpalis · 2021-10-26T16:49:24Z

Duplicate of #44871

msftgits transferred this issue from dotnet/corefx Feb 1, 2020

msftgits added this to the 5.0 milestone Feb 1, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

eiriktsarpalis modified the milestones: 5.0.0, Future Jun 24, 2020

eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2020

eiriktsarpalis marked this as a duplicate of #44871 Oct 26, 2021

eiriktsarpalis closed this as completed Oct 26, 2021

ghost locked as resolved and limited conversation to collaborators Nov 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a pairing heap to System.Collections.Specialized #28708

Add a pairing heap to System.Collections.Specialized #28708

pgolebiowski commented Feb 17, 2019

Wraith2 commented Feb 18, 2019

TimLovellSmith commented Feb 18, 2019

pgolebiowski commented Feb 19, 2019 •

edited

Loading

TimLovellSmith commented Feb 19, 2019

pgolebiowski commented Feb 19, 2019 •

edited

Loading

TimLovellSmith commented Feb 19, 2019 •

edited

Loading

TimLovellSmith commented Feb 19, 2019

TimLovellSmith commented Feb 19, 2019 •

edited

Loading

TimLovellSmith commented Feb 19, 2019

Wraith2 commented Feb 20, 2019

safern commented May 23, 2019

Jlalond commented Apr 3, 2020

safern commented Apr 3, 2020

eiriktsarpalis commented Oct 26, 2021

Add a pairing heap to System.Collections.Specialized #28708

Add a pairing heap to System.Collections.Specialized #28708

Comments

pgolebiowski commented Feb 17, 2019

About

Proposal

Important

Before you comment

Wraith2 commented Feb 18, 2019

TimLovellSmith commented Feb 18, 2019

pgolebiowski commented Feb 19, 2019 • edited Loading

TimLovellSmith commented Feb 19, 2019

pgolebiowski commented Feb 19, 2019 • edited Loading

TimLovellSmith commented Feb 19, 2019 • edited Loading

TimLovellSmith commented Feb 19, 2019

TimLovellSmith commented Feb 19, 2019 • edited Loading

TimLovellSmith commented Feb 19, 2019

Wraith2 commented Feb 20, 2019

safern commented May 23, 2019

Jlalond commented Apr 3, 2020

safern commented Apr 3, 2020

eiriktsarpalis commented Oct 26, 2021

pgolebiowski commented Feb 19, 2019 •

edited

Loading

pgolebiowski commented Feb 19, 2019 •

edited

Loading

TimLovellSmith commented Feb 19, 2019 •

edited

Loading

TimLovellSmith commented Feb 19, 2019 •

edited

Loading