Add a pairing heap to System.Collections.Specialized #28708
Comments
I think you missed the definition for |
How are Update/Remove used? Do you have to remember 'node' from an earlier add or enumerate the nodes to find the one to update? Is it really meant to take a 'key' and not a 'value' parameter?
|
@Wraith2 Right, should be
@TimLovellSmith Yes, you have to explicitly say which node you wish to modify. This is the only way to get O(log n) and to support a scenario in which there are duplicate keys with different values. When you start enumerating to find an element, you go to O(n), plus you cannot support duplicate keys (in case you would want to search by the key). As for why it takes a key: a heap uses keys for ordering. |
@pgolebiowski I think the different choices of what to call a key have different background stories:
POV a) 'key = priority': priority is the most important thing, since it's what the data structure works by, so that should be the key. Keys are not used to 'look up' anything, however. This matches the usage of the word 'key' in the literature discussing heaps etc.
POV b) 'key = data': priority is just an additional datum (value), secondary to the real data you care about processing in prioritized order. Keys are also how you 'find' elements in the data structure if you want to implement an update. This matches the usage of the word 'key' in dictionary keys, if you think of a priority queue as an associative map from keys to their priorities. Maybe only I think that way.
By the way, I think you're not really correct in saying "This is the only way to get O(log n) and to support a scenario in which there are duplicate keys with different values." What you mean is that it's the only way in which you can have multiple objects with the same associated priority. I think you're overlooking ways to achieve O(log n) in the other world when you claim that otherwise 'it's O(n)'. Are you really overlooking them, or just dismissing them as complex or expensive? Finding the node is actually easy and O(1), though unfortunately not completely free, e.g. with a supplementary dictionary of keys to heap nodes. It could be redundant, however, if the keys (data objects, not priorities, so you might call them values) were themselves heap nodes, i.e. an 'intrusive' data structure. Would that be where you were thinking of going with IHeapNode? |
Key is a computer science term used when referring to maintaining order in heaps. It's used even in the first paragraphs on Wikipedia when describing heaps. And you cannot* rearrange a heap (update/remove in n-ary or pairing) in O(1), if that's what you're implying. *Imagine you have structs and duplicate everything. Plus doesn't feel elegant. Probably a better API for update and remove exists, but I don't have an idea yet. |
No, there is no need to rearrange the heap. I adopt your key terminology here, and introduce a new term, 'Id'. Suppose each element or 'value' in the heap has a unique Id. Then you can have a Dictionary<TId, PairingHeapNode> 'entries' inside the implementation of PairingHeap, and every time a value is added to the heap, you just add an entry: entries[item.Id] = newNode; Then it is easily possible to implement
The advantage of this kind of API is that it works in terms of whatever keys are already part of the application domain, instead of requiring the application designer to think in terms of PairingHeapNodes. It potentially saves the application designer from having to maintain such a dictionary themselves, although admittedly that is not such hard work; it can just be in a wrapper class.
The disadvantage is that it requires maintaining an 'entries' dictionary inside the PairingHeap. So that concludes the explanation... Actually, the disadvantage is big enough that it can outweigh the usability advantages for performance-conscious use cases, especially since not everyone needs to use Update and Remove. So I think I am coming around to your proposal. |
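For illustration, the supplementary-dictionary idea described above could be sketched roughly as follows. This is only a sketch under assumptions, not the API proposed in this issue: the names HeapNode, IndexedPairingHeap and DecreaseKey are hypothetical, keys are fixed to int, and only decrease-key is shown to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type; not the node type from the proposal.
public sealed class HeapNode<TId>
{
    public TId Id;
    public int Key;                              // priority; smaller keys are removed first
    internal HeapNode<TId> Child, Sibling, Prev; // Prev: parent (if first child) or left sibling
}

public sealed class IndexedPairingHeap<TId>
{
    private HeapNode<TId> root;

    // The supplementary 'entries' dictionary: Id -> node.
    private readonly Dictionary<TId, HeapNode<TId>> entries = new Dictionary<TId, HeapNode<TId>>();

    public int Count => entries.Count;

    public void Add(TId id, int key)
    {
        var node = new HeapNode<TId> { Id = id, Key = key };
        entries.Add(id, node);                   // throws on a duplicate Id: Ids must be unique
        root = Meld(root, node);
    }

    // The caller passes an Id from its own domain instead of remembering a node handle.
    public void DecreaseKey(TId id, int newKey)
    {
        var node = entries[id];
        if (newKey > node.Key) throw new ArgumentException("Key may only decrease.", nameof(newKey));
        node.Key = newKey;
        if (node == root) return;

        // Cut the subtree rooted at 'node' out of its sibling list, then meld it with the root.
        if (node.Prev.Child == node) node.Prev.Child = node.Sibling;
        else node.Prev.Sibling = node.Sibling;
        if (node.Sibling != null) node.Sibling.Prev = node.Prev;
        node.Sibling = null;
        node.Prev = null;
        root = Meld(root, node);
    }

    public TId RemoveMin()
    {
        if (root == null) throw new InvalidOperationException("Heap is empty.");
        var min = root;
        root = MergePairs(min.Child);
        if (root != null) root.Prev = null;
        entries.Remove(min.Id);                  // keep the dictionary in sync with the heap
        return min.Id;
    }

    // Links two heap roots; the root with the larger key becomes the first child of the other.
    private static HeapNode<TId> Meld(HeapNode<TId> a, HeapNode<TId> b)
    {
        if (a == null) return b;
        if (b == null) return a;
        if (b.Key < a.Key) { var t = a; a = b; b = t; }
        b.Prev = a;
        b.Sibling = a.Child;
        if (a.Child != null) a.Child.Prev = b;
        a.Child = b;
        return a;
    }

    // Melds a sibling list pairwise, then combines the results (recursive for brevity).
    private static HeapNode<TId> MergePairs(HeapNode<TId> first)
    {
        if (first == null || first.Sibling == null) return first;
        var second = first.Sibling;
        var rest = second.Sibling;
        first.Sibling = null;
        second.Sibling = null;
        return Meld(Meld(first, second), MergePairs(rest));
    }
}
```

The cost mentioned above shows up directly: every Add and RemoveMin also touches the dictionary, even for callers that never update a key.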
I have a feeling it might be easier to implement the 'Merge' API if it returns a |
I think there should also be a specific DeleteMin-oriented API call, because it can be implemented more optimally. e.g.
For an empty heap, should RemoveMin throw or return null? I am thinking 'throw'. I didn't see an obvious precedent to base that decision on, but I think the API looks more similar in spirit to 'First()' than to 'FirstOrDefault()'.
There's an additional fringe benefit: the name 'RemoveMin' happens to make the sense of the heap (min heap or max heap?) explicit in the API. You could rename 'Root' to 'Min' or 'MinElement' for similar effect... |
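On the question of precedent, existing BCL behavior may be a useful reference point: Enumerable.First and Queue<T>.Dequeue both throw InvalidOperationException on empty input, while FirstOrDefault returns the default value. A small sketch that checks this (the class and helper names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, only used to demonstrate existing BCL behavior.
public static class EmptyBehaviorDemo
{
    // Returns true if the action throws InvalidOperationException.
    public static bool Throws(Action action)
    {
        try { action(); return false; }
        catch (InvalidOperationException) { return true; }
    }

    // First() on an empty sequence throws.
    public static bool FirstThrowsOnEmpty() => Throws(() => new int[0].First());

    // FirstOrDefault() on an empty sequence returns default(int).
    public static int FirstOrDefaultOnEmpty() => new int[0].FirstOrDefault();

    // Queue<T>.Dequeue() on an empty queue throws.
    public static bool DequeueThrowsOnEmpty() => Throws(() => new Queue<int>().Dequeue());
}
```

Under that reading, a throwing RemoveMin would line up with Dequeue and First(); whether a non-throwing counterpart is also wanted is a separate design question.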
PS I came up with one doubt about using the paper to inform the selection of which heap implementation to use. Judging by the paper's references, I have the impression they implemented their heaps in C, so their evaluations probably don't let us predict the impact on GC. |
Having watched several API reviews, a question that comes up a lot is how the API is intended to be used: what patterns and sequences of calls you'd use. Any chance of some example usage to show the "right" way to use this? |
@pgolebiowski thanks for your proposal. I'm with @Wraith2: in order to have a smooth API review, would you mind adding some example usage for this API? Also, regarding to |
@pgolebiowski is there any movement on this? It's been roughly a year. I would be very interested in this being a thing. |
Duplicate of #44871 |
About
Over four years ago, we started a conversation on GitHub about adding a priority queue to the framework. We didn't reach a consensus.
When we started diving deep, we learned that customer expectations varied so much that we kept going back and forth, and no solution to the problem appeared to be good enough to end up in the framework. I think we've been tackling a problem that was too big. Instead of focusing on providing "a priority queue" to the .General namespace, let's focus on providing "priority queue capabilities" to the .Specialized namespace: a small and smart data structure that could be used to satisfy various customer expectations.
Proposal
I believe that this would provide the right balance:
Important
I think that for every reasonable X in "but why not X instead of PairingHeap<TKey, TValue>", X could be implemented with PairingHeap<TKey, TValue>.
Before you comment