Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a priority queue that supports priority updates #44871

Open
eiriktsarpalis opened this issue Nov 18, 2020 · 40 comments
Open

Add a priority queue that supports priority updates #44871

eiriktsarpalis opened this issue Nov 18, 2020 · 40 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Milestone

Comments

@eiriktsarpalis
Copy link
Member

Forking the conversation from #14032, creating this issue is to discuss a potential design for a specialized heap implementation that supports O(log n) priority updates. We have two potential high-level designs under consideration:

  1. Perform priority updates using handles. See this proposal for a reference implemenation using pairing heaps.
  2. Perform priority updates by passing the enqueued element directly, up to equality. This would effectively prevent duplicate elements from being enqueued. See this prototype for reference.

See this post for a benchmark of two proposed implementations in priority update scenaria.

cc @pgolebiowski @TimLovellSmith

@eiriktsarpalis eiriktsarpalis added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet labels Nov 18, 2020
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Nov 18, 2020
@ghost
Copy link

ghost commented Nov 18, 2020

Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley
See info in area-owners.md if you want to be subscribed.

Issue Details
Description:

Forking the conversation from #14032, creating this issue is to discuss a potential design for a specialized heap implementation that supports O(log n) priority updates. We have two potential high-level designs under consideration:

  1. Perform priority updates using handles. See this proposal for a reference implemenation using pairing heaps.
  2. Perform priority updates by passing the enqueued element directly, up to equality. This would effectively prevent duplicate elements from being enqueued. See this prototype for reference.

See this post for a benchmark of two proposed implementations in priority update scenaria.

cc @pgolebiowski @TimLovellSmith

Author: eiriktsarpalis
Assignees: -
Labels:

api-suggestion, area-System.Collections, wishlist

Milestone: [object Object]

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 18, 2020
@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Nov 18, 2020
@Symbai
Copy link

Symbai commented Apr 30, 2021

A priority queue where you cannot change the priority of items makes only sense if you know how important something is at the time of adding it to the queue. This is a very specific use case and cuts off all other use cases where developers would love to have a collection of priority items. I.e. it won't work in all cases where you have an end user deciding the priority, for example a download queue or a task queue or file transfer queue.

I've upvoted but after months without a single comment, I'm just here to raise my word again for this API: Please give it a review, thank you ❤

@CharlesPublic
Copy link

I would actually be fine with an Update Method.

public void Update(TElement element, TPriority priority)

No need to store any reference to elements.
I can keep track of the reference to elements by myself.

Just something simple that pushes the updated element further up / down the tree and won't affect the performance of the other operations.

@astw
Copy link

astw commented Jan 20, 2022

I just want simple a java version of PriorityQueue class, no second priority value....
this class looks a little clumsy

@eiriktsarpalis
Copy link
Member Author

I would actually be fine with an Update Method.

How would the data structure know where to find element without scanning the contents of the entire heap?

@CharlesPublic
Copy link

CharlesPublic commented Jan 21, 2022

I would actually be fine with an Update Method.

How would the data structure know where to find element without scanning the contents of the entire heap?

On Enqueue return a class Node with (TElement, TPriority, int NodeIndex).
The priority queue could update the node.NodeIndex when rearranging any nodes.
(set node.NodeIndex to -1 on Dequeue())

Then you could simply pass the Node object to the Update priority Method.
(Throw an exception when node.Element at index doesn't match)

Maybe add another Enqueue(Node node) method.

@eiriktsarpalis
Copy link
Member Author

On Enqueue return a class Node with (TElement, TPriority, int NodeIndex).

Yes, that would be following the handles approach as outlined in the original post. Unfortunately such a method cannot be retrofitted in the existing PriorityQueue class, we would need a specialized heap implementation.

@stephentoub
Copy link
Member

stephentoub commented Jan 21, 2022

Unfortunately such a method cannot be retrofitted in the existing PriorityQueue class, we would need a specialized heap implementation.

FWIW, I think we could do it if we were willing to pay for it in additional complexity and a small overhead on every operation.

You're right we would need a different implementation, but both the current implementation and the additional updateable implementation could both be hidden behind the same facade, e.g.

private NonUpdateableImpl? _defaultImpl = new NonUpdateableImpl();
private UpdateableImpl? _updateableImpl;

Individual operations would switch based on which of these was in use, e.g.

public void Enqueue(TElement element, TPriority priority)
{
    if (_defaultImpl is NonUpdateableImpl impl)
    {
        impl.Enqueue(element, priority);
    }
    else
    {
        Debug.Assert(_updateableImpl is not null);
        _updateableImpl.Enqueue(element, priority);
    }
}

We'd add an additional Enqueue method that returned a Node object, which UpdateableImpl.Enqueue would return, and this method would upgrade from the non-updateable implementation to the updateable implementation the first time the new method is used, e.g.

public PriorityQueueNode EnqueueAsUpdateable(TElement element, TPriority priority)
{
    if (_defaultImpl is NonUpdateableImpl defaultImpl)
    {
        _updateableImpl = UpgradeToUpdateable(defaultImpl);
        _defaultImpl = null;
    }

    Debug.Assert(_updateableImpl is not null);
    return _updateableImpl.Enqueue(element, priority);
}

With that, existing usage not requiring updates only pays for an extra (likely well predicted) branch and method call on each operation as well as one additional allocation/field per queue, which would hopefully be only minimally measurable. And updates become pay-for-play: once you perform an enqueue using the new updateable entry point, the whole queue switches over to the new implementation, and the remainder of the usage of that class pays for the allocation per entry required to support updates well.

Seems feasible, though I'm probably missing some details.

@eiriktsarpalis
Copy link
Member Author

I think it's feasible, although we might want to expose "updateability" as a constructor parameter to avoid paying the cost of re-heapifying on the first call to Update.

@stephentoub
Copy link
Member

although we might want to expose "updateability" as a constructor parameter to avoid paying the cost of re-heapifying on the first call to Update

Potentially. Though I suspect (and could of course be wrong) that situations that require updateability are going to lead devs to always, on a given queue, use the EnqueueAsUpdateable method, in which case that initial re-heapifying cost would be minimal as it wouldn't contain any data.

@layomia
Copy link
Contributor

layomia commented Jan 21, 2022

Is size a valid consideration here? If the updateability feature is baked into the existing type, should it be designed such that size is pay-for-play? i.e. you only get code to support priority updates if you use the feature.

I'm not sure what the size costs would be, but just wondering if pay-for-play is a goal for our APIs in general wrt size. If it is, then it should be achievable using similar techniques as the STJ source-gen feature.

@stephentoub
Copy link
Member

Is size a valid consideration here?

I expect there's not enough code here to warrant an IL size concern. Unlike JsonSerializer, this also isn't rooting the world :-)

@sw47
Copy link

sw47 commented Feb 28, 2022

I just want simple a java version of PriorityQueue class, no second priority value

+1 to this - is it possible to have a Generic version of the current PriorityQueue to accepts just a single primitive type (i.e. int, string) and the data structure maintains ordering based on ASCII value? Would be good to support more methods similar to what PriorityQueue offers in Java

@ryepesg
Copy link

ryepesg commented Feb 28, 2022

+1 I think the PriorityQueue in the Java stdlib works wonderfully. Although it is a shame the language doesn't support order by pairs by default as in the rest of the languages, so you can quickly add pairs like (priority, object) and not have to care about implementing comparison interfaces

@astw
Copy link

astw commented Mar 1, 2022

The uniqueness of key in SortedSet or SortedList is really headache for me.
Lots of time if I want the functionalities of ProrityQueue, I have to customize the comparer by an extra unique value to handle the possible duplicate values.
I need to either

        int index = 0;
        var cmp = Comparer<int[]>.Create((a, b) =>
        {
            if (a[0] == b[0]) return a[1].CompareTo(b[1]);
            return a[0].CompareTo(b[0]);
        });
        var pq = new SortedSet<int[]>(cmp);
    ....    
       public void Add(int value)
       {
            pq.Add(new int[] { value, index++ }); 
       }
     

or some like below, but it might be troublesome if need to remove items.

        var list = new SortedList<int, int>(Comparer<int>.Create((int a, int b) =>
            {
                if (a == b)
                {
                    return 1;
                }
                return a.CompareTo(b);
            }));

I just don't know why we cannot have a very simple generic priority queue class as in java

@eiriktsarpalis
Copy link
Member Author

I just want simple a java version of PriorityQueue class, no second priority value

+1 to this - is it possible to have a Generic version of the current PriorityQueue to accepts just a single primitive type (i.e. int, string)

Out of curiosity, what would prevent you from doing something along the lines of

public class PriorityQueue<TElement> : PriorityQueue<TElement, TElement>
{
    public void Enqueue(TElement value) => Enqueue(element: value, priority: value);
}

Would be good to support more methods similar to what [PriorityQueue offers in Java]

Presumably methods like Remove and Contains? These are linear-time operations in the Java implementation, would that satisfactorily solve your use case?

@sw47
Copy link

sw47 commented Mar 1, 2022

@eiriktsarpalis - how your suggestion work when I need to remove from the PriorityQueue with Remove?

My use case is I want to to add duplicate values (offer in Java PQ) and remove them (remove). This is only with the int primitive type. I understand the Remove and Contains is linear in Java and it mentions that in the documentation.

@eiriktsarpalis
Copy link
Member Author

how your suggestion work when I need to remove from the PriorityQueue with Remove?

That suggestion specifically concerns your other feedback about not accepting a separate priority parameter. There is no workaround for removing arbitrary elements other than reheapifying.

@astw
Copy link

astw commented Mar 1, 2022 via email

@terrajobst
Copy link
Member

Before adding a new super specialized collection type I would demand that we have reasonable evidence that this is going to be used.

We shipped PriorityQueue<TElement, TPriority> in .NET 6 (two years ago) and based on apisof.net and grep.app usage is still extremely low.

@mattnewport
Copy link

mattnewport commented Oct 19, 2023

Before adding a new super specialized collection type I would demand that we have reasonable evidence that this is going to be used.

We shipped PriorityQueue<TElement, TPriority> in .NET 6 (two years ago) and based on apisof.net and grep.app usage is still extremely low.

Efficient implementations of Dijkstra's algorithm need the priority queue to support updating priorities.

I know I was watching this issue because I needed to implement Dijkstra's algorithm and found I couldn't use the .NET PriorityQueue because of it not supporting this.

Other fairly common algorithms like A* and Prim's algorithm also require priority updates for optimally efficient implementations.

I would guess that a non-trivial fraction of uses of priority queues are for implementing one of these algorithms that need priority updates for maximum efficiency. Since the current PriorityQueue doesn't support priority updates, people who might otherwise use it to implement such algorithms will have to look elsewhere. Some of the relatively low usage could be a result of that, though I'm not sure how you could easily find direct evidence for that.

@Symbai
Copy link

Symbai commented Oct 19, 2023

The way it was designed and approved makes it available in very specific situations only. This is why the usage is low. And the fact that its low should make it questionable why it was approved like that.

@eiriktsarpalis
Copy link
Member Author

See #14032 (comment) for a justification as to why the current design was selected. I believe these points still hold today, and would challenge the assertion that low usage is because of lack of priority update support -- there is no evidence to support a causal link.

@terrajobst
Copy link
Member

@mattnewport

Efficient implementations of Dijkstra's algorithm need the priority queue to support updating priorities.

I'm not questioning the usefulness of the API, I'm questioning how widely used it will be. Virtually all API suggestions for us pass the bar for usefulness, but we can't add them all. Considering how low use of the existing priority queue is, I'd rather we spend our resources on other things that benefit substantially more people, for example System.CommandLine.

@eiriktsarpalis
Copy link
Member Author

One of the common motivations cited for priority update support is that people aren't able to use C# in leetcode (since it's prohibitively difficult to implement Dijkstra). At first glance this might seem like a weird request, but I do think it's one of those X-factor things that contribute to the prestige and adoption of the language.

@eiriktsarpalis
Copy link
Member Author

FWIW folks emulating priority updates in the Java PriorityQueue class by piggybacking on its built-in remove method. Even though it is a linear-time operation it should satisfy most interviewers. Perhaps a compromise then might be to offer such an API in the existing PQ class as well:

public partial class PriorityQueue<TElement, TPriority>
{
    public bool TryRemove(TElement element, out TPriority? priority, IEqualityComparer<TElement>? equalityComparer = null);
}

Then defining an $\mathcal O(n)$ priority update method would be as simple as doing

public static void EnqueueOrUpdate<TElement, TPriority>(this PriorityQueue<TElement, TPriority> queue, TElement element, TPriority priority)
{
    queue.TryRemove(element, out _);
    queue.Enqueue(element, priority);
}

@mattnewport
Copy link

One of the common motivations cited for priority update support is that people aren't able to use C# in leetcode (since it's prohibitively difficult to implement Dijkstra). At first glance this might seem like a weird request, but I do think it's one of those X-factor things that contribute to the prestige and adoption of the language.

I think there is value in making a language a good choice for leetcode / technical interviewing. C# has quite a few nice features for those uses but falls down in its data structure and algorithm support somewhat compared to say C++. The place I actually ran into this was working through a Data Structures and Algorithms course on Coursera. C# worked well generally but this was one of a few examples of where it fell short compared to some other languages.

Some of these algorithms are actually useful in the real world as well - coming from a game development background A* is pretty commonly implemented for pathfinding.

@eiriktsarpalis
Copy link
Member Author

Some of these algorithms are actually useful in the real world as well - coming from a game development background A* is pretty commonly implemented for pathfinding.

I don't dispute that. It's why this issue is still open, but it's unlikely we'll prioritize it in the near term. It's an excellent opportunity for the NuGet ecosystem to innovate in this space as well.

@terrajobst
Copy link
Member

@mattnewport

Some of these algorithms are actually useful in the real world as well - coming from a game development background A* is pretty commonly implemented for pathfinding.

There are many data structures that are common and very useful. Binary trees, interval trees, DAGs etc.

But there is a huge difference in building a data structure that is great for a given application and generalizing it such that it's useful for a variety of different applications. And in some cases, there is a big incentive in privatizing them to your app such that you can tweak them and exploit their internals in order to optimize the algorithms you need in your app.

Of course, where a given data structure or algorithm lies is somewhat subjective and may also change over time. But based on my own experience working in managed code and talking to customers I just don't see compelling enough evidence that priority queues are worth the cost of having it in the core libraries. Investing in another priority-based data structure seems to be too expensive for the value it provides to the .NET ecosystem (and keep in mind that in order for us to work on that we'd have to decide something else we wouldn't work on, so there is also opportunity loss involved as well).

@ryepesg
Copy link

ryepesg commented Oct 21, 2023

@eiriktsarpalis @terrajobst What sets apart Priority Queues from other data structures is the fact that other popular programming languages have implementations in the standard library for them (you already mentioned Java, but it is also the case for Python, C++ and Go). One of the reasons why I teach Python as the introductory programming language instead of C# is because the students can easily implement the algorithms you find on any Algorithms textbook with the right algorithm complexity, even if some of the best books are written in Java for example.

@eiriktsarpalis
Copy link
Member Author

What sets apart Priority Queues from other data structures is the fact that other popular programming languages have implementations in the standard library for them (you already mentioned Java, but it is also the case for Python, C++ and Go).

Last time I checked the priority queue implementations in Java, Python and C++ use a similar implementation that also do not support priority updates. Is that not the case?

@mattnewport
Copy link

What sets apart Priority Queues from other data structures is the fact that other popular programming languages have implementations in the standard library for them (you already mentioned Java, but it is also the case for Python, C++ and Go).

Last time I checked the priority queue implementations in Java, Python and C++ use a similar implementation that also do not support priority updates. Is that not the case?

C++'s priority_queue is generally a wrapper around the heap functions from the algorithms library so while it doesn't support priority updates you can implement them yourself (in linear time) by working directly with the heap functions on a vector.

@eiriktsarpalis
Copy link
Member Author

you can implement them yourself (in linear time)

Java's Remove method achieves something similar, although I wouldn't consider this a real priority update. It would need to be $\mathcal O(\log n)$ for any Dijkstra-like implementation to scale well.

@rotanov
Copy link

rotanov commented Dec 19, 2023

+1 for Dijkstra. Every year I solve the https://adventofcode.com/ and while doing so I usually have to implement Dijkstra a couple of times. Though Dijkstra would benefit the most from Fibonacci heap implementation.

@mikeyp22
Copy link

+1 for Dijkstra. Every year I solve the https://adventofcode.com/ and while doing so I usually have to implement Dijkstra a couple of times. Though Dijkstra would benefit the most from Fibonacci heap implementation.

I am exactly facing this right now. Was looking how to update the priority for Dijkstra algorithm for AdventOfCode

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Dec 25, 2023

.NET 9 will be including a PriorityQueue.Remove method which should unblock priority updates (albeit in linear time): #93994

@micampbell
Copy link

This makes sense to add a Remove method - even if it's O(n)because it has no impact on Enqueue/Dequeue performance, which is - right now - exceptional. So, the issue becomes: how much are you willing to sacrifice queuing performance for updating performance. @eiriktsarpalis solution to have an O(n) Remove has no impact on queuing performance. So that is a good compromise.

There are applications, though, where elements in the queue may need to be updated once or even twice for every Dequeue. In these cases, simultaneous updates to a Dictionary can be done during queuing to speed up subsequent updating/removing. I have made an implementation of this (UpdatablePriorityQueue), but it indeed has slower queuing performance. However, overall performance is faster in the larger process due to fast update/remove.

Are applications where the queue needs to be updatable common enough to include in System? I am unsure. Perhaps not.

@qbit86
Copy link

qbit86 commented Apr 5, 2024

I am exactly facing this right now. Was looking how to update the priority for Dijkstra algorithm for AdventOfCode
@mikeyp22

For the purposes of the Advent of Code, in the Dijkstra implementation it's just fine to use the standard PriorityQueue<> with no Update() and even no linear Remove(). You simply add the vertex with the new distance to the priority queue, ignoring the entry added before for that previously encountered vertex.

This worsens the space complexity of the algorithm from O(n) to O(n + m), and the logarithmic factor in the time complexity. However, usually the traversed graph is sparse with m ~ n (for example, the grid with m ≈ 4 n).


One of the common motivations cited for priority update support is that people aren't able to use C# in leetcode (since it's prohibitively difficult to implement Dijkstra).
@eiriktsarpalis

I think there is value in making a language a good choice for leetcode / technical interviewing. C# has quite a few nice features for those uses but falls down in its data structure and algorithm support somewhat compared to say C++. The place I actually ran into this was working through a Data Structures and Algorithms course on Coursera. C# worked well generally but this was one of a few examples of where it fell short compared to some other languages.
@mattnewport

Do you have any examples of LeetCode problems where the approach with standard PriorityQueue<> (no Update() and no linear Remove()) isn't sufficient?


  1. Perform priority updates using handles. See this proposal for a reference implemenation using pairing heaps.
  2. Perform priority updates by passing the enqueued element directly, up to equality. This would effectively prevent duplicate elements from being enqueued.

@eiriktsarpalis

I believe the most efficient heap-based data structure for the Dijkstra's algorithm is the indirect heap. This is what Boost.Graph uses. You update the distance map in Dijkstra as a separate associative array, and the priority queue encapsulates the mapping from the element to it's index in the array-backed heap. It should be updated when the external distance map is updated.

@julealgon
Copy link

julealgon commented May 10, 2024

I assume this proposal wouldn't cover "automatically" updating priorities based on some property or func over the item, would it?

For instance, imagine there was a priority property in the items being added to the queue, and every time a Dequeue would happen, the effect would be that the elements with lowest priority at time of Dequeue are dequeued. The current queue of course uses the priority at time of insertion to decide dequeuing order.

I could see it being feasible to implement this manually by, say... making that property an observable of some sort and then subscribing to value changes and manually updating the priority for that item in the queue (even if that might be inneficient), but was wondering if this scenario was discussed at any point.

The specific scenario I have would be to add Task instances into one such queue, and then dequeue Completed and successful" tasks (IsCompletedSuccessfully) before "Completed and faulted" ones (IsFaulted) and leave running ones for later (basically, following the order in the TaskStatus enum. When I initially add the Task to the queue, it could be in any state and it would complete after being added. Thus why the updates would need to be dynamic.

The ideal Enqueue signature for this would probably be:

public void Enqueue (TElement element, Func<TElement, TPriority> priorityAccessor);

And then somehow only accessing the priorityAccessor at Dequeue time.

@eiriktsarpalis
Copy link
Member Author

@julealgon no, that's pretty much out of scope for this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Collections wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Projects
None yet
Development

No branches or pull requests