[Review] Data Structure API #6812
Updated chpldocs with additional methods.
Growing and shrinking a distributed bounded queue is not easy without incurring severe performance penalties, due to the synchronization needed. A bounded queue is to be implemented on top of a distributed array, which does not support concurrent growing or shrinking. If heavily requested, [...] I would like to note that this can be done without queue API support, by creating a new queue of the requested size and appending the old into the new:

```chpl
var oldQueue = makeBoundedFIFO(100);
for i in 1 .. 100 do oldQueue += i;
var newQueue = makeBoundedFIFO(200);
newQueue += oldQueue;
for i in 101 .. 200 do newQueue += i;
```
Would there be a reason not to set the bounds to 1 instead?
I'm against adding [...]:

```chpl
var coreWorkQueue = makeBalancedQueue(int);
forall i in 1 .. 100 do coreWorkQueue += i;
var splitWork = coreWorkQueue.dequeue(10);
var splitWorkQueue = makeBoundedFIFO(int);
splitWorkQueue += splitWork;
```

Where [...]

Edit: Fixed the first example to [...]
This brings up a separate question: should we even have an interface? If each queue were implemented independent of an interface, it would lift any and all restrictions. Maybe an unbounded queue should not have a [...]. There are many ways to proceed with this.

Edit: To expand on this more, perhaps we could do what Java does and have hierarchies for this...

Edit2: It would be a lot less complicated than this, but it encapsulates the idea.
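A rough sketch of what such a hierarchy might look like in Chapel (the class names and methods here are illustrative only, not part of the proposed API):

```chpl
// Hypothetical base interface: only what every queue can support.
class BaseQueue {
  type eltType;
  proc enqueue(elt : eltType) : bool { halt("not implemented"); return false; }
  proc dequeue() : (bool, eltType) {
    halt("not implemented");
    var dflt : eltType;
    return (false, dflt);
  }
}

// Bounded queues would layer on capacity-related operations, so that
// capacity only appears where it is meaningful.
class BoundedQueue : BaseQueue {
  proc capacity() : int { halt("not implemented"); return 0; }
}
```

The point of the layering, as in Java, is that restrictions (boundedness, ordering, and so on) live only on the subtree where they apply, rather than forcing a single interface to cover every variant.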
The question is not how it would perform, but whether it is useful functionality. Although you can do that without API support, it might be nice to have it in the API.
Simply because it isn't supposed to be 1 :) It just feels hacky to me.
I'm in favor of utilizing an inheritance hierarchy for the queue setup. For the enqueue method, would it return false if some elements have been added but not all? Maybe be a little more explicit on that. For freeze/unfreeze: will we guarantee that all operations initiated prior to the freeze/unfreeze will complete before the state changes? Or is it possible to mess up an operation partway through (say, by freezing in the midst of a long enqueue, for instance)?
That's one of the things I've been debating. The transactional additions section was referring to [...]
That's another thing I've been debating (also under the Open Questions); however, I am in favor of making it implementation-specific. If we go ahead with the inheritance hierarchy, the specifics for each queue can be noted in the overridden method.
Soon (tomorrow), I will make a change to the API to add some inheritance, and I'll announce when I finish (I'm still here to answer questions).
I have revised the entire API to include a hierarchy now. Note that quite a few methods are empty (as it is more of an interface). As chpldocs only shows methods that are overridden, there's nothing else I can do about that yet (I can fill it in more if people like the way it is going right now).
Two ideas that I had while reading the updated documents:
I find this agreeable. An array is definitely the least expensive way to return bulk data, and [...]
Sure, that's agreeable. Originally I removed them so that a [...]. Overall, I agree with adding them back.
I'm not sure that sentence makes sense. I mean, I would expect a List to have more "ordering" than a stack or a queue; in particular, you can access the middle elements. Maybe you mean something different? Or maybe it shouldn't be called "List"? (Also, we already have something called a List in Chapel that is different from what you are building.) I'm generally happy with the outline of the API.
In the API, I relaxed the ordering for [...]. Looking at this again, in Java a List is defined as "an ordered collection (also known as a sequence)", so perhaps a List should be indexable by definition, and I'll need another category for a load-balancing, non-indexable, non-ordered (but high-performance, i.e. seeing over 250x the performance of a normal, albeit naive, synchronized list implementation at 64 nodes) data structure. I suppose technically I could have it be a [...]
Does related work have a name for a "list" that returns an arbitrary element? Would "bag" be a better term? For example, [...] shows "List" is ordered, and that "bag" is a "multi-set", which isn't far off but might convey the wrong idea. Maybe we just need to invent a term for a totally unordered collection? Just brainstorming here:
Or, what about just calling it a "collection" and not trying to have a name for it? I mean, don't you have this terminology right now:

So couldn't it just be:

? If its being unordered is important for its use, should we be looking for a two-word term, like "UnorderedList" or "UnorderedCollection"? I still probably like "bag" or "jumble" the best of all of these.
"hoard" and "mound" don't intuitively spark enough understanding of what that structure is to me (maybe because I am not a native speaker). I actually like the sound of "Collection" the most for this purpose, but I am not sure if it has any implications on the software design aspect. Two-word terms also sound good to me, as they are descriptive. I can live with "bag".
Interesting... Yeah, [...]
In terms of the [...]. Also, I won't hold any illusion that I can finish everything here by the end of the month, but my goal is to get at least one in each category.
One more thing as well: I need to establish what would be 'acceptable' performance. In the distributed [...], its implementation would require employing the H-Synch algorithm, where in summary we maintain a global queue-lock (MCS) [requires GlobalAtomicObject] which a node needs to acquire before it may touch the data structure, and a CC-Synch lock (CCSynchLock) which a task needs to acquire. The performance would be near-equivalent to 1 node, which will hopefully be non-degrading. It does require more effort on the user's behalf, though. @mppf Since you have a much firmer grasp on what is and is not acceptable, would a list with non-degrading performance be desired? As well, if this is desired (and achieved), it's possible to reuse the pattern/abstraction in other applications.
Due to time constraints, I'm trimming down the Collections API a lot more. There isn't enough time to update the hosted chpldocs, so I'll show it below in text:

```chpl
class Collection {
  type eltType;

  /*
    Adds an element to this data structure.
  */
  proc add(elt : eltType) : bool;

  /*
    Removes an arbitrary element from this data structure.

    BUG: Compiler will segfault if the returned value is not captured at the callsite:
      'c.remove();' == segfault
    BUG: Compiler with LICM will cause undefined behavior if the return value is captured
    in a variable declared outside of a loop:
      'var captured : (bool, eltType); while !captured[1] do captured = remove();' == undefined behavior
  */
  proc remove() : (bool, eltType);

  // Checks to see if the data structure contains an item. Note that we do not support the
  // 'removeItem' equivalent by default, due to it causing a ton of headaches in implementing
  // my ordered data structure.
  proc contains(elt : eltType) : bool;

  // By default, repeatedly calls 'remove()' until empty if not overloaded. May be
  // problematic with issue #7003.
  proc clear();

  // By default, checks if 'size() == 0'; can be overloaded.
  inline proc isEmpty() : bool {
    halt("'proc isEmpty() : bool' is not supported...");
  }

  // Returns the number of elements.
  proc size() : int;

  // By default, they must implement serial iteration; I would put parallel iteration here,
  // but it is not possible, as the leader iterator's yielded chunk can vary in type. Also,
  // it's not the easiest to debug either (i.e. #6998).
  iter these() : eltType;

  // Freezes the collection, rendering it immutable.
  proc freeze() : bool;

  // Unfreezes the collection, rendering it mutable.
  proc unfreeze() : bool;

  // If this collection can be frozen.
  proc canFreeze() : bool;

  // If this collection is currently frozen.
  proc isFrozen() : bool;
}
```

Note that there isn't any 'Stack', 'List', or 'Queue' derivative. If they are needed, they can be added later. I plan to submit two scalable data structures (there used to be 5, but I merged the best of 3 into one and discarded another entirely). Now they will directly inherit from 'Collection'; this should avoid most of the other issues with dynamic dispatch as well and make my job a lot easier. I definitely promised more than I could originally deliver; apologies for this sudden change.
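A hedged usage sketch of the trimmed API (the concrete subclass name `DistributedBag` is borrowed from the benchmark tables; treat this as illustrative, not the confirmed interface):

```chpl
// Illustrative only: assumes DistributedBag is a concrete Collection subclass.
var c : Collection(int) = new DistributedBag(int);
for i in 1 .. 100 do
  c.add(i);
assert(c.size() == 100);

// Capture the returned tuple at the callsite (see the segfault bug note in remove()).
var (hasElt, elt) = c.remove();
if hasElt then writeln("removed ", elt);

// Serial iteration via 'these()'.
for x in c do writeln(x);

c.clear();
assert(c.isEmpty());
```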
Now that the PR has been posted, there's another revision I kind of want to make: potentially removing the 'freeze' logic from the API entirely. It turns out it's a lot more tedious and harder to implement than originally expected. Right now, all data structures allow for iteration during concurrent access, so 'freeze' currently merely elides the need to acquire any locks. Worst of all, the [...]
I don't have any objection to removing the "freeze" logic. Such logic complicates the API and is only really useful if it offers some performance advantage, in my opinion.
Cool! I'll assume most other people would feel the same way. As well, since the 'resource leakage on breaking out of iterators' issue is a confirmed bug, I can probably add a warning that the user cannot break out of serial iterators early without consequence.
[GSoC] Parallel-Safe Collections Module and Distributed Data Structures [PR by @LouisJenkinsCS, reviewed by @e-kayrakli and @mppf; tested by @mppf]

# Collections

I present a parallel-safe distributed data structures framework, similar to Java's 'Collection' framework. Original discussion can be seen [here](#6812). Original repository containing benchmarks can be found [here](https://github.com/LouisJenkinsCS/Distributed-Data-Structures). Chpldocs documentation can be seen [here](https://louisjenkinscs.github.io/Distributed-Data-Structures/).

**Note:** This is the official PR for the GSoC program, but I do realize it may seem a bit like a rough draft. Any feedback or constructive criticism would be appreciated.

## Deque

Provides a strict ordering without sacrificing too much performance. Supports insertion and removal from both ends, allowing FIFO, LIFO, and a total ordering, which is preserved across all nodes in a cluster, and employs a wait-free round-robin approach to load distribution that ensures fairness in memory, bandwidth, and computation.

**Disclaimer:** The deque provided, while scalable, relies heavily on network atomics. The benchmark results are produced using said network atomic operations.

## Bag

With performance that scales both in the number of nodes in a cluster and the number of cores per node, we offer a multiset implementation, called a 'Bag', which is a medium that allows storing and retrieving data in any arbitrary order. This type of data structure is ideal for work queues, as it employs its own load balancing and offers unparalleled performance.

**Disclaimer:** A node can request a 'privatized' copy, which retrieves a clone that is allocated on the node requesting it, reducing any excess communication. Usage of `getPrivatizedInstance()` is highly advised for performance-critical sections.

### Performance

We compare our data structures to a naive synchronized list implementation, as that is all that is available in the Chapel standard library. In all cases, the data structures scale and outperform the naive implementation.

#### Insert

| Implementation | Performance over Naive (at 64 nodes) |
| -------------- | :----------------------------------: |
| SynchronizedList | 1x |
| DistributedDeque | 63x |
| DistributedBag | 403x |

![](https://raw.githubusercontent.com/LouisJenkinsCS/Distributed-Data-Structures/master/Results/Collections_Add.png)

#### Remove

| Implementation | Performance over Naive (at 64 nodes) |
| -------------- | :----------------------------------: |
| SynchronizedList | 1x |
| DistributedDeque | 123x |
| DistributedBag | 651x |

![](https://raw.githubusercontent.com/LouisJenkinsCS/Distributed-Data-Structures/master/Results/Collections_Remove.png)

##### Testing

Passes the new test directory with:

* quickstart
* quickstart + gasnet
* qthreads
* qthreads + gasnet

#### Related Work / Inspiration

##### Bag

###### Segmenting per Core & Multiple Passes for 'Best', 'Average', and 'Worst' Cases

This is based heavily on my own work, [Concurrent Map for Go](https://louisjenkinscs.github.io/publications/A_Concurrent_Map_for_Go.pdf) (although the name has changed since being published at PACT 2017, the current version is relevant enough). It borrows from the convoy-avoidance algorithm, where we skip over any locked buckets. We do the same in the Bag for locked segments, looking for ideal conditions first. Segmentation also directly relates, due to the fact that the map was also a segmented 'fixed-depth' tree-like structure.

##### Deque

Compared to when this was originally a `Queue`, the `Deque` has taken on an entirely new design based on the old, and so a lot of the older material no longer applies.

###### Monotonically Increasing Atomic Counters for Wait-Free Ordering

Can be observed somewhat in the queue [FFQ](http://se.inf.tu-dresden.de/pubs/papers/ffq2017.pdf), where enqueue and dequeue operations find their position based on said counters. Compared to my own work, instead of having 1-dimensional cells, we have each cell maintain another ordered data structure.

###### Segmenting the Deque per Core

This borrowed from how the Bag worked, which of course borrowed from my publication.
Data Structure API
Looking for feedback and review on the API for Data Structures, chpldocs hosted here. Below are questions that help determine the overall direction of further development of the queue.
Edit: The chpldocs link is now updated. Old docs are here.
Edit2: I have removed the `QueueFactory` entirely for now.

Edit3: Generalizing for any kind of data structure.
## The QueueFactory

Emulates Go's 'make'.

### Open Questions

* Should we maintain an interface at all?
* If not, then should we make each data structure a `record` instead?
* Resilience to Change vs Flexibility

## Proposed Additions
### 'Freezing' Data Structures

While parallel-safe data structures are desired, not all things can be done in a parallel-safe manner. For these situations, I believe the data structures should be armed with two states: mutable (unfrozen) and immutable (frozen). The benefit is two-fold: it enables optimizations, such as adopting a more optimistic manner of accessing data by not contending for locks, meaning more performance; and the user gains the ability to enforce immutability across nodes without the need for external synchronization. I believe that this is the only 'safe' way to allow things like reduction, zipper, and mapping iterations without modifying the queue, while providing a significant performance boost.
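A hedged usage sketch of how the two states might be used (this assumes the proposed `freeze`/`unfreeze`/`canFreeze` methods; `c` stands for some parallel-safe collection and is illustrative):

```chpl
// Sketch: freeze before a bulk read-only phase, unfreeze after.
if c.canFreeze() {
  c.freeze();              // now immutable: readers need not contend for locks
  var total = + reduce c;  // safe reduction; no concurrent modification possible
  c.unfreeze();            // mutable again
}
```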
#### Open Questions
### Iteration
Iteration allows the data structures to become more than just simple containers, making things such as reductions, zipper iterations, and mappings possible.
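For instance, a sketch of what iteration support enables (this assumes a collection `c` that implements the `these()` iterators; the names are illustrative):

```chpl
// Reduction over a collection's iterator.
var total = + reduce c;

// Zipper iteration with a range of matching size.
for (elt, idx) in zip(c, 1..c.size()) do
  writeln(idx, ": ", elt);

// Mapping: build an array from transformed elements.
var doubled = [elt in c] elt * 2;
```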
#### Open Questions
### Transactional Additions
Adding in bulk with an 'all-or-nothing' transactional approach.
#### Open Questions
### Operator Overloading

While some operators can be helpful, others can be rendered irrelevant. Syntactic sugar operators such as `+=` can help produce more readable code and are easy to overload. Perhaps the `=` operator could be overloaded to perform a 'clear-and-add' kind of operation? What say you?
Edit: Removed the 'literal' part of example.
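As a sketch of what such overloads might look like (the `Queue` type and its `enqueue`/`clear` methods are illustrative, and Chapel's rules for overloading `=` on class types may make the second overload undesirable in practice):

```chpl
// '+=' as sugar for enqueue.
proc +=(q : Queue(?t), elt : t) {
  q.enqueue(elt);
}

// '=' as a 'clear-and-add' operation (one possible semantic).
proc =(ref q : Queue(?t), elt : t) {
  q.clear();
  q.enqueue(elt);
}
```

Note that redefining `=` replaces ordinary assignment for the type, which is a strong argument for keeping `+=` only.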