[Review] Data Structure API #6812
Updated chpldocs with additional methods.
Growing and shrinking a distributed bounded queue is not easy without incurring severe performance penalties, due to the synchronization needed. A bounded queue is to be implemented on top of a distributed array, which does not support concurrent growing or shrinking. If heavily requested, [...] I would like to note that this can be done without queue API support, by creating a new queue of the requested size and appending the old into the new:

```chpl
var oldQueue = makeBoundedFIFO(100);
for i in 1 .. 100 do oldQueue += i;
var newQueue = makeBoundedFIFO(200);
newQueue += oldQueue;
for i in 101 .. 200 do newQueue += i;
```
Would there be a reason not to set the bounds to 1 instead?
I'm against adding [...]:

```chpl
var coreWorkQueue = makeBalancedQueue(int);
forall i in 1 .. 100 do coreWorkQueue += i;
var splitWork = coreWorkQueue.dequeue(10);
var splitWorkQueue = makeBoundedFIFO(int);
splitWorkQueue += splitWork;
```

Where [...]

Edit: Fixed the first example to [...]
This brings up a separate question: should we even have an interface? If each queue were implemented independent of an interface, it would lift any and all restrictions. Maybe an unbounded queue should not have a [...]. There are many ways to proceed with this.

Edit: To expand on this more, perhaps we could do what Java does and have hierarchies for this...

Edit2: It would be a lot less complicated than this, but it encapsulates the idea.
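A rough sketch of what such a hierarchy might look like in Chapel (the class names and methods here are illustrative only, not part of the proposed API):

```chpl
// Hypothetical base interface: only what every queue can support.
class BaseQueue {
  type eltType;
  proc enqueue(elt : eltType) : bool { halt("not implemented"); return false; }
  proc dequeue() : (bool, eltType) {
    halt("not implemented");
    var dflt : eltType;
    return (false, dflt);
  }
}

// Bounded queues would layer on capacity-related operations, so that
// capacity only appears where it is meaningful.
class BoundedQueue : BaseQueue {
  proc capacity() : int { halt("not implemented"); return 0; }
}
```

The point of the layering, as in Java, is that restrictions (boundedness, ordering, and so on) live only on the subtree where they apply, rather than forcing a single interface to cover every variant.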
The question is not how it would perform, but whether it is useful functionality. Although you can do that without API support, it might be nice to have it in the API.
Simply because it isn't supposed to be 1 :) It just feels hacky to me.
I'm in favor of utilizing an inheritance hierarchy for the queue setup. For the enqueue method, would it return false if some elements have been added but not all? Maybe be a little more explicit on that. For freeze/unfreeze: will we guarantee that all operations initiated prior to the freeze/unfreeze will complete before the state changes? Or is it possible to mess up an operation partway through (say, by freezing in the midst of a long enqueue, for instance)?
That's one of the things I've been debating. The transactional additions section was referring to [...]
That's another thing I've been debating (also under the Open Questions); however, I am in favor of making it implementation-specific. If we go ahead with the inheritance hierarchy, the specifics for each queue can be noted in the overridden method.
Soon (tomorrow), I will make a change to the API to add some inheritance, and I'll announce when I finish (I'm still here to answer questions).
I have revised the entire API to include a hierarchy now. Note that quite a few methods are empty (as it is more of an interface). As chpldocs only shows methods that are overridden, there's nothing else I can do about that yet (I can fill it in more if people like the way it is going right now).
Two ideas that I had while reading the updated documents:
I find this agreeable. An array is definitely the least expensive way to return bulk data, and [...]
Sure, that's agreeable. Originally I removed them so that a [...]. Overall, I agree with adding them back.
I'm not sure that sentence makes sense. I mean, I would expect a List to have more "ordering" than a stack or a queue; in particular, you can access the middle elements. Maybe you mean something different? Or maybe it shouldn't be called "List"? (Also, we already have something called a List in Chapel that is different from what you are building.) I'm generally happy with the outline of the API.
In the API, I relaxed the ordering for [...]. Looking at this again, in Java a List is defined as "an ordered collection (also known as a sequence)", so perhaps a List should be indexable by definition, and I'll need another category for a load-balancing, non-indexable, non-ordered (but high-performance, i.e. seeing over 250x the performance of a normal, albeit naive, synchronized list implementation at 64 nodes) data structure. I suppose technically I could have it be a [...]
Does related work have a name for a "list" that returns an arbitrary element? Would "bag" be a better term? For example, [...] shows "List" is ordered, and that "bag" is a "multi-set", which isn't far off but might convey the wrong idea. Maybe we just need to invent a term for a totally unordered collection? Just brainstorming here:
Or, what about just calling it a "collection" and not trying to have a name for it? I mean, don't you have this terminology right now:

So couldn't it just be:

? If its being unordered is important for its use, should we be looking for a two-word term, like "UnorderedList" or "UnorderedCollection"? I still probably like "bag" or "jumble" the best of all of these.
"hoard" and "mound" don't intuitively spark enough understanding of what that structure is to me (maybe because I am not a native speaker). I actually like the sound of "Collection" the most for this purpose, but I am not sure if it has any implications on the software design aspect. Two-word terms also sound good to me, as they are descriptive. I can live with "bag".
Interesting... Yeah, [...]
In terms of the [...]. Also, I won't hold any illusion that I can finish everything here by the end of the month, but my goal is to get at least one in each category.
One more thing as well: I need to establish what would be 'acceptable' performance. In the distributed [...], its implementation would require employing the H-Synch algorithm, where in summary we maintain a global queue-lock (MCS) [requires GlobalAtomicObject] which a node needs to acquire before it may touch the data structure, and a CC-Synch lock (CCSynchLock) which a task needs to acquire. The performance would be near-equivalent to 1 node, which will hopefully be non-degrading. It does require more effort on the user's behalf, though. @mppf Since you have a much firmer grasp on what is and is not acceptable, would a list with non-degrading performance be desired? As well, if this is desired (and achieved), it's possible to reuse the pattern/abstraction in other applications.
Due to time constraints, I'm trimming down the Collections API a lot more. There isn't enough time to update the hosted chpldocs, so I'll show it below in text:

```chpl
class Collection {
  type eltType;

  /*
    Adds an element to this data structure.
  */
  proc add(elt : eltType) : bool;

  /*
    Removes an arbitrary element from this data structure.

    BUG: Compiler will segfault if the returned value is not captured at the callsite:
      'c.remove();' == segfault
    BUG: Compiler with LICM will cause undefined behavior if the return value is captured
    in a variable declared outside of a loop:
      'var captured : (bool, eltType); while !captured[1] do captured = remove();' == undefined behavior
  */
  proc remove() : (bool, eltType);

  // Checks to see if the data structure contains an item. Note that we do not support the
  // 'removeItem' equivalent by default, due to it causing a ton of headaches in implementing
  // my ordered data structure.
  proc contains(elt : eltType) : bool;

  // By default, repeatedly calls 'remove()' until empty if not overloaded. May be
  // problematic with issue #7003.
  proc clear();

  // By default, checks if 'size() == 0'; can be overloaded.
  inline proc isEmpty() : bool {
    halt("'proc isEmpty() : bool' is not supported...");
  }

  // Returns the number of elements.
  proc size() : int;

  // By default, they must implement serial iteration; I would put parallel iteration here,
  // but it is not possible, as the leader iterator's yielded chunk can vary in type. Also,
  // it's not the easiest to debug either (i.e. #6998).
  iter these() : eltType;

  // Freezes the collection, rendering it immutable.
  proc freeze() : bool;

  // Unfreezes the collection, rendering it mutable.
  proc unfreeze() : bool;

  // If this collection can be frozen.
  proc canFreeze() : bool;

  // If this collection is currently frozen.
  proc isFrozen() : bool;
}
```

Note that there isn't any 'Stack', 'List', or 'Queue' derivative. If they are needed, they can be added later. I plan to submit two scalable data structures (there used to be 5, but I merged the best of 3 into one and discarded another entirely). Now they will directly inherit from 'Collection'; this should avoid most of the other issues with dynamic dispatch as well and make my job a lot easier. I definitely promised more than I could originally deliver; apologies for this sudden change.
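A hedged usage sketch of the trimmed API (the concrete subclass name `DistributedBag` is borrowed from the benchmark tables; treat this as illustrative, not the confirmed interface):

```chpl
// Illustrative only: assumes DistributedBag is a concrete Collection subclass.
var c : Collection(int) = new DistributedBag(int);
for i in 1 .. 100 do
  c.add(i);
assert(c.size() == 100);

// Capture the returned tuple at the callsite (see the segfault bug note in remove()).
var (hasElt, elt) = c.remove();
if hasElt then writeln("removed ", elt);

// Serial iteration via 'these()'.
for x in c do writeln(x);

c.clear();
assert(c.isEmpty());
```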
Now that the PR has been posted, there's another revision I kind of want to make: potentially removing the 'freeze' logic from the API entirely. It turns out it's a lot more tedious and harder to implement than originally expected. Right now, all data structures allow for iteration during concurrent access, so 'freeze' currently merely elides the need to acquire any locks. Worst of all, the [...]
I don't have any objection to removing the "freeze" logic. Such logic complicates the API and is only really useful if it offers some performance advantage, in my opinion.
Cool! I'll assume most other people would feel the same way. As well, since the 'resource leakage on breaking out of iterators' issue is a confirmed bug, I can probably add a warning that the user cannot break out of serial iterators early without consequence.
[GSoC] Parallel-Safe Collections Module and Distributed Data Structures [PR by @LouisJenkinsCS, reviewed by @e-kayrakli and @mppf; tested by @mppf]

# Collections

I present a parallel-safe distributed data structures framework, similar to Java's 'Collection' framework. Original discussion can be seen [here](#6812). Original repository containing benchmarks can be found [here](https://github.com/LouisJenkinsCS/Distributed-Data-Structures). Chpldocs documentation can be seen [here](https://louisjenkinscs.github.io/Distributed-Data-Structures/).

**Note:** This is the official PR for the GSoC program, but I do realize it may seem a bit like a rough draft. Any feedback or constructive criticism would be appreciated.

## Deque

Provides a strict ordering without sacrificing too much performance. Supports insertion and removal from both ends, allowing FIFO, LIFO, and a total ordering, which is preserved across all nodes in a cluster, and employs a wait-free round-robin approach to load distribution that ensures fairness in memory, bandwidth, and computation.

**Disclaimer:** The deque provided, while scalable, relies heavily on network atomics. The benchmark results are produced using said network atomic operations.

## Bag

With performance that scales both in the number of nodes in a cluster and the number of cores per node, we offer a multiset implementation, called a 'Bag', which is a medium that allows storing and retrieving data in any arbitrary order. This type of data structure is ideal for work queues, as it employs its own load balancing and offers unparalleled performance.

**Disclaimer:** A node can request a 'privatized' copy, which retrieves a clone that is allocated on the node requesting it, reducing any excess communication. Usage of `getPrivatizedInstance()` is highly advised for performance-critical sections.

### Performance

We compare our data structures to a naive synchronized list implementation, as that is all that is available in the Chapel standard library. In all cases, the data structures scale and outperform the naive implementation.

#### Insert

| Implementation | Performance over Naive (at 64 nodes) |
| -------------- | :----------------------------------: |
| SynchronizedList | 1x |
| DistributedDeque | 63x |
| DistributedBag | 403x |

![](https://raw.githubusercontent.com/LouisJenkinsCS/Distributed-Data-Structures/master/Results/Collections_Add.png)

#### Remove

| Implementation | Performance over Naive (at 64 nodes) |
| -------------- | :----------------------------------: |
| SynchronizedList | 1x |
| DistributedDeque | 123x |
| DistributedBag | 651x |

![](https://raw.githubusercontent.com/LouisJenkinsCS/Distributed-Data-Structures/master/Results/Collections_Remove.png)

##### Testing

Passes the new test directory with:

* quickstart
* quickstart + gasnet
* qthreads
* qthreads + gasnet

#### Related Work / Inspiration

##### Bag

###### Segmenting per Core & Multiple Passes for 'Best', 'Average', and 'Worst' Cases

This is based heavily on my own work, [Concurrent Map for Go](https://louisjenkinscs.github.io/publications/A_Concurrent_Map_for_Go.pdf) (although the name has changed since being published at PACT 2017, the current version is relevant enough). It borrows from the convoy-avoidance algorithm, where we skip over any locked buckets. We do the same in the Bag for locked segments, looking for ideal conditions first. Segmentation also directly relates, due to the fact that the map was also a segmented 'fixed-depth' tree-like structure.

##### Deque

Compared to when this was originally a `Queue`, the `Deque` has taken on an entirely new design based on the old, and so a lot of the older material no longer applies.

###### Monotonically Increasing Atomic Counters for Wait-Free Ordering

Can be observed somewhat in the queue [FFQ](http://se.inf.tu-dresden.de/pubs/papers/ffq2017.pdf), where enqueue and dequeue operations find their position based on said counters. Compared to my own work, instead of having 1-dimensional cells, we have each cell maintain another ordered data structure.

###### Segmenting the Deque per Core

This borrowed from how the Bag worked, which of course borrowed from my publication.
Data Structure API
Looking for feedback and review on the API for Data Structures, chpldocs hosted here. Below are questions that help determine the overall direction of further development of the queue.
Edit: The chpldocs link is now updated. Old docs are here.
Edit2: I have removed the `QueueFactory` entirely for now.

Edit3: Generalizing for any kind of data structure.
## The QueueFactory

Emulates Go's 'make'.

### Open Questions

* Should we maintain an interface at all?
* If not, then should we make each data structure a `record` instead?
* Resilience to Change vs Flexibility

## Proposed Additions
### 'Freezing' Data Structures

While parallel-safe data structures are desired, not all things can be done in a parallel-safe manner. For these situations, I believe the data structures should be armed with two states: mutable (unfrozen) and immutable (frozen). The benefit is two-fold: it enables optimizations, such as adopting a more optimistic manner of accessing data by not contending for locks, meaning more performance; and the user gains the ability to enforce immutability across nodes without the need for external synchronization. I believe that this is the only 'safe' way to allow things like reduction, zipper, and mapping iterations without modifying the queue, while providing a significant performance boost.
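A hedged usage sketch of how the two states might be used (this assumes the proposed `freeze`/`unfreeze`/`canFreeze` methods; `c` stands for some parallel-safe collection and is illustrative):

```chpl
// Sketch: freeze before a bulk read-only phase, unfreeze after.
if c.canFreeze() {
  c.freeze();              // now immutable: readers need not contend for locks
  var total = + reduce c;  // safe reduction; no concurrent modification possible
  c.unfreeze();            // mutable again
}
```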
#### Open Questions
### Iteration
Iteration allows the data structures to become more than just simple containers, making things such as reductions, zipper iterations, and mappings possible.
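For instance, a sketch of what iteration support enables (this assumes a collection `c` that implements the `these()` iterators; the names are illustrative):

```chpl
// Reduction over a collection's iterator.
var total = + reduce c;

// Zipper iteration with a range of matching size.
for (elt, idx) in zip(c, 1..c.size()) do
  writeln(idx, ": ", elt);

// Mapping: build an array from transformed elements.
var doubled = [elt in c] elt * 2;
```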
#### Open Questions
### Transactional Additions
Adding in bulk with an 'all-or-nothing' transactional approach.
#### Open Questions
### Operator Overloading

While some operators can be helpful, others can be rendered irrelevant. Syntactic sugar operators such as `+=` can help produce more readable code and are easy to overload. Perhaps the `=` operator could be overloaded to perform a 'clear-and-add' kind of operation? What say you?
Edit: Removed the 'literal' part of example.
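As a sketch of what such overloads might look like (the `Queue` type and its `enqueue`/`clear` methods are illustrative, and Chapel's rules for overloading `=` on class types may make the second overload undesirable in practice):

```chpl
// '+=' as sugar for enqueue.
proc +=(q : Queue(?t), elt : t) {
  q.enqueue(elt);
}

// '=' as a 'clear-and-add' operation (one possible semantic).
proc =(ref q : Queue(?t), elt : t) {
  q.clear();
  q.enqueue(elt);
}
```

Note that redefining `=` replaces ordinary assignment for the type, which is a strong argument for keeping `+=` only.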