
[Review] Data Structure API #6812

Closed
LouisJenkinsCS opened this issue Jul 24, 2017 · 21 comments

@LouisJenkinsCS
Member

LouisJenkinsCS commented Jul 24, 2017

Data Structure API

Looking for feedback and review on the API for data structures; the chpldocs are hosted here. Below are questions that help determine the overall direction of further development of the queue.

Edit: The chpldocs link is now updated. Old docs are here.

Edit2: I have removed the QueueFactory entirely for now.

Edit3: Generalizing to any kind of data structure.

## The QueueFactory

Emulates Go's 'make'.

### Open Questions

* Should we maintain an interface at all?
* If not, then should we make each data structure a record instead?
* Resilience to Change vs Flexibility

## Proposed Additions

### 'Freezing' Data Structures

While parallel-safe data structures are desired, not all operations can be done in a
parallel-safe manner. For these situations, I believe the data structures should support two
states: mutable (unfrozen) and immutable (frozen). The benefit is two-fold: it
enables optimizations such as a more optimistic style of data access that avoids
contending for locks, yielding better performance, and it lets the user enforce
immutability across nodes without the need for external synchronization.
I believe this is the only 'safe' way to allow things like reduction, zippered iteration,
and mapping without modifying the queue mid-iteration, while also providing a significant performance boost.
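
As a rough sketch of how the two states might be used (method names like `freeze`, `unfreeze`, and `makeBoundedFIFO` follow the proposed, not final, API):

```chapel
// Hypothetical usage sketch; all names follow the proposed (not final) API.
var queue = makeBoundedFIFO(int, 100);
for i in 1 .. 100 do queue += i;

queue.freeze();               // now immutable: iteration needs no locks
var total = + reduce queue;   // e.g., a reduction is safe while frozen
queue.unfreeze();             // back to the mutable, parallel-safe state
```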

#### Open Questions

* Are the semantic changes of having dual-mode operations too confusing?
* What happens to concurrent ongoing operations while freezing/unfreezing?
  * Should we halt? Block until unfrozen? Make it 'implementation-defined' behavior?

### Iteration

Iteration allows the data structures to become more than just simple containers, making
things such as reductions, zippered iteration, and mappings possible.

#### Open Questions

* Should we allow read-only iteration, 'draining' iteration, or perhaps both?
  * Proposal: based on whether the data structure is 'frozen' or 'unfrozen'.

### Transactional Additions

Adding elements in bulk with an 'all-or-nothing' transactional approach.

#### Open Questions

* Should we allow adding non-scalar types? From other collections?

### Operator Overloading

While some operators can be helpful, others are rendered irrelevant. Syntactic-sugar
operators such as += can help produce more readable code and are easy
to overload. Perhaps the = operator could be overloaded to perform a 'clear-and-add'
kind of operation? What say you?

var queue = makeBoundedFIFO(int, 100);
queue = (1,2,3,4,5);
for i in 6 .. 100 do queue += i;

Edit: Removed the 'literal' part of example.

@LouisJenkinsCS
Member Author

Updated chpldocs with additional methods (clear(), isEmpty(), isFull(), length)

@LouisJenkinsCS LouisJenkinsCS changed the title Queue API Interface - Discussion [Review] Queue API Jul 25, 2017
@e-kayrakli
Contributor

  • One thing worth thinking about would be allowing grow/shrink on bounded queues. A user may choose to do that in some cases, regardless of its cost. On a similar note, 0-sized bounded queues could also be allowed. This would enable patterns where users initialize a 0-sized queue and pass it to some initialization function.

  • Another thing (that Louis and I briefly discussed) is the need for a split method. This came up when Louis was using unbalanced tree search as a benchmark. See for example: https://github.com/chapel-lang/chapel/blob/master/test/studies/uts/uts_deq.chpl#L220. Here the split method is used by the user to balance the load across tasks manually. Louis' current implementation balances the load in the sense that dequeues can be satisfied from remote locales if the local storage is depleted, so the need for balancing may not be that great, but a user-controllable way of doing it might still be nice. Also, split can be done by a bulk dequeue followed by a bulk enqueue to a new queue, so the implementation of such a method can be just a wrapper around those.

@LouisJenkinsCS
Member Author

LouisJenkinsCS commented Jul 25, 2017

One thing worth thinking about would be allowing grow/shrink on bounded queues

Growing and shrinking a distributed bounded queue is not easy without incurring severe performance penalties due to the required synchronization. A bounded queue is implemented on top of a distributed array, which does not support concurrent growing or shrinking. If heavily requested, a
distributed dynamically resizing bounded queue could be created, separate from the normal distributed bounded queue, and implemented in a way that allows this.

I would like to note that this can be done without using the queue API by creating a new queue of the requested size and appending the old into the new...

var oldQueue = makeBoundedFIFO(int, 100);
for i in 1 .. 100 do oldQueue += i;
var newQueue = makeBoundedFIFO(int, 200);
newQueue += oldQueue;
for i in 101 .. 200 do newQueue += i;

users initialize a 0-sized queue and pass it to some initialization function

Would there be a reason not to set the bound to 1 instead?

split can be done by a bulk dequeue followed by a bulk enqueue to a new queue.

I'm against adding split to the API because the user may want to split into another type of queue.

var coreWorkQueue = makeBalancedQueue(int);
forall i in 1 .. 100 do coreWorkQueue += i;
var splitWork = coreWorkQueue.dequeue(10);
var splitWorkQueue = makeBoundedFIFO(int);
splitWorkQueue += splitWork;

Where coreWorkQueue 'split' into a FIFO queue.

Edit: Fixed first example to makeBoundedFIFO instead of makeBoundedQueue

@LouisJenkinsCS
Member Author

LouisJenkinsCS commented Jul 25, 2017

This brings up a separate question: Should we even have an interface?

If each queue were implemented independently of an interface, it would lift any and all restrictions. Maybe an unbounded queue should not have a shrink/grow method, but perhaps a certain type of bounded queue should. Perhaps the interface should be as simple as enqueue and dequeue, and we should leave it at that to allow breathing room for other data structures? Perhaps we should have multiple interfaces? If we wanted a Deque, implementing just a Queue interface is not sufficient; it should also be able to implement a Stack interface, etc. A load-balanced queue that imposes no FIFO ordering could perhaps implement some List interface instead, and work stealing could be abstracted into another interface, LoadBalanced, etc.

There are many ways to proceed with this.

Edit:

To expand on this more, perhaps we could do what Java does, and have hierarchies for this...

Queue is the base interface, BoundedQueue inherits from Queue, and DistributedBoundedQueue inherits from BoundedQueue, etc. Again, this lifts a lot of restrictions.
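
A minimal sketch of what such a hierarchy could look like in Chapel (all names are illustrative, and this glosses over Chapel's rules for inheriting from generic classes):

```chapel
// Illustrative only: a Java-style hierarchy, as proposed above.
class Queue {
  type eltType;
  proc enqueue(elt : eltType) : bool { halt("not implemented"); return false; }
  proc dequeue() : (bool, eltType) {
    halt("not implemented");
    var sentinel : (bool, eltType);
    return sentinel;
  }
}

class BoundedQueue : Queue {
  var capacity : int;
  // Bounded queues could additionally expose grow/shrink here,
  // without forcing that capability onto the base interface.
}

class DistributedBoundedQueue : BoundedQueue {
  // Distribution-specific state and overridden methods go here.
}
```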

Edit2:

It would be a lot less complicated than this, but it encapsulates the idea.

(image: diagram of the proposed inheritance hierarchy)

@e-kayrakli
Contributor

Growing and shrinking of a distributed bounded queue is not easy without incurring severe performance penalties due to needed synchronization.

The question is not how it would perform, but whether it is useful functionality. Although you can do that without API support, it might be nice to have it in the API.

Would there be a reason to not to set the bounds to 1 instead?

Simply because it isn't supposed to be 1 :) It just feels hacky to me.

@lydia-duncan
Member

I'm in favor of utilizing an inheritance hierarchy for the queue set up.

For the enqueue method, would it return false if some elements have been added but not all? Maybe be a little more explicit about that.

For freeze/unfreeze: will we guarantee that all operations initiated prior to the freeze/unfreeze complete before the state changes? Or is it possible to interrupt an operation partway through (say, by freezing in the midst of a long enqueue)?

@LouisJenkinsCS
Member Author

For the enqueue method, would it return false if some elements have been added but not all? Maybe be a little more explicit about that.

That's one of the things I've been debating. The 'Transactional Additions' section was referring to enqueue as an addition to the queue: either we commit by adding all of the elements, or we roll back by not adding any. You're right that I wasn't clear on what happens when there is not enough space available. I'll edit the description a bit later.

For freeze/unfreeze: will we guarantee that all operations initiated prior to the freeze/unfreeze complete before the state changes? Or is it possible to interrupt an operation partway through (say, by freezing in the midst of a long enqueue)?

Another thing I've been debating (it's also under the Open Questions); however, I am in favor of making it implementation-specific. If we go ahead with the inheritance hierarchy, the specifics for each queue can be noted in the overridden method.

@LouisJenkinsCS
Member Author

LouisJenkinsCS commented Jul 25, 2017

Soon (tomorrow), I will make a change to the API to add some inheritance, and I'll announce when I finish (I'm still here to answer questions).

@LouisJenkinsCS
Member Author

I have revised the entire API to include a hierarchy now. Note that quite a few methods are empty (as it is more of an interface). As chpldocs only shows methods that are overridden, there's nothing else I can do about that yet (I can fill it in more if people like the way it is going right now).

@LouisJenkinsCS LouisJenkinsCS changed the title [Review] Queue API [Review] Data Structure API Jul 26, 2017
@e-kayrakli
Contributor

Two ideas that I had while reading the updated documents:

  • An alternative to "bulk remove" in Collection: having remove(10) create a new collection may be somewhat heavyweight. How about bulk remove returning an array, and the much-discussed split returning a collection? This may be a bit confusing, but I don't know..

  • Do you think there should be Queue.enqueue/Queue.dequeue and similarly Stack.push/Stack.pop as simple wrappers for add/remove? That way, a class extending these would override add/remove, yet still allow the use of enqueue/dequeue and push/pop.

@LouisJenkinsCS
Member Author

An alternative to "bulk remove" in Collection: having remove(10) create a new collection may be somewhat heavyweight. How about bulk remove returning an array, and the much-discussed split returning a collection?

I find this agreeable. An array is definitely the least expensive way to return bulk data, and split makes more sense to split into a separate Collection.

Do you think there should be Queue.enqueue/Queue.dequeue and similarly Stack.push/Stack.pop as simple wrappers for add/remove? That way, a class extending these would override add/remove, yet still allow the use of enqueue/dequeue and push/pop.

Sure, that's agreeable. Originally I removed them so that a Queue and Stack would feel more like a Collection, but I neglected the fact that their iconic methods provide more clarity in determining, at a glance, what kind of collection it is. For example, if you see c.add(1), is it a stack, a queue, or a list? You won't know unless you look up the type. If you see c.enqueue(1) or c.push(1), it becomes obvious.

Overall, I agree to adding them back.
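
A sketch of the wrapper idea (this assumes the `Collection` base class with `add`/`remove` described later in this thread; none of these names are final):

```chapel
// Queue and Stack keep their iconic names but delegate to the
// shared Collection primitives (illustrative sketch only).
class Queue : Collection {
  inline proc enqueue(elt : eltType) : bool { return add(elt); }
  inline proc dequeue() : (bool, eltType) { return remove(); }
}

class Stack : Collection {
  inline proc push(elt : eltType) : bool { return add(elt); }
  inline proc pop() : (bool, eltType) { return remove(); }
}
```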

@mppf
Member

mppf commented Jul 28, 2017

List... A data structure without any particular ordering.

I'm not sure that sentence makes sense. I mean, I would expect a List to have more "ordering" than a stack or a queue - in particular you can access the middle elements. Maybe you mean something different? Or maybe it shouldn't be called "List" ? (Also we already have something called a List in Chapel that is different from what you are building).

I'm generally happy with the outline of the API.

@LouisJenkinsCS
Member Author

LouisJenkinsCS commented Jul 28, 2017

I would expect a List to have more "ordering" than a stack or a queue

In the API, I relaxed the ordering for List to allow it to be more of an 'umbrella' for any arbitrary data structure that isn't FIFO, LIFO, or even indexable. For example, a data structure might only allow inserting and removing arbitrary elements, without supporting indexing, as an optimization, or might be inherently unable to due to its design (e.g., a load-balancing data structure where the index of an item is essentially useless, as items can move back and forth between nodes). I suppose a better term could be devised instead.

Looking at this again, in Java a List is defined as "an ordered collection (also known as a sequence)", so perhaps List should be indexable by definition, and I'll need another category for a load-balancing, non-indexable, non-ordered (but high-performance, e.g., seeing over 250x the performance of a normal, albeit naive, synchronized list implementation at 64 nodes) data structure. I suppose technically I could have it be a List that ignores the index and returns an arbitrary element, and just document it as such.

@mppf
Member

mppf commented Jul 28, 2017

Does related work have a name for a "list" that returns an arbitrary element? Would "bag" be a better term?

For example:
https://en.wikipedia.org/wiki/List_of_data_structures

shows "List" is ordered. And that "bag" is a "multi-set" which isn't far off but might convey the wrong idea.

Maybe we just need to invent a term for a totally unordered collection? Just brainstorming here:

  • "bag" (at risk of causing confusion with existing uses of the term)
  • "pile" (I made that up, but apparently so did somebody else)
  • "jumble"
  • "collection"
  • "hoard"
  • "mound"
  • edit: "trove"

Or, what about just calling it a "collection" and not trying to have a name for it? I mean, don't you have this terminology right now:

  • Collection
  • a List is a Collection with no further property guarantees
    • list, bounded list, dynamic bounded list, distributed list
  • a Queue is a special Collection
  • a Stack is a special Collection

So couldn't it just be:

  • Collection
    • bounded collection, dynamic bounded collection, distributed collection
  • a Queue is a special Collection
  • a Stack is a special Collection

?

If it being unordered is important for its use, should we be looking for a two-word term, like "UnorderedList" or "UnorderedCollection" ?

I still probably like "bag" or "jumble" the best of all of these.

@e-kayrakli
Contributor

"hoard" and "mound" don't intuitively spark enough understanding of what that structure is to me (maybe because I am not a native speaker)

I actually like the sound of "Collection" the most for the purpose. But I am not sure if it has any implications on the software design aspect. Two-word terms also sound good to me as they are descriptive. I can live with "bag".

@LouisJenkinsCS
Member Author

Interesting... Yeah, Bag sounds about right, considering it describes it rather well. DistributedBag, and Bag for the single-locale version, sound about right. However, I do have List data structures that are indexable, so I guess it'd be...

  • Collection
    • Bag
  • List
    • BoundedList
    • DynamicBoundedList
  • Queue
    • BoundedQueue
    • DynamicBoundedQueue
  • Stack
    • BoundedStack
    • DynamicBoundedStack

In terms of the List, if a data structure supports indexing, then pop_front, push_front, pop_back, push_back, etc. are provided for free. As well, if a Set interface is added later on, then Bag could be moved under it as a Multiset of sorts.
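
For instance, if an indexable List exposes hypothetical `insertAt`/`removeAt` primitives, the deque-style operations fall out as one-liners (a sketch under assumed names, not the actual API):

```chapel
// Hypothetical: assumes an indexable List with 1-based insertAt/removeAt.
proc List.push_front(elt : eltType) { insertAt(1, elt); }
proc List.push_back(elt : eltType)  { insertAt(size() + 1, elt); }
proc List.pop_front() : eltType     { return removeAt(1); }
proc List.pop_back() : eltType      { return removeAt(size()); }
```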

Also, I hold no illusion that I can finish everything here by the end of the month, but my goal is to get at least one in each category.

@LouisJenkinsCS
Member Author

One more thing: I need to establish what would be 'acceptable' performance. In the distributed List implementation (not the Bag/Multiset one I discussed earlier), I can promise non-degrading performance, but I can't promise scalable (as in increase-per-node) performance for a list that requires full ordering across nodes, arbitrary/random access, etc. This is still far better than a normal synchronized list, whose performance degrades significantly as you add more nodes.

Its implementation would require the H-Synch algorithm: in summary, we maintain a global queue-lock (MCS) [requires GlobalAtomicObject], which a node must acquire before it may touch the data structure, and a CC-Synch lock (CCSynchLock), which a task must acquire. The performance would be near-equivalent to 1 node, which will hopefully be non-degrading. It does require more effort on the user's behalf, though. @mppf Since you have a much firmer grasp on what is and is not acceptable: would a list with non-degrading performance be desired? As well, if this is desired (and achieved), it's possible to reuse the pattern/abstraction in other applications.

@LouisJenkinsCS
Member Author

Due to time constraints, I'm trimming down the Collections API a lot more. There's not enough time to update the hosted chpldocs, so I'll show it below in text...

class Collection {
  type eltType;

  /*
    Adds an element to this data structure.
  */
  proc add(elt : eltType) : bool;

  /*
    Removes an arbitrary element from this data structure.

    BUG: The compiler will segfault if the returned value is not captured at the callsite.
    'c.remove();' == segfault
    BUG: The compiler with LICM will cause undefined behavior if the return value is captured
    in a variable declared outside of a loop.
    'var captured : (bool, eltType); while !captured[1] do captured = remove();' == undefined behavior
  */
  proc remove() : (bool, eltType);

  // Checks to see if the data structure contains an item. Note that we do not support a
  // 'removeItem' equivalent by default, due to it causing a ton of headaches in implementing
  // my ordered data structure.
  proc contains(elt : eltType) : bool;

  // By default, repeatedly calls 'remove()' until empty if not overloaded. May be
  // problematic with issue #7003.
  proc clear();

  // By default, checks whether 'size() == 0'; can be overloaded.
  inline proc isEmpty() : bool {
    halt("'proc isEmpty() : bool' is not supported...");
  }

  // Returns number of elements.
  proc size() : int;

  // By default, implementations must provide serial iteration. I would put parallel iteration
  // here, but it is not possible, as the leader iterator's yielded chunk can vary in type.
  // It's also not the easiest to debug (e.g., #6998).
  iter these() : eltType;

  // Freezes the collection, rendering it immutable.
  proc freeze() : bool;

  // Unfreezes the collection, rendering it mutable.
  proc unfreeze() : bool;

  // If this collection can be frozen.
  proc canFreeze() : bool;

  // If this collection is currently frozen.
  proc isFrozen() : bool;
}

Note that there isn't any 'Stack', 'List', or 'Queue' derivative. If they are needed, they can be added later. I plan to submit two scalable data structures (it used to be five, but I merged the best of three into one and discarded another entirely). They will directly inherit from 'Collection', which should avoid most of the other issues with dynamic dispatch and make my job a lot easier. I definitely promised more than I could deliver; apologies for this sudden change.
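
To illustrate, here is how the trimmed-down API might look in use (assuming a `DistributedBag` that inherits from `Collection`; this is a sketch, not final code):

```chapel
// Hypothetical usage of the trimmed-down Collection API.
var bag : Collection(int) = new DistributedBag(int);

forall i in 1 .. 1000 do
  bag.add(i);

// 'remove' returns a (success, element) tuple; note the compiler bug
// above: the result must be captured at the callsite.
var (hasElt, elt) = bag.remove();
if hasElt then writeln("removed ", elt);

writeln("remaining: ", bag.size());
```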

@LouisJenkinsCS
Member Author

Now that the PR has been posted, there's another revision I want to make: potentially removing the 'freeze' logic from the API entirely. It turns out it's a lot more tedious and harder to implement than originally expected. Right now, all data structures allow iteration during concurrent access, so 'freeze' currently merely elides the need to acquire any locks. Worst of all, the DistributedBag sees a large 33% performance loss from the overhead of these checks, so perhaps I should remove it? Since I originally brought up the idea and no one voiced objections, I'd need to see whether people want it or not.

@mppf
Member

mppf commented Aug 21, 2017

I don't have any objection to removing the "freeze" logic. Such logic complicates the API and is only really useful if it offers some performance advantage, in my opinion.

@LouisJenkinsCS
Member Author

Cool! I'll assume most other people feel the same way. As well, since the 'resource leakage on breaking out of iterators' issue is a confirmed bug, I can probably add a warning that the user cannot break out of serial iterators early without consequence.

mppf added a commit that referenced this issue Aug 24, 2017
[GSoC] Parallel-Safe Collections Module and Distributed Data Structures

[PR by @LouisJenkinsCS, reviewed by @e-kayrakli and @mppf; tested by @mppf]

# Collections

I present a parallel-safe distributed data structures framework, similar
to Java's 'Collection' framework. Original discussion can be seen
[here](#6812). Original
repository containing benchmarks can be found
[here](https://github.com/LouisJenkinsCS/Distributed-Data-Structures).
Chpldocs documentation can be seen
[here](https://louisjenkinscs.github.io/Distributed-Data-Structures/).

**Note:** This is the official PR for the GSoC program, but I do realize
it may seem a bit like a rough draft. Any feedback or constructive
criticism would be appreciated.

## Deque

Provides a strict ordering without sacrificing too much performance.
Supports insertion and removal from both ends, allowing FIFO, LIFO, and a
total ordering that is preserved across all nodes in a cluster, and
employs a wait-free round-robin approach to load distribution that
ensures fairness in memory, bandwidth, and computation.

**Disclaimer:** The deque provided, while scalable, relies heavily on
network atomics.  The benchmark results were produced using said network
atomic operations.

## Bag

With performance that scales both in the number of nodes in a cluster and
the number of cores per node, we offer a multiset implementation, called
a 'Bag', which allows storing and retrieving data in any arbitrary
order.  This type of data structure is ideal for work queues, as it
employs its own load balancing and offers unparalleled performance.

**Disclaimer:** A node can request a 'privatized' copy, which retrieves a
clone that is allocated on the requesting node, reducing excess
communication.  Usage of `getPrivatizedInstance()` is highly advised for
performance-critical sections.

### Performance

We compare our data structures to a naive synchronized list
implementation as that is all that is available in the Chapel standard
library.  In all cases, the data structures scale and outperform the
naive implementation.

#### Insert

Implementation | Performance over Naive (at 64 nodes)
-------------- | :-----------:
SynchronizedList | 1x
DistributedDeque | 63x
DistributedBag | 403x

![](https://raw.githubusercontent.com/LouisJenkinsCS/Distributed-Data-Structures/master/Results/Collections_Add.png)

#### Remove

Implementation | Performance over Naive (at 64 nodes)
-------------- | :-----------:
SynchronizedList | 1x
DistributedDeque | 123x
DistributedBag | 651x

![](https://raw.githubusercontent.com/LouisJenkinsCS/Distributed-Data-Structures/master/Results/Collections_Remove.png)

### Testing

Passes new test directory with:
  * quickstart
  * quickstart + gasnet
  * qthreads
  * qthreads + gasnet

### Related Work / Inspiration

#### Bag

##### Segmenting per Core & Multiple Passes for 'Best', 'Average', and 'Worst' Cases

This is based heavily on my own work, [Concurrent Map for
Go](https://louisjenkinscs.github.io/publications/A_Concurrent_Map_for_Go.pdf)
(although the name has changed since it was published at PACT 2017, the
current version is relevant enough). It borrows from the convoy-avoidance
algorithm, where we skip over any locked buckets. We do the same in the
Bag for locked segments, looking for ideal conditions first.
Segmentation is also directly related, as the map was likewise a
segmented, 'fixed-depth', tree-like structure.

#### Deque

Compared to when this was originally a `Queue`, the `Deque` has taken on
an entirely new design based on the old one, and so a lot of the older
material no longer applies.

##### Monotonically Increasing Atomic Counters for Wait-Free Ordering

This can be observed in the queue
[FFQ](http://se.inf.tu-dresden.de/pubs/papers/ffq2017.pdf), where enqueue
and dequeue operations find their positions based on said counters.
Compared to my own work, instead of having one-dimensional cells, each
cell maintains another ordered data structure.

##### Segmenting the Deque per Core

This borrowed from how the Bag works, which of course borrowed from my
publication.