
[GSoC] Parallel-Safe Collections Module and Distributed Data Structures #7062

Merged (16 commits into chapel-lang:master) on Aug 24, 2017

Conversation

@LouisJenkinsCS (Member) commented Aug 21, 2017

Collections

I present a parallel-safe distributed data structures framework, similar to Java's 'Collection' framework. Original discussion can be seen here. Original repository containing benchmarks can be found here. Chpldocs documentation can be seen here.

Note: This is the official PR for the GSoC program, but I do realize it may seem a bit like a rough draft. Any feedback or constructive criticism would be appreciated.

Deque

Provides a strict ordering without sacrificing too much performance. It supports insertion
and removal from both ends, allowing FIFO, LIFO, and a total ordering that is
preserved across all nodes in a cluster, and it employs a wait-free round-robin approach
to load distribution that ensures fairness in memory, bandwidth, and computation.

Disclaimer: The deque provided, while scalable, relies heavily on network atomics.
The benchmark results below were produced using said network atomic operations.
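
A rough usage sketch; the module name, constructor, and exact ordering semantics here are assumptions, while the add/remove signatures follow the code quoted later in this review:

use DistributedDeque;

var deque = new DistributedDeque(int);

for i in 1..4 do
  deque.add(i);                      // add returns a bool indicating success

var (found, elt) = deque.remove();   // remove returns a (success, element) tuple
if found then writeln(elt);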

Bag

With performance that scales both with the number of nodes in a cluster and the
number of cores per node, we offer a multiset implementation, called a 'Bag',
which allows storing and retrieving data in any arbitrary order.
This type of data structure is ideal for work queues, as it employs its own load
balancing and offers the best performance of the structures presented here.

Disclaimer: A node can request a 'privatized' copy, which retrieves a clone
that is allocated on the requesting node, reducing excess communication.
Usage of getPrivatizedInstance() is highly advised for performance-critical
sections.
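
A hedged usage sketch of that pattern (the module and constructor names are assumptions; getPrivatizedInstance(), add(), and getSize() appear in the code under review):

use DistributedBag;

var bag = new DistributedBag(int);

coforall loc in Locales do on loc {
  // Fetch the node-local clone once, outside the hot loop, to avoid
  // repeatedly accessing class fields over the network.
  var localBag = bag.getPrivatizedInstance();
  forall i in 1..1000 do
    localBag.add(i);
}

writeln(bag.getSize());   // 1000 * numLocales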

Performance

We compare our data structures to a naive synchronized list implementation
as that is all that is available in the Chapel standard library.
In all cases, the data structures scale and outperform the naive implementation.

Insert

Implementation      Performance over naive (at 64 nodes)
SynchronizedList    1x
DistributedDeque    63x
DistributedBag      403x

Remove

Implementation      Performance over naive (at 64 nodes)
SynchronizedList    1x
DistributedDeque    123x
DistributedBag      651x

Revisions

  • Remove 'freeze' logic
  • Engin's suggested changes
  • Michael's suggested changes

@LouisJenkinsCS (Member Author)

@e-kayrakli @mppf

Both GSoC mentors

@mppf (Member) commented Aug 21, 2017

Two things for the PR message:

  1. One place refers to MultiSet while the other refers to Bag
  2. These data structures are designed for parallel use cases. I don't think that's true of the Java analogues, but I might be wrong about that. Either way, I think it's important to put that information up-front near the top of the PR message.

@LouisJenkinsCS (Member Author)

Gating Issues: #7003, as it produces undefined behavior when LICM is enabled and the return value of Collection.remove() is captured in a variable declared outside of a loop.

#6542, as it produces a compiler segmentation fault when the return value is ignored/not captured at all. This means that Collection.remove() requires extremely specific usage to avoid runtime or compile-time errors.

@LouisJenkinsCS (Member Author)

Thanks @mppf I have done both. Should I rename the PR to 'Parallel-Safe Collections Module'?

@mppf (Member) commented Aug 21, 2017

@LouisJenkinsCS - yes I think that renaming is a good idea. Thanks!

@LouisJenkinsCS LouisJenkinsCS changed the title [GSoC] Collections Module [GSoC] Parallel-Safe Collections Module Aug 21, 2017
@e-kayrakli (Contributor)

Thanks Louis! I discussed with @mppf offline and will give this an initial review a little later today.

Meanwhile, I don't know whether you have signed a CLA yet. If not, maybe you can take a look at this and get it out of the way while waiting for reviews:

https://github.com/chapel-lang/chapel/tree/master/doc/rst/developer/contributorAgreements

@mppf (Member) commented Aug 21, 2017

We already have a contributor agreement for Louis.

@LouisJenkinsCS (Member Author)

I would also like to raise my other question about a potential revision, here, on removing the 'freeze' logic, as it's no longer needed as originally planned and comes at a performance penalty.

@LouisJenkinsCS LouisJenkinsCS changed the title [GSoC] Parallel-Safe Collections Module [GSoC] Parallel-Safe Collections Module and Distributed Data Structures Aug 21, 2017
@e-kayrakli (Contributor) left a comment

I finished a full review of the PR. My comments are mostly trivial changes, sprinkled with some questions for the other reviewer from the team.

Adds an element to this data structure.
*/
inline proc add(elt : eltType) : bool {
halt("'proc add(elts : eltType ... ?nElts) : bool' is not supported...");
Contributor

Error message is seemingly from an older version of add


**FIX:** Use the `--no-loop-invariant-code-motion` to disable LICM.
Otherwise, just make sure you always capture the return value inside of a loop
in a variable not declared outside of loop...
Contributor

I think linking to GitHub issues can be useful for some users who want to know more about the bug or state their concerns regarding the issue.

/*
Obtain the number of elements contained in this collection.
*/
inline proc size() : int {
Contributor

To bring the collections more in line with the existing List module, I think we can remove the parentheses and maybe add a synonym, length? Arguably, that module needs to be incorporated into this framework, but until then let's not diverge too much from it.

Member Author

Methods cannot be overloaded on parentheses alone, but having a synonym 'length' seems spot on, since 'length' can just call 'size()'.

Member

Right, but you could create, e.g., a proc Collection.size { return this.getSize(); }

Iterate over all elements in the data structure.

**BUG:** Compiler does not currently allow overloading standalone or leader/follower
iterators, and as such only serial iterators may be used with the base type.
Contributor

Similarly a link to github issue here

private const ADD_AVERAGE_CASE = 1;
private const REMOVE_BEST_CASE = 2;
private const REMOVE_AVERAGE_CASE = 3;
private const REMOVE_WORST_CASE = 4;
Contributor

All these consts look like they could be params. Also, types should be omitted unless there is an issue making them necessary (i.e., is it a problem if STATUS_* becomes an int with type inference?)
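
For reference, a sketch of the suggested form (compile-time params with inferred types):

private param ADD_AVERAGE_CASE = 1;
private param REMOVE_BEST_CASE = 2;
private param REMOVE_AVERAGE_CASE = 3;
private param REMOVE_WORST_CASE = 4;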

// gets added, as well as a barrier to prevent the head and tail from being cached.
var size : atomic int;

inline proc recycleNode() {
Contributor

I'd find createNode more appropriate, but that is more a matter of personal taste.

pragma "no doc"
inline proc getPrivatizedThis {
return chpl_getPrivatizedCopy(this.type, pid);
}
Contributor

Similarly privatization methods should be marked with TODO or FIXME or something

*/
proc pop() : (bool, eltType) {
return popBack();
}
Contributor

I think push/pop should use front instead of back, but I don't know if that makes any practical difference

Member Author

Since enqueue pushes to the back, I'd expect that if I did enqueue(); pop(); I would get the same thing I just added, hence I have both use the same end.

nodes[idx] = (size, tailIdx, node);
}
}
}
Contributor

Maybe the names FIFO/LIFO can be clearer: iterFIFO/iterLIFO or something, or even iter iterator(param order : orderKind), where orderKind is an enum or something.

Member Author

Yeah, I was thinking of having these(param order : orderKind = orderKind.NONE), but I wasn't sure. That way I'd have orderKind.NONE, orderKind.FIFO, orderKind.LIFO, etc. Also, since enums no longer cause a compiler internal error with deadCodeElimination (fixed in #6570), I have no issue with enums anymore.
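
A rough sketch of what a param-selected iteration order could look like, using a toy container (illustrative names, not the PR's actual code):

enum orderKind { FIFO, LIFO };

record toyDeque {
  var elems : [1..4] int;

  iter these(param order : orderKind = orderKind.FIFO) : int {
    if order == orderKind.LIFO {
      for i in 1..4 by -1 do yield elems[i];
    } else {
      for e in elems do yield e;
    }
  }
}

var d : toyDeque;
d.elems = [10, 20, 30, 40];
for x in d.these(orderKind.LIFO) do writeln(x);   // 40, 30, 20, 10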

@@ -0,0 +1,2 @@
-sisBounded --main-module=DequeParity
--main-module=DequeParity
Contributor

Is there a need for --main-module flag?

Member Author

Good question, that was from before, but I do need it to run both with and without -sisBounded set, and I wasn't sure if blank lines get counted.

Contributor

You can do:

-sisBounded=true
-sisBounded=false

}

*/
inline proc remove() : (bool, eltType) {
Member

I'm used to seeing remove calls refer to a specific element - where the pattern here is more what I'd probably call pop. Is there a particular reason you chose add/remove vs. push/pop? Any precedent?

Of course it's the case that we don't want to imply that all of these Collections meet some particular definition of Stack ordering.

Member

No-inline here too?

Member Author

Right, I didn't want to imply any particular ordering, and add and remove seemed descriptive enough, as remove in this case means "remove any item".

Member Author

  • ...

/*
Adds an element to this data structure.
*/
inline proc add(elt : eltType) : bool {
Member

I wouldn't expect virtual methods that just halt to be marked inline.

Member Author

The reasoning behind declaring it inline is so that the user will know the precise location at which they invoked the unsupported method. Normally, if it is not inlined, the error shows a line number in Collections.chpl, which isn't entirely helpful.

Member Author

  • ...

/*
Determine whether an element exists in this collection.
*/
inline proc contains(elt : eltType) : bool {
Member

no-inline

Member Author

  • ...

/*
Obtain the number of elements contained in this collection.
*/
inline proc getSize() : int {
Member

no inline here

Member Author

  • ...

As such, this cannot be used where `ref` intents cannot be used, such as ``forall``
and ``coforall`` loops.
*/
proc +=(ref c : Collection(?eltType), elt : eltType) {
Member

This one seems more reasonable to inline, if you want.

Member Author

  • ...

@@ -0,0 +1,116 @@
/*
A 'Collection' is a data structure, a container for elements that provide support
for insert, lookup, remove, and iteration operations.
Member

Based on the number of open issues / compiler bugs here, I'd like to recommend that this module go into modules/packages. It is used in the same way as it would be if it went in to standard/. My expectation would be that once some of these user-facing issues that limit its usefulness are addressed, it can move to standard/.

Member

(and if it wasn't clear, we'd want to put all of your new modules in either standard/ or packages/ together)

Member Author

I have no objections to that. I placed it in standard/ because the List module was also in standard/.

or, for example a near empty bag, we fall into the 'average-case', etc. Examples
of 'best-case' scenarios for a removal would be when a segment is unlocked
and contains some elements we can drain, and the 'average-case' would be to find
any segment, unlocked or not, that contains elements we can drain, and so on.
Member

Have you tried to use chpldoc to render the documentation for your new modules? (While that's a good idea, it's not necessarily required). In this case in particular, there's usually a comment at the top of the module that offers a high level introduction. If that's what this comment is, it's too detailed. I'm not objecting to this comment's existence - I'm just saying that we'd need an even more basic comment (What is this and why would I use it?) before any comment that goes into as much detail about performance as this one.

Member Author

The documentation (using chpldoc) can be seen here. I normally left the high-level description for the main structure itself.

Member

Thanks for the link - I had forgotten that you'd already generated that.

Right, but it's good for such documentation to start with an introductory paragraph. For this reason, we typically explicitly declare the module (e.g. module DistributedBag { }) and attach comments to that.

Member Author

That explains why it's a huge deal for the classes not to share the same name as the module itself. I'll make that change.

stealing work, and better locality of elements among segments. As well, to
achieve true parallelism, usage of a privatized instance is a requirement, as
it avoids the overhead of remotely accessing class fields, bounding performance
on communication.
Member

With the array implementation (say), the privatized instances are handled transparently by the module code. I think that doing so in your case would amount to:

  • creating a record wrapping these types
  • storing the privatized ID as well as the class pointer in the record
  • creating forwarding methods from the record to the class instance / privatized class

I can see that you've been making use of the class system in Chapel, so that Bag and Deque are subclasses of Collection. I can think of several reasons why you might have wanted to do that, but my personal expectation would be that Collection would be an "interface" (which does not exist yet in Chapel) and that Bag and Deque are implementations of that interface. I bring this up because if you do the record-wrapping as I describe above, one would no longer be able to store a mixed-type array of Collections that are either Bag or Deque. But I don't think doing so is actually useful, in this particular case.

So, if you were to follow the array strategy:

  • Collection, DistributedBag, DistributedDeque would become CollectionImpl, DistributedBagImpl, DistributedDequeImpl
  • The new records wrapping these would be called DistributedBag and DistributedDeque (or something; you could consider using lower case to name them, for example)
  • Users would no longer have to manually delete or manage privatized instances

Lastly, I'm noting that this module creates a class with the same name as the module. We need to find a way not to do that - Chapel is a single-namespace language - so one name needs to be able to identify both the module and the class - otherwise we get some weird compilation problems.
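
For illustration only, a bare-bones sketch of this record-wrapping pattern (hypothetical toy names, old-style constructor/destructor syntax of the time; the real array implementation is more involved and also handles privatization):

class DistributedBagImpl {
  type eltType;
  var count : int;

  proc add(elt : eltType) : bool {
    count += 1;        // stand-in for the real insertion logic
    return true;
  }

  proc getSize() : int return count;
}

record DistributedBag {
  type eltType;
  var instance : DistributedBagImpl(eltType);

  proc DistributedBag(type eltType) {
    instance = new DistributedBagImpl(eltType);
  }

  proc ~DistributedBag() {
    delete instance;   // the user never deletes or manages the Impl directly
  }

  // Hand-written forwarding methods from the record to the class instance.
  proc add(elt : eltType) : bool return instance.add(elt);
  proc getSize() : int return instance.getSize();
}

var bag = new DistributedBag(int);
bag.add(1);
writeln(bag.getSize());   // 1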

Member Author

While I can agree that record wrapping is a very good idea, it wouldn't allow me to have any kind of inheritance (issue #6851). Not only would you no longer be able to store mixed-type arrays, you wouldn't be able to store any kind of Collection as you can no longer inherit, and even if you could you wouldn't be able to use the Collection in an opaque way (since before removal, they would truncate all methods and fields to the base type).

Member Author

Furthermore, there is an issue with using records in a forall or coforall loop. Example below

record bag {
	var someData : atomic int;
	proc add(elt) { someData.add(elt); } 
}

var b : bag;
forall i in 1 .. 100 {
	b.add(i);
}

writeln(b);

TIO (Online Compiler)

Member

You can use a Collection in an opaque way with generic programming. I'm not sure that supporting mixed-type arrays of Collection is worth the trade-off of requiring users to call privatization functions manually.

I'm having trouble interpreting this:

you wouldn't be able to store any kind of Collection as you can no longer inherit
you wouldn't be able to use the Collection in an opaque way (since before removal, they would truncate all methods and fields to the base type).

What does inheriting have to do with storing a Collection? What do you mean by removal?
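
For illustration, the generic-programming route mentioned above could look like this sketch (toy type, hypothetical; not taken from the thread):

class toyBag {
  var total : int;
  proc add(elt : int) { total += elt; }
}

// 'c' has no declared type, so the proc is generic: any type providing a
// matching 'add' method is accepted, with no Collection base class involved.
proc addSomeData(c) {
  c.add(1);
  c.add(2);
  c.add(3);
}

var b = new toyBag();
addSomeData(b);
writeln(b.total);   // 6
delete b;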

Member Author

I mean, you wouldn't be able to do this...

proc addSomeData(c : Collection) {
  c.add(1);
  c.add(2);
  c.add(3);
}

var bag = new DistributedBag(int);
addSomeData(bag);

You would not be able to use it that way at all. You can't say "Give me any collection so I can add elements to it", which I believed was one of the main key points of using it. Unless you're saying that we have it like this...

proc addSomeData(c : Collection(?implType));

Where Collection is generic on the implementation? Can you provide an example of how you envision it?

Member

Also, it makes more sense to me to use OwnedObject either directly or as inspiration. Generally speaking, we've been moving away from reference counting things in preference to requiring that the user not delete them before they go out of scope. I think if that strategy is good enough for language-supported arrays, it's good enough for Bag etc.

Member

Lastly, even if you (say) replaced class DistributedBag with record DistributedBag, that would probably cause problems with the privatization code. In order to privatize you probably do need record DistributedBag and class DistributedBagImpl.

Member Author

Just to be sure, this is an affirmative on committing to record wrapping. I'll begin working on these changes in a new branch immediately (stop me if I assumed incorrectly).

Also, looking at the OwnedObject module, doesn't it seem more than a bit... restrictive? Looking at the implementation, all it does is disallow the user from using the object in certain contexts via pragmas. SharedObject, reference counting or not, allows the user to make use of all the data structures as if they were classes, which I strongly believe is the way to go about this. If privatization support improves for classes, then this will not be an issue in the future, and since this becomes a module/package it doesn't have to strictly follow the 'standard'.

Member Author (@LouisJenkinsCS, Aug 23, 2017)

@mppf

I'm thinking that with record wrapping, I might be able to do this without requiring explicit privatization (perhaps this might be best?). What if I just create an array of DistributedBagImpl in the record and have each Locale pick which to use based on that? The only issue is that this type should almost never be used as a wide ref or else it incurs the cost we're trying to avoid (hence the need for SharedObject). Speaking of SharedObject, I think I can make it work by just making a SharedObject hold on to some helper class that has the sole purpose of destroying the DistributedBag fields when it gets destroyed.

Furthermore, with all this indirection, I can probably proceed this way, and it probably is best in general. I still have concerns about how to ensure that, as a record, it is never used by wide reference (is there a pragma for that?).

As you work towards removing the inheritance, just make Collection exist for documentation purposes

Would it be possible to make it into a CHIP instead? However, if we were to keep the inheritance with the implementations to make Collection relevant, the issue would next come down to forwarding. If I want to forward only on the privatized instance, is this currently allowed? Etc.

Member

I'd lean towards mimicking the pattern of record wrapping done by arrays. That would not necessarily mean using OwnedObject. In particular, the main point there is that when the record is destroyed, the Impl instance is deleted, and there's no provision for keeping the instance alive longer if there are record copies of it. In this way it is similar to using a bare class.

It's probably not currently a good idea to try to use OwnedObject for this purpose, as OwnedObject has some issues (to do with lvalue checking and compiler errors at least) that you could easily avoid just by writing your own constructor / destructor.

If you created an array of DistributedBagImpl, you'd still need privatization / replication in that array to avoid communication when getting the local instance ID. I think it's reasonable to use the privatization mechanism, but having a Replicated array field (say) is a pretty reasonable alternative.

About not wanting to use the record as a wide ref - Ben H. recently committed a mechanism that can be used to serialize/deserialize instances. Obviously it might be more than you're able to do in your last week, but you could try to use that to control what happens when your data structure is used on the other side of an on statement.

I'm not sure what you mean by forwarding... "If I want to forward only on the privatized instance, is this currently allowed?" I think that's what the array record does, depending on exactly what you mean. Might be best to find me on irc to ask about these.

there are larger numbers of elements. The better the locality, the better raw
performance and easier it is to redistribute work.
*/
config param distributedBagInitialBlockSize = 1024;
Member

I'm wondering - is there a particular reason these need to be compile time constants? Certainly I can think of reasons we might prefer to hide them from the -help output, but I'm having trouble imagining that such configuration settings couldn't be set at launch time? (I.e. config const).
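
I.e., a sketch of the launch-time alternative:

// Settable at program launch, e.g.: ./myProgram --distributedBagInitialBlockSize=2048
config const distributedBagInitialBlockSize = 1024;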

Member Author

I remember originally it had to do with using tuples (which is no longer the case); I suppose that slipped my mind. I'll make that change.

Member Author

  • ...

we allocate to increase raw parallelism, and the larger the workload the better
locality (see `distributedBagInitialBlockSize`). This data structure is unordered and employs
its own work-stealing algorithm, and provides a means to obtain a privatized instance of
the data structure for maximized performance.
Member

This paragraph is a pretty good introduction. For full-on user documentation you'd also need to include an example of using it.

*/
proc getSize() : int {
var sz : atomic int;
forall loc in targetLocales do on loc {
Member

This should be a coforall

Member Author

  • ...

*/
proc contains(elt : eltType) : bool {
var foundElt : atomic bool;
forall elem in getPrivatizedThis {
Member

I haven't dug into the these call below, but I would've expected this one to be written with something like:

coforall loc in targetLocales do on loc {
  forall elem in getPrivatizedThis.localElements() {
    if elem == elt {
      foundElt.write(true);
    }
  }
}

Member Author

That does sound like a good idea, having each Bag implement some parallel and serial iterator to reduce the amount of redundant code.

distribution across majority of nodes, but this should be called when you have
a severe imbalance, or when you have a smaller number of elements to balance.
Furthermore, while this operation is parallel-safe, it should be called in a
sequential context.
Member

I think - but am not sure - that what you mean by "it should be called in a sequential context" is that it should only be called from one task at a time - since it redistributes the entire data structure and uses on statements etc to handle data elsewhere. I don't think "should be called in a sequential context" means that for me. I'd probably just say "Furthermore, while this operation is parallel-safe, it's only really useful to call it from one task at a time, since that call will update the entire distributed data structure".


// Allocate buffer, which holds the 'excess' elements for redistribution.
// Then fill it.
var buffer = c_malloc(eltType, excess);
Member

Why use c_malloc instead of just a normal Chapel array?

Member Author

Currently I have a helper method that takes a c_ptr, a size, and a locale and writes into it, and one that does the opposite (reads from a c_ptr). I chose c_ptr over Chapel arrays as they are more lightweight, but I didn't know about the bulk transfer support you mentioned before.

var give = average - nElems;
var arr : [1..give] eltType;
on bufferOffset {
var tmpBuffer = buffer;
Member

If buffer were also a Chapel array, you'd write this transfer in a way that the Chapel compiler can optimize into a small number of PUTs by using array slices. E.g.

arr[1..give] = buffer[bufferOffset..bufferOffset+give-1];

Member Author

In this situation, I'll give it to you: that does look like it'd save me a ton of logic and triple-checking. I'll see about using that, and in particular, using slices like that.

if average > nElems {
var give = average - nElems;
var arr : [1..give] eltType;
on bufferOffset {
Member

I'm pretty sure you meant on Locales[bufferOffset] or something like that - as it is, on bufferOffset will probably be a no-op since bufferOffset is running on the current locale.

Member Author

We call on bufferOffset because bufferOffset is located on the node that holds the buffer, which is the node that called balance. On line 341, you see that we do a for loc in Locales do on loc. We have to jump to the locale owning the buffer because it's a c_ptr.

}

// Phase 3: Release all locks from first node and segment to last node and segment.
for loc in localThis.targetLocales do on loc {
Member

Doesn't this need to happen in the reverse of the acquire order to prevent some deadlock case? Or am I just being paranoid?

Member Author

No, we hold the locks on all segments, so we don't need to bother checking whether they are locked or not; we just unlock them. We start by locking the first node's segment, then continue on to the last, so we unlock in the same way. It should also be noted that we use atomic spinlocks rather than sync variables.

for loc in targetLocales {
for segmentIdx in 0..#here.maxTaskPar {
// The size of the snapshot is only known once we have the lock.
// BUG: c_ptr does not seem to work here and causes a segmentation fault at runtime.
Member

well yeah... c_ptr is always local and doesn't support use from another locale. _ddata would do that, but it's almost certainly better to just use a Chapel array in the first place.


inline proc push(elt : eltType) {
if elems == nil {
halt("DistributedBag Internal Error: 'elems' is nil");
Member

We have assert too, which might be more succinct for errors like this.

Member Author

The issue I've had with assert is that for some reason I can't seem to find the source of an assertion failure; normally it shows a line number that points into the Assert module. Furthermore, it doesn't seem to allow me to provide an error message for a failing assertion.

Member

Fair enough, it's not worth too much trouble at this point. However we could certainly add error messages for asserts. And for the line numbers, try CHPL_UNWIND=libunwind and see where that gets you. Or compile without CHPL_DEVELOPER=1.

@mppf (Member) commented Aug 23, 2017

I ran some testing in the standard (local) configuration and get a failure for CollectionNQueens:

CollectionNQueens.chpl:168: error: halt reached - Spun: 1000000

@LouisJenkinsCS (Member Author)

Did you change the number of queens for NQueens? Also, which data structure? It's possible to naturally reach 1M spins (it was an arbitrary upper limit).

@LouisJenkinsCS (Member Author)

After running it locally and inspecting it, the globalHead, globalTail, and queueSize are correct, but it seems it is dropping elements somewhere. I'll need to look into that.

@LouisJenkinsCS (Member Author)

After testing, it seems that this issue only arises on --local or non-multilocale setups, and only on NQueens. Given that the other tests pass with flying colors on rerunning, combined with the strange fact that both the tail and head positions seem to be fine (and the fact that twice now I have had runtime undefined behavior caused by compiler optimizations), I'll need to look at the generated C code.

@mppf (Member) commented Aug 23, 2017

Passed full local testing except for CollectionNQueens.

@LouisJenkinsCS (Member Author)

@e-kayrakli

It turns out the issue was much simpler than expected. When I was recycling the node, I forgot to sanitize it, so I was reusing the old values as well as the old headIdx and tailIdx, meaning it would be reusing data from the old operations. Now that I sanitize it (and have rerun the benchmark 100 times on swan), I'm confident that it works now. Phew, had me nervous.

@mppf I'll push out a fix soon so you can test it tomorrow.

@LouisJenkinsCS LouisJenkinsCS force-pushed the GSOC-Collections-Module branch from ba83ff5 to 1f0fff2 Compare August 24, 2017 17:30
@LouisJenkinsCS (Member Author)

@mppf

Ready to merge. I had to rebase and force-push my changes so that I could run with the new Barriers change, so the old reviews are outdated, but I'll manually look through them when I make a 2nd pass.

@mppf (Member) commented Aug 24, 2017

Passes new test directory with:

  • quickstart
  • quickstart + gasnet
  • qthreads
  • qthreads + gasnet

@mppf mppf merged commit 7889137 into chapel-lang:master Aug 24, 2017
@LouisJenkinsCS LouisJenkinsCS deleted the GSOC-Collections-Module branch August 24, 2017 19:34
/*
Pass 2: Average Case

Find any segment (locked or unlocked) and make an attempt to acquire it.
Member

The comment says "Find any segment" but the average case code only looks in 1 segment. Is that intentional?

Member Author

Yep, that's outdated documentation. Originally we checked the status of the segment before trying to add to it (back when I had work stealing keep the lock on the segment; it would just look for an adjacent segment that wasn't locked). I can make a PR correcting it if needed.

Member Author

Btw, has there been a reported issue so far with the collections?

Member

I haven't seen one; I'll let you know if I do. I think we're still waiting for somebody to battle-test them.

You can fix the comment if you like, no big deal.

@LouisJenkinsCS (Member Author)

@mppf I've got some potentially good news! I've proposed a change to my independent study advisor to redirect it towards research on Chapel! (As I said, potentially; it's not a given yet.) If it's approved, I'll be doing a lot more work on either the Collections framework or the GlobalDescriptorTable/GlobalAtomicObject as my new research topic! (Also, I got a small undergraduate research grant, woo!)

@LouisJenkinsCS (Member Author)

Never mind, I have to commit to my current topic. I wonder if I could talk to someone on the team who is proficient enough with the LLVM backend to help me out a bit.
