
Concurrency #17

Open
shelby3 opened this issue Sep 27, 2016 · 750 comments


@shelby3

shelby3 commented Sep 27, 2016

I have a long comment at the Rust forum (some of the inspiration came from @keean linking me to Sean Parent's video presentation) and another one that I posted to another forum, which delve into and compare the various options for concurrency. Note the negative aspects of stackless coroutines enumerated in the second linked comment post.

Also note that my code that employs an ES6 generator to implement an emulation of async / await has never been tested, but I did update and correct it recently.
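For illustration, a minimal sketch of this kind of generator-driven async / await emulation (the `run` driver here is a hypothetical stand-in, not the actual untested code referenced above):

```javascript
// A minimal driver that steps an ES6 generator which yields Promises,
// resuming it with each resolved value, thereby emulating async / await.
// (Error propagation back into the generator is elided for brevity.)
function run(genFn) {
  const gen = genFn();
  return new Promise(resolve => {
    function step(result) {
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(value => step(gen.next(value)));
    }
    step(gen.next());
  });
}

// Usage: reads sequentially like synchronous code, runs asynchronously.
run(function* () {
  const a = yield Promise.resolve(1);
  const b = yield Promise.resolve(2);
  return a + b;
}).then(sum => console.log(sum)); // → 3
```

The generator suspends at each yield, and the driver resumes it when the yielded Promise settles, which is essentially what async / await desugars to.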

@spion wrote a blog on why ES6 generators offer more flexibility than async / await.

Continuation-passing style (CPS) can be employed for stackless coroutines, which lowers overhead (PureScript, for example, supports this with a continuation monad), but they are very restrictive and don't interoperate well with the imperative programming that we do with JavaScript.

Concurrency is hard and we are not aiming for maximum performance of context switches, thus I think stackless coroutines are a pita and not worth it. For someone who needs that control, let them use another language (e.g. PureScript or C++) and interoperate through our FFI or JavaScript.

I need generators to support custom loaders, my emulation of async / await, and in general to interoperate well with the JavaScript ecosystem such as Node.js.

Thus I think we should support ES6 generators initially. And we can look into coroutines later.

Thoughts?

@keean

keean commented Sep 27, 2016

I think generators and co-routines are the way to go. Everything should be eager unless specifically lazy. That one was easy :-)

@shelby3

shelby3 commented Sep 27, 2016

So are you okay with not doing anything in the language for stackless co-routines? The stackless form can be implemented with a continuation monad library combined with declared purity?

Edit: I have since realized that monads are employed to enable data race safe sharing of mutable state emulated by copying immutable data, and have nothing to do with the choice of the underlying mechanism of usermode (i.e. not kernel context-switching) cooperative multitasking. Monads enable controlling stateful effects within a pure, referentially transparent, immutable paradigm.

@keean

keean commented Sep 27, 2016

Stackless co-routines are cool. It's basically the same as JavaScript's event model. If we are compiling to JavaScript we have to cope with this. If we write an infinite loop, JavaScript requires returning to the event loop to process IO.

Is it best to give programmers a model where they write a linear synchronous program that is converted to the event model underneath and runs on a stackless co-routine runtime (transpiling to 'Go' would be interesting), or is it better to give them an asynchronous event-based model?

@shelby3

shelby3 commented Sep 27, 2016

@keean wrote:

Stackless co-routines are cool.

But they have compositional (and perhaps other) trade-offs. They seem to force either low-level tsuris, or monadic tsuris. Remember in general that monads don't compose (Applicatives do compose).

Stackless co-routines are cool. It's basically the same as JavaScript's event model.

I believe that may be incorrect. Afaik JavaScript uses stackful event loops (as in multiple event loops running concurrently, one for each JavaScript instance). Afaik JavaScript is not using a jump in CPS style; rather the script must exit, or control is passed to the event loop with generators and returning a Promise.

Edit: I have since realized that JavaScript’s callback model unwinds the stack, storing the callback context on the heap, to return to the event queue scheduler. So there is only ever one stack in use.

If we are compiling to JavaScript we have to cope with this. If we write an infinite loop, JavaScript requires returning to the event loop to process IO.

This is accomplished with generators and Promises (or setTimeout <--- notice camel case), which afaik are stackful.

Is it best to give programmers a model where they write a linear synchronous program, and its is converted to the event model underneath and runs on a stackless co-routine runtime (transpiling to 'Go' would be interesting), or is it better to give them an asynchronous event based model?

Afair, you posed that thought to me before on the Rust forum. Can you find my response? I find it arduous to locate details of our past discussions on Rust's forum.

Generally I have disliked the design choices of almost everything about Go's design and I think I had already explained why I didn't like Go's concurrency model, but I forget by now.

@keean

keean commented Sep 27, 2016

What do you think about the actor model?

@shelby3

shelby3 commented Sep 27, 2016

I remember you explained that some aspects of Go, and especially its concurrency model, are applicable to some use cases. And I recall I didn't (at that time) find those use cases compelling for the programming I need to do right now.

I am hoping (pie-in-the-sky right now while trying to resolve my health ailment) to build a mobile app platform (headed towards abstracting away the operating system to a large degree); also I need to do a lot of coding in JavaScript for client and server, reusing the same code base on both (i.e. Node.js), so I need high-level abstractions, modularity, extensibility, and composition. If I were doing systems programming, I'd have different priorities.

When I need to drop down to systems-level tsuris, I'll probably be hoping for something better than C++, but it seems Rust may have blown that chance with the mandatory lifetimes (or maybe that is exactly the feature needed, if people will use it). I might have a different perspective on Go for that use case. We’ll see.

As for Actor model, that is more than I want to tackle today. I'll come back to it.

@SimonMeskens

I'm a big fan of the actor model. Just as an aside.

Anyway, I think CPS with tail call optimization basically gives you the best possibilities, combined with syntax that compiles into a stack machine (ES6 generators work like this and async/await is just a particular generator).

I really really love JavaScript's concurrency model with setTimeout. Basically, JavaScript doesn't use coroutines; it uses a polling-based event loop. A game loop in JavaScript is written like this:

function loop() {
  console.log("ping!");
  setTimeout(loop, 0);
}

loop();

This means: run loop, then yield to the event loop and ask it to rerun loop as soon as possible. During the execution of the loop, no other code can run (not even UI events; there are no interrupt events in JavaScript). So when you write an infinite loop, you have to write a recursive loop that yields to the event loop with setTimeout instead. Notice that giving it a timing of 0 basically means an infinite loop (0 means loop as fast as possible), but after each iteration, it allows the UI logic to update.

I personally think this is THE best system I've ever seen in a language. It makes sure that you get concurrency, without state issues. For lots of concurrency use cases, you don't actually need threading and JavaScript's model allows you to do it very elegantly.

So yeah, sticking to that same model seems logical to me.

@keean

keean commented Sep 27, 2016

We could have a first class actor model:

actor Queue :
    list : Nil
    add : (activity) =>
        list.push(activity)
    remove : (activity) =>
    // etc

The idea is the actor can have state but all the 'methods' must be pure, and we model calls as asynchronous message passing (which can be optimised to a normal function call).
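A rough sketch of this idea in JavaScript (the Actor class and handler names here are illustrative, not a settled design, assuming asynchronous dispatch via the microtask queue):

```javascript
// A sketch of a first-class actor: the actor owns its state, and 'method'
// calls become asynchronous messages dispatched on the event loop.
class Actor {
  constructor(handlers) {
    this.handlers = handlers;
    this.state = {};
  }
  send(msg, payload) {
    // Asynchronous message passing: defer handling to the microtask queue,
    // preserving message order for a single actor.
    Promise.resolve().then(() => this.handlers[msg](this.state, payload));
  }
}

const queue = new Actor({
  add: (state, activity) => {
    (state.list = state.list || []).push(activity);
  },
  remove: (state, activity) => {
    state.list = (state.list || []).filter(a => a !== activity);
  }
});

queue.send("add", "task1"); // handled asynchronously, in message order
```

Because each message is handled to completion before the next, the actor's internal state is never accessed concurrently, which is the data race safety property the actor model provides.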

@SimonMeskens

I'm not sure if I agree with syntax, but yeah, first class actors would be extremely interesting to have. I think actors also combine very elegantly with generators and iterators for a VERY powerful concurrency model.

Combined with a yield to event loop syntax, this would make ZenScript close to the most powerful concurrent language on the market.

I just wonder if we can somehow marry records and actors under the same system, so all actors are records or all records are actors. I'd also like to get delegation into that system. I'm not very familiar yet with the current ideas on records, so I'll have to read up on that and come back to you.

@keean

keean commented Sep 28, 2016

Records are data, so they would be the messages sent between actors, or part of the internal state of actors, etc. So actors are not records; actors receive and respond to messages (they act), and records are just plain data (although in theory a message could contain an actor address, or even a serialised actor). So we end up with: actors can contain records, and records can contain actors, but actors are not records.

@SimonMeskens

I'd argue you can view a record as an actor that responds to queries (messages) for data in a specific format, but I get your point. We can always look into this issue later, it's not very critical. It'd also devolve into a discussion about non-deterministic arbiters and such, which is also something I don't want to get into.

In summary: first-class actors that are encouraged and influence library design = good. I envision libraries that offer lots of functionality through monadic abstractions over actors in a way that makes it look like any other language and I smile 😄

@shelby3

shelby3 commented Dec 7, 2016

@SimonMeskens wrote:

During the execution of the loop, no other code can run (not even UI events, there's no interrupt events in JavaScript). So when you write an infinite loop, you have to write a recursive loop that yields to event loop with setTimeout instead […]

I personally think this is THE best system I've ever seen in a language. It makes sure that you get concurrency, without state issues.

There’s still the potential for data race unsafety for the mutable data shared between your loop and any other event handlers that run every time your loop does a setTimeout and returns to the runtime’s event dispatching scheduler.

And the callback you provide to setTimeout has to save the state and restart location of your loop’s progress. It’s a lot of cruft compared to Go’s green threads (aka goroutines), which enable what appears to be sequential synchronous code.


Regarding my comment post about Actors (← click for link to my definition of unbounded nondeterminism), I found the following interesting:

https://youtu.be/nXYGPYnCokE?t=541

He makes a very interesting point that aspects of models which do not preserve bijective ("bidirectional") structural equivalence, are thus introducing the aspect of thermodynamics that processes are irreversible. What he really should say is such a model doesn't require a total order and allows partial orders, which is entirely necessary for modeling asynchrony. A total order is bounded, whereas the set of partial orders is unbounded.

https://youtu.be/nXYGPYnCokE?t=1460

The statement that "forgetting is the essence of thermodynamics consequences" is from the perspective of any relative observer inside the system, but that doesn't mean the information no longer exists from the perspective of a total observer outside the system. The existence of a total observer can't be required in the system without bounded nondeterminism. So unbounded nondeterminism requires forsaking omniscience. As I had mentioned in the past, universal omniscience requires a non-quantifiable speed-of-light (so that any observer can be total in real-time), which would collapse the light cones of special relativity, rendering the past and future indistinguishable, and nothing distinguishable could exist.

@shelby3

shelby3 commented Dec 8, 2016

More on comparing Actor model to π and rho calculi:

https://youtu.be/oU4piSiYkE8?t=488

The Actor model doesn't have an explicit algebra of channels (channels apparently require additional overhead because they expose a race condition of the concurrent processes which can access the channels), although channels can be created in the Actor model by creating a new child Actor for each channel, where the "child" relationship is maintained in the state internal to the involved actors, in that the child actor forwards all received messages to the parent actor. Thus apparently the π and rho calculi algebraically model this particular configuration of the Actor model. So the distinction is what can be analyzed algebraically in the model and not an enhancement of computational power of the model. An analogy is I guess the NAND or NOR gates from which all digital logic can be constructed, yet we don't program in such a low-level model. So the Actor model is fundamental but more low-level than the π and rho calculi.

@shelby3

shelby3 commented Feb 19, 2017

Please check my logic, facts, and analysis in this comment post.

Go's green threads (much more lightweight than OS threads) are both more and less performant than JavaScript's callbacks (i.e. Promises) depending on the usage scenario.

For example, when a stack of nested functions is waiting on the result of the innermost function, Go's very low cost context switch which retains the stack is more performant than unwinding the stack and essentially copying it into a nesting of Promises for accomplishing returning from each nested function to the top-level JavaScript event loop.

Vice versa, when parallelizing independent code paths¹ (irrespective to scheduling algorithm) to make multiple blocking operations concurrent, Go needs a goroutine and thus an allocated stack for each concurrent operation; whereas, JavaScript only needs to allocate a Promise for each operation.

An example of this latter case is invoking map over a range (e.g. of an array) with a blocking function as the input, such that we want the invocation of the blocking function for each element of the range to run concurrently (but not necessarily executing in parallel simultaneously, depending on whether hardware thread scheduling is employed). So for JavaScript, the code would invoke the blocking function on each element, saving the returned Promises in an array and assigning continuation code to the then method to save the result for the output of the parallelized map operation; whereas, the Go code would spawn a goroutine with the continuation code for each invocation. The continuation code of both would decrement a counter to track when the parallelized operation was complete. The JavaScript completion code would fire the callback in the Promise returned from the parallelized operation; whereas, for Go the goroutine owning the parallelized operation would sleep awaiting the 0 value of the counter. When more than one hardware thread is employed by the parallelized instances, then the counter decrement operation must be protected by a mutex (which could, alternatively to shared write access, be effectively achieved by accumulating all decrements as messages to a single thread which performs the decrement, which also requires critical section synchronization for updating the message/channel queue).
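The Promise-based side of this comparison can be sketched as follows (`parallelMap` is a hypothetical name; error handling and the empty-range case are elided):

```javascript
// Sketch of the Promise-based parallel map described above: invoke the
// blocking (Promise-returning) function on every element, track completion
// with a pending counter, and resolve when the counter reaches zero.
function parallelMap(items, blockingFn) {
  return new Promise(resolve => {
    const results = new Array(items.length);
    let pending = items.length; // the completion counter described above
    items.forEach((item, i) => {
      blockingFn(item).then(value => {
        results[i] = value; // the continuation saves each result
        if (--pending === 0) resolve(results); // fire the completion callback
      });
    });
  });
}

// Usage: the three operations run concurrently, not sequentially.
parallelMap([1, 2, 3], n => Promise.resolve(n * 2))
  .then(out => console.log(out)); // → [ 2, 4, 6 ]
```

Note that in single-threaded JavaScript the counter decrement needs no mutex, since the continuations never run simultaneously; that is the point of the hardware-thread caveat above.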

Hardware thread scheduling with JavaScript is afaik currently non-portable (e.g. no Web Workers on Node.js) and less flexible (e.g. shared objects are forced to be thread safe by copying or borrowing[Edit: there’s SharedArrayBuffer but as such it’s of limited functionality because SharedArrayBuffer doesn’t integrate with GC, Object, exceptions, nor yield generators]). Web Workers implementations are not forced to implement M:N green threading. So optimization of hardware utilization is less flexible on JavaScript. I conjecture that it is hypothetically plausible to combine both Promises and green threading in some optimized way, but that is not currently possible in JavaScript.


¹ Rob Pike states an incorrect definition of concurrency in the general case, because in general there is no way to prove that nondeterministically composed code paths are independent, since that would require a total order, which is the antithesis of unbounded nondeterminism, a.k.a. the indeterminism case (where the unbounded entropy of the universe is intertwined in the state machine of the program), i.e. it can't be proven that the state interactions in concurrency are faultless or fault tolerant.

Sean Parent provides a different definition. John Harrop’s definitions of concurrency and parallelism seem to jibe with mine.

And Pike's error is also a weakness of Go's CSP (communicating sequential processes, i.e. channels) in that without bounded nondeterminism it is impossible to guarantee the channel communication won't deadlock, as I wrote before:

as contrasted with Go's CSP that attempts to more tightly synchronize and bounded determinism

Even though concurrency in general can potentially deadlock on any dependent state:

An Actor employs futures (aka Promises) on result values to create a decentralized stack which is claimed to prevent deadlocks on mutual recursion but clearly it won't prevent deadlocks on locking a resource so resources must not be locked while waiting for response to a message. Not to be confused with asynchronous sending of messages, which means not blocking waiting for receiver to acknowledge receipt, which is one of the differences from Go's CSP.

@shelby3

shelby3 commented Feb 21, 2017

Go is correct that the first use case example from my prior comment can always be handled by the language and appear to the programmer at the source code level to be a synchronous function call. That assumes the programmer intends to not execute any of his source code concurrently while waiting for the result of the said blocking function. The programmer could convey this intent by assigning or casting the result value to the value type of the Promise for languages employing Promises, and by not passing[waiting on?] a channel to the blocking function in a language which employs channels such as Go. In that case, the compiler could infer that it should arrange the necessary concurrency (implemented either via event loop model or green threads).

However, the only reason the programmer needs access to Promises or channels for this usage scenario is to manually encode parallelism inherent in the algorithm (i.e. independent code paths), which the compiler could in theory infer if it is informed as to which functions are pure and which objects are read-only, borrowed, etc. Given this latter information, the compiler could even infer the encoding of the inherent parallelism in the second use case example from my prior comment. Even the (presumably recursive) algorithm of the loccount tree traversal parallelism in Eric Raymond's Go blog could hypothetically be inferred by the compiler.

So I come back to my interpretation of the generative essence of Sean Parent's thesis, which is that the more completely and accurately we can express our algorithms, then less fragile optimization kludges will creep into our code. In other words, the programmer's job is to convey the algorithm and the compiler's (and runtime's) job is to optimize it. In other words, if the compiler can infer the parallelism, then it is free to optimize the number of threads dynamically (and even perhaps to choose between an event loop or green threads on a case-by-case basis). Compilers and runtimes thus become smarter and programming becomes simpler with more algorithmic clarity (less implementation details and premature optimization clutter).

Thus I am asserting that both JavaScript and Go have some catching up to do with Haskell, which I've read is already quite good at optimizing and inferring parallelism and other optimizations expressed in the algorithm. And Haskell is too obtuse, and as I explain below, 100% purity is too restrictive (as we discussed in the thread on purity, it forces purity everywhere as a total order, requiring monads to model effects as types, yet monads aren't generally composable). There is hopefully a better balance that could be achieved. A marriage of pure, declared side-effects (effects as types?), and read-only declarations with an imperative language and without all the category theory when it isn't necessary.

Sometimes it is necessary to borrow the lifetime, because just declaring an object read-only is insufficient to convey the algorithm. For example, for inferring the parallelism if we know the input tree object to the loccount example is borrowed, then we don't care if the per-file counting function is pure, because we know it can't modify the input tree object if we aren't providing the said function a borrowed reference to the tree. But we only need to borrow it as read-only, so it is not an exclusive borrow except exclusive of writing. However, as @keean and I remember from our analysis of Rust, borrowing is a global total order exclusion paradigm. Whereas, if we know that the per-file counting function is exclusive of the input tree object (i.e. in a partial order of exclusion), then we don't need the paralysis and tsuris of a total order on borrowing. The only way I can think of to achieve that partial order is if the said per-file counting function is pure or we make a read-only copy of the input tree object. Note that a pure per-file counting function could read from the files without making itself impure. A function which writes to a file can also be "pure" (aka referentially transparent) w.r.t. any context which doesn't read from the file, because it doesn't call any function which modifies any state of the program, i.e. that some external context can read from the modified file is irrelevant if the compiler is only analyzing inherent parallelism for a limited context which doesn't read from the file. So it seems perhaps we need not only pure function declarations but also declarations of specifically which side-effects an impure function creates.

Another advantage (in addition to the aforementioned simplicity, clarity, and antifragile flexibility of the source code level expression of the algorithm) of having the compiler infer the parallelism versus the programmer manually encoding the parallelism, is it won't infer it incorrectly due to human error thus creating a bug. One of the main goals of compilers is to do the tedious and complex checking for correctness that humans don't naturally do perfectly. We need humans for the creativity of algorithms, not the pedantic repetitive mundane tasks and premature optimization.

@shelby3

shelby3 commented Feb 21, 2017

If I understand correctly the stance of those (e.g. @keean in private communications) who would argue against the viability of my prior comment, they argue that such optimization is equivalent to a proof search which is an exponentially hard complexity problem, e.g. optimizing a bubble sort to a parallel merge sort. But afaics, I did not propose allowing the compiler to change the algorithm. I merely proposed for the programmer to express the invariants necessary to infer the parallelism inherent in the algorithm expressed. I've read that one of the Haskell compilers is already apparently very good at this, which is enabled by its strictness w.r.t. such invariants such as referential transparency and side-effects lifted to types (monads).

I agree that compilers which are expected to modify algorithms (e.g. from a bubble sort to a parallel merge sort) would be equivalent to an exponential search which is probably in one of the very intractable computational complexity classes. I am not yet clear if my proposal is also in an intractable computational class.

The argument that claims all (or nearly all) of the interesting parallelizable code can be put in libraries carefully hand-tuned by experts seems to potentially leave a lot of userland code stuck with the noise hiding the underlying algorithm with all the premature optimization of not adopting my proposal. Also it means that expertly written library code becomes that much less readable in open source. I don't like the Cathedral model of s/w development. I prefer open source, readability, simplicity, clarity, and not premature optimization by a small group of experts who can disappear or totally muck it up and no one knows how to fix it.

Also the point of having the compiler detect the parallelism (i.e. the independent code paths) is that it needs to be able to do that anyway, in order to check that the concurrency the programmer would otherwise manually encode does not violate the (lack of) parallelism (i.e. required data race safety) in the algorithm (c.f. Rob Pike's misunderstanding of concurrency versus parallelism). Thus in order to not involve human error and bugs, we need to express the invariants to the compiler so the compiler knows which code paths are independent. Without invariants, the compiler can only safely assume that none of the code paths are independent and thus there is no parallelizability in the code. The reason concurrency programming is so difficult (i.e. buggy) is because humans can't reason perfectly about all the invariants.

Rust's strategy is to force us to have a total order of (provably safe from deadlocks and livelocks) compile-time “locks” (borrows) on all resources in the program including all allocated objects. But it seems to be uneconomic to force such pedantic tsuris of synchronization on all code, when the highly parallelizable code may only be a small portion of the program. Instead I have proposed that we only declare the invariants we need to, for the cases where we need parallelization.

@keean

keean commented Feb 21, 2017

Parallelising an algorithm is equivalent to changing the algorithm. Modern super-scalar out-of-order CPUs already exploit the maximum inherent parallelism in code, and they do so at runtime, where there is more information about scheduling and completion times than at compile time. A compiler cannot beat the CPU at this kind of optimisation, hence the failure of VLIW CPUs in general computing. Again, writing the Itanium compiler to achieve the same results at compile time as an x86-64 achieves at runtime turned out to be a hard problem.

@shelby3

shelby3 commented Feb 21, 2017

optimizing a bubble sort to a parallel merge sort

That is parallelizing the task of sorting, not inferring the inherent parallelization in the original bubble sort algorithm (if any).

Parallelising an algorithm is equivalent to changing the algorithm.

In the example code below i and j can both inherently be read in parallel, as long as the compiler knows that the two reads are independent of each other. The algorithm is not changed by running those two reads concurrently. The invariants (which, if declared, ensure the two reads are independent, enabling the inherent parallelization) are part of the expression of the algorithm.

var i = ... // read string from a file
var j = ... // read string from the Internet

return i.concat(j)

Modern super-scalar out-of-order CPUs already exploit the maximum inherent parallelism in code

They can't optimize the inherent parallelizability in the example above, because they don't know that the two reads are independent. Those invariants aren't expressed at the low level of machine code.

You are conflating low and high-level parallelism.

@keean

keean commented Feb 21, 2017

I see what you are talking about. There is no guarantee that the file and the network are not connected. Reading the file may change the result returned from the network. Such optimisations are not safe in general.

The programmer needs to express whether the IO operations need to be sequential or parallel. In the above example they are inherently sequential. Parallelising may break the programmer's assumptions.

There needs to be a way for the programmer to say it is safe to parallelise. Something like:

(i, j) = parallel(read file..., read network)
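In today's JavaScript, this proposed parallel(...) construct corresponds roughly to Promise.all (readFile and readNetwork here are hypothetical stand-ins for the two reads in the earlier example):

```javascript
// Hypothetical stand-ins for the two independent reads.
const readFile = () => Promise.resolve("from file");
const readNetwork = () => Promise.resolve("from network");

// The programmer expresses the parallelism manually; Promise.all starts
// both operations and waits for both, playing the role of parallel(...).
Promise.all([readFile(), readNetwork()])
  .then(([i, j]) => console.log(i.concat(" ", j))); // → from file from network
```

The point of the construct is that it records the programmer's claim that the two operations are safe to run in either order, which the compiler cannot otherwise know for IO.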

@shelby3

shelby3 commented Feb 21, 2017

There is no guarantee that the file and the network are not connected. Reading the file may change the result returned from the network.

If the reading is declared pure, then it can't have side-effects. This should obviously be enforced.

Such optimisations are not safe in general.

I am not proposing this in general. I am proposing it only when the invariants ensure the independence of the code paths.

The programmer needs to express whether the IO operations need to be sequential or parallel.

Disagree. Seems you are missing the entire justification for (thrust of) my proposal. Now you are trusting the programmer to know all the invariants of every possible ramification in the universe, i.e. back to JavaScript and Go's model. And manually encoding the concurrency details instead of the compiler taking care of that detailed tsuris and noise. This causes bugs and makes code less comprehensible (obscures the algorithm). If instead we build up invariants via hierarchies, the same as we do for coding in general (i.e. we call functions which call other functions, i.e. a purity declaration is transitive to functions we as the programmer aren't even aware of), then the compiler does that reasoning for us. We generally don't want programmers to have to do non-local reasoning, as that is not modularity.

Again I am not yet sure if my proposal is computational complexity tractable. But it is also not necessary that the compiler infer every parallelization opportunity. The programmer can still fallback to manual encoding when the compiler fails to infer. It can be a compiler technology that improves over time.

I want to make some major innovations in programming language, if possible.

@keean

keean commented Feb 21, 2017

The system does not know the invariants. The file that you read could trigger a network service to change the data downloaded from the network, such that changing the order from sequential to parallel will change the results output by the program. In general the programmer is the only thing with knowledge of the invariants, unless they are somehow expressed in the code.

@shelby3

shelby3 commented Feb 21, 2017

The system does not know the invariants.

We don't refute a potentially correct proposal by refuting one example where the invariants could not be ensured (thus could not be declared and wouldn't be inferred). That is disingenuous (noisy) discussion, to not admit there are examples that can be ensured.

The system does know them in some other examples. I had alluded to an example in one of my prior comments, where the compiler knows from the declared invariants that reading a file doesn't modify any objects (stored in RAM) in the code paths.

The file that you read could trigger a network service to change the data downloaded from the network

Not if the invariants ensure that the network resource does not depend on any such network services. My example presumed the invariants were ensured.

Tangentially, you are making the case for why in general concurrency is indeterminate and there are in general no independent code paths, and thus in general concurrency is always buggy (and can't be fixed), because the entropy of the universe is unbounded.

But my proposal is about limiting inference of inherent parallelism to ensured invariants.

In general the programmer is the only thing with knowledge of the invariants

If the invariants aren't ensurable, the programmer manually writing the concurrent structures isn't going to make it any less buggy. You've gained nothing except all the negative aspects (I've already enumerated) of not inferring.

If the compiler can’t check the data race safety of the parallelism the programmer explicitly requests, then the same bugs can happen. I guess your point is that it’s better to at least supplement what the compiler can check with the programmer’s explicit intent to have a better chance of avoiding data race safety bugs for that which the compiler can’t check.

Edit: you can make the argument that the programmer can declare that his algorithm doesn't care if the result of j is nondeterministic w.r.t. i and the unbounded entropy of the read operations. Thus your manual encoding becomes useful in that case where the invariants required for deterministic parallelism can't be ensured. I wasn't arguing against still allowing that manual encoding when necessary (to express unbounded nondeterminism or to handle deterministic cases the compiler isn't smart enough to infer).

@keean
Copy link
Owner

keean commented Feb 21, 2017

I think there are three scenarios: these must be sequential, these must be parallel, and "don't care". The problem with the above example is that it is ambiguous between "must be sequential" and "don't care". Programmers used to writing sequential code will assume sequentiality. We need a way to distinguish "don't care" from sequential.

We both know this is not a problem for pure code, as beta reduction order is non-deterministic, but we cannot do IO from pure code (exactly because it makes evaluation order visible to external IO).

@keean
Copy link
Owner

keean commented Feb 21, 2017

In general IO assumes sequentiality, consider:

print("Shall we play a game?")
let game = getString()

@shelby3
Copy link
Author

shelby3 commented Feb 21, 2017

In general IO assumes sequentiality, consider:

print("Shall we play a game?")
let game = getString()

You are writing and reading from stdout and stdin, thus of course those invariants tell the compiler there is no inherent parallelism. I don't see any inconsistency with my proposal and your example.

If instead you were copying that UI string to a dialog box control and then reading from an input field of that dialog, then you'd have an inherently sequential code path expressed by the dependencies on access to the dialog object resource.

@shelby3
Copy link
Author

shelby3 commented Feb 21, 2017

The problem with the above example is it is ambiguous between "must be sequential" and "don't care".

There is no ambiguity because if there are no dependencies, then it doesn't matter in which order i and j are instantiated.

I understand: what you are thinking is that the programmer may have some semantics in mind which are not expressed unambiguously in the code itself. Can you think of an example?
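For instance, a hypothetical sketch (the stub functions below stand in for the file and network reads): when neither result feeds the other, the two reads may be evaluated in either order, or concurrently, without changing the program's output.

```typescript
// Hypothetical stubs standing in for the file read and network fetch.
const readFile = async (_path: string): Promise<string> => "file contents";
const fetchUrl = async (_url: string): Promise<string> => "network response";

// i and j have no data dependency, so writing them sequentially would
// over-specify the order; both reads may safely run concurrently.
async function independent(): Promise<number> {
  const [i, j] = await Promise.all([
    readFile("a.txt"),
    fetchUrl("http://example.com"),
  ]);
  return i.length + j.length;
}
```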

@keean
Copy link
Owner

keean commented Feb 21, 2017

It only doesn't matter because you assume the file and network operation are independent. Let's assume they are dependent, so reading the file changes the result from the network service. How do I express that the order matters, even though the operations are on different 'channels'?

@keean
Copy link
Owner

keean commented Feb 22, 2017

Here's a more practical example, first write a file to a USB stick, and then display a message that it is safe to eject the USB device. If you arbitrarily parallelise IO then the message will appear whilst the file is still writing, and you will remove the device resulting in a corrupted data file.

@shelby3
Copy link
Author

shelby3 commented Feb 22, 2017

Here's a more practical example, first write a file to a USB stick, and then display a message that it is safe to eject the USB device.

Correct programming code would not display that message until testing that the file write operation did not return an error.

You could argue that correct code would issue an exception on error (instead of a return value) which would bypass the code that follows the file write call. Exceptions thus make it ambiguous whether the code following the file write call is sequentially dependent on the successful execution of the write or not. This is a problem of expression due to void functions. And this would need to be dealt with somehow.

Perhaps the most general construct would be to allow void functions to be assigned to an object of Void type. Dependent code paths could condition on if( object exists ). Or probably better is to just assign the function result to an object of type Promise (or infer the desired type by calling .then() on the function return value), and attach the dependent code using the then() method.

Thus semantically dependent sequential code paths become explicitly more verbose (where they would not have been) so that independent code paths can be inferred.
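A sketch of that idea in TypeScript (the writeFile stub and message text are hypothetical): assigning the void write to a Promise and attaching the dependent code via then() makes the sequential dependency explicit, so an inferring compiler could not treat the two operations as independent.

```typescript
// Hypothetical async write that rejects on failure rather than returning void.
const writeFile = async (_path: string, _data: string): Promise<void> => {};

// The "safe to eject" message is explicitly made dependent on the
// completed write by chaining on the returned Promise.
function writeThenNotify(notify: (msg: string) => void): Promise<void> {
  return writeFile("/mnt/usb/data.bin", "payload")
    .then(() => notify("It is safe to eject the USB device."))
    .catch(err => notify("Write failed: " + err));
}
```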

@shelby3
Copy link
Author

shelby3 commented Jun 2, 2018

http://www.raphael.poss.name/on-the-realizability-of-hardware-microthreading/
i started reading this yesterday, it was interesting :D

Can you provide a quick summary of main findings? I don’t have time to read it.

@NodixBlockchain
Copy link

NodixBlockchain commented Jun 2, 2018

And then I discovered goroutines.

For me the three main issues I see with Go in the current state of things are:

  1. Channels are under a lock, so the 'share data by communicating' approach is OK for IO kinds of concurrency, but for low-latency operations on large sets of shared data it doesn't seem optimal.

  2. The scheduler seems to still be in an early stage, and you need to babysit it a bit to get things right. Regarding all the issues of core affinity, cache behavior, etc., it's still an abstraction you need to keep in mind when designing the program so it doesn't become a resource hog, eventually using buffered channels to limit the concurrency level, etc.

  3. It seems to be lacking a little bit on performant database support.

For 1 & 2, Dmitry Vyukov seems to be on it, and he wants to do a system of lockless channels and improve the scheduler to care more about data locality and that sort of thing, but there is no deadline or anything for it.

For 3, it will probably come up at some point.

And consider that for a blockchain, the level of IO concurrency might not be that high. In Bitcoin they don't recommend using more than 8 outbound connections per peer, and they recommend limiting the number of peers one node is connected to, to avoid centralization, i.e. one node with too many outbound connections and all nodes connecting to it. They say having too many outbound connections will be detrimental to the proper scaling/decentralization of the network.

@NodixBlockchain
Copy link

Can you provide a quick summary of main findings? I don’t have time to read it.

I didn't finish it yet, but they discuss various aspects of what is at stake at the hardware level with micro-threading/concurrency and all that sort of thing.

They discuss things like access windows on the register file to avoid register spilling in function calls, a system to share cache between different cores, the concept of SPMD (single program, multiple data), which looks similar to shader thinking, i.e. the ability to execute the same program on different data, and all kinds of things.

But didn't finish it yet :)

@shelby3
Copy link
Author

shelby3 commented Jun 2, 2018

I wrote:

So again I ask you, “Do you know what percentage of the silicon of Intel CPUs are for the speculative execution and register rewriting as contrasted with the caching?”

Apparently only about 10 – 15% of the transistor budget is expended on OoO execution (the percentage of the TDP power budget may be higher):

https://hothardware.com/news/amd-ryzen-die-shot-cache-structure-architecture-advantages-exposed

https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Core_2

https://en.wikichip.org/wiki/intel/microarchitectures/sandy_bridge_(client)#Core_2

So apparently there would not be much sense in removing it for the purpose of increasing the number of cores.

Although it would still perhaps be beneficial to turn it off when we don’t need it (i.e. in I/O bound concurrency) in order to conserve power consumption (important on mobile and in data centers), and when we need our code to be surely free of timing attacks that might be lurking in the nondeterministic out-of-order speculation.

Nevertheless the point remains that the transistor budget is increasing faster than transistors can be consumed by extant core counts in ways that would increase performance. IOW, the OoO execution and superscalar features can’t be improved. So Intel and AMD are forced to use those transistors for increased core counts.

So more extensive use of concurrency and parallelism is unavoidable.

Does anyone know why Itanium at 32nm had double the number of cores of Xeon at 32nm?

@shelby3
Copy link
Author

shelby3 commented Jun 2, 2018

And consider that for a blockchain, the level of IO concurrency might not be that high. In Bitcoin they don't recommend using more than 8 outbound connections per peer, and they recommend limiting the number of peers one node is connected to, to avoid centralization, i.e. one node with too many outbound connections and all nodes connecting to it. They say having too many outbound connections will be detrimental to the proper scaling/decentralization of the network.

Bitcoin doesn’t scale. So that’s just mostly nonsense coming from them. Garbage in then garbage out. Their goal is that the mining/validating fullnodes have distributed network topology, but that has nearly nothing to do with whether it is decentralized. Perhaps it is partly them trying to maintain an illusion of decentralization in Bitcoin.

The only way to scale with 0.05 second (50 ms) latency intra-shard transaction confirmations is with supernodes that handle 1000s of light client connections.

Distributed != decentralized. Centralization refers to the control over the nodes. The design of the system can cause supernodes to have no effective control. You are probably going to be very shocked when you see my ledger design. I try to think out-of-the-box.

EDIT: @NodixBlockchain perhaps you were referring to the P2P gossip network for communicating non-realtime information between all online nodes (both full and light). In that case, I agree that nodes should try to not all connect to same hub nodes. Yet I still maintain that doesn’t really guarantee anything because nodes can be Sybil attacked.

for 1 & 2 Dmitry Vyukov seems to be on it, and he wants to do a system of lockless channels, and improve scheduler to care more about data locality and that sort of things, but there is no deadline or anything for it.

for 3 it will probably come up at some point.

Scheduler tuning can proceed. The point is that the model is the correct one to pursue at the general purpose PL layer.

@NodixBlockchain
Copy link

NodixBlockchain commented Jun 2, 2018 via email

@shelby3
Copy link
Author

shelby3 commented Jun 3, 2018

@NodixBlockchain wrote:

If too many nodes end up connecting to a limited number of nodes, it can lead to longer propagation times for certain nodes, and a suboptimal distribution of the propagation time for txs/blocks.

Craig Wright’s data showed that propagation to the top 98+% of the hashrate is within roughly 1 second on average. But for the 1000s of nodes in the remaining 2%, propagation can be much slower. So hashrate actually drives the prioritization of connectivity in the network, not some rule which attempts to maximize the distribution of the network topology. And this is because as I already wrote to you, distribution of network topology has little or nothing to do with the economics and thus the centralization of control.

My idea to tackle the main issue they are talking about is to have shards organized around applications, as most likely each user will use only certain applications, and there are many fewer applications than users, so I tend to think 'natural shards' can organize around applications, and I have a bit of an idea for organizing this sort of thing if the network becomes too big.

If you have an application all on one shard, then you have no resiliency if that shard loses liveness or is otherwise attacked.

And in that logic, a large % of users should be running full nodes and using the local RPC to use applications, so I'm not sure there is really that great a need for a high level of concurrency at the scale of a node.

I already told you that sort of mesh networked design for blockchain doesn’t scale latency nor throughput. And I’m not going to discuss that further with you in @keean’s Github issues threads.

Please do not go further on this tangent. We should not pollute this thread with off-topic discussion of the details about blockchains. Please I asked you already if you want to discuss that then go respond to Traxo's posts on bitcointalk.org or my posts on Steemit or Medium.

@NodixBlockchain
Copy link

NodixBlockchain commented Jun 3, 2018 via email

@shelby3 shelby3 mentioned this issue Jun 3, 2018
@shelby3
Copy link
Author

shelby3 commented Jun 4, 2018

Someone mentioned Clojure’s core.async library. It has a go macro for simulating green threads on the JVM.

Does anyone know if they have any tricks up their sleeve for obtaining the same efficiency of goroutines on the JVM? How could they simulate Go’s dynamic stack allocation?

@analyticbastard
Copy link

go is a macro that takes its body and examines it for any channel operations. It will turn the body into a state machine. Upon reaching any blocking operation, the state machine will be 'parked' and the actual thread of control will be released. This approach is similar to that used in C# async. When the blocking operation completes, the code will be resumed (on a thread-pool thread, or the sole thread in a JS VM). In this way the inversion of control that normally leaks into the program itself with event/callback systems is encapsulated by the mechanism, and you are left with straightforward sequential code. It will also provide the illusion of threads, and more important, separable sequential subsystems, to ClojureScript.

From the docs
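A rough analogy of that parking mechanism (not core.async itself, just a toy TypeScript sketch with a hypothetical go driver): the generator body "parks" at each yielded asynchronous operation, and the driver resumes it with the result when that operation completes, so no thread of control blocks while waiting.

```typescript
// Toy driver: runs a generator, treating each yielded Promise as a park
// point; the body resumes (with the resolved value) when the Promise settles.
function go<T>(body: () => Generator<Promise<unknown>, T, unknown>): Promise<T> {
  const gen = body();
  const step = (input?: unknown): Promise<T> => {
    const { value, done } = gen.next(input);
    return done
      ? Promise.resolve(value as T)                 // body finished
      : (value as Promise<unknown>).then(step);     // park, then resume
  };
  return step();
}
```

A body such as `go(function* () { const a = (yield read()) as number; ... })` thus reads as straightforward sequential code while actually being a resumable state machine, which is the inversion-of-control encapsulation the docs describe.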

@shelby3
Copy link
Author

shelby3 commented Aug 29, 2018

M:N Green Threading on Java, Kotlin, Scala

There’s actually a drop-in replacement for goroutines on Java and Kotlin named Quasar (c.f. also). The afore-linked Hacker News thread mentions some of the pitfalls of the alternative Akka actor library on Scala which ostensibly doesn’t do a CPS transformation (and instead apparently is more like JavaScript Promise). Note the headline link is dead but archived.

Continuation passing style (CPS)

Quasar employs bytecode weaving to achieve continuations, presumably in the continuation-passing style (CPS).

Scala has a delimited continuations transformation compiler plugin, but I’m not so clear on whether it would be better than what I will propose below. A reason was provided for deprecating its use but the deprecation of the plugin itself was reverted. Someone has recently implemented fibers that depend on that plugin. I had attempted to explain that delimited continuations plugin in 2012. Today I found another tutorial.

CPS normally requires tail recursion (c.f. also) to avoid overrunning the end of the stack. JavaScript doesn’t have tail call optimization (TCO); and given the reason Google provided for removing it, and noting that the TC39 proposal to change the syntax has died on the vine, JavaScript will probably never get TCO. TCO is sometimes referred to as tail call elimination (TCE).

(Note @keean did mention stackless coroutines up-thread in 2016, but it didn’t register with me at that time, that CPS could essentially model M:N green threading. I mentioned it again as possibility in Jan 2018.)

Green threads vs. Promise/Future

I explained up-thread how Promise breaks functional composition unless every function is async and await is employed for every (not intentionally parallelized) return value. Opaque cooperative preemptive M:N green threading is superior to the Future or Promise model. The Future/Promise model also achieves M:N threading, but not opaquely, and it is even more inefficient than CPS (“O(N) vs. O(1)”, as cited in the OP of this Concurrency thread linked from here) because it unwinds the stack on each non-blocking suspend. Green threading is an opaque abstraction below the language layer, so it doesn’t break serial functional composition nor force the programmer to annotate his code with different types in order for serial asynchronous non-blocking code to context switch a green thread. Explicit parallelism in the opaque cooperative preemptive model employs explicit green threads such as actors or goroutines. IOW the opaque cooperative preemptive model makes the suspends opaque; whereas, the Future or Promise model makes suspends explicit.
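The composition break can be illustrated with a tiny TypeScript sketch (the functions are hypothetical): once any callee suspends via a Promise, every transitive caller must be re-typed as async and sprinkled with await, whereas under opaque green threading the same callee would simply block its green thread and the callers would remain ordinary synchronous functions.

```typescript
// Suppose this function gains a non-blocking suspend (e.g. a network read):
async function fetchCount(): Promise<number> {
  return 42; // stands in for an awaited I/O result
}

// The plain synchronous version no longer type-checks:
//   function double(): number { return fetchCount() * 2; }  // type error
// so the async annotation propagates virally up the call chain:
async function double(): Promise<number> {
  return (await fetchCount()) * 2;
}
```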

A simple CPS code transformation

I’ve suddenly been contemplating that we might want to have Zer0 “transpile” (i.e. a non-native compile target) initially to Scala instead of Go, because Scala already has most of the features we need, such as a powerful type system with HKT (although the Scala 3 DOT-Dotty calculus is still undecidable with HKT lambdas), type parametrisation (aka generics), type-parametrised modules (via object), and even typeclasses. And Scala 3 (which appears to be coming in 2019 or 2020¹) will add the non-disjoint, structural unions I’ve wanted. This might drastically simplify the implementation of the first Zer0 compiler. Additionally Scala has an excellent JavaScript compiler Scala.js (with no reflection, unlike Go and Swift, so it can do dead-code elimination) and good IDE support.

Closures in a spaghetti stack on the heap

My idea is that there’s a very simple CPS transformation that Zer0 could output (such as in Scala code):

1. For every non-blocking function/procedure application/call, wrap the remainder of the caller’s function/procedure in a closure function/procedure which takes any original (in the Zer0 code) return value of said non-blocking function/procedure as an input argument.
2. Pass this continuation closure as an argument into the said non-blocking function/procedure.
3. Change the returned value (and result type) of the transformed non-blocking function/procedure to that of the said continuation closure.
4. Return the returned value of the transformed non-blocking function/procedure from the caller’s function/procedure.
5. The last operation each transformed function/procedure does is return the returned value of applying the input continuation function/procedure, which is applied/called with its input argument assigned the result (what was, in the non-transformed version, the returned) value of the said transformed function/procedure.

Note this CPS transformation applies turtles up the call hierarchy, in that any function/procedure which calls a said non-blocking function/procedure is also a non-blocking function/procedure. This ensures that, given TCO, the stack can’t overflow due to context switching from a suspended non-blocking function/procedure. IOW all the call hierarchy for non-blocking functions/procedures is stored in closures on the heap; and thus there’s essentially no utilization of the stack for them.
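A minimal hand-done instance of that transformation in TypeScript (the names and the stand-in resumed value are hypothetical):

```typescript
type Cont<A, R> = (a: A) => R;

// Original (untransformed) shape:
//   function caller(): string {
//     const n = nonBlocking();   // suspending call
//     return "got " + n;         // remainder of the caller
//   }

// Transformed: the remainder of the caller becomes a continuation closure
// passed into the non-blocking function, whose result type becomes that of
// the continuation, and whose last action is to apply the continuation.
function nonBlockingCPS<R>(k: Cont<number, R>): R {
  const result = 42; // stands in for the value produced after resuming
  return k(result);  // tail call: under TCO this frame can be reused
}

function callerCPS<R>(k: Cont<string, R>): R {
  return nonBlockingCPS(n => k("got " + n));
}
```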

Of course then we need an API that is utilized by non-blocking functions so they can suspend to the M:N scheduler. And then we need to employ something like Java’s ForkJoinPool for the scheduler. That was discussed in the links about Quasar above.

Trampoline for TCO/TCE

Additionally, for those output targets which don’t support (or guarantee) TCO, an additional trampoline continuation (c.f. also) transformation can be made to effectively emulate TCO: wrap the said transformed non-blocking function in a closure and return this closure from the caller’s function/procedure instead. The caller’s caller must call the returned closure.

Goroutines are more efficient

Thus with CPS continuations every non-blocking function/procedure is a green thread. This granularity is less efficient than goroutines, which have a separate stack allocated for each green thread, because a stack is more efficient than closures on the heap. Also, for the case where actors contain non-blocking code (excluding the mutex for inverting the incoming message queue for impure actor procedures, which is external to the actor’s initial stack frame), CPS can’t implement the near zero-cost resource allocation paradigm (c.f. also) I’m contemplating for actors, which requires stacks and a separate bump-pointer heap for each actor function/procedure.

Actors

We also need an API for actors, including instantiating them. It should be compatible with the paradigm for actors we have been contemplating in the Parallelism issues thread #41. Actors will be functions/procedures. The message inputs are the said input arguments. The result/return value is the return message. However we probably also want to abstract this so it can also work in the distributed network case.

Exceptions and stack traces

Unfortunately that CPS transformation will break stack traces and also breaks standard try-catch-throw exceptions (c.f. also) that rely on the stack. So we must also CPS transform the exceptions (c.f. also) as well. Actually the “stack” trace is still present on the cactus (aka “spaghetti”) stack (c.f. also) of CPS continuation closures on the heap. We just need a debugger that is aware of it.

The Promise paradigm also needs an analogous transformation in order to handle exceptions more correctly. An uncaught exception of the Promise should be injected into any try block that contained the non-blocking function call that returned it. IOW, any such try block should be replicated in the onFulfilled and onRejected callbacks supplied to the then method. I have not ascertained whether the async/await paradigm performs this transformation.
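Concretely (a hypothetical TypeScript sketch; f stands in for any non-blocking call that may reject): the source-level catch clause must be replicated into both callbacks, so that both the rejection of f and exceptions thrown in the resumed remainder of the try block are handled.

```typescript
// A non-blocking call that may reject.
const f = async (): Promise<number> => { throw new Error("boom"); };

// Source form: the try block lexically contains the non-blocking call.
//   try { const x = await f(); return "ok " + x; }
//   catch (e) { return "caught"; }

// Transformed form with the catch clause replicated into both paths:
function transformed(): Promise<string> {
  return f().then(
    x => {
      try { return "ok " + x; }       // remainder of the try block
      catch (e) { return "caught"; }  // replicated catch (post-resume throws)
    },
    _e => "caught"                    // replicated catch (f's rejection)
  );
}
```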

Stackful (c.f. also) green threads don’t break exceptions nor stack traces. The stack of course terminates at the creation point of the green thread (e.g. go in Go), so exceptions don’t (unless explicitly lifted) propagate to the creator of the green thread. Note this lack of propagation to green thread creator would be desired where each Actor is a green thread.

¹ The Sept 21 minutes linked to some completed Dotty IDE and build development. From the June 19 minutes:

Eugene asked how soon Scala 3 would be ready for initial testing on Twitter’s codebase. The requirements are that basics works and that the feature set has been frozen, so they aren’t testing a still-moving target, features-wise. Martin responded that by Scala Days 2019 [i.e. June 11th, 2019] a feature freeze should be in effect, with a final release targeted for early 2020.

An April 23 article stated:

Don’t hold your breath just yet – Scala 3.0 is behind 2.14, giving it a release date in early 2020 according to the current schedule.

Scala 2.14 will be all about smoothing the migration path to Scala 3. This release will do so by defining migration tools, shim libraries, and targeted depreciations.

It’s unclear when ScalaJS will work with Scala 3 (aka Dotty).

@shelby3
Copy link
Author

shelby3 commented Sep 10, 2018

An idea for making a Pony-like PL more palatable for Javascript programmers.

@shelby3
Copy link
Author

shelby3 commented Sep 13, 2018

An example of the unnecessary mess that Scala and Java get in because there’s no native M:N green threads on the JVM:

Dead-locks due to exhausting the thread pool

@shelby3
Copy link
Author

shelby3 commented Sep 15, 2018

M:N Green Threads Are the Superior Concurrency Model?

Click also the links in this post up-thread.

Finishing up the research I had started about extant Scala concurrency paradigms, the Monix concept is basically lazy iterators named Observables.

Seems they’re forced to jump through these hoops on Java and Scala because there’s no native M:N green threads on the JVM.

Frankly I don’t see the benefit of bringing the knowledge of the lazy iterators into the semantics of the library-using programmer, such that the programmer has to deal with free monads. This seems to be that free monad effects discussion we already had. My stance seems to be that we can just have M:N green threading (with Actors) instead which makes sequential blocking code run asynchronously automatically. The lazy Iterator (implemented by the library programmer) is employed by the library-using programmer the same as a non-lazy one. The runtime takes care of providing the concurrency by cooperatively task switching in blocking code. This seems to be much simpler for a programmer to reason about. Goroutines (which kicked ass on the Skynet benchmark) are much loved compared to all this free monad (unnecessary) layering of semantic complexity.

To me this all smells like people trying to be too clever for their own good and then ending up in a clusterfuck of complexity. Even adding await / async doesn’t provide the same compositional degrees-of-freedom and elegance of the M:N green threading model.

I come back to the long post I made up-thread about effects (and Keean’s idea for iterating events step-by-step instead of registering for them). It seems monads are about being able to employ equational reasoning (c.f. also) in the algorithm orthogonal to the effects in the algorithm. But we’ve already discussed recently (c.f. links already provided) about how equational reasoning doesn’t really map well to unbounded non-determinism. Rather I think Pony’s Actor model with reference capabilities is what we need.

Can anyone point out a myopia or flaw in my thinking about this issue? Am I oversimplifying and how so is my oversimplification detrimental?

Of course everyone is free to do their own way. And I guess there is not just one correct way, because of human inertia. Yet I think a PL design benefits from some opinionated decisions so as to not have a kitchen sink of corner case complexity.

@shelby3
Copy link
Author

shelby3 commented Nov 24, 2018

In addition to fixing some typos (e.g. s/blocking/non-blocking/), note the important up-thread edit (the bolded part insert):

I explained up-thread about how Promise breaks functional composition unless every function is async and await is employed for every (not intentionally parallelized) return value.

You may want to click the link for the explanation of why that was added.

@shelby3
Copy link
Author

shelby3 commented Nov 25, 2018

The generative essence of the up-thread discussion of the Elm Architecture and @keean’s idea for pulling events (and his related point) instead of registering event callbacks, distills to not sharing mutable state between asynchronous (i.e. concurrent) code paths. Also AFAICS my proposal below essentially obtains @keean’s point about having a modular interconnection between event dispatchers and receivers, which is also the point of Functional Reactive programming.

The Elm Architecture orders event processing synchronously so that the shared data is updated before the next event is processed. Thus event listeners which are deregistered while processing an event, will not fire during nor after said processing. Event listeners which are registered while processing an event, will not fire until after said processing.

“Pulling an event” is distinguished from registering a callback primarily¹ by not sharing the registration state of listeners between the code path that processes the event and the concurrent code path that sends the event. The event can only be sent while the listener is waiting for it and thus listener is not concurrently processing any other event.

So what I was describing is the case where a callback is registered as an event handler. The handler is fired, and while the handler is busy (its processing may eventually deregister the event handler), the same event may fire again. There is a race condition. And this can apply to related events too, since sometimes (e.g. drag and drop) we employ related events.

There are many facets of data race errors that can occur in JavaScript because events continue to fire willy-nilly (in no particular order) until the event handler is deregistered. What we want is to wait for each event to be processed, then make any change to the event handler registration state, before allowing any new events to fire. We can accomplish this by filtering on the message queue and interfacing the message queue to JavaScript's event handler mechanism (i.e. each handler sends a message), then filtering out at the message queue any events which fired non-sequentially (because the message queue is only processed sequentially by the single task of the partition associated with the message queue).

Note there can be more than one partition (e.g. we might partition on UI elements such as a dialog box, but I need to think about this more), so events will only be put into a sequential ordering within each partition (i.e. each partition’s message queue, not to be conflated with JavaScript's runtime message queue). So events could still be unordered if we have multiple partitions in our UI. Yet we will still prevent data races because, remember, Pony-style reference capabilities (or copying) would be employed to share data between partitions.

Also in general (not just for the case of event handler registration race conditions) we want each event to be processed completely before processing the next one, as it pertains to updating the data that the event handler mutates. And data that is shared between unordered partitions must be provably data race safe (which basically means proving it is immutable or is only accessed by one partition which is what Pony’s reference capabilities facilitate proving).

Essentially this is a CSP channel paradigm. The listener updates its copy of which messages it is listening for while processing a message. The next message is only processed after completing the processing of the prior message. The sender to (an unbuffered) CSP channel can’t send until the listener is ready to receive a message on the channel. To avoid discarding unwanted messages and to start receiving wanted messages, the listener should send a message to the sender of messages when there is a change to the state of whether it wants to receive messages from that sender.
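A minimal TypeScript sketch of that listener side (the Mailbox name and API are hypothetical): event handlers only enqueue messages, and a single consumer processes them strictly one at a time, so the listener's registration/filtering state can only change between messages, never during one.

```typescript
class Mailbox<M> {
  private queue: M[] = [];
  private wake: (() => void) | null = null;

  // Event handlers call this; it never touches listener state directly.
  send(msg: M): void {
    this.queue.push(msg);
    if (this.wake) {
      const w = this.wake;
      this.wake = null;
      w();
    }
  }

  // The single consumer awaits messages and processes them sequentially.
  async receive(): Promise<M> {
    while (this.queue.length === 0) {
      await new Promise<void>(resolve => { this.wake = resolve; });
    }
    return this.queue.shift()!;
  }
}
```

The consumer loop would look like `while (true) { const m = await mailbox.receive(); /* update local listener state, dispatch */ }`, with unwanted events simply filtered out at this point rather than by racing to deregister DOM handlers.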

If the processing of a message requires a non-blocking (i.e. delayed asynchronous) operation, the listener of messages could (instead of blocking and waiting) delegate this to another concurrent process which doesn’t share any mutable state. The listener would then receive the update from the conclusion of the concurrent process via a message which thus wouldn’t change shared state concurrently with processing another message.

So we can conclude that a flaw in JavaScript’s concurrency model is that it doesn’t force the isolation of mutable shared state for concurrent code paths. Callbacks supplied to Promise and registered as event listeners shouldn’t have access to shared mutable state. These state changes should instead be shared via messages.

¹ We also discussed how it might be involved with code structure employed to receive an event such as calling the event sender thus inverting the control, but that isn’t the generative essence.


Need for message based sequencing of events in JavaScript

I am familiar with data-races, we have a lot in one of our JavaScript applications, and I spent a lot of time debugging and getting rid of them. There are a few practical techniques you can use: disable the event source until all asynchronous handling has finished (disable the button, or an overlay to disable an area or the whole screen); or have a Boolean variable that we test and set on entry, and clear when the async chain has finished. This is more important in JavaScript than most JS developers realise, and I think most JS apps are riddled with this kind of error.

This is essentially what the paradigm I am proposing will do, but in a more structured and implicit way with the message queue: basically applying the “actor”-like (message-based) paradigm to JavaScript.

@shelby3
Copy link
Author

shelby3 commented Nov 28, 2018

Tying all our discussions into a summary and some additional insights:

Message-based concurrency (and parallelism)

@shelby3
Copy link
Author

shelby3 commented May 17, 2020

Discussion from Quora that began in 2018…

I replied today:

@Rick Armbruster replied:

I replied:

@Nathan F Yospe wrote:

Predictability. Go has long since abandoned its foolish claim to being a systems programming language, though, so this is no longer a terribly meaningful discussion.

Rust, C++, C, and (if you don’t use any of the library or sugared library-proxy parts, like arrays) D are all capable of being written in a manner that is entirely resource- and timing-predictable, and with very little runtime. The only other production languages that can make the same claim are either sort of specialized for scientific computing (Fortran; Julia, with similar caveats to D) or still rely on transpilation (Nim)… Swift has predictability, but as far as I can tell, cannot be stripped of the heavy runtime.

Real-world massively multicore, concurrent programs are not deterministically predictable; they are inherently stochastic. So your entire basis flies out the window in that context (which will in the near future be the prevalent context). Rust has to punt to Arc (atomic reference counting) for some heap allocations, which is less performant and as unpredictable as other forms of automatic memory management such as tracing GC. Go has a problem in that its GC doesn’t scale to multicore. Pony is headed more in the correct direction, but refer to my links in my other comment for more points and issues.

They are; in fact, everything running on a computer is deterministically predictable, because nothing runs randomly. You just need to find the indicators on your own, and that is really uncomfortable sometimes.

I/O is never deterministic unless you have the deterministic seed or the “strange attractor” (deterministic chaos) cycle for the Universe. It’s really folly to presume that just because computer instructions can be repeated deterministically, the interaction of code with non-deterministic processes will somehow magically make the outcomes of programs predictable — the number of permutations of code-path orderings becomes unbounded due to recursion. Unbounded recursion is essentially what gives rise to the Halting problem, which applies to any Turing-complete computer system.

Scooping the Loop Snooper

@shelby3
Copy link
Author

shelby3 commented May 18, 2020

@shelby3
Copy link
Author

shelby3 commented May 22, 2020

@keean brought to my attention the blog post async/await is the wrong abstraction, which makes the correct point that the async/await model provides no access to a cancel operation.

However, I posit that cancellation can be added to async/await by extending Promise (in the right way) to add a cancel callback argument to the constructor and a cancel method, which when called on any descendant Promise in the thenable chain will invoke the cancel callback so that the original source of the asynchronous event can reject the Promise at the root of the chain. I haven’t tested this yet to confirm whether it works as envisioned. Obviously it will not be fully interoperable with third-party libraries which aren’t using the extended Promise.
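An untested sketch of that idea (the class name and the internals are mine, not a standard API): each `then` records its parent, so `cancel()` on any descendant walks back to the source Promise, whose registered cancel callback can reject it.

```javascript
// Sketch: a Promise whose constructor also takes an onCancel callback.
// cancel() on any descendant in the then-chain walks back to the source
// so the source can reject itself.
class CancellablePromise extends Promise {
  constructor(executor, onCancel) {
    super(executor);
    this._onCancel = onCancel;
    this._parent = null;
  }

  then(onFulfilled, onRejected) {
    const child = super.then(onFulfilled, onRejected);
    child._parent = this;            // remember the chain for cancel()
    return child;
  }

  cancel() {
    let p = this;
    while (p._parent) p = p._parent; // find the source of the chain
    if (p._onCancel) p._onCancel();  // let the source reject itself
  }
}

// Usage: cancelling a descendant rejects the original source.
let rejectSource;
const source = new CancellablePromise(
  (resolve, reject) => { rejectSource = reject; },
  () => rejectSource(new Error('cancelled'))
);
const chained = source.then(v => v + 1);
chained.cancel();
chained.catch(e => console.log(e.message)); // prints "cancelled"
```

Note that `catch` (and any internal derived Promise) routes through the overridden `then` via the species mechanism, so the parent links are maintained throughout the chain.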

The paradigmatic analog in the Go green-threading model would be an asynchronous cancel channel that is passed into every function that can block, because concurrency via green threading employs (cooperative, user-mode not kernel-mode) preemption instead of employing generators to return a Promise turtles-style (like a Matryoshka doll) up the function call hierarchy back to an event loop. The blocked goroutine must be awoken by sending it a message over a channel it’s listening to (aka waiting on), so the blocked function can resume execution and return an error appropriate to the cancellation scenario. The error will then propagate turtles-style back up the function call hierarchy. Note that the said preemption would occur at a [non-deterministic] select waiting on the said channel and the blocking event simultaneously.
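A rough JavaScript analog of that select (a sketch under my own naming, not the Go code itself) races the blocking operation against a cancellation signal, so a blocked caller is awoken on cancel and the error propagates back up the call hierarchy:

```javascript
// A cancel "channel": a promise that rejects when cancellation is
// requested, plus the function that sends the cancellation.
function makeCancelSignal() {
  let cancel;
  const signal = new Promise((resolve, reject) => {
    cancel = () => reject(new Error('cancelled'));
  });
  return { signal, cancel };
}

// Rough analog of Go's select over { result, cancel }: resume with
// whichever settles first. On cancel this throws, and the error
// propagates back up the call hierarchy like Go's returned error.
async function blockingWork(cancelSignal) {
  const work = new Promise(resolve => setTimeout(() => resolve('done'), 100));
  return Promise.race([work, cancelSignal]);
}

// Usage:
const { signal, cancel } = makeCancelSignal();
blockingWork(signal).catch(e => console.log(e.message)); // prints "cancelled"
cancel();
```

As the surrounding text notes, the signal must be threaded transitively into every function that can block, which is exactly the infrastructure burden being discussed.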

As a separate issue from Pony’s “actors” not really being Hewitt’s unbounded-non-determinism Actor model, Go’s channel types (which enable creating channel objects willy-nilly) aren’t really Actor-like Partitions (ALPs), because in the Actor model there is only a lone receive-only message queue. Thus in a single-threaded Actor or ALP model, there shouldn’t be any means to pass a channel object into every blocking function; instead just send a cancel message to the Actor or ALP, but then the programmer has to build an infrastructure to propagate such a cancel message to all Actors or ALPs which are being waited on, as defined by the Actor’s or ALP’s internal state machine.

There’s also a need for the aforementioned, transitively-passed “cancel channel” or “propagation infrastructure” in an ALP model where I propose to abstract the message queue with a function call (actually an “actor-method-call”) and then block with green threading preemption — instead of modeling concurrency (akin to Promise-ification) by sending a message with a message-based “Future” and returning to the event loop. Yet I don’t propose conflating my ALP with willy-nilly construction of channel objects a la Go. Instead every actor-like object in my proposal should have an implicit cancel method which when invoked will implicitly signal cancellation transitively down the current (if any) turtles (Matryoshka doll) chain of actor-method-calls. The access (i.e. type of the reference) to an ALP should be exclusive [read] or [r/w] so that permission to invoke ALP methods is controlled and is transferred only by consuming the extant, exclusive reference.

We had also documented in this Concurrency #17 thread that async/await is considered harmful because it breaks functional composition, creating a “What Color Is Your Function?” bifurcation boondoggle between what’s run in parallel and what’s run sequentially. Data-race safety is an issue with parallelized execution in what appears syntactically to be serialized code. As discussed in this thread, parallelization needs to be syntactically explicit (and data-race safety preferably enforced by the compiler) except, debatably (@keean dissented upthread), perhaps where the compiler can safely and provably make automatic parallelization assumptions which don’t confuse the programmer’s expectations.

My proposed Go-like green-threaded ALP concurrency model is by default composed of single-threaded ALPs. The syntactically sequential code is easier to reason about and a lot less tsuris (and in some or most cases more performant) than (c.f. also and also) Promise-ification and generators. Yet the green-threaded ALP concurrency model may also benefit from some paradigm(s) for being explicit about intra-ALP parallelism, which might include parallelized collections and syntax/APIs analogous to Promise.all(), etc. I suppose cancellation would apply to all blocked parallel trees in that case.
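For instance, a Promise.all()-style fan-out where cancellation applies to all still-blocked parallel branches could be sketched with the standard AbortController (the `parallelAll` and `sleep` helpers are hypothetical names of mine):

```javascript
// Fan out tasks in parallel; if any branch fails, abort the still-blocked
// siblings via a shared AbortController (a standard Web/Node API).
async function parallelAll(tasks) {
  const controller = new AbortController();
  try {
    return await Promise.all(tasks.map(task => task(controller.signal)));
  } catch (err) {
    controller.abort(); // cancel all remaining blocked branches
    throw err;
  }
}

// A task is a function of an AbortSignal; here, a cancellable sleep.
function sleep(ms, value) {
  return signal => new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(value), ms);
    signal.addEventListener('abort', () => {
      clearTimeout(timer);
      reject(new Error('aborted'));
    });
  });
}

// Usage:
parallelAll([sleep(10, 'a'), sleep(20, 'b')]).then(console.log);
```

The design choice mirrors the point above: the parallel fan-out is syntactically explicit, and cancellation is propagated to every blocked branch rather than left dangling.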

@shelby3
Copy link
Author

shelby3 commented Sep 17, 2020

I proposed Pony-esque recover blocks for constructing immutables? for Vlang and perhaps also for any programming language I may create.

9 participants
@shelby3 @keean @analyticbastard @SimonMeskens @NodixBlockchain and others