Mutable Objects #32
With GC, I don’t see any need to explicitly type references. We simply remember those primitive types which are always copied when assigned. Agreed on annotating types with read-only, write-only, and read-write. Per my proposal for concurrency, we’ll also need an annotation for exclusive borrowing and another for stored (transitive) borrows. I prefer to keep these annotations concise by employing symbols:
Note: Rust used symbols for different kinds of references and borrows, and they have changed to words like 'Ref' etc. because it is more readable. |
I argued above we do not need Afaics, Rust is a blunder thus far, so I am not quite sure if we can cite them as a desirable leader on any design decisions. Remember I had complained to them about the need for the noisy Edit: also that last link exemplifies some of the complexity that is involved with having these concepts of references types, value types, and boxed types. I’d rather have the compiler/language deal with those things implicitly. For example, the compiler knows when it needs to use a boxed type because more than one type can be stored there. Unboxed slots in data structures are an optimization. Rust required that complexity because it is tracking borrows in a total order, which IMO is an egregious design blunder. Ah I see a mention that
|
I have been arguing that it is precisely because type systems do not differentiate between values and containers, that language semantics get very opaque and complex, for example C/C++ have l-values and r-values, and rust does not know whether to copy or reference variables in a closure. JavaScript does it differently and treats numbers as values and everything else as references. This causes problems because 'enums' should be values but are treated like objects. If you want a mutable number in JavaScript you end up using a singleton object like {n:3} or an array like [3]. So even in JavaScript there is some kind of notion of a "reference". I think whatever we do, it needs to cleanly distinguish between l-values (containers) and r-values (values). It probably also needs to distinguish between boxed and unboxed things. By distinguishing between objects and references you gain control over ownership, and so you can keep track of whether the caller or callee is responsible for deallocating the memory. As you say, this is probably not necessary with GC, but then I think that a hybrid GC would give the best performance, where you deallocate when things leave scope like C/C++, and only garbage collect those objects that escape their scope. I think Rust's lifetime annotations are taking things a bit far, but I find the RAII style leads to clean code in C++. There is a clear advantage to having destructors over finalisers. In any case, it is probably safe to infer things at the value level, as long as all these different kinds of things (that have different semantics) are clearly distinguished at the type level. |
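A minimal Scala sketch of the distinction being argued for here (the name `Mut` is illustrative, not an agreed design): a value is immutable, while a container is a location whose contents can be replaced:

```scala
// Illustrative only: a value versus a container (location) holding a value.
final class Mut[A](private var v: A) {
  def get: A = v               // read the value currently in the slot
  def set(a: A): Unit = v = a  // replace the slot's contents; the value itself never changes
}

object ValueVsContainer {
  def main(args: Array[String]): Unit = {
    val n = 3                // a value: immutable, no useful identity
    val cell = new Mut(3)    // a container: has identity and its contents can change
    cell.set(cell.get + 1)   // like reaching for { n: 3 } or [3] in JavaScript
    println((n, cell.get))   // prints (3,4)
  }
}
```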
A prior quote might be apropos regarding “complexity budget”. I’m attempting to reread most of the discussion on these Issues threads so I can make my final decisions and get on with creating a language or not. P.S. note I have completed 16 weeks of my 24 week TB medication. |
@keean wrote:
I repeat: for my needs, which are to not be mucking around in low-level details in a high-level language, I think JavaScript got this right in that certain primitive types (e.g.
Ditto the default behavior I wrote above is what I think I probably want.
What is an enum in JavaScript? Afaik, JavaScript has no enum datatype.
What problem do you see with this?
The compiler knows which types are (tagged) sum types and thus must be boxed. Why declare it?
Does that mean you are agreeing with me?
We are sometimes (some aspects) designing different languages. Maximum performance is not the highest priority goal of a high-level language. Performance can be tuned with a profiler (i.e. typically less than 20% of the code needs tuning and perhaps only 5% needs extreme low-level tuning) and can drop down to low-level languages for maximum performance via FFI if necessary. Marrying low-level details with a high-level language creates design priorities contention. I do not think such a perfect marriage exists. Programmers want a lower complexity budget and only a smaller portion of the code needs that high complexity focus. Python, JavaScript, and Java comprise the most popular programming language set on earth. C/C++ are still as popular as they are because sometimes we must have low-level control and because those other three (well at least Java and especially JavaScript) screwed up the integer types, which is one of things I want to rectify. Jeff Walker wrote:
Also I agree with Jeff Walker’s analysis of the fundamental reason TypeScript can not radically paradigm-shift to improve upon on the JavaScript minefield (although I disagree with his opinion that adding static typing is regressive):
You are correct to imply that we must track borrows for safe use of RAII, because if a reference to the local block instance has been stored somewhere, RAII can enable use after destruction. Agreed it is a loss of brevity, convenience, and safety that GC languages such as JavaScript and Java don’t usually support destructors at block-level (or function-level) scopes for instances local to that block. But by minimally tracking borrowing as a compile-time (not runtime!) reference counting scheme as I have proposed, we could support implicit deterministic destructors for block-level instances (and even bypass the GC for these and use compile-time implicit allocation and deallocation, i.e. no runtime reference counting overhead). Good idea! That would be a significant feature advantage over other GC languages! |
@shelby3 wrote:
Add Edit: may need to add an annotation for unboxed data structures, or may prefer wrapping with |
Personally I don't like the hieroglyphics, it certainly makes it hard for occasional users to understand (like all the symbols in PL1, APL or perl). Most people can transfer knowledge about things like references from one language to another if the notation is readable. |
We probably disagree. I do not like verbose text (compounded on top of already verbose textual types, we need some contrast). That is one reason I do not like Ceylon. I like symbols for very commonly used syntax, e.g. the I do not see any other popular languages with read-only, write-only, read-write, and exclusive borrows annotations, so there is nothing to transfer from. Even Ceylon (and others) adopted Also these annotations on type annotations are mostly to be ignored by the eye, because they are a (type attribute) detail and normally the programmer wants to first focus on the named type annotation, which varies between type annotations. Since these little (type attribute) details will be repeating (within a small bounded set of choices) on every type annotation that otherwise varies (within an unbounded set of choices), they are repetitive almost like noise that should be minimized in terms of visual conspicuity so it does not drown out the more prominent relevant information of the type annotation, e.g. |
There's a bit of a straw man there, as the type doesn't make sense (you wouldn't have a read-only exclusive borrow, as read-only is a reference, a borrow indicates transfer of ownership, and anything that is referenced must be stored). The closest type to this would be something like: Mut[Maybe[TYPE]] or type MyType[T] = Mut[Maybe[T]]. The point is the types are expressing the semantic invariants. However I see no reason not to allow user-defined type operators, so '?' can be an alias for Maybe etc. So we should allow symbols in the type definitions, and it can be left up to the programmer whether to use them or not. |
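For concreteness, a hedged Scala rendering of the types sketched above (Scala’s Option stands in for Maybe, and the symbolic alias is only to show a user-defined type operator, not a committed syntax):

```scala
// Same illustrative single-slot container as earlier in the thread.
final class Mut[A](private var v: A) { def get: A = v; def set(a: A): Unit = v = a }

object TypeOperatorAliases {
  // "Mut[Maybe[T]]" spelled with Scala's Option standing in for Maybe:
  type MyType[T] = Mut[Option[T]]
  val slot: MyType[Int] = new Mut(Option.empty[Int])

  // A user-defined symbolic type operator, left to programmer taste:
  type |[A, B] = Either[A, B]
  val intOrText: Int | String = Left(1)
}
```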
@keean wrote:
Incorrect in my intended context. You’re presuming Rust’s borrowing/ownership model. Please read again my proposal and understand how it differs from Rust.
Disagree. Remember you had also mentioned in the past that for readability consistency, one of the basic tenets of good programming language design is not to unnecessarily have more than one way to write the same thing. |
This: ⊖Atype |
Lol, the use of the kenkoy I am thinking there are no unadulterated (i.e. native, low-level manipulable) null references in the language I think I probably want:
As you presumably know, in JavaScript a “null reference” is distinguished as the value I suppose Excluding memory allocation which is inapplicable in GC, afaics the only utility of null pointers is to: a) differentiate the state of unset ( Since we probably need #b even for types that are not nullable, then the double question mark is an incongruent symbol choice. I am contemplating how we might type the bit flag hacked unboxed nullables? And generally for any data type? It consumes some of the complexity budget though. |
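One way the “bit flag hacked unboxed nullable” might be typed, sketched in Scala with a reserved sentinel bit pattern; the sentinel choice and the names are assumptions for illustration only:

```scala
// Illustrative sketch: an "unboxed nullable" Int that avoids boxing Option[Int]
// by sacrificing one bit pattern as the "unset" sentinel (an assumption, not a
// committed design). A value class keeps it allocation-free on the JVM.
final class NullableInt(val bits: Int) extends AnyVal {
  def isSet: Boolean = bits != NullableInt.Unset
  def getOrElse(default: Int): Int = if (isSet) bits else default
}

object NullableInt {
  final val Unset: Int = Int.MinValue            // the reserved "null" representation
  def empty: NullableInt = new NullableInt(Unset)
  def apply(x: Int): NullableInt = { require(x != Unset); new NullableInt(x) }
}
```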
Don't forget with GC you often need Weak and Strong references, where a strong reference prevents the referenced object from being garbage collected, whereas with a weak reference the referenced object can be garbage collected, so you must check if it is still there when dereferencing. You can also have levels of weakness determining the memory pressure required to evict the object, to reflect circumstances like "I want to cache this in memory for speed, providing we have spare memory left, but evict as soon as we are running low on memory" vs "I want to hold this in memory for as long as possible, and only evict if we have no other free memory left". |
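For readers unfamiliar with the distinction, this is how those levels of weakness look on the JVM (Scala calling java.lang.ref); it only illustrates @keean’s point and is not a proposed design:

```scala
import java.lang.ref.{WeakReference, SoftReference}

// Strong reference: keeps the object alive as long as the reference exists.
// Weak reference: collectable as soon as no strong references remain.
// Soft reference: collectable only under memory pressure (the "cache" use case).
object ReferenceStrengths {
  def main(args: Array[String]): Unit = {
    val strong = new Array[Byte](1024)
    val weak = new WeakReference(new Array[Byte](1024))
    val soft = new SoftReference(new Array[Byte](1024))

    // Dereferencing a weak/soft reference may return null, so it must be checked.
    println(Option(weak.get).isDefined)
    println(Option(soft.get).isDefined)
    println(strong.length)
  }
}
```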
The use of weak references for caches does not work well. Weak references are not needed for breaking cycles if GC is present. The only potentially (but dubiously) legitimate use for weak references is avoiding having to manually undo “put” operations, e.g. removing objects added to a map, list, event/listener list, etc. I am leaning towards agreeing with David Bruant’s stance that my aforementioned last use case would be encouraging bugs. Also JavaScript has no weak references, so we could not support them in a language that compiles to JavaScript unless we implemented our own GC and heap employing |
Except for the exclusivity and stored types, the others mentioned must also be allowed on the type parameters of a type. I realized that when contemplating a Afaics, we never place these annotations on type definitions, and only on the types of instances (including function arguments and result types). |
@keean wrote:
With GC and no stack allocation ever, then closures always reference the objects of the closure (because no need to copy them from stack before the activation record is destroyed when the function returns). The mutability is then an orthogonal concern at the typing level, i.e. with GC there are no r-values. R-values are a compiler optimization and implementation concern and I don’t see why they should be conflated with mutability nor the type system. I do agree that we need to distinguish between modifying the container and the reference to the container, and each should have a separate mutability attribute. Languages which use stack allocation typically prevent modification of the reference to the container, because this could cause memory leaks, which is why the r-value and l-value concept is introduced. My idea for an efficient cordoned nursery can hopefully eliminate the need for that low-level implementation complication leaking into the PL. However, it’s more efficient to store the containers which have immutable references (i.e. only the container may be mutable) directly in the activation record for the function than to store a reference to the container in the activation record. So the separate mutability attribute is important for optimization. Note closures over non-contiguous heap allocated cactus stacks would have a separate reference to each activation record that contains an object that is also in the closure. So these multiple references (and the cost to access objects via multiple activation record pointers) are an cost that is paid for cactus stacks. |
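Scala, being GC’d, behaves the way this paragraph describes: a closure references the captured container, and a copy happens only if the programmer makes one before forming the closure. A small illustration (not a proposed Zer0 semantics):

```scala
object ClosureCapture {
  def main(args: Array[String]): Unit = {
    var counter = 0
    val byReference = () => counter   // captures the variable (its implicit container)
    val snapshot = counter            // manual copy taken before forming the closure
    val byCopy = () => snapshot

    counter = 42
    println(byReference())  // 42: the closure sees the later mutation
    println(byCopy())       // 0: it closed over the copy
  }
}
```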
You also need to distinguish variable binding from mutation. A value like '3' is immutable, but we can rebind variables like: x = 3; x = x + 1. The value bound to the variable is immutable, but we can rebind. This explains why changing a variable inside a procedure does not change the argument passed. |
@keean rebinding is just modifying the reference to the container:
In a language which uses
Instead of the above, JavaScript makes references immutable for primitive objects so they always modify the container, but since there’s no way to access the reference, rebinding is semantically the same as modifying the container (because the reference to
|
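A small Scala illustration of the distinction being drawn here between rebinding an identifier and mutating a container (an array slot stands in for the implicit container):

```scala
object RebindVsMutate {
  def main(args: Array[String]): Unit = {
    var x = 3                 // rebindable identifier
    x = x + 1                 // rebinding: x now names the value 4

    val y = 3                 // immutable binding: `y = 4` would not compile

    val cell = Array(3)       // a container (one slot); the binding `cell` never changes
    cell(0) = cell(0) + 1     // mutation of the container's contents

    println((x, y, cell(0)))  // (4,3,4)
  }
}
```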
I agree that mutability should be on the type, and not, as apparently in Rust, some orthogonal concept related to borrowing lifetimes that only annotates the identifier:
Could you please possibly elaborate on how you think Rust messed up mutability so I may know if I’m missing the pertinent details of your point?
I’m instead proposing for Zer0 we retain the concept of l-values and r-values and thus only l-values implicitly have a container. With the hindsight of my post herein, do you (and if so then why do you) think conversion of r-values to l-values need to be explicit as you showed with I wrote:
JavaScript, Java, and Python employ call-by-sharing which is distinguished from call-by-reference because only certain objects are passed-by-reference. Since the grammar I am currently proposing for Zer0 will have explicit pointer types and dereferencing ( I’ve proposed a near zero-cost memory safety abstraction which I developed from @keean’s suggestion of employing Actors (in a zero-memory-resource-cost manner) for parallelism. So only objects which escape the stack frame lifetime (via compiler analysis which doesn’t require any lifetime annotations nor any of Rust’s aliasing error and tsuris) will be allocated on the non-shared (i.e. thread local) bump pointer heap and rest on the stack (each bump pointer heap is deallocated with one instruction when the Actor returns, so it’s to be very efficient on par with Rust’s performance and 100% memory safety). So the compiler will decide whether to allocate containers on the stack or the bump pointer heap. Only specially annotated reference counted pointers get allocated on the traditional shared heap. (I explained all of this in the above linked post, including how the Actor model will eliminate L3 and L4 cache and do software driven cache coherency and cache-to-cache transfers.) So therefore the compiler will decide implicitly where thread-local containers are allocated (i.e. stack or non-share bump pointer heap). So containers are not to be an explicit concept. And it will be possible to take the address of (
@keean wrote:
EDIT: Zer0 won’t need rebinding if it’s using call-by-value and not call-by-sharing. JavaScript needs to distinguish between preventing rebinding with
I found the post you made about that on the Rust forum. The issue being explained there is that by default in Rust, closures refer to the implicit containers for all the items in the environment of the closure. The containers are always implicit, thus it’s impossible in Rust (and C, C++ etc) to have an identifier represent a r-value. But you’re proposal isn’t necessary because by definition a r-value is not mutable so immutability of containers would accomplish the same effect, which I already have designed into Zer0. But we also need some way to tell closures to copy from a mutable container instead of referencing it. Rust has some convoluted way of The default really should be to reference the stack frame as that is the most efficient (only requires one pointer). Copying is less efficient, so I think it should be done manually. The programmer should make copies of the containers he wants before forming the closure. Rust’s closures are explicit, which IMO defeats the elegance of their primary local use case. I want only implicit local closures. We already agreed that closures at-a-distance (i.e. across modules) is an anti-pattern. 1 Pass-by-reference is what call-by-reference does for every argument of the function or procedure call. So we use the term pass-by-* when referring to assignment in general or only some of the arguments of a function or procedure call. 2 It’s more efficient to pass the reference than to copy the large object; and because having copies of the object in more than one memory location can cause cache spill. OTOH if there will be many accesses to the object, then it may be more efficient to copy so it can be accessed directly indexed off the stack pointer (SP) which may be more efficient than the double indirection of accessing the pointer on the stack and then the object referenced by the pointer. However if we can keep the pointer in a register, then copying to the stack may provide no speed advantage on accesses. Also (in the context of the near zero-cost resource safety model I proposed because of the Actor innovation) if the object escapes escape analysis and Zer0 must put it on the bump pointer heap anyway, then it can’t be copied to the stack. I wrote:
Note if we adopt my proposal to forsake open existential quantification, then all dynamic polymorphism will be limited to static union bounds, so it will always be possible to unbox (although maybe wasteful if some of the types in union require much more space than the others). Readers should note this is orthogonal to the issue of needing pointers to avoid recursive types that would otherwise require unbounded space (although Rust seems to conflate these two concepts). The Zer0 programmer will be able to manually force boxing by employing a pointer. Otherwise I think we should make it unspecified as whether the compiler is employing boxing or unboxing. Ditto (as Go already does) unspecified for order and alignment of record fields (c.f. also and also), we want to leave the flexibility for the compiler to do whatever optimizations it wants:
|
I don't think so, the object literal implicitly creates a new object like in JavaScript, notice: let x = {v: 1}
let y = {v: 1}
x === y // false, x and y are different objects. |
@keean wrote:
You overlooked this point I made about conflating mutability:
|
I propose the hypothesis that something must have an address to be mutable. |
@keean please delete your prior post and put it in the Typeclass objects thread where that discussion is occurring. My post which you quoted is in the Typeclass objects thread. |
@keean actually I see what you did is your quoted me from Typeclass objects thread, but you’re actually intending to reply to my prior post in this thread. You may want to edit your post to correct the quote. |
@keean wrote:
Your proposal is that only mutable things have an address so you can distinguish between objects and values. My proposal is every l-value has an address (of an implicit container) and r-values don’t have addresses. In my proposal, l-values and r-values are not types and instead are contextual. Therefore in your proposal you want to use In any case, my proposal doesn’t need a literal way to indicate an object, because l-values are always implicit containers. Thus the use of literal Whereas, you’re advocating a generic object literal, but that implies that the generic object literal will have a generic default type such as |
No, it is not. Do not restate the hypothesis and then attack your incorrect restatement. I repeat:
Then:
I think this is a mistake; constructing objects by hand is painful boilerplate in C/C++.
Not at all, there are solutions to typed object literals:
Using Kotlin-like constructors, or Go-like struct literals. Vs
|
@keean your proposals have changed so many times I can’t keep track of what you are actually proposing at any given whim that you change your mind. Originally you said that immutables should never be pointed to. Please make some coherent posts and stop just blasting noise.
What is the problem with constructing an object by calling the constructor?
Which can be syntax sugared as:
So you prefer what? { type : Html, children : { type : Body, children : { type : Div } } }
There’s no The example you wrote above doesn’t factor in that the constructor can attach its children without needing to create the temporary identifier names. |
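To make the contrast concrete, a hedged Scala rendering of the two styles (the Html/Body/Div types are hypothetical stand-ins): typed constructors attach children without temporary identifiers, versus an untyped literal keyed by strings:

```scala
// Hypothetical Html/Body/Div types, only to contrast construction styles.
sealed trait Node
final case class Div(children: List[Node] = Nil) extends Node
final case class Body(children: List[Node] = Nil) extends Node
final case class Html(children: List[Node] = Nil) extends Node

object ConstructionStyles {
  // Typed constructors: children attached directly, checked statically.
  val typed: Html = Html(List(Body(List(Div()))))

  // Versus an untyped "object literal" of string keys (no static checking).
  val untyped: Map[String, Any] =
    Map("type" -> "Html",
        "children" -> Map("type" -> "Body", "children" -> Map("type" -> "Div")))
}
```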
What were you proposing 'new' for then?
Indeed. Let's keep things simple, forget what has gone before. Hypothesis: All mutable objects must have an address. Do you agree or disagree? |
I wrote up-thread:
P.S. I am going to sleep. |
Pony’s Design Compared

I’m trying to wrap my mind around what @keean and I have discussed in this thread. I proposed to be explicit about pointers versus values in terms of what is copied on assignment or passing as arguments, so we don’t have the copy-by-sharing situation of JavaScript and Java where some primitive types copy the value and other types copy the reference (i.e. the pointer). This also requires being explicit about taking the address of ( As far as I can surmise, @keean ostensibly proposed to be explicit about the distinction between l-value and r-value; wherein, all r-values are immutable and can’t be addressable. Additionally it seems that @keean doesn’t want to allow immutable l-values? Or he doesn’t want to allow pointers to immutable l-values (but that would be a bifurcation, because by definition l-values are addressable)? Apparently @keean seems to think we will gain something from such a design, such as some ability to do compiler optimizations that can’t be done with my proposal, but frankly I can’t see clearly what the advantage of his proposal is. I ask @keean to please clarify or to admit that his proposal is half-baked and not yet fully coherent. @keean suggested to me in private messaging that to facilitate compiling Zer0 to Scala or TypeScript, I could adopt their copy-by-sharing design instead. But that doesn’t allow us to control when we want to store a reference or a value. Pony makes everything a reference and references have types. Apparently a primitive type such as the literal
Note Pony averts the complexity of Rust’s lifetimes (which also provides data race safety) by only allowing to recover exclusive writability from within Apparently Pony is @keean’s preferred design once it is fully baked. There’s no concept of l-values and r-values. The compiler figures everything out based on the capabilities of the reference type. Pony thus needs a separate concept of capability of rebinding of identifiers (whereas in my proposal this is accomplished with the mutability of the pointer type for an identifier). So Pony removes being explicit about when we’re using pointers or values, which certainly seems less noisy and less details for the programmer to juggle. I’m contemplating whether this removes any important low-level coding that the programmer could express? The main issue seems to be the control over time versus space. In Pony, I can’t tell it that I want exclusive write-only (or immutability), but I want or don’t want it to copy around all the values. For example, I might prefer to reduce space by having an array of pointers to immutable values (wherein I have many references to the elements of the array), even though that would be probably slower when doing operations over most or all of the elements of the array due to the extra indirection and loss of cache locality (elements of the array not being sequential in a cache line or memory page). So maybe the design we want is Pony’s but with optional extra annotation for reference types to signal to the compiler a preferred (but not guaranteed) choice between time versus space? What do you think @keean? I think I prefer the Pony design (with added optional preference annotations) because I think we can see from that article about the problems with C for parallelism that we should let the compiler have more leeway (and because we probably can’t predict exactly the future optimal hardware for parallelism):
But the tradeoff is that Pony can’t always maximally express exclusivity of writability? Will there be algorithms that we can’t express optimally in Pony, i.e. where we want to force storing a value but we can’t express exclusive writability of that value in Pony (although we can convince our human mind that the code is safe to make that assumption)? These (rare cases?) are where we really need that low-level capability of C. Perhaps we can in some cases recover that capability in Pony simply by manually copying when we want to force the capability to store a value but can’t express the safety of the exclusive writability in Pony. Yet there will be other cases where we want all these writable references to a shared value which we want stored sequentially in memory for the elements of an array, yet we can’t express in Pony that we don’t ever write to those references in a way that is not data race safe. But that also requires the semantic that we always assign to the array element first after creating an instance for the element before sharing the reference to that instance (realize this will also be single-threaded, not shared between Actors). So it seems what we really need is an annotation that says a reference is only allowed to receive assignments from newly created instances that have not yet been otherwise shared. So I do believe we can adapt (with some additional annotations) the Pony model to maximally performant low-level code! Wow! Impressive! I think we have a C-killer here. |
Optimization of Result Value Temporaries

We’re referring to for example optimization of the result values of the
Goals for result values temporaries:
The issue addressed by goal #4 is probably not that significant. The issues addressed by goals #2, are more pronounced for example in C++ because it has destructors, and for #3 because perhaps (without escape analysis) there’s no highly efficient young generation objects bump-pointer, compacting heap (C++ doesn’t have tracing style GC nor my ALP concept). All programming languages need to deal with the issue addressed by goal #1, because unnecessary copying of the temporary result values significantly reduces performance. Programming languages that employ pass-by-reference (aka call-by-reference or return-by-reference) as the default, eliminate the unnecessary copying (return-by-copy) to the caller of the result value when the value is stored on the heap. (But that’s not the only potential copying involved that a move constructor might ameliorate for complex data type such as strings per example below…) Such languages also typically have an efficient young generation objects bump-pointer heap with a tracing GC. Whereas, programming languages such as C++ or Rust which attempt to keep result values on the stack when possible for even greater efficiency (aka “zero cost resource allocation abstraction”) have the caller provide the stack space to the called function for storing the result value because the stack space of the called function is deallocated when it returns. I cited this “copy elision” (aka “return value optimization”) up-thread. Yet as we see for the example But as the C++ optimization blog post I had cited up-thread explains, return-by-reference doesn’t eliminate all of the copying in the case of for example strings, because without move semantics then all of these temporary result values have to be copied to the next result value in the chain of the expression (e.g. of Thus we have shown that simplistic return-by-reference doesn’t eliminate all copying. Move semantics enables the elimination of more instances of copying. Additionally move semantics addresses the other 3 goals. With move semantics (and some cases requiring exclusive writable aliasing provided by Pony’s reference types or alternatively unboxed values), even the So even though with my posited ALP “actor epiphany” near zero-cost resource allocation bump-pointer heap (i.e. no tracing, just reset the bump pointer to origin on completion of the Actor function) of non-ARC references1 ameliorates #3 and #4, the performance could still benefit from move semantics to avoid needless calling destructors (probably a rare case of performance problem though) and combined with Pony’s reference types for knowledge of aliasing invariants to optimize memory consumption although in some cases that memory consumption can be optimized in another way. Note c.f. also Reentrant Actor-like Partitions. Pony doesn’t have destructors and it has (efficient?) GC heap collection, so presumably they ostensibly didn’t consider the necessity of move semantics for increasing performance. @keean and I pointed out other fundamental design flaws in Pony in the WD-40 issues thread #35 (which @keean accidentally deleted so now is #50). Therefore I propose that a PL should offer the option for the programmer to write move semantics variants of all functions including constructors. 1 Note I have previously doubted whether we would want to allow use of destructors with non-ARC references in the proposed ALP design, not just because the Actor function could be stalled for considerable and non-deterministic time intervals due to concurrency. 
This would require recording each destructor for every non-ARC allocation. ARC references also can have non-deterministic lifetimes due to the fact that non-determinism is present in the program in many facets. Seems to me that use of destructors for freeing things such as file handles has to be used with caution by the programmer who is paying attention to for example the blocking functions (i.e. opportunities for non-deterministic preemption) called in his code. |
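Scala (the posited compile target) has no move semantics, but the copying cost of chained temporaries that the cited C++ post describes has a familiar analog there: each `+` on immutable strings copies into a fresh temporary, which a mutable builder avoids. The sketch below is only that analogy, to motivate the goal; it is not a substitute for move constructors:

```scala
object AvoidingTemporaryCopies {
  // Chained concatenation copies each intermediate temporary string ...
  def greetingCopying(names: Seq[String]): String =
    names.foldLeft("")((acc, n) => acc + n + ", ")

  // ... whereas a single mutable buffer is written in place: the effect that
  // move semantics or return-by-reference aim for in the languages above.
  def greetingBuffered(names: Seq[String]): String = {
    val sb = new StringBuilder
    names.foreach(n => sb.append(n).append(", "))
    sb.toString
  }
}
```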
Note the post I made in the Parallelism issues thread #41:
|
Pony’s Monolithic Lifetime Borrowing Capabilities

Rust has complex (overlapping) lifetime tracking of exclusive writability borrowing. Whereas, Pony tracks more variants of aliasing capability (aka reference capabilities/types). Pony can entirely side-step the tsuris of Rust’s data race invariants, by employing Pony’s borrowing1 concept (i.e. when not permanently losing the sendable property with the

Sendable capabilities

The

Function as a recover block

Pony’s Automatic receiver recovery is really just treating a function as a recover block wherein all the input (including external references in any automatic closures) arguments (except one?, e.g. the “receiver”3) are sendable and the result is also sendable (or discarded). That enables data race safety when the one chosen argument is treated as the recovered reference so that it can be passed in instead of aliased. 1 Moves with 2 C.f. chart in prior post but excluding the exclusive readability types added for Zer0, which must be recovered as exclusive readable. 3 The ( |
Modeling Pony’s Reference Capabilities Type System in Scala

Check this out! I think we can complete a PoC for Zer0 without needing to implement a type checking engine, by transforming the Pony model into Scala’s type system!! With typeclasses also!! I’m contemplating that it may be possible (at least initially for a rough PoC) to model Pony’s reference capabilities type system in Scala without needing to do any additional type checking in the Zer0 compiler (for a Scala output target). IOW, I posit that all the type checking can be done by the Scala compiler with the following model. The model is complicated by Pony’s requirements:
1 Pony, being that it has methods instead of free functions for the OOP class inheritance anti-pattern (#43) (c.f. also, also, also “Edit#2”), seems to only support the viewpoint adaption for the “receiver” (i.e.

Wrapped data

The first trick is to wrap every data type in a zero-overhead value class wrapper type which implements the following interface, where

trait Capability[DATA, +TYPE] {
  val data : DATA
}

Note the Wrapping the data this way will presumably not add much performance overhead (in most cases) because we can employ That also enables Generics and Reference Capabilities with only a syntactical transformation of Zer0 code into Scala code.

Implicit resolution of every typeclass function use-site

To model Viewpoint adaption and Arrow Types aka Viewpoints in Scala’s type system, employ type-level programming with (Scala’s unique?) The That satisfies the viewpoint adaption for data types that select the typeclass instance. Any Arrow Types aka Viewpoints adaption are encoded with dependent parametrisation which is also applicable to non-typeclass functions.

Always alias expressions except ephemeralize result/return

If we always (implicitly 2 There’s alternative patterns as well. |
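To make the “Wrapped data” trick concrete, here is a minimal, hedged Scala sketch; the capability names, the simplified subtyping lattice, and the `Cap` wrapper are assumptions for illustration, and the real model would also need the viewpoint-adaption machinery described above:

```scala
// A simplified subset of Pony's capability lattice as phantom types; Scala
// subtyping stands in for "may be used where the weaker capability is expected".
sealed trait Tag
sealed trait Box extends Tag   // readable
sealed trait Ref extends Box   // readable + writable, not sendable
sealed trait Val extends Box   // deeply immutable, hence sendable

// Zero-overhead wrapper tagging data with a capability (a value class, so it
// usually incurs no extra allocation).
final class Cap[DATA, +CAPABILITY](val data: DATA) extends AnyVal

object CapabilityDemo {
  def read[D](c: Cap[D, Box]): D = c.data                   // any readable capability will do
  def send[D](msg: Cap[D, Val]): Unit = println(msg.data)   // only sendable data may cross an ALP/actor boundary

  def main(args: Array[String]): Unit = {
    val immutable: Cap[String, Val] = new Cap("hello")
    val mutable: Cap[StringBuilder, Ref] = new Cap(new StringBuilder("hi"))
    println(read(immutable) + read(mutable))
    send(immutable)
    // send(mutable)   // would not compile: Ref is not a subtype of Val
  }
}
```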
You may be interested to know that Pony is moving toward a slightly more complex, but more permissive model for viewpoint adaptation that treats "extractive" viewpoint adaptation separately: https://github.com/ponylang/rfcs/blob/feature/formal-viewpoint-adaptation/text/0000-formal-viewpoint-adaptation.md It's based on a paper by George Steed |
@jemc thank you very much for the heads up. EDIT: I might have found an error or just a typo in George Steed’s paper. |
To Share With Lifetimes, or Not Share At All?

An idea occurred to me while re-reading the Sharing doesn’t scale section of the Message-based concurrency (and parallelism) document I wrote on November 28, 2018. I’ll quote the salient bit:
I’ll quote the relevant bit from the “is inefficient” link cited above:
Pony is inefficient and can’t scale with shared data to massively concurrent Actor-like Partitions (ALPs) for the generative essence reason quoted above — the synchronization overhead between the distinct GCs for each ALP.1 Their advice (c.f. also) for performance enhancement distills to don’t share, lol. 😖 1 I’m coining the term Actor-like Partition (ALP) to describe the communicating sequential processes of Hoare’s Communicating Sequential Processes but without Hewitt’s Actor model’s complete lack of causal ordering which Pony misnomers “actors”, c.f. also. Likewise, languages with a single GC for all threads such as Java or Go are inefficient even if there’s no sharing and can’t scale to massively concurrent ALPs because a single GC for all ALPs either has global stop-the-world (of threads) contention or incurs synchronization overhead for contention mitigation with a concurrent and/or incremental algorithm, c.f. also, also, also, also, also and also. The essence of the scaling problem is that as the number processors increases the exponentially exploding (opportunities for) contention and/or synchronization overhead due to the 2 Odlyzko-Tilly’s Imagine for the above image
However, there are some compelling reasons (also including the cache coherency problem) why sharing won’t scale even when eliminating GC synchronization with the lifetime borrowing ideal herein (although sharing within a local cache coherent group of processors is performant, yet that’s not scaling):
By employing distinct GCs for each ALP, Pony reduces the contention of GC compared to a single GC for all ALPs a la Go or Java, but only when those distinct GCs are independent from each other due to no or insignificant sharing. But then why have the reference capabilities model if sharing is to be discouraged? Well there’s the benefit that (even a writable) can be sent to another ALP without sharing if consumed instead. But even sending objects (instead of primitives) incurs some inter-process synchronization overhead either for cache coherence or inter-process bus such as AMD’s Infinity Fabric. Pony’s reference capabilities model creates many more cases of non-composability between capabilities and is much more complex than not having it, although AFAICS Pony’s capabilities model is orthogonal to and could coexist with my idea herein. Essentially Pony’s capabilities enabling proving whether a reference is sendable avoiding the necessity of creating a copy. But reference capabilities do not track lifetimes to enable super efficient stack-frame allocation and stack-based compile-time memory management a la Rust — when you can keep Rust’s tsuris cat stuffed in the don’t-punt-to ARC-bag. Pony’s capabilities don’t create Rust’s reentrancy-safe code nor prevent single-threaded data races via borrowing as Rust can. Thus Pony’s capabilities seem to add a lot of complexity for a very small gain if the performance advice is don’t share and copy instead. 😖 The creators of Pony are experimenting with a new concept Verona, c.f. my Verona synopsis and also. Compile-time borrowing for lifetimes is orthogonal to Pony’s capabilities, so my idea is to add compile-time borrowing for sharing objects to eliminate the synchronization overhead between the distinct GCs for each ALP. Conceptually this reduction in contention would even apply (to the mutex contention for altering the refcounts) for ARC with tracing for cycles if employing ARC for each CSP’s GC instead of tracing GC. Additionally I incorporate my long-standing idea to employ — instead of Pony’s mark-and-don’t-sweep GC — a super efficient, trace-free, bumper pointer memory management scheme for non-stack-frame allocated objects which don’t persist beyond the lifetime of processing of a single message from the ALP’s message queue. AFAICS, this can be accomplished at compile-time:
3 Tag means analogous to a phantom type or Pony’s capabilities, i.e. its orthogonal to other attributes of the type. 4 Note the underlying implementation could optionally automatically do the sharing as a consumed copy if that’s more efficient (c.f. also) such as for crossing the local cache coherent group barrier in a massively multicore architecture with intra-group but not inter-group cache coherence. Thus to the extent there’s only A key point is that a multitude of ALPs should be lightweight and employ green threads. The blocked ALPs incur no significant overhead in for example Go. AFAICS, my idea herein would be compatible with Go, although the posited performance increases wouldn’t be achieved unless Go’s GC was modified to use my idea. Note this idea herein wouldn’t break composability between non-ephemeral and @keean are you aware of any prior art for my idea? |
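To make the proposed “reset the bump pointer to origin on completion” memory management concrete, here is a deliberately naive Scala sketch of a per-message arena. It is purely illustrative; a real implementation would live in the runtime and be driven by the compile-time escape analysis described above:

```scala
// Purely illustrative: a per-message bump-pointer arena. Allocation is an offset
// increment into a pre-sized buffer; everything allocated while handling one
// message is "freed" at once by resetting the offset when the handler completes.
final class BumpArena(capacity: Int) {
  private val buffer = new Array[Byte](capacity) // backing storage for allocations
  private var top = 0

  /** Reserve `size` bytes; returns the offset of the reserved block. */
  def alloc(size: Int): Int = {
    require(top + size <= capacity, "arena exhausted")
    val offset = top
    top += size
    offset
  }

  /** O(1) reclamation of all per-message allocations, no tracing required. */
  def resetAfterMessage(): Unit = top = 0
}
```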
Perhaps the correct answer is literally “do not share” — send messages instead. Sharing presumes cache coherence, whereas with a future of massively multi-core processors with Infinity Fabric interconnects (or some of the more esoteric massively multi-core processors, c.f. also) the data essentially has to be copied anyway to the destination message queue. I seem to remember @keean proposing this specific approach, or at least proposing referential transparency via immutability, in past discussions. Here are some of my ideas:
1 For example returning a consumed object wouldn’t persist internally to the next incoming message. 2 Because otherwise — given the programmer would not be able to correctly label those which 3 The option is either to enforce at compile-time that all inter-ALP sharing (i.e. via sent messages) contain data which is 4 Thus can process more than one message in parallel in intra-ALP sharing of its persistent data between multiple threads, as an alternative or adjunct to inter-ALP sharing of its immutable data. |
What’s the salient, coherent, succinct generative essence of this thread? There’s the example of Marilyn vos Savant stating most pertinently, “The winning odds of 1/3 on the first choice can’t go up to 1/2 just because the host opens a losing door.” Let’s contemplate a design without first-class ‘pointers’, i.e. without pointer equality comparisons nor pointer arithmetic, such as is the case for Go(lang).

Immutable

If the program compiles (i.e. to avoid the ‘recursive types’ issue mentioned below) then semantically it makes no difference whether immutable containers[1] are assigned by copying or by reference. Although the choice will affect whether assignment after initialization (aka reassignment) is by copying or rebinding by reference, this and other distinctions only apply to space and (sometimes versus) performance. Reassignability could be controlled separately from the aforementioned choice, such as with

Most glaring of the non-inferable space and performance considerations is that the compiler probably can’t decide for the programmer whether to assign the value to the field of a data record (aka structure) by reference to the value’s immutable container (for space optimization or performance) or by copying the value (for avoiding recursive types or for performance). Even though theoretically in some or perhaps most other cases the compiler may be able to automatically optimize the choice of assignment (by reference or copying) for immutable containers, there could still be some cases where performance (versus space) could be fine-tuned if a manual choice can be expressed in the code.

[1] @keean defined ‘container’ in this context to be the memory that stores a value. He adopted the mathematical definition of a ‘value’ which is always immutable, i.e. a

Read-only

Read-only access to a container which is not written during said access’ lifetime is in the same situation as aforementioned for an immutable. The challenge is statically proving the aforementioned invariant. Perhaps primitive value types can always be assigned most efficiently by copying and thus would trivially satisfy the invariant. The challenge remains for non-primitive value types. Exclusivity of access such as Rust’s exclusive write borrowing, or Pony’s

A clever insight and delicious alternative to exclusive write total orders (why did I not think of this before!) is that two identifiers that apply to containers of different types of values can’t be aliases to the same container. Ah, an advantage for typing that Clojure’s Rich Hickey didn’t enumerate. Given support for interior “pointers”, this rule has to be applied to all the referenced field types (i.e. thus typically not primitive types in call-by-sharing) within a data record (aka structure) type. Perhaps primitive value types can always be assigned most efficiently by copying and thus would be excluded from this rule (as is typical in call-by-sharing). A synchronous[2] “function” (or more precisely a procedure, because not referring to a pure function) thus satisfies the aforementioned invariant for any read-only access arguments if all its writable access input arguments are of different types than the said read-only access arguments, also excluding any mutual field types of those as aforementioned. The aforementioned invariant also satisfies the requirements of C’s

Presumably code within a function can be analyzed to detect aliasing, but in cases of ambiguity or just because said analysis is extremely complex (←c.f. §Optimizing C) the same clever insight could be applied for identifiers instanced within the body of a synchronous function. Explicit intent must be expressed in the code for read-only access arguments and any said identifiers which fail the invariant, or for asynchronous functions. [2] An asynchronous function employing call-by-reference could be subject to writes to its read-only access arguments’ referenced containers by code external to the function.

Writable

If there’s not a default choice such as JavaScript’s call-by-sharing then there needs to be some way to indicate which choice to employ for assignment, because the semantics are not equivalent between the two options.
Another clever insight is that a function is semantically agnostic as to whether its writable arguments are passed by reference or by copying. The choice will affect the optimization and compiled encoding of the function. Although the choice may affect the semantics of the function w.r.t. the caller’s results, the function may be agnostic as long as the caller expresses and receives its intended semantics. I thus posit that callers would express their intent such as by prepending call-by-value parameters (perhaps other than primitive value types) with If instead we make everything immutable, then you’d have Haskell and its drawbacks.

Why?

Paradigms that reduce the number of factors the programmer has to keep track of aid productivity and possibly even reduce What Color Is Your Function noncomposability. Additionally C Is Not a Low-level Language (←c.f. §Optimizing C, c.f. also) points out that letting the compiler decide (and providing it the information to decide about optimization) enables optimizations that the programmer did not or could not anticipate, because the ideal optimizations change as the hardware and other factors of the system change. Letting the compiler decide without the programmer’s intervention fixes or ameliorates the issue @keean raised about Rust’s closures not being able to decide automatically and the concomitant complexity that introduces. Additionally, inverting the control, thus providing more degrees of freedom, is achieved by enabling the caller of the function to decide, in the cases where the programmer must make the decision because the compiler doesn’t have enough information to do so automatically. |
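A Scala sketch of that inversion of control (the names are illustrative): the function is written once against a container, and the caller decides whether to hand it the original container or an explicit copy:

```scala
object CallerChooses {
  // The callee is agnostic: it writes through whatever container it is given.
  def increment(counter: Array[Int]): Unit = counter(0) = counter(0) + 1

  def main(args: Array[String]): Unit = {
    val original = Array(10)

    increment(original)              // "by reference": the caller sees 11
    val copy = original.clone()      // the caller opts into copy semantics explicitly
    increment(copy)                  // the original is untouched by this call

    println((original(0), copy(0)))  // (11,12)
  }
}
```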
Looking at the mess Rust has created with its handling of mutability, and also the confusion l-values and r-values cause in C/C++, I think a simpler model is needed for mutability. I also think that mutability needs to be part of the type signature, not a separate annotation.
I think the problem comes from a failure to distinguish between a value and a location in the type system. A value is something like a number 1, 2, 3, or a particular string like "Hello World". It cannot change; values are by definition immutable. References to values do not make sense, as we should simply pass the value (as it is immutable, there is no concept of identity with values: a '1' is a '1'; there is no concept of 'this one' or 'that one'). Variables stand for unknown values and are therefore also immutable, like for example in "2x = 6". Once we know 'x' has the value '3' it always stands for '3' in that scope. All values and variables are by nature r-values (and not l-values), so this simplifies things considerably. We write the type of an integer value or variable as "Int" for example.
So what about mutability? Well, values and variables are not mutable, so how do you change something? You need a container (or a location in computer terms). You can only write to something that has a location (is an l-value in C/C++ terms). A container is something like an array that has slots that contain values, and the values in the slots can be changed. As a parametric type we might write "Array[Int]". Of course an array has length, so we want a simple single-value container for the common case of something like an integer we can increment. We could write this as a type: "Mut[Int]". This represents a singleton container that can have an 'Int' value put in it. Containers are 'l-values': they have an address. The important point is that the container itself is a value that can be assigned to a variable; it is just the contents that can change.
In this way the semantics are kept clean, and we don't need concepts like l-values and r-values, nor do we have the problems Rust has with closures, where it does not know whether to reference or copy an object. So if value assignment is 'let', we can assign a value to a variable that cannot change, or another type of object like a single mutable location.
This removes the need for any kind of special annotation for mutability or references, and removes the need to understand l-values and r-values.
Finally we need references to containers so that the same mutable value can be shared with different program fragments. References can be Read Only, Write Only, or ReadWrite. We can have:
However the variables themselves are still immutable. A for loop could look like this, where 'x' would have the type "Mut[Int]":
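Since only the intended type of `x` is given, the following Scala rendering of that loop is an assumption (reusing the single-slot `Mut` container sketched earlier in the thread):

```scala
// Same illustrative single-slot container as earlier in the thread.
final class Mut[A](private var v: A) { def get: A = v; def set(a: A): Unit = v = a }

object ForLoopWithMut {
  def main(args: Array[String]): Unit = {
    val x = new Mut(0)       // x : Mut[Int]; the binding `x` is itself immutable
    for (i <- 0 until 10)
      x.set(x.get + i)       // only the container's contents change each iteration
    println(x.get)           // 45
  }
}
```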