Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redesign constructors to allow uninitialized fields #30

Closed
JeffBezanson opened this issue Jun 2, 2011 · 31 comments
Closed

redesign constructors to allow uninitialized fields #30

JeffBezanson opened this issue Jun 2, 2011 · 31 comments
Assignees

Comments

@JeffBezanson
Copy link
Sponsor Member

Currently we need to do ugly things to make circular references:

type MyType
  self::Union((),MyType)
  function MyType()
    v = new(())
    v.self = v
  end
end

That is really bad, both ugly and compiler-unfriendly. With the change, we'd have

type MyType
  self::MyType
  MyType() = (this.self = this)
end

For constructors inside the type block, this is supplied, and the type's static parameters will be visible as well. this will also be returned automatically, and using return in a constructor will be an error.
Here's what Rational would look like:

 type Rational{T<:Int} <: Real
     num::T
     den::T

     function Rational(n::Int, d::Int)
         # can use T here
         g = gcd(n,d)
         this.num = div(n,g)
         this.den = div(d,g)
         # this is returned implicitly
     end
 end

 Rational{T}(n::T, d::T) = Rational{T}(n,d)
 Rational(n::Int, d::Int) = Rational(promote(n,d)...)
 Rational(n::Int) = Rational(n,1)

Notice the constructor inside the type block is only usable given an instantiated version of the type, Rational{T}. An outside generic function Rational() is defined to do this for you. If there are no user-defined constructors, we can still provide the same default constructors we have now.

One downside to this is that we'll have to check each field access for uninitialized references, as we currently do for elements of pointer arrays.

@ghost ghost assigned JeffBezanson Jun 2, 2011
@StefanKarpinski
Copy link
Sponsor Member

Wouldn't it be possible to generate code to check for uninitialized references only after an "inner" constructor returns? Some checks could of course be eliminated since the compiler can reason that they must be set (like this.num and this.den above), but in cases where complicated logic occurs, explicit checks could be done just before the implied return of this.

Of course, the trouble with that would be when you have something like this:

type MyType
  self::MyType
  MyType() = init(this)
end

Here the init function gets a partially initialized MyType object, so its accesses would need to be checked, implying that in general, all accesses would need to be checked.

But what if we required a special type for uninitialized objects? So that init would need to be declared as init(x::Uninitialized{MyType}) instead of just init(x::MyType). Then the compiler knows when compiling the code for init that it needs to generate uninitialized field checks for the x argument, and only init needs to pay the cost, instead of every piece of code everywhere paying the cost. It also uses the type system to prevent uninitialized objects from getting passed where they don't belong. It is very rare to actually want to pass an uninitialized object to an external function for any reason. This allows it, but requires explicit support.

The uninitialized types would always be immediate shadow parents of their initialized concrete counterparts. So the declaration that ConcreteT <: AbstractT would actually imply a type hierarchy like this:

ConcreteT <: Uninitialized{ConcreteT} <: AbstractT

That's a little tricky, but does seem to solve the problem.

@JeffBezanson
Copy link
Sponsor Member Author

This is clever, but I don't think we should do it. This type would have to be hacked in as a special case in various little places in the compiler and it could get ugly. It would also introduce (1) a case where the type of a value changes, currently impossible, and (2) a concrete type that has subtypes (Uninitialized is concrete because there can be a value of that type).
It would also break checks like isa(x,ConcreteT) during init(). I'm also not convinced that you have to initialize all fields within the constructor. There might be cases where it's convenient for an object to hang around for a while with uninitialized fields.
Performance is probably not a big deal, since you're testing a word you had to load anyway. Not as bad as a type check where you need an extra load. And not as bad as arrays which are more likely to involve intense looping. We're already doing the null checks there.
Maybe we can use the SIGSEGV handler to turn null accesses into exceptions!

@StefanKarpinski
Copy link
Sponsor Member

Ok, these are all fair points. It's a cute idea though. It does feel a little too clever, admittedly — and a little too much like how Ruby handles the object specific parent class, which is confusing but clever. And it's a good point about testing a word that's already loaded not being likely to be all that expensive.

I like the idea of using a SIGSEGV handler to trap null exceptions. If all fields default to all zero bits, that would make bits types default to zero, which seems sensible, and reference fields would default to NULL, which would allow us to do the SIGSEGV trapping business.

@JeffBezanson
Copy link
Sponsor Member Author

Oh darn, I forgot: using SEGV would not give us the behavior we have now, which is to throw an error as soon as a null reference arises, not when it is used. For example, assuming o.f is uninitialized:

a = 2
a = o.f   # currently we'd give an error here
...
a.b   # with SEGV behavior we'd get an error here

In general, you'd get the error at some random point in the future. Probably not ideal. On the other hand, when accessing an uninitialized element of an unboxed array you don't get any error, so perhaps this is "consistent".

@StefanKarpinski
Copy link
Sponsor Member

If we're going to allow uninitialized references to hang out as long as one feels like letting them hang out, then this strikes me as the most consistent behavior, honestly. It kind of sucks though because then you can have uninitialized values potentially propagating all over the place from a single accidentally uninitialized field that gets used, but I feel like that's kind of the corollary of allowing them to persist beyond object construction at all.

This is one of those decisions that it's really hard to predict which choice will in retrospect have been the right one. It feels very much like we could easily go one way and later really regret it as it becomes clear that the choice was either too lax or too strict. If we go lax, we can always get stricter, but we'd end up potentially breaking people's code. If we go strict, we can always relax things later without breaking anything. But implementing strict behavior is probably harder.

@JeffBezanson
Copy link
Sponsor Member Author

It's a tough one. But the lax behavior is the one that everybody has done before, and that the designers of Algol said they regret IIRC. A good rule of thumb is to raise errors as soon as possible (i.e. closer to the root cause). I just tried a cell array access microbenchmark, and the null checks seem to cost about 2%.

@StefanKarpinski
Copy link
Sponsor Member

Honestly, 2% is not bad. Nobody really cares if something is 2% slower (except in benchmarks competitions). These days, no one cares that Hadoop is absurdly inefficient, because it can scale, which is far more important. So maybe we should go strict and fix the error that the Algol folks feel that they made.

Then the question is becomes how strict to be. Is letting null references exist but checking for them on usage and throwing an exception the right trade off? Can you come up with an example of a case where letting an object be constructed with uninitialized references is really handy?

If we allow uninitialized references to persist beyond construction time, we will need to provide intrinsics to check if a field is initialized or not, if for no other reason, than that we would need that in order to display objects with uninitialized references without barfing. If there are a lot of other situations where checks like that would have to be made, it would seem like a fairly compelling argument for not allowing uninitialized fields to escape constructors. Think about how much of a mistake allowing the null value in Java is — it causes no end of corner cases and headaches. We do not want to find ourselves in that position when null is replaced with "uninitialized".

@StefanKarpinski
Copy link
Sponsor Member

This also interacts with the immutable issue: obviously you never want to construct an immutable object with uninitialized fields, because then it will be permanently uninitialized.

One idea would be to disallow the overriding of default inner constructor for immutable types — then you can only construct immutable objects by giving all field values to the default inner constructor. On the other hand, if inner constructors were never allowed to be return objects with uninitialized fields, then you could safely allow the definition of non-default inner constructors for immutable objects. However, that implies that the this object is mutable and then becomes immutable after the inner constructor returns. That's very weird and seems to create both a mutable and immutable version of every immutable type. I guess I prefer the former approach — disallow inner constructors for immutable types.

@StefanKarpinski
Copy link
Sponsor Member

Actually, looking at the Rational example above, this is more of an issue than I originally realized since Rational is a perfect candidate for immutability. So what is the type of this in the inner constructor? If it's Rational then it should be immutable, but it's clearly mutable.

@JeffBezanson
Copy link
Sponsor Member Author

Exactly; the only way to get rid of "null check hell" is to raise an error as soon as a null reference is accessed.

Checking at the end of a constructor is neither here nor there since you can still call functions during the constructor.

I have thought about the ability to check whether a reference is uninitialized, and I have hesitated. A good example is HashTable, which uses an Array to store the keys and values. Any value can be a hash key, so unused spots must be uninitialized. I use a bitvector to keep track of which spots are used. Sure it would be more efficient to have isassigned(a, i). But then whether something is uninitialized becomes part of your program's behavior, and the next thing you want is the ability to clear a reference back to uninitialized. I guess that's kind of a hybrid approach where you still need null checks, but when you don't do null checks errors are thrown sooner.

What I'm shooting for is to have no null pointers, period. In fact we currently have this with our struct types. But this comes at a cost of too little flexibility in constructing objects. My new proposed contract with the user is that there still may not be any null pointers, but you're on your own to make that happen since the language can no longer guarantee it. From that perspective it might make sense to check at the end of a constructor, but there is no analogous thing you can do for Arrays.

I don't care what happens when printing objects. Printing something as null is kind of dishonest because you can't do anything with that knowledge; accessing the element is still an error. The contract is that an uninitialized object is not allowed to exist, whatever happens is your own fault, but we only check at well-defined points. This is like how you can have prohibited appliances in freshman dorm rooms at Harvard as long as you hide them during vacations when room inspections take place.

@JeffBezanson
Copy link
Sponsor Member Author

For immutable objects a.x = y means to make a copy of a with a different value for field x. I guess that copy would have to be a special builtin operation that allows null references. So during the Rational constructor you might actually make 3 Rational objects, but in practice this would be optimized away.

@StefanKarpinski
Copy link
Sponsor Member

My objection to not being able to print or otherwise check for uninitialized fields is that without that ability, you'll have to completely guess where you have an initialized value. If they exist and you can't check for them, then you're really just sticking your head in the sand and pretending they're not there. And you can always just trap UndefRefErrors and then do something else in the catch block. How is that different from having an isassigned predicate? It's effectively the same but less efficient. Basically, if accessing an undefined field is an error, it does affect the program's behavior, regardless.

@StefanKarpinski
Copy link
Sponsor Member

I actually really dislike having a.x = y create a new object and replace a with it. I'm not sure what the benefit of that is. It seems to me that it would be much better to just disallow changing the fields of immutable objects altogether.

@JeffBezanson
Copy link
Sponsor Member Author

It's true that you can implement isassigned by catching the exception. That's probably enough to justify the predicate, so I will add it. On another level, I feel like you ought to know when things have been initialized. For example, in matlab people use exist('a') to see if they have set one of their local variables yet, and that always seemed really stupid to me.

A construct for "changing" a field of an immutable object is pretty common; even haskell has it. I feel like people will want to do things like assign the fields of a complex number, and do a[i].im=0. Otherwise you have to write a[i] = complex(a[i].re, 0). I don't much care myself but I think users will want this. We also need it for constructors; otherwise constructors for immutable types will have to use a different mechanism.

@JeffBezanson
Copy link
Sponsor Member Author

I'm having trouble with the constructor for RopeString. It uses unbounded mutual recursion among multiple constructor definitions. As far as I can tell the only way to do that with this change is to have the inner RopeString constructor do what new did, and do the recursion among outer generic function definitions for RopeString. But then we have lost abstraction, as the constructor that accepts all fields has become publicly visible.
The problem stems from the automatic constructing and returning of this. To be able to do what RopeString does we need to put constructing back under user control. We could make new take no arguments and return an uninitialized object. ?

@JeffBezanson
Copy link
Sponsor Member Author

Another possibility: reassign this and hope the unused constructed object is optimized away:

RopeString(h::RopeString, t::RopeString) =
    depth(h.tail) + depth(t) < depth(h.head) ?
        this = RopeString(h.head, RopeString(h.tail, t)) :
        (this.head = h;
         this.tail = t;
         ...)

Or a keyword argument?

RopeString(h.head, RopeString(h.tail, t), this=this)

though that one is a little leaky.

@StefanKarpinski
Copy link
Sponsor Member

I'm suspicious of hoping that the compiler is going to eliminate the creation of temporary objects. Even if it does manage to do so, imo, it puts too much distance between what's conceptually going on and what is actually going on.

If a constructor can call another constructor to do its work, then it seems to me like these are the options:

  • don't create a this object to be implicitly passed into the constructor and provide constructors with a way of creating uninitialized objects, e.g. a new thunk
  • explicitly pass this into constructors, thereby allowing one constructor to pass the buck to another constructor by passing its this instance to it
  • automatically share this between recursive constructor calls so that the recursive call is operating on the same this as the original call

I'm not really sure which is best :-|

@JeffBezanson
Copy link
Sponsor Member Author

Here's a weird one... have this.x=y in a constructor be special syntax that first checks whether this is assigned, and if not assigns it to a new instance. So nothing is allocated until you touch this. Clever because type inference is already able to remove undefined-var checks, but not able to remove object allocations. A bit strange but I'm starting to like this idea.

@StefanKarpinski
Copy link
Sponsor Member

Can you explain this bit:

Clever because type inference is already able to remove undefined-var checks, but not able to remove object allocations.

Other than that it seems reasonable enough, but I'm a bit hesitant to add magical behaviors...

@JeffBezanson
Copy link
Sponsor Member Author

this.x=y would be converted to this?=new(); this.x=y where ?= is a mythical operator that only evaluates the expression and does the assignment if this has not been set yet. Type inference can prove when a variable has been set, so most of the ?= expressions can be removed. For example:

if foo()
    this.x = 0
end
this.y = 1
this.z = 2
this.w = 3

The first one can use this=new(), because we know this has not been set yet. Only the second one requires a check, and the rest need no checks.

So we can have recursive constructors with no redundant allocations and allow uninitialized fields with very little overhead.

As per your recent email on hash tables, ?= could even become a full-fledged language feature. We could support it for any assignable thing. People might even like it for local variables, because sometimes you have complex logic that might end up setting some variable, and then later want something like "set it to this other thing if it hasn't been set yet". I've seen this in matlab code. ?= would let you avoid an extra flag. Since we already go to the trouble of checking for undefined variables, we might as well expose this capability to the user.

@StefanKarpinski
Copy link
Sponsor Member

So this is the basic plan:

  • allow undefined values
  • allow checking for undefined values
  • allow conditional setting of undefined values via ?=
  • throw an exception immediately on any use of an undefined value

This seems like a pragmatic combination. The last point is really important because it prevents "null check hell" and prevents undefined values from silently propagating through a program, thereby also avoiding many null checks, since anything that's been used cannot possibly be null.

JeffBezanson added a commit that referenced this issue Jun 13, 2011
@JeffBezanson
Copy link
Sponsor Member Author

A couple weeks ago Stephan B. observed that you might have a constructor like:

MyType() = return this

So we actually need to convert every occurrence of this to (this?=new(...); this). Might seem extreme, but perfectly doable.

Is the design settled? It seems a bit crazy but I feel like we really want recursive constructors. It's a clever feature. In other languages you have to do things like Class::Create(...) when you need this ability.

@StefanKarpinski
Copy link
Sponsor Member

This whole thing still feels fishy to me, but I don't have anything better :-(

@StefanKarpinski
Copy link
Sponsor Member

You know, I'm thinking that having a new() function that creates an uninitialized object feels better than this. It feels really wrong the way this is special here. Providing a new() function is also is the least magical solution, which is nice — it's perfectly understandable and explainable. We could allow optional arguments to new() which could be used to assign fields if they're given. That's frequently a lot clearer and more concise.

@StefanKarpinski
Copy link
Sponsor Member

Using this approach, the default constructor would just be this:

type Foo
  bar
  baz

  Foo(bar,baz) = new(bar,baz)
end

This would be provided by default, of course, so the above definition would be unnecessary. Here are various examples to try this on and see how it feels...

type MyType
  self::MyType
  MyType() = (this = new(); this.self = this)
end

A little verbose but pretty clear.

type Rational{T<:Int} <: Real
     num::T
     den::T

     Rational(n::Int, d::Int) = (g = gcd(n,d); new(div(n,g),div(d,g)))
 end

 Rational{T}(n::T, d::T) = Rational{T}(n,d)
 Rational(n::Int, d::Int) = Rational(promote(n,d)...)
 Rational(n::Int) = Rational(n,one(n))

Clean, clear, short. I like it.

function RopeString(h::RopeString, t::RopeString) =
    depth(h.tail) + depth(t) < depth(h.head) ?
        RopeString(h.head, RopeString(h.tail, t)) :
        new(h, t, max(h.depth, t.depth)+1, length(h)+length(t))

Also simple, clear and concise.

@StefanKarpinski
Copy link
Sponsor Member

This would also jive well with immutable objects — you create them in their final form using new() and then can't change them. This would mean that immutable objects cannot be self-referential, which is actually rather nice.

@JeffBezanson
Copy link
Sponsor Member Author

It's a good point that new() is actually pretty nice and concise when it works. And it's perfect for immutable objects---just a matter of making the arguments to new required instead of optional. Elegant. Yes, this is the better solution.

My only remaining problem is that I want to rename the clone function to new. clone is a terrible name, since it doesn't clone but makes new uninitialized objects. Can we think of another name, or some other way to get around this conflict?

@StefanKarpinski
Copy link
Sponsor Member

For posterity, it should be noted that half of this discussion took place in this email thread.

@StefanKarpinski
Copy link
Sponsor Member

My only remaining problem is that I want to rename the clone function to new. clone is a terrible name, since it doesn't clone but makes new uninitialized objects. Can we think of another name, or some other way to get around this conflict?

I agree that clone is a lousy name for this but I don't really think new is much better for that purpose — and new is clearly a good name for the function that makes new objects. I always do a double-take and have to remember what clone actually does every time I see it. I've always felt that it was weird and slightly fishy that arrays have their own special construction function. It feels like it's begging for a cleaner and/or more general solution. How about calling it make_new or make or new_like or something like that?

@JeffBezanson
Copy link
Sponsor Member Author

It's not supposed to be a special constructor for arrays. It's part of the mutable container interface. It lets you write generic container manipulations like

function replace(c, a, b)
  n = make_new(c)
  for k = keys(c)
    n[k] = (c[k]==a ? b : c[k])
  end
  n
end

How about create or alloc?

@JeffBezanson
Copy link
Sponsor Member Author

branch merged in commit 587f36b. fixed.

StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018
Add sizehint!, fix version number for atsign-noinline
StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018
StefanKarpinski pushed a commit that referenced this issue Feb 8, 2018
Upgrade Pkg3 and ext dependencies to 0.7
cmcaine pushed a commit to cmcaine/julia that referenced this issue Sep 24, 2020
KristofferC pushed a commit that referenced this issue Aug 25, 2023
Stdlib: ArgTools
URL: https://github.com/JuliaIO/ArgTools.jl.git
Stdlib branch: master
Julia branch: master
Old commit: 08b11b2
New commit: 4eccde4
Julia version: 1.11.0-DEV
ArgTools version: 1.1.1 (Does not match)
Bump invoked by: @DilumAluthge
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaIO/ArgTools.jl@08b11b2...4eccde4

```
$ git log --oneline 08b11b2..4eccde4
4eccde4 build(deps): bump actions/checkout from 2 to 3 (#30)
6a4049d build(deps): bump codecov/codecov-action from 1 to 3 (#32)
f94a0d3 build(deps): bump actions/cache from 1 to 3 (#31)
cb66300 enable dependabot for GitHub actions (#29)
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
KristofferC pushed a commit that referenced this issue Aug 25, 2023
…1047)

Stdlib: NetworkOptions
URL: https://github.com/JuliaLang/NetworkOptions.jl.git
Stdlib branch: master
Julia branch: master
Old commit: f7bbeb6
New commit: 976e51a
Julia version: 1.11.0-DEV
NetworkOptions version: 1.2.0 (Does not match)
Bump invoked by: @DilumAluthge
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaLang/NetworkOptions.jl@f7bbeb6...976e51a

```
$ git log --oneline f7bbeb6..976e51a
976e51a Use human-readable title in the docs (#30)
895aee9 Update ssh-rsa key for github.com (#29)
db83efd fix an issue found by JET (#28)
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Keno pushed a commit that referenced this issue Oct 9, 2023
Also fixes a bug when trying to run a :thunk expr in Compiled mode.
IanButterworth pushed a commit that referenced this issue Apr 11, 2024
…d56027 (#54056)

Stdlib: ArgTools
URL: https://github.com/JuliaIO/ArgTools.jl.git
Stdlib branch: release-1.10
Julia branch: backports-release-1.10
Old commit: 08b11b2
New commit: 5d56027
Julia version: 1.10.2
ArgTools version: 1.1.2(Does not match)
Bump invoked by: @IanButterworth
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
JuliaIO/ArgTools.jl@08b11b2...5d56027

```
$ git log --oneline 08b11b2..5d56027
5d56027 build(deps): bump julia-actions/setup-julia from 1 to 2 (#38)
b6189c7 build(deps): bump codecov/codecov-action from 3 to 4 (#37)
997089b fix tests for TEMP_CLEANUP, which might be a Lockable (#35)
4a5f003 build(deps): bump actions/cache from 3 to 4 (#36)
84ba9e8 Hardcode doc edit backlink (#34)
9238839 build(deps): bump actions/checkout from 3 to 4 (#33)
4eccde4 build(deps): bump actions/checkout from 2 to 3 (#30)
6a4049d build(deps): bump codecov/codecov-action from 1 to 3 (#32)
f94a0d3 build(deps): bump actions/cache from 1 to 3 (#31)
cb66300 enable dependabot for GitHub actions (#29)
```

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants