Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make Option[T] sane again for pointer types #18401

Closed

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Jun 30, 2021

this PR #6253 introduced an "optimization" that made Option fundamentally flawed; it was even reported in review comments in that PR (refs #6253 (comment)) but this problem hasn't been addressed, till now.

my PR fixes this so that some(nil).isSome remains true and Options remain sane.

Specialing Option for SomePointer was a case of premature optimization similar to std::vector<bool> fiasco in C++ (https://isocpp.org/blog/2012/11/on-vectorbool + other; EDIT: interestingly, i see that #6253 (comment) made the same observation). It breaks usability of Option[T] as a way to provide a value outside of the valid range of values of T, ie, none and some(nil) should be different.

example 1

for a table-like API with an Option API to retrieve a value for a key (possibly absent), and API like this could make sense:

proc getOpt[K, V](key: K): Option[V]

but, without this PR, it wouldn't allow distinguishing whether nil is stored in a key or whether the key is absent.

example 2

proc fn[T](a: int, b: Option[T] = none(T)) =
  if b.isNone: handleNoValue()
  else: ...

without this PR, we wouldn't be able to distinguish whether caller specified a nil value for b or whether no value (or none(T)) was specified. nil should be legal and not be conflated with "use default param".

example 3

std/wrapnils has an API that returns an Option: ??., but can't distinguish between those 2 cases:

  • there was an intermediate nil in the chain
  • there was no intermediate nil in the chain but the last value is nil

eg: if ??.f1.x1[].bar.isNone, we can't tell whether evaluating f1.x1[].bar would result in SIGSEGV or not; with this PR, we would be able to distinguish those 2 cases.

example 4: breaks monad laws of Option[T] (associativity)

nim's design suffers from the same flaw as Optional in java, which conflates a valid value with None, breaking monad laws.
There's been numerous articles about this, eg:

here's an illustration of the broken monad law, where a.map(f1).map(f2) != a.map(f2 o f1):

when true:
  import std/[options, sugar]

  type Foo = ref object
  var g = [Foo(), nil, Foo()]

  proc fn3(x: int): Foo = g[x]
  proc fn4(x: Foo): string =
    if x == nil: "foo" else: "bar"
  echo some(0).map(x => fn4(fn3(x))) # some("bar")
  echo some(0).map(fn3).map(fn4) # some("bar")

  echo some(1).map(x => fn4(fn3(x))) # some("foo")
  echo some(1).map(fn3).map(fn4)
    # BUG: should show same thing: `some("foo")` (according to monad laws)
    # instead:
    # with -d:danger, returns none(string), else, raises:  `not val.isNil`  [AssertionDefect]

note 1

instead of special casing Option for SomePointer, this PR introduces a maybeOption that depending on whether T is SomePointer, returns an Option[T] or T; this allows client code to choose whether they want to optimize out Option for SomePointer in the cases where conflating nil and none is ok, without otherwise impacting code using Option[T] for T: SomePointer.

note 2

It is indeed a breaking change, but so was #6253 when it was introduced, but this breaking change is a needed bugfix. Only 1 important_package was broken, which I'm fixing in PMunch/nim-optionsutils#6 (it was in fact just a 1-line test on the implementation of option: let tmp = option(intref); check(sizeof(tmp) == sizeof(ptr int))

other languages

it seems only java has this broken behavior (Optional in java is widely reported as flawed in large part because of this); other languages either don't allow nil references, or allow some(nil) (and such that some(nil) != none)

#include <functional>
#include <iostream>
#include <optional>
int main(){
  auto a = std::optional<int*>{nullptr};
  auto b=a.has_value();
  std::optional<int*> c{};
  std::cout << a.has_value() << " " << c.has_value() << " " << std::endl; // prints 1 0
}

Option<&T> is optimized to a normal pointer to T, but that makes no problems, since &T can't be null in Rust. For "normal" pointers, Option<*mut T> isn't optimized

indeed:

use std::ptr;
fn main() {
  let data: u32 = 42;
  let mut raw_ptr = &data as *const u32;
  let null_ptr = ptr::null() as *const u32;
  raw_ptr = null_ptr;
  let opt1 = Some(raw_ptr);
  println!("{:?} unwraps to {:?} {:?}", opt1, opt1.unwrap(), opt1.is_some());
  // prints: Some(0x0) unwraps to 0x0 true
}
  • scala: Some(null) == None is also false

links

future work

  • consider special casing not nil types, for which nil is not in the valid range
  • consider adding cstring to SomePointer, or even better, using something like:
    when compiles(a == nil): (or overloadExists from resolveSymbol(foo(args)) and overloadExists(foo(args)) to return symbol after overload resolution #12076) to decide instead of SomePointer, so that it'll auto-work for not nil types
  • consider adding the equivalent of D's struct Nullable(T, T nullValue) ; which allows specifying at CT a sentinel value treated as null, for which the optimization is ok

@konsumlamm
Copy link
Contributor

In case someone wants to bring up Rust as a counterexample: Yes, Option<&T> is optimized to a normal pointer to T, but that makes no problems, since &T can't be null in Rust. For "normal" pointers, Option<*mut T> isn't optimized (so Rust is in fact another example in favor of this change), only for types where 0/null is not a valid value (which is part of the reason why NonNull exists).

Now if not nil is stabilized, we could think about applying the optimization to those types, since that shouldn't make any problems.

timotheecour added a commit to timotheecour/nim-optionsutils that referenced this pull request Jun 30, 2021
@timotheecour timotheecour changed the title WIP, do not review yet make Option[T] sane again for pointer types Jun 30, 2021
@timotheecour timotheecour marked this pull request as ready for review June 30, 2021 23:41
@timotheecour timotheecour added Ready For Review (please take another look): ready for next review round TODO: followup needed remove tag once fixed or tracked elsewhere labels Jul 1, 2021
@timotheecour timotheecour requested a review from Araq July 2, 2021 17:03
@arnetheduck
Copy link
Contributor

This breaks API that rely on Option[ptr T] to guarantee that nil is not present - it's a form of not nil, if you will - when you check that isSome is true, you know for certain that the pointer is not nil, which is a feature that code that uses it can rely on and thus doesn't have to do its own checks.

This is not so much a matter of whether the current options module makes sense or not, but having the standard library as a playground for crusades makes it very difficult to use it in actual software - unfortunately, breaking changes like these are tightly coupled to bug and security fixes. This is a more and more frequently occurring issue.

If you want to change the behavior of core modules in the library, why not start an alternative package instead where you can have full freedom? If your behavior has sufficient merit, people will migrate to it.

@timotheecour
Copy link
Member Author

timotheecour commented Jul 3, 2021

That's an API misuse; use a type NotNil[T: ptr|ref|pointer] = distinct T for your use case. std/options was not intended for a not nil wrapper, but for an option type, as is the case in almost all other languages. The fact that #6253 introduced that regression in the 1st place (as an "optimization", and without thinking about its consequences) doesn't change this fact.

Creating an pkg/options with the correct behavior wouldn't make sense either, Options is the one we have in stdlib and without this bugfix, APIs based on top of it in stdlib would be broken, eg

proc getOpt*[K,V](t: Table[K, V], key: K): Option[V]

as they wouldn't be usable to distinguish whether a key is absent or wether it's present but with nil value; after this bugfix you can.

If you want to add an Option.isSomeNotNil, we could potentially add that (although it'd probably be best represented as a different type NotNil to avoid the extra bool storage cost), but don't conflate it with isSome which just repeats the java mistake that makes Options useless with Pointer-like types.

@arnetheduck
Copy link
Contributor

arnetheduck commented Jul 3, 2021

was not intended for a not nil wrapper

This is your personal interpretation - I understand that you disagree, and why - if someone was writing the module today, I would even agree with your rationale - this is beside the point however: the current code however tells a different story - as it's written, this is a guarantee that it provides irrespective of what you consider to be the right behaviour, and because it's been there for a while (since before 1.0 for example), other code has come to depend on this.

Also, if you feel you need the std lib subsidy for your code, you can always start an options2 module that leaves the current version semantically unchanged, and you can write your "correct" version as you like it - this is a win-win - you get your interpretation in there and us users with skin in the game can migrate at our own pace, should we choose to.

@timotheecour
Copy link
Member Author

timotheecour commented Jul 4, 2021

Having both std/options and std/options2 will cause far more problems down the line: soon enough you'll need getOpt and getOpt2 APIs that return or consume an Option, or break code that uses things like proc initFoo[T](a: T, b: Option[T] = none(T)) = discard or types using Option[T].

Instead this PR fixes the regression introduced in #6253 so that it behaves like an Option type again (as in all languages except java); for packages that rely on not nil semantics we can either add a Nullable[T] type (in std/nullables) or add a isSomeNotNil proc to std/options.

Note that the following invariant is preserved:

type A = ref object
var a: A
let a1 = option(a)
assert a1.isNone # both before and after PR
let a2 = some(a) # before PR: raise UnpackDefect, after PR: works, and a2.isSome is true

@arnetheduck
Copy link
Contributor

You might call it a regression, but it's nonetheless a behavior that exists since the stable release of the language. It's not a bugfix, ie there are no crashes involved, and it's not a security fix - it's simply that you desire a different behavior to accommodate your other needs.

This is why I advocate that you make your own packages outside of the standard library: there, you're free to break things and move fast - your changes only affect those that choose to upgrade, and that's it.

In the standard library, a higher requirement for quality and stability applies: here, necessary changes and the good, non-breaking work of a whole community is bundled - the changes affect a much wider audience that uses the code in a different way than you, irrespective of the ideal you strive for. It's simply an unfortunate behavior that you have to live with, but can also exploit, in this case to exploit semantic guarantees and therefore simplify code that uses Option.

There are many unfortunate design flaws in the standard library - this is one of the smaller ones, but it is what it is - the key when making changes is to make them in an additive manner such that the existing behaviors and guarantees remain. It takes a bit more effort, but that's it.

@timotheecour
Copy link
Member Author

timotheecour commented Aug 24, 2021

@Araq ping on this; this PR has 11 upvotes and lots of people consider the module std/options broken and useless in generic code because of the aforementioned regression fixed by this PR.

This PR is on the critical path for stdlib API's that make use of options, eg for tables.

@arnetheduck
Copy link
Contributor

We are using the guarantees that Option provides to reason about the safety of the code that uses it - the ability to reason about behavior and not having it change is an important step for writing stable and predictable code in nim in general - the fact that nil is not a valid value is a useful property for Option to have - this change in particular introduces a significant semantic regression.

There will always be one more breaking change that you can introduce, which will seem critical from a constrained point of view - it's actually fine though, it's quite possible to software based on the current behavior, just not all the software you want - for some use cases, you need a different module than Option with a different behavior, or perhaps a different API on top of Option that is additive with respect to the current API, as opposed to destructive.

@timotheecour
Copy link
Member Author

timotheecour commented Aug 25, 2021

A working options type that works in generic code is a rather fundamental data structure in a stdlib, without which we can't write something like proc getOpt[K, V](key: K): Option[V] in std/tables or similar.

Please vote as a reaction to this comment:

  • option 1 (:+1:): fix std/options similarly to what's done in this PR, with possible transition flags and issuing a warning for code locations that would trigger a change of semantics (ie when Option[T]. isNone is called for a ptr-like type T); an API can be added isSomeNotNil as part of this proposal.
  • option 2 (:eyes:) : introduce a new module std/options2 or a different module name (please suggest a name)
  • option 3 (:confused:) : something else: please explain in a comment

The worst option is to do nothing at all about this.

@arnetheduck
Copy link
Contributor

if you want to introduce voting as a way of governance of the project, I suggest you write an RFC first where you detail how you plan to ensure the fairness of the vote (ie reaching a representative amount of voters, who is allowed to vote etc), what majority should be achieved and what the criteria should be for calling a vote - there's quite a bit of literature to take in on this subject, so I just listed the first things off the top of my head - otherwise, for technical matters, https://datatracker.ietf.org/doc/html/rfc7282 is a good start.

@Araq
Copy link
Member

Araq commented Aug 25, 2021

Please vote as a reaction to this comment

Voting about this is futile. We need to find the best solution. My "proposal" (it needs to be refined):

Introduce Opt[T] that works as we need it to work. The implementation needs to use case objects so that it works for types which lack default(T) (these types are part of Nim, consider var int, it has no default value).

@arnetheduck
Copy link
Contributor

Introduce Opt[T] that works as we need it to work.

for inspiration: https://github.com/status-im/nim-stew/blob/3c91b8694e15137a81ec7db37c6c58194ec94a6a/stew/results.nim#L258 - we define Opt[T] as Result[T, void] - this is nice because it allows the same api to be used across the two types which simplifies passing them around and converting back and forth - there's a progression here where we add more and more information where Result is a more general Option (and Option is a more general bool etc)

Incidentally it works like this PR proposes (without special nil support).

@arnetheduck
Copy link
Contributor

One more way to get the desired behavior for pointers/refs:

type Nilable[T: ref|pointer] = tuple[data: T]

This, combined with a few operators like [] to dereference etc can be used to work around the type magic in Option - it's even nicer in a way because you get Option[Nilable[ref MyType]] and you document that the final value has 3 states: unset, set to nil, set to not-nil.

@ringabout
Copy link
Member

see also one of its usage => #19719 (review)

@ringabout ringabout removed Vote Ready For Review (please take another look): ready for next review round labels Apr 13, 2022
@ringabout
Copy link
Member

ready for review

@Varriount
Copy link
Contributor

@xflywind Does this make nil a valid value for Option?

@ringabout
Copy link
Member

@Varriount Yeah, I think so. It means no special/surprsing case for pointer like types.

@Araq
Copy link
Member

Araq commented May 6, 2022

@xflywind This breaks code unnecessarily in the worst case possible, I think. How about we add a new type like Opt[T] and deprecate the old module?

@ringabout
Copy link
Member

Ok.

@arnetheduck
Copy link
Contributor

How about we add a new type like Opt[T] and deprecate the old module?

fwiw, Opt[T] as a special case of Result[T, E] has worked well for us - one can think of a progression of error handling support which goes like bool -> Opt[T] -> Result[T, string] -> Result[T, enum] -> Result[T, object] -> Exception in terms of cost (cognitive, performance) vs features (information, static vs dynamic etc) - Opt[T] implemented this way is available stand-alone from https://github.com/arnetheduck/nim-result for anyone interested.

@ringabout
Copy link
Member

Rejected, I take over.

@ringabout ringabout closed this Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
TODO: followup needed remove tag once fixed or tracked elsewhere
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Option[T] should not special case ptr-like types, so that some(nil).isSome is true
6 participants