Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactoring and yak shaving. #13687

Closed
wants to merge 45 commits into from

Conversation

krux02
Copy link
Contributor

@krux02 krux02 commented Mar 19, 2020

  • remove lib/pure/collections/chains.nim (not used nor usable for anything)
  • made internal sequtils.evalOnceAs and tables.ctAnd dispensable and removed it.
  • made astalgo.debug on builtin core types less verbose.
  • refactored sequtils.toSeq to be simpler (and probably faster).
  • provide a $ operator for every type, including distinct and ptr, ref and proc. Even though $ on proc isn't particularly informative, it won't reject to compile.
  • fixed compiler crash of illegal recursive type: CyclicSeq = seq[ref CyclicSeq]
  • added more than 130 lines of new test code

This should make future code simpler to write as you won't need to check anymore if a type supports a $ operator, when it is available for everything.

@timotheecour
Copy link
Member

timotheecour commented Mar 19, 2020

provide a $ operator for exery type, including distinct and ptr and ref.

I don't think recursing inside ptr/ref is desirable

example1

# D20200319T034809:here
type Foo = ref object
  bar: Foo
  x: int
var foo = Foo(x: 12)
foo.bar = foo
echo foo

causes Error: call depth limit reached EDIT: produces (bar: ..., x: 12)

example2

# D20200319T035242:here
type Bar = ref object
  s: string
type Foo = ref object
  bar1: Bar
  bar2: Bar
  x: int

var bar = Bar(s: "some long string that gets duplicated")
var foo = Foo(x: 12, bar1: bar, bar2: bar)
echo foo

=>

(bar1: (s: "some long string that gets duplicated"), bar2: (s: "some long string that gets duplicated"), x: 12)

this is a common case (eg graphs, edges etc)

Not only do you duplicate but you also can't tell whether bar1, bar2 have same address or not.

proposal

instead, you can print the hex-formatted address of ref/ptr/pointer that are not at top-level
so that here it'd print

(bar1: 0xdeadbeef, bar2: 0xdeadbeef, x: 12)
or (see https://github.com/nim-lang/Nim/pull/13687#discussion_r394974115)
0xdead1234->(bar1: 0xdeadbeef, bar2: 0xdeadbeef, x: 12)

or, you could maintain a hashtable of what's been already expanded and only use address if it's already been printed:

0xdead1234->(bar1: 0xdeadbeef->(s:"some long string that gets duplicated"), bar2: 0xdeadbeef, x: 12)

if arg == nil:
"nil"
else:
$arg[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some difference between echo somePtrToInt and echo someInt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ptr someType maybe print the address and the object it points to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean you would like to have an indirection sign like "-> 1" instead of "1"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, my github was slow I didn't see your second message. I actually thought about printing the address, but honestly most of the time you really do not care about an address. Addresses are long and rarely what you really care about. That is why I omitted it. I also didn't indicate that the object is a pointer, because for me being a pointer is part of the type, and types aren't printed in string representation.

Copy link
Member

@timotheecour timotheecour Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because for me being a pointer is part of the type, and types aren't printed in string representation.

I agree on this point

I actually thought about printing the address, but honestly most of the time you really do not care about an address.

i disagree on this point, with reference types the address is what matters most since equality (and hence identity) is base on address instead of its dereferenced content. Eg if you want to know whether 2 nodes are equal in a graph. This relates to my point in #13687 (comment)

so for top-level types you'd print:

type Foo = ref object
  x: int
  f2: Foo
doAssert $Foo(x: 2) == 0xdeadbeef->(x: 2, f2: 0xdead0000)

analog to how repr does

Furthermore, this natually extends to pointer (where the only thing we can do is print address)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well for pointer I agree, the address is what should be printed and that is also what I do print here in this PR. But for everything else, pointer addresses are just visual noise, nothing else. And if you do care about an address you can still do: $cast[pointer](x) on it.

@timotheecour
Copy link
Member

timotheecour commented Mar 22, 2020

system.compiles should be replaced by system.supportsOp, supportsField, canImport which is not covered by an RFC.

overloadExists first exactly the bill for that, see #12076 that I've upgraded to support both foo(args) and foo.bar (etc), and also provides a way to return the symbol after resolution (via resolveSymbol)

@timotheecour
Copy link
Member

timotheecour commented Mar 23, 2020

as of your latest commit bd9694e, you've introduced already 7 regressions that I found, affecting many of the modules you've changed, who knows how many other ones there are.

regression 1

when true:
  import sequtils
  var s = newSeq[int](3)
  proc fun() =
    let s2 = toSeq(s)
    s[0] = 10
    doAssert s2 == @[0,0,0]
  fun()

before PR: this works
after PR: assert fails

regression 2

when true:
  import sequtils
  import times
  let n = 10_000_000
  let s = newSeq[int](n)
  proc fun(a: openArray[int]) =
    let s = toSeq(a)
    doAssert s[^1] != -1
  let t = cpuTime()
  for i in 0..<100: fun(s)
  echo cpuTime()-t

(with -d:danger)
before PR: 1.371333 seconds
after PR: 7.895341999999999 seconds

regression 3

when true:
  type Foo = object
    age: int
    s: string
    internal: seq[ptr Foo]
  var foo = Foo(age: 20, s: "bob")
  foo.internal = @[foo.addr]
  echo foo

before PR: (age: 20, s: "bob", internal: ...)
after PR: crashes with -d:danger, Error: call depth limit reached without -d:danger

regression 4

when true:
  type Foo = object
    age: int
    s: string
    internal: seq[ptr int]
  var foo = Foo(age: 20, s: "bob")
  let px = foo.age.addr
  foo.internal = @[px, cast[ptr int](cast[int](px) + 1*sizeof(int))]
  echo foo
  foo.internal.add cast[ptr int](cast[int](px) + (int.high div 10)*sizeof(int))
  echo foo

before PR:
(age: 20, s: "bob", internal: ...)
(age: 20, s: "bob", internal: ...)

after PR:
(age: 20, s: "bob", internal: @[20, 4382871624])
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

regression 5

when true:
  import tables
  var t: OrderedTable[int, int]
  t[0] = 0
  clear(t)
  doAssert 0 notin t

before PR: works
after PR: assert fails

regression 6

when true:
  import sharedtables
  import tables
  var t: SharedTable[int, int]
  var t0: Table[int, int]
  t0[0] = 0
  t[0] = 0

before PR: works
after PR: Error: unhandled exception

regression 7

nim c --gc:arc main

when true:
  type Foo = ref object
    id: int
    a: seq[Foo]
  var x = Foo()
  echo $x[]

before PR: (id: 0, a: ...)
after PR: Error: expression cannot be cast to pointer

regression 8

related to regression 4

when true:
  import tables
  import hashes
  type Foo = object
    id: int
    internal: ptr Foo
  proc hash(a: Foo): Hash = a.id
  var t: Table[Foo, int]
  proc main()=
    var f {.noinit.}: Foo
    f.id = 10
    try:
      echo t[f]
    except Exception as e:
      echo "caught"
  main()

before PR: caught
after PR: SIGSEGV: Illegal storage access. (Attempt to read from nil?)
in this particular case adding proc $(a: Foo): string = fallback(a) would circumvent this but in more complex cases this could be harder to work around

overall comment

Auto dereferencing ptr T should be controlled by a parameter, otherwise you're making it impossible to print some types. Which is why any fancy stringification is best delegated to a separate proc, not the builtin one. This would also enable using a HashSet to avoid printing the same address multiple times as I mentioned in #13687 (comment)
This would alleviate issues like custom $ not being picked up.

@krux02
Copy link
Contributor Author

krux02 commented Mar 23, 2020

@timotheecour thanks a lot for your last review and extensive tests. They are really valuable.

regression 1

This looks like a compiler bug to me. After all toSeq(s) is substituted with just s and then assigned to s2. This means s2 is supposed to be a copy of s, but it isn't. Good thought you spotted it. Really, I would never have thought about testing on that.

regression 2

Thanks a lot. Easy fix.

regression 3

Good catch.

regression 4

You are trying to print invalid pointers am I right? Well it does crash. But shouldn't it be ok to crash? after all these pointers are invalid pointers. After all you can crash ref equally, and you can even crash string printing if you force it to have invalid data:

var myString = "abc"
cast[ptr tuple[len,cap: int]](myString).len = -10
echo myString

regression 5

I really din't think I would have changed anything on how tables work. How did you catch that one? Thanks a lot of finding it.

@timotheecour
Copy link
Member

You are trying to print invalid pointers am I right? Well it does crash. But shouldn't it be ok to crash? after all these pointers are invalid pointers.

ptr T shouldn't necessarily point to valid data; they could point to garbage data from the stack or returned from a C callback, depending on user code eg for optimizations. The problem is $ can be implicit, see regression 8.

compiler/astalgo.nim Outdated Show resolved Hide resolved
@krux02
Copy link
Contributor Author

krux02 commented Apr 15, 2020

@timotheecour regarding regression 6 you posted. I am a bit puzzled by it.

The code is clearly written in a way, so that shared tables can only be used after they are initialized.
You see this line here https://github.com/nim-lang/Nim/blob/devel/lib/pure/collections/tableimpl.nim#L34

compiles(defaultInitialSize) tests if there is a defaultInitialSize declared, and in shared tables, there is no defaultInitialSize declared. compiles(defaultInitialSize) evaluates to false there. But because of weird logic and when and how generics are evaluated, this compiles(defaultInitialSize) eventually becomes true at the time, when `[]=` is instantiated. This initializes the shared table. Is this good? No, not at all. Because the initialization code initializes the lock after the lock has been set by `[]=`. On some platforms initializing a lock is a noop and it will just work magically, but effectively lazy initializing the shared table on usage is a bug that I fixed in this PR without knowing. The regression was probably introduced when something in generics/template symbol resoltution logic was changed, or it never worked. Can't tell right now. I think this also serves as a good example of why when compiles expressions are so bad.

when true:
  import sharedtables
  import tables
  var t: SharedTable[int, int]
  var t0: Table[int, int]
  t0[0] = 0
  t[0] = 0 # this is supposed to raise an exception.

@timotheecour
Copy link
Member

timotheecour commented Apr 21, 2020

regarding regression 6 you posted. I am a bit puzzled by it.

good observation. i found the root cause:

So indeed, regression 6 was working for the wrong reason

@stale
Copy link

stale bot commented Apr 21, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Apr 21, 2021
@stale stale bot closed this May 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants