Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use deterministic hashtables #1235

Merged
merged 1 commit into from
Apr 7, 2023
Merged

Use deterministic hashtables #1235

merged 1 commit into from
Apr 7, 2023

Conversation

yannham
Copy link
Member

@yannham yannham commented Apr 7, 2023

The issue

Nickel uses the builtin Rust hashtable everywhere. This hashtable is randomized, and in particular the iteration order is random (across evaluations). For some hashtables used solely as maps, this doesn't matter (say the tables of the cache, mapping UID to file content or parsed term).

However, this makes for a quite bad experience with respect to typechecking: because typechecking fails on the first error, when typechecking a record, the error depends on the order of iteration (if there are several typechecking errors, but this include unbound identifiers, so it's far from uncommon). So, you see an error, try to fix it, re-run Nickel. You get a different error somewhere else - yay, one error less! Then, at the next run, the initial error comes again, because you didn't really fix it.

Beside typechecking, exporting also forces records, and the order of iteration is relevant again if several fields fail to evaluate - the same issue as with typechecking.

Introducing IndexMap

This PR introduces indexmap::IndexMap, an external hashmap which is a drop-in replacement for Rust's hashmaps, use the same algorithm under the hood, is advertised as being on par with the builtin hashmap (tested on rustc) but guarantees that the order if iteration is the order of insertion (which corresponds, for record fields, to the actual order of the original source code).

We still sort fields for exports and other products of Nickel: for example, if records are merged, trying to guarantee a consistent order for the result is harder (and what could be this order, anyway?). Thus, the order guaranteed by IndexMap is mostly here to impose determinism (and it's a nice plus that it usually corresponds to the intuitive order the things were originally defined in), but isn't a replacement for sorting fields in some cases.

Because the type indexmap::IndexMap now appears in several places of the API, the module term re-export it, so that consumers (e.g. the LSP) can reuse it without having to manually depend on indexmap.

The hashtables not related to a term or to something user-facing (the one of the caches, typically) still use std::collections::HashMap.

Does this work?

Here is a simple test:

{
  foo : Number = "a",
  bar : String = 5,
}

Run this example 20 times on master, and you'll most likely see an even distribution of errors pointing to either foo or bar. With this PR, we always get an error on foo.

@yannham yannham force-pushed the task/use-orderered-hashmaps branch from 97ccfb0 to 3521698 Compare April 7, 2023 15:41
@github-actions github-actions bot temporarily deployed to pull request April 7, 2023 15:44 Inactive
This commits use a drop-in replacement for std::collections::HashMap.
The std hashmap has a non-deterministic iteration order, which makes for
a non-deterministic error reporting (when typechecking a file with
multiple errors for example, running Nickel several time on the same
file gives a different output, which is quite bad for the user, who
might think they solved an issue while they didn't).

IndexMap uses the same algorithm and is on par performance-wise, but has
a deterministic iteration order (which is insertion order). Stuff not
related to records or types (other hasmaps used for example in the cache
to map file ids to parsed terms) are left unchanged.
@yannham yannham force-pushed the task/use-orderered-hashmaps branch from 3521698 to 4dc59c8 Compare April 7, 2023 15:48
@github-actions github-actions bot temporarily deployed to pull request April 7, 2023 15:53 Inactive
@yannham yannham merged commit d723a37 into master Apr 7, 2023
@yannham yannham deleted the task/use-orderered-hashmaps branch April 7, 2023 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants