WIP: deterministic evaluation of expressions/derivations #709

Closed
wants to merge 20 commits into from

@fkz fkz commented Nov 19, 2015

This pull request introduces an option to nix-instantiate and nix-env to record all impure dependencies of a nix expression and save them in a json file, thus getting a "pure" nix expression that can later be replayed.
This is originally from #553 and a discussion on nixconf with @copumpkin.
We later started hacking this on the following sprint with help from @edolstra.

Usage

Currently, it can be invoked for example like this:

nix-build --record

Then an additional derivation /nix/store/...-source-closure gets built. It contains a json file listing all impurities associated with this build, including a closure of all build files used. For example, nix-build --record -A vim "<nixpkgs>" results in the following json file (pretty-printed with python -m json.tool).
With nix-store --export $(nix-store -qR /nix/store/5wxzlhm2s98rq97ml96bsb0j4d2ydy3k-source-closure) we can then bundle the complete source closure (the dependencies can be seen with nix-store -q --tree /nix/store/...-source-closure/).

The json file contains the command line arguments to nix-instantiate, the calls of impure functions together with their results (so they can be replayed), and a mapping from paths to store paths for included files (.nix, .patch, etc.). The closure consists of all the file paths that are values in the sources map (the keys are not needed).
This should then be reproducible with

nix-build --replay /nix/store/...-source-closure/nix-support/source

This command should currently be usable with nix-instantiate, nix-build and nix-shell; nix-env is not yet covered.

Furthermore, you could alter some things in the json file to create a different but still reproducible behavior. Currently the replay fails when the program wants to access anything that was not recorded; we could add a mode which generates a new json file with the additional sources/impurities, though.
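The record/replay idea described above can be modelled outside Nix. The following Python sketch is not the implementation in this pull request: the class names, the `args` and `impure` field names, the `getEnv:HOME` key format and the store path are made up for illustration (only the `sources` map is named in the description). It stores the answers of impure calls, replays them from json, and fails on anything that was not recorded.

```python
import json


class UnrecordedAccess(Exception):
    """Raised during replay when something outside the recording is requested."""


class Recorder:
    """Runs impure lookups for real and remembers every answer."""

    def __init__(self, args):
        self.log = {"args": list(args), "impure": {}, "sources": {}}

    def call(self, name, arg, impure_fn):
        # Run the impure function once and store its result under a string key.
        result = impure_fn(arg)
        self.log["impure"][f"{name}:{arg}"] = result
        return result

    def add_source(self, path, store_path):
        # Map an on-disk path to the (hypothetical) store path it was copied to.
        self.log["sources"][path] = store_path

    def dump(self):
        return json.dumps(self.log, sort_keys=True)


class Replayer:
    """Answers lookups only from a recording; anything else is an error."""

    def __init__(self, text):
        self.log = json.loads(text)

    def call(self, name, arg):
        key = f"{name}:{arg}"
        if key not in self.log["impure"]:
            raise UnrecordedAccess(key)
        return self.log["impure"][key]

    def closure(self):
        # Only the values (store paths) matter for the closure, not the keys.
        return sorted(set(self.log["sources"].values()))


# Record once with a stand-in for an impure environment lookup.
rec = Recorder(["-A", "vim", "<nixpkgs>"])
rec.call("getEnv", "HOME", lambda name: "/home/alice")
rec.add_source("/src/default.nix", "/nix/store/aaaa-default.nix")

# Replay later from the serialized recording alone.
replay = Replayer(rec.dump())
```

A "generate a new recording with the additional impurities" mode would presumably catch `UnrecordedAccess`, fall back to the real impure call, and append the result, but that is a guess at a design the description only mentions in passing.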

TODO

The basic playback works, but a few things still need to be done:

Important

  • add tests
  • add documentation

Nice to have

  • support ~ -paths
  • support for nix-build, nix-shell
  • support for nix-env
  • writing the .json output to the nix store with its proper dependency closure
  • save a mapping source-closure -> drv (maybe in the metadata of nix?)
  • not adding every file on its own to the store, instead recognize when one is already in the store and then use that (for example nixpkgs would only get one source mapping like this). We may somehow detect, how high we should go in the directory tree, e. g. :
    • if it's already in the store, we can go to /nix/store/<hash>-<name> (*)
    • find a .git, .gitrevision or other special files
  • since the __filterSource primop is currently not supported, some derivations are not recordable.
  • It would be great if these were composable. I.e. import /nix/store/....-source-closure in nix-exprs work. Nix would warn/error if the keys were mapped to different values in the referenced source closures. [N.B. this totally resembles mixing binary caches with the intensional store, and I think the similarity should be explored for maximum code/idea reuse.]
    Update: just implemented this with a refactoring to produce nix-files instead of json-files

Bugs

  • when using nix-instantiate --eval, no derivations are built, so the source closure isn't built either (and there's an error message: cannot build missing derivation)

Further ideas

  • There's also a problem with imports from derivations since we get a dependency on the generated file (with the last point, this might actually be huge since we'd depend on the whole output of the derivation), so this isn't strictly a source closure anymore.
  • creating the option to include it in the output of the derivation (since we don't want to change the output of a derivation depending on changes in nix files, we'd have to create a symlink tree of the .json file and the derivation output; this would be the derivation including its source closure) so we get an output which also depends on the source closure.

@Ericson2314
Member

Ah, I saw the original proposal but didn't understand it until now. "source closure" is the perfect word for this; it is just the analog of "run closure" and "build closure" for an earlier phase. Here are a few thoughts:

maybe creating the option to include it in the output of the derivation (since we don't want to change the output of a derivation depending on changes in nix files, we'd have to create a symlink tree of the .json file and the derivation output; this would be the derivation including its source closure)

I'm thinking just as we store drv -> output, we should store source closure -> drv where possible. The intensional store already has richer relations.

not adding every file on its own to the store, instead recognize when one is already in the store and then use that (for example nixpkgs would only get one source mapping like this). Still not sure if that's always what we want, sometimes we may only want the files we really need.

Note the intensional store ought to automatically de-dup and rehash the contents of directories, so this will become a non-issue

There's also a problem with imports from derivations since we get a dependency on the generated file (with the last point, this might actually be huge since we'd depend on the whole output of the derivation), so this isn't strictly a source closure anymore.

Hmm, that might be a feature, not a bug :). Certainly we can skip that for fixed-output derivations, and certainly we need to do that for @copumpkin's proposed non-deterministic derivations [I view this + that as key to both succeeding]. Normal derivations are kind of a middle ground, and I'd be OK making that configurable.


I'd like to decouple this from surface UIs as much as possible. In that vein, can we make the following equivalent?

nix-build '<nixpkgs>' -A vim --record
nix-build  --record -E '(import <nixpkgs> {}).vim'
nix-build  --record -E <(echo '(import <nixpkgs> {}).vim')
nix-build $(nix-instantiate --record -E '(import <nixpkgs> {}).vim')

Besides exporting, our notion of boot generations absolutely needs to use this. Can be used to restore channel version when booting (though long term I hope to replace channels with things related to this) so that rebuilding of old boot generation yields the same thing.


Down the road, it would be great if these were composable, i.e. import /nix/store/....-source-closure would work in nix-exprs. Nix would warn/error if the keys were mapped to different values in the referenced source closures. [N.B. this totally resembles mixing binary caches with the intensional store, and I think the similarity should be explored for maximum code/idea reuse.]
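A rough Python sketch of the conflict check proposed above, under the assumption that a recording can be treated as a flat key/value map. The function name, the exception and the sample keys and values are hypothetical and do not come from the patch.

```python
class RecordingConflict(Exception):
    """Two recordings map the same key to different values."""


def merge_recordings(*recordings):
    """Combine recordings, refusing to merge when a shared key disagrees."""
    merged = {}
    for recording in recordings:
        for key, value in recording.items():
            if key in merged and merged[key] != value:
                raise RecordingConflict(
                    f"{key!r}: {merged[key]!r} != {value!r}"
                )
            merged[key] = value
    return merged


# Two recordings that agree on the key they share.
first = {"getEnv:HOME": "/home/alice", "currentTime": 1448000000}
second = {"getEnv:HOME": "/home/alice", "currentSystem": "x86_64-linux"}
combined = merge_recordings(first, second)
```

Whether a disagreement should be a hard error or only a warning is left open in the comment above; the sketch picks the error.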

@fkz
Contributor Author

fkz commented Nov 21, 2015

"source closure" is the perfect word for this, is it just the analog of "run closure" and "build closure" for an earlier phase

Yes, that's exactly what happens. Currently, I already instantiate the source derivation in nix-instantiate, maybe we could add an option to only generate drv etc.

maybe creating the option to include it in the output of the derivation (since we don't want to change the output of a derivation depending on changes in nix files, we'd have to create a symlink tree of the .json file and the derivation output; this would be the derivation including its source closure)

I'm thinking just as we store drv -> output, we should store source closure -> drv where possible. The intensional store already has richer relations.

Yes, storing source closure -> drv sounds good (this should probably be stored somewhere in the nix metadata, but I'm not familiar with that part). Then we still only get an n-to-1 map of possible sources for this derivation. With another indirection we would be able to point back to the source 1-to-1.

not adding every file on its own to the store, instead recognize when one is already in the store and then use that (for example nixpkgs would only get one source mapping like this). Still not sure if that's always what we want, sometimes we may only want the files we really need.

Note the intensional store ought to automatically de-dup and rehash the contents of directories, so this will become a non-issue

hm, I don't see your argument here. Currently, every .nix-file gets copied into its own store path and we save a mapping to all of them. I think we want to instead save a mapping to the nixpkgs directory. Then we also automatically link related metadata like the git revision of the nixpkgs file.
We still have to find out how 'high' we should go in the directory tree and, if it's already in the store, whether we should reuse the derivation or generate a new fixed-output derivation.
We might establish an even tighter connection with #520, then we could point to the anchored version of nixpkgs.

I'd like to decouple this from surface UIs as much as possible. In that vein, can we make the following equivalent?

nix-build '<nixpkgs>' -A vim --record
nix-build  --record -E '(import <nixpkgs> {}).vim'
nix-build  --record -E <(echo '(import <nixpkgs> {}).vim')
nix-build $(nix-instantiate --record -E '(import <nixpkgs> {}).vim')

Yes, that's possible. Something similar already happens for nix-shell -p because nix-shell makes a similar expression and gives it to nix-instantiate. These import-expressions aren't generated like this at the moment and I didn't want to change the expression (import <nixpkgs> vs import <nixpkgs> {}) depending on the content of nixpkgs for now.

Down the road, it would be great if these were composable, i.e. import /nix/store/....-source-closure would work in nix-exprs. Nix would warn/error if the keys were mapped to different values in the referenced source closures. [N.B. this totally resembles mixing binary caches with the intensional store, and I think the similarity should be explored for maximum code/idea reuse.]

Sounds like a good idea!

@Ericson2314
Member

Currently, I already instantiate the source derivation in nix-instantiate, maybe we could add an option to only generate drv etc.

Sorry, what do you mean?

Yes, storing source closure -> drv sounds good (this should probably be stored somewhere in the nix metadata, but I'm not familiar with that part). Then we still only get an n-to-1 map of possible sources for this derivation. With another indirection we would be able to point back to the source 1-to-1.

How do we recover a 1-1 map? If you mean tracking source closure -> drv directly, I think that's a bit of a false mapping because the source closure should not influence the final build except through the drv.

Yes, that's possible. Something similar already happens for nix-shell -p because nix-shell makes a similar expression and gives it to nix-instantiate. These import-expressions aren't generated like this at the moment and I didn't want to change the expression (import <nixpkgs> vs import <nixpkgs> {}) depending on the content of nixpkgs for now.

Ah, good point. IMO, however nix treats the path keys, that treatment should be reified into the nix expression language so that we can do builtins.callPath <nixpkgs> without reflecting on <nixpkgs>; the reflection would be moved into the definition of callPath. Basically, shame on nix for not doing this already.

@fkz
Contributor Author

fkz commented Nov 21, 2015

I currently generate a derivation for 'build source closure' and then build it.

For the 1-1 mapping, the idea is to create a new derivation which consists of symlinks to the normal derivation and the source closure.

args = ["-e" (__toFile "name" ''
#!/bin/bash
${coreutils}/mkdir -p $out/nix-support
${coreutils}/cat << EOF > $out/nix-support/source
Member

''
${coreutils}/cp ${__toFile "result" result} $out/nix-support/source
''

Would this be more resilient to escaping problems?

Contributor Author

done

Member

Sweet!

@Ericson2314
Member

Well, that's not 1-1 either as the same derivation could be symlinked in different places.

@Ericson2314
Member

@lethalman This would provide a way to do imperative package management on top of declarative package management, as I brought up in the UX issue.

@fkz fkz mentioned this pull request Jan 5, 2016
3 tasks
@fkz
Contributor Author

fkz commented Jan 13, 2016

I just realized that the current user name is a further impurity which isn't handled yet by this patch:

nix-instantiate --expr --eval '~/test'
/home/username/test

Daniel Peebles and others added 18 commits January 21, 2016 12:12
…ange because the result included in the primop argument)
…opied to the store.

Well, they're still not inverse because a path gets converted to a string, but that doesn't seem
to be a problem so far
…ayed

currently only supported for nix-instantiate (and thus probably also nix-build), not yet nix-env
add --playback option to play back the exact command

So the interface is now:
"NIX_RECORDING=filename nix-instantiate ..." to record the nix-instantiate command
and
"nix-instantiate --replay filename" to play it back

commands in file
In combination with nix channels (or the newer http download), the whole channel is treated as one input
we write 3 files to the nix closure:
-> default.nix: just imports the other two and can be imported to get the exact reproducible output we want to have
-> nix-support/recording.nix: recordings of all impurities, i.e. function calls and imported files
-> nix-support/expressions.nix: the expression that will be executed

for the expression, a primop __findAlongAttrPath supplies
the autobindings along the attrpath.

for the playback, a new primop prim_playback has been introduced. Its first parameter
is a description of all impurities (as written to expression.nix).
After this function is invoked, all subsequent calls can use the functions introduced here.

In general, we allow multiple prim_playback calls when they don't conflict (not all conflicts are detected yet)

It would also be cool to make the prim_playback more local, e. g. it should only apply
to a specific term, but it's unclear to me how this can be achieved in a lazy language.
@fkz
Contributor Author

fkz commented Mar 26, 2016

After more experience with nix, I now think that there's a better way to implement this with much less intrusion inside nix. With scopedImport and overwriting the builtins this way (overwriting import with a scopedImport variant), I believe this to be possible with only minor additions to nix (mainly a few new primitives and maybe a few command wrappers).
I'm planning to work on nix-related-topics which may include this hopefully in a few weeks, when I'll have some free time.
cc @copumpkin

@copumpkin
Member

I was thinking about that a while ago, but doesn't scopedImport fail to propagate down nested imports? So if I override builtins.blah in an import and then it imports something else, doesn't the other import have the original builtins.blah?

@fkz
Contributor Author

fkz commented Mar 26, 2016

@copumpkin yes, but you can also overwrite import
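To make the exchange above concrete, here is a small Python model of the idea, not Nix code and not part of the patch. "Files" are functions of the scope they are evaluated in, `scoped_import` stands in for scopedImport, and `getEnv` returns a fixed placeholder instead of touching the environment. All names and values are illustrative. Overriding only `getEnv` does not reach a nested import, while also overriding `import` does.

```python
# Each "file" is modelled as a function of the scope it is evaluated in.
MODULES = {
    "leaf": lambda scope: scope["getEnv"]("HOME"),
    "root": lambda scope: scope["import"]("leaf"),
}


def scoped_import(scope, name):
    """Evaluate a module with the given scope (stand-in for scopedImport)."""
    return MODULES[name](scope)


# The default scope: its import evaluates nested files with the defaults again.
BASE = {"getEnv": lambda name: "<impure>"}
BASE["import"] = lambda name: scoped_import(BASE, name)


def with_overrides(overrides):
    """Build a scope whose import re-applies the same overrides to nested files."""
    scope = {**BASE, **overrides}
    # Key step: replace import itself, so nested imports reuse this scope.
    scope["import"] = lambda name: scoped_import(scope, name)
    return scope


recorded = {"getEnv": lambda name: "/home/alice"}

# Override getEnv only: root sees it, but leaf is imported with the defaults.
shallow = scoped_import({**BASE, **recorded}, "root")

# Override import as well: the recorded getEnv reaches leaf.
deep = scoped_import(with_overrides(recorded), "root")
```

Here `shallow` ends up with the placeholder from the default scope, while `deep` gets the recorded value.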

@copumpkin
Member

Oh, I see

@fkz
Contributor Author

fkz commented Mar 26, 2016

(in fact, you can overwrite almost any builtin things in nix, also operators like < and so on)

@fkz
Contributor Author

fkz commented Mar 26, 2016

@copumpkin you won't get the caching of imports this way; I don't know how big a performance deal this is. But doing anything about that now would be premature optimization.

@copumpkin
Member

Caching of imports actually seems like a biggish deal, given how they seem like the sort of thing that can grow really quickly. Do you have any objection to our current approach or does this just seem simpler to you?


@fkz
Contributor Author

fkz commented Mar 27, 2016

@copumpkin
Well, caching should be possible with this approach too; it's just not implemented yet (import calls scopedImport, but scopedImport is currently only cached for {}; we should be able to cache by pointer equality as well). I just meant: let's first get a working prototype and take performance into account later. I don't feel this is slow per se.
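One way to read "cache by pointer equality" is to key the import cache on the identity of the scope object rather than on its contents. The Python sketch below illustrates that reading only; it says nothing about how nix caches imports internally, and the class and its `loads` counter are hypothetical.

```python
class ImportCache:
    """Caches loads per (scope identity, path) instead of per scope contents."""

    def __init__(self, loader):
        self._loader = loader
        self._entries = {}
        self.loads = 0  # counts how often the loader really ran

    def load(self, scope, path):
        key = (id(scope), path)
        entry = self._entries.get(key)
        # The identity check guards against a stale id() match.
        if entry is not None and entry[0] is scope:
            return entry[1]
        self.loads += 1
        value = self._loader(scope, path)
        # Keeping a reference to scope prevents its id() from being reused.
        self._entries[key] = (scope, value)
        return value


cache = ImportCache(lambda scope, path: (path, scope["name"]))
scope_a = {"name": "a"}
scope_b = {"name": "a"}  # equal contents, different identity

cache.load(scope_a, "x.nix")
cache.load(scope_a, "x.nix")  # same scope object: served from the cache
cache.load(scope_b, "x.nix")  # different object: loaded again
```

Identity comparison is cheap but conservative: two structurally equal scopes are cached separately, which is the trade-off of not comparing contents.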
I don't have any particular objection to our current approach, but this approach seems to handle composition of recordings better, because we can alter more things with nix expressions. For example, we can play back parts of nix expressions while also using impure things in other parts (e.g. to change a previously recorded expression). I've already tried doing this recursive stuff with the current design: the current status of this pull request already does some of this, as per @Ericson2314's suggestion, and the description of this pull request is already out of date. But it doesn't manage to keep these changes to the interpretation of nix primitives (playback) local enough; we currently can't overwrite the same primitive with different values in different code paths. Thus, it's only logical to use different functions for these same primitives in different code paths.
Also, I don't think this patch has any chance of being included in nix anytime soon. It's currently not polished enough (there are not enough tests), but even if it were, there are plenty of examples of (good) pull requests that take a long time to be reviewed.
I have the feeling that this new approach needs only more local changes to nix, which have a chance of being included. Most things can probably be done outside of nix; maybe we really only need a few new primitives. I'm not sure about the path copying stuff, but I'm sure it's manageable with both approaches. This also leads to faster development, because nix stuff always takes quite a long time to be reviewed, included, released, etc.
Anyway, I should definitely blog about this (and other things) at some point 😄.

@copumpkin
Member

I look forward to seeing it 😄 what primops would you need to add, then?

@copumpkin
Member

I suppose scopedImport would also allow me to override the derivation builtin, to incorporate different hashing schemes like we discussed. I think there are definite pros and cons to the different hashing schemes, and would like to experiment with them, so putting this in nearly pure Nix would be wonderful 😄

@Ericson2314
Member

OOO, easy hashing experimentation would be really useful for #859 (comment)

@copumpkin
Member

For me, I was mostly just talking about how to inject the "recording" into the resulting derivation hash (do you do it with the main derivation or as a post-processing step, really). That seems more tractable than a general hashing generalization 😄

@fkz
Contributor Author

fkz commented Jul 16, 2016

hm, already too much time is gone since I last looked here...
@copumpkin since you took this up again: I'd really like to do another pair programming session, as these things always motivate me to continue working on it. Then maybe it's really best if we take a short path to a working solution and then iterate. But I'll have to look into it again. So feel free to look into it, and then maybe we can do a remote pair programming session? Never done one before though 😄

@virusdave
Contributor

Is this PR dead? Or are people still dabbling with it?

@stale

stale bot commented Feb 13, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the stale label Feb 13, 2021
@stale

stale bot commented Apr 16, 2022

I closed this issue due to inactivity. → More info

@stale stale bot closed this Apr 16, 2022
7 participants