Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

public: only embed a git revision in install.sh on release #383

Merged
merged 1 commit into from
Feb 20, 2020

Conversation

basvandijk
Copy link
Contributor

@basvandijk basvandijk commented Feb 18, 2020

This is more of a RFC rather than a PR but I would like to present something concrete so we can talk about it.

To goal is to get rid of deep cloning the sdk repo on CI which has the advantage that it should be more efficient to evaluate and more reproducible to build. Note that all other repos already don't deep clone themselves.

Not deep cloning the repo means that on CI there's no .git directory. The only derivation in sdk that used .git is install-sh-release.

We change its logic as follows:

pkgs.runCommandNoCC "install-sh-release" {
  ...
  revision = if version != "snapshot"
             then "${src.rev} (${version})"
             else "omitted when not a release";
  ...
}

On release version will be based on the git tag and will be a version number like 0.5.0. In that case the revision that gets embedded in install.sh is: 140bc06 (0.5.0).

Note that this is the revision corresponding to the 0.5.0 tag and not the revision of the last commit to public/install like it was before!

When not doing a release, like when building locally, on PRs or for master, version will be set to snapshot and revision will be set to a default of omitted when not a release. We do this so that we don't build the install-sh-release derivation for every commit.

Copy link
Contributor

@nmattia nmattia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I've seen it labeled as <dirty> in other projects, when the project is built out of source.

@basvandijk
Copy link
Contributor Author

I've seen it labeled as in other projects, when the project is built out of source.

I like that as well.

In that case I think it also makes sense to set version / pkgs.releaseVersion to dirty.

@knl
Copy link
Contributor

knl commented Feb 19, 2020

In Java projects it's common to name everything that is not a release a -snapshot, which feels a bit nicer than dirty, in case someone comes across such artifact.

What is <revision only present in release>? How does it get set up/chosen? Is it fixed, or it changes per release?

@basvandijk
Copy link
Contributor Author

basvandijk commented Feb 19, 2020

In Java projects it's common to name everything that is not a release a -snapshot, which feels a bit nicer than dirty, in case someone comes across such artifact.

I don't have a strong opinion on this so snapshot works for me as well.

What is <revision only present in release>?

It's the value for revision when not doing a release. So when building locally, on PRs or for master. And note that revision gets embedded in the install script.

How does it get set up/chosen?

The logic is the following:

  revision = if version != "snapshot"
             then "${builtins.substring 0 7  src.rev} (${version})"
             else "omitted when not a release";

So when version is not "snapshot", i.e. equals a release version, revision will be set the the git revision.

However, when version equals "snapshot" (on PRs for example) we don't want to embed a git revision in the install script because that would require rebuilding the script for each commit. So in that case we embed the string omitted when not a release.

Is it fixed, or it changes per release?

On a release revision will be set to "${builtins.substring 0 7 src.rev} (${version})" which means that the git revision of the release will be embedded in the install script. So yes, it will change for every release.

@nmattia
Copy link
Contributor

nmattia commented Feb 19, 2020

Ah, there was a misunderstanding. I meant using <dirty> as the revision in case no revision is present.

@basvandijk basvandijk force-pushed the basvandijk/RFC-get-rid-of.git branch 2 times, most recently from 60fff81 to f44d41a Compare February 19, 2020 10:01
@basvandijk
Copy link
Contributor Author

I've changed revision to omitted when not a release when not doing a release such that install.sh prints the following message:

Executing DFINITY SDK install script, commit: omitted when not a release

@basvandijk basvandijk marked this pull request as ready for review February 19, 2020 10:10
@basvandijk basvandijk requested review from a team February 19, 2020 10:10
Copy link
Contributor

@hansl hansl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In previous projects, dirty means there are local changes. Here it means the same. The version contains dirty in the case where there are local uncommitted changes. snapshot is probably more acceptable.

LGTM here.

@nomeata
Copy link
Contributor

nomeata commented Feb 19, 2020

I have been thinking about the problem “how to let nix-built projects self-identify”, aiming for the following goals:

  • Unrelated commits don’t cause rebuilds
  • If the project was built from a given commit, given the self-identification, I can find the commit
  • It is possible to build without .git around (e.g. when fetching the source via a github tarball)

Sounds tricky, but the following might do the job:

At build time, the build script calculates the hash of the source directory (source of the derivation, i.e. a filtered subdirectory), maybe using nix-hash --type sha256 --base32, and embeds this string in the executable.

This way, no .git is needed, and you get the same value even if unrelated things change.

To identify the revision that corresponds to such a source identification, we’d need a small tool that takes a git repo and a derivation therein, goes through the commits, instantiating (but not building!) the derivation at the various commits, and calculates the source hashes there. It could keep a cache to be not too slow.

Downsides:

  • Dirty source trees would not be identified in a recognizable manner.
  • It only works well when building with nix (i.e. clean subtree); when building with make or cargo directly it might make more sense to just use git describe.

Does this sound feasible?

@basvandijk
Copy link
Contributor Author

basvandijk commented Feb 19, 2020

@nomeata I think your goals are desirable.

As a self-identification I would suggest simply using the hash part of $out.

This is better than only using the hash of the source directory since it covers all changes to the derivation (source code, configuration flags, patches, environment variables, build- and run-time dependencies, build script, etc.).

Additionally $out is known at Nix evaluation time meaning that the tool that traverses the git history only needs to Nix evaluate. For example:

$ nix-store --query --outputs \
  $(nix-instantiate ci/ci.nix -A dfinity.rs.ic-replica-release.x86_64-linux) \
  | grep -oP '/nix/store/\K.*?(?=-)'
q64x9cn87sxsvb28ikv85hbpqnhhv7x8

And we likely don't even have to write this tool since Hydra already has a DB mapping $out to a build and a build to an evaluation and an evaluation to a git revision.

Or in a SQL query:

function revision_of_out() {
  out_hash="$1"
  ssh hydra-int.dfinity.systems \
    "sudo -u hydra psql -d hydra -c \"
      SELECT jsei.revision
      FROM
        buildstepoutputs  AS bso,
        builds            AS b1,
        builds            AS b2,
        jobsetevalmembers AS jsem,
        jobsetevalinputs  AS jsei
      WHERE
            bso.path LIKE '%$out_hash%'
        AND bso.build = b1.id
        AND b1.drvpath = b2.drvpath
        AND b2.jobset = 'dfinity'
        AND b2.id = jsem.build
        AND jsem.eval = jsei.eval
        AND jsei.name = 'src'
      \""
}

$ revision_of_out q64x9cn87sxsvb28ikv85hbpqnhhv7x8
                 revision
------------------------------------------
 78d7569ebeb8bb53ae71cd8cd575f703438b6efb
(1 row)

Let's start embedding $out in the products we ship!

@nomeata
Copy link
Contributor

nomeata commented Feb 20, 2020

Right … I was thinking about the $out path, but … the reasons why I thought it wouldn’t work don’t seem to make sense anymore.

What is the tradeoff out $out vs. the path of the derivation?

@basvandijk
Copy link
Contributor Author

What is the tradeoff out $out vs. the path of the derivation?

I was thinking that the former is possible but the latter isn't because of a circular dependency. However now I'm not so sure if it is impossible. Will think about it some more after I get some sleep...

I would actually prefer using the drv path instead of $out if that would be possible since the aforementioned query can then be simplified.

@nomeata
Copy link
Contributor

nomeata commented Feb 20, 2020

I was thinking that the former is possible but the latter isn't because of a circular dependency. However now I'm not so sure if it is impossible.

I think nix could make it possible (just put the “current derivation” in a environment variable when building a derivation), but upon first glance it does not seem possible to trick nix into doing that from within nix.

@basvandijk basvandijk changed the title RFC: public: only embed a git revision in install.sh on release public: only embed a git revision in install.sh on release Feb 20, 2020
@basvandijk
Copy link
Contributor Author

I was hoping for this to but it appears there's no environment variable pointing to the derivation:

nix-build -E \
  'let pkgs = import <nixpkgs> {};
   in derivation {
     name = "embed-drv-test";
     system = builtins.currentSystem;
     builder = "${pkgs.coreutils}/bin/env";
   }'
these derivations will be built:
  /nix/store/yn7wfi1npz2lzhf1axx0n9fp26nrsmv4-embed-drv-test.drv
building '/nix/store/yn7wfi1npz2lzhf1axx0n9fp26nrsmv4-embed-drv-test.drv'...
HOME=/homeless-shelter
NIX_BUILD_CORES=8
NIX_BUILD_TOP=/private/var/folders/5h/b11m6fxj2tqgbxwz5bgjy42c0000gn/T/nix-build-embed-drv-test.drv-0
NIX_LOG_FD=2
NIX_STORE=/nix/store
PATH=/path-not-set
PWD=/private/var/folders/5h/b11m6fxj2tqgbxwz5bgjy42c0000gn/T/nix-build-embed-drv-test.drv-0
TEMP=/private/var/folders/5h/b11m6fxj2tqgbxwz5bgjy42c0000gn/T/nix-build-embed-drv-test.drv-0
TEMPDIR=/private/var/folders/5h/b11m6fxj2tqgbxwz5bgjy42c0000gn/T/nix-build-embed-drv-test.drv-0
TERM=xterm-256color
TMP=/private/var/folders/5h/b11m6fxj2tqgbxwz5bgjy42c0000gn/T/nix-build-embed-drv-test.drv-0
TMPDIR=/private/var/folders/5h/b11m6fxj2tqgbxwz5bgjy42c0000gn/T/nix-build-embed-drv-test.drv-0
builder=/nix/store/ss8mv0709ychhy5hrlqas1xrk3xyr6s9-coreutils-8.31/bin/env
name=embed-drv-test
out=/nix/store/m9npdbsnz90blsmqi7ai875230ryz9pl-embed-drv-test
system=x86_64-darwin

$out it is.

nix/default.nix Outdated
@@ -4,7 +4,7 @@
, crossSystem ? null
, config ? {}
, overlays ? []
, releaseVersion ? "latest"
, releaseVersion ? "snapshot"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the meaning of snapshot here. "That's how they do it in Java" is not a good enough argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "unreleased"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about "nightly"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm "nightly" is usually used for periodic (every night) builds which doesn't apply to these builds which happen on every PR / every commit to master.

In most contexts "nightly" builds are actually released (See Firefox Nightly for example) which we don't want to do for our PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. unreleased, tree, trunk, whatever you like! I'd even be happy with snapshot if I understood what it meant.

Copy link
Contributor Author

@basvandijk basvandijk Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knl do you have a definition for snapshot? Or maybe another nice color to paint this shed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to unreleased.

# to a default of `omitted when not a release`. We do this so that
# we don't build the `install-sh-release` derivation for every commit.
revision =
if version != "snapshot"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's an assumption that if version != "snapshot" then src != null. It'd be more robust to embed the revision iff revision is not null, and (independently) insert the version iff version is not null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Will change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the condition to src != null && version != "unreleased" which ensures that both src and version need to be set correctly to produce the "${builtins.substring 0 7 src.rev} (${version})" string.

# `master`, `version` will be set to `unreleased` and `revision` will be set
# to a default of `omitted when not a release`. We do this so that
# we don't build the `install-sh-release` derivation for every commit.
revision =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

revision = optionalString (!isNull src) (substring 0 7 src.rev) + optionalString (!isNull version) "(${version})"

?

even if no version is given, the commit is still very useful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that will rebuild this thing on every commit which would involve unnecessary churn and wasn't what the original behavior.

To goal of this commit is to get rid of deep cloning the `sdk` repo on CI which
has the advantage that it should be more efficient to evaluate and more
reproducible to build. Note that all other repos already don't deep clone
themselves.

Not deep cloning the repo means that on CI there's no `.git` directory. The only
derivation in `sdk` that used `.git` is `install-sh-release`.

We change its logic as follows:

```
pkgs.runCommandNoCC "install-sh-release" {
  ...
  revision = if src != null && version != "unreleased"
             then "${src.rev} (${version})"
             else "omitted when not a release";
  ...
}
```

On release `version` will be based on the git tag and will be a version number
like `0.5.0`. In that case the revision that gets embedded in `install.sh` is:
`140bc06 (0.5.0)`.

Note that this is the revision corresponding to the `0.5.0` tag and not the
revision of the last commit to `public/install` like it was before!

When not doing a release, like when building locally, on PRs or for `master`,
`version` will be set to `unreleased` and `revision` will be set to a default of
`omitted when not a release`. We do this so that we don't build the
`install-sh-release` derivation for every commit.
@mergify mergify bot merged commit 8b8ea1b into master Feb 20, 2020
@mergify mergify bot deleted the basvandijk/RFC-get-rid-of.git branch February 20, 2020 14:23
nomeata added a commit to dfinity/motoko that referenced this pull request Jun 20, 2020
I have been preaching this for so long to other teams, I really better
clean up my own backyard: Make sure that you can run

    moc --version

and get a somewhat useful indication what version that is. This is
important for when random people use `moc` from random versions of SDK
or – later – build it themselves and then provide bug reports.

When building locally, with `.git` around, this now injects the git
revision:

    $ ./moc --version
    Motoko compiler (revision 9abae15e-dirty)

(the `-dirty` indicates that my work-tee is not clean).

When `.git` is _not_ available, as typially (and rightfully) when
building with nix, it looks if Nix’s `$out` variable is set, and
includes that:

    ~/dfinity/motoko $ $(nix-build -A moc)/bin/moc --version
    Motoko compiler (revision xx6fmamdh5bjph9law6x02d7f4hw8f84-moc-bin)

This is less useful for end-users, but it still allows us to (hopefully)
identify the version, in the worst case by running this command on all
revisions to find the right one:

    $ nix-store --query --outputs $(nix-instantiate -A moc-bin)
    /nix/store/xx6fmamdh5bjph9law6x02d7f4hw8f84-moc-bin

Or by querying hydra (as @basvandijk suggests in
dfinity/sdk#383 (comment))

Once we do versioned releases, we can of course extend this to inject
the version number there, and use `git describe --tag` to have
`v1.0-200-9abae15e` like derscriptions.

I _hope_ this will not cause extra recompilations with `dune`, and will
work out-of-the box for everyone, but we’ll see.
mergify bot pushed a commit to dfinity/motoko that referenced this pull request Jun 21, 2020
I have been preaching this for so long to other teams, I really better
clean up my own backyard: Make sure that you can run

    moc --version

and get a somewhat useful indication what version that is. This is
important for when random people use `moc` from random versions of SDK
or – later – build it themselves and then provide bug reports.

When building locally, with `.git` around, this now injects the git
revision:

    $ ./moc --version
    Motoko compiler (revision 9abae15e-dirty)

(the `-dirty` indicates that my work-tree is not clean).

When `.git` is _not_ available, as typically (and rightfully) when
building with nix, it looks if Nix’s `$out` variable is set, and
includes that:

    $ $(nix-build -A moc)/bin/moc --version
    Motoko compiler (revision 3yrjbw6g-aqppc05d-596yiq8k-bsznw5jw)

This is less useful for end-users, but it still allows us to (hopefully)
identify the version, in the worst case by running this command on all
revisions to find the right one:

    $ nix-store --query --outputs $(nix-instantiate -A moc-bin)
    /nix/store/3yrjbw6gaqppc05d596yiq8kbsznw5jw-moc-bin

Note that I injected dashes into the id so that it does not trip up 
`common.lib.standaloneRust` which otherwise would take it for an
unwanted dependency.

@basvandijk suggests in
dfinity/sdk#383 (comment) that one
may query Hydra’s database to map from id to revision.

Once we do versioned releases, we can of course extend this to inject
the version number there, and use `git describe --tag` to have
`v1.0-200-9abae15e` like descriptions.

I _hope_ this will not cause extra recompilations with `dune`, and will
work out-of-the box for everyone, but we’ll see.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants