Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fetchGit do full checkout if ref = "*" #2409

Closed
wants to merge 1 commit into from
Closed

Make fetchGit do full checkout if ref = "*" #2409

wants to merge 1 commit into from

Conversation

lukateras
Copy link
Member

@lukateras lukateras commented Sep 7, 2018

Sometimes one might want to fetch a rev without knowing the ref. In this case, full repo clone is usually required. This commit adds a special case to ref handling, so that when ref = "*", fetchGit fetches all refs prior to fetching the rev.

Usage example:

let
  repo = fetchGit {
    url = "https://github.com/serokell/nixage.git";
    ref = "*";
    rev = "4b83ae894ab99b55c67b6629058b3a4fdd8ba0c2";
  };
in

builtins.trace repo repo

lukateras added a commit to serokell/stack-to-nix that referenced this pull request Sep 7, 2018
lukateras added a commit to serokell/nixpkgs that referenced this pull request Sep 8, 2018
lukateras added a commit to serokell/nixpkgs that referenced this pull request Sep 11, 2018
lukateras added a commit to serokell/nixpkgs that referenced this pull request Sep 12, 2018
@lukateras
Copy link
Member Author

lukateras commented Sep 23, 2018

@NixOS/nix-core Ping! Could you provide some feedback on this? :-)

This feature is used extensively as part of a fully sandboxable stack-to-nix Nix generator. Currently, all users have to install patched Nix to their user profile (which works because patch is eval time only).

The use case is to be able to fetch from language-specific lock files that do not include refs, at runtime, inside the sandbox.

lukateras added a commit to serokell/serokell-closure that referenced this pull request Sep 28, 2018
@grahamc
Copy link
Member

grahamc commented Oct 4, 2018

I'm not sure it is a great idea, because even '*' isn't all the refs. For example, github has lots of refs that wouldn't be fetched but are valid refs, and this list could change over time. For that reason, I feel the best way to go is specifying the correct ref from the start.

I understand this is complicated because other tools are making invalid assumptions about Git as well. The problem and question is do we propagate this invalid assumption to Nix? In particular this question is hard because Nix's goal is for very long-term reproducibility, which is at odds with this request.

@lukateras
Copy link
Member Author

lukateras commented Oct 4, 2018

Without this feature, sandboxable Nix generators for Stack and any other lock file format that doesn't include refs are not possible (even in upcoming Stack freeze file). So my motivation is primarily to improve on that front, allowing to build projects via Nix without any intermediary non-sandboxed step (e.g. what stack2nix does).

Some affected lock file formats:

I will check other tools and add them to the list.

I understand this is complicated because other tools are making invalid assumptions about Git as well. The problem and question is do we propagate this invalid assumption to Nix? In particular this question is hard because Nix's goal is for very long-term reproducibility, which is at odds with this request.

I think this feature leads to higher, not lower reproducibility, because ref is incidental to rev. Meaning, it won't matter on which ref is the rev that we're interested in, and so if we were trying to fetch a rev from an ephemeral feature branch, ref = "*"; won't break, while ref = "feature-foo"; will.

And fundamentally, any remote location fetch is not long-term reproducible, it always depends on other parties.

I think the correct question (in Nix context) to ask here is: will it always produce the exact same result? The answer here seems to be yes.

@grahamc
Copy link
Member

grahamc commented Oct 4, 2018

I think the correct question (in Nix context) to ask here is: will it always produce the exact same result? The answer here seems to be yes. Having this available will make a whole class of new convenient sandboxable Nix generators possible.

It won't because the remote list of branches isn't certain to be stable, but your point about it remaining true for any branch name seems legitimate.

Perhaps the default should be instead of having to specify *, an empty ref fetches all I refs.

I think some input from @NixOS/nix-core would be helpful.

@lukateras
Copy link
Member Author

lukateras commented Oct 4, 2018

It won't because the remote list of branches isn't certain to be stable, but your point about it remaining true for any branch name seems legitimate.

It is not stable, but it only affects cache, not Nix store.

I'm not sure it is a great idea, because even '*' isn't all the refs. For example, github has lots of refs that wouldn't be fetched but are valid refs, and this list could change over time. For that reason, I feel the best way to go is specifying the correct ref from the start.

It seems that this is an optimization problem (and of course everyone is encouraged to specify refs when possible!) rather than a reproducibility problem. This patch mimics git clone behavior which also doesn't include some refs (e.g. pull requests), but there is a confined class of refs that are always included in there (i.e. "branches").

Perhaps the default should be instead of having to specify *, an empty ref fetches all I refs.

I agree; maybe it should fetch refs/tags/* as well.

I think some input from @NixOS/nix-core would be helpful.

❤️

@grahamc
Copy link
Member

grahamc commented Oct 27, 2018

@shlevy
Copy link
Member

shlevy commented Oct 27, 2018

@yegortimoshenko I think having this functionality make sense, but I'd like to ensure we're matching the same logic these other tools use and nothing more. Does stack do a git clone, or fetch all refs, or what? Same question for mix, npm, etc. Once we're settled on that we can figure out the interface and everything.

@yorickvP
Copy link
Contributor

Here's the code for what stack is doing: https://github.com/commercialhaskell/stack/blob/master/subs/pantry/src/Pantry/Repo.hs#L90

git clone --recursive <URL> cloned
cd cloned
git reset --hard <rev>
git archive -o <out-tarball> HEAD

npm seems to be resolving github deps to http urls. I don't know what happens if you explicitly give it git urls, but it's rare enough that we haven't run into it yet.

@yorickvP
Copy link
Contributor

Mix is doing:

git -c core.hooksPath='' init --quiet
git remote add origin <url>
git fetch --force --quiet
git checkout --quiet rev
if <submodules> 
  git -c core.hooksPath='' submodule update --init --recursive

@kirelagin
Copy link
Member

Just to clarify, git fetch without any refspecs consults the remote.origin.fetch configuration option, which is set by default (after git init && git remote add origin <url> or git clone <url>) to +refs/heads/*:refs/remotes/origin/*, so, in other words, it fetches all heads.

@shlevy
Copy link
Member

shlevy commented Oct 28, 2018

OK, so let's switch this to fetch all heads?

As for interface... I don't really like "*". I'm leaning toward just having ref omitted following the behavior of fetch and clone.

@lukateras
Copy link
Member Author

lukateras commented Oct 28, 2018

It already does fetch all heads:

runProgram("git", true, { "-C", cacheDir, "fetch", "--quiet", "--force", "--", uri, "+refs/heads/*:refs/heads/*" });

As for interface... I don't really like "*". I'm leaning toward just having ref omitted following the behavior of fetch and clone.

Do you mean:

fetchGit {
  url = https://github.com/foo/bar;
}

Or rather:

fetchGit {
  url = https://github.com/foo/bar;
  ref = "";
}

@shlevy
Copy link
Member

shlevy commented Oct 29, 2018

Ah! Your description said all refs, that's why I was confused. OK, then I think the current behavior is fine.

And I meant the former example there, though I think with the way the primop is implemented both would work the same.

@lukateras
Copy link
Member Author

And I meant the former example there, though I think with the way the primop is implemented both would work the same.

OK!

Unlike url or rev, ref is wrapped into std::experimental::optional, so those two would be different as far as I understand.

@mkaito
Copy link
Contributor

mkaito commented Dec 18, 2018

I don't mean to be rude, but I'd love to see this merged. Could I help it along somehow?

@edolstra
Copy link
Member

I stated my opinion here: #2431 (comment)

@yorickvP
Copy link
Contributor

We're using this in serokell/stack-to-nix (and serokell/mix-to-nix), translating git repo fetches from language-specific lock files. These files includes the hashes and url, but nothing more. External repo's are not guaranteed to even have a ref for a commit. Furthermore, for internal repos, we may need ssh keys.

In the public github case, we can use fetchTarball, but it does not generalize to arbitrary git repos and does not cache as much as it could.

The only alternative here is probably extraBuiltins, which is less than optimal for portability (but better than a patched nix).

@lukateras
Copy link
Member Author

lukateras commented Dec 19, 2018

I will update this patch following @shlevy's feedback until Friday Monday, sorry for the delay.

yorickvP pushed a commit to serokell/nixpkgs that referenced this pull request Jan 16, 2019
@yorickvP
Copy link
Contributor

This seems to be less needed with nix 2.2. It's possible github enabled fetching unadvertised refs, and a recent nix change allows the "*" through.

@kirelagin
Copy link
Member

kirelagin commented Jan 16, 2019

To be more precise, since this commit the example from this PR is translated to git fetch ... '*:*'. I don’t quite understand what this refspec means, but I would guess it says “match all refs that the remote advertises”, but I don’t know how remotes determine what refs to advertise.

On a separate note, since maintaining a patched Nix is a little annoying, here is a Nix plugin that replaces the fetchGit primop with the one from this PR: kirelagin/nix-fetchgit.

@edolstra
Copy link
Member

What change? It's certainly not intended to allow fetching *. If it does it's a bug.

@kirelagin
Copy link
Member

Oh, sorry, the link in my comment was broken. I’ll edit it. Here is the correct one: 4aee93d#diff-7e39b3dccfe1a07ed3f4af7f4d8da628R127

@edolstra
Copy link
Member

Thanks. I guess we should really validate ref since it's potentially untrusted user input that we're passing to another program on the command line. The use of -- removes most risks but it still seems better to validate it.

kirelagin pushed a commit to serokell/nixpkgs that referenced this pull request Apr 12, 2019
kirelagin added a commit to serokell/nixpkgs that referenced this pull request Apr 12, 2019
kirelagin pushed a commit to serokell/nixpkgs that referenced this pull request Apr 23, 2019
@codygman
Copy link

codygman commented Jun 18, 2020

I was bit again by this today.

I post though, because it seems with slight modification the perfect solution for our current reality is already in this thread.

Perhaps the default should be instead of having to specify *, an empty ref fetches all I refs.

To be explicit, to me that means given:

fetchGit {
  url = https://github.com/foo/bar;
}

Where as now you get this error:

fatal: not a tree object error

Instead, do a full clone and provide the warning:

Warning: No ref specified for git://repo.git.
FULL clone being performed, this could be very slow. 
Please specify the 'ref' so a shallow clone can be performed

Then if someone forgets it, the worst problem is "it's slow" but the reason why is printed right in front of the user.

That lets get the best of all worlds I think.

  1. push people towards the correct solution.

To me not specifying a ref is just bad practice. - @edolstra

I agree, but breaking compatibility here is too high of a cost with regards to barrier to entry. I think the best thing we can do is allow it since it's so pervasive, and noisily prod users towards the correct solution while dangling "it'll be faster if you just add ref" as a carrot 🙂

  1. We allow what most new nix users will do by default since as many external tools @yorickvP listed make this admittedly invalid assumption.

In summary, I think this solution lets us meet the software world and it's invalid assumptions where they are, but smooths their transition and nudges them towards more correctness and it's benefits.

And it gives people like me trying to evangelize nix to improve reproducibility in the software industry a fighting chance in actually being able to convince people to use it 😄

@Ma27
Copy link
Member

Ma27 commented Jul 11, 2020

@edolstra @grahamc is there anything to be done to get this merged? Just got bitten by this as well.

@yorickvP
Copy link
Contributor

@Ma27 ref = "*"; should work in recent nix versions already. Also, it seems github might have enabled allow-reachable-sha1-in-want recently, but reports vary.

@Ma27
Copy link
Member

Ma27 commented Jul 12, 2020

@yorickvP on nix (Nix) 2.4pre20200622_334e26b (pkgs.nixFlakes) it doesn't (for public GitHub repositories) and breaks with the following error message:

invalid Git branch/tag name '*'

@yorickvP
Copy link
Contributor

For public github repo's, fetching with the https .tar.gz archive download should always work. You can try this as a workaround. It's sad to hear that nix 2.4 breaks ref="*" again.

@Ma27
Copy link
Member

Ma27 commented Jul 12, 2020

For public github repo's, fetching with the https .tar.gz archive download should always work.

I'm well-aware of that. I described my use-case (which requires fetchGit or something similar) here: #2431 (comment)

It's sad to hear that nix 2.4 breaks ref="*" again.

Ahh, I assumed that you meant the unstable versions by mentioning "recent nix versions". It actually works on nix 2.3.7, thanks for pointing that out!

On Nix 2.4 (master and flakes variant IIRC) the fetchers have been completely rewritten (see src/libfetchers) to provide a lot more inputs easily. I'll try to restore the ref = "*"; feature later (but hopefully today).

Ma27 added a commit to Ma27/nix that referenced this pull request Jul 14, 2020
Sometimes it's necessary to fetch a git repository at a revision and
it's unknown which ref contains the revision in question. An example
would be a Cargo.lock which only provides the URL and the revision when
using a git repository as build input.

However it's considered a bad practice to perform a full checkout of a
repository since this may take a lot of time and can eat up a lot of
disk space. This patch makes a full checkout explicit by adding an
`allRefs` argument to `builtins.fetchGit` which fetches all refs if
explicitly set to true.

Closes NixOS#2409
Ma27 added a commit to Ma27/nix that referenced this pull request Jul 15, 2020
Sometimes it's necessary to fetch a git repository at a revision and
it's unknown which ref contains the revision in question. An example
would be a Cargo.lock which only provides the URL and the revision when
using a git repository as build input.

However it's considered a bad practice to perform a full checkout of a
repository since this may take a lot of time and can eat up a lot of
disk space. This patch makes a full checkout explicit by adding an
`allRefs` argument to `builtins.fetchGit` which fetches all refs if
explicitly set to true.

Closes NixOS#2409
Ma27 added a commit to Ma27/nix that referenced this pull request Jul 17, 2020
Sometimes it's necessary to fetch a git repository at a revision and
it's unknown which ref contains the revision in question. An example
would be a Cargo.lock which only provides the URL and the revision when
using a git repository as build input.

However it's considered a bad practice to perform a full checkout of a
repository since this may take a lot of time and can eat up a lot of
disk space. This patch makes a full checkout explicit by adding an
`allRefs` argument to `builtins.fetchGit` which fetches all refs if
explicitly set to true.

Closes NixOS#2409
Ma27 added a commit to Ma27/nix that referenced this pull request Jul 29, 2020
Sometimes it's necessary to fetch a git repository at a revision and
it's unknown which ref contains the revision in question. An example
would be a Cargo.lock which only provides the URL and the revision when
using a git repository as build input.

However it's considered a bad practice to perform a full checkout of a
repository since this may take a lot of time and can eat up a lot of
disk space. This patch makes a full checkout explicit by adding an
`allRefs` argument to `builtins.fetchGit` which fetches all refs if
explicitly set to true.

Closes NixOS#2409
Ma27 added a commit to Ma27/nix that referenced this pull request Sep 8, 2020
Sometimes it's necessary to fetch a git repository at a revision and
it's unknown which ref contains the revision in question. An example
would be a Cargo.lock which only provides the URL and the revision when
using a git repository as build input.

However it's considered a bad practice to perform a full checkout of a
repository since this may take a lot of time and can eat up a lot of
disk space. This patch makes a full checkout explicit by adding an
`allRefs` argument to `builtins.fetchGit` which fetches all refs if
explicitly set to true.

Closes NixOS#2409
Ma27 added a commit to Ma27/nix that referenced this pull request Sep 29, 2020
Sometimes it's necessary to fetch a git repository at a revision and
it's unknown which ref contains the revision in question. An example
would be a Cargo.lock which only provides the URL and the revision when
using a git repository as build input.

However it's considered a bad practice to perform a full checkout of a
repository since this may take a lot of time and can eat up a lot of
disk space. This patch makes a full checkout explicit by adding an
`allRefs` argument to `builtins.fetchGit` which fetches all refs if
explicitly set to true.

Closes NixOS#2409
@edolstra edolstra closed this in 2857b1b Dec 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants