opam init cannot find tools #5373

Closed
kolmodin opened this issue Dec 1, 2022 · 16 comments · Fixed by #5381 or #5383

@kolmodin

kolmodin commented Dec 1, 2022

I'm trying to run opam init for the very first time.

$ opam init
No configuration file found, using built-in defaults.
Checking for available remotes: none.
  - you won't be able to use rsync and local repositories unless you install the rsync command on your system.
  - you won't be able to use git repositories unless you install the git command on your system.
  - you won't be able to use mercurial repositories unless you install the hg command on your system.
  - you won't be able to use darcs repositories unless you install the darcs command on your system.

[WARNING] Recommended dependencies -- most packages rely on these:
  - make
  - cc
[ERROR] Missing dependencies -- the following commands are required for opam to operate:
  - curl or wget: A download tool is required, check env variables OPAMCURL or OPAMFETCH
  - diff
  - patch
  - tar
  - unzip
  - getconf
  - bwrap: Sandboxing tool bwrap was not found. You should install 'bubblewrap'. See https://opam.ocaml.org/doc/FAQ.html#Why-does-opam-require-bwrap.

However, I do have most (or all) of those tools.
Using strace I see that opam finds the correct path, but seems to reject it and continue the search.

$ strace opam init
...
statx(AT_FDCWD, "/usr/local/sbin/curl", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0xffefaee0) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/usr/local/bin/curl", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0xffefaee0) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/usr/sbin/curl", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0xffefaee0) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/usr/bin/curl", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0755, stx_size=268504, ...}) = 0
statx(AT_FDCWD, "/sbin/curl", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, 0xffefaee0) = -1 ENOENT (No such file or directory)
statx(AT_FDCWD, "/bin/curl", AT_STATX_SYNC_AS_STAT, STATX_BASIC_STATS, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0755, stx_size=268504, ...}) = 0
...

And indeed, there is a /usr/bin/curl binary which I can execute.

$ ls -l /usr/bin/curl 
-rwxr-xr-x 1 root root 268504 Oct 27 21:38 /usr/bin/curl

Why doesn't opam find the tools on my system?

$ opam config report
# opam config report
# opam-version         2.1.3 
# self-upgrade         no
Fatal error:
"lsb_release": permission denied.

Note that I'm able to run lsb_release.

$ lsb_release -i
Distributor ID:	Debian
@dra27
Member

dra27 commented Dec 1, 2022

Ouch - something appears to be going very wrong with opam's permissions checking on commands, judging by the fact you can't even get opam config report to output full information! Two things which might help figure out what's happening here - what do you get for id -Gn | wc -w and would it be possible to have the strace output for geteuid, getegid and getgroups from strace opam init, please?

@kolmodin
Author

kolmodin commented Dec 6, 2022

This is on my employer's machine, running a custom Linux distro based on Debian testing.
I'd prefer not to give any exact numbers, but I can say that:

  • id -Gn | wc -w returns a large number, more than 32.
  • geteuid32 succeeds and returns a number
  • getgroups32 asks for size 32 and fails with -1 EINVAL (Invalid argument)
  • getegid is not called

opam's function resolve_command seems overly conservative.

let t_resolve_command =

From the bash man page it seems bash resolves executables by looking for an executable with the given name in PATH. From the man page:

COMMAND EXECUTION
     .... snip ....
      If the name is neither a shell function nor a builtin, and contains no slashes,
     bash searches each element of the PATH for a directory containing an executable file by that name.
     .... snip ....

If bash resolves to an executable which may or may not be executable by the current user, surely it'd be ok if opam does the same? One could argue that it'd be less surprising if they resolve to the same executable, even if the user is not allowed to execute it.

@dra27
Member

dra27 commented Dec 6, 2022

Thanks for this! (I must confess to being disproportionately pleased to have correctly identified that Unix.getgroups was failing!). opam is doing the same PATH resolution as bash would - bash also does the execution test (it's in POSIX). There's a mismatch between the <limits.h> that opam was compiled with (or rather the OCaml with which opam was compiled) and the running kernel.

The fix in opam I think will be to remove the manual permissions checking and use Unix.access, which is what we should always have been doing, but if you can indulge me, I'm very curious to work out exactly how this has actually happened here! OCaml has clearly been compiled thinking NGROUPS_MAX is 32. On Linux, that implies either kernel 2.6.2 (!!) or something being up with <limits.h> (i.e. a wrong file somewhere). Did you build opam from sources, use a binary download, or install it from a package manager? I'm curious what you get from:

#include <stdio.h>
#include <unistd.h>
#include <limits.h>

int main (void) {
   /* Runtime limit from the kernel vs. the compile-time constant from <limits.h>. */
   printf("sysconf(_SC_NGROUPS_MAX) = %ld\nNGROUPS_MAX = %d\n", sysconf(_SC_NGROUPS_MAX), NGROUPS_MAX);
   return 0;
}

@kolmodin
Author

kolmodin commented Dec 6, 2022

Yes, well done! :)
Ah, maybe bash is doing more than it says in the man page.

$ gcc limits.c -o limits
$ ./limits 
sysconf(_SC_NGROUPS_MAX) = 65536
NGROUPS_MAX = 65536

Turns out I have multiple opam binaries installed.
One is from Debian, installed at /usr/bin/opam. It can run opam config report, so it looks like it does not have this problem.

The other opam executable on my system is in my home directory at ~/bin/opam. My .bash_history tells me I installed it from https://opam.ocaml.org/doc/Install.html by manually downloading the executable from github (not using install.sh).
I, apparently, had installed the i686 version instead of x86_64 (md5sum confirms ~/bin/opam is i686). But both i686 and x86_64 fail to run config report in the same way as I originally reported.
It could mean that all opam binaries offered on github have the wrong limits.h.

For me personally, I can work around this by using version 2.1.2 provided by Debian.

@dra27
Member

dra27 commented Dec 6, 2022

Indeed - having checked further, we build the binaries with Alpine and it turns out that has NGROUPS_MAX set to 32! We'll get that fixed (there's a 2.1.4 release needed for OCaml 5.0 support, so we'd be publishing new binaries soon anyway).

@kit-ty-kate
Member

kit-ty-kate commented Dec 14, 2022

@kolmodin This should have been (hot-)fixed for the opam 2.1.4 release binaries. You can update to it using the usual install script.

Now I don't think we should close this issue before fixing it properly (proposed ideas have been discussed in #5381)

@kolmodin
Author

Thanks Kate! I'll try it out when I have a chance.

@dra27
Member

dra27 commented Dec 16, 2022

Are we able to link it to any issue in musl? The other proposals are good improvements (and we should/will do them), but the bug was in the building of our release binaries, not in opam itself, so this is fixed.

@kit-ty-kate
Member

musl doesn't have a bug tracker, only a mailing list, and I can count at least two "tickets" about it on there:

@kit-ty-kate
Member

I asked around on IRC and @nekopsykose can act as liaison if anything changes on this issue in musl, while the devs set up a bug tracker we can point to.

@Timmmm

Timmmm commented Sep 5, 2023

We've tried with the 2.1.5 release and still get this error. It was quite hard to track this down. Especially annoying because the check for curl or wget ignores --bypass-checks, even if you set OPAMFETCH.

Kind of insane that the issue is the number of groups. I don't know what exactly you're doing but have you considered that it's a bit over-engineered? Surely you could just try running command --version and see if it fails?

@kit-ty-kate
Member

We've tried with the 2.1.5 release

Which binary did you use? Is that from a distribution or using the install.sh script?

@Timmmm

Timmmm commented Sep 5, 2023

The one from GitHub Releases. We also tried the install script but I believe it uses the same binary.

@kit-ty-kate
Member

Arf, it looks like it was hotfixed in 2.1.4 but we forgot to merge #5383, so I believe we forgot about it in 2.1.5. That's embarrassing, sorry about that.

While we figure this out, could you try the 2.1.4 release instead?

@Timmmm

Timmmm commented Sep 18, 2023

Yep, 2.1.4 worked 👍🏻

@kit-ty-kate
Member

All the 2.2.0 releases going forward have the fix. As soon as #5726 is merged you should be able to run the following and get a version that includes the fix:

bash -c "sh <(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh) --dev"

@kit-ty-kate kit-ty-kate added this to the 2.3.0~alpha1 milestone Sep 17, 2024