Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make bash script shebangs more portable #1361

Merged
merged 3 commits into from
Jun 4, 2024
Merged

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented May 7, 2024

Summary

This pull request changes #!/bin/bash shebangs to #!/usr/bin/env bash, which is in practice more portable, as systems where bash is in $PATH but not in /bin are more common than systems where env is not in /usr/bin (the latter being mostly theoretical, though not forbidden by POSIX). This solves a practical problem on FreeBSD, and I suspect it will solve problems on some other systems in practical use that I have not yet tested on.

Details

Fixture scripts and other shell scripts in the project require a Bash shell. Not all operating systems come with Bash, but it can be installed without too much trouble on most. However, once installed, it will not necessarily be accessible using the path /bin/bash. This actually works on Windows, at least in the Git Bash environment (see #1359). It also works in GNU/Linux systems, where bash is found in /bin either directly or, on some newer systems, by /bin being a symlink to /usr/bin. It works on macOS, because the version of Bash that ships with the system, while old (< 4), is available in /bin and the scripts in this project are compatible with it.

In contrast, currently and prior to the change in this PR, running the tests on FreeBSD 14.0-RELEASE-p6 (amd64) produces 259 failures.

With the changes from this PR, most of those failures go away, with only 14 failures remaining.

Caveats

I recommend that this only be merged if CI is running all tests without using generated archives, on at least one operating system. This is intended to be the case on Ubuntu, and my guess is that it is the case, but I am not certain. See #1360.

If requested, I can rerun tests on Ubuntu and Windows to test this PR, including with GIX_TEST_IGNORE_ARCHIVES=1. However, the operating system on which the tests should probably really be run without GIX_TEST_IGNORE_ARCHIVES=1 to check the effect of these changes is macOS. The reason is that many developers using macOS have a later version of Bash installed elsewhere than /bin and preceding it in $PATH, such as via brew. With this change, the scripts would then run with that different interpreter on such systems. (The GitHub Actions runners for macOS don't seem to have a later version installed, but that should be feasible to achieve if needed for testing.)

That should not cause a problem, since after all the scripts run with various versions on developers' GNU/Linux systems and in the ubuntu-latest jobs on CI, and also because only rather strange bugs would cause scripts that work on an older version of Bash to malfunction on a newer version. Therefore, I plan to mark this as ready to review following a bit more testing on Ubuntu and Windows. But I wanted to mention it just in case I am underestimating its significance.

There are also stranger possible problems that can occur due to switching from #!/bin/bin to #!/usr/bin/env bash. See gitpython-developers/GitPython@729372f. To the best of my knowledge, no such problems apply here (makefiles are not being used, for example), but I'll check for them before marking this as non-draft.

@EliahKagan
Copy link
Member Author

Rather than making things more portable, it turns out that, at least in the absence of other changes, this breaks Windows, and possibly macOS!

On Windows, compilation still succeeds as expected, but then (or when rerun):

$ cargo nextest run --all --no-fail-fast
warning: function `evaluate_target_dir` is never used
 --> gix-prompt\tests\prompt.rs:9:8
  |
9 |     fn evaluate_target_dir() -> String {
  |        ^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(dead_code)]` on by default

warning: `gix-prompt` (test "prompt") generated 1 warning
    Finished `test` profile [unoptimized + debuginfo] target(s) in 1.43s
error: creating test list failed

Caused by:
  for `gix-macros`, command `'C:\Users\ek\source\repos\gitoxide\target\debug\deps\gix_macros-9b54cd09b507c731.exe' --list --format terse` exited with code 0xc0000135: The specified module could not be found. (os error 126)
--- stdout:

--- stderr:

---

On macOS, at least on CI, in gix-status-tests::worktree the test tatus::index_as_worktree_with_renames::changed_and_untracked_and_renamed failed:

        FAIL [   0.026s] gix-status-tests::worktree status::index_as_worktree_with_renames::changed_and_untracked_and_renamed

--- STDOUT:              gix-status-tests::worktree status::index_as_worktree_with_renames::changed_and_untracked_and_renamed ---

running 1 test
test status::index_as_worktree_with_renames::changed_and_untracked_and_renamed ... FAILED

failures:

failures:
    status::index_as_worktree_with_renames::changed_and_untracked_and_renamed

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 21 filtered out; finished in 0.02s


--- STDERR:              gix-status-tests::worktree status::index_as_worktree_with_renames::changed_and_untracked_and_renamed ---
thread 'status::index_as_worktree_with_renames::changed_and_untracked_and_renamed' panicked at gix-status/tests/status/index_as_worktree_with_renames.rs:273:5:
assertion failed: `(left == right)`

Diff < left / right > :
 [
     Rewrite {
         source_rela_path: "dir/content",
         dest_rela_path: "content-copy",
         dest_dirwalk_status: Untracked,
         diff: None,
         copy: false,
     },
     DirwalkEntry {
         rela_path: "content-copy-with-rewrite",
         status: Untracked,
         disk_kind: Some(
             File,
         ),
     },
     Rewrite {
         source_rela_path: "dir/content2",
         dest_rela_path: "content-with-rewrite",
         dest_dirwalk_status: Untracked,
         diff: Some(
             DiffLineStats {
                 removals: 0,
                 insertions: 1,
                 before: 1,
                 after: 2,
                 similarity: 0.72,
             },
         ),
         copy: false,
     },
<    DirwalkEntry {
<        rela_path: "dir/untracked",
<        status: Untracked,
<        disk_kind: Some(
<            File,
<        ),
>    Rewrite {
>        source_rela_path: "empty",
>        dest_rela_path: "dir/untracked",
>        dest_dirwalk_status: Untracked,
>        diff: None,
>        copy: true,
     },
     DirwalkEntry {
         rela_path: "plainly-renamed-content",
         status: Untracked,
         disk_kind: Some(
             File,
         ),
     },
     Rewrite {
         source_rela_path: "executable",
         dest_rela_path: "rewritten-executable",
         dest_dirwalk_status: Untracked,
         diff: Some(
             DiffLineStats {
                 removals: 0,
                 insertions: 1,
                 before: 1,
                 after: 2,
                 similarity: 0.53333336,
             },
         ),
         copy: false,
     },
<    DirwalkEntry {
<        rela_path: "untracked",
<        status: Untracked,
<        disk_kind: Some(
<            File,
<        ),
>    Rewrite {
>        source_rela_path: "empty",
>        dest_rela_path: "untracked",
>        dest_dirwalk_status: Untracked,
>        diff: None,
>        copy: true,
     },
 ]


note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The macOS failure is more surprising to me, and I think it should be investigated and an issue created for it, unless it is a one-off or easily fixed.

I may update this PR description and/or close this PR sometime soon. I don't know of a simple way to fix the problems here.

@Byron
Copy link
Member

Byron commented May 7, 2024

Thanks a lot for bringing this up!

I will be sure to test it on MacOS as proposed once it's changed to non-draft.

@EliahKagan
Copy link
Member Author

EliahKagan commented Jun 1, 2024

I rebased this onto main and made the same shebang changes in the three new fixture scripts:

C:\Users\ek\source\repos\gitoxide [freebsd ↓1 ↑108]> git grep '#!/bin/bash'
gix-index/tests/fixtures/make_traverse_literal_separators.sh:#!/bin/bash
gix-worktree-state/tests/fixtures/make_dangling_symlink.sh:#!/bin/bash
gix-worktree-state/tests/fixtures/make_traverse_trees.sh:#!/bin/bash

The first of those was also missing an end-of-file newline, which I allowed my editor to add.

I amended with that so it's all in one commit and force-pushed.


The tests still fail, but the way they fail now reveals the problem. Tests on macOS no longer fail, or perhaps that was a one-off. The only failures are on Windows, and there are 15 of them. Viewing the summaries side-by-side shows that they are the same tests that I found in #1358 to happen on Windows with GIX_TEST_IGNORE_ARCHIVES=1, except that there is one more test that failed in #1358:

        FAIL [ 180.967s] gix-ref-tests::refs packed::iter::performance

That makes sense, if it runs slower on my machine than on CI. This is also what one would expect from the other tests' timings.

I therefore believe that, at this point, changing the shebangs from #!/bin/bash to #!/usr/bin/env bash is correct, and that the resulting failures are due to the fact that the scripts differ in any way, and thus that their CRC32 hashes that are used to determine whether existing generated archives can be used are different:

https://github.com/Byron/gitoxide/blob/2cefe77203131878d0d8f5346f20f0e25b76cbea/tests/tools/src/lib.rs#L389-L407

Of course, this PR should not be merged without fixing CI. Because we are not currently using LFS, churn in generated archives might be considered undesirable. I believe doing so here will entail regenerating all the archives, since all .sh fixture scripts have been changed. Nonetheless, it seems to me that this is the right thing to do, because right now the scripts shebangs are inconsistent with the fallback behavior of running bash wherever it is found in a $PATH search:

https://github.com/Byron/gitoxide/blob/2cefe77203131878d0d8f5346f20f0e25b76cbea/tests/tools/src/lib.rs#L463-L472

Inconsistency between a primary strategy and its backup strategy is not inherently bad, since the whole point is that they must be different. However, in this case, I believe the backup strategy is intended only to cover the case where the script is not executable, either due to permissions or because the operating system does not support executing shell scripts directly. Because the error conditions that cause it to be tried are narrow, I believe it is also only, or almost only, attempted in those cases. So it seems to me that there is little or no benefit in attempting the hard-coded absolute interpreter path /bin/bash first.

I admit, however, that broadening the error condition could be an alternative approach. Or both could be done, even separately.

If the archives are to be regenerated and the results committed, then I am unsure who should do this, what kind of system it should be done on and what version of git is best to use, what aspects of the configuration if any are important, or if there are any other special considerations.

@Byron
Copy link
Member

Byron commented Jun 2, 2024

I really appreciate your research, it's making tech-debt obvious that thus far I was conveniently able to ignore.

With the abandonment of git-lfs a decision was made that it has to be OK to regenerate these archives and pay with increased full-clone times, so I think that should be done without hesitation. The maybe enormous size of gitoxide in the future will be a good motivation to build a server-side with git-lfs support that isn't so terribly priced.

As for the system to use for regeneration, that's a great question. My only thought on this is that it should be one that supports symlinks properly, a any Unix-y system should do. I also think you should try it - maybe there are issues around that which could be fixed along with the work done here.

I am very much looking forward to merging this, and thank you again for your fantastic work 🙏.

(specifically, I love how you don't jump right into the code, but strive for a full understanding of the supporting infrastructure and the functioning of the CLIs first. The bugs and security issues you find on the way super-valuable outcomes to me, and I think it's a rare talent.)

Changing the scripts' shebangs caused their CRC32 checksums, which
the test suite uses in deciding whether to use a pre-generated
archive instead of running them, to differ.

This was done by running `cargo nextest run --all --no-fail-fast`
on an Ubuntu 22.04 LTS system with Git 2.45.2 installed using the
package from https://launchpad.net/~git-core/+archive/ubuntu/ppa.
This resembles the ubuntu-latest CI image but is not identical to
it. All tests passed on that system, as expected. This commits the
modified (i.e. regenerated) archives.
@EliahKagan
Copy link
Member Author

EliahKagan commented Jun 3, 2024

I regenerated archives by running the tests without setting GIX_TEST_IGNORE_ARCHIVES and committing the changes. I am unsure if this was the best approach.

As for the system I did it on, it was an Ubuntu 22.04 LTS system with the currently latest stable version of Git, 2.45.2, installed on it via the git-core PPA. This is similar, though not the same as, what the ubuntu-latest images on GitHub Actions currently provide. After doing this, I tested with and without that on that same Ubuntu system, and on Windows, both in the usual way where generated archives are used, and with GIX_TEST_IGNORE_ARCHIVES=1. The results seemed reasonable, passing in all the situations they had passed before.

See the notes and further details in https://gist.github.com/EliahKagan/e83322aba8687589df874943ad203e9f, or whatever parts may be of interest, for further information. I would also be pleased to expand the information here in this PR on request.

I think this is ready now for any manual testing that ought to be done on macOS. I could eventually do that, but I don't have regular interactive access to a macOS system and I can't do it right now.

It would make a lot of sense to test this on FreeBSD, which was the proximal motivation for this, even though not really the most important reason for this change (which should improve portability and compatibility broadly, in a way that is not specific to FreeBSD). However, I don't have the ability to run my FreeBSD system at the moment. I should be able to do that in the next couple of days. I am unsure if that should be regarded as a blocker.

@EliahKagan EliahKagan marked this pull request as ready for review June 3, 2024 08:15
@EliahKagan
Copy link
Member Author

EliahKagan commented Jun 3, 2024

I have been able to run the tests manually on macOS 14. I have updated the big gist with details of that, and also expanded it in some other ways and made it clearer and much easier to navigate (including, I hope, making it easy to find whatever, and however much or little, one wishes to examine).

A couple of archives are generated and shown as modified when running the tests even without setting GIX_TEST_IGNORE_ARCHIVES. This was of course the case after the shebangs were edited in the scripts, but I mean that it is the case for a small number of them otherwise, both on the main branch from before any of these changes, and on the feature branch after the archives are regenerated.

The gist, including the expanded readme, has detailed information about this. Of the three archives that had this issue before, only one does. But on macOS (only), one new archive has this issue. It strongly appears to be an archive that never used outside of macOS, thus it was not regenerated when I ran the tests on Ubuntu to regenerate the archives.

I will verify that this effect on macOS is reproducible and, assuming it is, I'll try adding another commit that updates that archive. Please note that the linked gist does not yet have information about such an update, which as of now I have not yet done, and I may not have time to reflect it fully in the gist immediately after doing it.

Edit: I've done that, and added the logs for it to the gist, though not yet documented them there. In the gist, those logs are those whose names begin round2-. As shown, the first clean rerun of the tests after committing the changes to the archive gix-discover/tests/fixtures/generated-archives/make_exfat_repo_darwin.tar.xz had a failure in the gix-worktree-state-tests::worktree test state::checkout::delayed_driver_process, which I believe to be unrelated. It looked to me like this was due to random events, and I did three more clean reruns after that, all of which passed. Since all CI checks also pass, we might be in good shape.

@EliahKagan
Copy link
Member Author

EliahKagan commented Jun 4, 2024

As a further update, I have now:

  • Updated the gist with information about the "second round" in which the macOS-specific archives was regenerated and committed, and tested.
  • Tested on FreeBSD. The benefits described in the original description here are preserved. More details on this are in the gist, though I have not analyzed it in depth.

In hindsight, the big gist, due to growing to so many files, ought to have just been a regular repository, for easier navigation. However, I think it is usable, since the readme links to each of the files with a description. I may manage to upload a non-gist version as well (the hyperlinks have to be written differently for that), and if I do I'll add a link here, but I wanted to give this updated information now.

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

And thanks so much for sharing the Gist, I feel I should use them more for keeping track of things if I ever get to doing work that diligently! Admittedly, I just rely on CI and usually don't bother to test on other systems (with Windows being the exception, and Debian ever so rarely as well). This is also why your work is so valuable to me - it's finding issues I'd never have found while slowly chipping away on this huge codebase to understand it in depth.

A couple of archives are generated and shown as modified when running the tests even without setting GIX_TEST_IGNORE_ARCHIVES. This was of course the case after the shebangs were edited in the scripts, but I mean that it is the case for a small number of them otherwise, both on the main branch from before any of these changes, and on the feature branch after the archives are regenerated.

If archives that have been considered fresh by one system are regenerated on another, that would mean there was a hash-mismatch which shouldn't be possible. I think the algorithm is roughly as follows:

  • hash the script
  • see if the corresponding folder is available and not in an error state.
  • if it is not in an error state and exists, it is used
  • if it is in an error state, it will be cleared and regenerated
  • if it does not exist it will be regenerated
  • regeneration will be from an archive if that archive matches the hash of the script
  • if it does not match the hash of the script, or there is no archive, the script will be run. If the run fails, the generated folder is in an error state so it won't be used.
  • if the archive is name is not excluded in .gitignore, it will be created from the newly generated folder, possibly overwriting an existing archive.

Maybe somewhere in there is the possibility for this to happen, it's probably easier to look at the code if you think it might indeed have regenerated archives that it shouldn't have.

As shown, the first clean rerun of the tests after committing the changes to the archive gix-discover/tests/fixtures/generated-archives/make_exfat_repo_darwin.tar.xz had a failure in the gix-worktree-state-tests::worktree test state::checkout::delayed_driver_process, which I believe to be unrelated. It looked to me like this was due to random events, and I did three more clean reruns after that, all of which passed. Since all CI checks also pass, we might be in good shape.

These tests are special as they build a binary using cargo, which takes time and is somewhat prone to interruptions or issues related to the spawned process. The reason it worked fine after is probably due to the cargo process not actually doing anything once it realized the binary is fresh. However, as failures like these could contribute to CI-flakiness, it's something to keep an eye on.

From my experience, CI now is rock-solid and is fully deterministic even when run with extreme parallelism. But apparently, this isn't the case under all conditions or on all platforms (yet).

@Byron Byron merged commit 9c65d98 into GitoxideLabs:main Jun 4, 2024
19 checks passed
@EliahKagan EliahKagan deleted the freebsd branch June 4, 2024 13:22
@EliahKagan
Copy link
Member Author

And thanks so much for sharing the Gist, I feel I should use them more for keeping track of things if I ever get to doing work that diligently!

I often don't make them, and sometimes I just use them as a nicer version of a pastebin. In this case I should've made it a full-fledged repository, which I think is better when there are big files and more than a few files. I've gone ahead and made that version (basically just internal hyperlinks are different), in case anything still makes sense to look at or in case something has to be investigated in the future.

Maybe somewhere in there is the possibility for this to happen, it's probably easier to look at the code if you think it might indeed have regenerated archives that it shouldn't have.

gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar.xz seems always to be modified, both before and after the changes in this PR. (A couple of other files were always modified before the changes here, but committing the regenerated archives fixed that.) I'll look into it.

These tests are special as they build a binary using cargo, which takes time and is somewhat prone to interruptions or issues related to the spawned process. The reason it worked fine after is probably due to the cargo process not actually doing anything once it realized the binary is fresh. However, as failures like these could contribute to CI-flakiness, it's something to keep an eye on.

Are such binaries cached and reused even if, between test runs, git restore . and gix clean -xde are used? I did that each time here.

@Byron
Copy link
Member

Byron commented Jun 5, 2024

Are such binaries cached and reused even if, between test runs, git restore . and gix clean -xde are used? I did that each time here.

Yes, they are re-used and stored in target/ which is git-ignored. It's normal cargo build artefacts.

@EliahKagan
Copy link
Member Author

Shouldn't the -x option in gix clean -xde remove git-ignored files? It seems generally sufficient to remove build output, both from cargo nextest and from cargo build. I have not found there to be anything left to remove by cargo clean after running gix clean -xde (which is why I have not also been running cargo clean).

@Byron
Copy link
Member

Byron commented Jun 5, 2024

Oh, yes, sorry, I didn't read carefully enough. Indeed, then target/ would be wiped and the rebuild happens. cargo is able to deal with multi-process activity, while the tests that need it will not try to do so in parallel either. So in theory, it should all be safe and deterministic.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Jun 30, 2024
These are the changed archives generated by fixture scripts by
running `cargo nextest run --all --no-fail-fast` on an Ubuntu 22.04
system. (All tests passed, as expected.)

The changed archives divide into two cases. First, there are those
that are changed due to the stylistic changes to heredocs in the
preceding commit 2641f8b (which change the CRC32 hashes of the
scripts and thus cause archives to be regenerated):

* gix-index/tests/fixtures/generated-archives/v2_icase_name_clashes.tar
* gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink.tar
* gix/tests/fixtures/generated-archives/make_rev_spec_parse_repos.tar

Second, there are those that were already being generated when the
tests were run, rather than using the committed archives:

* gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar
* gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar

These were being generated every time that `nextest` command was
run on Unix-like systems (or at least GNU/Linux and macOS). I don't
know why this was happening, but it suggests a bug somewhere.

- That oddity precedes 55c635a (GitoxideLabs#1425), I just didn't commit the
  two archives not related to the changes being made there at that
  time, since unlike here, that was not a cleanup PR.

- It precedes, or mostly precedes, the change in dcab79a (GitoxideLabs#1415). At
  least the first of those archives, make_diff_repo.tar, already
  behaved this way in its .tar.xz form before that, as noted in:
  GitoxideLabs#1361 (comment)
LuaKT pushed a commit to LMonitor/gitoxide that referenced this pull request Aug 20, 2024
These are the changed archives generated by fixture scripts by
running `cargo nextest run --all --no-fail-fast` on an Ubuntu 22.04
system. (All tests passed, as expected.)

The changed archives divide into two cases. First, there are those
that are changed due to the stylistic changes to heredocs in the
preceding commit 2641f8b (which change the CRC32 hashes of the
scripts and thus cause archives to be regenerated):

* gix-index/tests/fixtures/generated-archives/v2_icase_name_clashes.tar
* gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink.tar
* gix/tests/fixtures/generated-archives/make_rev_spec_parse_repos.tar

Second, there are those that were already being generated when the
tests were run, rather than using the committed archives:

* gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar
* gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar

These were being generated every time that `nextest` command was
run on Unix-like systems (or at least GNU/Linux and macOS). I don't
know why this was happening, but it suggests a bug somewhere.

- That oddity precedes 55c635a (GitoxideLabs#1425), I just didn't commit the
  two archives not related to the changes being made there at that
  time, since unlike here, that was not a cleanup PR.

- It precedes, or mostly precedes, the change in dcab79a (GitoxideLabs#1415). At
  least the first of those archives, make_diff_repo.tar, already
  behaved this way in its .tar.xz form before that, as noted in:
  GitoxideLabs#1361 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants