Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dirwalk ignores non-regular files #1727

Merged
merged 4 commits into from
Dec 18, 2024
Merged

dirwalk ignores non-regular files #1727

merged 4 commits into from
Dec 18, 2024

Conversation

Byron
Copy link
Member

@Byron Byron commented Dec 18, 2024

Supersedes #1629.

Let's try to keep the change minimal, but consider what's in the superseded PR.

Tasks

  • test
  • fix

@Byron Byron force-pushed the dirwalk-ignore-non-regulars branch from 287f7c2 to a33b624 Compare December 18, 2024 14:55
@Byron Byron mentioned this pull request Dec 18, 2024
@Byron Byron marked this pull request as ready for review December 18, 2024 15:01
@Byron Byron force-pushed the dirwalk-ignore-non-regulars branch from a33b624 to 2090f7d Compare December 18, 2024 15:04
@Byron Byron enabled auto-merge December 18, 2024 15:04
…ned instead.

That way, algorithms relying on dirwalking can still see them if they want to,
but would have a hard time to use them (accidentally).

Note that this replaces the `From` implementation with `entry::Kind::try_from_file_type()`,
which makes this a breaking change.`
@Byron Byron force-pushed the dirwalk-ignore-non-regulars branch from 2090f7d to a49c960 Compare December 18, 2024 15:34
@Byron Byron merged commit 69ee6a3 into main Dec 18, 2024
20 checks passed
@Byron Byron deleted the dirwalk-ignore-non-regulars branch December 18, 2024 16:03
Copy link
Member

@EliahKagan EliahKagan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, and gix clean benefits for sure.

In particular, as shown in this gist, this makes it so that when gix clean directly sees a FIFO (named pipe) on a Unix-like system, it disregards it as git clean does, rather than deleting it.

Before this change:

ek in 🌐 catenary in gitoxide on  main is 📦 v0.39.0 via 🦀 v1.83.0
❯ cargo run --bin=gix -- clean -n
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.77s
     Running `target/debug/gix clean -n`
WOULD remove pipe

Skipped 1 expendable entry - show with -x

After this change:

ek in 🌐 catenary in gitoxide on  main is 📦 v0.39.0 via 🦀 v1.83.0 took 14s
❯ cargo run --bin=gix -- clean -n
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.73s
     Running `target/debug/gix clean -n`
Nothing to clean (Skipped 1 expendable entry - show with -x)

There are some things relevant to #1629 and conceptually relevant to this PR and its goals that are not covered here, but that may be out of scope here. Some may even be outside the broader originally intended scope of #1629 itself. (In the following, I always mean the same thing by "FIFO" and "named pipe," leaning toward using the latter when talking about Windows only because it is the term used in the Microsoft documentation.)

  1. While the bug this fixes is not specific to the way cargo uses gix, the behavior of cargo publish was the stated motivation for #1629, as well as possibly being the main motivation for this fix, going by how #1695 classifies it. But to fix the behavior of cargo publish in the presence of FIFOs, it may be necessary for cargo to make more changes than just using a newer gix-dir:

    • It is possible to publish a crate from a directory that is not a Git repository, and when that is done, the bug still occurs, because the separate directory walking logic cargo implements also opens non-regular files including FIFOs.
    • In general, cargo publish seems to attempt to archive non-regular files, other than directories, as regular files, by opening them for read. This includes even symlinks, which it seems to dereference. (This also includes symlinks that resolve to non-regular files including FIFOs--when I do a dry-run publish of a crate that has a symlink to a FIFO, it blocks reading from the FIFO.)
  2. Both Git and gitoxide attempt to read from .gitignore even when it is a FIFO, in at least some situations. I'm not totally sure what the preferred behavior is for that, since it's a very strange situation and maybe there are use cases for .gitignore files to be kinds of filesystem entries that git cannot track. But it seems to me that blocking indefinitely should be avoided even here, and that this is a bug in Git and gitoxide.

    git status and git clean -n block in a repository where .gitignore is a FIFO. gix status and gix clean -n blocked in such a repository prior to the fix in this PR. Now, when there are no untracked files, gix status no longer blocks (which surprised me), but gix clean -n still does. But when there is a new unstaged regular file in addition to the FIFO, both gix status and gix clean -n still block. Both before and after this PR, when a gix command blocks, it blocks twice, i.e., it tries to open the .gitignore FIFO for read twice instead of just once, while in contrast git commands block just once. Or, to state it more carefully, I have to press Ctrl+C only once with git but twice with gix, which I think, but do not know for sure, is due to the entry being opened again.

  3. The tests here are rightly skipped on Windows. The reason this is the right thing to do, at least as they are currently written, is that they are all testing with FIFOs, and they use a fixture whose creation script uses the mkfifo command.

    Windows does have named pipes, and a mkfifo command does exist in Git Bash and other MSYS2 environments. But the MSYS2 mkfifo command does not actually create named pipes! Instead, it creates .lnk shortcut files that MSYS2 programs--programs that link to msys-2.0.dll and use it for POSIX system call emulation--treat as named pipes. I think Cygwin programs--programs that use cygwin1.dll--can also use them. Native Windows programs just see them .lnk shortcut files--which, at least typically, cannot be dereferenced and, in any case, are not named pipes.

    Although Git for Windows ships a MSYS2 environment, it is a native Windows program. More precisely:

    • Although git somewhat misleadingly places its Windows-specific implementations of various functions that are natively available on Unix-like systems in a mingw.c file, this is really an implementation detail of a native Windows program.
    • All directly Git-related binaries shipped in Git for Windows--the git.exe wrapper in cmd/ as well as the more important git.exe in a directory usually called mingw64/bin/ or mingw32/bin/ that it wraps, as well as all binaries inside the git-core directory whose location git --exec-path gives--are native Windows programs rather than MSYS2 programs.
    • Interpreters such as bash and perl are MSYS2 programs, however, so parts of Git that are implemented as scripts for them are exceptions in that they are effectively non-native. (Likewise, scripts provided by the user, such as hooks, if they are shell scripts, will also run in interpreters that perform POSIX functions using msys-2.0.dll.)

    The git command on Windows therefore--and I very much believe intentionally and rightly--gives no special semantics to any .lnk files, even if they happen to be treated specially by MSYS2. If one is to create named pipes on Windows for tests, it cannot be done with mkfifo in Git Bash or another MSYS2 or (as far as I know) any other Cygwin-like environment.

  4. At least with respect to FIFOs (named pipes), it is possible that the issue this is fixing should be considered not to affect Windows. My view is that probably we should consider Windows as affected and eventually try to test for the much weirder and harder to produce situation with named pipes that appear in a repository on Windows. But this is not obvious, and in order to decide, it may be necessary to consider this alongside other kinds of unusual filesystem entries on Windows, some of which are entirely Windows-specific.

    Unlike on Unix-like systems where they can be created just about anywhere (which the MSYS2 mkfifo situation with specially treated .lnk is meant to emulate), on Windows named pipes are only allowed to exist in a named pipe filesystem. On Windows, named pipes live under \\.\pipe\ or, more broadly, under paths of the form \ServerName\pipe\. They are not directly present elsewhere.

    Such paths resist being exposed in a way that allows a Git repository to include them. For example, a directory symlink can point to \\.\pipe\ and a file symlink can point to any particular named pipe, and these symlinks are valid and dereferenceable. But symlinks in repository working trees--regardless of whether named pipes are involved and regardless of whether the symlinks point outside the volume that holds them--do not cause their target entries to be part of a Git repository.

    When an NTFS junction (see #1354 (comment)), which can usually be viewed as similar to a recursive bind mount on Unix-like systems that support them (mount --rbind), is present in a working tree, Git does deference it and regard the contents of the junctioned-to location as part of the repository. However, junctions do not work across filesystems; one cannot make a junction from an ordinary volume to a path like \\.\pipe\ (nor to an individual pipe, though that is also because they can only point to directory-like entries).

    It is nonetheless possible to make this elusive scenario occur, so named pipes (not named in a way that makes them special to git--this is independent of the .gitignore behavior described above) are seen as part of a repository, and so git commands even try to read form them and block. The way I have found to make this happen is to place a junction in a repository working tree that points to a directory symlink on the same volume, where the directory symlink itself points to \\.\pipe\. I would guess that there are other ways.

    By the way, if performing this experiment, it is probably best to use a virtual machine that is not being used for anything else, since attempts are made to read from a bunch of named pipes on the system--anything under \\.\pipe\ may be read from!--including some where weird things happen. However, this is an area where I did not follow my own advice. The main effect I noticed, and reproduced, was that all consoles that were started before the experiment failed to be recognized as consoles by native Windows programs afterwards that check device type--for example, I could not run vim except in a new console--even though they seemed otherwise to continue working.

    One reason I mention the risk is to clarify what I mean when I say the situation is hard to produce. The experiment, as I performed it, is not too hard to set up, having figured out how. But it might be tricky to find an approach that wouldn't risk subtle havoc when run in automated tests.

    (I found that the behavior of MSYS2 programs when real named pipes are exposed to them using this technique is also very interesting, but I omit it here for, uh, brevity?)

There are a few more gix clean related things I plan to try. I'll open issues for any bugs I do find (if any), and also probably for the situation when .gitignore is a FIFO.

Edit: I've opened #1729 about a new gix clean panic.

@Byron
Copy link
Member Author

Byron commented Dec 19, 2024

Thanks so much for sharing!

But to fix the behavior of cargo publish in the presence of FIFOs, it may be necessary for cargo to make more changes than just using a newer gix-dir:

That's true! There are non-git codepaths and probably they would fall for named pipes, even though I didn't validate it. Interestingly I am not aware of an existing issue in Cargo even, so it seems to be rare.

2. Or, to state it more carefully, I have to press Ctrl+C only once with git but twice with gix, which I think, but do not know for sure, is due to the entry being opened again.

What happens is that Ctrl + C is implemented so the first time, it will set a boolean to true to indicate the maybe long-running operation to stop gracefully. This is how the application winds down by itself. As the application is stuck reading from a pipe, that doesn't work though. Once the user hits Ctrl + C again, that's a sign it's not working and the standard interrupt handler is invoked to abort the program (but not without clearing out tempfiles that are currently not being read from or written to).

4. One reason I mention the risk is to clarify what I mean when I say the situation is hard to produce. The experiment, as I performed it, is not too hard to set up, having figured out how. But it might be tricky to find an approach that wouldn't risk subtle havoc when run in automated tests.

That was a very interesting read, even though I kept thinking that Windows seems to be quite broken if sideeffects like these are even possible. My feeling is that something in the console might have started reading, and blocking on, the named pipe, and now isn't able to respond to events anymore that help classify it as console. It's a denial of service for the console in question, probably a primtiive for further exploitation.

4. (I found that the behavior of MSYS2 programs when real named pipes are exposed to them using this technique is also very interesting, but I omit it here for, uh, brevity?)

😁

Edit: I've opened #1729 about a new gix clean panic.

Wonderful - this actually tells me that the natural behaviour of ignoring these named pipe entries that I just accepted for my convenience isn't actually correct, and that they should be officially pruned after all. Then all other tools should handle them correctly as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants