Various fixes #1374
Note that this commit also streamlines obtaining a relative path for a directory, which previously could panic.
That way it's easier to ensure that forbidden names are never used as part of path components.
…ding paths. This way, everyone using the stack with the purpose of altering the working tree will run additional checks to prevent callers from sneaking in forbidden paths. Note that these checks don't run otherwise, so one has to be careful to not forget to run these checks whenever needed.
That way, detailed information about the path-to-be is available not only for evaluating attributes or excludes, but also for validating path components (in this case, relevant for `.gitmodules`).
…ctNTFS`. This also adds `gitoxide.core.protectWindows` as a way to enforce additional restrictions that are usually only available on Windows. Note that `core.protectNTFS` is always enabled by default, just like [it is in Git](git/git@9102f95).
It's probably best not to try to protect against violations of constraints in this free-to-mutate data-structure and instead suggest to validate entry paths before using them on disk (or use the `gix_worktree::Stack`).
…otectWindows` is enabled. Note that a trailing `.` is forbidden for some reason, while a trailing ` ` (space) is forbidden because it is simply ignored when creating directories or files, allowing them to be clobbered and merged silently.
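The trailing-character rule described in this commit message can be sketched as a check on a single path component. This is only an illustration: `has_forbidden_windows_suffix` is a hypothetical name, not a function from `gix-validate`, and the sketch looks at nothing but the last byte.

```rust
/// Hypothetical helper for illustration only; not the `gix-validate` API.
/// Returns true if `component` ends in a byte that the commit message above
/// describes as forbidden when Windows protections are enabled.
fn has_forbidden_windows_suffix(component: &[u8]) -> bool {
    // A trailing space is dropped when the file or directory is created, so
    // `hooks ` would silently land on `hooks`; a trailing dot is rejected too.
    matches!(component.last(), Some(b'.') | Some(b' '))
}
```

As written, `has_forbidden_windows_suffix(b"hooks ")` and `has_forbidden_windows_suffix(b".git.")` return `true`, while `README.md` passes. The `.` and `..` components are flagged as well, since they end in a dot.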
This should not be incorporated into automated tests in its current form. It is a proof of concept to generate repositories that attempt to install real executables in directories where they may be run, whereas test fixtures should completely limit all effects to testing directories, even in the event of regressions or unexpected failures.
+ Refactor for brevity.
Because some sed implementations, at least the one on macOS, detect invalid text in the current locale's encoding and error out. See: https://stackoverflow.com/questions/19242275/re-error-illegal-byte-sequence-on-mac-os-x This makes the script work on macOS.
Because committing the staged paths creates the necessary Git tree objects irrespective of what directories exist or are otherwise represented. In addition to simplifying the proof-of-concept repository, this also makes it so its entries are properly ordered in its Git object database, so `git fsck` does not report errors about that, and exits reporting success (though of course still warns about the presence of `..` components).
Because the output of `git commit` should show that information.
The repo this script makes attempts to check out entries traversing the default `$INDEX_ALLOCATION` directory stream of the `.git` directory, whose stream name is documented to be `$I30`. However, although I am able to access directories under this naming scheme through other applications, the repositories this script currently creates do not appear to trigger the bug in gitoxide. The next step is to try specifying the stream type explicitly.
This seems more effective at revealing such a vulnerability. I don't know why, since both should in principle work fine.
The repo the script makes contains a filename with slash characters in it that, if not rejected, will install a pre-commit hook.
This requires xxd now, but it honors its /bin/sh hashbang line, no longer assuming printf understands \xNN in a format string.
This is needed on some Git versions. It seems it was not needed on older versions, even though their git-fsck detected the unusual filenames when run. It is supported even on those older versions, so the script should still run on them.
The repo this script makes contains a filename with a slash character in it that, if not rejected, will create a file above the working tree. This is a modification of make_traverse_dotgit_slashes.sh. Both require some further revision, and since most of their content is duplicated, it may be worthwhile to combine them to avoid that.
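The kind of rejection these proof-of-concept repositories are meant to exercise can be sketched as a per-component check on tree-entry names. The function name and the `reject_backslash` flag are hypothetical and are not taken from `gix-validate`; the flag only stands in for the idea that backslashes matter when Windows protections are on.

```rust
/// Hypothetical helper for illustration only; not the `gix-validate` API.
/// Returns true if `name` is usable as a single tree-entry name, meaning it
/// cannot be interpreted as more than one path component.
fn is_plain_component(name: &[u8], reject_backslash: bool) -> bool {
    match name {
        // Empty names and the `.`/`..` traversal components are never valid.
        b"" | b"." | b".." => false,
        // Any embedded separator would let one entry address a nested or
        // outside location, such as `.git/hooks/pre-commit` or `../file`.
        _ => !name
            .iter()
            .any(|&b| b == b'/' || (reject_backslash && b == b'\\')),
    }
}
```

With this sketch, `.git/hooks/pre-commit` and `../outside` are rejected regardless of the flag, while a name containing backslashes is rejected only when `reject_backslash` is `true`.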
Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com>
- assure `con` is checked for, and that it's not overzealous.
- reduce code duplication
- improve documentation about more obscure parts of the code, based on the description in [this commit](git/git@e7cb0b4)
- upper-case device names in comparisons as this is their canonical form, which also is more recognizable for people who are looking for them.
- make clear why there is asymmetry between COM and LPT numbers.
- Don't make a partial control-character check, but a complete one (i.e. *b < 32|0x20)
- Add more variants for stream type tests (as regression protection, the code doesn't really care)
- various clarifications in path-related tests on Windows

Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com>
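A device-name check that catches `con` without being overzealous could look roughly like the sketch below. It is not the code from this commit. The function name is hypothetical, the digit range 1 through 9 for both `COM` and `LPT` is an assumption that does not model the asymmetry mentioned above, and treating everything after the first `.` or `:` as irrelevant is likewise an assumption about how Windows resolves these names.

```rust
/// Hypothetical sketch for illustration only; not the `gix-validate` code.
/// Returns true if `component` names a reserved Windows device, optionally
/// followed by an extension, a stream suffix, or trailing spaces.
fn is_reserved_windows_device(component: &str) -> bool {
    // Assumption: only the part before the first `.` or `:` matters.
    let stem = component
        .split(|c| c == '.' || c == ':')
        .next()
        .unwrap_or("")
        .trim_end_matches(' ');
    // Compare in upper case, the canonical form of device names.
    let upper = stem.to_ascii_uppercase();
    match upper.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        name => {
            let bytes = name.as_bytes();
            // Assumption: `COM` and `LPT` take exactly one digit, 1 through 9.
            bytes.len() == 4
                && (name.starts_with("COM") || name.starts_with("LPT"))
                && (b'1'..=b'9').contains(&bytes[3])
        }
    }
}
```

Under these assumptions `con`, `CON.txt` and `LPT9 ` are flagged, while `console` and `COM10` are not, which is the "not overzealous" property the first bullet asks for.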
Co-authored-by: Eliah Kagan <degeneracypressure@gmail.com>
@EliahKagan If you take a look at the CI failure on Windows you will see that the error there is due to a lock file that couldn't be obtained. I think this is akin to what Git does. The problem here is that it shouldn't even get to that point - it should fail when trying to read the reference, which I think it might actually do here but without issue. Unfortunately, I couldn't reproduce it in the VM, the test succeeds, but I will keep trying with added debug information.
Turns out I was blind! CI is actually reproducing the correct issue, and the problem was that the error was handled too late.
7384d8b to 5f1c865
Alright, I think CI will pass
When I run the tests locally on Windows, at 5f1c865, this still fails:
The same thing seems to be happening on CI for it, with these details.
This started working with the upgrade of the `zip` crate.
Oh, that's a new failure probably due to Windows not really having symlinks. Maybe there is a way to support that better, but for now it's not to be expected to be a symlink on Windows.
LGTM.
I have installed on Windows from e955770 with `cargo install --path .` and repeated the manual tests of the refs-related features that had not been fully implemented yet when I had tested them yesterday. Those are working now.
Once this passes CI, I think it can be merged. It seems to me that the pull request description should eventually be updated with the other relevant checklist items, but that this could be done later (which might even be preferable).
Symlinks on Windows do exist, but currently they're locked behind one of two options:
The latter is also what you need to have a hassle-free Edit: |
These comments about the semantics of joining Windows paths are mainly to identify areas of interest for forthcoming work.
I've posted them as review comments because that seemed like an easier way to identify context than posting a regular comment here or a separate discussion question.
Some potentially identify areas where the tests may be improved, but they do not represent worries about the correctness of the code under test that was introduced in this PR.
I have two main goals:
- In some cases, there are clarifications that can be made in the custom messages, which I plan to do soon in a small PR.

  Edit: I've opened #1528 and added links to the comments below to corresponding comments in that PR about the related changes.

- When working on this, I recall that we had discussed the possibility that there was at least one area where the behavior of the standard library should be improved, and you suggested that I might wish to contribute an improvement. I recall being interested in doing so, and also, I think, that you may have been willing to help review or otherwise provide guidance with that. But unfortunately I don't remember what that was!

  It may be possible to figure it out from comments we exported from the temporary private fork in which most of the work that made it into this PR occurred. But those exports, even taken together, were not quite complete, and I have not yet managed to figure out which behavior of the standard library we thought was wrong. I'm hoping you may recall that. These comments include details about behaviors that I am confident at least on re-examination are correct but that may seem wrong, as well as about things that I am not sure are correct.
I also hope these comments may be useful to refer back to later when working on other parts of the code where these concepts are relevant. So if you identify any technical mistakes in the comments themselves, please let me know.
p("c:").join("relative"),
p("c:relative"),
"drive + relative = strange joined result with missing backslash, but it's a valid path that works just like `c:\relative`"
`c:relative` doesn't work like `c:\relative`, except when the current directory on the C: drive is the root. Sorry I missed this before. I'll propose a fix to the wording shortly. Note that the asserted behavior here is correct, both in the sense of being what the standard library does and in the sense that it is not a bug for the standard library to do this. It is only the custom message argument that should be changed.
See also #1528 (comment).
p("d:/").join("C:"),
p("C:"),
"d-drive + c-drive = c-drive - interesting, as C: is supposed to be relative"
This makes sense, because taking a path
If we are located at `d:\` and we use `C:` as a path, the path `C:` refers to the current directory on the `C:` drive. But the only way to express that without resolving paths situationally is as `C:`.
The unintuitive aspect of this is that one might assume that taking an absolute path
See also #1528 (comment).
The weirder sub-case of this is when both the absolute path on the left and the relative path on the right have the same drive letter. Different libraries handle this differently.
Paths like `C:` or `C:bar` that have a drive letter but no root, and thus are relative to the drive-specific current directory on the drive they specify, are one of two major cases of non-UNC Windows paths that may feel absolute but are not. The other major case is a path that has a root but no drive letter, i.e. it starts with `/` or `\` but not `\\`, and thus is relative to the current drive, but not relative to the current directory on that drive. (UNC paths are always absolute.)
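The distinctions drawn in this paragraph can be made concrete with a small, platform-independent classifier that works on the path text alone. This is only a sketch: `WinPathKind` and `classify` are hypothetical names, the classifier does not call `std::path` at all, and lumping every path that starts with two separators (including `//`) into one variant is a simplifying assumption.

```rust
/// Hypothetical classification for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum WinPathKind {
    /// Starts with two separators, e.g. `\\server\share` or `\\.\C:`.
    UncOrDevice,
    /// Drive letter and root, e.g. `C:\bar`: fully absolute.
    DriveAbsolute,
    /// Drive letter but no root, e.g. `C:` or `C:bar`: relative to the
    /// current directory on that drive.
    DriveRelative,
    /// Root but no drive letter, e.g. `\file.txt`: relative to the current drive.
    RootedNoDrive,
    /// Neither drive letter nor root, e.g. `bar\baz`.
    Relative,
}

fn classify(path: &str) -> WinPathKind {
    let b = path.as_bytes();
    let is_sep = |c: u8| c == b'\\' || c == b'/';
    // Assumption: any two leading separators count as a UNC or device path.
    if b.len() >= 2 && is_sep(b[0]) && is_sep(b[1]) {
        return WinPathKind::UncOrDevice;
    }
    if b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':' {
        return if b.len() >= 3 && is_sep(b[2]) {
            WinPathKind::DriveAbsolute
        } else {
            WinPathKind::DriveRelative
        };
    }
    if !b.is_empty() && is_sep(b[0]) {
        return WinPathKind::RootedNoDrive;
    }
    WinPathKind::Relative
}
```

For example, `classify("C:bar")` yields `DriveRelative` and `classify(r"\file.txt")` yields `RootedNoDrive`, the two cases that "may feel absolute but are not", while `classify(r"C:\bar")` yields `DriveAbsolute`.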
Both these cases can lead to bugs in code written to assume more Unix-like semantics of the relative/absolute path distinction. The first case, where ..
component.
The other factor that makes this hard, at least for me, is that the terminology in this part of the official documentation is extremely strange, for example by characterizing paths like `\file.txt` as both relative and absolute, and by seeming to imply that a path with any `..` component anywhere is relative regardless of how it starts. I think the more commonly used terminology is what this other article uses, even though that is (a Windows-specific) part of the .NET documentation, rather than being part of the Windows documentation. That is the terminology I have used here.
p("\\\\.").join("C:"),
p("C:"),
"device-namespace-unc + win-drive-relative = win-drive-relative - c: was supposed to be relative, but it's not acting like it."
The `device-namespace-unc` part of the custom message suggests that this may have been meant to start with `p("\\\\.\\")` rather than `p("\\\\.")`. As currently written, I am not sure what the ideal behavior would be here, because I am not sure the trailing `\` of the prefix `\\?\` is strictly speaking functioning as a path separator. The Rust standard library does sometimes treat `\\.\` and `\\.` differently as the left side of a path join, though both for the relative path `C:` and the absolute path `C:\` on the right, the result is the same.
However, for the same reason as in the above comment, that `C:` is relative should not be a reason to expect anything other than this behavior. It's less clear what it means to be located at `\\.` or `\\.\`. But I don't think it should be an exception to the usual effect of `C:` on the right side.
Furthermore, although `\\.\C:` and `\\.\C:\` are both valid paths, we cannot correctly form either one of them by concatenating `\\.` or `\\.\` with `C:`, and the reason is because a path `C:` is a relative path:
- `\\.\C:` is in the device namespace and ends in `C:` with no path separator, so that `C:` is a device name. Therefore `\\.\C:` refers to the `C:` drive as a device; the `C:` in `\\.\C:` is not a reference to the current directory, root directory, or any file or directory on the `C:` drive, but instead to the `C:` drive itself. (`\\.\C:` is an absolute path, but not to a file or directory.)
- `\\.\C:\`, due to the trailing `\`, is the root directory on the `C:` drive, and thus generally equivalent to `\\?\C:\` (though I don't know that all APIs that accept one will accept the other or that all APIs that accept both will necessarily treat them identically). But this is an absolute path to the root of the `C:` drive, which `C:` by itself does not designate because it is a relative path whose meaning depends on the current directory on the `C:` drive.
See also #1528 (comment).
assert_eq!(
p("/").join("\\\\localhost"),
p("\\localhost"),
"unix-absolute + win-absolute-unc = win-absolute-unc"
These are not raw strings, so this is indicating that `/` joined with `\\localhost` gives `\localhost` with one leading `\`, and I don't understand why this is what happens. It also does not appear to be what this meant to assert, since it characterizes the effect of the join as `= win-absolute-unc`, but a path that begins with only one `\` is not a UNC path. So I don't know what's going on here.
See also #1528 (comment).
That's a mistake, as it indeed doesn't consider that "\…" is not a raw string.
I don't know what an alternative comment would be, but it definitely isn't `= win-absolute-unc`.
I also have faint memory of that, but not the specific thing we thought could be fixed. Hopefully that will come back once these cases are improved and/or fixed up.
Considering recent changes in std in the same domain, could you have been considering implementing/adding something along the lines of
Although that makes sense and would have been a good idea, I don't think so. If I remember correctly, I think some behavior of the standard library shown or commented on in those tests had seemed like it was wrong, at the time.
It looks to me like that is a bug in the Microsoft documentation. Over time, but more recently than that article was edited to mention UWP (see below on the history), I've attempted to use a limited user account (without extra privileges added), as well as with a UAC-unelevated administrator account, to create symbolic links on a few Windows 10 and Windows 11 systems, including multiple versions (builds) of both those operating systems, as well as on one Windows Server 2022 system. All of this was with non-UWP executables. Besides being compiled as traditional rather than UWP or WinRT executables, the executables were fairly diverse, and included:
In these tests, I have never been able to create symlinks without enabling Developer Mode, performing UAC elevation, or using an administrator account for which UAC was not enabled. These experiments were after the relevant edit to the above-linked
I am unsure if "under MSIX" is a reference to operations that take place during an MSIX installation process, or if this is referring to the behavior of all software installed via MSIX, or if this referring to software built as an MSIX-related project type in Visual Studio, or if "MSIX" here is meant to refer to all non-UWP (or all non-WinRT) software. Although it seems unlikely, it could be that the above observations were due to the functions not being called with the necessary flags, or something peculiar about the builds (for example, some seem to have been built against For those reasons, I have now also written and run a small test program in C that tests out both the Although that shows that the documentation is at least partially wrong and incomplete, I regard the test using that program to be the most limited source of information I have about this, just because I have not built it against multiple SDK versions or run it on multiple Windows operating systems or versions of the same Windows operating system. The other thing I have noticed, which I think is the most important of all, is that the change in the
But when the Windows API has pairs of "A" and "W" functions, the "A" function is an ANSI wrapper for the "W" function. An "A" function converts ANSI to Unicode for any strings referenced by input parameters, calls the corresponding "W" function, and then converts Unicode to ANSI for any strings referenced by output parameters or return values. (This is how they work on NT systems. In Windows 9x/Me with MSLU, it was the other way around, with "W" functions being Unicode wrappers for the "A" functions.) So the API documentation does still say that Developer Mode is needed, in the documentation for the "W" version of the function. This is furthermore the primary version. Thus a revision only to the documentation of the "A" version seems unintentional in and of itself. (Something I have noticed is that, when I search using the learn.microsoft.com interface--and sometimes even with general purpose search engines--I get the "A" versions of functions first. This seems to be an unfortunate result of the Microsoft website disambiguating by picking the name that sorts earlier alphabetically--and then search engines ranking partly based on links that go to the "A" versions that were found that way. When one of the unsuffixed macros, such as |
That is some A+ effort there. Good to know the docs and reality are out of sync on that front. I hope they'll be widely supported in the future. Perhaps a compile-time check or feature flag could be implemented at some point, but as-is it's not a safe default. Thanks for the poke!
Tasks
- `is_symlink` assertion failure in `overwriting_files_and_lone_directories_works` #1373
- `sha1/asm` is enabled on windows #917

Original Checklist from private PR
This PR lays down the infrastructure to further validate tree-names before they are used, and make sure validation happens in the right places.
That way, writes to forbidden locations inside or outside of the repository should be prevented.
Tasks
- `progress` with the configuration flags to be implemented
- `gix-validate`
- `.git` and variants
- `.gitmodules` as symlinks
- `gix-validate`
- `mode` information
- prevent index creation from invalid names - improve awareness
- `CON` protection
- `.git\hooks\precommit`
- `is_symlink` assertion failure in `overwriting_files_and_lone_directories_works` #1373
- `sha1/asm` is enabled on windows #917
- `gix-validate`) for now (see this issue for details)

References