Skip to content
This repository has been archived by the owner on Nov 9, 2017. It is now read-only.

Canonicalize paths into xutftowcs_path #86

Merged
merged 1 commit into from
Dec 30, 2013
Merged

Conversation

vangdfang
Copy link

This canonicalizes paths when calling xutftowcs_path(), since
extremely long relative paths will cause errors with Windows due
to path length limitations.

Signed-off-by: Doug Kelly dougk.ff7@gmail.com

@buildhive
Copy link

MSysGit - the development behind Git for Windows » git #112 SUCCESS
This pull request looks good
(what's this?)

@sschuberth
Copy link

Does this still make sense with pull request #85 applied?

@vangdfang
Copy link
Author

I don't think it does. It fixes the same problems that #85 tried to fix for me (specifically, long relative pathnames causing failures), and at this point, I'm not convinced the method #85 uses actually has any benefit, because of other Windows limitations found along the way (mainly those in the actual Windows shell--users could feasibly end up with files they can't access or delete, even if it did work).

@vangdfang
Copy link
Author

Er, guess I should clarify. It may be better to think of this patch as a stripped-down/targeted version of what the other patch attempted to fix (longer temporary buffers when converting UTF8 to WCS-2, and using GetFullPathNameW() to remove long relative pathnames). So, depending on how you look at it, they could be the same thing, or even #85 could build on this to use the "?" if that's ever deemed useful.

@buildhive
Copy link

MSysGit - the development behind Git for Windows » git #114 SUCCESS
This pull request looks good
(what's this?)

@vangdfang
Copy link
Author

@kusma Would you be kind enough to look at the new commit that has been uploaded (f3053586)? Thanks!

@vangdfang
Copy link
Author

In case anyone's still following, found a small bug related to the output of "git rev-parse --git-dir" not showing a path bash would interpret directly... i.e. "scp remote:hooks/commit-msg $(git rev-parse --git-dir)" would fail, but "tmp=$(git rev-parse --git-dir) scp remote:hooks/commit-msg $tmp" worked. The latest version attached to this pull request fixes this issue.

@buildhive
Copy link

MSysGit - the development behind Git for Windows » git #137 SUCCESS
This pull request looks good
(what's this?)

@vangdfang
Copy link
Author

Okay, scratch that. Attempting to munge what rev-parse returns seems like a bad idea. I can't explain why the msys bash has issues with it, but here's an updated version of my original patch.

@buildhive
Copy link

MSysGit - the development behind Git for Windows » git #138 SUCCESS
This pull request looks good
(what's this?)

@dscho
Copy link
Member

dscho commented Dec 17, 2013

Hmm... So, how are we going to proceed about this pull request and #85? Which one is preferable?

@@ -1153,6 +1154,8 @@ static pid_t mingw_spawnve_fd(const char *cmd, const char **argv, char **deltaen

if (xutftowcs_path(wcmd, cmd) < 0)
return -1;
if(!dir)
dir = getcwd(tmpdir, MAX_PATH);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not need this hunk earlier, why now? AFAICT passing NULL as working directory will Do The Right Thing...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a good question, but from the looks of it, sending a NULL into the xutftowcs_path command will result in a failure from that function with EINVAL... I couldn't tell you exactly what regression test led me to adding this change (it's been so long), but I'm sure the easiest way to tell is to remove the line and compare test results...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sites that ask xutftowcs_path to handle NULL need to be fixed, then.

@vangdfang
Copy link
Author

Yeah, between this and #85, I prefer my patch, just because I've found it to be cleaner and has fewer quirks, but I don't think there's anything conceptually wrong with either. I can also say I've been using this in a corporate environment with the only reported issue being the one I commented on earlier -- bash doesn't like what rev-parse is handing back -- but there seems to be little we can do from the git perspective to correct this.

@dscho
Copy link
Member

dscho commented Dec 30, 2013

@vangdfang so... your patch actually manages to make Git handle file names longer than 255 characters, eh? If so, I fail to see it... (meaning: your patch might convert the file names correctly, but it does not address the problem that MAX_PATH is really small and the Win32 API seems to be unable to handle long paths in general).

@vangdfang
Copy link
Author

@dscho No, that's not what I intended. I think the comments in #85 and issue #24 explain that while using the "?" prefix to create a UNC path longer than 255 characters, the problem is that still, the Windows shell and many Win32 APIs will not support this path. The purpose here was to take a conservative approach to fixing the issue where a long recursive submodule (relative path) would trip over the 255 character limit and create nonsense errors. Basically, the error reported in http://thread.gmane.org/gmane.comp.version-control.msysgit/16284 can be a manifestation of the path temporarily becoming too long due to the round-trips between UTF8 and WCS2 that happen already and enhanced by the recursive path.

Note that the author of #85 did actually test what IDEs and applications are able to natively see and edit files with the long path names. The list isn't very long, and the unusual behaviors and regressions I experienced as a result of creating almost the same patch independently convinced me this was a better route to take.

@dscho
Copy link
Member

dscho commented Dec 30, 2013

@vangdfang I see. But your canonicalization still tries to address basically the same issue, failing in only a few less situations than the current Git for Windows, correct?

I know that many IDEs cannot handle longer file paths, but I'd rather have a fix for Git that will not only buy us time but will prepare for times when even WordPad can handle long paths.

@dscho
Copy link
Member

dscho commented Dec 30, 2013

@vangdfang note that I also prefer the clean-ness of your pull request over #85. But if its solution is too limited, I'd rather have the same method as #85, but with a nice and clean pull request like this one.

@vangdfang
Copy link
Author

I think the better way of looking at this is it prevents the round-trip conversion of paths from incorrectly exceeding the limit unnecessarily. It doesn't try to buy us more characters, but it does prevent us from throwing errors prematurely. Also, if the goal is to eventually apply the use of long path names, this could be seen as a logical first step, since http://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#maxpath does document that minimal processing is done when you use the "\?" prefix (relative paths are not supported, so canonicalization would be necessary regardless before using the path).

Realistically, I don't think we'll ever see a time where MAX_PATH on Windows changes, since this would be a fundamental change to the Win32 API. Also, the changes which Microsoft has made to accommodate longer paths at the filesystem layer (and not the shell layer) are inconsistent -- thankfully, almost all the msysgit ported calls deal with the WCS2 versions of functions which would support the long paths, but any functions that do not accept a wide string would have to be ported or rewritten.

If you want, I can dust off my change to attempt the same thing as #85. I can't guarantee nearly the same level of testing as I've done with this, but I can probably attach it as a second change to this pull request (thus keeping the logical separation of the two changes). I should have an answer regarding what tests break when removing the block you've commented on above, and I can at the very least run a regression test again to ensure no further testcases break when using the "\?" prefix.

@dscho
Copy link
Member

dscho commented Dec 30, 2013

@vangdfang thanks for the clarification. I fear I came over as a dick, when all I wanted was to understand the issues, which I do now.

So I'd like to see whether I can change the solution to the NULL problem and merge this.

@dscho
Copy link
Member

dscho commented Dec 30, 2013

Actually, seeing that wdir is only used iff dir != NULL, I think that hunk is really unneeded.

@vangdfang
Copy link
Author

And I think you're right! I saw only one additional regression test fail with the additional code that's in this patch, and it appears that was a random failure. I'll fix up the first patch and re-push here shortly.

As for the other part, no worries -- we're all working towards the same goal, and it's good to understand things. Like I said, I'm totally okay with looking at using the extended path format, but I just wanted to warn that doing so should be heavily tested (and it won't buy much).

@dscho
Copy link
Member

dscho commented Dec 30, 2013

Sorry for the flurry of comments, but I thought that you'd want to know about all those changes rather than letting me make them and you not even knowing that upstream git.git would not have accepted your patch without these changes.

As to the \\?\ change: I agree that this can go on top of this pull request, and later. Maybe we can even interest @nitram509 to port his changes to the cleaner format of this pull request?

@vangdfang
Copy link
Author

@dscho I think I've addressed all your comments. :) Many are a force of habit of the various coding styles I'm subjected to professionally or otherwise. And absolutely, the work @nitram509 made is absolutely beneficial -- I don't want to discount the effort he and his colleagues have made.

This provides an xutftowcs helper, xutftowcs_canonical_path(),
since extremely long relative paths will cause errors with Windows
due to path length limitations.

Signed-off-by: Doug Kelly <dougk.ff7@gmail.com>
dscho added a commit that referenced this pull request Dec 30, 2013
Canonicalize paths into xutftowcs_path
@dscho dscho merged commit bac2422 into msysgit:master Dec 30, 2013
@dscho
Copy link
Member

dscho commented Dec 30, 2013

Thanks @vangdfang!

@buildhive
Copy link

MSysGit - the development behind Git for Windows » git #154 SUCCESS
This pull request looks good
(what's this?)

@buildhive
Copy link

MSysGit - the development behind Git for Windows » git #155 FAILURE
Looks like there's a problem with this pull request
(what's this?)

@kblees
Copy link

kblees commented Jan 6, 2014

Sorry for beeing late to comment, but I think this was merged a bit prematurely (see code comments).

Apart from that, I don't see what this patch actually fixes. The missing error handling in git-clone was fixed upstream in V1.8.3 (0aac7bb "clone: die on errors from unpack_trees"), and it doesn't use errno in the error message. Are there other commands that benefit from ENAMETOOLONG in a few more cases?

@dscho
Copy link
Member

dscho commented Jan 8, 2014

As @kblees mentioned in #111 he would prefer to revert this change. I tend to agree and take all the blame: I should have reviewed it better.

@vangdfang are you okay with that? Or do you have an idea how to resolve this issue (CONIN$ being converted incorrectly) in a better way?

@vangdfang
Copy link
Author

I disagree wholeheartedly with a revert (there is an honest problem to be solved here, and a revert is tantamount to turning your back to it), but I do see the problem mentioned... I'd be willing to spend some more time to clean up further (I know some decent comments for how to achieve the same ends were made on #110). The more I think about it, the canonicalization may be over-complicating the situation, but perhaps I can rework this patch into a form more palatable -- and with a test case to show the issue. Does that seem reasonable? (I may be able to remove the need for the canonicalization all together -- let me investigate some more).

@vangdfang
Copy link
Author

Okay, so, I did my test, and this does actually do stuff (so I'd like to fix it up rather than revert). Here's a test case to explain the behavior (it will fail before my patch, passes after): https://gist.github.com/vangdfang/8320939

Sorry I didn't include this with my initial commit... I'll be interested in cleaning this up, though, and include this script with it.

@dscho
Copy link
Member

dscho commented Jan 8, 2014

@vangdfang perfect. That test case proves that there is something to fix and at the same time it will prevent us from breaking it.

However, the problem @kblees pointed to is that the canonicalization is a little too eager: apparently, it is responsible for the breakage reported here and here that resulted in my ugly workaround #111.

Therefore, if we should not revert this pull request wholesale, we need a fine-grained fix for the CONIN$ issue.

@dscho
Copy link
Member

dscho commented Jan 8, 2014

@vangdfang the comment by @kblees is here.

@vangdfang
Copy link
Author

Agreed. I'll work on it after lunch.

On Wednesday, January 8, 2014, dscho wrote:

@vangdfang https://github.com/vangdfang the comment by @kbleeshttps://github.com/kbleesis
here #111 (comment).


Reply to this email directly or view it on GitHubhttps://github.com//pull/86#issuecomment-31859074
.

@kblees
Copy link

kblees commented Jan 8, 2014

@dscho I'm willing to take some blame on me, as I only commented on #85 (long ago), and missed your question to include this in 1.8.5

@vangdfang thanks for working on the issue so tirelessly. Here's some more ideas / constructive advice:

  • Please don't redefine MAX_PATH, but rather define your own constant, e.g. LONG_PATH or whatever.
  • Try to keep UTF-conversion and path prefixing separate. There's no problem with a wrapper that does both, but I think we'll sooner or later need to handle long paths when already given a wide-char string.
  • Document pitfalls in comments or the commit message (e.g. the hard MAX_PATH limit of SetCurrentDirectoryW and the respective CreateProcessW argument).
  • According to MSDN docs, the input to GetFullPathNameW is also subject to the MAX_PATH limit, unless already \?-prefixed. So using this function to determine if a path needs prefixing poses kind of a chicken-egg problem. To support really long paths (>520), you'd have to do the prefixing yourself. Alternatively, we'd have to assume that the GetFullPathNameW docs are wrong and it works with longer paths as well. However, this should be a concious, documented decision as it may fail in any future Windows version.

Here's some (untested) sample code how I think it could be done. Note that there's no path prefixing unless necessary, so device names won't be a problem:

static int max_relative_path_len = 0;

int handle_long_path(wchar_t *wcs, int len)
{
    /* calculate max length of relative paths if uninitialized */
    /* TODO: reset or recalculate max_relative_path_len in mingw_chdir */
    if (!max_relative_path_len)
        max_relative_path_len = MAX_PATH - GetCurrentDirectoryW(0, NULL);

    /* check if wcs is within the MAX_PATH limit */
    if (len < max_relative_path_len || (len < MAX_PATH && is_absolute_path_w(wcs)))
        return len;

    /* now we can set appropriate error code or \\?\-prefix wcs in place */
    errno = ENAMETOOLONG;
    return -1;
}

int xutftowcs_long_path(wchar_t *wcs, const char *utf)
{
    int result = xutftowcsn(wcs, utf, MAX_PATH, -1);
    if (result < 0 && errno == ERANGE)
        errno = ENAMETOOLONG;
    if (result >= 0)
        result = handle_long_path(wcs, result);
    return result;
}

@vangdfang
Copy link
Author

Hi @kblees, the feedback you've given is valuable, but seems to apply more to #110. I know the change in #110 does redefine MAX_PATH, and that's going to be something to improve in the future, but I figured the more immediate issue raised by #111 was more important.

@kusma
Copy link
Member

kusma commented Jan 9, 2014

Just to put in my vote: I vote to revert. IMO this series leads us down a road that I can only see ending in insanity.

I'm quite surprised to see that this was merged in the first place; when Karsten did his Unicode work, his work was reviewed and tested quite thoroughly. Which makes sense, as it was a risky change. But this change is just as risky, and it does not seem to have gone through much testing at all.

So I think we should revert, and maybe merge an improved version, once all kinks have been worked around.

@vangdfang
Copy link
Author

That's fair, but clearly there was a gap in the test coverage, as I've
shown here. I've also run as much automated regression as msysgit currently
passes, and it introduces no issues that the regression tests could find.
Clearly, #111 shows another gap in the coverage, but one that I'm not sure
if there's a good way to catch in an automated way.

@dscho
Copy link
Member

dscho commented Jan 9, 2014

@vangdfang maybe the problem with the canonicalization was that it was too overzealous? If it converts CONIN$, it probably also converts README -- even if that does not need conversion, correct?

To be sure, I want to have a fix for the problem you fixed with this pull request, but it starts to look as if we need to be more careful and maybe the best way (now that three co-maintainers chimed in with this suggestion) to revert and start afresh?

dscho added a commit that referenced this pull request Jan 9, 2014
This reverts commit bac2422, reversing
changes made to ac03519.
@dscho dscho mentioned this pull request Jan 9, 2014
@kblees kblees mentioned this pull request Jan 18, 2014
dscho pushed a commit to dscho/git that referenced this pull request Jul 18, 2022
QNX uses sys/link.h rather than link.h for dl_iterate_phdr

Fixes msysgit#86

	* configure.ac: Check for sys/link.h.  Use either link.h or
	sys/link.h when checking for dl_iterate_phdr.
	* elf.c: Include sys/link.h if available.
	* configure, config.h.in: Regenerate.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants