Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symlink/Junction improvements #254

Merged
merged 3 commits into from
Nov 10, 2017
Merged

Symlink/Junction improvements #254

merged 3 commits into from
Nov 10, 2017

Conversation

phated
Copy link
Member

@phated phated commented Jun 26, 2017

As a talking point for #248, I wanted to create a couple of "integration" tests that show why we should keep the file.isDirectory() call inside .dest and .symlink methods.

@erikkemperman what do you think?

TODO: remove the .only call on the integration suite

@erikkemperman
Copy link
Member

@phated

Interesting! If I were feeling contrary, couldn't I claim that this works just because the vinyl post-symlink is carrying the wrong stats (namely those of the target as opposed to the link)?

My point earlier, integration wise, was just this: shouldn't the output of symlink (and of dest having been fed a vinyl with symlink property) be a symbolic vinyl, with stats that return true for isSymbolic()?

E.g., shouldn't

vfs.src('./in/my-dir')
  .pipe(vfs.symlink('./tmp'))
  .pipe(vfs.dest('./out'));

be the same as doing the following, in series:

vfs.src('./in/my-dir')
  .pipe(vfs.symlink('./tmp')),
vfs.src('./tmp/my-dir', {resolveSymlinks: false})
  .pipe(vfs.dest('./out'));

?

I might try playing around with something like your better-stats to see if I can't make it remember stats about both link and target. That might simplify this conundrum, it seems to me, whilst not suffering any spurious calls (alternating between fs.stat and fs.lstat over the course of vfs internal pipe stages)...

@phated
Copy link
Member Author

phated commented Jun 26, 2017

@erikkemperman I had an idea before and forgot it but you just re-sparked it. What if symlink just becomes a bucket to store a target path and the stats you need later? It's an undocumented property and we could mark it as a breaking change. Does that solve the issues without taking on the complexity of better-stats?

@erikkemperman
Copy link
Member

That should work, I think!

@phated
Copy link
Member Author

phated commented Jun 27, 2017

@erikkemperman perfect! Will you have time to put that together? I'm a bit swamped this week and the little time I have I'm spending cleaning up the other modules for the 4.0 release.

@erikkemperman
Copy link
Member

@phated I'll give it a shot! I briefly tried something today but it got a bit messy, and I think an extra fs call can't be avoided (on windows) without tapping into mode-glob's cache.

Also got to wondering, what if an already symbolic vinyl is piped into symlink?

@phated
Copy link
Member Author

phated commented Jun 27, 2017

I think we just create a link to that link (a user might cause themselves to create a recursive link but that's always a risk with symlinks).

@phated
Copy link
Member Author

phated commented Jul 14, 2017

@erikkemperman where do we stand with this symlink stuff? I agree that it's not semantically perfect but I think we can ship it (no one really uses these features, otherwise we'd have tons of bug reports because it barely works at all currently). I just don't want us to get totally hung up on this as the only blocker.

@erikkemperman
Copy link
Member

@phated I tried your idea of making the symlink property hold both the path and stats but it got ugly for some reason. I'll look at it again asap.

@phated
Copy link
Member Author

phated commented Sep 29, 2017

@erikkemperman I'm adding more unit tests here before I take a look at the implementation. I've just added the tests for emitting symbolic vinyls. What other tests do you think we need here?

@erikkemperman
Copy link
Member

erikkemperman commented Sep 29, 2017

@phated Coincidentally, I was working on this earlier :-)

erikkemperman@c40d603

I think this makes it functionally correct but it isn't especially pretty (an extra stat call in Windows).

In terms of tests I thought we could do with some more on how the vinyl looks post-dest.

Maybe we could merge ours efforts?

EDIT fixed link

@erikkemperman
Copy link
Member

erikkemperman commented Sep 30, 2017

So I can't find where it was, but at some point I suggested we start with a simple, if ugly, approach that does the right things, and take it from there. That's what the commit I linked yesterday was attempting. I think the idea with making the symlink property an object with both a path and stats might be unreliable -- although technically it seems possible as I've implemented it earlier, I felt it was quite brittle for some reason.

But looking back through some of these threads it seems I had repressed the complication you mentioned about users manipulating the vinyl or injecting it without sourcing anything (in particular effecting its isDirectory() method) before it ends up in symlink or dest... And actually some of the unit tests have that mocked in, so it is slightly surprising that my branch passes both Travis and Appveyor.

Will need to look into this further :-(

@erikkemperman
Copy link
Member

Ok the tests pass without calling those mock isDirectory() functions because the paths just happen to point to actual directories... So that much at least makes sense.

@erikkemperman
Copy link
Member

Also, I wanted to add an integration test similar to yours but instead making the link with symlink rather than sourcing an existing symlink with resolveSymlinks set to false. However I will have to that with different paths because the one you used has different base names for link and target -- which symlink currently can't do. There's an issue where that was discussed.

@phated
Copy link
Member Author

phated commented Oct 1, 2017

@erikkemperman I was working on some solutions based on your changes. Something to note is that constructing a new fs.Stat() is generally undesirable because most properties become invalid when you construct without any arguments so we should always try to reuse what we have.

@phated
Copy link
Member Author

phated commented Oct 1, 2017

Also, I think I have a clean-ish solution that does the right thing in 99% of the cases and I'm fine not handling the 1% of edge cases; however, I want us to have a full test suite for everything we want to support before I work on the implementation. Can you help with those remaining test cases?

@phated
Copy link
Member Author

phated commented Oct 1, 2017

Should .dest only write a symlink if file.isSymbolic()? It currently checks for the .symlink property instead of isSymbolic()

Edit: It's also the only case in which we don't check the Vinyl method

@phated
Copy link
Member Author

phated commented Oct 1, 2017

@erikkemperman do you know if we can check flags to find out if something is a junction? It looks like they set something on flags to create them

@phated
Copy link
Member Author

phated commented Oct 1, 2017

^ It looks like native code is required to GetFileAttributes on Windows where we'd be able to check for a "reparse point". There's a native npm module that exposes this but I don't want to include it as a dependency just for Windows.

@phated
Copy link
Member Author

phated commented Oct 1, 2017

My solution is 4-fold:

  1. .src() adds a "frozen" & "hidden" property (like ._symtype) to each symlink Vinyl file using the logic from .dest()/.symlink() because it's actually loading a file into memory, the only time we can assume the original file exists. - I see this property eventually ending up in the better-stats module
  2. .dest() should only enter into the write-symbolic-link path when file.isSymbolic() is true. This feels like the proper way it should be working and avoids us having to add extra fs.Stat logic in dest/prepare to ensure that a symlink is being emitted.
  3. Both .dest() and .symlink() should attempt to read the ._symtype property and only try to guess the symtype (using the current logic, including file.isDirectory()) if it doesn't exist on the Vinyl. They would not read from the file system to guess this because we can't assume the file exists on disk. If they need to guess the symtype, they would attach it as ._symtype for future steps in the pipeline.
  4. .symlink() should mask(?) the stat.mode with a symlink constant in symlink/prepare to ensure a symbolic Vinyl is emitted.

@erikkemperman please review the above and let me know if I'm missing anything. I believe it handles your concerns and my concern about non-vinyl-fs-created Vinyl objects being streamed through the pipeline. If this handles everything, what tests do we need to ensure the steps have full coverage?

@erikkemperman
Copy link
Member

@phated

About testing .symlink and/or .isSymbolic() in dest: we've discussed this earlier, here. My takeaway from that was there might be cases where the vinyl might have .symlink but not have stats (and so isSymbolic() would return false). That's why I thought we should just check the former. But perhaps the safe bet would be file.symlink || file.isSymbolic()?

About detecting junctions: the tests currently look for a trailing slash out of fs.readlink. It seems to work but is kinda quirky.

Your 4-point strategy looks sound at a first casual glance, I might find time for a more thorough review, and perhaps suggestions on testing the whole thing, later today.

var file = new File({
base: inputBase,
path: inputPath,
contents: null,
stat: {
isSymbolic: isSymbolic,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems wrong -- we'll want to mock either file.isSymbolic() or file.stat.isSymbolicLink() right?

https://nodejs.org/api/fs.html#fs_class_fs_stats
https://github.com/gulpjs/vinyl/blob/master/index.js#L84

contents: null,
stat: {
isSymbolic: function() {
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

path: inputPath,
contents: null,
stat: {
isSymbolic: isSymbolic,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.


function assert(err) {
expect(err).toExist();
// TODO: Should we assert anything else about this err?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, elsewhere we're testing the messages of (our custom) errors.

vfs.dest(outputDirpath),
concat(assert),
], done);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it might make sense to additionally test the case of directly sourcing a symlinked directory and copying it:

it('(*nix) sources a symlink and copies it', function(done) {
    if (isWindows) {
      this.skip();
      return;
    }

    fs.mkdirSync(base);
    fs.mkdirSync(symlinkDirpath);
    fs.symlinkSync(inputDirpath, outputSymlink);

    function assert(files) {
      var destResult = fs.readlinkSync(outputDirpathSymlink);

      expect(destResult).toEqual(inputDirpath);
      expect(files[0].symlink).toEqual(inputDirpath);
    }

    pipe([
      vfs.src(outputSymlink, { resolveSymlinks: false }),
      vfs.dest(outputDirpath),
      concat(assert),
    ], done);
  });

  it('(windows) sources a junction and copies it', function(done) {
    if (!isWindows) {
      this.skip();
      return;
    }

    fs.mkdirSync(base);
    fs.mkdirSync(symlinkDirpath);
    fs.symlinkSync(inputDirpath, outputSymlink, 'junction');

    function assert(files) {
      // Junctions add an ending separator
      var expected = inputDirpath + path.sep;
      var destResult = fs.readlinkSync(outputDirpathSymlink);

      expect(destResult).toEqual(expected);
      expect(files[0].symlink).toEqual(inputDirpath);
    }

    pipe([
      vfs.src(outputSymlink, { resolveSymlinks: false }),
      vfs.dest(outputDirpath),
      concat(assert),
    ], done);
  });

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 9c021e4

@erikkemperman
Copy link
Member

erikkemperman commented Oct 4, 2017

Some addenda:

  1. This new _symtype property would only need to be attached in src when the resolveSymlinks option is set to false, I guess?
  2. I suppose the vinyls coming out of symlink and dest (if symbolic) should also have this new prop.
  3. Should its (in)visibility and (im)mutability be tested?
  4. Symlink shouldn't "mask" the mode but simply overwrite it to parseInt('120777', 8) (the lower 8 bits don't actually have meaning for links, but this is how they show up in ls -la after doing a regular ln -s).

@erikkemperman
Copy link
Member

Also, I'm wondering about re-using the stats in symlink. It is true that new fs.Stats() leaves things invalid but otherwise they'd be valid but wrong (namely the target's metadata not the link's). Not sure which is worse?

@erikkemperman
Copy link
Member

Sorry for so many messages :-)

I've played around with the property you mentioned, but unless I'm missing something it's not getting us around the need for an extra fs.stat call on Windows, because we only ever end up where I think it would be attached (in src/read-contents/read-symbolic-link) if resolveSymlinks is false.

So at that point we'll only have the stats of the link, not the target, and other than (slightly awkwardly) detecting 'junction' by looking for a trailing slash out of readlink we can't tell the difference between 'file' and 'dir' without asking the filesystem.

If the above is accurate, then IMHO it's cleaner to do the stat call just before we need it, in dest rather than src, and we don't need the property I think.

For what it's worth, I've rebased by earlier effort on this branch, and the whole thing on latest master, and then updated to reflected the discussion here:
https://github.com/erikkemperman/vinyl-fs/tree/symlink-reb
(It passes all the tests.)

@phated
Copy link
Member Author

phated commented Oct 6, 2017

@erikkemperman I think our fundamental disagreement around this is that I am working under the ethos that .dest()/.symlink() should not be doing any reading of files that they haven't created. Keeping the filesystem reads separate from the filesystem writes allows you to replace either end of the pipeline with a different vinyl adapter. I think we can do those extra calls you mention in the .src() method and attach the results to the Vinyl object before we pass it along. If we do it right, it'll only affect the performance on Windows in uncommon edge cases.

@phated
Copy link
Member Author

phated commented Oct 6, 2017

I think I'm missing a test that outlines my above philosophy.

@phated
Copy link
Member Author

phated commented Oct 6, 2017

About testing .symlink and/or .isSymbolic() in dest: we've discussed this earlier, here. My takeaway from that was there might be cases where the vinyl might have .symlink but not have stats (and so isSymbolic() would return false). That's why I thought we should just check the former. But perhaps the safe bet would be file.symlink || file.isSymbolic()?

I think once all of this symlink stuff has been improved, it'll actually be correct to test for isSymbolic() but I'll just be keeping an eye on it for now.

@erikkemperman
Copy link
Member

@phated Pushed a few more changes pushed to the branch linked in the comment above. Mostly to use the new reflectMetadata more effectively.

I also figured out what is going wrong on Windows, see this commit which makes the build pass -- but I still need to track this down further to see what changed in the libuv source, I guess.

@erikkemperman
Copy link
Member

Reported the Windows quirk, FYI:
nodejs/node#16496

@phated
Copy link
Member Author

phated commented Oct 26, 2017

Thanks for digging into that!

@erikkemperman
Copy link
Member

erikkemperman commented Oct 29, 2017

@phated I took a quick stab at documenting the caveats on dangling links on Windows, pushed it to my branch here:
https://github.com/erikkemperman/vinyl-fs/commits/is-directory-integrations

With something like that in place, I believe my branch now more or less covers all three bullets you listed earlier.

But having just written that caveats section, I wondered if instead of a (Windows only) useJunctions option, perhaps it would be better to just have a (Windows only) symlinkType option and simply use that whenever we can't determine the right type (because the target is not a regular file, but a directory or non-existent).

Thoughts?

@phated
Copy link
Member Author

phated commented Nov 7, 2017

@erikkemperman I'm getting back to this today. I've cherry-picked some of your work but left out the 2nd round of reflectMetadata code. I think we can use something similar but I'd like to clean it up a bit

file.cwd = cwd;
file.base = basePath;
file.path = writePath;
if (!file.isSymbolic()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to document this in the options

@@ -13,7 +13,7 @@ function writeContents(optResolver) {

function writeFile(file, enc, callback) {
// Write it as a symlink
if (file.symlink) {
if (file.isSymbolic()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the documents talk about .symlink but if so, it'll need to be updated

@phated phated force-pushed the is-directory-integrations branch 4 times, most recently from 4b21b97 to 03c5693 Compare November 8, 2017 00:40
@@ -363,6 +391,7 @@ WriteStream.prototype.open = function() {
// Use our `end` method since it is patched for flush
WriteStream.prototype.destroySoon = WriteStream.prototype.end;
// Use node's `fs.WriteStream` methods
WriteStream.prototype._destroy = fs.WriteStream.prototype._destroy;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikkemperman this is why your latest tests were failing for node 8 - they changed .destroy to ._destroy on the prototype.

@phated phated force-pushed the is-directory-integrations branch from 03c5693 to c02e0c1 Compare November 8, 2017 03:45
@phated phated changed the title Integrations for isDirectory usage Symlink/Junction improvements Nov 8, 2017
@phated
Copy link
Member Author

phated commented Nov 8, 2017

(Title changed to reflect what happened here)

@erikkemperman @stevelacy @contra could I get a thorough review of this? It is the last thing blocking vinyl-fs 3.0.0 and I want to make sure it's done correct.

@yocontra
Copy link
Member

yocontra commented Nov 8, 2017

My only nitpick would be junctions: true instead of useJunctions: true as the option, but thats really just my personal taste + more consistent with the rest of the gulp API.

@erikkemperman
Copy link
Member

@phated @contra @stevelacy I just had a bit of time to compare the diff of the current state of this PR against what I had in my branch, but superficially this looks good to me!

Could I just ask you guys to take a close look at the wording of the README changes I made, and which look like they made it in here verbatim? English isn't my first language :-) Specifically the caveats section about links on Windows. Thanks!

@stephenlacy
Copy link

Looks good, if I were to nitpick I'd add a link to the caveats section from the useJunctions bulletin point. "Please refer to the [caveats section](<link>) below."

@phated
Copy link
Member Author

phated commented Nov 9, 2017

@stevelacy great minds think alike - I'm going to do a doc review pass before I ship 3.0 and that was on my list.

@contra I think we talked about that in another issue/PR and decided to be more verbose for the Windows users.

@erikkemperman I actually fully reviewed the docs and found them great, which is why there was no comment from me 😛

@phated phated merged commit c02e0c1 into master Nov 10, 2017
@phated
Copy link
Member Author

phated commented Nov 10, 2017

Thanks everyone for the help - especially you @erikkemperman, most of this is your work 😄

@erikkemperman
Copy link
Member

@phated Excellent, very good to see this epic landed!

@phated phated deleted the is-directory-integrations branch November 10, 2017 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants