TarWriter on Windows Server treats File Deduplication reparse point flag as symbolic links #82949

Open
billfreist opened this issue Mar 3, 2023 · 8 comments

@billfreist

if ((attributes & FileAttributes.ReparsePoint) != 0)

As the snippet above shows, TarWriter treats an entry as a symlink whenever the reparse point flag is set, without verifying that the file or directory is actually a symlink or junction. Windows Server with file deduplication enabled, and even OneDrive-synced files, can set the reparse point flag while using entirely different reparse tags.

IO_REPARSE_TAG_DEDUP is used for deduplication, whereas IO_REPARSE_TAG_MOUNT_POINT & IO_REPARSE_TAG_SYMLINK are used for junctions and symlinks.
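To make the distinction concrete, here is a minimal sketch of classifying a reparse tag. The constant values come from the Windows SDK headers; the `ReparseTags` class and `IsLinkTag` helper are hypothetical names used only for illustration and are not part of the runtime.

```csharp
static class ReparseTags
{
    // Tag values as defined in the Windows SDK (winnt.h).
    public const uint IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003; // junctions / mount points
    public const uint IO_REPARSE_TAG_SYMLINK     = 0xA000000C; // symbolic links
    public const uint IO_REPARSE_TAG_DEDUP       = 0x80000013; // Data Deduplication

    // Hypothetical helper: only these two tags describe something
    // that can reasonably be stored as a tar symbolic link entry.
    public static bool IsLinkTag(uint tag) =>
        tag == IO_REPARSE_TAG_SYMLINK || tag == IO_REPARSE_TAG_MOUNT_POINT;
}
```

With this check, a deduplicated file (tag `IO_REPARSE_TAG_DEDUP`) would not be classified as a link even though `FileAttributes.ReparsePoint` is set on it.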

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 3, 2023
@ghost

ghost commented Mar 3, 2023

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.


Author: billfreist
Assignees: -
Labels: area-System.IO, untriaged

Milestone: -

@billfreist
Author

I should note, this results in the following exception:

System.ArgumentException: The value cannot be an empty string. (Parameter 'value')
   at System.ArgumentException.ThrowNullOrEmptyException(String argument, String paramName)
   at System.Formats.Tar.TarEntry.set_LinkName(String value)
   at System.Formats.Tar.TarWriter.ConstructEntryForWriting(String fullPath, String entryName, FileOptions fileOptions)
   at System.Formats.Tar.TarWriter.ReadFileFromDiskAndWriteToArchiveStreamAsEntryAsync(String fullPath, String entryName, CancellationToken cancellationToken)

@billfreist
Author

Looking further, it appears this line is masking the issue, which ultimately leads to the exception because of the fallback to string.Empty:

entry.LinkName = info.LinkTarget ?? string.Empty;

FileSystemInfo.LinkTarget checks for the proper reparse tags mentioned above and returns null when the file doesn't have them. It would seem the code could simply check for a valid LinkTarget once it determines that the reparse point flag is set. Something like:

if ((attributes & FileAttributes.ReparsePoint) != 0 && info.LinkTarget is not null)
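As a self-contained sketch of that condition, assuming .NET 6 or later (where `FileSystemInfo.LinkTarget` is available): the `TarEntryKind` class and `ShouldWriteAsSymlink` method are hypothetical names for illustration, not the actual TarWriter code.

```csharp
using System.IO;

static class TarEntryKind
{
    // Hypothetical helper: treat an item as a tar symlink only when the
    // reparse point flag is set AND LinkTarget resolves to a target.
    // Per the discussion above, LinkTarget is null for reparse points
    // that are not symlinks or junctions (e.g. deduplicated files).
    public static bool ShouldWriteAsSymlink(FileSystemInfo info) =>
        (info.Attributes & FileAttributes.ReparsePoint) != 0
        && info.LinkTarget is not null;
}
```

Called as `TarEntryKind.ShouldWriteAsSymlink(new FileInfo(fullPath))`, this would return false for a regular file and, if LinkTarget behaves as described, for a dedup reparse point as well, so the `string.Empty` fallback would never be reached.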

@carlossanlop
Member

@jozkee @adamsitnik is this something that the System.IO APIs do correctly, or do we have to address this problem in both the Tar APIs and the IO APIs? (maybe Zip too?)

@carlossanlop carlossanlop removed the untriaged New issue has not been triaged by the area owner label Mar 13, 2023
@carlossanlop carlossanlop added this to the 8.0.0 milestone Mar 13, 2023
@jozkee
Member

jozkee commented Mar 13, 2023

LinkTarget works properly.

FileSystemInfo.LinkTarget checks for those proper Reparse Tags mentioned above and returns null

@billfreist
Author

LinkTarget works properly.

FileSystemInfo.LinkTarget checks for those proper Reparse Tags mentioned above and returns null

Sorry, the wording wasn't clear. I meant that it returns null when it's supposed to. In the case I'm hitting with the dedup tag, it's null; when the file has the symbolic link or junction tag, it returns a valid link target.

@carlossanlop
Member

@billfreist -

I see what you mean. We should explicitly confirm that the file is a reparse point of type junction, and nothing else, and if the answer is "junction", we should store it in the archive as a symbolic link (because it's the closest entry type to junction, and because the Tar spec does not officially support junctions).

I am unsure how the other reparse points should be treated. I suspect we could try treating them as regular files. I'd have to investigate.

If it's helpful, @billfreist, a workaround would be to extract that reparse point junction entry manually: open the archive with TarReader, iterate the entries until you find the one that failed, and create a symbolic link for it: https://learn.microsoft.com/en-us/dotnet/api/system.io.file.createsymboliclink?view=net-7.0
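A rough sketch of that workaround, assuming .NET 7 or later with System.Formats.Tar. The `TarWorkaround` class and method name are hypothetical, entry names are not validated against path traversal, and links that point at directories may need `Directory.CreateSymbolicLink` instead:

```csharp
using System.Formats.Tar;
using System.IO;

static class TarWorkaround
{
    // Hypothetical helper: extracts an archive entry by entry and
    // creates symbolic links by hand instead of relying on extraction.
    public static void ExtractWithManualLinks(string archivePath, string destinationDir)
    {
        using FileStream archive = File.OpenRead(archivePath);
        using var reader = new TarReader(archive);

        while (reader.GetNextEntry() is TarEntry entry)
        {
            // NOTE: entry.Name is trusted here; real code should sanitize it.
            string path = Path.Combine(destinationDir, entry.Name);
            string? parent = Path.GetDirectoryName(path);
            if (!string.IsNullOrEmpty(parent))
            {
                Directory.CreateDirectory(parent);
            }

            switch (entry.EntryType)
            {
                case TarEntryType.Directory:
                    Directory.CreateDirectory(path);
                    break;
                case TarEntryType.SymbolicLink:
                    // Recreate the link manually from the stored target.
                    File.CreateSymbolicLink(path, entry.LinkName);
                    break;
                default:
                    entry.ExtractToFile(path, overwrite: true);
                    break;
            }
        }
    }
}
```

Creating symbolic links on Windows may require elevated privileges or Developer Mode, so this is only a starting point.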

@carlossanlop
Member

OK, I have a fix that checks the reparse tag, just like we do in FileSystem.Windows.cs: if we have a reparse point, we only treat it as a tar symlink if Windows categorized it as a junction or a symlink.

@carlossanlop carlossanlop self-assigned this Jul 18, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 18, 2023
@carlossanlop carlossanlop modified the milestones: 8.0.0, 9.0.0 Jul 20, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 29, 2023
@jeffhandley jeffhandley modified the milestones: 9.0.0, Future Jul 28, 2024
4 participants