-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a bug when including symbolic link files to the package's checksum. #937
Conversation
When a file is a symbolic link to some other file the proper behavior when calculating the checksum of a package is to use the link path instead of the content of the pointed file. Additionally the `getPackageFileListWithoutVcs` procedure is fixed to return also the symbolic link files. Related to nim-lang#127
When cloning a Git repository on Windows Git puts the link file path as the content of an ordinary file. When downloading tarball the symbolic link files cannot be properly established and Tar writes empty files instead. This leads to different packages checksums compared to the case when the package is downloaded via Git or downloaded on Linux either via Git or via tarball. To fix the problem we parsed the content of the tarball and extract the link paths from there. After that, we write a link file with the link path in it overriding the empty file extracted by Tar. Additionally, the `getPackageFileListWithoutVcs` procedure is fixed to return relative paths to files with `/` character as directory separator even on Windows, because they are used in the package checksum calculation and we must have the same checksum for the same package regardless of the operating system or download method. Related to nim-lang#127
5bb35ea
to
ef5edde
Compare
src/nimblepkg/download.nim
Outdated
when defined(windows): | ||
# On Windows symbolic link files are not being extracted properly by the | ||
# `tar` command. They are extracted as empty files, but when cloning the | ||
# repository with Git they are extracted as ordinary files with the link | ||
# path in them. For that reason here we parse the tar file content to | ||
# extract the symbolic links and add their paths manually to the content of | ||
# their files. | ||
let listCmd = &"{getTarExePath()} -ztvf {filePath} --force-local" | ||
let (cmdOutput, cmdExitCode) = doCmdEx(listCmd) | ||
if cmdExitCode != QuitSuccess: | ||
raise nimbleError(tryDoCmdExErrorMessage(listCmd, cmdOutput, cmdExitCode)) | ||
let lines = cmdOutput.splitLines() | ||
for line in lines: | ||
if line.contains(" -> "): | ||
let parts = line.split | ||
let linkPath = parts[^1] | ||
let linkNameParts = parts[^3].split('/') | ||
let linkName = linkNameParts[1 .. ^1].foldl(a / b) | ||
writeFile(downloadDir / linkName, linkPath) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like we will always have edge cases where our hashes are different. Maybe we should instead just ignore symlinks in our hash calculations? How do other package managers handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for now all edge cases are handled properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, filtering the symbolic links will be harder to implement because where the package's checksum is calculated I already don't have the tar file. The information has to be moved through a few call stack levels in some data structure which will be used only by a part of the code paths. It will lead to a much uglier code IMHO.
To be honest I'm not sure why we need this extra logic. Why is it a bug that we use the content of the linked file instead of its path? The resulting hash won't be compromised, if the symlink changes then we'll get a different hash because the new linked file's contents will be different. |
@dom96 On Windows, the links are not actually links, but ordinary files with link paths in them. Without the extra logic, we will have different checksums on Windows and POSIX systems. |
Ahh, I get it. Thanks for the explanation! I'll leave to @Araq to give his thoughts. |
Related to nim-lang#127
Seems fine to me, should even work with Windows UNC paths. |
When a file is a symbolic link to some other file the proper behavior when calculating the checksum of a package is to use the link path instead of the content of the pointed file.
When cloning a Git repository on Windows Git puts the link file path as the content of an ordinary file. When downloading tarball the symbolic link files cannot be properly established and Tar writes empty files instead. This leads to different packages checksums compared to the case when the package is downloaded via Git or downloaded on Linux either via Git or via tarball.
To fix the above problem we parsed the content of the tarball and extract the link paths from there. After that, we write a link file with the link path in it overriding the empty file extracted by Tar.
The
getPackageFileListWithoutVcs
procedure is fixed to return also the symbolic link files and to return files with/
separator even on Windows (in order to achieve the same checksum).