Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpacking tar archive with long link in it results in truncated filenames #369

Open
schneems opened this issue Jul 3, 2024 · 3 comments

Comments

@schneems
Copy link
Contributor

schneems commented Jul 3, 2024

Context

I'm using tar to extract the contents downloaded from https://repo1.maven.org/maven2/org/jruby/jruby-dist/9.4.8.0/jruby-dist-9.4.8.0-bin.tar.gz for re-packaging it and uploading it to S3 https://github.com/heroku/docker-heroku-ruby-builder/blob/9e64b4401be4df7c158c9253290f6a3248927023/shared/src/lib.rs#L24-L31.

Unfortunately I've learned that this file uses ././@LongLink and it seems that the rust tar archive truncates the file by default. To demonstrate the issue I made a reproduction https://github.com/schneems/tar_long_link_repro.

Reproduction

$ git clone https://github.com/schneems/tar_long_link_repro
$ cd tar_long_link_repro

Then run it:

$ cargo run

Expected

I expect that tmp/rust-extracted/jruby-9.4.8.0/lib/ruby/stdlib/bundler/vendor/molinillo/lib/molinillo/delegates will contain a file specification_provider.rb.

Actual

It does not, the command fails an assertion:

$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/tar_long_link_repro`
thread 'main' panicked at src/main.rs:32:5:
expected ["resolution_state.rb", "specification_provide"] to include "specification_provider.rb" but it did not
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Note the filename seems truncated:

$ ls tmp/rust-extracted/jruby-9.4.8.0/lib/ruby/stdlib/bundler/vendor/molinillo/lib/molinillo/delegates
resolution_state.rb	specification_provide

More

I see there that the header and entry inside of tar are aware of the concept of long names (https://docs.rs/tar/0.4.41/tar/struct.Header.html?search=long), but I'm not sure how to unpack a Gnu tar file that has them. If the feature exists, we could possibly update the documentation to make it clearer.

@schneems schneems changed the title Cannot tar::Archive.unpack() an archive with long link in it Unpacking tar archive with long link in it results in truncated filenames Jul 3, 2024
schneems added a commit to heroku/docker-heroku-ruby-builder that referenced this issue Jul 3, 2024
The tar crate is currently truncating filenames that are using the GNU "long link" feature alexcrichton/tar-rs#369.

The fix is to shell out to system tar instead.
schneems added a commit to heroku/docker-heroku-ruby-builder that referenced this issue Jul 5, 2024
The tar crate is currently truncating filenames that are using the GNU "long link" feature alexcrichton/tar-rs#369.

The fix is to shell out to system tar instead.
@headius
Copy link

headius commented Jul 8, 2024

I found several reports online of problems with the maven-assembly-plugin or the plexus-archiver that it seems to use to produce a tarball, but nothing exactly matches this. The fact that other tar libraries work properly would seem to implicate tar-rs. However, the @LongLink appears to be a gnu tar extension, which could mean it is simply a problematic gnu-specific feature, or it could mean that the maven plugins for tarballs are doing something unusual or incorrect with it.

It would not seem to be a JRuby bug, since we don't do the tarball construction ourselves, but if there's a problem with an upstream plugin or some configuration change needed to omit @LongLink entries, it will be necessary to patch JRuby's build. If you are able to come up with a reproduction that implicates the JRuby tarball as the source of problems, I would recommend opening a JRuby issue and we can help investigate further.

@RazerM
Copy link

RazerM commented Nov 4, 2024

I found out what's happening here while trying to debug a separate issue, the reproducer was very useful to rule out my issue as distinct and to find what is wrong here.

the entry header is not what tar-rs is expecting, this patch makes it work correctly:

diff --git a/src/header.rs b/src/header.rs
index 8e39ab6..a2ea05b 100644
--- a/src/header.rs
+++ b/src/header.rs
@@ -193,11 +193,18 @@ impl Header {

     fn is_ustar(&self) -> bool {
         let ustar = unsafe { cast::<_, UstarHeader>(self) };
-        ustar.magic[..] == b"ustar\0"[..] && ustar.version[..] == b"00"[..]
+        ustar.magic[..] == b"ustar\0"[..]
+            && (ustar.version[..] == b"00"[..] || ustar.version[..] == b"\0\0"[..])
     }

I was comparing the implementation to CPython's, and from what I can see (unless I missed some indirection), CPython doesn't actually check the version bytes. I'm also not sure what the best way forward is for handling this is. I can't find anything about why these bytes might be different than expected, if not just a mixup of \0 and 0.

@frederikbosch
Copy link

@RazerM, thanks a lot for posting that patch. I does solve my problem. I have created a PR for the solution, and hopefully it will merged in some form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants