Can't untar result file #382

Closed
haludi opened this issue Sep 1, 2019 · 11 comments

@haludi

haludi commented Sep 1, 2019

Steps to reproduce

  1. Create a .NET Core app with the code included below
  2. Install SharpZipLib version 1.2.0 via NuGet
  3. Run the application on a directory that contains one text file
  4. Untar the result file from the command line: tar -C ./ -xvf /mnt/c/work/temp/Haludi8.tar.gz

Expected behavior

With version 1.1.0 the operation works and I can untar the result file.

Actual behavior

I get an error

gzip: stdin: decompression OK, trailing garbage ignored
tar: Child returned status 2
tar: Error is not recoverable: exiting now


Version of SharpZipLib

1.2.0

Obtained from (only keep the relevant lines)

  • Package installed using NuGet

Code used to reproduce the issue:
using System;
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

namespace SharpZipLibApp
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var stream = new MemoryStream())
            using (var file = File.OpenWrite(@"C:\work\temp\Haludi8.tar.gz"))
            {
                var dir = @"C:\Users\igal\AppData\Local\Temp\Haludi";
                TarHelper.CreateGzippedTarArchive(stream, sourceDirectory: dir);
                stream.CopyTo(file);
            }
        }
    }

    public class TarHelper
    {
        public static void CreateGzippedTarArchive(Stream outStream, 
            string sourceDirectory)
        {
            using (var gzoStream = new GZipOutputStream(outStream) { 
                IsStreamOwner = false 
            })
            using (var tarArchive = TarArchive.CreateOutputTarArchive(gzoStream))
            {
                tarArchive.IsStreamOwner = false;

                // Note that the RootPath is currently case sensitive 
                // and must be forward slashes e.g. "c:/temp"
                // and must not end with a slash, otherwise 
                // cuts off first char of filename
                // This is scheduled for fix in next release
                tarArchive.RootPath = sourceDirectory.Replace('\\', '/');
                if (tarArchive.RootPath.EndsWith("/"))
                    tarArchive.RootPath = tarArchive.RootPath
                        .Remove(tarArchive.RootPath.Length - 1);

                AddDirectoryFilesToTGZ(tarArchive, sourceDirectory);
            }
            outStream.Seek(0, SeekOrigin.Begin);
        }



        private static void AddDirectoryFilesToTGZ(TarArchive tarArchive, 
            string sourceDirectory)
        {
            AddDirectoryFilesToTGZ(tarArchive, sourceDirectory, string.Empty);
        }

        private static void AddDirectoryFilesToTGZ(TarArchive tarArchive, 
            string sourceDirectory, string currentDirectory)
        {
            var pathToCurrentDirectory = Path.Combine(sourceDirectory, currentDirectory);

            // Write each file to the tgz.
            var filePaths = Directory.GetFiles(pathToCurrentDirectory);
            foreach (string filePath in filePaths)
            {
                var tarEntry = TarEntry.CreateEntryFromFile(filePath);

                // Name sets where the file is written. 
                // Write it in the same spot it exists in the source directory
                tarEntry.Name = filePath.Replace(sourceDirectory, "");

                // If the Name starts with '\' then an extra folder (with a 
                // blank name) will be created, we don't want that.
                if (tarEntry.Name.StartsWith('\\'))
                {
                    tarEntry.Name = tarEntry.Name.Substring(1);
                }
                tarArchive.WriteEntry(tarEntry, true);
            }

            // Write directories to tgz
            var directories = Directory.GetDirectories(pathToCurrentDirectory);
            foreach (string directory in directories)
            {
                AddDirectoryFilesToTGZ(tarArchive, sourceDirectory, directory);
            }
        }
    }
}
@piksel
Member

piksel commented Sep 2, 2019

I have not gone through and fixed the tar/gzip examples yet, so there might be something about them that is not quite right for the newer versions (they were written for 0.8*).
You can skip the MemoryStream and use the output file as the output stream directly, although I don't think that's the problem here. I'll try to repro this and figure out what's going on...
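
For reference, the suggestion above (writing straight to the output file instead of going through a MemoryStream) could look roughly like the sketch below. It reuses only calls that already appear in the reporter's code; the class and method names (`DirectTarGz`, `WriteTopLevelFiles`) are made up for illustration, and it only handles files at the top level of the directory. As noted above, this is not expected to fix the 1.2.0 problem by itself. It uses `File.Create` rather than `File.OpenWrite` because `File.OpenWrite` does not truncate an existing file.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.GZip;
using ICSharpCode.SharpZipLib.Tar;

// Hypothetical helper, not part of SharpZipLib.
public static class DirectTarGz
{
    // Writes every file found directly inside sourceDirectory into a
    // .tar.gz at outputPath, without an intermediate MemoryStream.
    public static void WriteTopLevelFiles(string sourceDirectory, string outputPath)
    {
        // File.Create truncates an existing file, so no stale bytes
        // from a previous, longer archive are left at the end.
        using (var file = File.Create(outputPath))
        using (var gzoStream = new GZipOutputStream(file) { IsStreamOwner = false })
        using (var tarArchive = TarArchive.CreateOutputTarArchive(gzoStream))
        {
            // Same ownership settings as the original repro: each using
            // block disposes only its own object, innermost first.
            tarArchive.IsStreamOwner = false;

            foreach (string filePath in Directory.GetFiles(sourceDirectory))
            {
                var tarEntry = TarEntry.CreateEntryFromFile(filePath);

                // Store the entry under its bare file name.
                tarEntry.Name = Path.GetFileName(filePath);
                tarArchive.WriteEntry(tarEntry, false);
            }
        }
    }
}
```

Usage would be something like `DirectTarGz.WriteTopLevelFiles(@"C:\some\dir", @"C:\some\out.tar.gz")`, with both paths being placeholders.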

@piksel piksel self-assigned this Sep 2, 2019
@piksel piksel added tar Related to TAR file format gzip documentation labels Sep 2, 2019
@ProtoThis

Just wanted to add that I am experiencing the same problem. I updated SharpZipLib in my project from version 1.1 to version 1.2. Creating a plain tar file still works as expected, but when creating a tar.gz file (using GZipOutputStream) the resulting file does not work and cannot be extracted.

Kind regards,

Ferry

@Numpsy
Contributor

Numpsy commented Oct 1, 2019

I tried testing it, and what seems to be happening is that flushing GZipOutputStream before writing anything to it causes something to go wrong.

This test without using Tar at all seems to show the issue:

[Test]
[Category("GZip")]
public void DelayedHeaderWriteFlushNoData()
{
	var ms = new MemoryStream();

	using (GZipOutputStream outStream = new GZipOutputStream(ms) { IsStreamOwner = false })
	{
		outStream.Flush();
	}

	ms.Seek(0, SeekOrigin.Begin);
	using (var inStream = new GZipInputStream(ms))
	{
		using (var decompressedStream = new MemoryStream())
		{
			inStream.CopyTo(decompressedStream);
		}
	}
}

(It seems ok without the flush).

Caused by the changes to DeflaterOutputStream.Deflate in #225 perhaps?

@Numpsy
Contributor

Numpsy commented Oct 1, 2019

(The underlying issue with the corrupt gzip file might be that the call to DeflaterOutputStream.Flush causes it to write some data to the output stream before GZipOutputStream has written the gzip header, so the produced stream can't then be read.)

Not sure how much of that is a deflate issue and how much is down to the way GZipOutputStream handles writing the headers - would it be reasonable for GZipOutputStream to ensure the headers are written before any flush on the base deflator stream occurs?

@Numpsy
Contributor

Numpsy commented Oct 1, 2019

@piksel Any thoughts on the effects of adding something like

public override void Flush()
{
	if (state_ == OutputState.Header)
	{
		WriteHeader();
	}

	base.Flush();
}

to GZipOutputStream ?
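
To make the ordering problem concrete, here is a small toy model of the idea, not SharpZipLib code. It assumes (as suggested earlier in the thread) that the header is deferred until the first write and that a flush can emit bytes by itself. `LazyHeaderWriter`, `FlushThenWrite` and the single `0x00` "flush byte" are all invented for illustration; only the gzip magic bytes `0x1f 0x8b` are real.

```csharp
using System;
using System.IO;

// Toy model only: this is NOT how SharpZipLib is implemented.
// It mimics a writer that defers its header until the first write.
public sealed class LazyHeaderWriter
{
    private static readonly byte[] Header = { 0x1f, 0x8b }; // gzip magic bytes
    private static readonly byte[] FlushBytes = { 0x00 };   // stand-in for whatever a flush emits

    private readonly Stream output;
    private readonly bool writeHeaderOnFlush;
    private bool headerWritten;

    public LazyHeaderWriter(Stream output, bool writeHeaderOnFlush)
    {
        this.output = output;
        this.writeHeaderOnFlush = writeHeaderOnFlush;
    }

    public void Write(byte[] data)
    {
        EnsureHeader();
        output.Write(data, 0, data.Length);
    }

    public void Flush()
    {
        // The guard proposed above: emit the header before anything
        // the flush itself may push to the underlying stream.
        if (writeHeaderOnFlush)
        {
            EnsureHeader();
        }
        output.Write(FlushBytes, 0, FlushBytes.Length);
        output.Flush();
    }

    private void EnsureHeader()
    {
        if (headerWritten) return;
        output.Write(Header, 0, Header.Length);
        headerWritten = true;
    }

    // Flushes before any data is written, then writes one payload byte,
    // and returns everything that reached the underlying stream.
    public static byte[] FlushThenWrite(bool writeHeaderOnFlush)
    {
        var ms = new MemoryStream();
        var writer = new LazyHeaderWriter(ms, writeHeaderOnFlush);
        writer.Flush();
        writer.Write(new byte[] { 0x2a });
        return ms.ToArray();
    }
}
```

In this model `FlushThenWrite(false)` produces `00 1f 8b 2a` (flush bytes land before the header), while `FlushThenWrite(true)` produces `1f 8b 00 2a`, so a reader always sees the header first.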

@piksel
Member

piksel commented Oct 2, 2019

Yeah, that would be my initial solution as well. Of course, it could still be the case that the header could be written before it's fully finalized by the consumer. But in that case, this would still fail using the current code. There is no reason to ever want to write the contents before the header, so this should be an improvement in any case.
I am not sure I get why the GZip stream would be flushed in any of these cases though. There might be another bug somewhere...

@Numpsy
Contributor

Numpsy commented Oct 3, 2019

> I am not sure I get why the GZip stream would be flushed in any of these cases though. There might be another bug somewhere...

In TarArchive.Dispose, it does

tarOut.Flush();
tarOut.Dispose();

and TarOutputStream.Flush flushes the underlying output stream. (I think TarOutputStream.Dispose then writes more data to the stream and flushes again, so I'm not entirely sure whether the initial flush is needed in this situation.)

@piksel
Member

piksel commented Dec 31, 2019

This all has to do with #225 and how the Deflate engine handles an empty buffer (not well).

piksel pushed a commit that referenced this issue Jun 19, 2020
…efore flush

* Add unit tests to repro #382
* Add an override of Flush() to GZipOutputStream to ensure the header is written before flushing
@aluidasa

Hi, will this be fixed in the next version?

I think my problems are related: since updating to 1.2.0, tar.gz archives are corrupt when they are quite small.
I tested with a Windows test file created via fsutil, and the tar.gz is extractable as soon as the test file is larger than about 5635 bytes. But it seems you have found the cause already?

@Numpsy
Contributor

Numpsy commented Oct 9, 2020

@aluidasa version 1.3 is out now and should contain the fix for this, if you want to give it a try?

@aluidasa

Perfect :) Thank you guys so much!

@piksel piksel closed this as completed Nov 10, 2020