v4 rewrite: All new design, core types, and stream-oriented interfaces #302

mholt · 2022-01-02T00:38:26Z

I spent the holiday rewriting this package from scratch with a completely new approach to handling archives and compression formats. This will become v4.

The new core APIs are completely stream-oriented and file-agnostic. The abstractions for files, directories, and file systems are virtualized thanks to the recently-added io/fs package in the Go standard library. I expect these design changes will close most open issues and PRs because it either fixes the problems or makes them irrelevant.

A significant number of issues relate directly to interactions with specific file systems (files on disk) and this new API does not deal with that directly anymore, except for a couple specific functions that read the disk in order to create the abstraction. Nothing in the core API writes to disk or deals with that. (Yay!) The stream and FS abstractions are highly flexible to build upon.

Another nice thing about the new design is that there's no more need for explicit composite types (like TarGz and TarBz2), because we have a new CompressedArchive type that composes an archive and a compression format. It's mainly used with Tar only, since Zip and Rar do their own thing, but another nice feature is the Identify() function that automatically gets you the right type:

// opening a file on disk for this example, but can be any ReadSeeker
unknownFile, err := os.Open("filename.tar.gz")
if err != nil {
	return err
}
defer unknownFile.Close()

// if you don't have a filename, leave it blank: identification uses both/either filenames and/or streams
format, err := archiver.Identify("filename.tar.gz", unknownFile)
if err != nil {
	return err
}

// we can now work with the file, for example, extract a file out of it
if ex, ok := format.(archiver.Extractor); ok {
	ex.Extract(context.Background(), unknownFile, "target.txt", func(_ context.Context, f File) error {
		// do something with the file ...
		return nil
	})
}

// or maybe it's just a compressed log file
if decom, ok := format.(archiver.Decompressor); ok {
	rc, err := decom.OpenReader(unknownFile)
	if err != nil {
		return err
	}
	defer rc.Close()
	// read from it ... all reads are decompressed now
}

I think the new APIs are pretty slick, and you should try them out and let me know what you think.

One of my favorite new features is the FileSystem() function. Give it a path on disk, and it will return a fs.ReadDirFS. Basically, it lets you read from real directories, regular files, archive files, and compressed archive files, ALL THE SAME WAY. This is pretty cool I think. You don't have to worry about whether the given file is just a regular file, a directory, or an archive file (which acts like a directory because it contains other files!) -- you can traverse it all the same way. And the archive format doesn't matter either, it's automatically identified for you! You literally won't even know things are being decompressed as you read from them:

fsys, err := archiver.FileSystem("/path/to/folder/or/file")
if err != nil {
	return err
}

// traverse everything except the ".git" folder...
err = fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
	if err != nil {
		return err
	}
	if path == ".git" {
		return fs.SkipDir
	}
	fmt.Println(path, d.IsDir())
	return nil
})
if err != nil {
	return err
}

// ...or just open one file
file, err := fsys.Open("example.txt")
if err != nil {
	return err
}
defer file.Close()

To make a new archive from files on disk, you might do this:

files, err := archiver.FilesFromDisk(map[string]string{
	"/path/on/disk/file1.txt": "file1.txt",
	"/path/on/disk/file2.txt": "subfolder/file2.txt",
	"/path/on/disk/folder":    "",
})
if err != nil {
	return err
}

out, err := os.Create("example.tar.gz")
if err != nil {
	return err
}
defer out.Close()

caf := archiver.CompressedArchive{
	Compression: archiver.Gz{},
	Archival:    archiver.Tar{},
}

err = caf.Archive(context.Background(), out, files)
if err != nil {
	return err
}

Notice how you have the flexibility of mapping each file (or folder) to a different path in the archive. You can also leave the mapped path blank to assume the base filename for convenience. Folders are added recursively.

Oh yeah, and I added basic context support.

The arc command has not been ported over yet, and as that has to deal directly with the file system, it will require some work before it is ready. This would be the only part of the repository that would write to disk, as the core library doesn't write to disk anymore.

Looks like I was able to delete about 70% of the code, however that count includes test files, the command, and the README, which have not been restored yet. Still, I could feel a significant code reduction in this new design when I wrote it.

Should make irrelevant / close / fix #118, #128, #131, #141, #146, #150, #227, #194, #204, #216, #239, #255, #262, #278, #282

mholt · 2022-01-02T00:58:03Z

I might rename this package to archives. It was suggested before but I didn't get around to it. Maybe now is a good time.

coolaj86 · 2022-01-02T02:44:45Z

I might rename this package to archives. It was suggested before but I didn't get around to it. Maybe now is a good time.

I believe in the Go philosophy that if you're making a breaking change, that's a version bump, but if you're making a philosophical change, that's a new package.

coolaj86 · 2022-01-02T02:48:18Z

This isn't a PR that one can simply review. This is a new project from the ground up.

Make a new repo (without the history baggage), get some tests in, mark this as deprecated, and link to the new repo.

How do you want to handle tests in the new project?

mholt · 2022-01-02T03:53:23Z

Ha, sorry @coolaj86 , I tagged you as a reviewer not with the expectation that you'd actually give a full review -- I probably should have just @-mentioned you instead, since all I wanted to do was let you know that this is inbound. :)

There's no philosophical changes here... just semantics with the name. Meh, I dunno.

As for tests, I haven't thought too much about those yet. I think having a test corpus is a good idea, but as we learned from experience, some things should be generated (like symlinks) whereas other things can be committed... that'll probably come later.

Closes #147 Supersedes #149

Not sure when I'll get around to reimplementing this for v4.

More aesthetically pleasing when enumerating files

mholt · 2022-01-04T06:17:57Z

After quite a few more trials, this seems to be working decently well at least, so I am going to merge it into master but I won't tag it yet (except pre-releases).

Some more tests would be nice, as well as an actual command, but for now I want to have some people start using this, and I need to move my focus onto some other projects here pretty soon anyway.

The new README shows off some of the new version's elegance and features, so I hope you'll enjoy!

jeff-mccoy · 2022-02-16T16:25:41Z

@mholt this is a substantial update and a lot of really great additions. I'm very interested in the stream options now as we have a new use for them for our air gap delivery tool that uses your library. Thanks for the work on this, I see some notes about needing testing/validation, but also notice the tests seem to be less robust in this new cut. I don't think we'll be able to jump to v4 without some more automated tests, but will play with this new version for sure. I also saw you have other priorities soon, do you have an idea what the plan will be for automated tests with v4 going forward?

On another note, I spent the holiday break doing a "mega PR" myself for our project that caused some excitement 🤣: zarf-dev/zarf#237.

mholt · 2022-02-16T17:22:30Z

@jeff-mccoy That's great to hear! I agree v4 could benefit from more tests.

I also saw you have other priorities soon, do you have an idea what the plan will be for automated tests with v4 going forward?

Yeah, so that's true... I'm really busy with Caddy/CertMagic right now, and what I have written for archiver v4 works for me and my needs. However, I do accept sponsorships, and at the right tiers I can absolutely prioritize writing some automated tests for you. 👍

Otherwise, I can try to get around to them when I have a chance, or I'm happy to accept contributions (PRs) as well! Let me know what suits you best and we can do that.

(PS. Wowzers, that's a monster PR you've got there!)

jeff-mccoy · 2022-02-17T00:12:13Z

Awesome. I think we'll do some combination of those options. We deeply value open source projects and know it can be a thankless endeavor sometimes. We'll keep in touch for sure, thanks again for the quick reply and the great library.

mholt · 2022-02-17T06:15:08Z

@jeff-mccoy Sounds good -- would look forward to working with you on this if that can happen!

mholt · 2022-06-20T19:35:17Z

@jeff-mccoy Just checking in, how is this lib working for you? (It's been through several releases now that enhance its functionality and fix bugs.)

mholt added 2 commits January 1, 2022 17:04

Initial commit of v4 rewrite; core types, methods

0390e64

Correct file permissions

b8bae60

mholt requested a review from coolaj86 January 2, 2022 00:38

mholt self-assigned this Jan 2, 2022

This was referenced Jan 2, 2022

Return the archived/unarchived file path #130

Closed

Added FullPath to File for Zips. #215

Closed

Subdirectory names get dropped #250

Closed

mholt changed the title ~~Initial commit of v4 rewrite; core types, methods~~ v4 rewrite: All new design, core types, and stream-oriented interfaces Jan 2, 2022

mholt removed the request for review from coolaj86 January 2, 2022 09:55

zip: Decode non-UTF-8 encoded filenames (by @zxdvd)

1f30051

Closes #147 Supersedes #149

This was referenced Jan 2, 2022

archiver.Extract with archiver.Walk not support package with dir #275

Closed

how about add an option to specify non-utf8 encoding of zip filename #147

Closed

bugfix extract single file from zip #142

Closed

mholt added 7 commits January 3, 2022 21:26

Improve ArchiveFS; fix directory traversal

efcd8ae

Add placeholder command for now

6624cdd

Not sure when I'll get around to reimplementing this for v4.

ci: Require Go 1.17

4d10547

Merge branch 'master' into new-v4

186d5f1

Swap params for Inserter interface

632853e

More aesthetically pleasing when enumerating files

Add readme

a72d105

Merge branch 'new-v4' of github.com:mholt/archiver into new-v4

6e0e294

mholt merged commit 10c5080 into master Jan 4, 2022

mholt deleted the new-v4 branch January 4, 2022 06:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v4 rewrite: All new design, core types, and stream-oriented interfaces #302

v4 rewrite: All new design, core types, and stream-oriented interfaces #302

mholt commented Jan 2, 2022 •

edited

Loading

mholt commented Jan 2, 2022

coolaj86 commented Jan 2, 2022

coolaj86 commented Jan 2, 2022 •

edited

Loading

mholt commented Jan 2, 2022 •

edited

Loading

mholt commented Jan 4, 2022

jeff-mccoy commented Feb 16, 2022

mholt commented Feb 16, 2022 •

edited

Loading

jeff-mccoy commented Feb 17, 2022

mholt commented Feb 17, 2022

mholt commented Jun 20, 2022

v4 rewrite: All new design, core types, and stream-oriented interfaces #302

v4 rewrite: All new design, core types, and stream-oriented interfaces #302

Conversation

mholt commented Jan 2, 2022 • edited Loading

mholt commented Jan 2, 2022

coolaj86 commented Jan 2, 2022

coolaj86 commented Jan 2, 2022 • edited Loading

mholt commented Jan 2, 2022 • edited Loading

mholt commented Jan 4, 2022

jeff-mccoy commented Feb 16, 2022

mholt commented Feb 16, 2022 • edited Loading

jeff-mccoy commented Feb 17, 2022

mholt commented Feb 17, 2022

mholt commented Jun 20, 2022

mholt commented Jan 2, 2022 •

edited

Loading

coolaj86 commented Jan 2, 2022 •

edited

Loading

mholt commented Jan 2, 2022 •

edited

Loading

mholt commented Feb 16, 2022 •

edited

Loading