Add Compression and File utilities for Zip and GZ handling #22

Merged
xvrl merged 1 commit into master from CompressionFileUtils
Mar 23, 2015

Conversation

drcrallen
Contributor

I moved these from druid-api and added / expanded the functionality.

@drcrallen
Contributor Author

I think I want to add one or two more options for the utilities. Give me a minute.

@drcrallen drcrallen closed this Mar 6, 2015
@drcrallen drcrallen reopened this Mar 7, 2015
@drcrallen drcrallen force-pushed the CompressionFileUtils branch from cd6228b to 253e643 Compare March 7, 2015 02:00
@drcrallen
Contributor Author

Added more

@drcrallen drcrallen force-pushed the CompressionFileUtils branch from 253e643 to 309955e Compare March 9, 2015 17:26
@drcrallen
Contributor Author

The get --> build change in the docs shouldn't be part of this PR. Will fix.

@drcrallen drcrallen force-pushed the CompressionFileUtils branch from 309955e to 3da9158 Compare March 9, 2015 17:44
@drcrallen
Contributor Author

I'm adding a few more cases, closing until they are in

}
}

public static InputStream fix7036144(final InputStream in)
Contributor

can we call this something else and reference the bug?

Contributor Author

Bug is referenced below, but I'll put it in the method comments and change the name.

Contributor Author

Fixed.
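
For context on the bug being referenced: as I understand it, JDK bug 7036144 is about `GZIPInputStream` consulting `available()` on the underlying stream to decide whether another gzip member follows a trailer, which is unreliable for streams that report 0 even though more data will arrive. A minimal sketch of one common workaround follows. The class and method names are hypothetical and are not claimed to be what this PR ended up with; the 1024 estimate is an arbitrary assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not the java-util implementation.
final class GzipWorkaroundSketch
{
  /**
   * Wraps a stream so that available() never reports 0, so GZIPInputStream
   * keeps looking for a following gzip member (workaround for JDK bug 7036144).
   */
  static InputStream neverZeroAvailable(final InputStream in)
  {
    return new FilterInputStream(in)
    {
      @Override
      public int available() throws IOException
      {
        final int actual = super.available();
        // available() is only an estimate, so report a small nonzero guess instead of 0.
        return actual == 0 ? 1024 : actual;
      }
    };
  }

  /** Decompresses every gzip member in the stream and returns the concatenated bytes. */
  static byte[] gunzipAll(final InputStream in) throws IOException
  {
    try (GZIPInputStream gz = new GZIPInputStream(neverZeroAvailable(in))) {
      final ByteArrayOutputStream out = new ByteArrayOutputStream();
      final byte[] buf = new byte[4096];
      int n;
      while ((n = gz.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }

  /** Test helper: compresses data into a single gzip member. */
  static byte[] gzip(final byte[] data) throws IOException
  {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(data);
    }
    return out.toByteArray();
  }
}
```

At the true end of the stream the extra header read simply hits EOF, which `GZIPInputStream` treats as end of input, so the wrapper should not change results for well-formed data.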

@drcrallen
Contributor Author

It was asked which parts are new.

A very large part is new. Here's the prior CompressionUtil, which did not really have retries or data consistency checks.

FileUtils is new
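
To illustrate what "retries and data consistency checks" can mean for a copy, here is a rough sketch that assumes the source can be reopened on every attempt. `RetryCopySketch`, `StreamSource`, and the size check are hypothetical stand-ins for illustration, not the actual `FileUtils` API in this PR.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the java-util FileUtils implementation.
final class RetryCopySketch
{
  /** A replayable source: each call opens a fresh stream over the same data. */
  interface StreamSource
  {
    InputStream open() throws IOException;
  }

  /**
   * Copies the source to outFile, reopening the source on each failed attempt.
   *
   * @return the number of bytes copied
   * @throws IOException the last failure once maxAttempts is exhausted
   */
  static long retryCopy(StreamSource source, File outFile, int maxAttempts) throws IOException
  {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try (InputStream in = source.open()) {
        // REPLACE_EXISTING discards any partial output left by a failed attempt.
        final long copied = Files.copy(in, outFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        // Simple consistency check: the file on disk must match the bytes read.
        if (copied != outFile.length()) {
          throw new IOException("Copied " + copied + " bytes but file has " + outFile.length());
        }
        return copied;
      }
      catch (IOException e) {
        last = e;
      }
    }
    throw last;
  }
}
```

A raw `InputStream` cannot be used this way because it cannot be replayed after a partial read, which is why a reopenable source is assumed here.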

@drcrallen drcrallen force-pushed the CompressionFileUtils branch 2 times, most recently from b676362 to 95776df Compare March 11, 2015 00:16
@drcrallen drcrallen force-pushed the CompressionFileUtils branch 2 times, most recently from 1ea906d to 15151b0 Compare March 19, 2015 01:18
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CompressionUtils
Contributor

I think all the functions in this class need at least a little bit of documentation, especially around what their return values mean, when they will or won't throw exceptions, and what the retry behavior is.

Also, do we actually need all of these? More functions are more convenient but also more code to test and maintain.

Contributor Author

Fixed

@drcrallen drcrallen force-pushed the CompressionFileUtils branch from 8a22f18 to 406a579 Compare March 19, 2015 23:12
}

// Use unzip(ByteStream, File) if possible
public static FileUtils.FileCopyResult unzip(InputStream in, File outDir) throws IOException
Contributor

this one's missing javadocs for some reason

Contributor Author

For some of the "older" functions I did not add javadocs, but rather copied them directly from druid-api. I can add them for these, though.

Contributor Author

Fixed

Contributor

I think it makes sense to document those as well, since they are new to java-util. Otherwise, we would just leave them in druid-api. We also made some changes to the return values.
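
For readers without the diff open, a stream-based unzip with a documented return value might look roughly like the sketch below. It is an illustration only: it returns a plain `List<File>` as a stand-in for `FileUtils.FileCopyResult`, and the rejection of entries that resolve outside `outDir` is an assumption of this sketch, not something confirmed by the PR.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch, not the CompressionUtils code under review.
final class UnzipSketch
{
  /**
   * Unzips the stream into outDir. The stream is consumed once and closed; no retry is attempted.
   *
   * @return the files written, in archive order
   * @throws IOException if the archive cannot be read or an entry resolves outside outDir
   */
  static List<File> unzip(InputStream in, File outDir) throws IOException
  {
    final List<File> written = new ArrayList<>();
    final Path root = outDir.toPath().toAbsolutePath().normalize();
    try (ZipInputStream zipIn = new ZipInputStream(in)) {
      ZipEntry entry;
      while ((entry = zipIn.getNextEntry()) != null) {
        final Path target = root.resolve(entry.getName()).normalize();
        if (!target.startsWith(root)) {
          throw new IOException("Entry escapes output directory: " + entry.getName());
        }
        if (entry.isDirectory()) {
          Files.createDirectories(target);
        } else {
          Files.createDirectories(target.getParent());
          // Files.copy reads to the end of the current entry and leaves zipIn open.
          Files.copy(zipIn, target, StandardCopyOption.REPLACE_EXISTING);
          written.add(target.toFile());
        }
        zipIn.closeEntry();
      }
    }
    return written;
  }
}
```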

@drcrallen drcrallen force-pushed the CompressionFileUtils branch from 406a579 to 50f7642 Compare March 23, 2015 18:25
@@ -35,32 +39,36 @@
// The default buffer size to use (from IOUtils)
private static final int DEFAULT_BUFFER_SIZE = 1024 * 4;

public static void copyToFileAndClose(InputStream is, File file) throws IOException
// It is highly advised to use FileUtils.retryCopy whenever possible, and not use a raw `InputStream`
// This should only be used if there is absolutely no way to replay the InputStream
Contributor

can we make those javadocs as well?
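
As an illustration of the request, the two line comments could be folded into a javadoc block along these lines. The method body, the `long` return type, and the overwrite behavior are assumptions of this sketch, since the diff context here does not show the final signature.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; only the wording of the advice comes from the diff above.
final class StreamCopySketch
{
  /**
   * Copies a stream to a file and closes the stream.
   * <p>
   * It is highly advised to use FileUtils.retryCopy whenever possible, and not use a raw
   * {@code InputStream}. This should only be used if there is absolutely no way to replay
   * the InputStream.
   *
   * @param is   the stream to copy; it is always closed before this method returns
   * @param file the destination file, overwritten if it exists (assumption of this sketch)
   * @return the number of bytes copied (return type is an assumption of this sketch)
   * @throws IOException if reading, writing, or closing fails; no retry is attempted
   */
  static long copyToFileAndClose(InputStream is, File file) throws IOException
  {
    // try-with-resources closes the stream on both success and failure.
    try (InputStream in = is) {
      return Files.copy(in, file.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```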

@drcrallen drcrallen force-pushed the CompressionFileUtils branch from 50f7642 to 7f3239d Compare March 23, 2015 18:50
@drcrallen drcrallen force-pushed the CompressionFileUtils branch from 7f3239d to 368f7c3 Compare March 23, 2015 18:53
xvrl added a commit that referenced this pull request Mar 23, 2015
Add Compression and File utilities for Zip and GZ handling
@xvrl xvrl merged commit d40e51a into master Mar 23, 2015
@xvrl xvrl deleted the CompressionFileUtils branch March 23, 2015 18:53
drcrallen added a commit to metamx/druid that referenced this pull request Mar 30, 2015
* Requires druid-io/druid-api#37
* Requires metamx/java-util#22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
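
The workflow in that commit message can be sketched roughly as follows for the `.gz` case: the caller supplies how to open the stream, how the output file name is derived, and when to retry, and a shared helper does the rest. Every name below (`GunzipWorkflowSketch`, `StreamSource`, `stripGz`, the `shouldRetry` predicate) is hypothetical and chosen for illustration; none of it is taken from the Druid or java-util sources.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.function.Predicate;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch of the puller-to-helper hand-off, not the Druid implementation.
final class GunzipWorkflowSketch
{
  /** Supplied by the puller: how to (re)open the remote stream. */
  interface StreamSource
  {
    InputStream open() throws IOException;
  }

  /** Derives the local file name by stripping a trailing ".gz". */
  static String stripGz(String name)
  {
    if (!name.endsWith(".gz") || name.length() == 3) {
      throw new IllegalArgumentException("Not a .gz name: " + name);
    }
    return name.substring(0, name.length() - 3);
  }

  /**
   * Gunzips the source into outDir, reopening the source when shouldRetry accepts the failure.
   *
   * @return the decompressed file
   * @throws IOException the failure that was not retried, or the last one after maxAttempts
   */
  static File gunzip(
      StreamSource source,
      String remoteName,
      File outDir,
      Predicate<IOException> shouldRetry,
      int maxAttempts
  ) throws IOException
  {
    final File outFile = new File(outDir, stripGz(remoteName));
    for (int attempt = 1; ; attempt++) {
      try (InputStream raw = source.open();
           InputStream in = new GZIPInputStream(raw)) {
        Files.copy(in, outFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        return outFile;
      }
      catch (IOException e) {
        // Give up when attempts are exhausted or the puller says this failure is permanent.
        if (attempt >= maxAttempts || !shouldRetry.test(e)) {
          throw e;
        }
      }
    }
  }
}
```

With this split, each puller implementation only has to provide the three small pieces, and the retry loop and decompression live in one place.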
cheddar pushed a commit to cheddar/druid that referenced this pull request Jul 1, 2015
* Requires druid-io/druid-api#37
* Requires metamx/java-util#22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can

5 participants