Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add URI Handlers and LoadSpec. Deprecate CompressionUtils #37

Merged
merged 1 commit into from
Mar 30, 2015

Conversation

drcrallen
Copy link
Member

Adds URIDataPuller interface for dealing with URI data. I also have a patch incoming to Druid that implements this update.

Also adds a LoadSpec to help solidify the loading of segment data.

Also move CompressionUtils to java-util via metamx/java-util#22

* @throws SegmentLoadingException if there are any errors
*/
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we deprecate this? Do we want the URI to be the standard way to pull down segments? I think there is simplicity in allowing you to just pass a segment and not having to worry about where that segment is stored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the Omni segment loader that should be true. For the puller we were effectively implementing our own non-standard uri. This essentially changes the format for the segment load spec to formal URI compliance

@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from cb180a4 to b24d2f0 Compare March 2, 2015 22:10
@drcrallen drcrallen changed the title Add URI handlers to DataSegmentPuller Add URI Handlers Mar 2, 2015
@drcrallen
Copy link
Member Author

@fjy Updated this one to address the polymorphic pulling stuff. Update to druid coming soon

@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch 3 times, most recently from 0accc4e to fd2549e Compare March 9, 2015 17:27
@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from fd2549e to b5e478e Compare March 11, 2015 23:24
* A means of pulling segment files into a destination directory
*/
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
public interface LoadSpec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there any jsonsubtypes here for serde?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the LoadSpec and the pullers are intimately tied together, I didn't define any subtypes here, instead opting to register the subtypes whenever the pullers are registered. LoadSpec is passed as a Map<String, Object> until it is finally ready to be consumed. At that point both the LoadSpec subtype and the appropriate puller will be registered.

Also, I didn't want to add a bunch of dependencies into the API jar, which would be required to properly couple the actual LoadSpec impl and the SegmentPuller

@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from 1e2d806 to 63c17d4 Compare March 18, 2015 18:18
@xvrl
Copy link
Member

xvrl commented Mar 18, 2015

can we add more details to the description, saying this also moved the CompressionUtils to java-util and like to the corresponding PR?

@drcrallen
Copy link
Member Author

@xvrl : added

* @return The time in ms since 1970-01-01Z when the segment was last modified
* @throws SegmentLoadingException on error
*/
public long getLastModified(URI uri) throws IOException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason this is not simply a DateTime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to generic getVersion

@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from 63c17d4 to dcdb624 Compare March 23, 2015 20:10
// Hold interesting data about the results of the segment load
public static class LoadSpecResult{
private final Long size;
public LoadSpecResult(Long size){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does size accept null values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that one in the great Long purge. Fixed

@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from dcdb624 to c1ce5f1 Compare March 23, 2015 22:19
@drcrallen drcrallen changed the title Add URI Handlers and LoadSpec Add URI Handlers and LoadSpec. Remove CompressionUtils Mar 26, 2015
@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from c1ce5f1 to 27a8e04 Compare March 30, 2015 17:35
@drcrallen drcrallen changed the title Add URI Handlers and LoadSpec. Remove CompressionUtils Add URI Handlers and LoadSpec. Deprecate CompressionUtils Mar 30, 2015
* @param outDir The destination directory to put the resulting file
*
* @throws IOException on propogated IO exception, IAE if it cannot determine the proper new name for `pulledFile`
*/
public static void gunzip(File pulledFile, File outDir) throws IOException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not deprecate this as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the functionality is slightly different (outDir vs outFile) and there is no alternative for specifying an output directory yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't mean you can't deprecate it. There's technically an alternative...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allright, I'll put a tag in there

* Also moved most functionality of CompresisonUtils away from here (now in java-util)
* Added URIDataPuller interface
@drcrallen drcrallen force-pushed the dataSegmentURIHandlers branch from 27a8e04 to 2eb740e Compare March 30, 2015 18:29
@cheddar
Copy link
Member

cheddar commented Mar 30, 2015

LGTM

fjy added a commit that referenced this pull request Mar 30, 2015
Add URI Handlers and LoadSpec. Deprecate CompressionUtils
@fjy fjy merged commit 1b3b335 into druid-io:master Mar 30, 2015
@drcrallen drcrallen deleted the dataSegmentURIHandlers branch March 30, 2015 18:38
drcrallen added a commit to metamx/druid that referenced this pull request Mar 30, 2015
* Requires druid-io/druid-api#37
* Requires metamx/java-util#22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
cheddar pushed a commit to cheddar/druid that referenced this pull request Jul 1, 2015
* Requires druid-io/druid-api#37
* Requires metamx/java-util#22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
  * General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants