Add URI Handlers and LoadSpec. Deprecate CompressionUtils #37
Conversation
(force-pushed from 6fc7510 to b46b111, then to cb180a4)
* Requires druid-io/druid-api#37
* Requires metamx/java-util#20
* @throws SegmentLoadingException if there are any errors
*/
@Deprecated
why did we deprecate this? Do we want the URI to be the standard way to pull down segments? I think there is simplicity in allowing you to just pass a segment and not having to worry about where that segment is stored.
For the Omni segment loader that should be true. For the puller we were effectively implementing our own non-standard uri. This essentially changes the format for the segment load spec to formal URI compliance
(force-pushed from cb180a4 to b24d2f0)
@fjy Updated this one to address the polymorphic pulling stuff. Update to druid coming soon
(force-pushed from 0accc4e to fd2549e, then to b5e478e)
* A means of pulling segment files into a destination directory
*/
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
public interface LoadSpec
are there any jsonsubtypes here for serde?
Since the LoadSpec and the pullers are intimately tied together, I didn't define any subtypes here, instead opting to register the subtypes whenever the pullers are registered. LoadSpec is passed around as a Map<String, Object> until it is finally ready to be consumed; at that point both the LoadSpec subtype and the appropriate puller will have been registered.
Also, I didn't want to add a bunch of dependencies to the API jar, which would be required to properly couple the actual LoadSpec impl and the SegmentPuller.
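In the PR itself this dispatch is handled by Jackson via the @JsonTypeInfo "type" property; the following stdlib-only sketch illustrates the same idea of deferring interpretation of the raw Map<String, Object> until a matching type has been registered. All names here (LoadSpecRegistry, register, load) are hypothetical illustrations, not the actual druid-api code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a registry keyed by the "type" property, mirroring how
// Jackson named subtypes get registered at the same time as their pullers.
public class LoadSpecRegistry
{
  private final Map<String, Function<Map<String, Object>, String>> factories = new HashMap<>();

  // Analogous to registering a Jackson subtype alongside its SegmentPuller
  public void register(String typeName, Function<Map<String, Object>, String> factory)
  {
    factories.put(typeName, factory);
  }

  // The raw Map<String, Object> is only interpreted once its "type" is registered
  public String load(Map<String, Object> rawSpec)
  {
    final Function<Map<String, Object>, String> factory =
        factories.get((String) rawSpec.get("type"));
    if (factory == null) {
      throw new IllegalArgumentException("Unknown loadSpec type: " + rawSpec.get("type"));
    }
    return factory.apply(rawSpec);
  }
}
```

An unregistered "type" fails loudly, which matches the intent that specs are not consumed until the corresponding puller module has loaded.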
(force-pushed from 1e2d806 to 63c17d4)
Can we add more details to the description, saying this also moved CompressionUtils to java-util, and link to the corresponding PR?
@xvrl: added
* @return The time in ms since 1970-01-01Z when the segment was last modified
* @throws SegmentLoadingException on error
*/
public long getLastModified(URI uri) throws IOException;
any reason this is not simply a DateTime?
Modified to generic getVersion
(force-pushed from 63c17d4 to dcdb624)
// Hold interesting data about the results of the segment load
public static class LoadSpecResult {
  private final Long size;
  public LoadSpecResult(Long size) {
does size accept null values?
Missed that one in the great Long purge. Fixed.
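The fix referenced above can be sketched as follows: switching the boxed Long to a primitive long means a null size is rejected at the call site instead of surfacing later as an NPE. This is an illustrative reconstruction, not the exact committed code.

```java
// Sketch of the post-"Long purge" version: a primitive long cannot hold null,
// so the constructor can never silently accept a missing size.
public class LoadSpecResult
{
  private final long size;

  public LoadSpecResult(long size)
  {
    this.size = size;
  }

  public long getSize()
  {
    return size;
  }
}
```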
(force-pushed from dcdb624 to c1ce5f1, then to 27a8e04)
* @param outDir The destination directory to put the resulting file
*
* @throws IOException on propagated IO exception, IAE if it cannot determine the proper new name for `pulledFile`
*/
public static void gunzip(File pulledFile, File outDir) throws IOException
Why not deprecate this as well?
Because the functionality is slightly different (outDir vs outFile) and there is no alternative for specifying an output directory yet.
Doesn't mean you can't deprecate it. There's technically an alternative...
Alright, I'll put a tag in there.
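The outDir-style behavior being discussed (derive the output file name by stripping the ".gz" suffix, IAE when no name can be derived) can be sketched with the JDK's GZIPInputStream. This is a minimal illustration, not the actual CompressionUtils implementation; GzipExample and the return type are assumptions.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

public class GzipExample
{
  // Decompress pulledFile into outDir, naming the result by stripping ".gz".
  // Throws IAE if the proper new name cannot be determined, matching the javadoc above.
  public static File gunzip(File pulledFile, File outDir) throws IOException
  {
    final String name = pulledFile.getName();
    if (!name.endsWith(".gz") || name.length() <= ".gz".length()) {
      throw new IllegalArgumentException("Cannot determine new name for " + name);
    }
    final File outFile = new File(outDir, name.substring(0, name.length() - ".gz".length()));
    try (InputStream in = new GZIPInputStream(new FileInputStream(pulledFile));
         OutputStream out = new FileOutputStream(outFile)) {
      final byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    return outFile;
  }
}
```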
* Also moved most functionality of CompressionUtils away from here (now in java-util)
* Added URIDataPuller interface
(force-pushed from 27a8e04 to 2eb740e)
LGTM
Add URI Handlers and LoadSpec. Deprecate CompressionUtils
* Requires druid-io/druid-api#37
* Requires metamx/java-util#22
* Moves the puller logic to use a more standard workflow going through java-util helpers instead of re-writing the handlers for each impl
* General workflow goes like this: 1) LoadSpec makes sure the correct Puller is called with the correct parameters. 2) The Puller sets up general information like how to make an InputStream, how to find a file name (for .gz files for example), and when to retry. 3) CompressionUtils does most of the heavy lifting when it can
Adds a URIDataPuller interface for dealing with URI data. I also have a patch incoming to Druid that implements this update. Also adds a LoadSpec to help solidify the loading of segment data. Also moves CompressionUtils to java-util via metamx/java-util#22.
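The three-step workflow described in this PR (LoadSpec dispatches to the right puller, the puller knows how to open an InputStream and when to retry, and shared helpers do the heavy lifting) can be sketched as below. The names and signatures are illustrative assumptions, not the actual druid-api interfaces.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

public class WorkflowSketch
{
  // Step 2: the puller only knows how to open a stream for a URI and when to retry
  interface URIDataPuller
  {
    InputStream getInputStream(URI uri) throws IOException;

    boolean shouldRetry(Throwable t);
  }

  // Step 1: the LoadSpec layer picks the right puller and parameters for the URI
  static String loadSegment(URI uri, URIDataPuller puller) throws IOException
  {
    try (InputStream in = puller.getInputStream(uri)) {
      // Step 3: in the real workflow, java-util's CompressionUtils would
      // decompress/copy the stream here; we just read it back for illustration
      return new String(in.readAllBytes());
    }
  }
}
```

The point of the split is that each puller impl only supplies stream creation and retry policy, rather than re-writing the whole download/decompress handler.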