-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URI parts ingest processor #65150
URI parts ingest processor #65150
Conversation
Pinging @elastic/es-core-features (Team:Core/Features) |
@leehinman and @andrewkroh, I would be interested in your feedback (and/or others, as appropriate) on whether this provides the functionality you described in #57481. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this will make writing pipelines simpler for us.
|
||
URL url; | ||
try { | ||
url = new URL(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there limitations on which schemes that this can parse? Would these parse?
- ftp://ftp.is.co.za/rfc/rfc1808.txt
- ldap://[2001:db8::7]/c=GB?objectClass?one
- telnet://192.0.2.16:80/
Perhaps using java.net.URI would be more forgiving and not require a URLStreamHandler to be loaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly parts of the URL are required for parsing to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewkroh, thanks for looking it over and commenting. I switched over to java.net.URI
which does support more schemes including all three of the examples you list above.
Right now, no parts of a URI are required beyond what java.net.URI
needs to construct an instance. Is that what you would prefer?
throw new IllegalArgumentException("unable to parse URL [" + value + "]"); | ||
} | ||
var urlParts = new HashMap<String, Object>(); | ||
urlParts.put("domain", url.getHost()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECS isn't clear on what's correct here, but what does getHost
return for bracked IPv6 addresses?
@elastic/ecs Should url.domain
include the brackets that are required when using IPv6 addresses in URLs? https://www.ietf.org/rfc/rfc2732.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per this source, getHost()
will return the IPv6 address enclosed in the brackets.
Since brackets are required with a literal IPv6 address, url.domain
should include the brackets. We can improve the description of url.domain
in the ECS docs to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
This is great @danhermann 👍 Note that I think it's fine if this processor doesn't populate the domain breakdown fields |
There's a request for a registered domain processor (#57476). Perhaps if that gets implemented this one could internally use it. |
I can add that on this processor as an option once the registered domain processor is completed. |
Last functional question on this one -- should the processor be renamed to Edit: Though I do see that the ECS field names are all prefixed with |
This is awesome. Thank You. |
I don't really mind either way, but I think it'll indeed be clearer to users that they can use it on all sorts of URIs if it's named |
@elasticmachine run elasticsearch-ci/2 |
cc: @elastic/es-ui in case Kibana auto-complete needs to be updated with this new processor. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for adding this processor @danhermann
Left a minor comment
if (userInfo.contains(":")) { | ||
int colonIndex = userInfo.indexOf(":"); | ||
uriParts.put("username", userInfo.substring(0, colonIndex)); | ||
uriParts.put("password", colonIndex < userInfo.length() ? userInfo.substring(colonIndex + 1) : ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this fail with IndexOutOfBounds
for http://user:@www.google.com:80/blarg.gif#ref
? (no password)
Shall we add a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, the password is set to an empty string. I'll add another test case to make that clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @andreidan!
Thanks for the heads up @danhermann! I've opened elastic/kibana#83915 to add support for this in the UI. |
Adds a new
uri_parts
processor that decomposes a URI into its constituent parts. E.g.:results in:
The processor relies on the
java.net.URI
class to parse the URI and attempts to map the parts into ECS fields. Some ECS fields are not part of the URI spec, so see the table below for how those are handled:.
exists in the pathusername
orpassword
though they are commonly presented with theusername:password
convention. Theusername
andpassword
fields are parsed out of theuser_info
field on a best-effort basis if a:
exists.password
aboveAlso introduces a new module for ingest processors.
Closes #57481