-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added hep protocol support as parser in telegraf #10039
Conversation
Thanks so much for the pull request! |
!signed-cla |
@srebhan here you are |
📦 Looks like new artifacts were built from this PR. Expand this list to get them here! 🐯Artifact URLs |
@@ -0,0 +1,19 @@ | |||
# HEP | |||
|
|||
The HEP data format parses a HEP packet into metric fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with HEP and I suspect many telegraf users aren't either. Could you add a link to the project here in the docs? https://github.com/sipcapture/HEP
It might be worthwhile to provide a more comprehensive example of how this parser is meant to be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser provides compatibility with the HEP encapsulation protocol which is almost universally supported in Open-Source VoIP platforms such as Asterisk, Freeswitch, Kamailio, OpenSIPS and many more, alongside major vendors like Genesys, Sansay, and other. This parser is dedicated to provide a layer of compatibility for those platforms to form and send metrics to Telegraf without implementing new patches/protocols.
@@ -0,0 +1,19 @@ | |||
# HEP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's going through a name change to EEP? If we do add a parser, shouldn't it use the new name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to the large number of integrations using HEP, the name change did not go through as of yet and won't be for the foreseeable future to avoid confusion on an already obscure protocol.
|
||
**NOTE:** All HEP packets are stores as Tags unless provided specifically | ||
provided with `hep_header` array and body is parsed with JSON parser. | ||
All the JSON parser features were imported in Hep parser. Please check Json Parser for more details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean by "all json parser features were imported". Telegraf has two JSON parsers, "json" and "json_v2". This PR uses the older one which doesn't work well for some common json object structures. Should you switch to v2?
The HEP data format parses a HEP packet into metric fields. | ||
|
||
**NOTE:** All HEP packets are stores as Tags unless provided specifically | ||
provided with `hep_header` array and body is parsed with JSON parser. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does HEP only ever embed JSON formatted data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reimda as far as I read into this, newer versions of the protocol can embedd a JSON payload, so we then need an embedded JSON parsing... :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@reimda HEP is a generic encapsulation protocol and per-se can carry any payload, including binary. The focus is on JSON in this specific usecase as all the integrating platforms are capable of producing and consuming it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm if this is the case, why not export the payload in a field payload
together with a payload_type
and then use a parser processor to parse these? This way you can stay generic in this parser...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much @adubovikov for the nice PR! Overall it looks quite good but I do have some comments additional to @reimda. Please take a look and also check out the linter issues. You can check the linter status locally running make lint-branch
on your local git branch.
if len(h.HepHeader) != 0 { | ||
var headerArray []int | ||
for _, v := range h.HepHeader { | ||
headerArray = append(headerArray, headerNames[v]) | ||
headerTags = h.addHeaders(headerArray, hep) | ||
} | ||
headerTags = h.addHeaders(headerArray, hep) | ||
} else { | ||
var headerArray []int | ||
for k := range headerReverseMap { | ||
headerArray = append(headerArray, k) | ||
headerTags = h.addHeaders(headerArray, hep) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is much more complex than it needs to be. You first use the name in the header (e.g. "version") to lookup an index. In addHeaders()
you then compare this index to an enumerate exactly matching the order in headerNames
to decide which field to add. Afterwards you revert the index to the original name coming from the protocol itself. Why not just using the name directly in addHeaders()
with a string-comparison?
var headerArray []int | ||
for _, v := range h.HepHeader { | ||
headerArray = append(headerArray, headerNames[v]) | ||
headerTags = h.addHeaders(headerArray, hep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the linter says, you overwrite this assignment directly after the loop...
h.MetricName = "hep" | ||
} | ||
headerTags := make(map[string]string) | ||
jsonParser, err := json.New( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please create the JSON-parser only once per Parser
instead of doing it in each call to Parse()
.
} | ||
} | ||
|
||
if hep.ProtoType >= 2 && hep.Payload != "" && hep.ProtoType != 100 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please also get definitions for those magic numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very valid point, this should have been better commented. HEP protocol types are defined in the HEP Draft. The protocol type 100 is for Logs/JSON objects which are the only viable subject in the scope of this integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just define those as constants to have speaking names so people not knowing the protocol in depth (like myself) can still read the code. Adding the link you posted in a comment would also be nice.
case 1: | ||
h.ProtoString = "sip" | ||
case 5: | ||
h.ProtoString = "rtcp" | ||
case 34: | ||
h.ProtoString = "rtpagent" | ||
case 35: | ||
h.ProtoString = "rtcpxr" | ||
case 38: | ||
h.ProtoString = "horaclifix" | ||
case 53: | ||
h.ProtoString = "dns" | ||
case 100: | ||
h.ProtoString = "log" | ||
case 112: | ||
h.ProtoString = "alert" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please define those magic numbers similarly to the chunk-type etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Protocol strings are the equivalent of a type
tag and are injected by HEP senders to further identify the payload type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't doubt any of your words, but it won't hurt to define those numbers as consts and use them as speaking names. You can the even implement a type with the Stringer
interface to cover this conversion here.
if err != nil { | ||
return nil, err | ||
} | ||
metric := m[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why only use the first metric?
nFields := make(map[string]interface{}) | ||
nFields["protocol_type_field"] = hep.ProtoType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please name those fields
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, please think about what should be a tag and what should be a field. I think the protocol type is rather a tag
as you might want to query for all packets of a certain protocol type. However, the IPs and also rapidly growing ID
s below should really be fields as otherwise cardinality might explode...
nFields := make(map[string]interface{}) | ||
nFields["protocol_type_field"] = hep.ProtoType | ||
|
||
metric := metric.New(h.MetricName, headerTags, nFields, time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the time.Now()
part as it is the default if not provided.
parser, err = newHEPParser(config.MetricName, | ||
config.HepMeasurementName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think MetricName
and HepMeasurementName
denote the same thing and will conflict in the parser. Please remove HepMeasurementName
in favor of MetricName
.
@@ -232,6 +238,17 @@ func NewParser(config *Config) (Parser, error) { | |||
config.GrokCustomPatternFiles, | |||
config.GrokTimezone, | |||
config.GrokUniqueTimestamp) | |||
case "hep": | |||
parser, err = newHEPParser(config.MetricName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe directly construct the parser here...
It seems to me that HEP should be a Telegraf input plugin, not a parser. Input plugins are data sources. Network protocols are usually implemented in input plugins (http, socket/tcp/udp, mqtt, kafka, amqp). Non network data sources such as file content (file or tail input) or program output (exec or execd input) are also input plugins. Prsers plugins decode data (like json, csv, influx line protocol) that come into telegraf through an input plugin. Parser plugins are useful because their formats are used by various inputs. Take the example http listener using json parser.
HEP seems analogous to the HTTP listener in this example- the input, not the parser. |
@adubovikov The PR description is empty and there is no issue that describes what your use case or goals are with this PR. Could you open an issue and describe what you want from the integration of telegraf and HEP? I think it would help us reviewers to have a better idea of what you are trying to do. |
@reimda very good points being made, but HEP is an encapsulation protocol and as such, it can be transported by any other protocol (TCP/UDP/SCPT/Queuing protocols/etc) and as such as thought it would be best positioned as a decoder in the pipeline. Having it as an input would force it to carry support for protocols Telegraf already offers a comfortable input for. EDIT: I will provide an issue with the usecase description as requested |
Regarding the embedded JSON, XML, etc, it might be an option to keep the encapsulated part in a field and then chain another parser, only parsing this field... What do you think? |
Keep in mind that inputs already can support multiple protocols. This has been done typically by having the plugin accept a url that specifies the protocol. See socket_listener's service_address setting for a listening input example or mqtt_consumer's servers setting for a connecting client example. We could have HEP be a parser in telegraf, but it sounds like it would need to own another parser to do the job of parsing the encapsulated data (json or other format). I understand that HEP is an encapsulation protocol and it may feel natural to do it this way. Telegraf has never needed to have a nested parser like this before. There are other encapsulated use cases that telegraf handles with the parser processor I see a few options here. To describe them I'll write example configuration for listening on a UDP port for HEP formatted data.
This would introduce a new concept of nested parsers to telegraf users. Nesting tables in TOML is repetetive and tends to confuse users. It could also be a little ugly to implement because telegraf parser settings are implemented as input level settings.
This requires hep to handle its own transport protocols but the configuration is the most simple and feels the most familiar to me.
This feels relatively familiar to a telegraf user. Thanks @lmangani for working on an issue to describe the use case. Having that should make it easier to decide which way to go. |
Thanks for the very nice visualization @reimda. I agree that option 2 looks the easiest from the user perspective. However, from a maintainers perspective it is the worst as we would now fold all possible transports (e.g. So I see two options from implementation side:
For reimda's option 1 we would need to restructure the parsers (maybe starting with something similar to #8791) first in order to allow them to specify the TOML options in the parser itself... What do you guys think? |
@adubovikov any news on this PR? |
Is there still interest in this PR? |
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you! |
Required for all PRs:
resolves #