Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added hep protocol support as parser in telegraf #10039

Closed
wants to merge 2 commits into from

Conversation

adubovikov
Copy link

@adubovikov adubovikov commented Nov 1, 2021

Required for all PRs:

resolves #

@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Nov 1, 2021

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@telegraf-tiger telegraf-tiger bot added fix pr to fix corresponding bug new plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins labels Nov 1, 2021
@adubovikov
Copy link
Author

!signed-cla

@adubovikov
Copy link
Author

@srebhan here you are

@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Nov 1, 2021

@@ -0,0 +1,19 @@
# HEP

The HEP data format parses a HEP packet into metric fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with HEP and I suspect many telegraf users aren't either. Could you add a link to the project here in the docs? https://github.com/sipcapture/HEP

It might be worthwhile to provide a more comprehensive example of how this parser is meant to be used

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser provides compatibility with the HEP encapsulation protocol which is almost universally supported in Open-Source VoIP platforms such as Asterisk, Freeswitch, Kamailio, OpenSIPS and many more, alongside major vendors like Genesys, Sansay, and other. This parser is dedicated to provide a layer of compatibility for those platforms to form and send metrics to Telegraf without implementing new patches/protocols.

@@ -0,0 +1,19 @@
# HEP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's going through a name change to EEP? If we do add a parser, shouldn't it use the new name?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the large number of integrations using HEP, the name change did not go through as of yet and won't be for the foreseeable future to avoid confusion on an already obscure protocol.


**NOTE:** All HEP packets are stores as Tags unless provided specifically
provided with `hep_header` array and body is parsed with JSON parser.
All the JSON parser features were imported in Hep parser. Please check Json Parser for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "all json parser features were imported". Telegraf has two JSON parsers, "json" and "json_v2". This PR uses the older one which doesn't work well for some common json object structures. Should you switch to v2?

The HEP data format parses a HEP packet into metric fields.

**NOTE:** All HEP packets are stores as Tags unless provided specifically
provided with `hep_header` array and body is parsed with JSON parser.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does HEP only ever embed JSON formatted data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reimda as far as I read into this, newer versions of the protocol can embedd a JSON payload, so we then need an embedded JSON parsing... :-(

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reimda HEP is a generic encapsulation protocol and per-se can carry any payload, including binary. The focus is on JSON in this specific usecase as all the integrating platforms are capable of producing and consuming it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm if this is the case, why not export the payload in a field payload together with a payload_type and then use a parser processor to parse these? This way you can stay generic in this parser...

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @adubovikov for the nice PR! Overall it looks quite good but I do have some comments additional to @reimda. Please take a look and also check out the linter issues. You can check the linter status locally running make lint-branch on your local git branch.

Comment on lines +124 to +137
if len(h.HepHeader) != 0 {
var headerArray []int
for _, v := range h.HepHeader {
headerArray = append(headerArray, headerNames[v])
headerTags = h.addHeaders(headerArray, hep)
}
headerTags = h.addHeaders(headerArray, hep)
} else {
var headerArray []int
for k := range headerReverseMap {
headerArray = append(headerArray, k)
headerTags = h.addHeaders(headerArray, hep)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is much more complex than it needs to be. You first use the name in the header (e.g. "version") to lookup an index. In addHeaders() you then compare this index to an enumerate exactly matching the order in headerNames to decide which field to add. Afterwards you revert the index to the original name coming from the protocol itself. Why not just using the name directly in addHeaders() with a string-comparison?

var headerArray []int
for _, v := range h.HepHeader {
headerArray = append(headerArray, headerNames[v])
headerTags = h.addHeaders(headerArray, hep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the linter says, you overwrite this assignment directly after the loop...

h.MetricName = "hep"
}
headerTags := make(map[string]string)
jsonParser, err := json.New(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create the JSON-parser only once per Parser instead of doing it in each call to Parse().

}
}

if hep.ProtoType >= 2 && hep.Payload != "" && hep.ProtoType != 100 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please also get definitions for those magic numbers?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very valid point, this should have been better commented. HEP protocol types are defined in the HEP Draft. The protocol type 100 is for Logs/JSON objects which are the only viable subject in the scope of this integration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just define those as constants to have speaking names so people not knowing the protocol in depth (like myself) can still read the code. Adding the link you posted in a comment would also be nice.

Comment on lines +116 to +131
case 1:
h.ProtoString = "sip"
case 5:
h.ProtoString = "rtcp"
case 34:
h.ProtoString = "rtpagent"
case 35:
h.ProtoString = "rtcpxr"
case 38:
h.ProtoString = "horaclifix"
case 53:
h.ProtoString = "dns"
case 100:
h.ProtoString = "log"
case 112:
h.ProtoString = "alert"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please define those magic numbers similarly to the chunk-type etc?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protocol strings are the equivalent of a type tag and are injected by HEP senders to further identify the payload type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't doubt any of your words, but it won't hurt to define those numbers as consts and use them as speaking names. You can the even implement a type with the Stringer interface to cover this conversion here.

if err != nil {
return nil, err
}
metric := m[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only use the first metric?

Comment on lines +154 to +155
nFields := make(map[string]interface{})
nFields["protocol_type_field"] = hep.ProtoType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please name those fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, please think about what should be a tag and what should be a field. I think the protocol type is rather a tag as you might want to query for all packets of a certain protocol type. However, the IPs and also rapidly growing IDs below should really be fields as otherwise cardinality might explode...

nFields := make(map[string]interface{})
nFields["protocol_type_field"] = hep.ProtoType

metric := metric.New(h.MetricName, headerTags, nFields, time.Now())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the time.Now() part as it is the default if not provided.

Comment on lines +242 to +243
parser, err = newHEPParser(config.MetricName,
config.HepMeasurementName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think MetricName and HepMeasurementName denote the same thing and will conflict in the parser. Please remove HepMeasurementName in favor of MetricName.

@@ -232,6 +238,17 @@ func NewParser(config *Config) (Parser, error) {
config.GrokCustomPatternFiles,
config.GrokTimezone,
config.GrokUniqueTimestamp)
case "hep":
parser, err = newHEPParser(config.MetricName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe directly construct the parser here...

@srebhan srebhan self-assigned this Nov 4, 2021
@reimda
Copy link
Contributor

reimda commented Nov 4, 2021

It seems to me that HEP should be a Telegraf input plugin, not a parser.

Input plugins are data sources. Network protocols are usually implemented in input plugins (http, socket/tcp/udp, mqtt, kafka, amqp). Non network data sources such as file content (file or tail input) or program output (exec or execd input) are also input plugins.

Prsers plugins decode data (like json, csv, influx line protocol) that come into telegraf through an input plugin. Parser plugins are useful because their formats are used by various inputs.

Take the example http listener using json parser.

[[inputs.http_listener_v2]]
  service_address = ":8080"
  data_format = "json_v2"
    [[inputs.file.json_v2]]
        measurement_name = "json_data"
        measurement_name_path = "gjson/path"
        ...

HEP seems analogous to the HTTP listener in this example- the input, not the parser.

@reimda
Copy link
Contributor

reimda commented Nov 4, 2021

@adubovikov The PR description is empty and there is no issue that describes what your use case or goals are with this PR. Could you open an issue and describe what you want from the integration of telegraf and HEP? I think it would help us reviewers to have a better idea of what you are trying to do.

@lmangani
Copy link

lmangani commented Nov 4, 2021

@reimda very good points being made, but HEP is an encapsulation protocol and as such, it can be transported by any other protocol (TCP/UDP/SCPT/Queuing protocols/etc) and as such as thought it would be best positioned as a decoder in the pipeline. Having it as an input would force it to carry support for protocols Telegraf already offers a comfortable input for.

EDIT: I will provide an issue with the usecase description as requested

@srebhan
Copy link
Member

srebhan commented Nov 5, 2021

Regarding the embedded JSON, XML, etc, it might be an option to keep the encapsulated part in a field and then chain another parser, only parsing this field... What do you think?

@reimda
Copy link
Contributor

reimda commented Nov 5, 2021

Keep in mind that inputs already can support multiple protocols. This has been done typically by having the plugin accept a url that specifies the protocol. See socket_listener's service_address setting for a listening input example or mqtt_consumer's servers setting for a connecting client example.

We could have HEP be a parser in telegraf, but it sounds like it would need to own another parser to do the job of parsing the encapsulated data (json or other format). I understand that HEP is an encapsulation protocol and it may feel natural to do it this way. Telegraf has never needed to have a nested parser like this before. There are other encapsulated use cases that telegraf handles with the parser processor

I see a few options here. To describe them I'll write example configuration for listening on a UDP port for HEP formatted data.

  1. HEP is a parser plugin that owns another parser
[[inputs.socket_listener]]
  service_address = "udp://:9060"
  data_format = "hep"
  [[inputs.socket_listener.hep]]
    data_format = "json_v2"
    [[inputs.socket_listener.hep.json_v2]]
      measurement_name = "json_data_from_hep"
      measurement_name_path = "gjson/path"
      ...

This would introduce a new concept of nested parsers to telegraf users. Nesting tables in TOML is repetetive and tends to confuse users. It could also be a little ugly to implement because telegraf parser settings are implemented as input level settings.

  1. HEP is an input plugin that has a parser
[[inputs.hep]]
  service_address = "udp://:9060"
  data_format = "json_v2"
  [[inputs.hep.json_v2]]
    measurement_name = "json_data_from_hep"
    measurement_name_path = "gjson/path"
    ...

This requires hep to handle its own transport protocols but the configuration is the most simple and feels the most familiar to me.

  1. HEP is a parser plugin that returns its encapsulated data as a field which can be decoded by a parser processor
[[inputs.socket_listener]]
  service_address = "udp://:9060"
  data_format = "hep" # produces a field called hep_payload
[[processor.parser]]
  parse_fields = ["hep_payload"]
  data_format = "json_v2"
  [[inputs.socket_listener.hep.json_v2]]
    measurement_name = "json_data_from_hep"
    measurement_name_path = "gjson/path"

This feels relatively familiar to a telegraf user.

Thanks @lmangani for working on an issue to describe the use case. Having that should make it easier to decide which way to go.

@srebhan
Copy link
Member

srebhan commented Nov 8, 2021

Thanks for the very nice visualization @reimda. I agree that option 2 looks the easiest from the user perspective. However, from a maintainers perspective it is the worst as we would now fold all possible transports (e.g. http_listener,
socket_listener, ...) into one plugin. Already now socket_listener as a complex piece of software.
Even more severe, we see quite some work on the TLS part for http, so we risk a duplication of those changes...

So I see two options from implementation side:

  1. Continue with reimda's option 2 and abstract the transport similar to what we do in feat: Unify graphics SMI implementations #9815. This would mean the new input plugin can easily configure the transportation and expose the config options to the user. This is nice in terms of easing work for plugins that also want to implement such encapsulation protocols.
  2. Decide for option 3 or a variation thereof. Contrary to option 1, this can be handled by the current structure. You can think of parsing the embedded JSON such that you simply keep every field instead of not parsing, but that would limit the parser to embedded JSON. It's up to you @adubovikov, @lmangani.

For reimda's option 1 we would need to restructure the parsers (maybe starting with something similar to #8791) first in order to allow them to specify the TOML options in the parser itself...

What do you guys think?

@srebhan
Copy link
Member

srebhan commented Feb 4, 2022

@adubovikov any news on this PR?

@srebhan
Copy link
Member

srebhan commented Jun 22, 2022

Is there still interest in this PR?

@srebhan srebhan added the waiting for response waiting for response from contributor label Jun 22, 2022
@telegraf-tiger
Copy link
Contributor

telegraf-tiger bot commented Jul 7, 2022

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

@telegraf-tiger telegraf-tiger bot closed this Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix pr to fix corresponding bug new plugin plugin/parser 1. Request for new parser plugins 2. Issues/PRs that are related to parser plugins waiting for response waiting for response from contributor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants