
MQTT input + JSON data format - trailing NULL char in payload causes parse error #7536

Closed
mateuszz-sudo opened this issue May 18, 2020 · 9 comments
Labels
area/json json and json_v2 parser/serialiser related area/mqtt bug unexpected problem or unintended behavior

Comments

@mateuszz-sudo

Hi everyone,

I’m working on a project where I need to collect data into influxdb using the mqtt input plugin in telegraf, but I’m facing strange behaviour of the mqtt input. I’m sending data on a specific topic as a JSON string (i.e. {"value":15}), but the MQTT producer I’m using adds a null ('\x00') char at the end of the message. It looks like this in the preview from the MQTTBox client:
[screenshot: MQTTBox preview showing both messages]

The first message was sent from the MQTT producer and has a length of 41 bytes, while the second one I sent from a regular MQTT client - MQTTBox; its length is 40 bytes, as the null character at the end is missing. The first message causes an error in telegraf, while the second is correctly written to the database:

2020-05-16T14:03:14Z E! [inputs.mqtt_consumer] Error in plugin: invalid character '\x00' after top-level value
2020-05-16T14:03:15Z E! [inputs.mqtt_consumer] Error in plugin: invalid character '\x00' after top-level value

My current telegraf config file looks like this (I have to use multiple MQTT inputs writing to multiple Influx databases, but that is not the problem):

[[outputs.influxdb]]
   urls = ["http://localhost:8086"]
   database = "telegraf_metrics_01"
   retention_policy = ""
   write_consistency = "any"
   timeout = "37s"
   tagexclude = ["destinationdb"]
   [outputs.influxdb.tagpass]
      destinationdb = ["db01"]

[[inputs.mqtt_consumer]]
   servers = ["tcp://localhost:1883"]
   qos = 0
   topics = [
      "test/maszyna1/#"
   ]
   data_format = "json"

   [inputs.mqtt_consumer.tags]
   destinationdb = "db01"

Do you have any idea how to configure telegraf to (maybe) ignore those null characters, or cut this one byte using some processor plugin? I read a bit about the processors.strings.trim_right plugin, but as I tested, it only works when the input data is already valid, so it won't be helpful in this case.

It’s also important that when I’m using the “value” data_format of the mqtt input instead of “json”, telegraf works fine with this MQTT producer, even though the null character also appears at the end of the simple value.
So maybe the plugin (or even the parser) should ignore those characters in the payload (i.e. whitespace, nulls) by default? Anyway - I have to use the json format to pass multiple parameters to influx at once.

I will appreciate any help.
Have a good day!

@mateuszz-sudo
Author

I think I found the reason the JSON parser behaves as above. I've compared parts of the source code, starting with parsers/value/parser.go:

Part of telegraf/plugins/parsers/value/parser.go file (from line 20):

(...)
func (v *ValueParser) Parse(buf []byte) ([]telegraf.Metric, error) {
	vStr := string(bytes.TrimSpace(bytes.Trim(buf, "\x00")))

(...)

Here the NUL ('\x00') characters are trimmed, so everything works fine for that data format.

On the other hand, in the case of json/parser.go, this part of the function looks different:

Part of telegraf/plugins/parsers/json/parser.go file (from line 196):

buf = bytes.TrimSpace(buf)
buf = bytes.TrimPrefix(buf, utf8BOM)
if len(buf) == 0 {
	return make([]telegraf.Metric, 0), nil
}

In this case nulls are never trimmed; something like this would probably fix the problem:

buf = bytes.TrimSpace(buf)
buf = bytes.TrimPrefix(buf, utf8BOM)
buf = bytes.TrimSuffix(buf, []byte("\x00")) // or just: buf = bytes.Trim(buf, "\x00")
if len(buf) == 0 {
	return make([]telegraf.Metric, 0), nil
}

@danielnelson
Contributor

Adding a null byte to the end of the JSON document seems like an error: the document is no longer valid JSON, since null bytes are not allowed as whitespace: https://tools.ietf.org/html/rfc7159#section-2

I'm not familiar with this MQTT producer, and a search brings up a few links; can you add a link to the product you are using?

@danielnelson danielnelson added area/json json and json_v2 parser/serialiser related area/mqtt bug unexpected problem or unintended behavior labels May 18, 2020
@mateuszz-sudo
Author

I'm using an industrial MQTT producer - OI Gateway, made by Wonderware (now AVEVA). Below you'll find a video explaining how it works:
https://youtu.be/lsAVFvL2nWM?t=71
From what I found, this null character is added somewhere in the communication driver, but unfortunately I have no influence over it.

@danielnelson
Contributor

Thanks for the link - can you report the issue to them? I don't really want to work around other vendors' bugs in Telegraf, since it makes things more complex on our end.

@mateuszz-sudo
Author

mateuszz-sudo commented May 19, 2020

I fully understand that making workarounds could someday lead to problems, but I believe that in this case it's not only a workaround, but more like unification.

I was curious why the "value" data format has this trimming at the end, and I found a pull request from 2016 associated with exactly the same bug as we have here - it was request #2049:

Some producers (such as the paho embedded c mqtt client) add a null
character "\x00" to the end of a message. The Value parser would fail on
any message from such a producer.

So it's not just an issue with the json data format, but with any format where the MQTT payload needs to be parsed. I partially confirmed this: I tried to send data via the influx line protocol, but also got an error:

[inputs.mqtt_consumer] Error in plugin: metric parse error: expected timestamp at 1:77: weather,location=us-midwest,season=summer temperature=82 1465839830100400200\x00"

As we see above, the NUL character again appears at the end of the payload message, just as with the plain value data format - but the value data format was fixed a long time ago :)

In my opinion it's worth considering adding null-character trimming not only to the json parser, but also to the rest of the parsers (wherever an issue like this could happen), except value, where it's implemented already.

@danielnelson
Contributor

That's an interesting issue, I learned something new about Telegraf today.

I tested the master branch of https://github.com/eclipse/paho.mqtt.embedded-c, and I do not see this issue with the trailing NUL. I was also unable to find any reported issues where it may have been fixed.

Accepting the PR would be an issue for a few reasons. We want to be able to support non-text formats in the parsers, where we cannot safely remove trailing NUL characters. We would also like to add support for compression to inputs. Given that this would also need to be added to all parsers, I'm inclined not to add a workaround for these bad payloads.

I can give you a workaround, though. In the mqtt_consumer, parse the payload as a string field using the value parser. Then use the strings processor to remove the rightmost byte. Finally, use the parser processor to parse the remaining string as JSON.
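A minimal sketch of that three-step chain; the trim_right/cutset options and the order values are my assumptions about the strings processor, not something verified here (TOML needs \u0000 rather than \x00 for the NUL escape):

```toml
[[inputs.mqtt_consumer]]
   servers = ["tcp://localhost:1883"]
   topics = ["test/maszyna1/parametry"]
   # Step 1: read the whole payload into a single string field named "value".
   data_format = "value"
   data_type = "string"

[[processors.strings]]
   order = 1
   # Step 2: strip the trailing NUL byte(s) from the string.
   [[processors.strings.trim_right]]
      field = "value"
      cutset = "\u0000"

[[processors.parser]]
   order = 2
   # Step 3: parse what is left of the string as JSON.
   parse_fields = ["value"]
   merge = "override"
   data_format = "json"
```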

@mateuszz-sudo
Author

mateuszz-sudo commented May 21, 2020

Thank you for explanation @danielnelson, I appreciate it.
And thanks a lot for the workaround tip - I wasn't sure it could be worked around at all; it seems I have a couple of tests to do :)
Have a good day!

Fun fact - using the string parser as the first parser (in mqtt_consumer) removes the null char by default, and THEN the json parser has no problem parsing the data as JSON...

For everyone who would face this problem in the future, my config file looks like this at the moment:

[[outputs.influxdb]]
   urls = ["http://localhost:8086"]
   database = "telegraf_metrics_01"
   retention_policy = ""
   write_consistency = "any"
   timeout = "30s"
   tagexclude = ["destinationdb"]
   [outputs.influxdb.tagpass]
      destinationdb = ["db01"]

[[inputs.mqtt_consumer]]
   servers = ["tcp://localhost:1883"]
   qos = 0
   topics = [
      "test/maszyna1/parametry"
   ]
   data_format = "value"
   data_type = "string"

   [inputs.mqtt_consumer.tags]
      destinationdb = "db01"

[[processors.parser]]
   parse_fields = ["value"]
   merge = "override"
   data_format = "json"

martins1991 added a commit to martins1991/balena-sense that referenced this issue Feb 19, 2021
@NickJLange

NickJLange commented Jan 8, 2023

As of Jan 9th 2023, this error is still out there; I've implemented the workaround. I agree with @danielnelson that it is hard to pin down the source of the bad data (i.e. is this a bad sensor?), as there is no context around the parsing error. Good context would be the topic that carries the bogus data.

What I would like to see fixed is that the entire plugin chokes up on a single bad message - it should fail on the bad value, then recover gracefully. Shall I put that in a feature request?

@moracca

moracca commented Mar 28, 2023

I have a somewhat similar issue where I am collecting MQTT data; however, in my case the data_format is binary, so I need to define the message schema. In many cases the values being sent are strings of a fixed length, with 00 padding filling out the remaining space.

The binary config looks like:

  [[inputs.mqtt_consumer.binary]]
    metric_name = "mgmt_response_ctrl"
    entries = [
      { name = "message_type", type = "uint8", assignment = "field" },
      { name = "message_sub_type", type = "uint8", assignment = "field" },
      { name = "length", type = "uint16", assignment = "field" },
      { name = "controller_name", type = "string", bits = 512 },
      { name = "api_version", type = "string", bits = 128 },
      { name = "sw_version", type = "string", bits = 128 }
    ]
    [inputs.mqtt_consumer.binary.filter]
      selection = [
        { offset = 0, bits = 8, match = "0x02" },
        { offset = 8, bits = 8, match = "0x01" }
      ]

Where the MQTT messages are defined as
[image: MQTT message structure definition]

and a sample of data looks like

0201 0060 4c44 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 332e 3000 0000 0000 0000 0000 0000 0000 362e 312e 312e 3000 0000 0000 0000 0000 

In this example "controller_name" is defined as a 64 byte string value, but the content doesn't take up the full 64 bytes and is padded with zeros for the missing bytes. This causes an error like
[outputs.postgresql] write error (permanent, dropping sub-batch): ERROR: invalid byte sequence for encoding "UTF8": 0x00 (SQLSTATE 22021)

I am wondering if I too can use some sort of workaround, but converting to text first doesn't quite work, as the null bytes are in the middle of the message. I was hoping I could combine the binary parser's 'bits = 512' and 'terminator = "null"' to define the upper limit of the field but stop at the first null byte and ignore the rest; that doesn't appear possible, though (adding parser failed: config 1 invalid: entry "controller_name" (3): cannot use 'bits' and 'null' terminator together for "controller_name")

Any suggestions here? Or am I asking in the wrong place?

Thanks in advance!

EDIT: resolved my issue using the regex processor to remove \u0000 characters from my fields
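For anyone landing here later, a sketch of that regex-processor approach; the key/pattern/replacement table layout follows the Telegraf regex processor docs, and the field names are taken from the binary schema above (adjust to your own fields):

```toml
[[processors.regex]]
   # Strip the NUL padding from each fixed-width string field before output.
   [[processors.regex.fields]]
      key = "controller_name"
      pattern = "\u0000+"
      replacement = ""

   [[processors.regex.fields]]
      key = "api_version"
      pattern = "\u0000+"
      replacement = ""

   [[processors.regex.fields]]
      key = "sw_version"
      pattern = "\u0000+"
      replacement = ""
```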
