Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure parser returns hash #4474

Merged
merged 4 commits into from
Apr 30, 2024
Merged

Make sure parser returns hash #4474

merged 4 commits into from
Apr 30, 2024

Conversation

daipom
Copy link
Contributor

@daipom daipom commented Apr 26, 2024

Which issue(s) this PR fixes:

What this PR does / why we need it:

  • Make sure parser_json and parser_msgpack return Hash.
  • Make parser_json and parser_msgpack accept only Hash or Array of Hash

It is wrong for Parser to return a record that is not Hash.
Subsequent processing may result in errors.

parser_json and parser_msgpack could return non-Hash objects, so they have been fixed in this PR.
In addition, in_http was designed based on this wrong behavior, so it has been fixed as well in this PR.

However, we have a remaining problem with filter_parser.
filter_parser could return Array based on this wrong behavior.
This PR disables it, so it can not return multiple parsed results anymore.
Even though it was wrong to begin with, it's possible that this change in specifications would be unacceptable.
We need to consider this change carefully.
I explain some examples in detail below.

Docs Changes:
TODO

Release Note:

  • Make sure parser_json and parser_msgpack return Hash.
  • Make parser_json and parser_msgpack accept only Hash or Array of Hash

Example: parser_json

Case: No subsequent processing (without convert_values)

  • Before: Array => Single record
  • After: Array => Multiple records

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
  </parse>
</source>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result BEFORE this fix

2023-03-21 23:30:30.623063804 +0900 test.tcp: [{"k":"v"},{"k2":"v2"}]

Result AFTER this fix

2023-03-21 23:29:20.147728222 +0900 test.tcp: {"k":"v"}
2023-03-21 23:29:20.147754433 +0900 test.tcp: {"k2":"v2"}

Case: No subsequent processing (with convert_values)

  • Before: Array => NoMethodError
  • After: Array => Multiple records

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
    null_empty_string
  </parse>
</source>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result BEFORE this fix

2024-04-26 18:48:43 +0900 [error]: #0 unexpected error on reading data host="127.0.0.1" port=33268 error_class=NoMethodError error="undefined method `each_key' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array"

Result AFTER this fix

2024-04-26 18:51:02.763961229 +0900 test.tcp: {"k":"v"}
2024-04-26 18:51:02.763971914 +0900 test.tcp: {"k2":"v2"}

Case: With subsequent filter processing

  • Before: Array => NoMethodError
  • After: Array => Multiple records

Config

<source>
  @type tcp
  tag test.tcp
  <parse>
    @type json
  </parse>
</source>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Operation

$ netcat 0.0.0.0 5170
[{"k":"v"}, {"k2":"v2"}]

Result BEFORE this fix

2023-03-21 23:30:51 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.tcp" time=2023-03-21 23:30:51.040692881 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]

Result AFTER this fix

2023-03-21 23:29:55.783183244 +0900 test.tcp: {"k":"v","class":"Hash"}
2023-03-21 23:29:55.783215076 +0900 test.tcp: {"k2":"v2","class":"Hash"}

Example: filter_parser

Case: No subsequent processing (without any option)

  • Before: Array => Single record
  • After: Array => Imcomleted single record that has only the first value !!Remaining Problem!!

Config

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<match test.**>
  @type stdout
</match>

Result BEFORE this fix

2024-04-26 18:53:30.029602646 +0900 test.array: [{"k":"v"},{"k2":"v2"}]

Result AFTER this fix

2024-04-26 18:55:28.010836436 +0900 test.array: {"k":"v"}

Case: No subsequent processing (with some options such as reserve_data)

  • Before: Array => ParserError and return the original record
    • parse failed no implicit conversion of Array into Hash
  • After: Array => Imcomleted record that has only the first value !!Remaining Problem!!

Config

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  reserve_data
  <parse>
    @type json
  </parse>
</filter>

<match test.**>
  @type stdout
</match>

Result BEFORE this fix

2024-04-26 18:58:25 +0900 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="parse failed no implicit conversion of Array into Hash" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_parser.rb:110:in `rescue in filter_with_time'" tag="test.array" time=2024-04-26 18:58:25.027466823 +0900 record={"message"=>"[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
2024-04-26 18:58:25.027466823 +0900 test.array: {"message":"[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}

Result AFTER this fix

2024-04-26 18:58:08.091274876 +0900 test.array: {"message":"[{\"k\":\"v\"}, {\"k2\":\"v2\"}]","k":"v"}

Case: With subsequent filter processing

  • Before: Array => dump an error event: error_class=NoMethodError
  • After: Array => Imcomleted record that has only the first value !!Remaining Problem!!

Config

<source>
  @type sample
  tag test.array
  sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
</source>

<filter test.**>
  @type parser
  key_name message
  <parse>
    @type json
  </parse>
</filter>

<filter test.**>
  @type record_transformer
  enable_ruby true
  <record>
    class ${record.class}
  </record>
</filter>

<match test.**>
  @type stdout
</match>

Result BEFORE this fix

2024-04-26 19:06:23 +0900 [warn]: #0 dump an error event: error_class=NoMethodError error="undefined method `merge!' for [{\"k\"=>\"v\"}, {\"k2\"=>\"v2\"}]:Array" location="/home/daipom/work/fluentd/fluentd/lib/fluent/plugin/filter_record_transformer.rb:135:in `reform'" tag="test.array" time=2024-04-26 19:06:23.032939419 +0900 record=[{"k"=>"v"}, {"k2"=>"v2"}]

Result AFTER this fix

2024-04-26 19:07:25.096004693 +0900 test.array: {"k":"v","class":"Hash"}

@daipom daipom force-pushed the make-sure-parser-returns-hash branch 3 times, most recently from 900854d to 85e20ad Compare April 30, 2024 02:05
daipom added 4 commits April 30, 2024 11:05
It is wrong for Parser to return a record that is not Hash.
Subsequent processing may result in errors.

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
It is wrong for Parser to return a record that is not Hash.
Subsequent processing may result in errors.

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
in_http didn't support yield of Parser.
The specification assumed that Parser could return Array.
However, this is wrong. Parser shouldn't return Array.

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
Config to reproduce:

    <source>
      @type sample
      tag test.array
      sample {"message": "[{\"k\":\"v\"}, {\"k2\":\"v2\"}]"}
    </source>

    <filter test.**>
      @type parser
      key_name message
      <parse>
        @type json
      </parse>
    </filter>

    <match test.**>
      @type stdout
    </match>

Result:

    2023-03-21 23:24:52.004470792 +0900 test.array: {"k":"v"}
    2023-03-21 23:24:52.004470792 +0900 test.array: {"k":"v"}
    ...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
@daipom daipom force-pushed the make-sure-parser-returns-hash branch from 85e20ad to 6cace97 Compare April 30, 2024 02:06
Copy link
Member

@ashie ashie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ashie ashie added this to the v1.17.0 milestone Apr 30, 2024
@ashie ashie merged commit 35e2210 into master Apr 30, 2024
15 of 16 checks passed
@ashie ashie deleted the make-sure-parser-returns-hash branch April 30, 2024 03:02
@daipom
Copy link
Contributor Author

daipom commented Apr 30, 2024

Thanks! I have added Release Note to #4474 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JSON parser plugin can sometimes output records as strings rather than Hash
2 participants