
Native encode add null value to metrics data type value #18558

Closed
ee07b415 opened this issue Sep 14, 2023 · 9 comments · Fixed by #18650
Labels
type: bug A code related bug.

Comments

@ee07b415

ee07b415 commented Sep 14, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I'm testing the native_json codec for encoding at a sink and decoding at a source, and the Vector service always panics (see the Debug Output below).

When I print the native_json-encoded data, I find:

{
                "metric":{
                    "name":"source_lag_time_seconds",
                    "namespace":"vector",
                    "tags":{
                        "component_id":"internal_metric_source",
                        "component_kind":"source",
                        "component_name":"internal_metric_source",
                        "component_type":"internal_metrics",
                        "host":""},
                    "timestamp":"2023-09-14T01:29:16.224012Z",
                    "kind":"absolute",
                    "aggregated_histogram": {
                        "buckets":[
                            {"upper_limit":0.015625,"count":0},
                            {"upper_limit":0.03125,"count":0},
                            {"upper_limit":0.0625,"count":0},
                            {"upper_limit":0.125,"count":0},
                            {"upper_limit":0.25,"count":0},
                            {"upper_limit":0.5,"count":0},
                            {"upper_limit":1.0,"count":0},
                            {"upper_limit":2.0,"count":0},
                            {"upper_limit":4.0,"count":0},
                            {"upper_limit":8.0,"count":0},
                            {"upper_limit":16.0,"count":0},
                            {"upper_limit":32.0,"count":0},
                            {"upper_limit":64.0,"count":0},
                            {"upper_limit":128.0,"count":0},
                            {"upper_limit":256.0,"count":0},
                            {"upper_limit":512.0,"count":0},
                            {"upper_limit":1024.0,"count":0},
                            {"upper_limit":2048.0,"count":0},
                            {"upper_limit":4096.0,"count":0},
                            {"upper_limit":null,"count":0}],
                        "count":0,
                        "sum":0.0
                    }
                }
            }

There is a null value in the bucket's upper_limit, and changing it to a number resolves the panic. So the question is why the encoder emits null where the decoder expects a number.

For reference, here is the Prometheus exporter output for the same metric:

# HELP vector_source_lag_time_seconds source_lag_time_seconds
# TYPE vector_source_lag_time_seconds histogram
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="0.015625"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="0.03125"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="0.0625"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="0.125"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="0.25"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="0.5"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="1"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="2"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="4"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="8"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="16"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="32"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="64"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="128"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="256"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="512"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="1024"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="2048"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="4096"} 123 1694657914639
vector_source_lag_time_seconds_bucket{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net",le="+Inf"} 123 1694657914639
vector_source_lag_time_seconds_sum{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net"} 0.026000000000000016 1694657914639
vector_source_lag_time_seconds_count{component_id="internal_metric_source",component_kind="source",component_name="internal_metric_source",component_type="internal_metrics",host=".attlocal.net"} 123 1694657914639
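One thing the Prometheus output hints at: the final histogram bucket has le="+Inf", and JSON numbers cannot represent infinity, which is plausibly where the null comes from. A minimal illustration, using Python's json module as a stand-in for the actual Rust serializer:

```python
import json
import math

# The last Prometheus histogram bucket has le="+Inf", but the JSON number
# grammar has no representation for infinity. Python's json module either
# emits the non-standard token `Infinity` (the default) or raises when
# asked for strictly valid JSON; other serializers fall back to null.
print(json.dumps(math.inf))  # Infinity  -- not valid JSON
try:
    json.dumps(math.inf, allow_nan=False)
except ValueError as err:
    print("no valid JSON representation:", err)
```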

Configuration

sources:
  internal_metric_source:
    type: internal_metrics
    scrape_interval_secs: 15
sinks:
  console:
    type: console
    inputs: ["internal_metric_source"]
    encoding:
      codec: native_json

Version

vector 0.33.0

Debug Output

thread 'decoding::format::native_json::test::parses_metric' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: null, expected f64", line: 0, column: 0)', lib/codecs/src/decoding/format/native_json.rs:215:74
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: core::result::Result<T,E>::unwrap
             at /private/tmp/rust-20230901-16881-ycyewx/rustc-1.72.0-src/library/core/src/result.rs:1076:23
   4: codecs::decoding::format::native_json::test::parses_metric
             at ./src/decoding/format/native_json.rs:215:26
   5: codecs::decoding::format::native_json::test::parses_metric::{{closure}}
             at ./src/decoding/format/native_json.rs:143:23
   6: core::ops::function::FnOnce::call_once
             at /private/tmp/rust-20230901-16881-ycyewx/rustc-1.72.0-src/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test decoding::format::native_json::test::parses_metric ... FAILED

Example Data

No response

Additional Context

No response

References

No response

@ee07b415 ee07b415 added the type: bug A code related bug. label Sep 14, 2023
@jszwedko
Member

Hi @ee07b415 !

Thanks for filing this. That stack trace doesn't seem to be pointing to code in v0.33.0. Did you happen to compile a build locally with additional code? If so, could you share the modifications?

@ee07b415
Author

Hi @jszwedko, yes, I added a unit test for this:

#[test]
    fn parses_metric() {
        let config = NativeJsonDeserializerConfig::default();
        let deserializer = config.build();

        let json = json!(
            {
                "metric": {
                    "aggregated_summary":{
                        "count":2502,
                        "quantiles":[
                            {"quantile":0.0,"value":0.000031142},
                            {"quantile":0.25,"value":0.000038841},
                            {"quantile":0.5,"value":0.000045922},
                            {"quantile":0.75,"value":0.000050285},
                            {"quantile":1.0,"value":0.00008474}],
                        "sum":0.114286825
                    },
                    "kind":"absolute",
                    "name":"go_gc_duration_seconds",
                    "path":"/",
                    "source_type":"http_server",
                    "tags":{"nodename":"ip-", "service_name":"nodeExporter", "source_type":"nodeExporter"},
                    "timestamp":"2023-09-12T19:08:20.427244440Z"
                }
            }
        );

        let json2 = json!(
            {
                "metric":{
                    "name":"source_lag_time_seconds",
                    "namespace":"vector",
                    "tags":{
                        "component_id":"internal_metric_source",
                        "component_kind":"source",
                        "component_name":"internal_metric_source",
                        "component_type":"internal_metrics",
                        "host":"Yutians-MBP.attlocal.net"},
                    "timestamp":"2023-09-14T01:29:16.224012Z",
                    "kind":"absolute",
                    "aggregated_histogram": {
                        "buckets":[
                            {"upper_limit":0.015625,"count":0},
                            {"upper_limit":0.03125,"count":0},
                            {"upper_limit":0.0625,"count":0},
                            {"upper_limit":0.125,"count":0},
                            {"upper_limit":0.25,"count":0},
                            {"upper_limit":0.5,"count":0},
                            {"upper_limit":1.0,"count":0},
                            {"upper_limit":2.0,"count":0},
                            {"upper_limit":4.0,"count":0},
                            {"upper_limit":8.0,"count":0},
                            {"upper_limit":16.0,"count":0},
                            {"upper_limit":32.0,"count":0},
                            {"upper_limit":64.0,"count":0},
                            {"upper_limit":128.0,"count":0},
                            {"upper_limit":256.0,"count":0},
                            {"upper_limit":512.0,"count":0},
                            {"upper_limit":1024.0,"count":0},
                            {"upper_limit":2048.0,"count":0},
                            {"upper_limit":4096.0,"count":0},
                            {"upper_limit":null,"count":0}],
                        "count":0,
                        "sum":0.0
                    }
                }
            }
        );

        let json_array = json!([json, json2]);
        let input = Bytes::from(serde_json::to_vec(&json_array).unwrap());

        let mut events = deserializer.parse(input, LogNamespace::Legacy).unwrap();
        let event = events.remove(0);
        let metric = event.as_metric();
        assert_eq!(metric.kind(), MetricKind::Absolute);

        // let event2 = events.remove(0);
        // let metric2 = event2.as_metric();
        // assert_eq!(metric2.kind(), MetricKind::Absolute);
    }

@jszwedko
Member

Perfect, thanks @ee07b415 ! This'll be helpful to fix the issue.

@jszwedko
Member

According to #18557, this affects the native codec as well as native_json.

@pront
Member

pront commented Sep 14, 2023

Hi @ee07b415, it is expected that native JSON decoding will fail if a null value is provided instead of a number because we expect the upper_limit to be defined here.

I ran Vector with the configuration you provided above and it did not panic. Does it panic in your environment?

@ee07b415
Author

Hi @pront, sorry for the confusion. The configuration above only generates the console output of the data; the panic happens in the Vector instance running the http_server source. For example, you can start another Vector instance with this config:

sources:
  http_server:
    type: http_server
    address: 0.0.0.0:9555
    decoding: 
      codec: native_json
    method: POST

and send the data from the first Vector instance to that source:

sources:
  internal_metric_source:
    type: internal_metrics
    scrape_interval_secs: 15
sinks:
  vector_sink:
    type: http
    inputs: ["internal_metric_source"]
    uri: 0.0.0.0:9555
    method: post
    batch:
      max_bytes: 2500000
      timeout_secs: 15
    healthcheck: true
    compression: gzip
    encoding:
      codec: native_json
      timestamp_format: rfc3339

@ee07b415
Author

ee07b415 commented Sep 14, 2023

Oh, one more thing I forgot to mention above, @pront and @jszwedko: I bypass all the as_mut_log calls in the http_server source's enrich_events, since otherwise you hit #17056 before reaching this invalid-type issue. An example change:

for event in events.iter_mut() {
    if let Event::Log(_log) = event {
        let log = event.as_mut_log();

        // add request_path to each event
        self.log_namespace.insert_source_metadata(
            SimpleHttpConfig::NAME,
            log,
            self.path_key.path.as_ref().map(LegacyKey::InsertIfEmpty),
            path!("path"),
            request_path.to_owned(),
        );

        // add each header to each event
        for header_name in &self.headers {
            let value = headers_config.get(header_name).map(HeaderValue::as_bytes);

            self.log_namespace.insert_source_metadata(
                SimpleHttpConfig::NAME,
                log,
                Some(LegacyKey::InsertIfEmpty(path!(header_name))),
                path!("headers", header_name),
                Value::from(value.map(Bytes::copy_from_slice)),
            );
        }
    }
}

This is why I filed an issue separate from #17056: I want to test whether native_json will work properly once #17056 is fixed.

@jszwedko
Member

jszwedko commented Sep 15, 2023

I was able to reproduce this with the following configs and v0.32.0:

in.toml

[sources.source0]
type = "internal_metrics"

[sinks.sink0]
type = "http"
inputs = ["source0"]
uri = "http://127.0.0.1:8080"
encoding.codec = "native_json"
framing.method = "newline_delimited"

out.toml

[sources.source0]
type = "http_server"
address = "0.0.0.0:8080"
decoding.codec = "native_json"
framing.method = "newline_delimited"

[sinks.sink0]
inputs = ["source0"]
print_interval_secs = 1
type = "blackhole"

The instance running out.toml reports the errors:

2023-09-15T07:48:22.150634Z ERROR source{component_kind="source" component_id=source0 component_type=http_server component_name=source0}: vector::internal_events::codecs: Internal log [Failed deserializing frame.] has been suppressed 5 times.
2023-09-15T07:48:22.150649Z ERROR source{component_kind="source" component_id=source0 component_type=http_server component_name=source0}: vector::internal_events::codecs: Failed deserializing frame. error=invalid type: null, expected f64 error_type="parser_failed" stage="processing" internal_log_rate_limit=true
2023-09-15T07:48:22.150679Z  WARN source{component_kind="source" component_id=source0 component_type=http_server component_name=source0}: vector::internal_events::http: Internal log [Received bad request.] has been suppressed 5 times.
2023-09-15T07:48:22.150692Z  WARN source{component_kind="source" component_id=source0 component_type=http_server component_name=source0}: vector::internal_events::http: Received bad request. error=Failed decoding body: ParsingError(invalid type: null, expected f64) error_code=http_response_400 error_type="request_failed" error_stage="receiving" http_code=400 internal_log_rate_limit=true

@pront
Member

pront commented Sep 15, 2023

Thank you @jszwedko and @ee07b415. I haven't dived deeply into this yet, but I noticed we use f64::INFINITY here, which serializes to JSON's null. In other words, we end up encoding an aggregated histogram metric with the following bucket:

Bucket {
    upper_limit: null,
    count: 0,
},

This is an interesting discussion that explains the problem well.
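The round trip can be sketched in Python (the `encode_upper_limit`/`decode_upper_limit` helpers are hypothetical, not Vector's actual code): the encoder falls back to null for the non-finite +Inf bound, so one possible fix is a tolerant decoder that maps null back to infinity:

```python
import json
import math

def encode_upper_limit(x: float):
    # JSON has no literal for infinity; mimic the encoder's fallback
    # by emitting null for the +Inf bucket bound.
    return None if math.isinf(x) else x

def decode_upper_limit(v) -> float:
    # Tolerant decode: accept null as +Inf so round trips succeed.
    return math.inf if v is None else float(v)

bucket = {"upper_limit": encode_upper_limit(math.inf), "count": 0}
encoded = json.dumps(bucket)          # {"upper_limit": null, "count": 0}
decoded = json.loads(encoded)
assert decode_upper_limit(decoded["upper_limit"]) == math.inf
```

This is only an illustration of the shape of the problem; whether the eventual fix changes the encoder or the decoder is up to the maintainers.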
