# Investigate datadog_agent -> remap -> datadog_logs (#8512)
This issue is a record of and parent to the investigation of vector's performance when configured like so:

Inbound `http_gen` is configured with:

Much like in #8263 our method will be to measure inbound throughput and outbound throughput primarily. The outbound target is `http_blackhole`, also from the lading project, and all three components are executed as follows until otherwise stated:

---
I might suggest the remap config be adjusted slightly. I think users commonly want to parse and merge JSON objects, so that would be more representative.
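A plausible sketch of the kind of parse-and-merge remap the comment has in mind -- an assumption on my part, not the exact config suggested:

```toml
[transforms.parse_message]
type = "remap"
inputs = ["datadog_agent"]
source = '''
  # Parse the JSON string held in `message` and merge its fields
  # into the event itself.
  . = merge(., object!(parse_json!(.message)))
'''
```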
At the close of #8263 we observed that vector in a similar configuration -- dropping the remap step -- is able to accomplish 210Mb/s of throughput. Building vector with the following string:

we find that:
The remap transform does not appear to instrument …
@jszwedko your proposed configuration change results in:
It's worth noting, maybe, that some of the …
So...
Err... however, in the case that …
Yeah, it does seem like …
Here's the structure that's being arbitrarily generated and fed into vector as a Datadog logs JSON blob. The important part is the `message` field: it's always a string. Sometimes the string happens to be valid JSON and sometimes it's not, but it's never an integer, unless remap is doing auto-coercion when the string is number-y.
This might be useful for debugging …
I still think, for the purposes of this soak test, that having it always send JSON objects of a similar shape and size would be closer to the common user case.
@jszwedko For what it's worth, I patterned the input off an enterprise customer's setup: part unstructured, part structured input and all. This ticket is also not the soak test, necessarily. Considering the observed regression introduced by the small example of remap, I'm satisfied with the configuration outlined in the original comment for the purpose of providing a useful optimization target, absent any compelling use case from a user that needs immediate work. If this ticket makes fast progress I'd be glad to entertain a more elaborate remap transform, focus on a different input pattern, etc.
Continuing on from this comment, a perf run indicates that memmove, malloc, and free are the primary cost centers of the run -- 8.96%, 6.59%, and 5.44% respectively -- which coincides well with our understanding that events are cloned into remap for processing. The CPU timeline is relatively full; see screenshot.

Flamegraph analysis suggests that we're spending a non-trivial portion of time buffering events by partially serializing them -- known from #8263 -- but 21% of runtime in … We know from #8263 that the cost of the source is not a limit up to higher throughput than is observed presently, which suggests we should focus on remap primarily.
The stack through vector into VRL is somewhat crowded, making it hard to find optimization targets. I wrote a more restricted program that mimics how the transform itself behaves:

```rust
use bytes::Bytes;
use lookup::LookupBuf;
use shared::TimeZone;
use std::collections::BTreeMap;
use vrl::{Program, Runtime, Target, Value};

const JSON: &[u8] = r#"{"key": "value"}"#.as_bytes();
const PROGRAM: &str = r#". = parse_json!(.message)"#;

#[derive(Debug, Clone, Copy, Default)]
struct StaticTarget {}

impl Target for StaticTarget {
    #[inline]
    fn insert(&mut self, _path: &LookupBuf, _value: Value) -> Result<(), String> {
        // nothing, intentionally
        Ok(())
    }

    #[inline]
    fn get(&self, path: &LookupBuf) -> Result<Option<Value>, String> {
        if path.is_empty() {
            Ok(Some(Value::Object(BTreeMap::default())))
        } else {
            Ok(Some(Value::Bytes(Bytes::from_static(JSON))))
        }
    }

    #[inline]
    fn remove(&mut self, _path: &LookupBuf, _compact: bool) -> Result<Option<Value>, String> {
        unimplemented!()
    }
}

fn main() {
    let program: Program = vrl::compile(PROGRAM, &vrl_stdlib::all()).unwrap();
    // this loop simulates the call of `<Remap as FunctionTransform>::transform`
    let timezone = TimeZone::default();
    loop {
        let mut runtime = Runtime::default();
        let mut target = StaticTarget::default();
        let _ = runtime.resolve(&mut target, &program, &timezone).unwrap();
    }
}
```

The key difference, to start, is the …
I have updated the experimental program to include an `Event` and the `VrlTarget` conversion the transform performs:

```rust
use bytes::Bytes;
use shared::TimeZone;
use std::collections::BTreeMap;
use vector_core::event::{self, Event, EventMetadata, LogEvent, VrlTarget};
use vrl::{Program, Runtime};

const JSON: &[u8] = r#"{"key": "value"}"#.as_bytes();
const PROGRAM: &str = r#". = parse_json!(.message)"#;

fn main() {
    let program: Program = vrl::compile(PROGRAM, &vrl_stdlib::all()).unwrap();
    // this loop simulates the call of `<Remap as FunctionTransform>::transform`
    let timezone = TimeZone::default();
    let mut map: BTreeMap<String, event::Value> = BTreeMap::new();
    map.insert(
        "message".to_owned(),
        event::Value::Bytes(Bytes::from_static(JSON)),
    );
    let metadata = EventMetadata::default();
    let log_event = LogEvent::from_parts(map, metadata);
    let event = Event::Log(log_event);
    loop {
        let mut runtime = Runtime::default();
        let mut target: VrlTarget = event.clone().into();
        let _ = runtime.resolve(&mut target, &program, &timezone).unwrap();
    }
}
```

The main cost centers here appear to be … Backing out some, the remap transform itself is stateless but is a single choke point in our topology. It seems likely that we can run multiple copies of it in parallel.
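To make the parallelization idea concrete, here is a toy sketch -- emphatically not vector's topology code -- in which several workers share one input channel and each runs the stateless transform independently. Note that output ordering is not preserved:

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Mutex};

// Stand-in for the stateless remap transform; the real one would run a
// compiled VRL program against the event.
fn transform(event: String) -> String {
    event.to_uppercase()
}

#[tokio::main]
async fn main() {
    let (in_tx, in_rx) = mpsc::channel::<String>(128);
    let (out_tx, mut out_rx) = mpsc::channel::<String>(128);
    // Workers share one receiver; whichever worker is free takes the next event.
    let in_rx = Arc::new(Mutex::new(in_rx));

    for _ in 0..4 {
        let rx = Arc::clone(&in_rx);
        let tx = out_tx.clone();
        tokio::spawn(async move {
            loop {
                // Hold the lock only long enough to pull one event.
                let event = {
                    let mut guard = rx.lock().await;
                    guard.recv().await
                };
                match event {
                    Some(e) => {
                        if tx.send(transform(e)).await.is_err() {
                            break; // output side went away
                        }
                    }
                    None => break, // input channel closed
                }
            }
        });
    }
    drop(out_tx); // only the worker clones remain

    for i in 0..8 {
        in_tx.send(format!("event-{i}")).await.unwrap();
    }
    drop(in_tx); // lets the workers shut down

    // Results arrive in completion order, not submission order.
    while let Some(event) = out_rx.recv().await {
        println!("{event}");
    }
}
```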
Could that introduce risk of getting events passed out of order?
They would absolutely be passed out of order. I was under the impression we didn't guarantee ordering today, and we should not if it can be helped at all.
I don't think we've explicitly guaranteed it, but we do implicitly... for example, the `loki` sink currently depends on events arriving in order.
Gotcha, that makes sense. To be more explicit: not guaranteeing in-order processing is an important optimization opportunity. If we can process events out of order we can do things in parallel without coordination -- like I'm thinking for the remap transform -- or shunt slow-to-process events behind fast-to-process events, etc.
100%. The loki problem should be solved on their side soon, and I'm not aware of anything else we have that has ordering as a requirement -- I just wanted to make sure we didn't unintentionally break loki when we're also trying to polish and improve that sink this quarter.
It's a good call-out; I appreciate it. Do we have a ticket tracking this on our end? I think it means we have to preserve our accidental in-order processing and/or build a mechanism to re-order in the sink before we can make the transforms parallel. Or provide a knob to make them sequential, and I hate new knobs.
@blt a few tickets regarding loki ordering and performance...
and the epic: #8460 |
We do currently preserve message ordering within a given source-dependent context; for example, for the …

When we've previously discussed introducing transform parallelization, we've talked about doing it such that the transforms would be pushed up into the sources, so that they would execute concurrently using whatever unit of concurrency the source does. For example, this would let us run remap in parallel across TCP connections to the …
In-order out from the source is a-okay by me.
Yep. That RFC's end state is the ideal, especially because we could elide any buffer overhead by jamming transforms up the topology. Much like the way we tackled #8263, I'd like to take little steps that are cheap to throw out if they aren't worthwhile. It seems to me -- and I'm getting more intimately familiar with the topology code now -- that we could make the remap transform itself concurrent internally and still work our way toward the Architecture Revisit vision without having to take a big leap there. That is, assuming that inserting a transform guaranteed to destroy ordering wouldn't break anything downstream.
As a part of #8512 I need to have a much more clear understanding of the topology construct of vector than I presently do. I have an eye toward accomplishing some of the aims of the Architecture Revisit RFC but would like to get there in small steps. This commit does not functionally change vector. It moves some test code around and adjusts the documentation to be more, I hope, clear about what's what. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
This commit removes a `span!` call with `Level::ERROR` setting from `FunctionCall::resolve`. My understanding here is that this span did not terminate into any collector and will not impact user-observable remap behavior. This does, however, add +9Mb/s throughput to the configuration described in #8512. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
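For a sense of the pattern, a minimal sketch assuming the `tracing` crate -- not the actual `FunctionCall::resolve` code: a span constructed and entered on a hot path can cost something per call even when no subscriber ever consumes it.

```rust
use tracing::{span, Level};

fn resolve() {
    // The kind of per-call span the commit removes: with no collector
    // attached this can still pay interest-check and construction
    // overhead on every invocation.
    let span = span!(Level::ERROR, "remap_resolve");
    let _enter = span.enter();
    // ... expression resolution would happen here ...
}

fn main() {
    // Hot loop standing in for per-event expression resolution.
    for _ in 0..1_000_000 {
        resolve();
    }
}
```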
This commit moves the `TransformConfig` down into vector-core. The ambition I've got is to work closer to the architecture described in Architecture Revisit and to allow for me to experiment with transforms that directly produce into streams etc etc. Motivation here being an observation in #8512 that we spend a good deal of time making small vecs, which happens in the topology. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
Following on from #8584 I attempted to extend the … While not fully exhausted, any scope for improving the single-threaded throughput of this particular remap configuration is limited, seemingly, to a small subset of the flamegraph. Unfortunately the perf data this is derived from is too large to attach.
Repeating with an unchanged experimental setup for e876d40, and again for 403007e, we find that:

These numbers remain stable. We have noticed that warp factors heavily into stack traces for the datadog agent and intend to remove it from the source.
This commit removes a lazy_static and a call to a lazy_static from the DatadogAgentSource. As a part of #8512 I spent some time trying to remove warp from the agent source to see its effect and found that, on the whole, it would not really be worth all that much. warp might bloat our stack traces somewhat but is lean internally. That said, I did find that I could remove some serialization from the source, primarily by migrating globally retrieved types into the source struct itself. See also #8472. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
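Per the PR (#8789), the migrated globals were `API_KEY_MATCHER` and `log_schema()`. A hedged sketch of the shape of that migration -- the regex pattern and field names below are illustrative, not vector's actual code:

```rust
use regex::Regex;

// Hypothetical, trimmed-down source struct; the real DatadogAgentSource
// carries considerably more state.
struct DatadogAgentSource {
    api_key_matcher: Regex,
    message_key: &'static str,
}

impl DatadogAgentSource {
    fn new() -> Self {
        Self {
            // Built once at source construction rather than fetched from a
            // lazily-initialized global on the request hot path.
            api_key_matcher: Regex::new(r"^/v1/input/(?P<api_key>[[:alnum:]]+)")
                .expect("static regex compiles"),
            message_key: "message",
        }
    }

    fn extract_api_key(&self, path: &str) -> Option<String> {
        self.api_key_matcher
            .captures(path)
            .and_then(|caps| caps.name("api_key"))
            .map(|m| m.as_str().to_owned())
    }
}

fn main() {
    let source = DatadogAgentSource::new();
    assert_eq!(
        source.extract_api_key("/v1/input/DEADBEEF").as_deref(),
        Some("DEADBEEF")
    );
    println!("message key: {}", source.message_key);
}
```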
While I've been focused on the remap step in the config, it has struck me that the sink contributes to a halving of throughput. We very likely can be more explicit about the format we're encoding to in the sink, so I'll tackle that next. As noted in #8789, reducing the stack by removing warp is interesting but not particularly feasible without either building our own in-house setup or making the agent source unique. This is not a bottleneck presently so there's no need.
This PR rethinks how our Datadog logs sink works with an eye toward applying a similar transformation to each of our sinks. The work here is driven by the observation in #8512 that we spend a great deal of time in vector polling futures. After tracing out the path of an `Event` through our previous stack it's clear why we're doing that:

![Image from iOS](https://user-images.githubusercontent.com/3579/130370869-8dbc68dd-a000-41f1-ad27-c144ce0ff30a.jpg)

The top portion of this diagram details the type hierarchy at play, and the bottom the actual code path in practice for an event, borne out by examining live stack traces. None of these calls are inlined with one another -- consistent with the amount of boxing and dynamic dispatch going on -- so we do in fact pay for all the polls. It's also quite hard to understand this hierarchy and make changes to it. So, this PR is a rethink of the whole setup as it affects the Datadog logs sink. Highlights:

* the logs sink is now a _stream_ processor
* the logs sink is responsible for batching by API key itself
* serialization of payloads is done as late as possible and over a complete batch in one swoop

This approach is at least twice as fast as our present approach. I expect the patch to be code positive but only because of inlined rustdocs. Things missing as of this writing:

- [x] acks are not currently performed -- resolved in 225cfe2
- [x] telemetry is broken considering that is part of the generic type hierarchy abandoned here -- resolved in b6832a4
- [x] concurrency, rate-limiting and back-pressure need to be enabled yet
- [x] it's possible that payload sizes cross the 5Mb boundary the API sets -- see [this comment](#8825 (comment))

With regard to the last point, these are accomplished today by use of tower-based layering. I haven't decided whether the `HttpClient` should be responsible for participating in these concerns -- leaving the stream processor ignorant of them -- or if the stream processor should itself be a `Service`.

## Method

The basic method at play in the previous hierarchy was inheritance. A `DatadogLogsSink` is a type of `HttpSink` that exposes a few methods and is called by some obscure mechanism elsewhere. What I'm pushing for here is composition. A `LogApi` is a stream processor that is composed with an http client that knows about back-pressure, concurrency limits, etc., and clicks this together with buffering logic specific to the logs API.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
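The batching-by-API-key idea is simple enough to sketch. A minimal, synchronous illustration of the shape described above -- not the actual `LogApi` code; the names and the batch bound are placeholders:

```rust
use std::collections::HashMap;

// Placeholder event type; the real sink works over vector's `Event`.
struct LogEvent {
    api_key: String,
    message: String,
}

// Stand-in for the 5Mb payload bound the logs API imposes.
const MAX_BATCH: usize = 1_000;

fn run(events: impl Iterator<Item = LogEvent>) {
    let mut batches: HashMap<String, Vec<LogEvent>> = HashMap::new();
    for event in events {
        let key = event.api_key.clone();
        let batch = batches.entry(key.clone()).or_default();
        batch.push(event);
        if batch.len() >= MAX_BATCH {
            let full = std::mem::take(batch);
            flush(&key, &full);
        }
    }
    // Flush whatever remains per key at shutdown.
    for (key, batch) in batches {
        if !batch.is_empty() {
            flush(&key, &batch);
        }
    }
}

// Serialization happens here, once, over the complete batch.
fn flush(api_key: &str, batch: &[LogEvent]) {
    let lines: Vec<&str> = batch.iter().map(|e| e.message.as_str()).collect();
    println!("POST /v1/input key={} events={}", api_key, lines.len());
}

fn main() {
    let events = (0..3).map(|i| LogEvent {
        api_key: "DEADBEEF".into(),
        message: format!("log line {i}"),
    });
    run(events);
}
```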
Now that #8825 is merged we have to slightly update the vector config used in this investigation:

```toml
data_dir = "/tmp/data/vector"

##
## Sources
##

[sources.internal_metrics]
type = "internal_metrics"

[sources.datadog_agent]
type = "datadog_agent"
acknowledgements = false
address = "0.0.0.0:8282"

##
## Transforms
##

[transforms.parse_message]
type = "remap"
inputs = ["datadog_agent"]
source = '''
pyld, err = parse_json(.message)
if err == null {
  .message = pyld.mineral
}
'''

##
## Sinks
##

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["internal_metrics"]
address = "0.0.0.0:9598"

[sinks.dd_logs_egress]
type = "datadog_logs"
inputs = ["parse_message"]
endpoint = "http://localhost:8080/v1/input"
default_api_key = "DEADBEEF"
```

Repeating our experiment for a32c7fd we find:
Remap utilization in this configuration is 95%, with logs sink utilization at 32%, implying a serious bottleneck in the remap transform. Removing remap from the center of this configuration we find that throughput is 92MB/s, confirming this suspicion.
This commit speeds up the `rename_key` function of `LogEvent` by 57%, avoiding the use of path iteration code to achieve this. This results in +2MB/s in #8512. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
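The shape of the optimization is easy to illustrate. A hedged sketch of the direct-rename idea over a flat map -- not the actual `LogEvent` implementation, which operates on vector's event `Value` type:

```rust
use std::collections::BTreeMap;

// Move a value from one top-level key to another without invoking the
// general path-iteration machinery used for nested lookups.
fn rename_key(fields: &mut BTreeMap<String, String>, from: &str, to: &str) {
    if let Some(value) = fields.remove(from) {
        fields.insert(to.to_string(), value);
    }
}

fn main() {
    let mut fields = BTreeMap::new();
    fields.insert("msg".to_string(), "hello".to_string());
    rename_key(&mut fields, "msg", "message");
    assert_eq!(fields.get("message").map(String::as_str), Some("hello"));
}
```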
This commit introduces a `ready_chunks` into the pipeline for function transforms. This combinator polls the underlying stream until either it signals Pending or the chunk limit is reached, sending a vector of items downstream. In this commit we pull at most 128 `Event` instances off the input stream for processing, improving locality. I have not done the same for task transforms, though I think it could be done. Without changing the trait type we'd need the inverse of `ready_chunks` to move from a `Stream<Item = Vec<Event>>` into a `Stream<Item = Event>`. This is worth +7MB/s in #8512 Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
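For reference, a minimal demonstration of the combinator (`futures` 0.3), separate from vector's pipeline code:

```rust
use futures::stream::{self, StreamExt};

#[tokio::main]
async fn main() {
    // Stands in for the transform's input stream of events.
    let events = stream::iter(0..10);

    // Poll until Pending or until 4 items are buffered, then emit the
    // chunk; downstream work now happens over a Vec with good locality.
    let mut chunked = events.ready_chunks(4);
    while let Some(chunk) = chunked.next().await {
        println!("processing {} events together: {:?}", chunk.len(), chunk);
    }
}
```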
Repeating our experiment with the config outlined here for be8ceaf we find that:
Remap utilization is now 10%, …
This configuration is now part of our regular soak tests. We measure approximately 20Mb/s per core as of this writing.