Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite sinks into the "new-style" #9261

Open
32 of 40 tasks
blt opened this issue Sep 20, 2021 · 4 comments
Open
32 of 40 tasks

Rewrite sinks into the "new-style" #9261

blt opened this issue Sep 20, 2021 · 4 comments
Labels
domain: performance Anything related to Vector's performance domain: sinks Anything related to the Vector's sinks type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@blt
Copy link
Contributor

blt commented Sep 20, 2021

In #8825 and #8884 we identified a "new style" for sinks, one that focuses on stream processing -- migrating away from the Sink variant of our sinks to the Stream variant -- and composition. To the later point our "old style" sinks rely on an inheritance pattern for their implementation, leading both to very deep type hierarchies, duplication of responsibility in the hierarchy and some brittleness. Consider that the Batcher introduced in #8960 and is an extraction and retooling of our old-style PartitionInnerBuffer. This "new style" not only makes the sink code more explicable but allows better CPU saturation of vector machines, compared to the single-threaded push-style of the Sink trait.

As of this writing we have three "new style" sinks. They are:

We're aware that there's duplicating in these sinks. Their IO loops, as an example, are almost identical. Our ambition is to avoid premature abstraction across the construction of the new style sinks without abandoning the process of lifting common pieces when it's clear they're common and minimal. For instance, as mentioned, each of these new-style sinks shares an IO loop and @tobz is actively working to separate that out in #9215. We expect that other common pieces will fall out of the conversion work, requiring us to revisit sinks multiple times.

At the end of this work we will have removed VectorSink::Sink from the codebase entirely. Much of the support infrastructure for the "old style" sinks should also be removable as well. As this work proceed we hope to expand our use of property testing in the project -- consistent with #9131 -- and add micro-benchmarks where appropriate.

The sinks to be converted are (highest priority first):

Approaching each sink one at a time is likely the best tactic. As of this writing @tobz and @blt understand the most of about the "new style".

@blt blt added type: enhancement A value-adding code change that enhances its existing functionality. domain: sinks Anything related to the Vector's sinks domain: performance Anything related to Vector's performance labels Sep 20, 2021
@blt
Copy link
Contributor Author

blt commented Sep 20, 2021

@jdrouet I heard from @jszwedko you might be interested in converting the loki sink.

@jdrouet
Copy link
Contributor

jdrouet commented Sep 21, 2021

@blt I'd have to rewrite the loki sink without keeping the order considering that the new version of loki drops that requirement.
For backward compatibility though, maybe we should keep the current version and make a loki-stream for now? What do you think @jszwedko?

@blt
Copy link
Contributor Author

blt commented Sep 21, 2021

@blt I'd have to rewrite the loki sink without keeping the order considering that the new version of loki drops that requirement.

You can maintain order in the new-style. We've been careful so far to make it possible to do so, with consideration to Loki. The Datadog Logs sink as an example does not maintain order but that's a choice the sink makes. Unless you add any 'unordered' combinators in a stream pipeline the stream will be processed in the order that items were added to it.

For backward compatibility though, maybe we should keep the current version and make a loki-stream for now? What do you think @jszwedko?

I'd vote we keep the sink default in-order and make a toggle to support out-of-order logs. I'm sure strictly ordered loki will be a support target for us for a while yet.

@spencergilbert
Copy link
Contributor

I'd vote we keep the sink default in-order and make a toggle to support out-of-order logs. I'm sure strictly ordered loki will be a support target for us for a while yet.

💯 this.

addisonj added a commit to addisonj/vector that referenced this issue Sep 9, 2022
This commit heavily refactors the Pulsar Sink to use the StreamSink
interface and is modeled after the Kafka Sink.

It also adds additional features that bring it in line with
Kafka Sink feature set.

This includes:
* Refactoring to use StreamSink instead of Sink interace. See vectordotdev#9261
* Supports dynamic topics using a topic template
* Refactor configurations in advance of adding Pulsar source
* Rework message parsing to support logs and metrics, with support for
  dynamic keys and properties

This work is heavily modeled after Kafka sink. This means there has been
some duplication of some utility code. However, it has not been
refactored to remove the duplication as there wasn't a clear pattern of
where such shared code should be put.

Additionally, this refactor seems to be much simpler by using StreamSink
but does require some workarounds limitations in the Pulsar client
library by wrapping certain resources in Arc<Mutex> that *may* have
performance implications. I am not famaliar enough to know if there
might be some efficiencies by structuring this differently.

Remaining work:
* Add a few more tests
dsmith3197 added a commit that referenced this issue Jun 8, 2023
This will help steer contributors away from writing new sinks in the
deprecated fashion.

See #9261 for more context.

<!--
**Your PR title must conform to the conventional commit spec!**

  <type>(<scope>)!: <description>

  * `type` = chore, enhancement, feat, fix, docs
  * `!` = OPTIONAL: signals a breaking change
* `scope` = Optional when `type` is "chore" or "docs", available scopes
https://github.com/vectordotdev/vector/blob/master/.github/semantic.yml#L20
  * `description` = short description of the change

Examples:

  * enhancement(file source): Add `sort` option to sort discovered files
  * feat(new source): Initial `statsd` source
  * fix(file source): Fix a bug discovering new files
  * chore(external docs): Clarify `batch_size` option
-->
tombruijn added a commit to appsignal/vector that referenced this issue Aug 3, 2023
Previously, the AppSignal sink was written in what was already a bit of
an older style in PR vectordotdev#16650.

We want to change some functionality in the future for how metrics are
sent. To do this, it looks like we'll need to use the newer sink style.

We have updated the sink to the new StreamSink style, using a
HttpBatchService wrapper to send the requests to the AppSignal public
endpoint API.

Part of [tracking issue vectordotdev#9261][1]

[1]: vectordotdev#9261

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
tombruijn added a commit to appsignal/vector that referenced this issue Aug 3, 2023
Previously, the AppSignal sink was written in what was already a bit of
an older style in PR vectordotdev#16650.

We want to change some functionality in the future for how metrics are
sent. To do this, it looks like we'll need to use the newer sink style.
With this change, the AppSignal sink's functionality has remained the
same.

We have updated the sink to the new StreamSink style, using a
HttpBatchService wrapper to send the requests to the AppSignal public
endpoint API. We followed the [sink guides][2] initially and looked at
other sinks already rewritten linked in [issue vectordotdev#9261][1] to see how to
implement it further.

Updated the integration_tests to test if the sink is a HTTP sink with
the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the
`EndpointBytesSent` event was sent.

We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be
implemented or not. If it returns `None` the tests also pass, but we
thought we might as well implement it properly.

Part of [tracking issue vectordotdev#9261][1]

[1]: vectordotdev#9261
[2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
tombruijn added a commit to appsignal/vector that referenced this issue Aug 10, 2023
Previously, the AppSignal sink was written in what was already a bit of
an older style in PR vectordotdev#16650.

We want to change some functionality in the future for how metrics are
sent. To do this, it looks like we'll need to use the newer sink style,
or at least it will be easier. With this change, the AppSignal sink's
functionality has remained the same.

We have updated the sink to the new StreamSink style, using a
HttpBatchService wrapper to send the requests to the AppSignal public
endpoint API. We followed the [sink guides][2] initially and looked at
other sinks already rewritten linked in [issue vectordotdev#9261][1] to see how to
implement it further.

Updated the integration_tests to test if the sink is a HTTP sink with
the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the
`EndpointBytesSent` event was sent.

We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be
implemented or not. If it returns `None` the tests also pass, but we
thought we might as well implement it properly.

Part of [tracking issue vectordotdev#9261][1]

[1]: vectordotdev#9261
[2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
tombruijn added a commit to appsignal/vector that referenced this issue Aug 10, 2023
Previously, the AppSignal sink was written in what was already a bit of
an older style in PR vectordotdev#16650.

We want to change some functionality in the future for how metrics are
sent. To do this, it looks like we'll need to use the newer sink style,
or at least it will be easier. With this change, the AppSignal sink's
functionality has remained the same.

We have updated the sink to the new StreamSink style, using a
HttpBatchService wrapper to send the requests to the AppSignal public
endpoint API. We followed the [sink guides][2] initially and looked at
other sinks already rewritten linked in [issue vectordotdev#9261][1] to see how to
implement it further.

Updated the integration_tests to test if the sink is a HTTP sink with
the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the
`EndpointBytesSent` event was sent.

We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be
implemented or not. If it returns `None` the tests also pass, but we
thought we might as well implement it properly.

Part of [tracking issue vectordotdev#9261][1]

[1]: vectordotdev#9261
[2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
github-merge-queue bot pushed a commit that referenced this issue Aug 16, 2023
* chore(appsignal sink): Refactor to use StreamSink

Previously, the AppSignal sink was written in what was already a bit of
an older style in PR #16650.

We want to change some functionality in the future for how metrics are
sent. To do this, it looks like we'll need to use the newer sink style,
or at least it will be easier. With this change, the AppSignal sink's
functionality has remained the same.

We have updated the sink to the new StreamSink style, using a
HttpBatchService wrapper to send the requests to the AppSignal public
endpoint API. We followed the [sink guides][2] initially and looked at
other sinks already rewritten linked in [issue #9261][1] to see how to
implement it further.

Updated the integration_tests to test if the sink is a HTTP sink with
the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the
`EndpointBytesSent` event was sent.

We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be
implemented or not. If it returns `None` the tests also pass, but we
thought we might as well implement it properly.

Part of [tracking issue #9261][1]

[1]: #9261
[2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>

* Split AppSignal sink into separate modules

As per review feedback: split the new sink style into separate module
files.

* Fix visibility of things in AppSignal sink

It doesn't need to be visible for the entire crate, only the AppSignal
sink scope.

---------

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: performance Anything related to Vector's performance domain: sinks Anything related to the Vector's sinks type: enhancement A value-adding code change that enhances its existing functionality.
Projects
None yet
Development

No branches or pull requests

3 participants