Rewrite sinks into the "new-style" #9261
Comments
You can maintain order in the new-style. We've been careful so far to make it possible to do so, with consideration to Loki. The Datadog Logs sink as an example does not maintain order but that's a choice the sink makes. Unless you add any 'unordered' combinators in a stream pipeline the stream will be processed in the order that items were added to it.
I'd vote we keep the sink default in-order and make a toggle to support out-of-order logs. I'm sure strictly ordered loki will be a support target for us for a while yet.
💯 this.
This commit heavily refactors the Pulsar sink to use the StreamSink interface, modeled after the Kafka sink. It also adds features that bring it in line with the Kafka sink's feature set. This includes:

* Refactoring to use StreamSink instead of the Sink interface. See vectordotdev#9261
* Supporting dynamic topics using a topic template
* Refactoring configuration in advance of adding a Pulsar source
* Reworking message parsing to support logs and metrics, with support for dynamic keys and properties

Because this work is heavily modeled after the Kafka sink, some utility code is duplicated. It has not been refactored to remove the duplication, as there wasn't a clear pattern for where such shared code should be put. The refactor seems much simpler with StreamSink, but it does require working around limitations in the Pulsar client library by wrapping certain resources in Arc<Mutex>, which *may* have performance implications. I am not familiar enough to know whether structuring this differently would be more efficient.

Remaining work:

* Add a few more tests
This will help steer contributors away from writing new sinks in the deprecated fashion. See #9261 for more context.
Previously, the AppSignal sink was written in what was already a bit of an older style in PR vectordotdev#16650. We want to change some functionality in the future for how metrics are sent. To do this, it looks like we'll need to use the newer sink style. We have updated the sink to the new StreamSink style, using a HttpBatchService wrapper to send the requests to the AppSignal public endpoint API. Part of [tracking issue vectordotdev#9261][1] [1]: vectordotdev#9261 Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
Previously, the AppSignal sink was written in what was already a bit of an older style in PR vectordotdev#16650. We want to change some functionality in the future for how metrics are sent. To do this, it looks like we'll need to use the newer sink style. With this change, the AppSignal sink's functionality has remained the same. We have updated the sink to the new StreamSink style, using a HttpBatchService wrapper to send the requests to the AppSignal public endpoint API. We followed the [sink guides][2] initially and looked at other sinks already rewritten linked in [issue vectordotdev#9261][1] to see how to implement it further. Updated the integration_tests to test if the sink is a HTTP sink with the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the `EndpointBytesSent` event was sent. We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be implemented or not. If it returns `None` the tests also pass, but we thought we might as well implement it properly. Part of [tracking issue vectordotdev#9261][1] [1]: vectordotdev#9261 [2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
Previously, the AppSignal sink was written in what was already a bit of an older style in PR vectordotdev#16650. We want to change some functionality in the future for how metrics are sent. To do this, it looks like we'll need to use the newer sink style, or at least it will be easier. With this change, the AppSignal sink's functionality has remained the same. We have updated the sink to the new StreamSink style, using a HttpBatchService wrapper to send the requests to the AppSignal public endpoint API. We followed the [sink guides][2] initially and looked at other sinks already rewritten linked in [issue vectordotdev#9261][1] to see how to implement it further. Updated the integration_tests to test if the sink is a HTTP sink with the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the `EndpointBytesSent` event was sent. We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be implemented or not. If it returns `None` the tests also pass, but we thought we might as well implement it properly. Part of [tracking issue vectordotdev#9261][1] [1]: vectordotdev#9261 [2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
* chore(appsignal sink): Refactor to use StreamSink

Previously, the AppSignal sink was written in what was already a bit of an older style in PR #16650. We want to change some functionality in the future for how metrics are sent. To do this, it looks like we'll need to use the newer sink style, or at least it will be easier.

With this change, the AppSignal sink's functionality has remained the same. We have updated the sink to the new StreamSink style, using a HttpBatchService wrapper to send the requests to the AppSignal public endpoint API. We followed the [sink guides][2] initially and looked at other sinks already rewritten linked in [issue #9261][1] to see how to implement it further.

Updated the integration_tests to test if the sink is a HTTP sink with the `HTTP_SINK_TAGS`. Previously, it didn't test yet if the `EndpointBytesSent` event was sent.

We're unsure if `AppsignalResponse`'s `bytes_sent` needs to be implemented or not. If it returns `None` the tests also pass, but we thought we might as well implement it properly.

Part of [tracking issue #9261][1]

[1]: #9261
[2]: https://github.com/vectordotdev/vector/blob/600f8191a8fe169eb38c429958dd59714349acb4/docs/tutorials/sinks/1_basic_sink.md

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>

* Split AppSignal sink into separate modules

As per review feedback: split the new sink style into separate module files.

* Fix visibility of things in AppSignal sink

It doesn't need to be visible for the entire crate, only the AppSignal sink scope.

---------

Co-authored-by: Jeff Kreeftmeijer <jeff@kreeft.me>
In #8825 and #8884 we identified a "new style" for sinks, one that focuses on stream processing -- migrating away from the `Sink` variant of our sinks to the `Stream` variant -- and composition. To the latter point, our "old style" sinks rely on an inheritance pattern for their implementation, leading to very deep type hierarchies, duplication of responsibility in the hierarchy, and some brittleness. Consider that the `Batcher` was introduced in #8960 and is an extraction and retooling of our old-style `PartitionInnerBuffer`. This "new style" not only makes the sink code more explicable but also allows better CPU saturation of vector machines, compared to the single-threaded push-style of the `Sink` trait.

As of this writing we have three "new style" sinks. They are:
* `aws_s3`
* `datadog_archive`
* `datadog_logs`
We're aware that there's duplication in these sinks. Their IO loops, as an example, are almost identical. Our ambition is to avoid premature abstraction across the construction of the new-style sinks without abandoning the process of lifting common pieces when it's clear they're common and minimal. For instance, as mentioned, each of these new-style sinks shares an IO loop and @tobz is actively working to separate that out in #9215. We expect that other common pieces will fall out of the conversion work, requiring us to revisit sinks multiple times.
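For readers new to the distinction, here is a rough, std-only sketch of the stream-style idea: the sink is handed the whole input and drives its own loop, partitioning and batching as it goes. Everything in it is an assumption made for illustration. `Event`, `event`, and `run_sink` are hypothetical names, the loop is synchronous where Vector's is async, and this is not the actual `StreamSink` trait or `Batcher` from the codebase.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified event; Vector's real event type is richer.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    partition: String,
    message: String,
}

// Illustrative helper for building events.
fn event(partition: &str, message: &str) -> Event {
    Event {
        partition: partition.to_string(),
        message: message.to_string(),
    }
}

// Stream-style sketch: the sink owns the loop over its input instead of being
// pushed one event at a time. Events are grouped by partition key; a batch is
// handed to `send` as soon as it holds `max_events` events, and partial
// batches are flushed (in first-seen key order) once the input ends.
fn run_sink<I, F>(input: I, max_events: usize, mut send: F)
where
    I: IntoIterator<Item = Event>,
    F: FnMut(&str, Vec<Event>),
{
    assert!(max_events > 0, "max_events must be at least 1");

    let mut open: HashMap<String, Vec<Event>> = HashMap::new();
    // Remember the order keys were first seen so the final flush is deterministic.
    let mut first_seen: Vec<String> = Vec::new();

    for event in input {
        let key = event.partition.clone();
        if !open.contains_key(&key) {
            first_seen.push(key.clone());
        }
        let batch = open.entry(key.clone()).or_default();
        batch.push(event);
        if batch.len() >= max_events {
            // Hand off the full batch and leave an empty one in its place.
            send(&key, std::mem::take(batch));
        }
    }

    for key in first_seen {
        if let Some(rest) = open.remove(&key) {
            if !rest.is_empty() {
                send(&key, rest);
            }
        }
    }
}
```

Because `send` is just a closure here, the batching step and the IO step stay separate pieces that are composed at the call site, which is the property the new style is after. Within one partition, events reach `send` in the order they arrived.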
At the end of this work we will have removed `VectorSink::Sink` from the codebase entirely. Much of the support infrastructure for the "old style" sinks should be removable as well. As this work proceeds we hope to expand our use of property testing in the project -- consistent with #9131 -- and add micro-benchmarks where appropriate.

The sinks to be converted are (highest priority first):
* `datadog_metrics` sink into the new style #9440
* `elasticsearch` sink into the new style #9439
* `kafka` sink into the new style #9323
* `splunk_hec` sink into the new style #9444
* `vector` sink into the new style #9445
* `datadog_events` sink into the new style #9446
* `prometheus_exporter` sink into the new style #9447
* `azure_blob` sink into the new style #9448
* `gcp_cloud_storage` sink into the new style #9449
* `aws_kinesis_streams` sink into the new style #9450
* `aws_kinesis_firehose` sink into the new style #9451
* `loki` sink into the new style #9441
* `aws_cloudwatch_logs` sink into the new style #9452
* `blackhole` sink into the new style #9453
* `console` sink into the new style #9454
* `aws_sqs` sink into the new style #10987
* `humio_metrics` sink into the new style (included in Rewrite the `splunk_hec_logs` sink into the new style #9734)
* `humio_logs` sink into the new style (included in Rewrite the `splunk_hec_logs` sink into the new style #9734)
* `sematext_logs` sink into the new style (included as part of the elasticsearch work)
* `socket` sink into the new style
* `statsd` sink into the new style
* `clickhouse` sink into the new style #17094
* `http` sink into the new style #18201
* `nats` sink into the new style #18242
* `redis` sink into the new style #18222
* `prometheus_remote_write` sink into the new style #10991
* `gcp_stackdriver_logs` sink into the new style #10989
* `aws_cloudwatch_metrics` sink into the new style #10988
* `gcp_stackdriver_metrics` sink into the new style #18748
* `gcp_pubsub` sink into the new style #10990
* `file` sink into the new style
* `azure_monitor_logs` sink into the new style
* `sematext_metrics` sink into the new style
* `honeycomb` sink into the new style #18213
* `papertrail` sink into the new style
* `influxdb_logs` sink into the new style
* `influxdb_metrics` sink into the new style #19102
* `mezmo` sink into the new style
* `appsignal` sink into the new style #18214

Approaching each sink one at a time is likely the best tactic. As of this writing @tobz and @blt understand the most about the "new style".