Improving resiliency through disk buffering capability #2285
Comments
Note that the processor is deprecated, but is it also true for the helper?
@pmm-sumo I would love to collaborate on this. There are some folks at Google who are interested in solving this problem as well. cc @erain @tigrannajaryan is this something that can be allowed for GA?
The helper is here to stay. We don't plan to deprecate it. However, if implemented in the helper, each exporter will have its own persistent queue, which may or may not be desirable (it will likely mean persistent data is duplicated if more than one exporter is used).
It depends on the implementation. If it requires major changes to existing code and may destabilize the Collector right before the GA then it may not be desirable. But it should not be an issue if it is released immediately after GA, e.g. in 1.1. If the implementation is a separate component that doesn't touch existing code then I see no problem with allowing it for GA. Either way, let's see a design document first and we can discuss.
Thank you for the clarification, @tigrannajaryan. I agree that we need a design doc first. @dashpole and @erain, did you perhaps start working on or researching this? Perhaps we could mention this during the next SIG meeting (if anyone else would like to participate too) and plan a separate meeting to discuss it? I think we might need more than a few minutes to share our ideas, discuss options, and decide how to split the work (even if we just start with creating a design doc). What do you think?
Thus far, we have mostly been collecting use cases, but we have also explored a few different possible solutions that we'd be happy to share. I added it to the next Collector SIG meeting agenda.
The recent stanza contribution may include some useful code to bootstrap this effort. I'm not deeply familiar with this part of the codebase, but it implements buffering and flushing as independent capabilities. This code is essentially dead at this point, since stanza previously used it as part of its output operators, which have been removed due to redundancy with the collector. If this code appears to be useful at all, I believe it would be fine to pull it out of the new repository and adapt it as necessary without worrying about duplication.
Perhaps a related issue (relevant when considering the format used for buffering): open-telemetry/opentelemetry-specification#1443
@pmm-sumo please review open-telemetry/opentelemetry-collector-contrib#3087 from the perspective of using it in the disk buffer.
I have prepared a design doc with some ideas on how this could be achieved. cc @dashpole
How long can the buffered data be stored in memory?
As mentioned in the other issue, this is controlled by the number of batches via the queued_retry configuration. I am working on a PoC of a generic buffering solution and should have an update soon.
As discussed during the SIG, we want to move the storage extension to core, starting with the interface (this PR), so the persistent buffer implementation (#2285) could use it.
**Link to tracking Issue:** #3424
**Testing:** Just the interface, no tests
**Documentation:** README.md with API
cc @djaglowski @tigrannajaryan
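For illustration only, here is a minimal sketch of the kind of key-value client interface such a storage extension could hand out to other components. The names below are hypothetical and may not match the actual extension/storage API:

```go
// Sketch only: a minimal key-value client that a storage extension could
// expose to other components. Names are hypothetical, not the real API.
package storagesketch

import "context"

// Client is a handle to a component-scoped key-value store.
type Client interface {
	// Get returns the value stored under key, or nil if the key is absent.
	Get(ctx context.Context, key string) ([]byte, error)
	// Set stores value under key, overwriting any previous value.
	Set(ctx context.Context, key string, value []byte) error
	// Delete removes key and its value, if present.
	Delete(ctx context.Context, key string) error
	// Close releases any resources held by the client.
	Close(ctx context.Context) error
}
```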
…3274) Persistent queue implementation within queued_retry, aimed at being compatible with Jaeger's [BoundedQueue](https://github.com/jaegertracing/jaeger/blob/master/pkg/queue/bounded_queue.go) interface (providing a simple replacement) and backed by the [file storage extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage) for storing the WAL. Currently, running the persistent queue requires building OpenTelemetry Collector Contrib with the `enable_unstable` build tag.
**Link to tracking Issue:** #2285
[Design doc](https://docs.google.com/document/d/1Y4vNthCGdYI61ezeAzL5dXWgiZ73y9eSjIDitk3zXsU/edit#)
**Testing:** Unit tests and manual testing, more to come
**Documentation:** README.md updated, including an example
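To make the approach more concrete, here is a rough sketch (not the actual implementation) of a persistent FIFO queue built on top of a key-value store like the one sketched above. The type and key names are invented for the example; the real persistent queue in queued_retry handles index persistence, concurrency, and error cases differently:

```go
// Sketch only: a persistent FIFO queue that writes items to a key-value
// store before they are dispatched, so unsent items survive a restart.
package persistentqueuesketch

import (
	"context"
	"fmt"
	"sync"
)

// kvStore is the assumed storage-client shape (see the interface sketch above).
type kvStore interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Set(ctx context.Context, key string, value []byte) error
	Delete(ctx context.Context, key string) error
}

type PersistentQueue struct {
	mu          sync.Mutex
	store       kvStore
	read, write uint64 // next item to dispatch / next free slot
}

// Produce persists the item and advances the write index.
func (q *PersistentQueue) Produce(ctx context.Context, item []byte) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if err := q.store.Set(ctx, fmt.Sprintf("item-%d", q.write), item); err != nil {
		return err
	}
	q.write++
	return nil
}

// Consume returns the oldest persisted item and removes it from the store.
// It returns (nil, nil) when the queue is empty.
func (q *PersistentQueue) Consume(ctx context.Context) ([]byte, error) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.read == q.write {
		return nil, nil // empty
	}
	key := fmt.Sprintf("item-%d", q.read)
	item, err := q.store.Get(ctx, key)
	if err != nil {
		return nil, err
	}
	q.read++
	// Deleting eagerly keeps the sketch simple; deleting only after a
	// successful send would give at-least-once delivery instead.
	return item, q.store.Delete(ctx, key)
}
```

In a real implementation the read/write indices themselves would also need to be persisted so the queue can be reconstructed after a restart.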
Closing this since the capability is now available through the experimental queued_retry implementation.
Is your feature request related to a problem? Please describe.
Currently, the collector buffers all data in memory, which can lead to data loss if the buffer gets overfilled (e.g. during temporary exporter connection issues) or due to a sudden crash.
Describe the solution you'd like
There should be an option to enable a disk buffering capability. For example, the exporter could buffer incoming data on disk (up to a defined limit) and clear the buffer as data is sent out. In case of a restart, a recovery process could take place, which would send out the buffered data.
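To illustrate the recovery idea described above, here is a minimal sketch of a drain loop over a hypothetical on-disk buffer. The `diskBuffer` interface and its `Peek`/`Ack` methods are made up for the example and do not correspond to any existing collector API:

```go
// Sketch only: drain an on-disk buffer, as the recovery process would do at
// startup. export stands in for the exporter's send path.
package drainsketch

import "context"

type diskBuffer interface {
	// Peek returns the oldest buffered item without removing it, or nil when empty.
	Peek(ctx context.Context) ([]byte, error)
	// Ack removes the oldest buffered item after it has been sent successfully.
	Ack(ctx context.Context) error
}

// drain sends everything currently in the buffer. Running it at startup means
// items persisted before a crash or restart are sent out before new data.
func drain(ctx context.Context, buf diskBuffer, export func(context.Context, []byte) error) error {
	for {
		item, err := buf.Peek(ctx)
		if err != nil {
			return err
		}
		if item == nil {
			return nil // buffer is empty
		}
		if err := export(ctx, item); err != nil {
			return err // leave the item in the buffer; it will be retried later
		}
		if err := buf.Ack(ctx); err != nil {
			return err
		}
	}
}
```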
Perhaps this capability could be a good fit for the `queued_retry` helper, since it already manages send queues?
Additional context
Gitter discussion