-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporterhelper] enable Persistent Queue feature by default #5711
[exporterhelper] enable Persistent Queue feature by default #5711
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5711 +/- ##
==========================================
- Coverage 91.78% 91.39% -0.39%
==========================================
Files 192 197 +5
Lines 11394 11913 +519
==========================================
+ Hits 10458 10888 +430
- Misses 742 812 +70
- Partials 194 213 +19
Continue to review full report at Codecov.
|
7859626
to
aa29c87
Compare
I've rebased this post 0.56.0 and fixed the remaining minor issues the CI got stuck on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to agree on the config first, since that is the most important part.
// StorageID if not empty, enables the persistent storage and uses the component specified | ||
// as a storage extension for the persistent queue | ||
StorageID *config.ComponentID `mapstructure:"storage"` | ||
// StorageEnabled describes whether persistence via a file storage extension is enabled using the single | ||
// default storage extension. | ||
StorageEnabled bool `mapstructure:"persistent_storage_enabled"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this config, is too complex for no good reason. I see no point in supporting persistent_storage_enabled
instead of simply asking for the exact name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Copying my comment from previous PR:
Is this setting necessary? If the storage
is unspecified the persistent queue is disabled. Isn't that enough for control? Or do we want to have this so that we can enable the storage without knowing the name of the extension?
Alternate suggestion: can we delete persistent_storage_enabled
setting and make *
(or some other character sequence) a special value for storage
setting which finds the only available storage extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current state was an attempt to keep the config backwards-compatible, as this whole change was originally just about enabling the queue in the build by default. If we're going to break it, then I agree with @bogdandrutu - just make the user specify the storage explicitly and be done with it, looks like the best option in the long term.
If we do that though, I think the changelog should mention the change explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not call this a breaking change since this was an experimental feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not technically a breaking change, but it's nonetheless a feature being used in the real world, and I think it'd be polite to call it out in the changelog to make it easier for users to figure out what broke their config after an upgrade.
Either way, I'll make the change in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: Waiting for the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, sorry for the delay. I've added a changelog entry to breaking changes, even though this technically isn't one.
aa29c87
to
beacb9e
Compare
@@ -12,6 +13,7 @@ | |||
|
|||
### 💡 Enhancements 💡 | |||
|
|||
- Enable Persistent Queue by default (#5711) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit misleading. We don't enable persistent queue by default. We just make the feature available in the default build.
@@ -58,7 +58,7 @@ gomoddownload: | |||
|
|||
.PHONY: gotest | |||
gotest: | |||
@$(MAKE) for-all-target TARGET="test test-unstable" | |||
@$(MAKE) for-all-target TARGET="test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed? What if we have other unstable features to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go with feature gates for unstable features from now on, this is an old unstable feature before the gates.
requeuingEnabled bool | ||
requestUnmarshaler internal.RequestUnmarshaler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this PR should do one thing: enable the feature. Instead I see modifications of the logic. Can we split the PR into 2 parts:
- Change whatever logic needs to change and whatever refactoring is necessary. Keep the persistent queue feature experimental.
- Make persistent queue feature part of stable build.
Otherwise I have a hard time understanding what changes are needed for enabling the feature, which changes are refactorings which don't change the behavior and what changes change the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this became too complicated along the way. I'm going to split it into three PRs:
- Make the queue logic consistent between the in-memory queue and the persistent one. This way enabling the persistent queue is just a change to the build.
- Enable persistent queue in the build by default.
- Configuration changes for setting the storage for the persistent queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first PR: #5764. Would you prefer to close this one right now or wait until all the changes are merged individually? @tigrannajaryan @bogdandrutu ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure we are not doing too many things in one PR, and keep PRs as small and focus as possible. For example we agreed on changing the config, let's start with doing that on the experimental feature.
@@ -4,6 +4,7 @@ | |||
|
|||
### 🛑 Breaking changes 🛑 | |||
|
|||
- Require storage explicitly set for persistent queue (#5711) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment, this happens only if you enable the experimental feature xyz...
@@ -58,7 +58,7 @@ gomoddownload: | |||
|
|||
.PHONY: gotest | |||
gotest: | |||
@$(MAKE) for-all-target TARGET="test test-unstable" | |||
@$(MAKE) for-all-target TARGET="test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go with feature gates for unstable features from now on, this is an old unstable feature before the gates.
[file storage extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage). | ||
If collector instance is killed while having some items in the persistent queue, on restart the items are being picked and | ||
the exporting is continued. | ||
When persistent queue is enabled, the batches are being buffered to disk using [filestorage] extension. If collector instance is killed while having some items in the persistent queue, on restart the items are being picked and the exporting is continued. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong statement, they are buffered in the storage :) the storage can be s3 or nvram (whatever the extension implements).
// PersistentQueueStartFunc defines a function that can be used to start the persistent queue | ||
type PersistentQueueStartFunc func(ctx context.Context, client storage.Client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this change necessary now and not before? if this can be "refactored" as a separate PR, please do so.
This needs some rework. |
I'll just close it and continue submitting the individual changes as separate PRs. After #5764, I'll do the storage configuration change next. |
Description:
Persistent Queue is a feature that currently requires "enable_unstable" tag provided to the the build to enable it. We have been using it in our distro for several months already and it looks stable. It seems that others are using this capability as well (open-telemetry/opentelemetry-collector-contrib#9327 (comment))
This change enables it by default, which also simplifies the build process a little bit.
Testing:
Added some tests to cover the queue creation logic and the requeue feature. They're a bit awkward, and should probably be refactored once the requeue feature is exposed in config.
Originally posted by @pmm-sumo here, see the review discussion therein for justification for some of the choices made.