Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporterhelper] enable Persistent Queue feature by default #5711

Closed

Conversation

swiatekm
Copy link
Contributor

Description:
Persistent Queue is a feature that currently requires "enable_unstable" tag provided to the the build to enable it. We have been using it in our distro for several months already and it looks stable. It seems that others are using this capability as well (open-telemetry/opentelemetry-collector-contrib#9327 (comment))

This change enables it by default, which also simplifies the build process a little bit.

Testing:
Added some tests to cover the queue creation logic and the requeue feature. They're a bit awkward, and should probably be refactored once the requeue feature is exposed in config.

Originally posted by @pmm-sumo here, see the review discussion therein for justification for some of the choices made.

@codecov
Copy link

codecov bot commented Jul 20, 2022

Codecov Report

Merging #5711 (7859626) into main (b463d2a) will decrease coverage by 0.38%.
The diff coverage is 94.39%.

❗ Current head 7859626 differs from pull request most recent head beacb9e. Consider uploading reports for the commit beacb9e to get more accurate results

@@            Coverage Diff             @@
##             main    #5711      +/-   ##
==========================================
- Coverage   91.78%   91.39%   -0.39%     
==========================================
  Files         192      197       +5     
  Lines       11394    11913     +519     
==========================================
+ Hits        10458    10888     +430     
- Misses        742      812      +70     
- Partials      194      213      +19     
Impacted Files Coverage Δ
...xporterhelper/internal/persistent_storage_batch.go 80.48% <ø> (ø)
exporter/exporterhelper/queued_retry_inmemory.go 91.66% <93.40%> (-0.23%) ⬇️
...porter/exporterhelper/internal/persistent_queue.go 100.00% <100.00%> (ø)
...rter/exporterhelper/internal/persistent_storage.go 89.58% <100.00%> (ø)
pdata/internal/metrics.go 84.09% <0.00%> (-8.41%) ⬇️
pdata/internal/common.go 92.59% <0.00%> (-2.80%) ⬇️
exporter/otlphttpexporter/otlp.go 81.73% <0.00%> (-2.11%) ⬇️
service/telemetry.go 89.20% <0.00%> (ø)
service/extensions/extensions.go
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b463d2a...beacb9e. Read the comment docs.

@swiatekm swiatekm force-pushed the enable-persistent-queue-in-build branch from 7859626 to aa29c87 Compare July 21, 2022 09:16
@swiatekm
Copy link
Contributor Author

I've rebased this post 0.56.0 and fixed the remaining minor issues the CI got stuck on.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to agree on the config first, since that is the most important part.

Comment on lines 43 to 48
// StorageID if not empty, enables the persistent storage and uses the component specified
// as a storage extension for the persistent queue
StorageID *config.ComponentID `mapstructure:"storage"`
// StorageEnabled describes whether persistence via a file storage extension is enabled using the single
// default storage extension.
StorageEnabled bool `mapstructure:"persistent_storage_enabled"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this config, is too complex for no good reason. I see no point in supporting persistent_storage_enabled instead of simply asking for the exact name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Copying my comment from previous PR:

Is this setting necessary? If the storage is unspecified the persistent queue is disabled. Isn't that enough for control? Or do we want to have this so that we can enable the storage without knowing the name of the extension?

Alternate suggestion: can we delete persistent_storage_enabled setting and make * (or some other character sequence) a special value for storage setting which finds the only available storage extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current state was an attempt to keep the config backwards-compatible, as this whole change was originally just about enabling the queue in the build by default. If we're going to break it, then I agree with @bogdandrutu - just make the user specify the storage explicitly and be done with it, looks like the best option in the long term.

If we do that though, I think the changelog should mention the change explicitly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not call this a breaking change since this was an experimental feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not technically a breaking change, but it's nonetheless a feature being used in the real world, and I think it'd be polite to call it out in the changelog to make it easier for users to figure out what broke their config after an upgrade.

Either way, I'll make the change in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: Waiting for the change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, sorry for the delay. I've added a changelog entry to breaking changes, even though this technically isn't one.

@swiatekm swiatekm force-pushed the enable-persistent-queue-in-build branch from aa29c87 to beacb9e Compare July 28, 2022 13:36
@@ -12,6 +13,7 @@

### 💡 Enhancements 💡

- Enable Persistent Queue by default (#5711)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit misleading. We don't enable persistent queue by default. We just make the feature available in the default build.

@@ -58,7 +58,7 @@ gomoddownload:

.PHONY: gotest
gotest:
@$(MAKE) for-all-target TARGET="test test-unstable"
@$(MAKE) for-all-target TARGET="test"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed? What if we have other unstable features to test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go with feature gates for unstable features from now on, this is an old unstable feature before the gates.

Comment on lines +120 to +121
requeuingEnabled bool
requestUnmarshaler internal.RequestUnmarshaler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this PR should do one thing: enable the feature. Instead I see modifications of the logic. Can we split the PR into 2 parts:

  1. Change whatever logic needs to change and whatever refactoring is necessary. Keep the persistent queue feature experimental.
  2. Make persistent queue feature part of stable build.
    Otherwise I have a hard time understanding what changes are needed for enabling the feature, which changes are refactorings which don't change the behavior and what changes change the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this became too complicated along the way. I'm going to split it into three PRs:

  1. Make the queue logic consistent between the in-memory queue and the persistent one. This way enabling the persistent queue is just a change to the build.
  2. Enable persistent queue in the build by default.
  3. Configuration changes for setting the storage for the persistent queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first PR: #5764. Would you prefer to close this one right now or wait until all the changes are merged individually? @tigrannajaryan @bogdandrutu ?

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure we are not doing too many things in one PR, and keep PRs as small and focus as possible. For example we agreed on changing the config, let's start with doing that on the experimental feature.

@@ -4,6 +4,7 @@

### 🛑 Breaking changes 🛑

- Require storage explicitly set for persistent queue (#5711)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment, this happens only if you enable the experimental feature xyz...

@@ -58,7 +58,7 @@ gomoddownload:

.PHONY: gotest
gotest:
@$(MAKE) for-all-target TARGET="test test-unstable"
@$(MAKE) for-all-target TARGET="test"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should go with feature gates for unstable features from now on, this is an old unstable feature before the gates.

Comment on lines -46 to +42
[file storage extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/storage/filestorage).
If collector instance is killed while having some items in the persistent queue, on restart the items are being picked and
the exporting is continued.
When persistent queue is enabled, the batches are being buffered to disk using [filestorage] extension. If collector instance is killed while having some items in the persistent queue, on restart the items are being picked and the exporting is continued.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong statement, they are buffered in the storage :) the storage can be s3 or nvram (whatever the extension implements).

Comment on lines +38 to +39
// PersistentQueueStartFunc defines a function that can be used to start the persistent queue
type PersistentQueueStartFunc func(ctx context.Context, client storage.Client)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this change necessary now and not before? if this can be "refactored" as a separate PR, please do so.

@bogdandrutu
Copy link
Member

This needs some rework.

@swiatekm
Copy link
Contributor Author

swiatekm commented Aug 2, 2022

This needs some rework.

I'll just close it and continue submitting the individual changes as separate PRs. After #5764, I'll do the storage configuration change next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants