Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receive: fix thanos_receive_write_{timeseries,samples} stats #7643

Merged

Conversation

cincinnat
Copy link
Contributor

@cincinnat cincinnat commented Aug 15, 2024

There are two path data can be written to a receiver: through the HTTP or the gRPC endpoint, and thanos_receive_write_{timeseries,samples} only count the number of timeseries/samples received through the HTTP endpoint.

So, there is no risk that a sample will be count twice, once as a remote write and once as a local write. On the other hand, we still need to account for the replication factor, and only count local writes is not enough as there might be no local writes at all (e.g. in RouterOnly mode).

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@cincinnat cincinnat force-pushed the receiver-write-timseries-metric branch from 44490ea to 0aa3bf0 Compare August 15, 2024 13:38
@cincinnat
Copy link
Contributor Author

cincinnat commented Aug 15, 2024

@MichaHoffmann I think your fix for these metrics does not work quite right, since we need to account not for local/remote writes but for the replication factor, as far as I understand it. Could you have a look at this change?

@cincinnat cincinnat force-pushed the receiver-write-timseries-metric branch 4 times, most recently from 89dcf30 to dd1a8d5 Compare August 15, 2024 14:19
@yeya24 yeya24 requested a review from MichaHoffmann August 15, 2024 16:08
MichaHoffmann
MichaHoffmann previously approved these changes Aug 24, 2024
Copy link
Contributor

@MichaHoffmann MichaHoffmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Can you update changelog with the fix please?

@cincinnat
Copy link
Contributor Author

cincinnat commented Aug 29, 2024

Lgtm. Can you update changelog with the fix please?

@MichaHoffmann I've updated the changelog.

Copy link
Contributor

@pedro-stanaka pedro-stanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg

@cincinnat cincinnat force-pushed the receiver-write-timseries-metric branch from 0766b50 to be97ef7 Compare September 2, 2024 07:36
@cincinnat
Copy link
Contributor Author

Fixed a merge conflict in CHANGELOG

saswatamcode
saswatamcode previously approved these changes Sep 2, 2024
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@saswatamcode
Copy link
Member

@cincinnat this also needs a slight doc update, you can do that by running make docs

MichaHoffmann
MichaHoffmann previously approved these changes Sep 2, 2024
@cincinnat cincinnat dismissed stale reviews from MichaHoffmann and saswatamcode via 918b063 September 2, 2024 09:05
@cincinnat cincinnat force-pushed the receiver-write-timseries-metric branch 2 times, most recently from 11cfa82 to 8d4c777 Compare September 2, 2024 09:11
@saswatamcode saswatamcode enabled auto-merge (squash) September 2, 2024 09:18
@cincinnat
Copy link
Contributor Author

cincinnat commented Sep 2, 2024

The test failure seems irrelevant to the fix. It seems like a possible race condition in the test to me:
https://github.com/thanos-io/thanos/blob/main/pkg/receive/multitsdb_test.go#L468

			if test.bucket != nil {
				go func() {
					testutil.Ok(t, syncTSDBs(ctx, m, 10*time.Millisecond))
				}()
			}

There is no guarantee testutil.Ok is called before the test function exits.

This reverts commit 66841fb.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
auto-merge was automatically disabled September 3, 2024 10:21

Head branch was pushed to by a user without write access

@cincinnat cincinnat force-pushed the receiver-write-timseries-metric branch from 8d4c777 to c2f47f1 Compare September 3, 2024 10:21
@cincinnat
Copy link
Contributor Author

cincinnat commented Sep 3, 2024

Just rebased on #7694 to have a green build. Previous build failed due to the flaky test fixed by that PR.

There are two path data can be written to a receiver: through the HTTP
or the gRPC endpoint, and `thanos_receive_write_{timeseries,samples}` only
count the number of timeseries/samples received through the HTTP
endpoint.

So, there is no risk that a sample will be counted twice, once as a
remote write and once as a local write. On the other hand, we still need
to account for the replication factor, and only count local writes is
not enough as there might be no local writes at all (e.g. in RouterOnly
mode).

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
@cincinnat cincinnat force-pushed the receiver-write-timseries-metric branch from c2f47f1 to dbc840a Compare September 3, 2024 10:27
@MichaHoffmann MichaHoffmann merged commit 7465177 into thanos-io:main Sep 3, 2024
22 checks passed
@cincinnat cincinnat deleted the receiver-write-timseries-metric branch September 3, 2024 12:23
jnyi pushed a commit to jnyi/thanos that referenced this pull request Oct 17, 2024
…s-io#7643)

* Revert "Receive: fix stats (thanos-io#7373)"

This reverts commit 66841fb.

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

* Receive: fix `thanos_receive_write_{timeseries,samples}` stats

There are two path data can be written to a receiver: through the HTTP
or the gRPC endpoint, and `thanos_receive_write_{timeseries,samples}` only
count the number of timeseries/samples received through the HTTP
endpoint.

So, there is no risk that a sample will be counted twice, once as a
remote write and once as a local write. On the other hand, we still need
to account for the replication factor, and only count local writes is
not enough as there might be no local writes at all (e.g. in RouterOnly
mode).

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>

---------

Signed-off-by: Mikhail Nozdrachev <mikhail.nozdrachev@aiven.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants