Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Remove early row count and update batch export metrics #22810

Merged
merged 14 commits into from
Jun 13, 2024

Conversation

tomasfarias
Copy link
Contributor

@tomasfarias tomasfarias commented Jun 7, 2024

Problem

Counting rows is quite expensive. Let's not do it.

Changes

  • Do not count rows when starting a batch export, if a run has no rows, query will not return any records.
  • Early return moved to insert_* batch export activities: If we peek into the record generator and we don't get anything, then we return early.
    • Added tests for this.
  • App metrics no longer relies on records_total_count but now queries clickhouse to get a count of rows. This could potentially be very expensive.
    • IMO a metric based on number of events should never exist. We should instead look at count of runs only.

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

Updated a bunch of tests + added new tests that run workflows without events.

Comment on lines 129 to 133
count = fetch_batch_export_run_count(
team_id=run.batch_export.team_id,
data_interval_start=run.data_interval_start,
data_interval_end=run.data_interval_end,
)
Copy link
Contributor

@tiina303 tiina303 Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a metric based on number of events should never exist. We should instead look at count of runs only.

ok, lets instead look at time ranges exported which is easy to do over runs, we can make all the graphs say "exports succeeded" / "exports failed" (because it's the latest run only not all runs) instead of "events sent" / "events failed" in the UI and we can remove this.

),
failures=Sum(
Coalesce("records_total_count", 0), filter=~Q(status=BatchExportRun.Status.COMPLETED), default=0
runs = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this get all the runs, not just the latest for that date range? We only want the latest, if something failed, but was retried, then we'd only want to show success in the sparkgraph and metrics (because a user ultimately doesn't care if it was retried as long as it was exported same as webhooks).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if nothing failed but it was retried (manually, by the user) anyways? I think it makes sense to show duplicates if the user requests duplicates. So, we need all runs.

Copy link
Contributor Author

@tomasfarias tomasfarias Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, no new runs are automatically created in the event of failure, unless manually requested by us or users. Retries are part of a run, not a separate run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 would you mind changing the labels in the UI too with this PR 'events' -> 'runs' in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, latest commit changed this to runs. I'll make the UI change next.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in latest commit, looks like this:
image

@tomasfarias tomasfarias force-pushed the refactor/remove-count-of-rows branch from b35875d to 0c31647 Compare June 12, 2024 11:04
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Copy link
Contributor

github-actions bot commented Jun 12, 2024

Size Change: 0 B

Total Size: 1.06 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 1.06 MB

compressed-size-action

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@tomasfarias tomasfarias merged commit de17595 into master Jun 13, 2024
90 checks passed
@tomasfarias tomasfarias deleted the refactor/remove-count-of-rows branch June 13, 2024 09:14
Copy link

sentry-io bot commented Jun 13, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ClickHouseError: Code: 202. DB::Exception: Received from ch7.posthog.net:9000. DB::Exception: Too many simultaneou... posthog.temporal.common.clickhouse in check_res... View Issue
  • ‼️ ClickHouseError: Code: 202. DB::Exception: Received from ch7.posthog.net:9000. DB::Exception: Too many simultaneou... posthog.temporal.common.clickhouse in check_res... View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants