Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: the default clickhouse installation is distributed, only turn on single node when clickhouse is disabled #624

Merged

Conversation

jandersen-plaid
Copy link
Contributor

@jandersen-plaid jandersen-plaid commented May 20, 2022

We went through an upgrade of sentry today and ran into a large bug when running snuba migrations where the migrations failed to run or recognize that any earlier migrations had been run.

After looking into it further, we found that this diff was being applied to our snuba settings:

         },
-         "single_node": False,
+         "single_node": True,
           "cluster_name": "sentry-clickhouse",
          "distributed_cluster_name": "sentry-clickhouse",
        },

This is very bad for us as we are running the default clickhouse installation with the chart which runs in distributed mode not single node mode. Swapping the value back to False in the configmap allowed the migrations to run and the upgrade to complete.

This PR honors the promise made in the values.yaml that the externalClickhouse values will only be used if clickhouse.enabled is set to false.

@jandersen-plaid jandersen-plaid changed the title The default clickhouse installation is distributed, only turn on single node when clickhouse is disabled fix: the default clickhouse installation is distributed, only turn on single node when clickhouse is disabled May 20, 2022
@Mokto Mokto merged commit d923eab into sentry-kubernetes:develop May 24, 2022
@Mokto
Copy link
Contributor

Mokto commented May 24, 2022

Thanks.

@pcornelissen
Copy link

We have been hit by that and I did not notice the single_node setting, as reaction, I just scaled clickhouse down to 1 instance.
What will happen, when we install the new update and scale up again. Will the schema and data then be synced or will we end up in a broken state again?

@jandersen-plaid
Copy link
Contributor Author

We have been hit by that and I did not notice the single_node setting, as reaction, I just scaled clickhouse down to 1 instance. What will happen, when we install the new update and scale up again. Will the schema and data then be synced or will we end up in a broken state again?

For the multi-node setting: I ended up in a broken state on upgrade even after disabling single_node. I think, fundamentally, the snuba migration system does not propagate migrations to multiple nodes properly.

To fix this, I manually went in to each node and altered the migrations table (confirming that the migration had run at each step) to assume that the migration was successful.

For a single node, I think the migrations should run properly, but I have 0 faith that that is actually the case. For those who don't need a highly available setup (and it is debatable if the distributed clickhouse used by Sentry is even highly available considering that a single clickhouse going down tends to send snuba workers into a backoff loop) I would highly recommend sticking with a single node setup.

@jandersen-plaid jandersen-plaid deleted the jha-fix-clickhouse-single-node branch July 21, 2022 16:35
This was referenced Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants