restore-cluster error starting cassandra #822

Closed
chrisjmiller1 opened this issue Nov 26, 2024 · 1 comment

chrisjmiller1 commented Nov 26, 2024

Hi folks,

I'm testing restore-cluster with Medusa 0.22.3.

The restore itself works, but it fails at the last step, which is starting Cassandra.

stdout has the following:
[2024-11-26 10:27:22,776] INFO: Executing "mkdir -p /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f; cd /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f && medusa-wrapper sudo medusa --fqdn=%s -vvv restore-node --in-place %s --no-verify --backup-name backup6 --temp-dir /tmp " on following nodes ['mxiad-tfdevmet01', 'mxiad-tfdevmet02', 'mxiad-tfdevmet03'] with a parallelism/pool size of 3
[2024-11-26 10:28:01,975] ERROR: Job executing "mkdir -p /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f; cd /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f && medusa-wrapper sudo medusa --fqdn=%s -vvv restore-node --in-place %s --no-verify --backup-name backup6 --temp-dir /tmp " ran and finished with errors on following nodes: ['mxiad-tfdevmet01', 'mxiad-tfdevmet02', 'mxiad-tfdevmet03']
[2024-11-26 10:28:01,976] ERROR: Some nodes failed to restore. Exiting
[2024-11-26 10:28:01,976] ERROR: This error happened during the cluster restore: Some nodes failed to restore. Exiting
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_cluster.py", line 72, in orchestrate
    restore.execute()
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_cluster.py", line 155, in execute
    self._restore_data()
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_cluster.py", line 410, in _restore_data
    raise RuntimeError(err_msg)
RuntimeError: Some nodes failed to restore. Exiting

medusa.log has the following:
[2024-11-26 09:28:49,101] INFO: Starting Cassandra
[2024-11-26 09:28:49,101] DEBUG: Starting Cassandra with ['cassandra']
[2024-11-26 09:28:49,397] DEBUG: Disconnecting from S3...

whereas stderr has the following:
[2024-11-26 09:28:59,102] INFO: Starting Cassandra
[2024-11-26 09:28:59,102] DEBUG: Starting Cassandra with ['/opt/imail1/cassandra/bin/cassandra']
[2024-11-26 09:28:59,411] DEBUG: Disconnecting from S3...
Traceback (most recent call last):
  File "/usr/local/bin/medusa", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/medusa/medusacli.py", line 275, in restore_node
    medusa.restore_node.restore_node(medusaconfig, Path(temp_dir), backup_name, in_place, keep_auth, seeds,
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_node.py", line 50, in restore_node
    restore_node_locally(config, temp_dir, backup_name, in_place, keep_auth, seeds, storage,
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_node.py", line 137, in restore_node_locally
    cassandra.start_with_implicit_token()
  File "/usr/local/lib/python3.9/site-packages/medusa/cassandra_utils.py", line 650, in start_with_implicit_token
    subprocess.check_output(cmd)
  File "/usr/local/lib64/python3.9/site-packages/gevent/subprocess.py", line 418, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['/opt/imail1/cassandra/bin/cassandra']' returned non-zero exit status 1.
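
The `CalledProcessError` above reports only the exit status, not why the start command refused to run. For anyone debugging something similar by hand on a node, here is a minimal sketch (not Medusa's code) that runs a command the same way but folds the child's stderr into the captured output, so the launcher's own message is visible. The helper name `run_and_capture` and the stand-in failing command are made up for illustration.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (exit_code, combined stdout+stderr text)."""
    try:
        # stderr=STDOUT merges the child's error stream into the captured output.
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT, text=True)
        return 0, out
    except subprocess.CalledProcessError as exc:
        # exc.output holds whatever the child printed before exiting non-zero.
        return exc.returncode, exc.output


if __name__ == "__main__":
    # Stand-in for the failing start command (hypothetical); swap in the real
    # command list from the log when debugging on a node.
    failing = [
        sys.executable,
        "-c",
        "import sys; print('refusing to start', file=sys.stderr); sys.exit(1)",
    ]
    code, output = run_and_capture(failing)
    print(f"exit status: {code}")
    print(f"child output: {output.strip()}")
```

Running the real start command through something like this, under the same user and environment Medusa uses, should show the message that the traceback does not include.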

I believe this is because Cassandra is being started using sudo. Is there a workaround for this?

Also, is it possible to run the restore in parallel but perform the startup in a rolling fashion?
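
To make the second question concrete, the pattern being asked about looks roughly like the sketch below: restore data on all nodes concurrently, then start the nodes one at a time, waiting for each to come up before moving on. This only illustrates the orchestration idea and is not a description of what Medusa does or supports. `restore_fn`, `start_fn` and `wait_fn` are hypothetical callables the caller would supply.

```python
from concurrent.futures import ThreadPoolExecutor


def restore_then_rolling_start(nodes, restore_fn, start_fn, wait_fn, pool_size=3):
    """Restore all nodes in parallel, then start them one at a time."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # list() forces every restore to finish and re-raises the first failure,
        # so no node is started unless all restores succeeded.
        list(pool.map(restore_fn, nodes))

    started = []
    for node in nodes:
        start_fn(node)
        wait_fn(node)  # block until this node is up before touching the next
        started.append(node)
    return started


if __name__ == "__main__":
    events = []
    hosts = ["mxiad-tfdevmet01", "mxiad-tfdevmet02", "mxiad-tfdevmet03"]
    restore_then_rolling_start(
        hosts,
        restore_fn=lambda n: events.append(("restore", n)),
        start_fn=lambda n: events.append(("start", n)),
        wait_fn=lambda n: events.append(("up", n)),
    )
    for event in events:
        print(event)
```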

Thanks,

Chris.

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-107

@chrisjmiller1
Author

I used sudo -u as a workaround, and it allows Medusa to start Cassandra successfully.
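
The comment does not say where `sudo -u` was applied or which user was used. As a rough sketch of the idea, the helper below prefixes the start command with `sudo -u <user>` only when the caller is running as root. Cassandra's launcher script normally refuses to start as root unless forced, which would be consistent with the exit status 1 above, though the thread does not confirm that as the cause. The account name `cassandra` and the function `build_start_cmd` are assumptions for illustration, not Medusa settings.

```python
import os
import shlex


def build_start_cmd(cassandra_bin, run_as=None, euid=None):
    """Return the argv list for starting Cassandra, dropping root if needed."""
    # euid can be injected for testing; by default use the real effective uid.
    euid = os.geteuid() if euid is None else euid
    cmd = [cassandra_bin]
    if run_as and euid == 0:
        # Running as root: hand off to the (assumed) service account instead.
        cmd = ["sudo", "-u", run_as] + cmd
    return cmd


if __name__ == "__main__":
    # "cassandra" as the service account is an assumption; the binary path is
    # the one shown in the stderr log.
    argv = build_start_cmd(
        "/opt/imail1/cassandra/bin/cassandra", run_as="cassandra", euid=0
    )
    print(shlex.join(argv))
```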
