restore-cluster error starting cassandra #822

chrisjmiller1 · 2024-11-26T10:36:02Z

Hi folks,

I'm testing restore-cluster with Medusa 0.22.3.

The restore itself works, but it fails at the last step, which is starting Cassandra.

stdout has the following:
[2024-11-26 10:27:22,776] INFO: Executing "mkdir -p /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f; cd /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f && medusa-wrapper sudo medusa --fqdn=%s -vvv restore-node --in-place %s --no-verify --backup-name backup6 --temp-dir /tmp " on following nodes ['mxiad-tfdevmet01', 'mxiad-tfdevmet02', 'mxiad-tfdevmet03'] with a parallelism/pool size of 3
[2024-11-26 10:28:01,975] ERROR: Job executing "mkdir -p /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f; cd /tmp/medusa-job-66ff7a96-9771-41f5-972a-2cbdaed9086f && medusa-wrapper sudo medusa --fqdn=%s -vvv restore-node --in-place %s --no-verify --backup-name backup6 --temp-dir /tmp " ran and finished with errors on following nodes: ['mxiad-tfdevmet01', 'mxiad-tfdevmet02', 'mxiad-tfdevmet03']
[2024-11-26 10:28:01,976] ERROR: Some nodes failed to restore. Exiting
[2024-11-26 10:28:01,976] ERROR: This error happened during the cluster restore: Some nodes failed to restore. Exiting
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_cluster.py", line 72, in orchestrate
    restore.execute()
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_cluster.py", line 155, in execute
    self._restore_data()
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_cluster.py", line 410, in _restore_data
    raise RuntimeError(err_msg)
RuntimeError: Some nodes failed to restore. Exiting

medusa.log has the following:
[2024-11-26 09:28:49,101] INFO: Starting Cassandra
[2024-11-26 09:28:49,101] DEBUG: Starting Cassandra with ['cassandra']
[2024-11-26 09:28:49,397] DEBUG: Disconnecting from S3...

whereas stderr has the following:
[2024-11-26 09:28:59,102] INFO: Starting Cassandra
[2024-11-26 09:28:59,102] DEBUG: Starting Cassandra with ['/opt/imail1/cassandra/bin/cassandra']
[2024-11-26 09:28:59,411] DEBUG: Disconnecting from S3...
Traceback (most recent call last):
  File "/usr/local/bin/medusa", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 92, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/medusa/medusacli.py", line 275, in restore_node
    medusa.restore_node.restore_node(medusaconfig, Path(temp_dir), backup_name, in_place, keep_auth, seeds,
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_node.py", line 50, in restore_node
    restore_node_locally(config, temp_dir, backup_name, in_place, keep_auth, seeds, storage,
  File "/usr/local/lib/python3.9/site-packages/medusa/restore_node.py", line 137, in restore_node_locally
    cassandra.start_with_implicit_token()
  File "/usr/local/lib/python3.9/site-packages/medusa/cassandra_utils.py", line 650, in start_with_implicit_token
    subprocess.check_output(cmd)
  File "/usr/local/lib64/python3.9/site-packages/gevent/subprocess.py", line 418, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['/opt/imail1/cassandra/bin/cassandra']' returned non-zero exit status 1.
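
For readers following the traceback: the last frame is `check_output`, which raises `CalledProcessError` whenever the child process exits with a non-zero status. A minimal sketch of that behavior follows. It assumes the standard library `subprocess` module rather than the gevent variant seen in the traceback, and uses a stand-in child process that exits with status 1 instead of the real Cassandra launcher.

```python
import subprocess
import sys

# Stand-in for the Cassandra start command (an assumption for illustration):
# a child Python process that simply exits with status 1.
cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]

failed = False
returncode = None
try:
    # check_output returns the child's stdout on success and raises on failure.
    subprocess.check_output(cmd)
except subprocess.CalledProcessError as err:
    failed = True
    returncode = err.returncode
    print(f"Command {err.cmd} returned non-zero exit status {err.returncode}")
```

The exception only carries the exit status and any captured stdout, so the reason the launcher refused to start has to be read from the launcher's own output or the Cassandra logs.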

I believe this happens because Cassandra is being started via sudo. Is there a workaround for this?

Also, is it possible to run the restore in parallel but start the nodes in a rolling fashion?

Thanks,

Chris.

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-107


chrisjmiller1 · 2024-11-29T16:12:23Z

I used sudo -u as a workaround, and it allows Medusa to start Cassandra successfully.
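
The thread does not say where `sudo -u` was placed, so the following is only a sketch of the general idea: prefixing a start command so that sudo runs it as a named non-root user. The helper `as_user` and the user name `cassandra` are hypothetical; the binary path is the one shown in the stderr log above.

```python
import shlex


def as_user(cmd, user):
    """Return cmd prefixed so that sudo runs it as `user` (hypothetical helper)."""
    return ["sudo", "-u", user, *cmd]


# "cassandra" is an assumed service account name, not taken from the thread.
start_cmd = as_user(["/opt/imail1/cassandra/bin/cassandra"], "cassandra")

# Show the command as a single shell-style string.
print(shlex.join(start_cmd))
```

This only builds and prints the command; actually running it requires sudo rights for the target user on each node.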

adejanovski added this to K8ssandra Nov 26, 2024

chrisjmiller1 closed this as completed Nov 29, 2024
