Trouble with training on google colab #220

Closed
Meldoner opened this issue Apr 3, 2023 · 22 comments · Fixed by #236
Labels
bug Something isn't working

Comments

Meldoner (Contributor) commented Apr 3, 2023

Describe the bug
I had trained a model for only 5,600 steps when Colab disconnected me because of its usage limits. When I try to continue training, I get this error at the "Copy configs file" stage:
0% 0/15 [00:00<?, ?it/s]
0% 0/15 [00:00<?, ?it/s]
0% 0/15 [00:00<?, ?it/s]
0% 0/15 [00:00<?, ?it/s]
0% 0/15 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/svc", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/__main__.py", line 590, in pre_hubert
    preprocess_hubert_f0(
  File "/usr/local/lib/python3.9/dist-packages/so_vits_svc_fork/preprocessing/preprocess_hubert_f0.py", line 112, in preprocess_hubert_f0
    Parallel(n_jobs=n_jobs)(
  File "/usr/local/lib/python3.9/dist-packages/joblib/parallel.py", line 1061, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.9/dist-packages/joblib/parallel.py", line 938, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.9/dist-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.

The exit codes of the workers are {SIGKILL(-9)}
/usr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 10 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

To Reproduce

Additional context
Is there anything you can do to help me? Also, how do I continue training? I don't really understand.

Meldoner added the bug label Apr 3, 2023

Meldoner (Contributor, Author) commented Apr 3, 2023

Can someone help?
It happens when I run this:
F0_METHOD = "dio" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]
!svc pre-hubert -fm {F0_METHOD}

34j (Collaborator) commented Apr 3, 2023

Please add -n 2.

Meldoner (Contributor, Author) commented Apr 3, 2023

> please add -n 2

Where?

Meldoner (Contributor, Author) commented Apr 3, 2023

> please add -n 2

And do you know what I have to do to continue training the model? Which cells can I skip?

34j (Collaborator) commented Apr 3, 2023

Last line (add it to the !svc pre-hubert command).
Nothing special is required for resuming.
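Presumably resuming needs nothing special because training picks up from the checkpoints already in the model directory, which in this notebook lives on Google Drive (drive/MyDrive/so-vits-svc-fork/logs/44k) and so survives a Colab disconnect. Below is a minimal sketch of how such a lookup can work, assuming (as in upstream so-vits-svc) that generator checkpoints are named G_<step>.pth; latest_checkpoint is a hypothetical helper, not so-vits-svc-fork's actual API:

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMED naming scheme (hypothetical here): generator checkpoints are saved as G_<step>.pth
_CKPT_RE = re.compile(r"^G_(\d+)\.pth$")


def latest_checkpoint(model_dir: str) -> Optional[Path]:
    """Return the G_<step>.pth file with the highest step number, or None if there is none."""
    best_step, best_path = -1, None
    for path in Path(model_dir).glob("G_*.pth"):
        match = _CKPT_RE.match(path.name)  # skips names like G_latest.pth
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


# Example: after a disconnect at step 5600 this would pick G_5600.pth.
print(latest_checkpoint("drive/MyDrive/so-vits-svc-fork/logs/44k"))
```

Comparing parsed step numbers rather than file names matters: a plain string sort would rank G_800.pth above G_5600.pth.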

Meldoner (Contributor, Author) commented Apr 3, 2023

> nothing special is required for resuming

So I just have to run all the cells again, one by one?

Meldoner (Contributor, Author) commented Apr 3, 2023

The previous error is gone, but now I get a new one when I run this code:
%load_ext tensorboard
%tensorboard --logdir drive/MyDrive/so-vits-svc-fork/logs/44k
!svc train --model-path drive/MyDrive/so-vits-svc-fork/logs/44k

Error:
RuntimeError: The size of tensor a (256) must match the size of tensor b (768) at non-singleton dimension 1

/usr/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Meldoner (Contributor, Author) commented Apr 3, 2023

Do I also need to add -n 2 here?

34j (Collaborator) commented Apr 3, 2023

This is a bug. I'm going to bed now, so in the meantime please use so-vits-svc-fork version 2 or earlier, or add -t so-vits-svc-4.0v1-legacy to svc pre-config.
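The RuntimeError reported earlier in this thread says two tensors disagree in dimension 1 (256 vs. 768), which in a (batch, channels, frames) layout is the feature-channel axis. One plausible reading, consistent with the legacy-template workaround, is that the preprocessed features and the model config expect different feature widths. The sketch below checks for that before training; the config key model.ssl_dim and the (channels, frames) feature layout are assumptions, and check_feature_width is a hypothetical helper, not part of so-vits-svc-fork:

```python
import json
from typing import Sequence


def check_feature_width(config_json: str, feature_shape: Sequence[int]) -> None:
    """Fail early if preprocessed features don't match the model's input width.

    ASSUMPTIONS (hypothetical, for illustration): the expected width is stored
    under config["model"]["ssl_dim"], and each feature array is shaped
    (channels, frames).
    """
    expected = json.loads(config_json)["model"]["ssl_dim"]
    actual = feature_shape[0]
    if actual != expected:
        raise ValueError(
            f"features have {actual} channels but the config expects {expected}; "
            "regenerate the config (svc pre-config) and features (svc pre-hubert) "
            "with matching settings"
        )


# Matching widths pass silently; a 256-vs-768 mismatch is reported up front.
check_feature_width('{"model": {"ssl_dim": 256}}', (256, 400))
try:
    check_feature_width('{"model": {"ssl_dim": 256}}', (768, 400))
except ValueError as err:
    print(err)
```

The point of the check is to surface the mismatch with a readable message at startup instead of deep inside a training step.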

Meldoner (Contributor, Author) commented Apr 3, 2023

Oh, okay. Thanks anyway.

Meldoner (Contributor, Author) commented Apr 3, 2023

Can you tell me whether the "-t so-vits-svc-4.0-v1-legacy" option makes the model's quality worse?

kin0303 commented Apr 4, 2023

> > please add -n 2
>
> where?

I had the same problem too. Put -n 2 at the end of the command, like this:

F0_METHOD = "dio" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]
!svc pre-hubert -fm {F0_METHOD} -n 2

You can run !svc pre-hubert -h to see which parameters are available.
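Why a smaller -n helps: the traceback in the original report shows pre-hubert fanning work out with joblib's Parallel(n_jobs=n_jobs), and the workers were killed with SIGKILL(-9), which the error message attributes to a segmentation fault or excessive memory use. If each worker holds its own copy of the feature-extraction model (an assumption here), peak RAM grows roughly with the number of workers. The arithmetic can be sketched as follows; max_safe_jobs and every memory figure are hypothetical illustrations, not measured values:

```python
def max_safe_jobs(
    available_gb: float,
    per_worker_gb: float,
    cpu_count: int,
    reserve_gb: float = 1.0,
) -> int:
    """Largest worker count whose combined memory should fit in RAM.

    ASSUMPTION: peak usage is roughly n_jobs * per_worker_gb, plus a reserve
    for the notebook itself. Always returns at least 1 (run serially).
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable_gb = available_gb - reserve_gb
    fits = int(usable_gb // per_worker_gb)  # whole workers that fit in the usable RAM
    return max(1, min(fits, cpu_count))  # never more workers than CPU cores


# Illustrative numbers only: 12 GB RAM, ~4 GB per worker, 8 CPU cores.
n = max_safe_jobs(available_gb=12.0, per_worker_gb=4.0, cpu_count=8)
print(f"!svc pre-hubert -fm dio -n {n}")  # prints: !svc pre-hubert -fm dio -n 2
```

Lowering -n trades speed for memory: fewer workers make preprocessing slower but keep the peak lower.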

Meldoner (Contributor, Author) commented Apr 4, 2023

> You can run !svc pre-hubert -h to know what parameters you can use

I get it, but I have another problem; I described it above.

Meldoner (Contributor, Author) commented Apr 4, 2023

@34j, do you know how to fix the bug?

34j (Collaborator) commented Apr 4, 2023

😴

Meldoner (Contributor, Author) commented Apr 4, 2023

> 😴

Please help me when you're back.

AlonDan commented Apr 4, 2023

I get similar bugs in Colab. Is there a new or updated Colab notebook, or a fix I should try?
Thanks in advance 🙏

kin0303 commented Apr 5, 2023

> > You can run !svc pre-hubert -h to know what parameters you can use
>
> I get it, only I have another problem, I wrote about it above

Have you tried this?
!svc pre-hubert -fm {F0_METHOD} -n 2
I had the same problem, but it worked after I added -n 2.

34j (Collaborator) commented Apr 20, 2023

@allcontributors add Meldoner bug


allcontributors (Contributor) commented:

@34j

I've put up a pull request to add @Meldoner! 🎉

4 participants