The Slurm scheduler sets --ncpus-per-task to a float #4530

Closed
pfebrer opened this issue Oct 29, 2020 · 1 comment · Fixed by #4555

pfebrer commented Oct 29, 2020

Describe the bug

When submitting a job that defines num_cores_per_machine, the scheduler fails to submit it because --ncpus-per-task is set to a float, which SLURM doesn't understand. The workchain is then paused forever because it never succeeds in submitting the job.

Steps to reproduce

Using the SLURM scheduler on a computer with mpiprocs_per_machine defined:

submit(CalculationClass, options={"resources": {"num_machines": x, "num_cores_per_machine": y}})
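
For concreteness, a fuller version of the call could look like the sketch below; the calculation plugin, code label and resource numbers are illustrative placeholders (not from the actual setup), and a configured AiiDA profile with a SLURM computer is assumed:

from aiida.engine import submit
from aiida.orm import Int, load_code
from aiida.plugins import CalculationFactory

# Placeholder plugin and code label; any CalcJob on a SLURM computer should show the same behaviour.
CalculationClass = CalculationFactory('arithmetic.add')

builder = CalculationClass.get_builder()
builder.code = load_code('add@my-slurm-cluster')
builder.x = Int(1)
builder.y = Int(2)
builder.metadata.options.resources = {
    'num_machines': 1,
    'num_cores_per_machine': 8,  # triggers the num_cores_per_mpiproc division in the scheduler plugin
}
submit(builder)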

Your environment

  • Operating system: Linux
  • Python version: 3.6.9
  • aiida-core version: 1.4.2

Other relevant software versions, e.g. Postgres & RabbitMQ

Additional context

I think it's related to this part of the code:

# In this plugin we never used num_cores_per_machine so if it is not defined it is OK.
resources.num_cores_per_mpiproc = (resources.num_cores_per_machine / resources.num_mpiprocs_per_machine)
if isinstance(resources.num_cores_per_mpiproc, int):
    raise ValueError(
        '`num_cores_per_machine` must be equal to `num_cores_per_mpiproc * num_mpiprocs_per_machine` and in'
        ' particular it should be a multiple of `num_cores_per_mpiproc` and/or `num_mpiprocs_per_machine`'
    )

As I understand it, this division should return an integer and the check should reject the case where it doesn't, but the opposite check is performed: in Python 3, true division always returns a float, so isinstance(..., int) is never true, the error is never raised, and the float ends up in the submit script. I don't know, maybe I'm wrong.
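
For what it's worth, a minimal sketch of what the intended check could look like (just an illustration of the divisibility requirement, not necessarily the fix that gets applied in #4555):

# Sketch only: true division in Python 3 always yields a float, so check
# divisibility explicitly and keep an integer value for the submit script.
num_cores_per_mpiproc, remainder = divmod(
    resources.num_cores_per_machine, resources.num_mpiprocs_per_machine
)
if remainder != 0:
    raise ValueError(
        '`num_cores_per_machine` must be equal to `num_cores_per_mpiproc * num_mpiprocs_per_machine` and in'
        ' particular it should be a multiple of `num_cores_per_mpiproc` and/or `num_mpiprocs_per_machine`'
    )
resources.num_cores_per_mpiproc = num_cores_per_mpiproc  # an int, as SLURM expects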

@chrisjsewell (Member) commented

Thanks @pfebrer, I will close this by fixing the bug you noted, but feel free to re-open if you still find an issue afterwards.

sphuber added this to the v1.5.0 milestone Nov 11, 2020