CUDA initialized before forking #115
Hey, thanks for reporting this issue! Have you tried moving the import of anything |
I concur, it's likely the |
You guys are awesome, it worked :) Looking at this fbpic example, they use |
While Github issues are usually not meant for tech support, I suggest we troubleshoot this as part of this issue, because it is a problem that we need to generally solve. The issue is that each operation is executed completely independently so there is no way to tell each operation what GPU to use. One way we could mitigate that is to assign each process some kind of "task number". This task number could then be stored for instance in an environment variable, read by the operation and used to compute which GPU to run on. In your example, that would look like this:
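The "task number" idea above can be sketched like this. This is a minimal, hypothetical sketch: the environment variable name `SIGNAC_TASK_ID` and the GPU count of 8 are assumptions for illustration, not part of signac-flow's actual API.

```python
import os

# Hypothetical task number assigned by the launcher; the variable name
# SIGNAC_TASK_ID is an assumption, not an existing signac-flow feature.
task_id = int(os.environ.get("SIGNAC_TASK_ID", "0"))

# Number of GPUs on the machine (8 P100s in this example).
num_gpus = 8

# Round-robin mapping from task number to GPU index.
gpu_id = task_id % num_gpus

# Restrict this process to the chosen GPU. This must be set before any
# CUDA-using library (e.g. numba) is imported in the process.
os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
```

With this scheme, each operation process computes its own GPU assignment instead of signac-flow having to know about the hardware.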
Would any of the @glotzerlab/signac-developers want to give it a shot? This would be an alternative solution to the aggregation approach explored by @jglaser.
@joaander Josh, this issue is similar to one that you brought up a while back. I believe that what I'm suggesting here is in line with what you proposed back then?
@csadorf Your proposed solution would provide efficient scheduling provided that (1) the number of parallel tasks is limited to the number of GPUs in the system, and (2) all tasks take exactly the same amount of time. If either requirement is not met, this solution will result in situations where some GPUs go unused at times and/or some GPUs have multiple tasks assigned at times. This may or may not be desirable.
signac is not a resource manager or job scheduler: it is not aware of the hardware on the system, the time it takes to run tasks, or which users are on the system. Such a system (i.e. SLURM in conjunction with the signac-flow submit functionality) would be required to obtain ideal scheduling on a multi-user system.
@berceanu If you are on a single-user workstation, you could consider enabling compute-exclusive mode on your GPUs so the CUDA driver can auto-assign tasks to free GPUs. You would need to limit the amount of parallelism to the number of GPUs in the system.
@joaander I set the compute mode on all 8 GPUs to "E. Process". Now I get this error after the first operation completes on the first GPU:
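For reference, compute-exclusive mode is set with `nvidia-smi` (requires root privileges and an NVIDIA driver); a sketch for one GPU:

```shell
# Set GPU 0 to exclusive-process compute mode ("E. Process" in nvidia-smi
# output). Repeat with -i 1..7 for the other GPUs, or omit -i to apply
# the setting to all GPUs at once.
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Verify the current compute mode of all GPUs.
nvidia-smi -q -d COMPUTE
```

Note that the setting is reset on reboot unless persistence mode is also configured.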
With compute exclusive mode, attempting to acquire a CUDA context will result in an error if there are no free GPUs. Are you sure all 8 GPUs are free? Try a smaller number and see if that works. Check |
Yes, I tried; two are not free, so I reduced it to 6, but I still get the same problem.
@csadorf Does signac-flow reuse processes for multiple tasks? This would explain this behavior. Is there a way to make it launch a new process for each task? With reused processes you would need to clean up and destroy the CUDA context at the end of each task so the GPU is free for the next one. The library you are using would need to provide an API call to destroy the context.
@joaander Whenever possible, yes, because it is much faster for smaller operations to avoid forking. However, it is possible to suppress that behavior by specifying the executable manually, e.g. with |
I documented the work-around here: glotzerlab/signac-docs#27 |
I just noticed a big inconvenience in the above work-around: one has to run |
Yes, this inconvenience is currently addressed as part of PR #114. I hope that we will be able to release this soon. As a work-around until then, you could define a meta-operation manually by simply calling both functions from another function, which is the one you actually submit.
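A minimal sketch of such a meta-operation, using hypothetical operation names (`simulate` and `postprocess` stand in for whatever the real operations are; the `calls` list only records invocations for illustration):

```python
# Record of invocations, purely to make the sketch observable.
calls = []

# Hypothetical operation functions; in the real project these would be
# decorated signac-flow operations acting on a job.
def simulate(job):
    calls.append(("simulate", job))

def postprocess(job):
    calls.append(("postprocess", job))

# The meta-operation simply calls both in order; this is the single
# function you would register and submit instead of the two above.
def simulate_and_postprocess(job):
    simulate(job)
    postprocess(job)

simulate_and_postprocess("job-1234")
```

This keeps both steps in one process, so they run back-to-back without a second `run` invocation.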
@csadorf I think since we declined to merge glotzerlab/signac-docs#27 we have decided that handling GPU scheduling is out of scope for signac-flow. Are you fine with closing this issue? The |
Before we close the issue, I'd be interested to know whether it can be resolved with groups on the user side.
That's reasonable; if there is such a solution, we could at least document that.
Solved via |
Description
I am trying to integrate `fbpic`, a well-known CUDA code (based on Python + Numba) for laser-plasma simulation, with `signac`. The integration repo is signac-driven-fbpic.
I managed to successfully run on a single GPU via `python3 src/project.py run` from inside the `signac` folder, but if I add `--parallel` I get `numba.cuda.cudadrv.error.CudaDriverError: CUDA initialized before forking`.
The goal is to get 8 (independent) copies of `fbpic` (with different input params) running in parallel on the 8 NVIDIA P100 GPUs that are on the same machine.
To reproduce
Clone the `signac-driven-fbpic` repo and follow the install instructions. Then go to the `signac` subfolder, and do
Error output
Relevant numba link.
System configuration