How to parallelize? #4
I have the same observation, but I think the challenges would include: (2) the tissue masking. Essentially, Vahadane applies dictionary learning to the tissue pixels of each image, so the dimensionality of the actual dictionary input varies among the input images, depending on the tissue region. I would recommend simply caching the stain matrices of all images that may be reused, to avoid recomputation, and/or using a faster approach to obtain the stain concentrations from the OD and stain matrices (e.g., a least-squares solver such as torch.linalg.lstsq) if you have specific time-efficiency needs. An example of using least squares to solve for concentrations is attached here, derived from @cwlkr's code.
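A minimal sketch of the least-squares approach described above, assuming an (H, W, 3) uint8 RGB tile and a (2, 3) stain matrix whose rows are H&E stain vectors in OD space (the reference vectors below are illustrative values, not the library's):

```python
import torch

def get_concentrations(img, stain_matrix):
    """Solve for per-pixel stain concentrations via least squares.

    img: uint8 RGB tensor of shape (H, W, 3)
    stain_matrix: (2, 3) tensor, rows are stain vectors in OD space
    returns: (H*W, 2) concentration matrix
    """
    # convert RGB to optical density; the +1 avoids log(0)
    od = -torch.log((img.reshape(-1, 3).float() + 1.0) / 255.0)
    # solve stain_matrix.T @ C.T = OD.T for the concentrations C
    sol = torch.linalg.lstsq(stain_matrix.T, od.T).solution
    return sol.T

# example reference H&E stain vectors (illustrative values only)
stains = torch.tensor([[0.65, 0.70, 0.29],
                       [0.07, 0.99, 0.11]])
```

This replaces the sparse-coding step for concentrations only; the stain matrix itself would still come from (cached) dictionary learning.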
Hello, unfortunately this is not really feasible, at least not in a straightforward manner. The problem lies in CUDA itself, as far as I understand. In CUDA, a task is accelerated by splitting it into smaller, simple steps that can run in parallel on GPU kernels. CPU parallelization, however, runs the same task for different inputs in parallel. Unfortunately, given how CUDA is constructed, these are not easily intermixable. As far as I know, this is related more to shared-memory issues than to the optimization algorithms themselves. There might be a fix nowadays with torch.multiprocessing, but I lack the time at the moment to investigate this further. If it is a training situation, setting num_workers to 16 or more usually still results in good GPU utilization, as the forward pass + backprop can take longer than the (parallelized) image normalization/augmentation anyway. For this, I have seen that with SPAMS it is better to set numThreads=1 and use a higher num_workers, as creating new threads all the time can be slow.
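A minimal sketch of the num_workers pattern described above, with a hypothetical SlideTileDataset that runs normalization inside __getitem__ so the DataLoader workers parallelize it on the CPU:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SlideTileDataset(Dataset):
    """Hypothetical dataset: normalization runs in __getitem__,
    so with num_workers > 0 it executes in CPU worker processes,
    overlapping with the GPU forward/backward pass."""

    def __init__(self, tiles, normalizer):
        self.tiles = tiles            # list of (H, W, 3) uint8 tensors
        self.normalizer = normalizer  # any callable: tile -> tile

    def __len__(self):
        return len(self.tiles)

    def __getitem__(self, idx):
        return self.normalizer(self.tiles[idx])

tiles = [torch.zeros(32, 32, 3, dtype=torch.uint8) for _ in range(8)]
dataset = SlideTileDataset(tiles, lambda t: t.float() / 255.0)
# in real training: num_workers=16 or more (and numThreads=1 inside a
# SPAMS-based normalizer); num_workers=0 here keeps the example
# self-contained and picklable
loader = DataLoader(dataset, batch_size=4, num_workers=0)
```

With a picklable normalizer (a plain function or class, not a lambda), raising num_workers spreads the normalization across processes without touching CUDA at all.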
Might be able to do it in a multi-GPU scenario where each GPU uses its own process (e.g., dask-cuda, etc.), but that is up to how users create their own workflows rather than something a stain normalization toolkit should resolve.
I'm sorry to bother you, but is it possible to use a graphics card to process multiple slices in parallel at the same time? I use a for loop and it takes too long. An unknown error occurred when I used the `from multiprocessing import Pool` module, so I don't have any other options. Thank you!
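The "unknown error" with multiprocessing.Pool is often the fork/CUDA interaction discussed above: on Linux, forked children inherit the parent's CUDA state. A minimal sketch of the spawn workaround, with normalize_slide as a hypothetical placeholder for the real per-slide work:

```python
import multiprocessing as mp

def normalize_slide(path):
    # placeholder for the actual normalization; any CUDA
    # initialization must happen here, inside the child process,
    # never in the parent before the Pool is created
    return path.upper()

if __name__ == "__main__":
    # "spawn" starts clean child processes; the default "fork" on
    # Linux copies the parent's CUDA context, a common cause of
    # crashes when combining CUDA with multiprocessing.Pool
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        results = pool.map(normalize_slide, ["slide_a", "slide_b"])
    print(results)
```

torch.multiprocessing wraps the same mechanism and defaults to strategies that are safer with CUDA tensors, so it is worth trying as well.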