# Add support for ndimage.map_coordinates #235
This seems like a … This could be done in two stages: …

---

Maybe I'm missing something, but what makes the Distributed case harder? It seems like this should be separable by chunks of coordinates. Or am I missing something?
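Judging from the later comments, "stage 1" appears to mean coordinates given as a concrete NumPy array (known when the graph is built) and "stage 2" coordinates given as a dask array (values only known at compute time). As a hedged illustration of the separability idea in the question above, here is a minimal sketch that chunks only the coordinates and interpolates each chunk against the full image; all names and values are invented for the example.

```python
# Sketch: split the coordinates (not the image) into chunks and let each
# task call scipy against the full, materialized image. Per-point results
# are independent, so concatenating restores the original order.
import numpy as np
import scipy.ndimage as ndi
import dask

image = np.arange(100.).reshape((10, 10))
coords = np.array([[0.5, 2.0, 7.3],          # row coordinates
                   [1.5, 4.0, 8.1]])         # column coordinates

chunks = np.array_split(coords, 2, axis=1)   # chunk along the points axis
parts = [dask.delayed(ndi.map_coordinates)(image, c) for c in chunks]
result = np.concatenate(dask.compute(*parts))

assert np.allclose(result, ndi.map_coordinates(image, coords))
```

The catch, which the following comments work through, is that this materializes the full image for every task; doing the same against a chunked dask image is the hard part.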
---

Yes, except that this is another case where the output isn't going to be the same shape as the input array, which complicates things. We had planned to have a discussion about this in general; I've sent an email to sort out scheduling for next week.

```python
import numpy as np
import scipy.ndimage as ndi
import dask.array as da

# `inds` is a (2, npoints) coordinate array defined elsewhere (values not shown)
image = da.from_array(np.arange(100.).reshape((10, 10)), chunks=5)

ndi.map_coordinates(np.array(image), inds)
# array([ 3.75660377, 24. ])   correct output from scipy

image.map_blocks(ndi.map_coordinates, coordinates=inds, drop_axis=[0, 1]).compute()
# array([ 3.75660377, 24. ])   looks ok, but won't be robust to edge effects
#                              between array chunks

image.map_overlap(ndi.map_coordinates, depth=3, coordinates=inds, drop_axis=[0, 1]).compute()
# array([18.36856823,  1. ])   nope!
```
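A hedged illustration of why the `map_overlap` call above goes wrong: every block receives the same global `inds` but interprets them in its own local, overlap-padded frame, so the coordinates would need to be shifted by each block's global offset. The snippet below mimics by hand what one block sees; `order=1` is used to avoid spline-prefilter edge effects, and the coordinate values are invented, not the thread's `inds`.

```python
import numpy as np
import scipy.ndimage as ndi

image = np.arange(100.).reshape((10, 10))
point = np.array([[6.2], [7.4]])      # one point, in *global* coordinates
depth = 3

# The lower-right 5x5 chunk padded by `depth` on its interior sides,
# roughly what map_overlap hands to the mapped function.
block = image[5 - depth:, 5 - depth:]
offset = 5 - depth                    # global index of block[0, 0]

local = point - offset                # shift into the block's local frame
print(ndi.map_coordinates(block, local, order=1))  # [69.4]
print(ndi.map_coordinates(image, point, order=1))  # [69.4] -- same answer
```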
---

I don't think you're missing anything; it seems like it should be achievable.
---

I think we need to add the overlap depth to the indices we're looking up, but I can't quite wrap my head around what boundary conditions we need.
---

E.g.:

```python
depth_value = 3
image.map_overlap(ndi.map_coordinates, depth=depth_value,
                  coordinates=inds + depth_value, boundary=0.0,
                  drop_axis=[0, 1]).compute()
# array([ 5.09632784, 24. ])   still not correct
```
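The constant shift `inds + depth_value` can only be right for the block at the origin; in general each block needs its own offset, and each point should be answered by exactly one block. Below is a sketch of that per-block bookkeeping for stage 1. It is illustrative, not the thread's implementation, and it ignores points whose interpolation support straddles a chunk boundary, which is exactly where overlap would still be needed.

```python
import numpy as np
import scipy.ndimage as ndi
import dask
import dask.array as da

def interp_block(block, starts, coords):
    # Answer only the points whose (order=1) support lies in this block,
    # shifted into the block's local frame; contribute zeros elsewhere.
    starts = np.asarray(starts)[:, None]
    stops = starts + np.array(block.shape)[:, None]
    inside = np.all((coords >= starts) & (coords <= stops - 1), axis=0)
    out = np.zeros(coords.shape[1])
    if inside.any():
        out[inside] = ndi.map_coordinates(block, coords[:, inside] - starts,
                                          order=1)
    return out

image = da.from_array(np.arange(100.).reshape((10, 10)), chunks=5)
coords = np.array([[1.0, 6.2], [2.0, 7.4]])

# One task per block; summing the partial outputs keeps the point order.
starts0 = np.cumsum((0,) + image.chunks[0][:-1])
starts1 = np.cumsum((0,) + image.chunks[1][:-1])
parts = [dask.delayed(interp_block)(image.blocks[i, j], (r, c), coords)
         for i, r in enumerate(starts0) for j, c in enumerate(starts1)]
print(sum(dask.compute(*parts)))  # [12.  69.4]
```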
---

@jakirkham the issue I'm preempting (though it might not be super serious) is that the coordinates I want to extract could be in any chunk, and presented in any order. So I can't just chunk my input; I have to somehow partition it according to the chunk in the raw data that I need to access. Or I guess I could send all chunks of the coordinates to all chunks of the raw data, but that seems wasteful.
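The partitioning described here can at least be expressed cheaply when the coordinates are concrete (stage 1): bin each point into the chunk that owns it using the cumulative chunk boundaries. A small sketch with made-up values:

```python
import numpy as np

chunks = ((5, 5), (5, 5))   # e.g. image.chunks for the 10x10 example above
coords = np.array([[1.0, 6.2, 4.9],
                   [2.0, 7.4, 9.0]])

# Column k gives the (row, col) chunk index owning point k.
block_id = np.stack([np.searchsorted(np.cumsum(c), coords[ax], side='right')
                     for ax, c in enumerate(chunks)])
print(block_id)
# [[0 1 0]
#  [0 1 1]]
```

Grouping the points by `block_id` then says exactly which image chunk each coordinate chunk has to be shipped to, avoiding the all-to-all send.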
---

I'd also think that "stage 2" is not easy to solve optimally. For the case of stage 1 (which should cover most use cases for now): the problem I'd see with directly applying … However, applying … Alternatively, for every coordinate the required input slice could be determined (see …).
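On that last point, determining the required input slice per coordinate: for a spline of odd order n, `map_coordinates` reads roughly the n + 1 samples surrounding the point along each axis, so the slice can be computed directly. A sketch under that assumption:

```python
import numpy as np

def required_slices(point, order=3):
    # A spline of odd order n interpolates from the n + 1 samples around
    # the point along each axis: floor(x) - (n-1)/2 .. floor(x) + (n+1)/2.
    lo = np.floor(point).astype(int) - (order - 1) // 2
    return tuple(slice(l, l + order + 1) for l in lo)

print(required_slices(np.array([6.2, 7.4])))  # (slice(5, 9), slice(6, 10))
```

Strictly speaking, for order > 1 scipy also spline-prefilters the whole input, so a purely local slice is only approximately sufficient; padding it by a few extra pixels makes the discrepancy negligible in practice.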
---

Yeah, I was thinking that when I wrote this last night. This sounds kind of like a shuffle, but I have been spending a fair bit of time on that use case, so I may just be seeing use cases that are not there.
---

True, seems like a similar use case! Probably a not-too-bad approximation to the "stage 2" case would be to take a more or less optimized "stage 1" implementation as described above and apply a …
---

Thinking again, this would probably not be quite as straightforward: if the coordinates are given as a dask array, the coordinates themselves, and how they map onto the input array, wouldn't be known at schedule time. Maybe launching tasks from within tasks could be useful here.
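One shape that "launching tasks from within tasks" could take, following the `worker_client` pattern from the Dask distributed docs, is a task that receives the now-computed coordinates and submits per-chunk interpolation work itself. A hedged sketch with invented values (and with the image left unchunked, to keep it short):

```python
import numpy as np
import scipy.ndimage as ndi
from dask.distributed import Client, worker_client

def interp_when_known(image, coords):
    # Runs as a task; `coords` is a concrete array by the time it arrives,
    # so we can decide here how to split the work.
    with worker_client() as client:
        futures = [client.submit(ndi.map_coordinates, image, c, order=1)
                   for c in np.array_split(coords, 2, axis=1)]
        return np.concatenate(client.gather(futures))

if __name__ == '__main__':
    client = Client(processes=False)
    image = np.arange(100.).reshape((10, 10))
    coords = client.submit(np.array, [[1.0, 6.2], [2.0, 7.4]])
    print(client.submit(interp_when_known, image, coords).result())
    # [12.  69.4]
```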
---

Yeah, that's why I'm thinking of shuffle, as it also makes decisions based on values that are not known until computed. People have been spending a lot of time optimizing that problem, so we could benefit from that work very easily if we are able to articulate our problem in those terms.
---

@jakirkham Do you have a link to some more details about the shuffle problem and how existing solutions to it can be used? In the meantime, #237 suggests an implementation for use case 1.