Relabel multiscale connected components #31
Merged
This PR updates the `multiscale_unwrap()` function to perform a post-processing step that relabels the connected components resulting from tiled unwrapping using the coarse-unwrapped connected components.
When a large interferogram is unwrapped using a tiled unwrapping approach, each tile is independently assigned connected component labels. This can cause some issues for interpreting the resulting connected components.
The relabeling step attempts to address these issues by assigning each connected component a new label based on the low-resolution (i.e. coarse-unwrapped) connected component that it most overlapped with. Two or more high-res connected components that overlapped with the same low-res connected component will be assigned the same final label. High-res connected components that overlapped with different low-res connected components will be assigned distinct labels. Each high-res connected component that didn't overlap with any low-res component will be assigned a new unique label.
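As a rough illustration of the rule described above (not the code in this PR), the max-overlap assignment could be sketched with NumPy as follows. The function name `relabel_by_max_overlap` is hypothetical, label 0 is assumed to mean "no component", and the high-res labels are assumed to already be unique across tiles.

```python
import numpy as np


def relabel_by_max_overlap(hires, lowres):
    """Relabel high-res components by their most-overlapping low-res component.

    Both inputs are integer label arrays of the same shape, with 0 meaning
    "not part of any component". Assumes (for this sketch only) that the
    high-res labels are already unique across tiles.
    """
    out = np.zeros_like(hires)
    # First label that cannot clash with any low-res label.
    next_label = int(lowres.max()) + 1

    for label in np.unique(hires[hires > 0]):
        mask = hires == label
        # Low-res labels found under this high-res component, ignoring background.
        overlapped = lowres[mask]
        overlapped = overlapped[overlapped > 0]
        if overlapped.size > 0:
            # Take the label of the low-res component with the largest overlap.
            candidates, counts = np.unique(overlapped, return_counts=True)
            out[mask] = candidates[np.argmax(counts)]
        else:
            # No overlap with any low-res component: assign a new unique label.
            out[mask] = next_label
            next_label += 1
    return out
```

Looping over labels keeps the sketch readable; a real implementation would more likely work tile by tile and avoid a full-array mask per component.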
The user can specify a minimum overlap fraction via the `min_conncomp_overlap` parameter. If the intersection between a high-res and low-res component (as a fraction of the area of the high-res component) is below this threshold, then the two won't be considered overlapping for purposes of relabeling.
The final set of connected components is assigned sequential positive integer labels [1, 2, ..., N], where N is the total number of unique components.
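To make the threshold test and the final renumbering concrete, here is a minimal sketch. The helper names `overlaps_enough` and `make_labels_sequential` are hypothetical and not part of this PR; only the `min_conncomp_overlap` parameter name comes from the description, and label 0 is assumed to mean "no component".

```python
import numpy as np


def overlaps_enough(hires_mask, lowres_mask, min_conncomp_overlap):
    """Return True if the intersection covers enough of the high-res component.

    The overlap is measured as a fraction of the high-res component's area.
    """
    hires_area = np.count_nonzero(hires_mask)
    if hires_area == 0:
        return False
    intersection = np.count_nonzero(hires_mask & lowres_mask)
    return intersection / hires_area >= min_conncomp_overlap


def make_labels_sequential(labels):
    """Renumber positive labels to 1..N, leaving 0 (background) unchanged."""
    out = np.zeros_like(labels)
    positive = labels > 0
    # For each labeled pixel, `inverse` is the index of its label in the
    # sorted list of unique labels, so adding 1 yields labels 1..N.
    _, inverse = np.unique(labels[positive], return_inverse=True)
    out[positive] = inverse + 1
    return out
```

For example, labels {5, 8, 12} would be renumbered to {1, 2, 3} in that order.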
Implementing the relabeling step required some refactoring of the implementation of the `multiscale_unwrap()` function. The relabeling process needs to see the full set of connected component labels from all tiles, which requires the Dask task graph to be computed for each tile prior to relabeling. Later on, when we store the final unwrapped phase and connected component labels in their respective output datasets, the task graph would need to be re-computed in order to retrieve the unwrapped phase from each tile. So each tile would get unwrapped twice(!!) -- once during relabeling and once more during the final `dask.array.store()` step.
I've avoided this by writing the intermediate connected component label arrays to temporary binary files prior to relabeling. I don't expect this to have much impact on runtime, since much of the latency of writing to disk should be hidden by parallel processing of different tiles, but it does make the code much messier. I haven't been able to think of a better approach so far, though.
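The idea of spilling intermediate labels to disk can be sketched without Dask, using a plain loop over tiles and `numpy.memmap` as the temporary binary file. This is only an assumption-laden illustration: `unwrap_tile` is a hypothetical stand-in for the real per-tile unwrapping, `unw_out` stands in for the output dataset, and the actual code in this PR uses a Dask task graph rather than a loop.

```python
import tempfile
from pathlib import Path

import numpy as np


def unwrap_tile(tile):
    """Hypothetical stand-in for the expensive per-tile unwrapping step."""
    unwrapped = tile.astype(np.float32)
    conncomp = (tile > 0).astype(np.uint32)
    return unwrapped, conncomp


def unwrap_tiles_once(image, tile_rows, unw_out):
    """Unwrap each tile exactly once.

    The unwrapped phase goes straight to `unw_out`, while the per-tile
    connected component labels are written to a temporary raw binary file
    so that relabeling can read them later without re-running the unwrapping.
    """
    tmpdir = Path(tempfile.mkdtemp(prefix="conncomp-"))
    conncomp_tmp = np.memmap(
        tmpdir / "conncomp.bin", dtype=np.uint32, mode="w+", shape=image.shape
    )
    for start in range(0, image.shape[0], tile_rows):
        rows = slice(start, start + tile_rows)
        unw_out[rows], conncomp_tmp[rows] = unwrap_tile(image[rows])
    conncomp_tmp.flush()  # make sure the labels are on disk before relabeling
    return conncomp_tmp
```

The returned memory-mapped array can then be passed to the relabeling step, and the temporary directory removed once the final labels have been stored.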