Relabel multiscale connected components #31

Merged
merged 1 commit into from
Sep 20, 2023

gmgunter
Member

This PR updates the `multiscale_unwrap()` function to add a post-processing step that relabels the connected components produced by tiled unwrapping, using the coarse-unwrapped connected components as a reference.

When a large interferogram is unwrapped using a tiled unwrapping approach, each tile is independently assigned connected component labels. This causes two problems when interpreting the resulting connected components:

  • Labels may not be unique across tiles. Two different components in two different tiles may be assigned the same integer label.
  • If a contiguous region of reliable unwrapped phase spans multiple tiles, it may be assigned different labels in each of the different tiles.
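Both problems can be reproduced on a toy mask. The sketch below is not code from this PR: `label_components` is a hypothetical, minimal 4-connected labeler standing in for whatever produces the per-tile labels, and the image is split into just two tiles.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """Label 4-connected regions of True pixels 1, 2, ... (0 = background)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    next_label = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # pixel already belongs to a labeled region
        next_label += 1
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill from the seed pixel
            i, j = queue.popleft()
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                inside = 0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                if inside and mask[ni, nj] and not labels[ni, nj]:
                    labels[ni, nj] = next_label
                    queue.append((ni, nj))
    return labels


mask = np.array(
    [
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
    ],
    dtype=bool,
)

# Labels computed on the full image: two regions, labeled 1 and 2.
full = label_components(mask)

# Labels computed independently on a left tile and a right tile.
tiled = np.hstack([label_components(mask[:, :3]), label_components(mask[:, 3:])])
```

Here `tiled[0, 0]` and `tiled[2, 3]` both equal 1 even though they belong to different regions, while `tiled[2, 2]` and `tiled[2, 3]` differ even though `full` places them in the same region.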

The relabeling step attempts to address these issues by assigning each high-res connected component a new label based on the low-resolution (i.e. coarse-unwrapped) connected component that it overlaps most. Two or more high-res connected components that overlap most with the same low-res connected component are assigned the same final label. High-res connected components that overlap most with different low-res connected components are assigned distinct labels. Each high-res connected component that doesn't overlap any low-res component is assigned a new unique label.
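The rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the implementation in `src/tophu/_label.py`: the function name `relabel_by_max_overlap` is hypothetical, the low-res labels are assumed to have already been resampled onto the high-res grid, and the high-res labels are assumed to have already been made unique across tiles.

```python
import numpy as np


def relabel_by_max_overlap(hires, lowres):
    """Give each high-res label the low-res label it overlaps most.

    Both inputs are integer arrays of the same shape, with 0 meaning "no
    component". High-res components that overlap nothing get fresh labels
    above lowres.max(). Ties go to the smallest low-res label.
    """
    out = np.zeros_like(hires)
    next_new = int(lowres.max()) + 1
    for lab in np.unique(hires[hires > 0]):
        region = hires == lab
        overlapped = lowres[region]
        overlapped = overlapped[overlapped > 0]  # ignore background pixels
        if overlapped.size:
            # Count pixels per low-res label and keep the most frequent one.
            out[region] = np.bincount(overlapped).argmax()
        else:
            out[region] = next_new
            next_new += 1
    return out


lowres = np.array([[1, 1, 1, 1], [0, 0, 0, 0], [2, 2, 0, 0]])
hires = np.array([[1, 1, 2, 2], [0, 0, 0, 0], [3, 3, 0, 4]])
relabeled = relabel_by_max_overlap(hires, lowres)
```

In this example, high-res components 1 and 2 both fall inside low-res component 1 and are merged, component 3 maps to low-res component 2, and component 4 overlaps nothing and receives the new label 3.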

The user can specify a minimum overlap fraction via the `min_conncomp_overlap` parameter. If the intersection between a high-res and a low-res component, as a fraction of the area of the high-res component, is below this threshold, the two aren't considered overlapping for purposes of relabeling.
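As a small worked example of the threshold test, the overlap fraction can be computed as below. The helper name `overlap_fraction` is hypothetical and only follows the definition given in the text (intersection area divided by the area of the high-res component); the actual code may compute it differently.

```python
import numpy as np


def overlap_fraction(hires, lowres, hires_label, lowres_label):
    """Intersection area as a fraction of the high-res component's area."""
    hires_region = hires == hires_label
    area = np.count_nonzero(hires_region)
    if area == 0:
        raise ValueError(f"label {hires_label} not present in hires array")
    intersection = np.count_nonzero(hires_region & (lowres == lowres_label))
    return intersection / area


hires = np.array([[1, 1, 1, 1]])
lowres = np.array([[1, 0, 0, 0]])

frac = overlap_fraction(hires, lowres, hires_label=1, lowres_label=1)

# With a threshold of 0.5 the pair is not treated as overlapping;
# with a threshold of 0.25 it is, since the fraction is not below it.
overlaps_at_half = frac >= 0.5
overlaps_at_quarter = frac >= 0.25
```

One of the four high-res pixels lies inside the low-res component, so the fraction is 0.25.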

The final connected components are assigned sequential positive integer labels [1, 2, ..., N], where N is the total number of unique components.
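One way to produce sequential labels is `numpy.unique` with `return_inverse=True`. The function below is a hedged sketch with a hypothetical name, `make_labels_sequential`, and assumes 0 marks pixels that belong to no component.

```python
import numpy as np


def make_labels_sequential(labels):
    """Renumber nonzero labels to 1..N, keeping 0 as background."""
    out = np.zeros_like(labels)
    nonzero = labels > 0
    # `inverse` holds, for each nonzero pixel, the index of its label in the
    # sorted array of unique labels, so adding 1 yields labels 1..N.
    unique, inverse = np.unique(labels[nonzero], return_inverse=True)
    out[nonzero] = inverse + 1
    return out, unique.size


labels = np.array([[7, 7, 0], [0, 3, 12]])
sequential, n_components = make_labels_sequential(labels)
```

The original labels 3, 7, and 12 become 1, 2, and 3, so the relative ordering of labels is preserved.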

Implementing the relabeling step required some refactoring of `multiscale_unwrap()`. The relabeling process needs to see the full set of connected component labels from all tiles, which requires the Dask task graph to be computed for each tile prior to relabeling. Later on, when we store the final unwrapped phase and connected component labels in their respective output datasets, the task graph would need to be re-computed in order to retrieve the unwrapped phase from each tile. So each tile would get unwrapped twice(!!) -- once during relabeling and once more during the final `dask.array.store()` step.

I've avoided this by writing the intermediate connected component label arrays to temporary binary files prior to relabeling. I don't expect this to have much impact on runtime, since much of the latency of writing to disk should be hidden by parallel processing of different tiles, but it does make the code much messier. I haven't been able to think of a better approach so far, though.
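The idea of spilling per-tile results to disk so that each tile is processed only once can be illustrated without Dask. The sketch below deliberately swaps Dask for `concurrent.futures` and `numpy.memmap`, and every name in it (`unwrap_tile`, `unwrap_all_tiles`, the file names) is hypothetical. It is not the code in this PR.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import numpy as np

calls = []  # records each tile processed, to show nothing is computed twice


def unwrap_tile(tile):
    """Hypothetical stand-in for an expensive per-tile unwrapping step."""
    calls.append(tile.shape)
    unwrapped = tile.astype(np.float32)
    conncomp = (tile > 0).astype(np.uint32)
    return unwrapped, conncomp


def unwrap_all_tiles(image, ncols_per_tile, workdir):
    """Process each tile once, writing both outputs to disk-backed arrays."""
    workdir = Path(workdir)
    unw = np.memmap(workdir / "unw.bin", dtype=np.float32, mode="w+", shape=image.shape)
    cc = np.memmap(workdir / "conncomp.bin", dtype=np.uint32, mode="w+", shape=image.shape)

    def run(start):
        cols = slice(start, start + ncols_per_tile)
        # Tiles write to disjoint column ranges, so no locking is needed.
        unw[:, cols], cc[:, cols] = unwrap_tile(image[:, cols])

    with ThreadPoolExecutor() as pool:
        list(pool.map(run, range(0, image.shape[1], ncols_per_tile)))
    unw.flush()
    cc.flush()
    return unw, cc


with tempfile.TemporaryDirectory() as tmp:
    image = np.arange(12, dtype=np.float32).reshape(2, 6)
    unw, cc = unwrap_all_tiles(image, 2, tmp)
    # Relabeling can now see labels from all tiles without re-running them.
    relabeled = np.array(np.where(cc > 0, 1, 0))  # stand-in for relabeling
    result = np.array(unw)  # copy into memory before the files are removed
    n_calls = len(calls)
    del unw, cc  # release the memory maps before the directory is deleted
```

With six columns split into tiles of two columns, `unwrap_tile` runs three times in total, and both the unwrapped output and the labels remain available afterwards.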

@gmgunter
Member Author

This figure shows an example of the relabeling results. The left plot shows the connected components resulting from coarse unwrapping. The middle plot shows the connected components from tiled unwrapping, using a 3x3 grid of tiles. The right plot shows the final relabeled components.

[Figure: conncomp-labels-v2]

@codecov

codecov bot commented Sep 14, 2023

Codecov Report

Patch coverage: 97.97% and project coverage change: +0.30% 🎉

Comparison is base (1cb3a47) 97.55% compared to head (4fabcbf) 97.86%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
+ Coverage   97.55%   97.86%   +0.30%     
==========================================
  Files           8        9       +1     
  Lines         696      797     +101     
==========================================
+ Hits          679      780     +101     
  Misses         17       17              
| Files Changed | Coverage Δ |
| --- | --- |
| `src/tophu/_label.py` | 96.72% <96.72%> (ø) |
| `src/tophu/_multiscale.py` | 96.36% <96.87%> (+1.20%) ⬆️ |
| `src/tophu/__init__.py` | 100.00% <100.00%> (ø) |
| `src/tophu/_util.py` | 100.00% <100.00%> (ø) |


@gmgunter gmgunter added this to the v0.1 milestone Sep 18, 2023
@gmgunter gmgunter merged commit 62ab5a2 into isce-framework:main Sep 20, 2023
6 checks passed
@gmgunter gmgunter deleted the conncomp-labels-v2 branch September 20, 2023 07:28