Merge Split algorithm added #136

Merged: 37 commits, Nov 16, 2022

Commits
b156b45
Merge Split algorithm added
kelcyno Jun 6, 2022
b550e3b
black formatting
Jun 16, 2022
edc5f25
black formatting
Jun 16, 2022
d1f23b4
Update merge_split.py
kelcyno Jul 3, 2022
d3aa8ea
Ran Black formatting on file.
kelcyno Jul 7, 2022
cbc80f8
Allowed optional import of additional packages: geopy, and networkx.
kelcyno Jul 7, 2022
d94e390
Merge branch 'main' into merge_split
kelcyno Aug 12, 2022
6ba5d41
Updated with suggestions from code reviewers, and tobac v1.3.
kelcyno Aug 18, 2022
1c3c04b
Merge branch 'Hotfix' into merge_split
kelcyno Sep 1, 2022
9d05857
Black formatting
kelcyno Sep 7, 2022
a25f984
Black formatting
kelcyno Sep 7, 2022
c25f588
Updates to documentation and merge_split
kelcyno Sep 18, 2022
6327c8d
Formatting
kelcyno Sep 18, 2022
99cb179
Updated utils.py with changes made for 1.4
kelcyno Sep 19, 2022
6391513
Updating Cell numbering.
kelcyno Sep 25, 2022
b9c4d26
None type error in Projection option
kelcyno Sep 25, 2022
c6c8766
Merge branch 'RC_v1.4.0' into merge_split
freemansw1 Sep 27, 2022
fb18413
Deleting the build/lib file that is not needed for the PR
kelcyno Oct 3, 2022
2a9255c
Updating documentation, and merge/split methods.
kelcyno Oct 4, 2022
17339eb
Merge branch 'merge_split' of https://github.com/kelcyno/tobac into m…
kelcyno Oct 4, 2022
53cf5b3
Black formatting
kelcyno Oct 4, 2022
f23b6b3
Testing update
kelcyno Oct 11, 2022
dd88c30
removed redundant text and fixed formatting for RTD page
Oct 25, 2022
51d87b9
solve merge conflict by pulling in latest changes from RC_v1.4.0.
Oct 25, 2022
971b0cb
Merge branch 'RC_v1.4.0' of github.com:tobac-project/tobac into merge…
freemansw1 Oct 31, 2022
9c0e33e
black formatting
freemansw1 Oct 31, 2022
7e62652
added merge_split to init; changed start track number to 0 from -1
freemansw1 Oct 31, 2022
a40b27d
added basic merge test
freemansw1 Oct 31, 2022
fa09e96
removed final pass
freemansw1 Oct 31, 2022
8412ba4
Merge pull request #1 from freemansw1/merge_split_tests
kelcyno Nov 5, 2022
5354ebe
Update dimension names, correct the merged track numbering, etc.
kelcyno Nov 5, 2022
9e722f8
Updated for formatting
kelcyno Nov 5, 2022
019d0f3
Removed unnecessary print statement
kelcyno Nov 5, 2022
ef48aa9
added merge_split to the API reference
freemansw1 Nov 7, 2022
4b11015
Updates with documentation and removing duplicated dxy lines in merge…
kelcyno Nov 7, 2022
d85e19c
Merge branch 'merge_split' of https://github.com/kelcyno/tobac into m…
kelcyno Nov 7, 2022
800fdfe
Renamed the function using the postfix MEST, and corrected a track nu…
kelcyno Nov 8, 2022
7 changes: 7 additions & 0 deletions doc/index.rst
@@ -52,6 +52,13 @@ The project is currently being extended by several contributors to include addit

   linking
   tracking_output

.. toctree::
   :caption: Merge/Split
   :maxdepth: 2

   merge_split
   merge_split_out_vars

.. toctree::
   :caption: API Reference
43 changes: 43 additions & 0 deletions doc/merge_split.rst
@@ -0,0 +1,43 @@
Merge and Split
======================

This submodule is a post-processing step that addresses tracked cells which merge or split.
The first iteration of this module combines cells that merge but receive a new cell id (and are therefore considered a new cell) once merged.
This module uses a minimum Euclidean spanning tree to combine merging cells, hence the function postfix MEST.
This submodule labels merged/split cells with a TRACK number in addition to their CELL number.

Features, cells, and tracks are combined using parent/child nomenclature.
(A quick note on terms: a "feature" is a detected object at a single time step (see :doc:`feature_detection_overview`). A "cell" is a series of features linked together over multiple time steps (see :doc:`linking`). A "track" may be an individual cell or a series of cells which have merged and/or split.)

Overview of the output dataset from merge_split

d : `xarray.core.dataset.Dataset`

xarray dataset of tobac merge/split cells with parent and child designations.

Parent/child variables include:

* cell_parent_track_id: The associated track id for each cell. All cells that have merged or split will have the same parent track id. If a cell never merges/splits, only one cell will have a particular track id.

* feature_parent_cell_id: The associated parent cell id for each feature. All features in a given cell will have the same cell id.

* feature_parent_track_id: The associated parent track id for each feature. This is not the same as the cell id number.

* track_child_cell_count: The total number of child cells belonging to a given track id.

* cell_child_feature_count: The total number of features for each cell.
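
The relationship between these parent/child variables can be illustrated with a small pure-Python sketch. The input lists are made up for illustration, and the helper name `summarize_parent_child` is hypothetical (it is not part of tobac). It only mirrors how the variables relate to one another, assuming `-1` marks untracked features.

```python
from collections import Counter


def summarize_parent_child(feature_parent_cell_id, cell_id, cell_parent_track_id):
    """Derive the remaining parent/child variables from two lookup lists.

    Hypothetical helper for illustration only; not part of tobac.
    """
    # Map each cell id to the track it belongs to.
    cell_to_track = dict(zip(cell_id, cell_parent_track_id))

    # A feature inherits the track of its parent cell; untracked features
    # (cell id -1) get a track id of -1.
    feature_parent_track_id = [
        cell_to_track[c] if c >= 0 else -1 for c in feature_parent_cell_id
    ]

    # Number of features in each cell, in the order of cell_id.
    features_per_cell = Counter(feature_parent_cell_id)
    cell_child_feature_count = [features_per_cell[c] for c in cell_id]

    # Number of cells in each track, ordered by track id.
    cells_per_track = Counter(cell_parent_track_id)
    track_child_cell_count = [cells_per_track[t] for t in sorted(cells_per_track)]

    return feature_parent_track_id, cell_child_feature_count, track_child_cell_count


# Made-up example: cells 1 and 2 share track 0, cell 3 is alone in track 1,
# and the last feature is untracked.
feature_parent_cell_id = [1, 1, 2, 3, 3, 3, -1]
cell_id = [1, 2, 3]
cell_parent_track_id = [0, 0, 1]

f_track, c_count, t_count = summarize_parent_child(
    feature_parent_cell_id, cell_id, cell_parent_track_id
)
print(f_track)  # [0, 0, 0, 1, 1, 1, -1]
print(c_count)  # [2, 1, 3]
print(t_count)  # [2, 1]
```

In this toy case the per-cell feature counts sum to the number of tracked features, and the per-track cell counts sum to the number of cells.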


Example usage:

``d = merge_split_MEST(Track, dxy)``

merge_split outputs an `xarray` dataset with several variables. The variables (named in the `Variable Name` column) are described below with their units. The coordinates and dataset dimensions are ``feature``, ``cell``, and ``track``.

Variables that are common to all feature detection files:

.. csv-table:: tobac Merge_Split Track Output Variables
   :file: ./merge_split_out_vars.csv
   :widths: 3, 35, 3, 3
   :header-rows: 1

9 changes: 9 additions & 0 deletions doc/merge_split_out_vars.csv
@@ -0,0 +1,9 @@
Variable Name,Description,Units,Type
feature,Unique number of the feature; starts from 1 and increments by 1 to the number of features identified in all frames,n/a,int64
cell,Tracked cell number; generally starts from 1. Untracked cell value is -1.,n/a,int64
track,Unique number of the track; starts from 0 and increments by 1 to the number of tracks identified. Untracked cells and features have a track id of -1.,n/a,int64
cell_parent_track_id,"The associated track id for each cell. All cells that have merged or split will have the same parent track id. If a cell never merges/splits, only one cell will have a particular track id.",n/a,int64
feature_parent_cell_id,The associated parent cell id for each feature. All features in a given cell will have the same cell id.,n/a,int64
feature_parent_track_id,The associated parent track id for each feature. This is not the same as the cell id number.,n/a,int64
track_child_cell_count,The number of child cells belonging to a given track id.,n/a,int64
cell_child_feature_count,The number of features for each cell.,n/a,int64
8 changes: 8 additions & 0 deletions doc/tobac.rst
@@ -28,6 +28,14 @@ tobac.feature\_detection module
    :undoc-members:
    :show-inheritance:

tobac.merge_split module
------------------------

.. automodule:: tobac.merge_split
    :members:
    :undoc-members:
    :show-inheritance:

tobac.plotting module
---------------------

1 change: 1 addition & 0 deletions tobac/__init__.py
@@ -74,6 +74,7 @@
from .tracking import linking_trackpy
from .wrapper import maketrack
from .wrapper import tracking_wrapper
from . import merge_split

# Set version number
__version__ = "1.4.0"
225 changes: 225 additions & 0 deletions tobac/merge_split.py
@@ -0,0 +1,225 @@
"""
Tobac merge and split
This submodule is a post-processing step to address tracked cells which merge/split.
The first iteration of this module combines cells which are merging but have received
a new cell id (and are considered a new cell) once merged. In general this submodule will label merged/split cells
with a TRACK number in addition to their CELL number.

"""


def merge_split_MEST(TRACK, dxy, distance=None, frame_len=5):
    """
    Function to postprocess tobac track data for merge/split cells using a minimum Euclidean spanning tree.


    Parameters
    ----------
    TRACK : pandas.core.frame.DataFrame
        Pandas dataframe of tobac Track information

    dxy : float, mandatory
        The x/y grid spacing of the data.
        Should be in meters.


    distance : float, optional
        Distance threshold determining how close two features must be in order to consider merge/splitting.
        Default is 25x the x/y grid spacing of the data, given in dxy.
        The distance should be in units of meters.

    frame_len : float, optional
        Threshold for the maximum number of frames that can separate the end of a cell and the start of a related cell.
        Default is five (5) frames.

    Returns
    -------

    d : xarray.core.dataset.Dataset
        xarray dataset of tobac merge/split cells with parent and child designations.

        Parent/child variables include:
        - cell_parent_track_id: The associated track id for each cell. All cells that have merged or split will have the same parent track id. If a cell never merges/splits, only one cell will have a particular track id.
        - feature_parent_cell_id: The associated parent cell id for each feature. All features in a given cell will have the same cell id. This is the original TRACK cell_id.
        - feature_parent_track_id: The associated parent track id for each feature. This is not the same as the cell id number.
        - track_child_cell_count: The total number of child cells belonging to a given track id.
        - cell_child_feature_count: The total number of features for each cell.


    Example usage:
        d = merge_split_MEST(Track, dxy)
        ds = tobac.utils.standardize_track_dataset(Track, refl_mask)
        both_ds = xr.merge([ds, d],compat ='override')
        both_ds = tobac.utils.compress_all(both_ds)
        both_ds.to_netcdf(os.path.join(savedir,'Track_features_merges.nc'))

    """
    try:
        import networkx as nx
    except ImportError as err:
        # networkx is an optional dependency, but this function cannot run without it.
        raise ImportError(
            "merge_split_MEST requires the optional package networkx."
        ) from err

    import logging
    import numpy as np
    from pandas.core.common import flatten
    import xarray as xr
    from scipy.spatial.distance import cdist

    # Immediately convert pandas dataframe of track information to xarray:
    TRACK = TRACK.to_xarray()
    track_groups = TRACK.groupby("cell")
    first = track_groups.first()
    last = track_groups.last()

    if distance is None:
        distance = dxy * 25.0

    a_names = list()
    b_names = list()
    dist = list()

    # write all sets of points (a and b) as Nx2 arrays
    l = len(last["hdim_2"].values)
    cells = first["cell"].values
    a_xy = np.zeros((l, 2))
    a_xy[:, 0] = last["hdim_2"].values * dxy
    a_xy[:, 1] = last["hdim_1"].values * dxy
    b_xy = np.zeros((l, 2))
    b_xy[:, 0] = first["hdim_2"].values * dxy
    b_xy[:, 1] = first["hdim_1"].values * dxy
    # Use cdist to find distance matrix
    out = cdist(a_xy, b_xy)
    # Find all cells under the distance threshold
    j = np.where(out <= distance)

    # Compile cells meeting the criteria to an array of both the distance and cell ids
    a_names = cells[j[0]]
    b_names = cells[j[1]]
    dist = out[j]

    # This is inputting data to the object which will perform the spanning tree.
    g = nx.Graph()
    for i in np.arange(len(dist)):
        g.add_edge(a_names[i], b_names[i], weight=dist[i])

    tree = nx.minimum_spanning_edges(g)
    tree_list = list(tree)

    new_tree = []

    # Pruning the tree for time limits.
    for i, j in enumerate(tree_list):
        frame_a = np.nanmax(track_groups[j[0]].frame.values)
        frame_b = np.nanmin(track_groups[j[1]].frame.values)
        if np.abs(frame_a - frame_b) <= frame_len:
            new_tree.append(tree_list[i][0:2])
    new_tree_arr = np.array(new_tree)

    TRACK["cell_parent_track_id"] = np.zeros(len(TRACK["cell"].values))
    cell_id = np.unique(
        TRACK.cell.values.astype(int)[~np.isnan(TRACK.cell.values.astype(int))]
    )
    track_id = dict()  # same size as number of total merged tracks

    # Cleaning up tracks, combining tracks which contain the same cells.
    arr = np.array([0])
    for p in cell_id:
        j = np.where(arr == int(p))
        if len(j[0]) > 0:
            continue
        else:
            k = np.where(new_tree_arr == p)
            if len(k[0]) == 0:
                track_id[p] = [p]
                arr = np.append(arr, p)
            else:
                temp1 = list(np.unique(new_tree_arr[k[0]]))
                temp = list(np.unique(new_tree_arr[k[0]]))

                for l in range(len(cell_id)):
                    for i in temp1:
                        k2 = np.where(new_tree_arr == i)
                        temp.append(list(np.unique(new_tree_arr[k2[0]]).squeeze()))
                        temp = list(flatten(temp))
                        temp = list(np.unique(temp))

                    if len(temp1) == len(temp):
                        break
                    temp1 = np.array(temp)

                for i in temp1:
                    k2 = np.where(new_tree_arr == i)
                    temp.append(list(np.unique(new_tree_arr[k2[0]]).squeeze()))

                temp = list(flatten(temp))
                temp = list(np.unique(temp))
                arr = np.append(arr, np.unique(temp))

                track_id[np.nanmax(np.unique(temp))] = list(np.unique(temp))

    cell_id = list(np.unique(TRACK.cell.values.astype(int)))
    logging.debug("found cell ids")

    cell_parent_track_id = np.zeros(len(cell_id))
    cell_parent_track_id[:] = -1

    for i, id in enumerate(track_id, start=0):
        for j in track_id[int(id)]:
            cell_parent_track_id[cell_id.index(j)] = int(i)

    logging.debug("found cell parent track ids")

    track_ids = np.array(np.unique(cell_parent_track_id))
    logging.debug("found track ids")

    feature_parent_cell_id = list(TRACK.cell.values.astype(int))
    logging.debug("found feature parent cell ids")

    # This version includes all the features regardless of whether they are used in cells.
    feature_id = list(TRACK.feature.values.astype(int))
    logging.debug("found feature ids")

    feature_parent_track_id = []
    feature_parent_track_id = np.zeros(len(feature_id))
    for i, id in enumerate(feature_id):
        cellid = feature_parent_cell_id[i]
        if cellid < 0:
            feature_parent_track_id[i] = -1
        else:
            feature_parent_track_id[i] = cell_parent_track_id[cell_id.index(cellid)]

    track_child_cell_count = np.zeros(len(track_id))
    for i, id in enumerate(track_id):
        track_child_cell_count[i] = len(np.where(cell_parent_track_id == i)[0])
    logging.debug("found track child cell count")

    cell_child_feature_count = np.zeros(len(cell_id))
    for i, id in enumerate(cell_id):
        cell_child_feature_count[i] = len(track_groups[id].feature.values)
    logging.debug("found cell child feature count")

    track_dim = "track"
    cell_dim = "cell"
    feature_dim = "feature"

    d = xr.Dataset(
        {
            "track": (track_dim, track_ids),
            "cell": (cell_dim, cell_id),
            "cell_parent_track_id": (cell_dim, cell_parent_track_id),
            "feature": (feature_dim, feature_id),
            "feature_parent_cell_id": (feature_dim, feature_parent_cell_id),
            "feature_parent_track_id": (feature_dim, feature_parent_track_id),
            "track_child_cell_count": (track_dim, track_child_cell_count),
            "cell_child_feature_count": (cell_dim, cell_child_feature_count),
        }
    )

    d = d.set_coords(["feature", "cell", "track"])

    # assert len(cell_id) == len(cell_parent_track_id)
    # assert len(feature_id) == len(feature_parent_cell_id)
    # assert sum(track_child_cell_count) == len(cell_id)
    # assert sum(cell_child_feature_count) == len(feature_id)

    return d
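
The core idea of the function above can be sketched in pure Python: link the end of each cell to nearby cell starts, keep a minimum spanning tree, prune edges by frame gap, and group what remains into tracks. This is an illustrative sketch and not the code from this PR. It swaps in a hand-written Kruskal algorithm with union-find in place of `nx.minimum_spanning_edges`, and it skips self-pairs explicitly. The function name `group_cells_mst`, the tuple layout of `cells`, and the sample values are all assumptions.

```python
import math
from itertools import product


def find(parent, x):
    """Union-find lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_cells_mst(cells, distance, frame_len=5):
    """Group cells into tracks (hypothetical helper, not part of tobac).

    cells maps cell id -> (first_xy, first_frame, last_xy, last_frame),
    with positions already converted to meters.
    """
    # Candidate edges: the end of cell a lies within `distance` of the start of cell b.
    edges = []
    for a, b in product(cells, repeat=2):
        if a == b:
            continue  # a cell is never linked to itself
        d = math.dist(cells[a][2], cells[b][0])
        if d <= distance:
            edges.append((d, a, b))

    # Kruskal: take edges by increasing weight, skipping ones that close a cycle.
    parent = {c: c for c in cells}
    tree = []
    for d, a, b in sorted(edges):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))

    # Prune tree edges whose end/start frames are too far apart in time.
    kept = [
        (a, b) for a, b in tree
        if abs(cells[a][3] - cells[b][1]) <= frame_len
    ]

    # Connected components of the pruned tree become tracks, numbered from 0.
    parent = {c: c for c in cells}
    for a, b in kept:
        parent[find(parent, a)] = find(parent, b)
    roots = {}
    return {c: roots.setdefault(find(parent, c), len(roots)) for c in sorted(cells)}


# Made-up example: cells 1 and 2 both end close to where cell 3 starts one
# frame later; cell 4 is far away from everything.
cells = {
    1: ((0.0, 0.0), 0, (1000.0, 0.0), 4),
    2: ((0.0, 3000.0), 0, (1000.0, 2000.0), 4),
    3: ((1500.0, 1000.0), 5, (4000.0, 1000.0), 9),
    4: ((90000.0, 90000.0), 0, (95000.0, 95000.0), 9),
}
tracks = group_cells_mst(cells, distance=2000.0)
print(tracks)  # {1: 0, 2: 0, 3: 0, 4: 1}
```

With `frame_len=0` the one-frame gap between cells 1/2 and cell 3 exceeds the threshold, so every cell in this toy example ends up in its own track.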