
Explanations of parameters #45

Closed
zxdawn opened this issue Feb 5, 2021 · 13 comments
Labels
documentation Updates and improvements to documentation and examples

@zxdawn
Collaborator

zxdawn commented Feb 5, 2021

Here are some explanations of the basic parameters. Most are copied from the code.

My questions and unfamiliar parts are in bold.


Feature identification

To identify features, use the tobac.themes.tobac_v1.feature_detection_multithreshold method.

tobac can detect features by checking where the data lie above/below a single threshold or multiple thresholds (recommended).

  • target

    Flag to determine if tracking is targeting minima or maxima in the data. Default is 'maximum'.

  • position_threshold

    It sets the method to determine the position from the region. Default is 'center'.

    Four options are available: center, extreme, weighted_diff or weighted_abs.

    • center

      geometrical center of identified region

    • extreme

      max/min position inside the identified region

    • weighted_diff

      center of identified region, weighted by difference from the threshold

    • weighted_abs

      center of identified region, weighted by absolute values of the field

  • sigma_threshold

    Standard deviation for initial filtering step. Default is 0.5.

    I'm not quite sure about the definition above ...

    Here's my guess: it is the number of pixels used when applying a Gaussian filter. How can 0.5 be used to smooth the data? Shouldn't it be at least 1?

  • n_erosion_threshold

    Number of pixels by which to erode the identified features. Default is 0, which leaves the mask unchanged.

  • n_min_threshold

    Minimum number of identified features. Default is 0.

    If the number of pixels in a masked region is less than n_min_threshold, the region is deleted.

  • min_distance

    Minimum distance between detected features. Default is 0.

    Removes features that are closer together than min_distance, keeping the larger one (higher threshold). If the thresholds are the same, the feature with the larger area is kept.
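
The four position_threshold options listed above can be illustrated on a tiny region. This is a pure-Python sketch, not tobac's implementation: the helper `region_position` and the sample values are hypothetical, and it assumes target='maximum'.

```python
def region_position(pixels, threshold, method="center"):
    """Return (row, col) for a region given as {(row, col): value}.

    Hypothetical helper for illustration only; assumes target='maximum'.
    """
    coords = list(pixels)
    if method == "extreme":
        # position of the largest value inside the region
        return max(coords, key=lambda rc: pixels[rc])
    if method == "center":
        # every pixel counts equally: geometrical center
        weights = {rc: 1.0 for rc in coords}
    elif method == "weighted_diff":
        # weight by the difference from the threshold
        weights = {rc: abs(pixels[rc] - threshold) for rc in coords}
    elif method == "weighted_abs":
        # weight by the absolute value of the field
        weights = {rc: abs(pixels[rc]) for rc in coords}
    else:
        raise ValueError(f"unknown method: {method}")
    total = sum(weights.values())
    row = sum(rc[0] * w for rc, w in weights.items()) / total
    col = sum(rc[1] * w for rc, w in weights.items()) / total
    return (row, col)


# Three pixels in one row, all above a threshold of 10 (made-up values)
region = {(0, 0): 11.0, (0, 1): 12.0, (0, 2): 15.0}
for method in ("center", "extreme", "weighted_diff", "weighted_abs"):
    print(method, region_position(region, threshold=10, method=method))
```

With these values, weighted_diff pulls the position furthest towards the strongest pixel, because the weights 1, 2 and 5 differ much more in relative terms than the absolute values 11, 12 and 15.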

This is an example to track the minimum TBB:

import numpy as np
import tobac

# TBB: brightness temperature field loaded beforehand (not shown here)
parameters_features = {}
parameters_features['target'] = 'minimum'
parameters_features['position_threshold'] = 'weighted_diff'
parameters_features['sigma_threshold'] = 0.5
parameters_features['threshold'] = np.arange(280, 190, -10)  # 280, 270, ..., 200
parameters_features['min_distance'] = 0
parameters_features['n_erosion_threshold'] = 0

dxy = 5000  # grid spacing, unit: m
Features = tobac.themes.tobac_v1.feature_detection_multithreshold(TBB, dxy, **parameters_features)
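
To make the threshold sequence in this example concrete, the stdlib-only sketch below lists the values that np.arange(280,190,-10) produces and finds the deepest threshold each sample value passes when target is 'minimum'. The sample TBB values and the helper `deepest_threshold` are hypothetical, and treating "below the threshold" as a strict `<` is an assumption.

```python
# Same values as np.arange(280, 190, -10), built with range() to avoid a NumPy dependency
thresholds = list(range(280, 190, -10))
print(thresholds)  # [280, 270, 260, 250, 240, 230, 220, 210, 200]


def deepest_threshold(value, thresholds):
    """Return the lowest threshold that `value` lies below, or None.

    Hypothetical helper: for target='minimum', a pixel is assumed to belong
    to a threshold's region when its value is below that threshold.
    """
    passed = [t for t in thresholds if value < t]
    return min(passed) if passed else None


sample_tbb = [285.0, 265.0, 212.0, 195.0]  # made-up brightness temperatures
print([deepest_threshold(v, thresholds) for v in sample_tbb])
```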

Segmentation

  • threshold

    Threshold for the watershedding field to be used for the mask. Default is 3e-3.

    The algorithm fills the area (2-D) or volume (3-D) based on the input field starting from the weighted mean centers until reaching the threshold.

  • target

    Flag to determine if tracking is targeting minima or maxima in the data. Default is 'maximum'.

    This can be different from the target in parameters_features.

  • level

    Levels at which to seed the cells for the watershedding algorithm. Default is None.

    I'm not familiar with iris, so I'm not sure what level means.

  • max_distance

    Maximum distance from a marker allowed to be classified as belonging to that cell. Default is None. Unit is m, the same as dxy.
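
The interplay of threshold, target and max_distance can be sketched in one dimension. This is a simplified region-growing illustration, not tobac's segmentation code: the helper `grow_region_1d` and the sample field are hypothetical, and a real watershed algorithm also divides the area between competing markers, which this single-seed sketch ignores.

```python
def grow_region_1d(field, seed, threshold, dxy, target="minimum", max_distance=None):
    """Return the sorted indices reachable from `seed` without crossing the threshold.

    Hypothetical helper: `dxy` and `max_distance` share the same unit (here m).
    """
    def inside(value):
        # for 'minimum' the region is where the field is below the threshold
        return value < threshold if target == "minimum" else value > threshold

    if not inside(field[seed]):
        return []
    region = [seed]
    for step in (-1, 1):  # grow to the left, then to the right
        i = seed + step
        while 0 <= i < len(field) and inside(field[i]):
            if max_distance is not None and abs(i - seed) * dxy > max_distance:
                break
            region.append(i)
            i += step
    return sorted(region)


tbb = [290, 275, 260, 230, 210, 240, 270, 285, 250]  # made-up values
print(grow_region_1d(tbb, seed=4, threshold=280, dxy=5000))
print(grow_region_1d(tbb, seed=4, threshold=280, dxy=5000, max_distance=5000))
```

Without max_distance the region extends until the field reaches the threshold on either side; with max_distance equal to one grid spacing it is cut back to the seed and its direct neighbours.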

Continue the example:

parameters_segmentation={}
parameters_segmentation['target']='minimum'
parameters_segmentation['method']='watershed'
parameters_segmentation['threshold'] = 280
Mask_TBB, Features_TBB = tobac.themes.tobac_v1.segmentation(Features, TBB, dxy, **parameters_segmentation)

Linking

  • dt

    Time resolution of tracked features. Unit: s

  • d_max

    Maximum search range. Default is None.

  • d_min
    Variations in the shape of the regions used to determine the positions of the features can lead to quasi-instantaneous shifts of the position of the feature by one or two grid cells even for a very high temporal resolution of the input data, potentially jeopardising the tracking procedure. To prevent this, tobac uses an additional minimum radius of the search range.
    Default is None. Unit: m

  • v_max
    Speed at which features are allowed to move. Default is None. Unit: m/s

  • memory
    Number of output timesteps a feature is allowed to vanish for and still be considered tracked. Default is 0.
    Warning: This parameter should be used with caution, as it can lead to erroneous trajectory linking, especially for data with low time resolution.

  • stubs
    Minimum number of timesteps of a tracked cell to be reported. Default is 1.

  • time_cell_min
    Minimum length in time of tracked cell to be reported in minutes. Default is None.

  • extrapolate
    Number of timesteps to extrapolate trajectories. Default is 0.

    This allows for the inclusion of both the initiation of the cell and the decaying later stages in the analysis of the cloud life cycle.

  • method_linking {'random', 'predict'}
    Flag choosing method used for trajectory linking. Default is 'random'.

    predict is useful for cloud and fluid tracking.

  • adaptive_step
    Reduce search range by multiplying it by this factor.

  • adaptive_stop
    If not None, when encountering an oversize subnet, retry by progressively reducing search_range until the subnet is solvable.

    If search_range becomes <= adaptive_stop, give up and raise a SubnetOversizeException. Default is None.

  • cell_number_start
    Cell number for first tracked cell. Default is 1.
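
The arithmetic behind adaptive_step and adaptive_stop can be sketched as follows. The helper `adaptive_search_ranges` is hypothetical: it only lists the search ranges that would be tried according to the description above and never solves a linking problem. The starting value of 2.4 is an arbitrary assumption.

```python
def adaptive_search_ranges(search_range, adaptive_step, adaptive_stop):
    """List the successively reduced search ranges that would be tried.

    Hypothetical helper: multiply by `adaptive_step` each time and stop
    once the search range is no longer above `adaptive_stop`.
    """
    if not 0 < adaptive_step < 1:
        raise ValueError("adaptive_step must be between 0 and 1")
    tried = []
    while search_range > adaptive_stop:
        tried.append(search_range)
        search_range *= adaptive_step
    return tried


ranges = adaptive_search_ranges(search_range=2.4, adaptive_step=0.95, adaptive_stop=0.2)
print(len(ranges), ranges[:3])
```

A step close to 1 shrinks the search range slowly, so many retries are possible before the stop value is reached.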

Example:

dt = 600 # unit: s
parameters_linking={}
parameters_linking['v_max'] = 20  # m/s; 20 m/s * 600 s = 12 km per timestep
parameters_linking['stubs'] = 3  # keep only trajectories that last at least 3 timesteps
parameters_linking['order'] = 1
parameters_linking['extrapolate'] = 0
parameters_linking['memory'] = 0
parameters_linking['adaptive_stop'] = 0.2
parameters_linking['adaptive_step'] = 0.95
parameters_linking['subnetwork_size'] = 100
parameters_linking['d_min'] = 2*dxy
parameters_linking['method_linking']= 'predict'
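
How v_max, dt and d_min combine in this example can be checked with a few lines of arithmetic. The helper `search_range_m` is hypothetical and is not tobac's API: it assumes that d_min acts as a lower bound on the search range, as described for d_min above, and that d_max, if given, takes precedence over v_max.

```python
def search_range_m(dt, v_max=None, d_max=None, d_min=None):
    """Hypothetical helper: search range in metres for one timestep."""
    if d_max is not None:
        search_range = d_max  # assumption: d_max overrides v_max
    elif v_max is not None:
        search_range = v_max * dt
    else:
        raise ValueError("need v_max or d_max")
    if d_min is not None:
        # d_min is a minimum radius for the search range
        search_range = max(search_range, d_min)
    return search_range


dxy = 5000  # unit: m
dt = 600    # unit: s
print(search_range_m(dt, v_max=20, d_min=2 * dxy))  # 12000
print(search_range_m(dt, v_max=5, d_min=2 * dxy))   # 10000
```

With the values from the example, v_max * dt (12 km) already exceeds d_min (10 km), so d_min only matters for slower speed limits.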

Result

It looks good for tracking deep convection using 10-minute TBB data:

[Image: case]

@zxdawn zxdawn added the documentation Updates and improvements to documentation and examples label Feb 5, 2021
@zxdawn
Collaborator Author

zxdawn commented Feb 5, 2021

I suppose we can use the diagrams in @mheikenfeld's GMD paper to explain these parameters, like @w-k-jones has done for the Feature detection section.

@fsenf
Member

fsenf commented Feb 5, 2021

Thanks for your comments!

Explanation of units should be part of the documentation, for sure! This needs to be integrated into the respective rst files.

Gaussian Filter: The filter width just determines how fast the weights for the convolution decay. sigma < 1 is therefore no problem.
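
A quick way to see this is to compute normalised Gaussian weights for neighbouring pixels. The sketch below is stdlib-only and is not the filter tobac calls: the helper `gaussian_weights` is hypothetical, and truncating the kernel at ±2 pixels is an assumption made for brevity.

```python
import math


def gaussian_weights(sigma, radius=2):
    """Normalised 1-D Gaussian weights for pixel offsets -radius..radius."""
    offsets = range(-radius, radius + 1)
    raw = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in offsets]
    total = sum(raw)
    return [w / total for w in raw]


for sigma in (0.5, 1.0):
    print(sigma, [round(w, 4) for w in gaussian_weights(sigma)])
```

For sigma = 0.5 the central pixel keeps about 79% of the weight and each direct neighbour contributes about 11%, so the field is still smoothed, just less strongly than with sigma = 1.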

@deeplycloudy
Contributor

@zxdawn a +1 of thanks from me, as I needed this just now. Also, I would say the docstring for n_min_threshold is misleading, as it refers to features (tracked regions?) and not pixels, and the docstring for n_min is either absent or unclear. What is n_min counting? Is it tracked regions, or something else?

@zxdawn
Collaborator Author

zxdawn commented Jan 5, 2022

@deeplycloudy You're welcome. Glad it's useful ;)

Which n_min do you mean? Could you copy the permalink here?

@deeplycloudy
Contributor

Sorry, I meant min_num, such as in feature_detection_multithreshold_timestep

@zxdawn
Collaborator Author

zxdawn commented Jan 5, 2022

Ha, it seems that's not used at all. The only appearance is here, but that's commented out. It should be the minimum number of features at one timestep.

BTW, n_min_threshold is the num of pixels in each mask of features. See here if I understand correctly.
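
That interpretation can be sketched on a small labelled mask. This is a pure-Python illustration, not tobac's code: the helper `drop_small_regions` is hypothetical, and whether a region with exactly n_min_threshold pixels is kept is an assumption.

```python
from collections import Counter


def drop_small_regions(labels, n_min_threshold):
    """Return a copy of a 2-D label mask with small regions set to 0.

    Hypothetical helper: `labels` is a list of rows, 0 means background,
    and regions with fewer than `n_min_threshold` pixels are removed.
    """
    counts = Counter(v for row in labels for v in row if v != 0)
    keep = {label for label, n in counts.items() if n >= n_min_threshold}
    return [[v if v in keep else 0 for v in row] for row in labels]


labels = [
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [0, 0, 3, 3],
]
print(drop_small_regions(labels, n_min_threshold=2))
```

Region 2 has a single pixel and is removed, while regions 1 (four pixels) and 3 (two pixels) are kept.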

@deeplycloudy
Contributor

Ah, maybe a candidate for removal in v. 2.0, then! Or at least deprecation. And thanks for the clarification on n_min_threshold - I agree with your interpretation.

@zxdawn
Collaborator Author

zxdawn commented Jan 5, 2022

I suppose that if there are n_1 features at t_1 and n_2 features at t_2, and n_min lies between n_1 and n_2, then features may go missing at t_2, so the tracking won't work correctly.

@deeplycloudy
Contributor

Looking at the default heuristic for search_range, the units for v_max are (grid spacing units)/(time step units)

@zxdawn
Collaborator Author

zxdawn commented Jan 5, 2022

@deeplycloudy Yes, you're right. Because we usually use m for grid spacing units and s for time step units, the unit of v_max is m/s. See the get_spacings in utils. If you use other units, it should also work. Feel free to test ;)
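
As a small check of the unit argument, the search range expressed in grid cells comes out the same whichever consistent unit system is used. The helper `search_range_in_cells` is hypothetical, and the formula v_max * dt / dxy is an assumption based on the unit reasoning above, not a quote of tobac's code.

```python
import math


def search_range_in_cells(v_max, dt, dxy):
    """Hypothetical helper: search range in grid cells, v_max * dt / dxy."""
    return v_max * dt / dxy


si = search_range_in_cells(v_max=20.0, dt=600.0, dxy=5000.0)     # m/s, s, m
km_h = search_range_in_cells(v_max=72.0, dt=1.0 / 6.0, dxy=5.0)  # km/h, h, km
print(si, km_h)
print(math.isclose(si, km_h))
```

The only requirement is that v_max, dt and dxy use matching units; mixing, for example, m/s with a grid spacing in km would change the result by a factor of 1000.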

@freemansw1 freemansw1 added this to the Version 2.0 milestone Mar 4, 2022
@freemansw1
Member

I believe this is now being resolved in v1.x with #138 et al. I also think this has been resolved in v2.x, but correct me if I'm wrong. I'm inclined to close this when #138 is merged in, if you're happy with that @zxdawn ?

@zxdawn
Collaborator Author

zxdawn commented Jun 29, 2022

@freemansw1 It's fine to me except for the clarification of the unit. Leave the review on that PR now ;)

@freemansw1
Member

Now that #138 is in, I'm going to close this issue. There is still room for improvement on the docs, but I think we have addressed this specific issue.
