Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pose tracks io #33

Merged
merged 80 commits into from
Sep 18, 2023
Merged

Pose tracks io #33

merged 80 commits into from
Sep 18, 2023

Conversation

niksirbi
Copy link
Member

@niksirbi niksirbi commented Jul 21, 2023

This PR thoroughly overhauls input/output (I/O) functionalities for pose estimations data. First I will give a summary of the added features, and then I will explain some of them a bit more in detail.

Main changes

Where to start

To get a better understanding of the new functionalities, it's best to start by reading the docs, which include:

  • home page
  • getting started guide, which explains how to load and save data and how to work with movement pose datasets
  • examples: currently contains only one example, which loads and plots the Aeon sample data with the 3 mice
  • API reference with an overview of the modules and detailed function/class signatures.

Details

Loading pose estimation data

There are two main user-facing functions for this, in the movement.io.load_poses module

  • from_sleap_file(): this currently loads SLEAP analysis.h5 files, and potentially also .slp files containing predicted instances (though this is experimental). Anyhow, the SLEAP docs themselves encourage using the .h5 files for downstream analysis.
  • from_dlc_file(): DeepLabCut stores pose estimation predictions in a pandas DataFrame, saved either as .h5 or as .csv. This function can load both. First, a pandas DataFrame is loaded from .h5 or .csv and then the from_dlc_df() function is called. The user can also directly call from_dlc_df() (for example if they have already imported the DeepLabCut DataFrame by other means).
    All loading functions return a xarray.Dataset object (see next point).

Saving pose estimation data

Currently, I have only implemented movement.io.save_poses.to_dlc_file() which saves the data to a DeepLabCut-style .h5 or .csv file. Internally it calls the movement.io.save_poses.to_dlc_df() function, which convert the data from xarray.Dataset to pandas.DataFrame before saving it.

Common representation for pose tracking data

Regardless of where data is loaded from, pose tracks are represented in movement a xarray.Dataset, containing two data variables as xarray.DataArrays: pose_tracks and confidence.

Pose tracks are essentially an array with shape (frames, individuals, keypoints, space), which is meant to capture a variety of tracking data. The dimensions are labelled with meaningful coordinates (see the relevant docs section for details).

The confidence array has shape (frames, individuals, keypoints), and holds point-wise prediction scores - i.e. point-wise confidence scores form SLEAP of the "likelihood" score from DeepLabCut. This will probably be useful later for preprocessing.

In the future, if we need to extend movement's xarray.Dataset object with specialised functionalities, the recommended way of doing that is through accessors instead of inheritance (subclassing). I've already started implementing a PosesAccessor object (in movement.io.poses_accessor.py). Currently, it only implements a validate() method, but more methods can be added as needed.

Validation

I wrote some custom validator classes with attrs mainly for validating the files from which the data is loaded or written to.
They are organised into four classes for now:

  • ValidFile: generic validator for file paths
  • ValidHDF5: checks integrity of HDF5 files
  • ValidPosesCSV: checks formatting of DeepLabCut-style .csv files
  • ValidPoseTracks: validates the pose-tracking data itself, ensuring that the shapes of the different dimensions make sense and agree with each other. It also assigns some reasonable default values for missing parameters.

See movement.io.validators.py and the API reference for more info.

TODO before merging

Currently, the documentation workflow is configured to be deployed (also) from this branch, to facilitate the review process. After the review is done, I will revert it to its original configuration - i.e. deploy when a new release is made.

@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2023

Codecov Report

Patch coverage: 98.46% and project coverage change: +1.23% 🎉

Comparison is base (e3ef907) 96.51% compared to head (58cfa33) 97.74%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   96.51%   97.74%   +1.23%     
==========================================
  Files           5        8       +3     
  Lines          86      310     +224     
==========================================
+ Hits           83      303     +220     
- Misses          3        7       +4     
Files Changed Coverage Δ
movement/datasets.py 90.90% <66.66%> (-9.10%) ⬇️
movement/io/load_poses.py 97.64% <97.22%> (-2.36%) ⬇️
movement/io/validators.py 99.17% <99.16%> (-0.83%) ⬇️
movement/__init__.py 77.77% <100.00%> (+6.34%) ⬆️
movement/io/__init__.py 100.00% <100.00%> (ø)
movement/io/poses_accessor.py 100.00% <100.00%> (ø)
movement/io/save_poses.py 100.00% <100.00%> (ø)
movement/logging.py 96.42% <100.00%> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@lochhh lochhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Caught a few typos, left a few comments, questions, and suggestions here and there and some points to note and discuss (for future).

CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
movement/io/validators.py Outdated Show resolved Hide resolved
Comment on lines 306 to 322
@individual_names.validator
def _validate_individual_names(self, attribute, value):
if (value is not None) and (len(value) != self.tracks_array.shape[1]):
log_and_raise_error(
ValueError,
f"Expected {self.tracks_array.shape[1]} `{attribute}`, "
f"but got {len(value)}.",
)

@keypoint_names.validator
def _validate_keypoint_names(self, attribute, value):
if (value is not None) and (len(value) != self.tracks_array.shape[2]):
log_and_raise_error(
ValueError,
f"Expected {self.tracks_array.shape[2]} `{attribute}`, "
f"but got {len(value)}.",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are essentially doing the same things, we could consider writing this as a callable, e.g.,

def _make_list_length_validator(expected_idx: int):
    def _validate_list_length(self, attribute, value):
        """Raise ValueError if the length of the list is not as expected."""
        if (value is not None) and (
            len(value) != self.tracks_array.shape[expected_idx]
        ):
            raise log_and_raise_error(
                ValueError,
                f"Expected {self.tracks_array.shape[expected_idx]} `{attribute}`, "
                f"but got {len(value)}.",
            )

    return _validate_list_length

individual_names: Optional[List[str]] = field(
        default=None,
        converter=converters.optional(_list_of_str),
        validator=_make_list_length_validator(1),
    )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, we have some duplication there. But your proposed solution won't work I think. The _make_list_length_validator() function needs to access self (because of self.tracks_array.shape), so it has to be a class method. Class methods are not available upon attribute initialisation, so we cannot do:

individual_names: Optional[List[str]] = field(
        default=None,
        converter=converters.optional(_list_of_str),
        validator=self._make_list_length_validator(1),
    )

It seems to be an inherent limitation in how attrs internal work. I'll try to see whether I can find an alternative way of avoiding the duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative approach would be to write a generic list length validator:

def _validate_list_length(
    attribute: str, value: Optional[List], expected_length: int
):
    """Raise a ValueError if the list does not have the expected length."""
    if (value is not None) and (len(value) != expected_length):
        raise log_error(
            ValueError,
            f"Expected `{attribute}` to have length {expected_length}, "
            f"but got {len(value)}.",
        )

Then the in-class validators become:

@individual_names.validator
   def _validate_individual_names(self, attribute, value):
       _validate_list_length(attribute, value, self.tracks_array.shape[1])

@keypoint_names.validator
   def _validate_keypoint_names(self, attribute, value):
       _validate_list_length(attribute, value, self.tracks_array.shape[2])

I kind of like this compromise, since it is compatible with attrs and it gives as a re-usable generic list length validator, which we will probably need in other places as well.

Copy link
Collaborator

@lochhh lochhh Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _make_list_length_validator() function does not need to be a class function nor does it need to access self as it returns the callable validator _validate_list_length in validator=_make_list_length_validator(1). This returned _validate_list_length will receive the same (instance, attribute, value) arguments as with the decorator approach. But I do agree that the alternative you suggested is easier to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a neat trick! I will keep this in mind for future use, to get out of similarly tricky situations.
For the case here, I will leave it as is now, in the interest of readability.

movement/io/validators.py Show resolved Hide resolved
movement/io/load_poses.py Outdated Show resolved Hide resolved
tests/test_unit/test_io.py Show resolved Hide resolved
tests/test_unit/test_io.py Outdated Show resolved Hide resolved
tests/test_unit/test_io.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@lochhh lochhh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go 🚀

@sonarcloud
Copy link

sonarcloud bot commented Sep 18, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@niksirbi niksirbi merged commit 609376a into main Sep 18, 2023
27 checks passed
@niksirbi niksirbi deleted the pose-tracks-io branch September 20, 2023 16:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants