Pose tracks io #33

niksirbi · 2023-07-21T15:37:32Z

This PR thoroughly overhauls the input/output (I/O) functionality for pose estimation data. I first summarise the added features, then explain some of them in more detail.

Main changes

Load pose estimation data from DeepLabCut and SLEAP (closes Import data from pose estimation outputs #2)
Represent pose estimation data in a common data model - an xarray.Dataset (closes Define data model for representing points and trajectories #12)
Export data from xarray.Dataset to DeepLabCut-style DataFrames and save them as '.h5' files (partially addresses Ability to export poses in various formats #13)
Validation of I/O file paths and of the data itself is now performed with attrs classes instead of Pydantic (closes Consider replacing Pydantic with attrs #24)
Added documentation website built with Sphinx (closes Add Sphinx docs #7)
Added a comprehensive contributing guide, which is also rendered in the documentation website (closes Write a comprehensice contributing guide #28)

Where to start

To get a better understanding of the new functionalities, it's best to start by reading the docs, which include:

home page
getting started guide, which explains how to load and save data and how to work with movement pose datasets
examples: currently contains only one example, which loads and plots the Aeon sample data with the 3 mice
API reference with an overview of the modules and detailed function/class signatures.

Details

Loading pose estimation data

There are two main user-facing functions for this, in the movement.io.load_poses module

from_sleap_file(): this currently loads SLEAP analysis.h5 files, and potentially also .slp files containing predicted instances (though this is experimental). Anyhow, the SLEAP docs themselves encourage using the .h5 files for downstream analysis.
from_dlc_file(): DeepLabCut stores pose estimation predictions in a pandas DataFrame, saved either as .h5 or as .csv. This function can load both. First, a pandas DataFrame is loaded from .h5 or .csv and then the from_dlc_df() function is called. The user can also directly call from_dlc_df() (for example if they have already imported the DeepLabCut DataFrame by other means).
All loading functions return an xarray.Dataset object (see "Common representation for pose tracking data" below).
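As a rough illustration of the "load a DataFrame from .h5 or .csv" step described above, here is a minimal sketch that picks a pandas reader based on the file suffix. The helper name load_dlc_style_df and the demo file contents are made up for this example, and movement's own parsing of DeepLabCut files may well differ; the sketch only assumes the usual DeepLabCut CSV layout with three header rows (scorer, bodyparts, coords).

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_dlc_style_df(file_path):
    """Hypothetical helper: read a DeepLabCut-style table by file suffix."""
    path = Path(file_path)
    if path.suffix == ".csv":
        # Three header rows (scorer, bodyparts, coords) plus a frame index.
        return pd.read_csv(path, header=[0, 1, 2], index_col=0)
    if path.suffix == ".h5":
        # Reading HDF5 with pandas needs the optional PyTables dependency.
        return pd.read_hdf(path)
    raise ValueError(f"Unsupported file type: {path.suffix}")


# A tiny made-up file: one keypoint ("snout") tracked over two frames.
csv_text = (
    "scorer,demo,demo,demo\n"
    "bodyparts,snout,snout,snout\n"
    "coords,x,y,likelihood\n"
    "0,1.0,2.0,0.9\n"
    "1,1.5,2.5,0.8\n"
)
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "demo_poses.csv"
    csv_path.write_text(csv_text)
    df = load_dlc_style_df(csv_path)

print(df.shape)
```

In the PR itself, the resulting DataFrame would then be handed to from_dlc_df() to obtain the xarray.Dataset.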

Saving pose estimation data

Currently, I have only implemented movement.io.save_poses.to_dlc_file(), which saves the data to a DeepLabCut-style .h5 or .csv file. Internally it calls the movement.io.save_poses.to_dlc_df() function, which converts the data from xarray.Dataset to pandas.DataFrame before saving it.
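To make the target format concrete, the following sketch builds a single-animal DeepLabCut-style DataFrame from plain NumPy arrays. It is not the to_dlc_df() implementation from this PR: the helper arrays_to_dlc_df, the scorer name and the keypoint names are assumptions chosen for illustration, and only the three-level column layout (scorer, bodyparts, coords) reflects the DeepLabCut convention.

```python
import numpy as np
import pandas as pd


def arrays_to_dlc_df(tracks, scores, keypoint_names, scorer="demo_scorer"):
    """Hypothetical helper: build a single-animal DeepLabCut-style DataFrame.

    tracks: array of shape (n_frames, n_keypoints, 2) holding x and y.
    scores: array of shape (n_frames, n_keypoints) holding likelihoods.
    """
    n_frames = tracks.shape[0]
    # Append the likelihood so each keypoint contributes three columns.
    values = np.concatenate([tracks, scores[..., np.newaxis]], axis=-1)
    columns = pd.MultiIndex.from_product(
        [[scorer], keypoint_names, ["x", "y", "likelihood"]],
        names=["scorer", "bodyparts", "coords"],
    )
    return pd.DataFrame(values.reshape(n_frames, -1), columns=columns)


tracks = np.arange(12, dtype=float).reshape(3, 2, 2)
scores = np.full((3, 2), 0.9)
df = arrays_to_dlc_df(tracks, scores, ["snout", "tail"])
print(df.columns.names)
```

A DataFrame of this shape can then be written with DataFrame.to_csv() or DataFrame.to_hdf() (the latter requires PyTables).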

Common representation for pose tracking data

Regardless of where the data is loaded from, pose tracks are represented in movement as an xarray.Dataset containing two data variables, both xarray.DataArrays: pose_tracks and confidence.

Pose tracks are essentially an array with shape (frames, individuals, keypoints, space), which is meant to capture a variety of tracking data. The dimensions are labelled with meaningful coordinates (see the relevant docs section for details).

The confidence array has shape (frames, individuals, keypoints), and holds point-wise prediction scores - i.e. the point-wise confidence scores from SLEAP or the "likelihood" scores from DeepLabCut. This will probably be useful later for preprocessing.
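The structure described above can be sketched directly with xarray. The dimension and variable names follow this description; the random data and the coordinate labels (individual and keypoint names) are invented for the example, so treat this as an approximation of what the loading functions return rather than their exact output.

```python
import numpy as np
import xarray as xr

n_frames, n_individuals, n_keypoints, n_space = 5, 2, 3, 2
rng = np.random.default_rng(seed=0)

ds = xr.Dataset(
    data_vars={
        "pose_tracks": (
            ("frames", "individuals", "keypoints", "space"),
            rng.random((n_frames, n_individuals, n_keypoints, n_space)),
        ),
        "confidence": (
            ("frames", "individuals", "keypoints"),
            rng.random((n_frames, n_individuals, n_keypoints)),
        ),
    },
    coords={
        "frames": np.arange(n_frames),
        "individuals": ["mouse_0", "mouse_1"],  # made-up labels
        "keypoints": ["snout", "centre", "tail_base"],  # made-up labels
        "space": ["x", "y"],
    },
)

# Label-based selection: x coordinate of one keypoint for one individual.
snout_x = ds.pose_tracks.sel(individuals="mouse_0", keypoints="snout", space="x")

# Mask low-confidence points; confidence broadcasts over the space dimension.
filtered = ds.pose_tracks.where(ds.confidence > 0.5)
```

Keeping the scores in the same Dataset means that a selection or a mask can be expressed with shared labels instead of manual index bookkeeping.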

In the future, if we need to extend movement's xarray.Dataset object with specialised functionalities, the recommended way of doing that is through accessors instead of inheritance (subclassing). I've already started implementing a PosesAccessor object (in movement.io.poses_accessor.py). Currently, it only implements a validate() method, but more methods can be added as needed.
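For readers unfamiliar with xarray accessors, here is a minimal sketch of the mechanism using xarray's register_dataset_accessor decorator. The accessor name poses_demo and the check performed in validate() are placeholders for this example; they are not the PosesAccessor shipped in this PR.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("poses_demo")  # hypothetical accessor name
class PosesDemoAccessor:
    """Attach pose-specific helpers to any xarray.Dataset."""

    required_vars = ("pose_tracks", "confidence")

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def validate(self):
        """Raise ValueError if an expected data variable is missing."""
        missing = [
            name for name in self.required_vars
            if name not in self._obj.data_vars
        ]
        if missing:
            raise ValueError(f"Missing data variables: {missing}")


ds = xr.Dataset(
    {
        "pose_tracks": (
            ("frames", "individuals", "keypoints", "space"),
            np.zeros((4, 1, 2, 2)),
        ),
        "confidence": (
            ("frames", "individuals", "keypoints"),
            np.ones((4, 1, 2)),
        ),
    }
)
ds.poses_demo.validate()  # passes silently for a complete dataset
```

Because the accessor is looked up by name on ordinary Dataset objects, results of xarray operations (which return plain Datasets) keep access to the extra methods, which is the main advantage over subclassing.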

Validation

I wrote some custom validator classes with attrs, mainly for validating the files that data is loaded from or written to.
For now there are four of them:

ValidFile: generic validator for file paths
ValidHDF5: checks integrity of HDF5 files
ValidPosesCSV: checks formatting of DeepLabCut-style .csv files
ValidPoseTracks: validates the pose-tracking data itself, ensuring that the shapes of the different dimensions make sense and agree with each other. It also assigns some reasonable default values for missing parameters.

See movement.io.validators.py and the API reference for more info.

TODO before merging

Currently, the documentation workflow is configured to be deployed (also) from this branch, to facilitate the review process. After the review is done, I will revert it to its original configuration - i.e. deploy when a new release is made.

codecov-commenter · 2023-07-21T17:40:09Z

Codecov Report

Patch coverage: 98.46% and project coverage change: +1.23% 🎉

Comparison is base (e3ef907) 96.51% compared to head (58cfa33) 97.74%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
+ Coverage   96.51%   97.74%   +1.23%     
==========================================
  Files           5        8       +3     
  Lines          86      310     +224     
==========================================
+ Hits           83      303     +220     
- Misses          3        7       +4

Files Changed	Coverage Δ
movement/datasets.py	`90.90% <66.66%> (-9.10%)`	⬇️
movement/io/load_poses.py	`97.64% <97.22%> (-2.36%)`	⬇️
movement/io/validators.py	`99.17% <99.16%> (-0.83%)`	⬇️
movement/__init__.py	`77.77% <100.00%> (+6.34%)`	⬆️
movement/io/__init__.py	`100.00% <100.00%> (ø)`
movement/io/poses_accessor.py	`100.00% <100.00%> (ø)`
movement/io/save_poses.py	`100.00% <100.00%> (ø)`
movement/logging.py	`96.42% <100.00%> (ø)`

lochhh

Great work! Caught a few typos, left a few comments, questions, and suggestions here and there and some points to note and discuss (for future).

lochhh · 2023-09-06T12:55:59Z

movement/io/validators.py

+    @individual_names.validator
+    def _validate_individual_names(self, attribute, value):
+        if (value is not None) and (len(value) != self.tracks_array.shape[1]):
+            log_and_raise_error(
+                ValueError,
+                f"Expected {self.tracks_array.shape[1]} `{attribute}`, "
+                f"but got {len(value)}.",
+            )
+
+    @keypoint_names.validator
+    def _validate_keypoint_names(self, attribute, value):
+        if (value is not None) and (len(value) != self.tracks_array.shape[2]):
+            log_and_raise_error(
+                ValueError,
+                f"Expected {self.tracks_array.shape[2]} `{attribute}`, "
+                f"but got {len(value)}.",
+            )


these are essentially doing the same things, we could consider writing this as a callable, e.g.,

def _make_list_length_validator(expected_idx: int):
    def _validate_list_length(self, attribute, value):
        """Raise ValueError if the length of the list is not as expected."""
        if (value is not None) and (
            len(value) != self.tracks_array.shape[expected_idx]
        ):
            raise log_and_raise_error(
                ValueError,
                f"Expected {self.tracks_array.shape[expected_idx]} `{attribute}`, "
                f"but got {len(value)}.",
            )

    return _validate_list_length


individual_names: Optional[List[str]] = field(
    default=None,
    converter=converters.optional(_list_of_str),
    validator=_make_list_length_validator(1),
)

You are right, we have some duplication there. But your proposed solution won't work I think. The _make_list_length_validator() function needs to access self (because of self.tracks_array.shape), so it has to be a class method. Class methods are not available upon attribute initialisation, so we cannot do:

individual_names: Optional[List[str]] = field(
    default=None,
    converter=converters.optional(_list_of_str),
    validator=self._make_list_length_validator(1),
)

It seems to be an inherent limitation of how the attrs internals work. I'll try to see whether I can find an alternative way of avoiding the duplication.

An alternative approach would be to write a generic list length validator:

def _validate_list_length(
    attribute: str, value: Optional[List], expected_length: int
):
    """Raise a ValueError if the list does not have the expected length."""
    if (value is not None) and (len(value) != expected_length):
        raise log_error(
            ValueError,
            f"Expected `{attribute}` to have length {expected_length}, "
            f"but got {len(value)}.",
        )

Then the in-class validators become:

@individual_names.validator
def _validate_individual_names(self, attribute, value):
    _validate_list_length(attribute, value, self.tracks_array.shape[1])

@keypoint_names.validator
def _validate_keypoint_names(self, attribute, value):
    _validate_list_length(attribute, value, self.tracks_array.shape[2])

I kind of like this compromise, since it is compatible with attrs and it gives us a reusable generic list length validator, which we will probably need in other places as well.

The _make_list_length_validator() function does not need to be a class method, nor does it need access to self: it returns the callable validator _validate_list_length, and that callable is what validator=_make_list_length_validator(1) passes to field(). attrs then calls the returned _validate_list_length with the same (instance, attribute, value) arguments as in the decorator approach. But I do agree that the alternative you suggested is easier to read.

This is a neat trick! I will keep this in mind for future use, to get out of similarly tricky situations.
For the case here, I will leave it as is now, in the interest of readability.

lochhh

Good to go 🚀

sonarcloud · 2023-09-18T14:30:07Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

niksirbi mentioned this pull request Jul 21, 2023

Import sleap poses #25

Closed

niksirbi force-pushed the pose-tracks-io branch from 85bb3a0 to c450005 Compare July 21, 2023 15:39

niksirbi mentioned this pull request Aug 8, 2023

Consider replacing Pydantic with attrs #24

Closed

niksirbi added 26 commits August 8, 2023 13:09

added sleap-io as dependency

855bc15

added function for converting SLEAP poses into DLC-style df

e69862b

added functions for loading SLEAP pose tracks

044cff4

renamed converters module to convert

6b8fdbb

renamed converters module to convert

e8de770

add attrs as dependency

ce28d1d

Implemented PoseTracks class with import functions from SLEAP

5cdd989

modified docstrings in PoseTracks

3e86805

refactored from_sleap() classmethod

6aa9df4

Ensure that PoseTracks class is imported with the io module

633a9eb

added method to import pose tracks from DeepLabCut

1fbc713

transferred functionality of from_dict method to __init__

e7a3ee8

shortened some docstrings

d533989

deleted superceded load_poses module

c7075f1

renamed numpy arrays for pose tracks and and scores to avoid clashiing with xarray.DataArray names

846527c

implemented existing converter function as a PoseTracks.to_dlc_df() method

04ef737

renamed from_dlc and from_sleap methods to from_dlc_file and from_sleap_file

7ddab36

change "frames" dim to "time"

8ea6e70

started adapting unit tests for PoseTracks object

191eed8

added tests for PoseTracks initialisation

2248a6c

removed attrs dependency for now

c8444c8

using pydantic 2.0 or greater

64fe008

make _parse_dlc_csv_to_dataframe a static method

49932db

added test for loading a variety of valid pose files

2d56a11

remove unnecessary variable assignments

d210229

use typing.List in type hints to make py3.8 happy

2d49657

niksirbi added 8 commits August 16, 2023 18:12

some fancier formatting for the contributing guide in docs

64b4c3d

add style to dropdown

f56b740

fixed docstirngs and API reference

cbade14

fixed issue with duplicate source files generated by sphinx-gallery

5c6a837

limit sphinx version to <7.2

95827b4

replaced type with isinstance

72608df

Merge branch 'main' into pose-tracks-io

1bae840

added ValidPoseTracks to API reference

647af1f

niksirbi mentioned this pull request Aug 17, 2023

Ability to save multi-animal pose tracks to single-animal files #39

Closed

niksirbi marked this pull request as ready for review August 17, 2023 16:30

niksirbi requested a review from lochhh August 17, 2023 16:30

niksirbi mentioned this pull request Aug 17, 2023

Ability to export poses in various formats #13

Closed

lochhh requested changes Sep 7, 2023

lochhh reviewed Sep 8, 2023

niksirbi and others added 5 commits September 11, 2023 15:37

Fix typos from code review

0eac658

Co-authored-by: Chang Huan Lo <changhuan.lo@ucl.ac.uk>

added check-manifest as dev dependency

f7c2726

removed duplicate word in manifest

90778a0

edit code examples in getting started guide to avoid key errors

79acf0d

fix fps value in example

636f98f

niksirbi mentioned this pull request Sep 14, 2023

Stop supporting python3.8 #42

Closed

niksirbi added 5 commits September 14, 2023 11:47

renamed find_pose_data to list_pose_data

49f7d33

use assert_allclose from numpy.testing

35c80ff

modify function fo logging errors

9d5af00

improved DLC pose CSV file validator

839ec0f

write resuable list length validator

3164e44

lochhh approved these changes Sep 18, 2023

reset docs deployment workflow

58cfa33

niksirbi merged commit 609376a into main Sep 18, 2023
27 checks passed

niksirbi deleted the pose-tracks-io branch September 20, 2023 16:22
