
Handle skeleton encoding internally #1970

Merged
merged 40 commits on Sep 25, 2024

Conversation

eberrigan
Contributor

@eberrigan eberrigan commented Sep 20, 2024

Description

Looking at our fly_skeleton_legs.json test data:
Our final json_str has keys directed, graph, links, multigraph, nodes.

  • directed is a boolean. (UNCHANGED from input graph)
  • graph is a dict with keys name and num_edges_inserted. (UNCHANGED from input graph)
  • links is a list of dicts with keys edge_insert_idx, key, source, target, type, for each edge.
  • multigraph is a boolean = True. (UNCHANGED from input graph)
  • nodes is a list of dicts with keys id for each node.
| input object to JSON encoder (Python object) | location in graph dict | encoded object as JSON string |
| --- | --- | --- |
| `Node(name='neck', weight=1.0)` | `"links": [{..., "source": Node(name='neck', weight=1.0), ...}]` | `{"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["neck", 1.0]}}` |
| `Node(name='head', weight=1.0)` | `"links": [{..., "target": Node(name='head', weight=1.0), ...}]` | `{"py/object": "sleap.skeleton.Node", "py/state": {"py/tuple": ["head", 1.0]}}` |
| `<EdgeType.BODY: 1>` | `"links": [{..., "type": <EdgeType.BODY: 1>, ...}]` | `{"py/reduce": [{"py/type": "sleap.skeleton.EdgeType"}, {"py/tuple": [1]}, null, null, null]}` |
| `Node(name='head', weight=1.0)` | `"nodes": [{'id': Node(name='head', weight=1.0)}, ...]` | `{"id": {"py/id": 2}}` |
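The py/object / py/state shape in the table can be reproduced with a short stdlib-only sketch; the `Node` class and `encode_node` helper below are illustrative stand-ins, not SLEAP's actual implementation:

```python
import json


class Node:
    """Illustrative stand-in for sleap.skeleton.Node (a name plus a weight)."""

    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight


def encode_node(node):
    """Build the jsonpickle-style payload: fully qualified class name plus
    the object's state as a (name, weight) tuple."""
    return {
        "py/object": "sleap.skeleton.Node",
        "py/state": {"py/tuple": [node.name, node.weight]},
    }


encoded = encode_node(Node("neck"))
json_str = json.dumps(encoded, sort_keys=True)
```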
  • This is because Skeleton._graph is passed through json_graph.node_link_data, which returns a dictionary with node-link formatted data (see the networkx documentation):
```python
import networkx as nx


# Abridged from networkx.readwrite.json_graph.node_link_data; the defaults
# (source="source", target="target", name="id", key="key", link="links")
# are why the output dict uses the key "links".
def node_link_data(G, *, source="source", target="target", name="id",
                   key="key", link="links"):
    multigraph = G.is_multigraph()

    # Allow 'key' to be omitted from attrs if the graph is not a multigraph.
    key = None if not multigraph else key
    if len({source, target, key}) < 3:
        raise nx.NetworkXError("Attribute names are not unique.")
    data = {
        "directed": G.is_directed(),
        "multigraph": multigraph,
        "graph": G.graph,
        "nodes": [{**G.nodes[n], name: n} for n in G],
    }
    if multigraph:
        data[link] = [
            {**d, source: u, target: v, key: k}
            for u, v, k, d in G.edges(keys=True, data=True)
        ]
    else:
        data[link] = [{**d, source: u, target: v} for u, v, d in G.edges(data=True)]
    return data
```
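To see concretely why the output uses the key links (and never link), here is a self-contained mimic of the multigraph branch run on a duck-typed two-node graph. `node_link_mimic` and `FakeMultiGraph` are illustrative stand-ins written for this sketch, not networkx code:

```python
def node_link_mimic(G, source="source", target="target", name="id",
                    key="key", link="links"):
    # Trimmed-down version of the multigraph branch of node_link_data.
    data = {
        "directed": G.is_directed(),
        "multigraph": G.is_multigraph(),
        "graph": G.graph,
        "nodes": [{name: n} for n in G.nodes],
    }
    data[link] = [{source: u, target: v, key: k} for u, v, k in G.edges]
    return data


class FakeMultiGraph:
    """Minimal stand-in exposing only the surface the function touches."""

    graph = {"name": "demo"}
    nodes = ["head", "neck"]
    edges = [("neck", "head", 0)]  # (source, target, key) triples

    def is_directed(self):
        return True

    def is_multigraph(self):
        return True


data = node_link_mimic(FakeMultiGraph())
```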
  • Note that our node-linked data has key links and not link.
    • We do not have networkx pinned. I am not sure why that hasn't been a problem.
  • Then the node-link formatted data is passed to jsonpickle.encode(data).
    • Serializes the SLEAP class Node. Each Node object, which has attributes name and weight, is encoded as a Python object sleap.skeleton.Node. The Node's internal state is represented as a Python tuple containing the name (as a string) and weight (as a numerical value): ["name", weight].
      • In the case where the node is an int, such as when the skeleton has a node_to_idx mapping (sleap/sleap/skeleton.py, lines 1002 to 1006 at 3c7f5af):

        ```python
        jsonpickle.set_encoder_options("simplejson", sort_keys=True, indent=4)
        if node_to_idx is not None:
            indexed_node_graph = nx.relabel_nodes(
                G=self._graph, mapping=node_to_idx
            )  # map nodes to int
        ```

        i.e., when the skeleton is from Labels, the node is serialized as an integer without the name and weight attributes.
    • Serializes the SLEAP class EdgeType. EdgeType inherits from Python's Enum class. The possible edge types are BODY = 1 and SYMMETRY = 2. When serialized, EdgeType is represented using Python's reduce function, storing the type as sleap.skeleton.EdgeType and the value as a Python tuple containing the Enum value.
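The py/reduce form follows from how Python pickles Enum members: __reduce_ex__ returns the class and a one-element tuple holding the member's value. A quick stdlib check (EdgeType is redefined locally here for illustration, standing in for sleap.skeleton.EdgeType):

```python
from enum import Enum


class EdgeType(Enum):
    BODY = 1
    SYMMETRY = 2


# Enum members pickle as (class, (value,)).
cls, args = EdgeType.BODY.__reduce_ex__(2)[:2]

# jsonpickle renders that reduce form as the structure seen in the JSON:
encoded = {
    "py/reduce": [
        {"py/type": "sleap.skeleton.EdgeType"},
        {"py/tuple": list(args)},
        None,
        None,
        None,
    ]
}
```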
  • If the object has been "seen" before, it will not be encoded as the full JSON string but referenced by its py/id, which starts at 1 and indexes the objects in the order they are seen so that the second time the first object is used, it will be referenced as {"py/id": 1}.
    • This is tricky because looking at fly_skeleton_legs.json we can see that the first time a node is used in a source or target in the links list, it is encoded as a py/object with a py/state. Then it is referenced later on using the py/id. nodes then uses only the py/id to reference the Node.
    • However, this is not what you would expect from looking at the order of the graph passed from json_graph.node_link_data() to jsonpickle.encode() (shown below), which has nodes first and links after.
    • The same is true for the order of source, target and type: in the graph below type comes before source and target so unless explicitly ordered after source and target the py/id will not be consistent with the legacy data.
    • For this reason, a method is added to the SkeletonEncoder called _encode_links that is used within the classmethod encode to encode the links first, and within the links list, encode the source, target and then the type.
    • If the exact order wasn't followed, and the py/ids were not the same as the legacy data, the EdgeType would be decoded as a Node and the edges would not be formed correctly in the decoded Skeleton using Skeleton.from_json.
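The ordering requirement can be made concrete with a stdlib-only toy encoder; the class, its memo, and the hard-coded payload are illustrative of the py/id scheme, not SLEAP's actual SkeletonEncoder. The first time an object is seen it is encoded in full and assigned the next py/id; any later occurrence emits only the back-reference, which is why the encoding order must match the legacy output exactly:

```python
class Node:
    def __init__(self, name, weight=1.0):
        self.name = name
        self.weight = weight


class MemoEncoder:
    """Toy mimic of jsonpickle's py/id back-references."""

    def __init__(self):
        self._memo = {}  # id(obj) -> py/id, assigned in first-seen order

    def encode_node(self, node):
        if id(node) in self._memo:
            return {"py/id": self._memo[id(node)]}
        self._memo[id(node)] = len(self._memo) + 1  # py/ids start at 1
        return {
            "py/object": "sleap.skeleton.Node",
            "py/state": {"py/tuple": [node.name, node.weight]},
        }


enc = MemoEncoder()
neck = Node("neck")
first = enc.encode_node(neck)   # full payload; neck gets py/id 1
second = enc.encode_node(neck)  # back-reference only
```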
{'directed': True, 'multigraph': True, 'graph': {'name': 'skeleton_legs.mat', 'num_edges_inserted': 23}, 'nodes': [{'id': Node(name='head', weight=1.0)}, {'id': Node(name='neck', weight=1.0)}, {'id': Node(name='thorax', weight=1.0)}, {'id': Node(name='abdomen', weight=1.0)}, {'id': Node(name='wingL', weight=1.0)}, {'id': Node(name='wingR', weight=1.0)}, {'id': Node(name='forelegL1', weight=1.0)}, {'id': Node(name='forelegL2', weight=1.0)}, {'id': Node(name='forelegL3', weight=1.0)}, {'id': Node(name='forelegR1', weight=1.0)}, {'id': Node(name='forelegR2', weight=1.0)}, {'id': Node(name='forelegR3', weight=1.0)}, {'id': Node(name='midlegL1', weight=1.0)}, {'id': Node(name='midlegL2', weight=1.0)}, {'id': Node(name='midlegL3', weight=1.0)}, {'id': Node(name='midlegR1', weight=1.0)}, {'id': Node(name='midlegR2', weight=1.0)}, {'id': Node(name='midlegR3', weight=1.0)}, {'id': Node(name='hindlegL1', weight=1.0)}, {'id': Node(name='hindlegL2', weight=1.0)}, {'id': Node(name='hindlegL3', weight=1.0)}, {'id': Node(name='hindlegR1', weight=1.0)}, {'id': Node(name='hindlegR2', weight=1.0)}, {'id': Node(name='hindlegR3', weight=1.0)}], 'links': [{'edge_insert_idx': 1, 'type': <EdgeType.BODY: 1>, 'source': Node(name='neck', weight=1.0), 'target': Node(name='head', weight=1.0), 'key': 0}, {'edge_insert_idx': 0, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='neck', weight=1.0), 'key': 0}, {'edge_insert_idx': 2, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='abdomen', weight=1.0), 'key': 0}, {'edge_insert_idx': 3, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='wingL', weight=1.0), 'key': 0}, {'edge_insert_idx': 4, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='wingR', weight=1.0), 'key': 0}, {'edge_insert_idx': 5, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': 
Node(name='forelegL1', weight=1.0), 'key': 0}, {'edge_insert_idx': 8, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='forelegR1', weight=1.0), 'key': 0}, {'edge_insert_idx': 11, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='midlegL1', weight=1.0), 'key': 0}, {'edge_insert_idx': 14, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='midlegR1', weight=1.0), 'key': 0}, {'edge_insert_idx': 17, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='hindlegL1', weight=1.0), 'key': 0}, {'edge_insert_idx': 20, 'type': <EdgeType.BODY: 1>, 'source': Node(name='thorax', weight=1.0), 'target': Node(name='hindlegR1', weight=1.0), 'key': 0}, {'edge_insert_idx': 6, 'type': <EdgeType.BODY: 1>, 'source': Node(name='forelegL1', weight=1.0), 'target': Node(name='forelegL2', weight=1.0), 'key': 0}, {'edge_insert_idx': 7, 'type': <EdgeType.BODY: 1>, 'source': Node(name='forelegL2', weight=1.0), 'target': Node(name='forelegL3', weight=1.0), 'key': 0}, {'edge_insert_idx': 9, 'type': <EdgeType.BODY: 1>, 'source': Node(name='forelegR1', weight=1.0), 'target': Node(name='forelegR2', weight=1.0), 'key': 0}, {'edge_insert_idx': 10, 'type': <EdgeType.BODY: 1>, 'source': Node(name='forelegR2', weight=1.0), 'target': Node(name='forelegR3', weight=1.0), 'key': 0}, {'edge_insert_idx': 12, 'type': <EdgeType.BODY: 1>, 'source': Node(name='midlegL1', weight=1.0), 'target': Node(name='midlegL2', weight=1.0), 'key': 0}, {'edge_insert_idx': 13, 'type': <EdgeType.BODY: 1>, 'source': Node(name='midlegL2', weight=1.0), 'target': Node(name='midlegL3', weight=1.0), 'key': 0}, {'edge_insert_idx': 15, 'type': <EdgeType.BODY: 1>, 'source': Node(name='midlegR1', weight=1.0), 'target': Node(name='midlegR2', weight=1.0), 'key': 0}, {'edge_insert_idx': 16, 'type': <EdgeType.BODY: 1>, 'source': Node(name='midlegR2', weight=1.0), 'target': 
Node(name='midlegR3', weight=1.0), 'key': 0}, {'edge_insert_idx': 18, 'type': <EdgeType.BODY: 1>, 'source': Node(name='hindlegL1', weight=1.0), 'target': Node(name='hindlegL2', weight=1.0), 'key': 0}, {'edge_insert_idx': 19, 'type': <EdgeType.BODY: 1>, 'source': Node(name='hindlegL2', weight=1.0), 'target': Node(name='hindlegL3', weight=1.0), 'key': 0}, {'edge_insert_idx': 21, 'type': <EdgeType.BODY: 1>, 'source': Node(name='hindlegR1', weight=1.0), 'target': Node(name='hindlegR2', weight=1.0), 'key': 0}, {'edge_insert_idx': 22, 'type': <EdgeType.BODY: 1>, 'source': Node(name='hindlegR2', weight=1.0), 'target': Node(name='hindlegR3', weight=1.0), 'key': 0}]}
  • A test has been added which loads a skeleton from a JSON file, gets the graph with json_graph.node_link_data from networkx, encodes the graph with the new SkeletonEncoder.encode() method, then uses Skeleton.from_json to deserialize the JSON string (with jsonpickle.decode, then json_graph.node_link_graph). The Skeleton.matches() method is used to test the equivalence of the deserialized Skeletons.
    [X] This test could be parametrized with various skeleton fixtures.
    [X] We should make sure to do this with a template fixture at least.
    [X] An additional test could be added to see if the serialized JSON strings are equivalent.
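For the JSON-string equivalence check, one robust pattern is to compare parsed structures rather than raw strings, so key order and whitespace don't cause false failures. A stdlib illustration (the two strings are made-up examples, not actual SLEAP output):

```python
import json

legacy = '{"directed": true, "multigraph": true}'
reencoded = '{"multigraph": true, "directed": true}'

# The raw strings differ only in key order...
raw_equal = legacy == reencoded
# ...but the parsed structures compare equal.
parsed_equal = json.loads(legacy) == json.loads(reencoded)
```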

Types of changes

  • Bugfix
  • New feature
  • Refactor / Code style update (no logical changes)
  • Build / CI changes
  • Documentation Update
  • Other (explain)

Does this address any currently open issues?

#1470 #1918

This pull request accompanies #1961 in handling JSON encoding and decoding internally instead of relying on jsonpickle.

Outside contributors checklist

  • Review the guidelines for contributing to this repository
  • Read and sign the CLA and add yourself to the authors list
  • Make sure you are making a pull request against the develop branch (not main). Also you should start your branch off develop
  • Add tests that prove your fix is effective or that your feature works
  • Add necessary documentation (if appropriate)

Thank you for contributing to SLEAP!

❤️

Summary by CodeRabbit

  • New Features

    • Introduced a new SkeletonEncoder class for enhanced encoding of skeleton representations.
    • Added methods for encoding Node and EdgeType objects into JSON formats.
  • Chores

    • Updated version constraints for the attrs package in environment configuration files.
  • Tests

    • Added new tests for validating the encoding and decoding of Skeleton objects.


coderabbitai bot commented Sep 20, 2024

Walkthrough

A new class named SkeletonEncoder has been introduced in the sleap/skeleton.py file, replacing the existing jsonpickle.encode functionality with a custom encoder for converting Python objects into JSON strings. The SkeletonEncoder includes methods for encoding Node and EdgeType objects, while the to_json method in the Skeleton class has been updated to utilize this new encoder. Additionally, tests for encoding and decoding Skeleton objects have been added, and the attrs package version constraints have been specified more clearly in the environment files.

Changes

File Change Summary
sleap/skeleton.py Added new class SkeletonEncoder for custom encoding; modified to_json method to use SkeletonEncoder.
environment.yml Updated attrs package version constraint to >=21.2.0.
environment_no_cuda.yml Updated attrs package version constraint to >=21.2.0.
tests/test_skeleton.py Added tests for encoding and decoding Skeleton objects using SkeletonEncoder.

Possibly related PRs

  • Handle skeleton decoding internally #1961: The introduction of the SkeletonDecoder class in this PR is directly related to the SkeletonEncoder class in the main PR, as both classes are involved in the (de)serialization of Node and EdgeType objects within the sleap/skeleton.py file.

Suggested reviewers

  • talmo

Poem

In the code where rabbits play,
A new encoder hops today.
Skeletons now dance with glee,
Encoding joyfully, you see!
With every node, a match so bright,
In the world of bytes, all feels right! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 743e406 and 83a2704.

📒 Files selected for processing (1)
  • sleap/skeleton.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sleap/skeleton.py



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3c7f5af and 034cffb.

Files selected for processing (1)
  • sleap/skeleton.py (1 hunks)

Comment on lines 88 to 89
class SkeletonEncoder:
"""Replace json.pickle.encode with our own encoder."""

Add the missing encoder implementation.

The SkeletonEncoder class is currently empty. To fulfill its purpose of replacing json.pickle.encode with a custom encoder, please add the necessary encoding logic within the class definition.


codecov bot commented Sep 20, 2024

Codecov Report

Attention: Patch coverage is 98.41270% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.50%. Comparing base (7ed1229) to head (83a2704).
Report is 52 commits behind head on develop.

Files with missing lines Patch % Lines
sleap/skeleton.py 98.41% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1970      +/-   ##
===========================================
+ Coverage    73.30%   75.50%   +2.19%     
===========================================
  Files          134      133       -1     
  Lines        24087    24625     +538     
===========================================
+ Hits         17658    18593     +935     
+ Misses        6429     6032     -397     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Outside diff range and nitpick comments (3)
sleap/skeleton.py (3)

88-117: LGTM! Remember to implement the _encode method.

The SkeletonEncoder class is well-documented and provides a clear explanation of its purpose and the encoding format. The encode class method, _encode_node, and _encode_edge_type methods are implemented correctly.

However, the _encode method, which is intended for the actual encoding logic, is currently unimplemented. Please ensure that this method is implemented to complete the functionality of the SkeletonEncoder class.

Do you want me to generate the implementation for the _encode method or open a GitHub issue to track this task?


119-121: LGTM! Consider adding a comment to clarify the purpose of _encoded_strings.

The __init__ method is correctly defined and initializes the _encoded_strings instance variable. However, the purpose of this variable is not clear from the provided context.

Consider adding a comment to explain how _encoded_strings is intended to be used within the SkeletonEncoder class.


Line range hint 1090-1115: LGTM! Remember to remove or comment out debug print statements before production.

The added debug print statements in the to_json method do not alter the existing functionality and can be useful for logging and debugging purposes during development.

However, these print statements may clutter the output and should be removed or commented out before deploying the code to production to avoid unnecessary logging.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 034cffb and dd4865d.

Files selected for processing (1)
  • sleap/skeleton.py (3 hunks)
Additional context used
Ruff
sleap/skeleton.py

138-138: SyntaxError: Expected an indented block after function definition

Additional comments not posted (4)
sleap/skeleton.py (4)

122-133: LGTM! Remember to implement the _encode method.

The encode class method is correctly defined and documented. It creates an instance of the SkeletonEncoder class and calls the _encode method on the instance, which is the expected behavior.

However, since the _encode method is currently unimplemented, calling the encode method will raise an error. Please ensure that the _encode method is implemented to enable the functionality of the encode method.


135-137: Reminder: Implement the _encode method.

The _encode method is correctly defined with the appropriate parameters and return type. However, the method is currently unimplemented, as indicated by the empty body.

Please ensure that the _encode method is implemented to enable the functionality of the SkeletonEncoder class, as it is responsible for the actual encoding logic.


138-151: LGTM!

The _encode_node method is correctly defined and documented. It takes a Node object as input and returns a dictionary representation of the node, following the expected format.

The dictionary representation includes the py/object key set to sleap.Skeleton.Node and the py/state key containing a tuple of the node's name and weight, which aligns with the provided encoding format.

Tools
Ruff

138-138: SyntaxError: Expected an indented block after function definition


153-168: LGTM!

The _encode_edge_type method is correctly defined and documented. It takes an EdgeType object as input and returns a dictionary representation of the edge type, following the expected format.

The dictionary representation includes the py/reduce key, which contains a list of dictionaries representing the type and value of the edge type. This aligns with the provided encoding format for EdgeType objects.

@eberrigan eberrigan marked this pull request as draft September 21, 2024 04:39

json_str = jsonpickle.encode(data)
# json_str = jsonpickle.encode(data)
encoder = SkeletonEncoder()
Contributor Author


SkeletonEncoder.encode(data) instead

@eberrigan eberrigan marked this pull request as ready for review September 24, 2024 23:04

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Outside diff range and nitpick comments (5)
environment_no_cuda.yml (1)

Line range hint 1-47: Consider reviewing version constraints for consistency

While the change to the attrs package version constraint is good, it might be beneficial to review the version constraints of other dependencies in this file for consistency. Some packages have specific version constraints, while others have more relaxed ones. Consider standardizing the approach to version constraints across all dependencies to ensure optimal compatibility and reproducibility.

tests/test_skeleton.py (2)

12-36: LGTM: Comprehensive test for Skeleton encoding and decoding.

This test function thoroughly checks the encoding and decoding process for a Skeleton object loaded from a JSON file. It verifies both object equality and JSON string equality, which is excellent.

A minor suggestion for improvement:

Consider adding an assertion to check that the encoded_json_str is not empty before parsing it. This could help catch potential encoding failures more explicitly:

assert encoded_json_str, "Encoded JSON string should not be empty"

This assertion could be added just after line 23.


39-67: LGTM: Well-structured parameterized test for multiple Skeleton fixtures.

This parameterized test function provides excellent coverage by testing the encoding and decoding process for multiple Skeleton fixtures. It verifies both object equality and JSON representation equality, which is thorough.

A suggestion for improvement:

Consider adding a check for the encoded_json_str to ensure it's not empty, similar to the suggestion for the previous test:

assert encoded_json_str, f"Encoded JSON string should not be empty for {skeleton_fixture_name}"

This assertion could be added just after line 54.

Additionally, it might be beneficial to add a comment explaining the purpose of using json.loads when comparing the JSON strings, as it normalizes the string representations for comparison.

sleap/skeleton.py (1)

Line range hint 1211-1237: Remove debug print statements.

The debug print statements added to the to_json method are helpful for development and debugging but should be removed or commented out in production code. Consider using a logging framework for debugging in the future, which allows for easier management of debug output.

Apply this diff to remove the debug print statements:

-            print(f"indexed_node_graph: {indexed_node_graph}")
-            print(f"indexed_node_graph: {indexed_node_graph}")
-        print(f"graph: {graph}")
-            print(f"data: {data}")
-            print(f"data: {data}")
-        print(f"json_str: {json_str}")
skeletons.ipynb (1)

44-379: Consider reducing debug output for clarity

The extensive debug output from the encoding process can clutter the notebook and make it less readable. Consider adjusting the logging level or redirecting debug information to a log file to improve clarity.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between dd4865d and 4c8bdd6.

Files selected for processing (5)
  • environment.yml (1 hunks)
  • environment_no_cuda.yml (1 hunks)
  • skeletons.ipynb (1 hunks)
  • sleap/skeleton.py (3 hunks)
  • tests/test_skeleton.py (1 hunks)
Additional comments not posted (6)
environment_no_cuda.yml (1)

14-14: Approved: Explicit version constraint for attrs package

The change from - conda-forge::attrs >=21.2.0 #,<=21.4.0 to - conda-forge::attrs >=21.2.0,<=21.4.0 is a good improvement. It explicitly sets both lower and upper bounds for the attrs package version, which helps ensure compatibility and reproducibility of the environment.

Benefits of this change:

  1. Prevents potential issues with newer, untested versions of attrs.
  2. Ensures consistency across different installations of the project.
  3. Follows best practices for specifying dependency versions in conda environment files.
environment.yml (1)

13-13: Approve version constraint update for attrs package.

The change to specify both lower and upper bounds for the attrs package version is a good practice in dependency management. It helps ensure compatibility and stability across different environments.

To ensure this change doesn't conflict with other dependencies, please run the following script:

This script will help identify any potential conflicts or usage patterns that might be affected by the version constraint update.

Verification successful

Approve version constraint update for attrs package.

The updated version constraint attrs >=21.2.0,<=21.4.0 is compatible with the existing dependencies, including cattrs ==1.1.1. This ensures stability and compatibility across the codebase without introducing conflicts.

  • No conflicts detected with other dependencies.
  • attrs is consistently used across multiple modules, and the version range is appropriate.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify compatibility of attrs version with other dependencies

# Test: Check if any other packages directly depend on attrs
rg --type yaml -i 'attrs' environment.yml

# Test: Look for any potential conflicts in other files
rg --type python -i 'import attr' -A 5

Length of output: 190


Script:

#!/bin/bash
# Description: Verify compatibility of attrs version with other dependencies

# Test: Check if any other packages directly depend on attrs
rg --type yaml -i 'attrs' environment.yml

# Test: Look for any potential conflicts in other files
rg --type py -i 'import attr' -A 5

Length of output: 19024

tests/test_skeleton.py (2)

5-5: LGTM: New imports are appropriate for the added tests.

The new imports of json and SkeletonEncoder are necessary for the newly added test functions. They are correctly placed and follow the existing import style.

Also applies to: 9-9


1-67: Overall: Excellent addition of tests for SkeletonEncoder functionality.

The new test functions test_decoded_encoded_Skeleton_from_load_json and test_decoded_encoded_Skeleton provide thorough coverage for the new SkeletonEncoder functionality. They test both single file loading and multiple fixtures, ensuring robustness of the encoding and decoding process.

The tests are well-structured, use appropriate assertions, and align well with the PR objectives. The parameterized test allows for easy expansion of test cases in the future.

Minor suggestions for improvement have been made to enhance error catching and clarity. These changes, while not critical, would further strengthen the test suite.

Great job on implementing these comprehensive tests!

sleap/skeleton.py (2)

88-289: LGTM: SkeletonEncoder implementation is complete and well-structured.

The SkeletonEncoder class has been fully implemented, addressing the previous comment about adding the missing encoder implementation. The class provides a comprehensive set of methods for encoding various data types and objects, including a reference system for previously seen objects. The implementation follows good practices for custom JSON encoding.


1235-1236: LGTM: SkeletonEncoder integration in to_json method.

The to_json method has been successfully updated to use the new SkeletonEncoder instead of jsonpickle.encode. This change aligns with the goal of replacing the existing functionality with a custom encoder.

skeletons.ipynb Outdated
Comment on lines 400 to 402
"# Save the encoded json string to a file\n",
"with open(\"encoded_json_str.json\", \"w\") as f:\n",
" f.write(encoded_json_str)"

⚠️ Potential issue

Add exception handling when writing to a file

Incorporating exception handling when writing the encoded JSON string to a file can prevent the program from crashing due to unexpected I/O errors.

Apply this diff to add exception handling:

+try:
-with open("encoded_json_str.json", "w") as f:
-    f.write(encoded_json_str)
+    with open("encoded_json_str.json", "w") as f:
+        f.write(encoded_json_str)
+except IOError as e:
+    print(f"An error occurred while writing to 'encoded_json_str.json': {e}")

Committable suggestion was skipped due to low confidence.

skeletons.ipynb Outdated
Comment on lines 422 to 425
"# Get the skeleton from the encoded json string\n",
"decoded_skeleton = Skeleton.from_json(encoded_json_str)\n",
"decoded_skeleton"
]

⚠️ Potential issue

Add assertion to verify skeleton integrity after decoding

To ensure that the encoding and decoding processes maintain the skeleton's integrity, add an assertion to confirm that the decoded skeleton matches the original.

Apply this diff to add the assertion:

 skeleton = Skeleton.load_json(fly_skeleton_legs_json)
 decoded_skeleton = Skeleton.from_json(encoded_json_str)
+assert skeleton.matches(decoded_skeleton), "Decoded skeleton does not match the original."
 decoded_skeleton

Committable suggestion was skipped due to low confidence.

skeletons.ipynb Outdated
Comment on lines 433 to 451
"# def test_SkeletonEncoder(fly_legs_skeleton_json):\n",
"# \"\"\"\n",
"# Test SkeletonEncoder.encode method.\n",
"# \"\"\"\n",
"# # Get the skeleton from the fixture\n",
"# skeleton = Skeleton.load_json(fly_legs_skeleton_json)\n",
"# # Get the graph from the skeleton\n",
"# indexed_node_graph = skeleton._graph\n",
"# graph = json_graph.node_link_data(indexed_node_graph)\n",
"\n",
"# # Encode the graph as a json string to test .encode method\n",
"# encoder = SkeletonEncoder()\n",
"# encoded_json_str = encoder.encode(graph)\n",
"\n",
"# # Get the skeleton from the encoded json string\n",
"# decoded_skeleton = Skeleton.from_json(encoded_json_str)\n",
"\n",
"# # Check that the decoded skeleton is the same as the original skeleton\n",
"# assert skeleton.matches(decoded_skeleton)"

🛠️ Refactor suggestion

Uncomment and integrate the test for SkeletonEncoder

The test function for SkeletonEncoder.encode is currently commented out. Enabling this test would automate verification of the encoding and decoding processes, ensuring data integrity.

Apply this diff to uncomment and update the test function:

-# def test_SkeletonEncoder(fly_legs_skeleton_json):
-#     """
-#     Test SkeletonEncoder.encode method.
-#     """
-#     # Get the skeleton from the fixture
-#     skeleton = Skeleton.load_json(fly_legs_skeleton_json)
-#     # Get the graph from the skeleton
-#     indexed_node_graph = skeleton._graph
-#     graph = json_graph.node_link_data(indexed_node_graph)
-
-#     # Encode the graph as a json string to test .encode method
-#     encoder = SkeletonEncoder()
-#     encoded_json_str = encoder.encode(graph)
-
-#     # Get the skeleton from the encoded json string
-#     decoded_skeleton = Skeleton.from_json(encoded_json_str)
-
-#     # Check that the decoded skeleton is the same as the original skeleton
-#     assert skeleton.matches(decoded_skeleton)
+def test_SkeletonEncoder():
+    """
+    Test SkeletonEncoder.encode method.
+    """
+    # Get the skeleton from the fixture
+    fly_legs_skeleton_json = "tests/data/skeleton/fly_skeleton_legs.json"
+    skeleton = Skeleton.load_json(fly_legs_skeleton_json)
+    # Get the graph from the skeleton
+    indexed_node_graph = skeleton._graph
+    graph = json_graph.node_link_data(indexed_node_graph)
+
+    # Encode the graph as a json string to test .encode method
+    encoder = SkeletonEncoder()
+    encoded_json_str = encoder.encode(graph)
+
+    # Get the skeleton from the encoded json string
+    decoded_skeleton = Skeleton.from_json(encoded_json_str)
+
+    # Check that the decoded skeleton is the same as the original skeleton
+    assert skeleton.matches(decoded_skeleton)
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
tests/test_skeleton.py (1)

1-10: Organize imports according to PEP 8 guidelines

The new imports have been added correctly, but we can improve the overall organization of the import statements to follow PEP 8 guidelines more closely. This will enhance readability and maintainability.

Consider reorganizing the imports as follows:

import copy
import json
import os

import jsonpickle
import pytest
from networkx.readwrite import json_graph

from sleap.skeleton import Skeleton, SkeletonEncoder

This organization:

  1. Groups standard library imports first.
  2. Separates third-party library imports with a blank line.
  3. Places local imports (from the sleap package) last.
  4. Combines imports from the same module on a single line.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 4c8bdd6 and 6444378.

📒 Files selected for processing (2)
  • sleap/skeleton.py (4 hunks)
  • tests/test_skeleton.py (1 hunks)
🔇 Additional comments not posted (11)
tests/test_skeleton.py (1)

1-60: Overall assessment: Well-implemented tests for SkeletonEncoder

The new test functions test_decoded_encoded_Skeleton_from_load_json and test_decoded_encoded_Skeleton are valuable additions to the test suite. They effectively verify the functionality of the new SkeletonEncoder and ensure that Skeleton objects can be correctly encoded and decoded.

Key strengths:

  1. Comprehensive testing of encoding and decoding for different Skeleton objects.
  2. Use of parameterized testing to cover multiple scenarios.
  3. Verification of both object equality and JSON representation consistency.

The suggested improvements in the previous comments will further enhance the robustness and clarity of these tests. Great job on expanding the test coverage!

sleap/skeleton.py (10)

88-121: LGTM: SkeletonEncoder class initialization and docstring.

The class docstring provides a clear explanation of the purpose and functionality of the SkeletonEncoder. The __init__ method initializes the _encoded_objects dictionary to manage object references during encoding.


123-136: LGTM: encode class method implementation.

The encode class method provides a clean interface for encoding data. It creates an instance of the encoder, calls the private _encode method, and then uses json.dumps to create the final JSON string.
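The pattern the bot describes — a classmethod entry point that builds a throwaway instance so per-call reference state stays isolated — can be sketched roughly like this (hypothetical names and a simplified `_encode`; not the actual sleap implementation):

```python
import json

class SketchEncoder:
    """Hypothetical sketch of the classmethod-entrypoint pattern; not sleap's actual code."""

    def __init__(self):
        # Per-call state, e.g. py/id bookkeeping for previously seen objects.
        self._encoded_objects = {}

    @classmethod
    def encode(cls, data):
        # A fresh instance per call keeps reference state from leaking between encodings.
        encoder = cls()
        return json.dumps(encoder._encode(data))

    def _encode(self, obj):
        # Recurse into containers; leave JSON-native scalars untouched.
        if isinstance(obj, dict):
            return {key: self._encode(value) for key, value in obj.items()}
        if isinstance(obj, list):
            return [self._encode(item) for item in obj]
        return obj

result = SketchEncoder.encode({"directed": True, "nodes": [{"id": 1}]})
```

Because `encode` constructs a new instance on every call, repeated encodings cannot accidentally reuse stale `py/id` references from a prior call.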


138-167: LGTM: _encode method implementation.

The _encode method handles different types of objects (dict, list, EdgeType, Node) appropriately. It includes special handling for dictionaries containing 'nodes' and 'links' keys, which is likely specific to the skeleton structure.


169-206: LGTM: _encode_links method implementation.

The _encode_links method ensures that the links (edges) are encoded in a specific order (source, target, type, other attributes). This maintains consistency in the JSON output.
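Assuming Python 3.7+ dict insertion-order preservation, that key-ordering behavior can be sketched with a hypothetical helper (not sleap's actual code):

```python
import json

def order_link_keys(link):
    """Rebuild a link dict so source, target, type come first, then remaining keys."""
    # Dicts preserve insertion order, so building in order fixes the JSON key order.
    ordered = {key: link[key] for key in ("source", "target", "type") if key in link}
    ordered.update({key: value for key, value in link.items() if key not in ordered})
    return ordered

link = {"key": 0, "type": "BODY", "target": "head", "source": "neck"}
encoded = json.dumps(order_link_keys(link))
```

This keeps the serialized output byte-stable across runs regardless of how the input dict was assembled.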


208-235: LGTM: _encode_node method implementation.

The _encode_node method handles both Node objects and integer indices. It checks for previous encoding to avoid redundancy and uses a specific format for encoding Node objects.
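A rough sketch of the py/object-plus-py/id scheme, matching the JSON shapes shown in the PR description (the helper name and the per-helper id counter are assumptions — real jsonpickle numbers ids globally across all memoized objects):

```python
def encode_node(name, weight, seen):
    """Encode a node as a jsonpickle-style dict, emitting a py/id reference on repeats."""
    key = (name, weight)
    if key in seen:
        # Already encoded once: emit only a reference to the earlier object.
        return {"py/id": seen[key]}
    seen[key] = len(seen) + 1  # assign the next py/id (simplified, local numbering)
    return {
        "py/object": "sleap.skeleton.Node",
        "py/state": {"py/tuple": [name, weight]},
    }

seen = {}
first = encode_node("head", 1.0, seen)   # full py/object form
second = encode_node("head", 1.0, seen)  # py/id reference
```

The first encoding carries the full state; every later occurrence of the same node collapses to a small `{"py/id": N}` reference, which is what keeps repeated nodes from bloating the JSON.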


237-260: LGTM: _encode_edge_type method implementation.

The _encode_edge_type method properly encodes EdgeType objects, handling both first-time encoding and references to previously encoded objects.
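The py/reduce form from the PR description's table can be sketched similarly (hypothetical helper; the id counter here is local, unlike jsonpickle's global numbering):

```python
def encode_edge_type(value, seen):
    """Encode an enum value in jsonpickle's py/reduce form, with py/id reuse on repeats."""
    if value in seen:
        return {"py/id": seen[value]}
    seen[value] = len(seen) + 1
    return {
        "py/reduce": [
            {"py/type": "sleap.skeleton.EdgeType"},  # callable to reconstruct the enum
            {"py/tuple": [value]},                   # args passed to that callable
            None,  # state, list items, and dict items are unused for a simple enum
            None,
            None,
        ]
    }

seen = {}
full = encode_edge_type(1, seen)  # e.g. EdgeType.BODY has value 1
ref = encode_edge_type(1, seen)
```

The five-element `py/reduce` list mirrors Python's `__reduce__` protocol (callable, args, state, list items, dict items), which is why the trailing entries are `null` in the serialized output.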


262-278: LGTM: _get_or_assign_id method implementation.

This method manages the assignment of py/id for objects, ensuring unique identification and handling both new and previously encoded objects.


280-291: LGTM: _is_first_encoding method implementation.

This simple method checks if an object is being encoded for the first time, which is crucial for the reference system used in the encoder.


1146-1146: LGTM: Updated to_json method using SkeletonEncoder.

The to_json method has been updated to use the new SkeletonEncoder class. The changes include:

  1. Using SkeletonEncoder.encode(data) instead of jsonpickle.encode(data).
  2. Proper handling of node_to_idx mapping when provided.
  3. Maintaining backwards compatibility by only including description and preview_image fields for template skeletons.

These changes improve the JSON encoding process while maintaining compatibility with existing data formats.

Also applies to: 1210-1213, 1232-1232


Line range hint 88-1232: Overall implementation of SkeletonEncoder and integration with Skeleton class is well-done.

The new SkeletonEncoder class provides a custom JSON encoding solution for the Skeleton class, replacing the previous jsonpickle.encode functionality. The implementation is thorough, handling various object types and maintaining a reference system for efficient encoding. The integration with the Skeleton class, particularly in the to_json method, is done smoothly while maintaining backwards compatibility.

Key improvements:

  1. Custom encoding for Node and EdgeType objects.
  2. Efficient handling of repeated objects using a reference system.
  3. Maintaining order of attributes in encoded links.
  4. Backwards compatibility for non-template skeletons.

The changes should result in more efficient and controlled JSON encoding for Skeleton objects without breaking existing functionality.

Comment on lines +12 to +29
def test_decoded_encoded_Skeleton_from_load_json(fly_legs_skeleton_json):
"""
Test Skeleton decoded from SkeletonEncoder.encode matches the original Skeleton.
"""
# Get the skeleton from the fixture
skeleton = Skeleton.load_json(fly_legs_skeleton_json)
# Get the graph from the skeleton
indexed_node_graph = skeleton._graph
graph = json_graph.node_link_data(indexed_node_graph)

# Encode the graph as a json string to test .encode method
encoded_json_str = SkeletonEncoder.encode(graph)

# Get the skeleton from the encoded json string
decoded_skeleton = Skeleton.from_json(encoded_json_str)

# Check that the decoded skeleton is the same as the original skeleton
assert skeleton.matches(decoded_skeleton)

🛠️ Refactor suggestion

Enhance test coverage with additional assertions

The test function effectively verifies that the encoded and decoded Skeleton matches the original. However, we can improve it by adding more specific assertions:

  1. Assert that the encoded JSON string is not empty.
  2. Compare the number of nodes and edges in the original and decoded skeletons.
  3. Verify that the node names and edge connections are preserved.

Consider adding these assertions to strengthen the test:

def test_decoded_encoded_Skeleton_from_load_json(fly_legs_skeleton_json):
    skeleton = Skeleton.load_json(fly_legs_skeleton_json)
    indexed_node_graph = skeleton._graph
    graph = json_graph.node_link_data(indexed_node_graph)

    encoded_json_str = SkeletonEncoder.encode(graph)
    assert encoded_json_str, "Encoded JSON string should not be empty"

    decoded_skeleton = Skeleton.from_json(encoded_json_str)

    assert skeleton.matches(decoded_skeleton)
    assert len(skeleton.nodes) == len(decoded_skeleton.nodes), "Number of nodes should match"
    assert len(skeleton.edges) == len(decoded_skeleton.edges), "Number of edges should match"
    assert set(n.name for n in skeleton.nodes) == set(n.name for n in decoded_skeleton.nodes), "Node names should match"
    assert set(skeleton.edge_names) == set(decoded_skeleton.edge_names), "Edge connections should match"

Comment on lines +32 to +60
@pytest.mark.parametrize(
"skeleton_fixture_name", ["flies13_skeleton", "skeleton", "stickman"]
)
def test_decoded_encoded_Skeleton(skeleton_fixture_name, request):
"""
Test Skeleton decoded from SkeletonEncoder.encode matches the original Skeleton.
"""
# Use request.getfixturevalue to get the actual fixture value by name
skeleton = request.getfixturevalue(skeleton_fixture_name)

# Get the graph from the skeleton
indexed_node_graph = skeleton._graph
graph = json_graph.node_link_data(indexed_node_graph)

# Encode the graph as a json string to test .encode method
encoded_json_str = SkeletonEncoder.encode(graph)

# Get the skeleton from the encoded json string
decoded_skeleton = Skeleton.from_json(encoded_json_str)

# Check that the decoded skeleton is the same as the original skeleton
assert skeleton.matches(decoded_skeleton)

# Now make everything into a JSON string
skeleton_json_str = skeleton.to_json()
decoded_skeleton_json_str = decoded_skeleton.to_json()

# Check that the JSON strings are the same
assert json.loads(skeleton_json_str) == json.loads(decoded_skeleton_json_str)

🛠️ Refactor suggestion

Enhance parameterized test with additional assertions and error messages

The test function effectively verifies that the encoded and decoded Skeleton matches the original across multiple fixtures. To further improve its robustness:

  1. Add assertions for the number of nodes and edges.
  2. Verify that node names and edge connections are preserved.
  3. Include more descriptive error messages in assertions.

Consider enhancing the test function as follows:

@pytest.mark.parametrize(
    "skeleton_fixture_name", ["flies13_skeleton", "skeleton", "stickman"]
)
def test_decoded_encoded_Skeleton(skeleton_fixture_name, request):
    skeleton = request.getfixturevalue(skeleton_fixture_name)
    indexed_node_graph = skeleton._graph
    graph = json_graph.node_link_data(indexed_node_graph)

    encoded_json_str = SkeletonEncoder.encode(graph)
    assert encoded_json_str, f"Encoded JSON string for {skeleton_fixture_name} should not be empty"

    decoded_skeleton = Skeleton.from_json(encoded_json_str)

    assert skeleton.matches(decoded_skeleton), f"Decoded {skeleton_fixture_name} should match the original"
    assert len(skeleton.nodes) == len(decoded_skeleton.nodes), f"Number of nodes in {skeleton_fixture_name} should match"
    assert len(skeleton.edges) == len(decoded_skeleton.edges), f"Number of edges in {skeleton_fixture_name} should match"
    assert set(n.name for n in skeleton.nodes) == set(n.name for n in decoded_skeleton.nodes), f"Node names in {skeleton_fixture_name} should match"
    assert set(skeleton.edge_names) == set(decoded_skeleton.edge_names), f"Edge connections in {skeleton_fixture_name} should match"

    skeleton_json_str = skeleton.to_json()
    decoded_skeleton_json_str = decoded_skeleton.to_json()

    assert json.loads(skeleton_json_str) == json.loads(decoded_skeleton_json_str), f"JSON representations of {skeleton_fixture_name} should match"

These changes will provide more detailed information if a test fails, making it easier to identify and fix issues.

sleap/skeleton.py Outdated (resolved)
@roomrys roomrys merged commit ef803f6 into develop Sep 25, 2024
9 checks passed
@roomrys roomrys deleted the elizabeth/handle-skeleton-encoding-internally branch September 25, 2024 22:48
@roomrys
Collaborator

roomrys commented Sep 25, 2024
