ndx-pose fails to write properly when inside a contextmanager #36

Open
pauladkisson opened this issue Nov 12, 2024 · 6 comments

pauladkisson commented Nov 12, 2024

Related to catalystneuro/neuroconv#1114

Tracked this down to a minimal example with a context manager like neuroconv's make_or_load_nwbfile. Basically, when the io is created in the first part of a context manager (before the yield) but used to write in the second part (after the yield), it doesn't write properly. This can be solved by changing the import location (e.g., adding import ndx_pose at the top of the script) or by changing where the io is instantiated (e.g., moving io = NWBHDF5IO(nwbfile_path, "w") inside the finally block).

Not sure if ndx_pose needs to fix this or if it's a deeper problem, but would love a thorough explanation to improve my understanding.

Minimal Example:

from contextlib import contextmanager
from datetime import datetime
import os

import numpy as np
from pynwb import NWBFile, NWBHDF5IO

class myDeepLabCutInterface:
    def add_to_nwbfile(self, nwbfile):
        from ndx_pose import PoseEstimation, PoseEstimationSeries
        pose_estimation_series = PoseEstimationSeries(
            name="name",
            data=np.random.rand(10, 3),
            rate=1.0,
            confidence=np.random.rand(10),
            reference_frame="reference_frame",
        )
        pose_estimation = PoseEstimation(
            name="PoseEstimation",
            pose_estimation_series=[pose_estimation_series],
        )
        nwbfile.create_processing_module(name="behavior", description="description")
        nwbfile.processing["behavior"].add(pose_estimation)

@contextmanager
def my_make_or_load_nwbfile(nwbfile, nwbfile_path):
    io = NWBHDF5IO(nwbfile_path, "w")
    try:
        yield nwbfile
    finally:
        io.write(nwbfile)

def main():
    nwbfile_path = "test.nwb"
    if os.path.exists(nwbfile_path):
        os.remove(nwbfile_path)

    dlc_interface = myDeepLabCutInterface()

    nwbfile = NWBFile(session_description="test", identifier="test", session_start_time=datetime.now())

    with my_make_or_load_nwbfile(nwbfile, nwbfile_path) as nwbfile_out:
        dlc_interface.add_to_nwbfile(nwbfile_out)
        # In memory, the pose_estimation_series is present in the nwbfile
        assert hasattr(nwbfile_out.processing["behavior"].data_interfaces['PoseEstimation'], "pose_estimation_series") # True

    with NWBHDF5IO(nwbfile_path, "r") as io:
        nwbfile = io.read()
        # After writing to disk, the pose_estimation_series is NOT present in the nwbfile
        assert hasattr(nwbfile.processing["behavior"].data_interfaces['PoseEstimation'], "pose_estimation_series") # False

if __name__ == "__main__":
    main()
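
For reference, a variant of the context manager that avoids the problem by deferring the io creation until write time, as described above (a sketch of the workaround, not neuroconv's actual implementation):

@contextmanager
def my_make_or_load_nwbfile_fixed(nwbfile, nwbfile_path):
    try:
        yield nwbfile
    finally:
        # By the time the finally block runs, add_to_nwbfile has executed and
        # ndx_pose has been imported, so this io object knows the ndx-pose types.
        with NWBHDF5IO(nwbfile_path, "w") as io:
            io.write(nwbfile)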
rly (Owner) commented Nov 12, 2024

Hi @pauladkisson , thanks for the reproducible example - that helped me trace what is going on.

When you call io = NWBHDF5IO(nwbfile_path, "w"), import ndx_pose has not yet been called. import ndx_pose adds the ndx-pose namespace and type specifications to the global pynwb namespace. This might be unintuitive because managing global state like this is not common in Python. Calling io = NWBHDF5IO(nwbfile_path, "w") creates an IO object using the current global namespace. If that happens before ndx-pose is added, then the IO object does not know about ndx-pose types. If it happens after ndx-pose is added (either because you imported ndx_pose at the top of the file, or because you imported it in the finally block), then the IO object will know about ndx-pose types and write the data correctly.
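
If it helps to see that global state directly, pynwb exposes the list of currently registered namespaces (a minimal check, assuming pynwb.available_namespaces reflects the global namespace catalog):

import pynwb

print(pynwb.available_namespaces())  # 'ndx-pose' is absent before the import
import ndx_pose  # side effect: registers the ndx-pose namespace globally
print(pynwb.available_namespaces())  # 'ndx-pose' now appears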

I would recommend either:

  1. Create the IO object after all your extensions have been imported and the data is ready to be written.
  2. Detect that the user will add pose data and call import ndx_pose before creating the IO object (a sketch follows this list).
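
For example, option 2 might look roughly like this on the caller's side (a sketch; will_add_pose_data and open_io_for_write are hypothetical names, and the only point is that the import happens before the IO object snapshots the namespace):

from pynwb import NWBHDF5IO

def open_io_for_write(nwbfile_path, will_add_pose_data):
    if will_add_pose_data:
        import ndx_pose  # noqa: F401 -- imported only for its namespace side effect
    return NWBHDF5IO(nwbfile_path, "w")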

There are other ways to manually load the namespace from the YAML file in the installed ndx_pose package on the user's computer, or to update the IO object with the ndx-pose namespace that was added to the global namespace, but those feel quite hacky. 1 or 2 above seem better, but I do not understand your full use case.

pauladkisson (Author) commented

Thank you for explaining what's going on here. These details are very helpful.

I would love to do (1), but make_or_load_nwbfile sometimes needs to append to existing nwbfiles (https://github.com/catalystneuro/neuroconv/blob/main/src/neuroconv/tools/nwb_helpers/_metadata_and_file_helpers.py#L164), which, as far as I know, requires reusing the same io object that was used for reading (and that io is created before ndx_pose is imported).

rly (Owner) commented Nov 13, 2024

We could add a function to ndx-pose (and to all new extensions) called add_namespace_to_io that updates an IO object (or type map) with the ndx-pose namespace. It's not ideal that a user would need to know to do this. Also, you would need to pass the IO object into add_to_nwbfile.
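
A hypothetical sketch of what add_namespace_to_io might look like inside ndx-pose (no such function exists today; the location of the namespace YAML inside the installed package is an assumption, and the container classes may also need to be registered with the IO object's type map, which this sketch does not cover):

import os

def add_namespace_to_io(io):
    # io.manager.type_map is the type map snapshot the IO object took when it
    # was created; TypeMap.load_namespaces is an existing hdmf API.
    yaml_path = os.path.join(os.path.dirname(__file__), "spec", "ndx-pose.namespace.yaml")
    io.manager.type_map.load_namespaces(yaml_path)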

We could add a function in pynwb that updates an IO object (or type map) with the latest global namespace. You could call this before write. Users would still need to know that they need to do this.
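
And a hypothetical sketch of the pynwb-side variant (again, no such helper exists yet; it leans on hdmf's existing TypeMap.merge to fold the current global type map, which already contains ndx-pose after the import, into the type map held by the IO object):

from pynwb import get_type_map

def update_io_with_global_namespaces(io):
    # ns_catalog=True merges the namespace catalog in addition to the
    # registered container classes.
    io.manager.type_map.merge(get_type_map(), ns_catalog=True)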

What do you think (out of these or other ideas) would work best for your use case?

pauladkisson (Author) commented

> We could add a function in pynwb that updates an IO object (or type map) with the latest global namespace. You could call this before write. Users would still need to know that they need to do this.

I think this would be the best option.

h-mayorquin (Contributor) commented Nov 13, 2024

@pauladkisson @rly thanks for figuring this out and the detailed explanation.

@rly, shouldn't there be a way to throw an error when the io is writing and encounters an unknown data type? I think the most concerning part of this is the silent failure. Throwing an informative error should be a priority here, I think. Is that possible, or are there technical blockers?

rly (Owner) commented Nov 14, 2024

@h-mayorquin I agree - a silent error is bad. I'll create an issue ticket on HDMF.
