Define a data logging/ingestion format and spec #41

Open
bruno-f-cruz opened this issue Mar 19, 2024 · 3 comments
Labels
proposal Request for a new feature

Comments

@bruno-f-cruz
Member

bruno-f-cruz commented Mar 19, 2024

Summary

One of the goals of the Harp ecosystem is to define a data format and specification that lets users log their data in a stable and shareable format.

Current Implementations

At the Allen

The current implementation at the Allen follows this pattern: https://allenneuraldynamics.github.io/Bonsai.AllenNeuralDynamics/articles/core-logging.html#harp-data

Essentially, all messages from a single device are grouped by register and saved in their respective binary files. The name of each binary file currently follows the convention <DeviceName__RegisterName.bin>.
e.g.:

├───Behavior.harp
│       Register__AnalogData.bin
│       Register__AssemblyVersion.bin
│       Register__Camera0Frame.bin
│       Register__Camera0Frequency.bin
│       Register__Camera1Frame.bin
│       Register__Camera1Frequency.bin
│       Register__ClockConfiguration.bin
│       Register__CoreVersionHigh.bin
│       Register__CoreVersionLow.bin
│       Register__DeviceName.bin
.....
├───ClockGenerator.harp
│       Register__AssemblyVersion.bin
│       Register__Battery.bin
│       Register__BatteryCalibration0.bin
│       Register__BatteryCalibration1.bin
│       Register__BatteryRate.bin
│       Register__BatteryThresholdHigh.bin
│       Register__BatteryThresholdLow.bin
│       Register__ClockConfiguration.bin
│       Register__Config.bin
│       Register__CoreVersionHigh.bin
│       Register__CoreVersionLow.bin
....

This has a few problems:

  1. It does not split messages by type (event/read/write), which might be a problem given the recent discussion in "Require timestamp sequence to be monotonic when device is synchronized" (#37).
  2. It does not work with the current spec of the harp-python package.
  3. It does not include the yml metadata file, making it difficult to recover the metadata associated with the device offline.
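
To make the first problem concrete, here is a minimal sketch of what splitting a raw log by message type and register could look like. It assumes the Harp message framing `[MessageType][Length][Address][Port][PayloadType][Payload][Checksum]`, with `Length` counting the bytes that follow it and type codes 1/2/3 for Read/Write/Event. Those details, the function name, and the grouping key are assumptions for illustration, not something this issue specifies, and extended-length messages are ignored.

```python
from collections import defaultdict

# Assumed message type codes, taken from the low two bits of the first byte.
MESSAGE_TYPES = {1: "Read", 2: "Write", 3: "Event"}


def split_by_type_and_register(stream: bytes) -> dict[tuple[str, int], bytes]:
    """Group raw messages by (message type, register address).

    Hypothetical helper: assumes each message is 2 + Length bytes long
    and that the register address is the third byte of the message.
    """
    groups = defaultdict(bytearray)
    offset = 0
    while offset + 2 <= len(stream):
        length = stream[offset + 1]
        end = offset + 2 + length
        # Address, port, payload type and checksum need at least 4 bytes.
        if length < 4 or end > len(stream):
            break  # truncated or malformed tail: stop rather than guess
        message = stream[offset:end]
        kind = MESSAGE_TYPES.get(message[0] & 0x03, "Unknown")
        address = message[2]
        groups[(kind, address)] += message
        offset = end
    return {key: bytes(value) for key, value in groups.items()}
```

Each key could then map to its own file, so that events and write replies for the same register never interleave in a single timestamp sequence.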

Possible solutions

  • Add a way to add the package metadata to the logging folder
  • Decide on the data logging spec format
  • Should we have a <FolderName.harp> as the root and split all files inside by <MessageType.Register>?
  • Should we label registers as numbers or names?
  • Adopt the following folder structure:
    - <UserGivenName>.harp / <DeviceName>_<RegisterNumber>.bin
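
As a rough sketch of the last option combined with the first one, the snippet below creates a `<UserGivenName>.harp` folder, copies a metadata file into it, and builds `<DeviceName>_<RegisterNumber>.bin` paths. The function names and the `device.yml` target file name are assumptions made for illustration only; nothing here is part of an agreed spec.

```python
import shutil
from pathlib import Path


def prepare_log_folder(root: Path, user_given_name: str, metadata_file: Path) -> Path:
    """Create <UserGivenName>.harp and place a copy of the metadata next to the data.

    Hypothetical helper: the "device.yml" target name is an assumption.
    """
    folder = root / f"{user_given_name}.harp"
    folder.mkdir(parents=True, exist_ok=True)
    shutil.copy(metadata_file, folder / "device.yml")
    return folder


def register_file(folder: Path, device_name: str, register_number: int) -> Path:
    """Build the proposed <DeviceName>_<RegisterNumber>.bin path inside the folder."""
    return folder / f"{device_name}_{register_number}.bin"
```

With this layout the folder is self-describing: the metadata needed to decode each register travels with the binary files.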
@bruno-f-cruz bruno-f-cruz added the proposal Request for a new feature label Mar 19, 2024
@bruno-f-cruz
Member Author

One thing that came to mind is why use the <DeviceName> in <UserGivenName>.harp / <DeviceName>_<RegisterNumber>.bin at all. It seems to introduce an extra dependency that is not necessary. Maybe a more general name, like Register, would be better? @glopesdev

@glopesdev
Collaborator

@bruno-f-cruz Including the device name makes it easier to search for chunks of the same device across epoch folders, as happens in the Aeon data formats. I want to keep pushing for this, as I think it is an important use case to stay compatible with, even though it may not be used in 90% of cases.
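
As a rough illustration of that use case, the sketch below collects every chunk of one device register across epoch folders with a single glob. The `<root>/<epoch>/<name>.harp/<DeviceName>_<RegisterNumber>.bin` layout and the function name are hypothetical, chosen only to show why a device name in the file name helps; the actual Aeon layout may differ.

```python
from pathlib import Path


def find_device_chunks(root: Path, device_name: str, register_number: int) -> list[Path]:
    """Return all files for one device register across epoch folders, sorted by path.

    Assumes a hypothetical layout: <root>/<epoch>/<name>.harp/<Device>_<Register>.bin
    """
    pattern = f"*/*.harp/{device_name}_{register_number}.bin"
    return sorted(root.glob(pattern))
```

Because the device name is in the file name, the search does not depend on what each `.harp` folder was called in a given epoch.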

@bruno-f-cruz
Member Author

bruno-f-cruz commented Mar 24, 2024

I guess my question is whether it should be part of the spec or not. From the Python interface point of view it doesn't appear to add much. I wonder if we can find a way for the interface to work as long as the pattern is '*_', or whether there is an advantage to introducing this dependency and locking the spec to it. To be clear: I am not against folding it in, I just wonder if we really need to add it!
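
A minimal sketch of what a prefix-agnostic reader could look like, assuming file names of the form `<anything>_<RegisterNumber>.bin`. The regular expression and the function name are illustrative assumptions, not part of harp-python.

```python
import re
from pathlib import Path

# Anything before the last underscore is treated as an opaque prefix.
_REGISTER_FILE = re.compile(r"^(?P<prefix>.*)_(?P<register>\d+)\.bin$")


def index_registers(folder: Path) -> dict[int, Path]:
    """Map register numbers to files, whatever prefix precedes the underscore."""
    index = {}
    for path in sorted(folder.glob("*_*.bin")):
        match = _REGISTER_FILE.match(path.name)
        if match:  # skip files whose suffix is not a register number
            index[int(match["register"])] = path
    return index
```

Under this approach the interface would accept `Behavior_44.bin` and `Register_44.bin` alike, so the prefix could stay a convention rather than a hard requirement of the spec.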
