Give UUID to artifact interval #993

Merged
merged 12 commits into from
Jun 18, 2024

Conversation

khl02007
Collaborator

Description

Currently, the results of artifact detection in the LFP V1 pipeline are saved in both the spyglass.lfp.v1.LFPArtifactDetection table and the spyglass.common.IntervalList table. The interval_list_name field for the entry in the latter is obtained by concatenating nwb_file_name, the interval_list_name of the original interval, the string LFP, and artifact_params_name. But this is not unique; e.g., if artifact detection is applied to different LFP electrode groups from the same NWB file and over the same interval, they would all receive the same interval_list_name. This PR proposes replacing it with just a UUID.

In addition, the insertion of the rows into LFPArtifactRemovedIntervalList and IntervalList is done with the flag replace=True. This is unnecessary and potentially dangerous as it requires deleting and repopulating. I changed this to skip_duplicates=True.
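The collision described above can be sketched in plain Python. This is only an illustration, not the Spyglass implementation: the "_" separator, the key values, and the helper name `concatenated_name` are assumptions made for the example.

```python
import uuid


def concatenated_name(key):
    """Build the old-style name from the four parts named in the description.

    The "_" separator is an assumption made for this sketch.
    """
    return "_".join(
        [
            key["nwb_file_name"],
            key["target_interval_list_name"],
            "LFP",
            key["artifact_params_name"],
        ]
    )


# Two hypothetical keys that differ only in the LFP electrode group.
base = {
    "nwb_file_name": "example_.nwb",
    "target_interval_list_name": "02_r1",
    "artifact_params_name": "default",
}
key_a = {**base, "lfp_electrode_group_name": "group_a"}
key_b = {**base, "lfp_electrode_group_name": "group_b"}

# The electrode group never enters the name, so both keys collide.
name_a = concatenated_name(key_a)
name_b = concatenated_name(key_b)
print(name_a == name_b)  # True

# A random UUID per detection run does not depend on the key at all.
uuid_a = str(uuid.uuid4())
uuid_b = str(uuid.uuid4())
print(uuid_a != uuid_b)
```

Any field left out of the concatenation can cause the same kind of collision, which is why a generated identifier sidesteps the problem rather than patching one field at a time.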


Checklist:

  • This PR should be accompanied by a release: (yes/no/unsure)
  • If release, I have updated the CITATION.cff
  • This PR makes edits to table definitions: (yes/no)
  • If table edits, I have included an alter snippet for release notes.
  • If this PR makes changes to position, I ran the relevant tests locally.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes

@CBroz1
Member

CBroz1 commented Jun 3, 2024

I agree that the use of replace can be dangerous, as I discuss here and here. In this case, I think skip_duplicates could be equally dangerous without the UUID. With a UUID, I don't think it's needed: the UUID prevents collisions, and with them the need for skip_duplicates. If someone were to follow this model and not use a UUID, we could get the following...

  • I run an analysis A and insert into IntervalList with primary key A1, and valid_times A2
  • I run another analysis B, and get A1 as primary key, with valid_times B2
  • My B analysis wrongfully refers to times A2

So, I think it's worth removing skip_duplicates here.
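The scenario in the list above can be reproduced with a toy table. This is a hedged sketch: `ToyTable` is a hypothetical in-memory stand-in written for this example, not DataJoint, and it only mimics the "silently keep the existing row" behavior that skip_duplicates=True is understood to have.

```python
class ToyTable:
    """In-memory stand-in for a table with one primary-key column."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self.rows = {}

    def insert1(self, row, skip_duplicates=False):
        pk = row[self.primary_key]
        if pk in self.rows:
            if skip_duplicates:
                return  # existing row is kept, new row is dropped silently
            raise ValueError(f"duplicate entry for primary key {pk!r}")
        self.rows[pk] = dict(row)


interval_list = ToyTable("interval_list_name")

# Analysis A inserts primary key "A1" with its own valid times.
interval_list.insert1(
    {"interval_list_name": "A1", "valid_times": [(0.0, 10.0)]},
    skip_duplicates=True,
)

# Analysis B happens to generate the same name with different valid times.
interval_list.insert1(
    {"interval_list_name": "A1", "valid_times": [(20.0, 30.0)]},
    skip_duplicates=True,
)

# Analysis B now reads back the times from analysis A without any error.
stored = interval_list.rows["A1"]["valid_times"]
print(stored)  # [(0.0, 10.0)]
```

Without skip_duplicates the second insert raises instead, which surfaces the name collision rather than hiding it.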

If we just need to fix this case, I would add the LFP electrode group to the concatenated string. If we can solve the general case mentioned in the issue, I'd like to apply that solution to all cases. I'm happy with using UUIDs as the interval list name, but I recall @edeno advocating for a human-readable name.

@edeno
Collaborator

edeno commented Jun 3, 2024

I'm okay with non-human readable as long as it is clear what a particular interval is attached to. We also need to be consistent with these things.

)

* add option to set spike smoothing sigma

* update changelog
Member

@CBroz1 CBroz1 left a comment


Eric and I discussed this in our meeting. We'd ask that this case be resolved by adding to the interval_list_name concatenation. We can discuss solving the general case during my next site visit.

@khl02007
Collaborator Author

khl02007 commented Jun 4, 2024

@CBroz1 what do you mean by 'adding to the interval_list_name concatenation'? what do we add?

@CBroz1
Member

CBroz1 commented Jun 4, 2024

You mentioned that LFP electrode group was not included in the name, despite being a differentiating factor in the analysis. That's the additional component I had in mind

@khl02007
Collaborator Author

khl02007 commented Jun 4, 2024

I see. But that is actually not enough to uniquely define a row (I was just using LFPElectrodeGroup as an example of a place where ambiguity may arise). The chain is LFPArtifactDetection -> LFPV1 and LFPArtifactDetectionParameters -> LFPElectrodeGroup, IntervalList, FirFilterParameters (no UUID, all composite keys). @edeno do you think we should concatenate the primary keys of all of these tables?

samuelbray32 and others added 5 commits June 4, 2024 15:06
…ab#996)

* Add docstrings

* update changelog

* fix spelling

---------

Co-authored-by: Samuel Bray <samuelbray@som-dfvnn9m-lt.ucsfmedicalcenter.org>
* Add Common Errors

* Update changelog
* documented some of mua notebook

* mua notebook documented

* documented some of mua notebook

* synced py script
* compile exported files, download dandiset, and organize

* add function to translate files into dandi-compatible names

* add table to store dandi name translation and steps to populate

* add dandiset validation

* add function to fetch nwb from dandi

* add function to change obj_id of nwb_file

* add dandi upload call and fix circular import

* debug dandi file streaming

* fix circular import

* resolve dandi-streamed files with fetch_nwb

* implement review comments

* add admin tools to fix common dandi discrepencies

* implement tool to cleanup common dandi errors

* add dandi export to tutorial

* fix linting

* update changelog

* fix spelling

* style changes from review

* reorganize function locations

* fix circular import

* make dandi dependency optional in imports

* store dandi instance of data in DandiPath

* resolve case of pre-existing dandi entries for export

* cleanup bugs from refactor

* update notebook

* Apply suggestions from code review

Co-authored-by: Chris Broz <Chris.Broz@ucsf.edu>

* add requested changes from review

* make method check_admin_privilege in LabMember

---------

Co-authored-by: Chris Broz <Chris.Broz@ucsf.edu>
* give analysis nwb new uuid when created

* fix function argument

* update changelog
@edeno
Collaborator

edeno commented Jun 6, 2024

If it is within the limit of the number of characters that mysql 8 allows, then yes, that would be consistent with what we have been doing before.

I do think a uuid would work here, but I feel like we should as a group discuss doing this as a general pattern. I am mostly concerned about our ability to trace where it came from.

@khl02007
Collaborator Author

khl02007 commented Jun 6, 2024

@edeno I don't know if we will be able to constrain the total number of characters to the mysql limit because the string is made of many parts, each of which can be long. So I think that in this particular case UUIDs can work. I agree with you that it would be good to standardize this. To make it easy to look up where this came from, what if I add lfp.v1.LFPArtifactDetection to the source column of IntervalList? Then one could do LFPArtifactDetection & {'artifact_removed_interval_list_name': UUID} and find info about what parameters were used etc.

Whatever we decide on I'd prefer if this were fixed sooner rather than later because it is holding up some analysis.
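The lookup proposed in this comment can be sketched without a database. Everything here is illustrative: `restrict` is a hypothetical helper that mimics restricting a table by a dict, the rows are made up, and the `source` field simply follows the wording of the comment.

```python
import uuid


def restrict(rows, restriction):
    """Return rows matching every key/value pair in `restriction`.

    Toy version of restricting a table with a dict.
    """
    return [
        row
        for row in rows
        if all(row.get(k) == v for k, v in restriction.items())
    ]


# One UUID is generated per artifact detection run and stored in both tables.
interval_name = str(uuid.uuid4())

# Hypothetical rows standing in for LFPArtifactDetection.
artifact_detection_rows = [
    {
        "artifact_removed_interval_list_name": interval_name,
        "artifact_params_name": "default",
        "lfp_electrode_group_name": "group_a",
    },
    {
        "artifact_removed_interval_list_name": str(uuid.uuid4()),
        "artifact_params_name": "default",
        "lfp_electrode_group_name": "group_b",
    },
]

# Hypothetical row standing in for IntervalList; `source` names the origin table.
interval_list_row = {
    "interval_list_name": interval_name,
    "source": "lfp.v1.LFPArtifactDetection",
}

# Starting from the interval name alone, recover the parameters that produced it.
matches = restrict(
    artifact_detection_rows,
    {"artifact_removed_interval_list_name": interval_list_row["interval_list_name"]},
)
print(matches[0]["lfp_electrode_group_name"])  # group_a
```

The name itself carries no meaning, so traceability rests entirely on the originating table being recorded next to it.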

@edeno
Collaborator

edeno commented Jun 6, 2024

That seems okay to me. Thoughts on this @CBroz1 ?

@CBroz1
Member

CBroz1 commented Jun 7, 2024

what if I add lfp.v1.LFPArtifactDetection to the source column of IntervalList?

Do you mean the pipeline column? It would be a further shift in usage to specify a table there rather than a subpackage or schema, but sure, no objection. For clarification, the pattern would be: tables whose specifications exceed the character limit of interval_list_name will use a UUID in this field and specify X as pipeline, where X is ...

  • full_table_name? The rare table exceeds the 64 char limit, but the camel case name has the possibility to be ambiguous across pipeline versions. This makes it easier to call a FreeTable later to run a join for automated features
  • relative import path? So f"{Table.__module__}.{Table.__qualname__}" with some substringing in there to remove spyglass.. This is more human readable, but would rely on inspect or manual intervention to work with later

I'm in favor of the former
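The two candidate values for X can be compared with a stand-in class. This is a sketch under assumptions: `ToyArtifactDetection`, its module path, and its `full_table_name` string are invented for illustration and are not taken from the real schema, and `import_path` is a hypothetical helper.

```python
class ToyArtifactDetection:
    """Hypothetical stand-in for a table class."""

    # Pretend the class lives in this module (made-up path for the example).
    __module__ = "spyglass.lfp.v1.lfp_artifact"

    # Illustrative database-level name, not copied from the real schema.
    full_table_name = "`toy_schema`.`__toy_artifact_detection`"


def import_path(table_cls, strip_prefix="spyglass."):
    """Option 2: module path plus class name, with the package prefix removed."""
    path = f"{table_cls.__module__}.{table_cls.__qualname__}"
    if path.startswith(strip_prefix):
        path = path[len(strip_prefix):]
    return path


# Option 1: the database-level name, usable directly to address the table later.
option_1 = ToyArtifactDetection.full_table_name

# Option 2: the human-readable import path, which needs an import step to resolve.
option_2 = import_path(ToyArtifactDetection)

print(option_1)
print(option_2)
```

The first value maps straight onto a database table, which suits automated joins; the second reads more naturally but has to be resolved back to a class before it can be queried.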

samuelbray32 and others added 2 commits June 10, 2024 14:57
* fix bug in change in analysis_file_object_id

* update changelog
@khl02007
Collaborator Author

@CBroz1 Sorry I'm just returning to this PR. So should I use UUID and populate the pipeline field with the subpackage or schema? I'm happy to do whatever you think is best.

@CBroz1
Member

CBroz1 commented Jun 13, 2024

@CBroz1 Sorry I'm just returning to this PR. So should I use UUID and populate the pipeline field with the subpackage or schema? I'm happy to do whatever you think is best.

Please use full_table_name

samuelbray32 and others added 3 commits June 14, 2024 14:06
* initial non daemon parallel commit

* resolve namespace and pickling errors

* fix linting

* update changelog

* implement review comments

* add parallel_make flag to spikesorting recording tables

* fix multiprocessing spawn error on mac

* move propert

---------

Co-authored-by: Samuel Bray <samuelbray@som-dfvnn9m-lt.ucsf.edu>

@khl02007
Collaborator Author

@CBroz1 I have changed the value of the pipeline column of IntervalList to self.full_table_name and removed skip_duplicates=True, since uniqueness is enforced by the UUID.

@edeno edeno merged commit 97933e7 into LorenFrankLab:master Jun 18, 2024
7 checks passed
5 participants