Give UUID to artifact interval #993

khl02007 · 2024-05-31T21:26:24Z

Description

Currently the results of artifact detection in the LFP V1 pipeline are saved in both the spyglass.lfp.v1.LFPArtifactDetection table and the spyglass.common.IntervalList table. The interval_list_name for the IntervalList entry is built by concatenating nwb_file_name, the interval_list_name of the original interval, the string LFP, and artifact_params_name. This name is not unique. For example, if artifact detection is applied to different LFP electrode groups from the same NWB file over the same interval, every result receives the same interval_list_name. This PR proposes replacing it with a UUID.
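A minimal sketch of the collision, assuming "_" as the separator and using hypothetical key names and values (the actual Spyglass keys and file names may differ). Two detection runs that differ only in LFP electrode group map to the same concatenated name, while uuid.uuid4() gives each run a distinct name.

```python
import uuid


def concatenated_name(key):
    # Naming scheme described in the PR: NWB file, original interval,
    # the literal string "LFP", and the artifact parameter set.
    # The "_" separator and the key names are assumptions for illustration.
    return "_".join(
        [key["nwb_file_name"], key["interval_list_name"], "LFP", key["artifact_params_name"]]
    )


def uuid_name():
    # Proposed replacement: a random UUID rendered as a 36-character string.
    return str(uuid.uuid4())


# Hypothetical keys for two runs that differ only in the LFP electrode group.
run_a = {
    "nwb_file_name": "example_.nwb",
    "interval_list_name": "02_r1",
    "lfp_electrode_group_name": "group_1",
    "artifact_params_name": "default",
}
run_b = {**run_a, "lfp_electrode_group_name": "group_2"}

# The electrode group never enters the name, so both runs collide.
print(concatenated_name(run_a) == concatenated_name(run_b))  # True
# Independently generated UUIDs do not collide in practice.
print(uuid_name() == uuid_name())  # False
```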

In addition, rows are inserted into LFPArtifactRemovedIntervalList and IntervalList with replace=True. This is unnecessary and potentially dangerous, since replacing a row means deleting and repopulating it. I changed this to skip_duplicates=True.


Checklist:

This PR should be accompanied by a release: (yes/no/unsure)
If release, I have updated the CITATION.cff
This PR makes edits to table definitions: (yes/no)
If table edits, I have included an alter snippet for release notes.
If this PR makes changes to position, I ran the relevant tests locally.
I have updated the CHANGELOG.md with PR number and description.
I have added/edited docs/notebooks to reflect the changes

CBroz1 · 2024-06-03T14:24:07Z

I agree that the use of replace can be dangerous, and discuss here and here. In this case, I think skip_dupe could be equally dangerous without the uuid. With uuid, I don't think it's needed: uuid will prevent collision and the need for skip_dupe. If someone were to follow this model and not use a uuid, we could get the following...

I run an analysis A and insert into IntervalList with primary key A1, and valid_times A2
I run another analysis B, and get A1 as primary key, with valid_times B2
My B analysis wrongfully refers to times A2

So, I think it's worth removing skip_dupe here.
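The failure mode above can be reproduced with a toy in-memory table keyed on interval_list_name. ToyIntervalList is hypothetical; its skip_duplicates flag is modeled on the behavior the comment describes (an existing row with the same primary key is kept, and the new row is dropped without an error).

```python
class ToyIntervalList:
    """Hypothetical in-memory stand-in for a table keyed on interval_list_name."""

    def __init__(self):
        self.rows = {}

    def insert1(self, row, skip_duplicates=False):
        key = row["interval_list_name"]
        if key in self.rows:
            if skip_duplicates:
                return  # keep the existing row, silently discard the new one
            raise ValueError(f"Duplicate entry for primary key {key!r}")
        self.rows[key] = dict(row)


table = ToyIntervalList()
# Analysis A inserts its intervals under primary key A1.
table.insert1({"interval_list_name": "A1", "valid_times": "A2"})
# Analysis B derives the same primary key; skip_duplicates hides the clash.
table.insert1({"interval_list_name": "A1", "valid_times": "B2"}, skip_duplicates=True)
# Anything B later fetches under A1 gets A's times.
print(table.rows["A1"]["valid_times"])  # A2

# Without skip_duplicates, the clash surfaces immediately instead.
try:
    table.insert1({"interval_list_name": "A1", "valid_times": "B2"})
except ValueError as err:
    print(err)
```

With a fresh UUID per insert, the key clash cannot arise in the first place, which is why the comment argues the flag is unnecessary.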

If we just need to fix this case, I would add the LFP electrode group to the concatenated string. If we can solve the general case mentioned in the issue, I'd like to apply that solution to all cases. I'm happy with using UUIDs as the interval list name, but I recall @edeno advocating for a human-readable name.

edeno · 2024-06-03T15:32:31Z

I'm okay with non-human readable as long as it is clear what a particular interval is attached to. We also need to be consistent with these things.

CBroz1

Eric and I discussed this at our meeting. We'd ask that this case be resolved by adding to the interval_list_name concatenation, and we can discuss solving the general case during my next site visit.

khl02007 · 2024-06-04T18:52:01Z

@CBroz1 what do you mean by 'adding to the interval_list_name concatenation'? what do we add?

CBroz1 · 2024-06-04T18:57:24Z

You mentioned that LFP electrode group was not included in the name, despite being a differentiating factor in the analysis. That's the additional component I had in mind

khl02007 · 2024-06-04T19:05:19Z

I see. But that is actually not enough to uniquely define a row (I was just using LFPElectrodeGroup as an example of a place where ambiguity may arise). The chain is LFPArtifactDetection -> LFPV1 and LFPArtifactDetectionParameters -> LFPElectrodeGroup, IntervalList, FirFilterParameters (no UUID, all composite keys). @edeno do you think we should concatenate the primary keys of all of these tables?

* compile exported files, download dandiset, and organize * add function to translate files into dandi-compatible names * add table to store dandi name translation and steps to populate * add dandiset validation * add function to fetch nwb from dandi * add function to change obj_id of nwb_file * add dandi upload call and fix circular import * debug dandi file streaming * fix circular import * resolve dandi-streamed files with fetch_nwb * implement review comments * add admin tools to fix common dandi discrepancies * implement tool to cleanup common dandi errors * add dandi export to tutorial * fix linting * update changelog * fix spelling * style changes from review * reorganize function locations * fix circular import * make dandi dependency optional in imports * store dandi instance of data in DandiPath * resolve case of pre-existing dandi entries for export * cleanup bugs from refactor * update notebook * Apply suggestions from code review Co-authored-by: Chris Broz <Chris.Broz@ucsf.edu> * add requested changes from review * make method check_admin_privilege in LabMember --------- Co-authored-by: Chris Broz <Chris.Broz@ucsf.edu>

edeno · 2024-06-06T20:09:34Z

If it is within the character limit that MySQL 8 allows, then yes, that would be consistent with what we have been doing before.

I do think a uuid would work here, but I feel like we should as a group discuss doing this as a general pattern. I am mostly concerned about our ability to trace where it came from.
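A rough way to test the character-limit condition edeno mentions: join the primary-key values from the upstream chain and compare the result to the column width. The key names and MAX_INTERVAL_NAME_LEN = 170 below are assumptions for illustration; check the actual interval_list_name definition in IntervalList before relying on a number.

```python
MAX_INTERVAL_NAME_LEN = 170  # assumed column width, not confirmed from the schema


def name_from_upstream_keys(key, max_len=MAX_INTERVAL_NAME_LEN):
    """Concatenate upstream primary-key values; fail loudly if too long."""
    name = "_".join(str(value) for value in key.values())
    if len(name) > max_len:
        raise ValueError(f"name is {len(name)} chars, limit is {max_len}")
    return name


# Hypothetical key spanning LFPV1 (electrode group, interval, filter)
# and LFPArtifactDetectionParameters; all values are made up.
key = {
    "nwb_file_name": "example_.nwb",
    "lfp_electrode_group_name": "group_1",
    "target_interval_list_name": "02_r1",
    "filter_name": "LFP 0-400 Hz",
    "filter_sampling_rate": 30000,
    "artifact_params_name": "default",
}
print(name_from_upstream_keys(key))

# Longer values (simulated here by repetition) overflow the limit.
long_key = {k: str(v) * 10 for k, v in key.items()}
try:
    name_from_upstream_keys(long_key)
except ValueError as err:
    print(err)
```

Because every component is free-form text with its own length, there is no fixed upper bound on the joined name, which is the concern raised in the next comment.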

khl02007 · 2024-06-06T20:35:10Z

@edeno I don't know if we will be able to constrain the total number of characters to the mysql limit because the string is made of many parts, each of which can be long. So I think that in this particular case UUIDs can work. I agree with you that it would be good to standardize this. To make it easy to look up where this came from, what if I add lfp.v1.LFPArtifactDetection to the source column of IntervalList? Then one could do LFPArtifactDetection & {'artifact_removed_interval_list_name': UUID} and find info about what parameters were used etc.
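A minimal sketch of the proposed lookup, using plain lists of dicts in place of the two tables. The rows, the UUID string, and the trace_interval helper are hypothetical; in DataJoint the final step would be the restriction described above, LFPArtifactDetection & {'artifact_removed_interval_list_name': UUID}.

```python
# Hypothetical rows standing in for IntervalList and LFPArtifactDetection.
ARTIFACT_NAME = "0b3e1c9a-5d2f-4a7e-9c1b-2f6d8e4a7b10"  # made-up UUID

interval_list_rows = [
    {"interval_list_name": "02_r1", "pipeline": "some_other_pipeline"},
    {"interval_list_name": ARTIFACT_NAME, "pipeline": "lfp.v1.LFPArtifactDetection"},
]
detection_rows = [
    {
        "lfp_electrode_group_name": "group_1",
        "artifact_params_name": "default",
        "artifact_removed_interval_list_name": ARTIFACT_NAME,
    },
]


def trace_interval(name):
    """Return the pipeline recorded for an interval and the rows that produced it."""
    (interval,) = [r for r in interval_list_rows if r["interval_list_name"] == name]
    # Same idea as: LFPArtifactDetection & {"artifact_removed_interval_list_name": name}
    sources = [r for r in detection_rows if r["artifact_removed_interval_list_name"] == name]
    return interval["pipeline"], sources


pipeline, sources = trace_interval(ARTIFACT_NAME)
print(pipeline)  # lfp.v1.LFPArtifactDetection
print(sources[0]["artifact_params_name"])  # default
```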

Whatever we decide on I'd prefer if this were fixed sooner rather than later because it is holding up some analysis.

edeno · 2024-06-06T20:41:41Z

That seems okay to me. Thoughts on this @CBroz1 ?

CBroz1 · 2024-06-07T14:27:25Z

what if I add lfp.v1.LFPArtifactDetection to the source column of IntervalList?

Do you mean the pipeline column? Specifying a table there rather than a subpackage or schema would further migrate the usage pattern, but sure, no objection. For clarification, the pattern would be: tables whose specifications overflow the character limit of interval_list_name will use a UUID in this field and specify X as the pipeline, where X is one of:

full_table_name? Few tables exceed the 64-character limit, whereas the CamelCase class name can be ambiguous across pipeline versions. This makes it easier to call a FreeTable later to run a join for automated features.
relative import path? That is, f"{Table.__module__}.{Table.__qualname__}", with some substringing to remove the leading spyglass. prefix. This is more human-readable, but would rely on inspect or manual intervention to work with later.

I'm in favor of the former
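The two candidate identifiers can be compared with a hypothetical stand-in class. The module path spyglass.lfp.v1.lfp_artifact, the schema names lfp_v0 and lfp_v1, and the "__" prefix for computed tables are assumptions here, and from_camel_case only approximates DataJoint's naming rule.

```python
import re


def from_camel_case(name):
    # Approximation: underscore before each interior capital, then lowercase.
    return re.sub(r"(?<!^)([A-Z])", r"_\1", name).lower()


def full_table_name(schema, cls, tier_prefix="__"):
    # `schema`.`table`, with "__" assumed as the prefix for computed tables.
    return f"`{schema}`.`{tier_prefix}{from_camel_case(cls.__name__)}`"


def relative_import_path(cls, root="spyglass."):
    # Module path plus qualified class name, with the package root stripped.
    path = f"{cls.__module__}.{cls.__qualname__}"
    return path[len(root):] if path.startswith(root) else path


class LFPArtifactDetection:
    """Hypothetical stand-in; not the real Spyglass table."""

    __module__ = "spyglass.lfp.v1.lfp_artifact"  # assumed module path


print(full_table_name("lfp_v1", LFPArtifactDetection))
print(relative_import_path(LFPArtifactDetection))
# The class name alone is identical across versions; the schema disambiguates.
print(full_table_name("lfp_v0", LFPArtifactDetection) != full_table_name("lfp_v1", LFPArtifactDetection))
```

The backtick-quoted name can be handed to something like FreeTable without importing the class, whereas the import path reads more naturally but has to be resolved back to a class before it can be queried.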

khl02007 · 2024-06-12T20:32:13Z

@CBroz1 Sorry I'm just returning to this PR. So should I use UUID and populate the pipeline field with the subpackage or schema? I'm happy to do whatever you think is best.

CBroz1 · 2024-06-13T13:33:18Z

Please use full_table_name

* initial non daemon parallel commit * resolve namespace and pickling errors * fix linting * update changelog * implement review comments * add parallel_make flag to spikesorting recording tables * fix multiprocessing spawn error on mac * move propert --------- Co-authored-by: Samuel Bray <samuelbray@som-dfvnn9m-lt.ucsf.edu>

khl02007 · 2024-06-17T20:29:46Z

@CBroz1 I have changed the value of the pipeline column of IntervalList to self.full_table_name and removed skip_duplicates=True, since uniqueness is enforced by the UUID.
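A sketch of the insertion pattern described above, using a toy in-memory table. ToyTable, store_artifact_intervals, and the FULL_TABLE_NAME value are hypothetical, and the real make() inserts more fields. The point is that each call mints a fresh UUID, records the producing table in the pipeline field, and inserts without skip_duplicates, so any unexpected clash raises instead of being skipped.

```python
import uuid

# Assumed value of LFPArtifactDetection.full_table_name, for illustration only.
FULL_TABLE_NAME = "`lfp_v1`.`__l_f_p_artifact_detection`"


class ToyTable:
    """Hypothetical in-memory table that rejects duplicate primary keys."""

    def __init__(self, primary_key):
        self.primary_key = primary_key
        self.rows = {}

    def insert1(self, row):
        key = tuple(row[k] for k in self.primary_key)
        if key in self.rows:
            raise ValueError(f"Duplicate entry {key}")
        self.rows[key] = dict(row)


interval_list = ToyTable(["nwb_file_name", "interval_list_name"])


def store_artifact_intervals(key, valid_times):
    """Insert artifact-removed times under a fresh UUID and return the name."""
    name = str(uuid.uuid4())
    interval_list.insert1(
        {
            "nwb_file_name": key["nwb_file_name"],
            "interval_list_name": name,
            "valid_times": valid_times,
            "pipeline": FULL_TABLE_NAME,  # points back to the producing table
        }
    )
    return name


key = {"nwb_file_name": "example_.nwb", "lfp_electrode_group_name": "group_1"}
first = store_artifact_intervals(key, [[0.0, 10.0]])
# A second run on the same NWB file (another electrode group) gets its own row.
second = store_artifact_intervals({**key, "lfp_electrode_group_name": "group_2"}, [[0.0, 8.0]])
print(len(interval_list.rows))  # 2
```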

Give UUID to artifact interval

db5a881

khl02007 requested review from CBroz1, edeno and samuelbray32 May 31, 2024 21:26

Add ability to set smoothing sigma in get_firing_rate (LorenFrankLab#994)

04ec37a

* add option to set spike smoothing sigma * update changelog

CBroz1 requested changes Jun 4, 2024

samuelbray32 and others added 5 commits June 4, 2024 15:06

Add docstrings to SortedSpikesGroup and Decoding methods (LorenFrankLab#996)

6b49c2d

* Add docstrings * update changelog * fix spelling --------- Co-authored-by: Samuel Bray <samuelbray@som-dfvnn9m-lt.ucsfmedicalcenter.org>

Add Common Errors doc (LorenFrankLab#997)

7edff6a

* Add Common Errors * Update changelog

Mua notebook (LorenFrankLab#998)

d8e5196

* documented some of mua notebook * mua notebook documented * documented some of mua notebook * synced py script

Minor fixes (LorenFrankLab#999)

2de1d2b

* give analysis nwb new uuid when created * fix function argument * update changelog

samuelbray32 and others added 2 commits June 10, 2024 14:57

Fix bug in change in analysis_file object_id (LorenFrankLab#1004)

4a1b40e

* fix bug in change in analysis_file_object_id * update changelog

Remove classes for unused tables (LorenFrankLab#1003)

5d957f1

* LorenFrankLab#976 * Remove notebook reference

samuelbray32 and others added 3 commits June 14, 2024 14:06

Merge branch 'lfp_artifact' of github.com:khl02007/nwb_datajoint into lfp_artifact

cb0d1ed

Update pipeline column for IntervalList

979a5d2

CBroz1 approved these changes Jun 18, 2024

edeno merged commit 97933e7 into LorenFrankLab:master Jun 18, 2024
7 checks passed

CBroz1 mentioned this pull request Jul 25, 2024

LFPBandV1 fetches interval list of inverted shape #1045

Closed