Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update som classifcation #1300

Merged
merged 34 commits into from
Dec 21, 2023
Merged

Update som classifcation #1300

merged 34 commits into from
Dec 21, 2023

Conversation

WenzDaniel
Copy link
Collaborator

This PR will change the lineage for > peaklets!

What does the code in this PR do / what does it improve?

In this PR we change the SOM plugin such that it only returns a single data-type. It is based on the changes made in #1299. Further, we propagate the SOM and straxen classification, including the SOM subclasses, up to event basics for a better analysis.

I test processed a short run 025423 with the normal offline and SOM context.

straxen.PeakBasicsSOM,
straxen.EventBasicsSOM,
)
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for clarification, the PeaksSOM, PeakBasicsSOM, and EventBasicsSOM is so that we do not interfere with the lineage of the non-SOM plugins?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a few more thoughts about it during the weekend. This is the idea yes. However, this is only true for not using SOM at all. If we want to use SOM as a cut we also have to use the SOM plugins to do it properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lineages of the two will always be separate since the peaklet clasification will have a different liniage as that propagates all the way down the processing pipeline, I thought the main Idea was just to propagate the SOM clusters down the chain no?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well yes and no, when I made this PR I made different assumptions. Right now one cannot separate the two.

self.peak_properties = tuple(self.peak_properties)

def compute(self, events, peaks):
result = super().compute(events, peaks)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume that the super().compute() will copy the peak info into EventBasics. But just for clarification, the reason we don't need to use depends_on = (..., peaks_som) or something like that is because peaks itself is being provided by the PeaksSOM plugin which is registered in the context?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the reason is PeakBasicsSOM provides a different version of peak_basics which EventBasics(SOM) depends on.

@@ -16,7 +16,7 @@ class MergedS2s(strax.OverlapWindowPlugin):
"""Merge together peaklets if peak finding favours that they would form a single peak
instead."""

__version__ = "1.0.2"
__version__ = "1.1.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably changes in this file are all from PR #1299?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are. Sorry that it is a bit messy, but I thought you guys would look first at #1299

@@ -12,7 +12,7 @@
class PeakletClassification(strax.Plugin):
"""Classify peaklets as unknown, S1, or S2."""

__version__ = "3.0.3"
__version__ = "3.0.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problems with the change here, but out of curiosity, why did we return peaklet_classification as a dict before? Also, @FaroutYLq , will this be an issue for reprocessing if the version changes but the output is functionally the same?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the headup @JYangQi00 , very important catch! I would try to avoid touching the standard peaklet_classification in this SOM PR. It doesn't look to me that the performance of peaklet_classification is meaningfully changed, and if it is just about style we should also have a different PR for this. I would try to avoid bumping version which changes lineage so we have to reprocess again the standard pipeline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is coming from #1299 and indeed the version bump is purely optional. Since I do not modify the outcome of the plugin we can also merge without version bump if you guys prefer. If you agree I change #1299.

@@ -15,40 +15,10 @@ class PeakBasics(strax.Plugin):

"""

__version__ = "0.1.4"
__version__ = "0.1.5"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is also just from PR #1299 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, same as above purely optional.

Copy link
Contributor

@FaroutYLq FaroutYLq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with concern by @WenzDaniel and suggest we don't touch the regular peaklet_classification in this PR. The rest look good to me, but I have to admit I didn't run tests locally.
Another general comments: I think for this level of complexity and significance in update we should add some tests. I would let strax* maintainers to comment more here.

@@ -12,7 +12,7 @@
class PeakletClassification(strax.Plugin):
"""Classify peaklets as unknown, S1, or S2."""

__version__ = "3.0.3"
__version__ = "3.0.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the headup @JYangQi00 , very important catch! I would try to avoid touching the standard peaklet_classification in this SOM PR. It doesn't look to me that the performance of peaklet_classification is meaningfully changed, and if it is just about style we should also have a different PR for this. I would try to avoid bumping version which changes lineage so we have to reprocess again the standard pipeline.

def compute(self, peaklets, merged_s2s):
result = super().compute(peaklets, merged_s2s)

# For merged S2s SOM and straxen type are undefined:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is a really basic thing that I'm missing, but why are types undefined for merged S2s?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SOM classifies only based on peaklets. Merged S2s is only called on S2s (type==2) and we assume that these merged peaks stay S2s (type==2). However, for each S2 there are multiple SOM sub-types and it is currently not defined which sub-type should be inherited to the merged S2s after merging. We could do something like the sub-type of the peak with the largest area, but not sure if this make sense.
If one uses straxen types as classification for peaks then of course type == straxen_type and the straxen_type of MergedS2s would be defined again, however in that case the information is trivial to retrieve. So I preferred to not add an extra if case and make the code more complicated.

Copy link
Contributor

@JYangQi00 JYangQi00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Daniel, thanks for the improvements! Everything looks fine to me I am just confused about a few things that I left in the comments. If you think they are fine then I will approve.

@WenzDaniel
Copy link
Collaborator Author

Hmm @FaroutYLq which type of test do you have in mind? I can try to add some in #1299 because there we make actually all technical changes.

Copy link
Collaborator Author

@WenzDaniel WenzDaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey guys I think I made all needed modifications. I also added some comments to the code for guidance. I conducted the following test to see if everything works as intended:

  1. Process run 025423 up to event basics using the current master using the default offline context. Switch to this branch and see if I can still load peaklet_classifcation, merged_s2s, peak_basics and event_basics with the default nT offline context.
  2. Processed for the same run the three new SOM data-types and checked if I can load them with meaningful values too.
  3. I repeated the test of 1. with run 053494 using the XENONnT online context.

If you guys do not have any objections I think we can merge.

@@ -19,12 +19,15 @@
straxen.PulseProcessing,
straxen.Peaklets,
straxen.PeakletClassification,
straxen.PeakletSOMClass,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three plugins registered to the default nT context are additional plugins. They will allow us to keep the already processed data as is and provide the additional SOM fields at peak and event level as additional data-types. Thus, after processing on midway we have to use OSG to process peak_som_classification and event_som_classification.

Comment on lines +132 to +135
straxen.PeakletClassificationSOM,
straxen.PeaksSOM,
straxen.PeakBasicsSOM,
straxen.EventBasicsSOM,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are new plugins which extend the already existing classical versions. With this context we will have a new lineage for everything above peaklets. The same data-type names as for the classical case are used (peaklet_classification, peaks, peak_basics, event_basics), such that we can use the standard processing procedure.

  • PeakletClassificationSOM computes first the classical classification before adding the SOM information.
  • PeaksSOM propagates the information to Peaks such that we can propagate it to peak basics in PeakBasicsSOM.
  • EventBasicsSOM is the same as the classical EventBasics, but also extended by the additional SOM fields.



@export
class EventBasicsSOM(EventBasics):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin is inherited from the classic event-basics and only adds in addition the new SOM fields. It will also provide event_basics and is used in the SOM context.



@export
class EventSOMClassification(strax.Plugin):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin is the extra plugin for the default context case in which we provide in addition event_som_clasification. For peaks which do not provide any SOM information like S2s -1 is set as default value.

channel=-1,
length=peaklets["length"],
)
peaklets_classification = np.zeros(len(peaklets), dtype=self.dtype)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I change the plugin return from a dict into an array. There is no impact whatsoever for our processing. Only more convenient to work with when inhering.



@export
class PeakletSOMClass(PeakletClassificationSOM):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin is used in the default straxen context as the provided data-type needs to be different from peaklet_classification. The rest is inherited from above.

@@ -66,6 +36,45 @@ class PeakBasics(strax.Plugin):
),
)

def infer_dtype(self):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only moving the dtype in here so it can be better inherited.



@export
class PeakBasicsSOM(PeakBasics):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin is used in the SOM context and replaces peak_basics with a new version including the SOM parameters.



@export
class PeaksSOM(Peaks):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plugin is used in the SOM context to propagate the SOM parameters from peaklet_classification to peaks.



@export
class PeaksSOMClassification(strax.Plugin):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again a stand alone plugin to propagate the SOM settings to peaks for the default nT context using the default classification.

@coveralls
Copy link

coveralls commented Dec 20, 2023

Coverage Status

coverage: 92.417% (+0.8%) from 91.571%
when pulling 6e5d7b5 on update_som_classifcation
into c7916fc on master.

@JYangQi00
Copy link
Contributor

I checked that the That peaklet_classification (SOM context) has the same straxen_type as the type of peaklet_classification (nT online context), and the same som fields as peaklet_classification_som (also in nT context). I did similar things as well for the peak and event basics as well, but only looking at the S1 SOM information (as intended by this PR). Everything looks good on my end and I think we are good to merge!

@dachengx dachengx merged commit f3e7726 into master Dec 21, 2023
9 checks passed
@dachengx dachengx deleted the update_som_classifcation branch December 21, 2023 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants