Use signal_fraction for training particle classifier #2465
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #2465 +/- ##
=======================================
Coverage 92.53% 92.53%
=======================================
Files 235 235
Lines 20024 20062 +38
=======================================
+ Hits 18529 18565 +36
- Misses 1495 1497 +2
I'm not sure why this provenance test is failing right now. It works on my machine...
Edit: Fixed by #2469
# - [type, "LST*", 1000]
# - [type, "MST*", 1000]
# - [type, "MST*", 1000] # If not specified, as many events as possible are used.
signal_fraction: 0.5 # signal_fraction = n_signal / n_events
I would find it more intuitive I think to define the ratio?
I.e. 1.0 means use as much signal as background?
I think signal fraction being how much signal is in the total is pretty intuitive.
I think there are merits to both options:
- signal fraction in combination with `n_events` makes it a little bit more intuitive how many signal and how many background events are used, but we write out this information in the logs anyway.
- signal ratio is maybe a bit closer to actual use cases, e.g. "Let's use twice as many signal events as background events".
I don't have a personal preference.
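For reference, the two conventions discussed here carry the same information and can be converted into each other. A minimal sketch, assuming "signal ratio" means n_signal / n_background; the helper names `fraction_to_ratio` and `ratio_to_fraction` are hypothetical and not part of ctapipe:

```python
def fraction_to_ratio(signal_fraction: float) -> float:
    """Convert n_s / (n_s + n_b) into n_s / n_b (hypothetical helper)."""
    if not 0 <= signal_fraction < 1:
        raise ValueError("signal_fraction must be in [0, 1)")
    return signal_fraction / (1 - signal_fraction)


def ratio_to_fraction(signal_ratio: float) -> float:
    """Convert n_s / n_b into n_s / (n_s + n_b) (hypothetical helper)."""
    if signal_ratio < 0:
        raise ValueError("signal_ratio must be non-negative")
    return signal_ratio / (1 + signal_ratio)


# "As much signal as background" is a ratio of 1.0, i.e. a fraction of 0.5.
print(fraction_to_ratio(0.5))
# "Twice as many signal events as background events" is a ratio of 2.0.
print(ratio_to_fraction(2.0))
```

So a ratio of 1.0 corresponds to `signal_fraction: 0.5`, and the choice between the two is mostly about which number is easier to read in a config file.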
This removes the `n_signal` and `n_background` options of `ctapipe-train-particle-classifier`. Instead the total number of training events `n_events` and the `signal_fraction` can be chosen, where
$$\texttt{signal\_fraction}= \frac{n_s}{n_s + n_b}.$$
If `n_events` is not specified, as many events as possible will be used considering the given `signal_fraction`.
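To illustrate how the two options interact, here is a rough sketch of the arithmetic described above. It is not the code from this PR: the function name `split_event_counts`, its arguments, and the rounding choice are assumptions made for illustration only.

```python
def split_event_counts(n_signal_available, n_background_available,
                       signal_fraction, n_events=None):
    """Return (n_signal, n_background) for a given signal_fraction.

    Hypothetical helper; the rounding behaviour is an assumption.
    """
    if not 0 < signal_fraction < 1:
        raise ValueError("signal_fraction must be in (0, 1)")

    if n_events is None:
        # Use as many events as possible: the total is limited by whichever
        # class runs out first at the requested fraction.
        n_events = int(min(
            n_signal_available / signal_fraction,
            n_background_available / (1 - signal_fraction),
        ))

    n_signal = int(round(n_events * signal_fraction))
    n_background = n_events - n_signal

    if n_signal > n_signal_available or n_background > n_background_available:
        raise ValueError("not enough events available for the requested split")
    return n_signal, n_background


# 1000 signal and 3000 background events available, signal_fraction = 0.5,
# n_events not specified: signal is the limiting class.
print(split_event_counts(1000, 3000, 0.5))
# Explicit n_events = 400 with signal_fraction = 0.25.
print(split_event_counts(1000, 3000, 0.25, n_events=400))
```

In the first call the background sample is only partially used, because taking more background would lower the signal fraction below the requested value.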