-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "noise removal" feature selection #153
Conversation
This adds the noise removal feature selection. Prior to using this step, the data must be normalized first. The operation works by repeating the following calculation on each feature:
|
Example of this in action: df = pd.DataFrame( Suppose the first 3 are replicates of perturbation a and the last 3 are of perturbation b Run feature selection Note that features can be inferred, but you must always provide the perturbation list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruifanp I am looking forward to using this approach for the JUMP datasets. I have made a few suggestions. I hope they make sense and are useful to you.
I am also tagging @gwaygenomics to take a look at this PR. His reviews are legendary 😄 and one can learn a lot from his suggestions.
@ruifanp - amazing to have this contribution! I will provide my review once you address @niranjchandrasekaran 's comments. BTW, in the future, I don't think that both of us will need to review. Certainly supplemental comments can be helpful, but our CONTRIBUTING document only specifies one required maintainer acceptance:
|
I'll run the tests now, to see if anything obvious needs fixing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for this contribution @ruifanp - it's especially awesome considering JUMP is likely to benefit from it immediately.
My two main comments are:
- An enhancement to allow a user to also specify a metadata column
- A clarification on what you're testing for with
data_unique_test_df
- is this different than what you're testing for earlier in that function? If it is different, then I think you need to at the very least add comments on what each line is doing - very hard to read!
Co-authored-by: Gregory Way <gregory.way@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One required simplification, one question, one typo fix.
Very close!
Co-authored-by: Gregory Way <gregory.way@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #153 +/- ##
==========================================
- Coverage 98.09% 98.05% -0.04%
==========================================
Files 49 50 +1
Lines 2253 2313 +60
==========================================
+ Hits 2210 2268 +58
- Misses 43 45 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @ruifanp!
with one maintainer approval, it looks like this can be merged! Niranj, are you all set to merge Ruifan's contribution? (sometimes it makes sense to hold off a couple days, but I am not sure that is the case here) |
Yes, will merge now. |
modified from EmbeddedArtistry
Description
Added noise removal feature selection. Demonstrated to make features more informative in multiple sets.
What is the nature of your change?
Checklist