Tentative implementation for adjusting of confounding factors using edgeR #2539

Draft
wants to merge 8 commits into
base: master

robinpaul85
Contributor

@robinpaul85 robinpaul85 commented Dec 20, 2024

Draft PR: This is a work in progress. Please DO NOT merge without review from all reviewers listed

Note: Please add features.run_parametricDE in serverconfig.json for testing this feature.

Tentative implementation for adjusting of confounding factors using edgeR

Caveats:

  1. Currently, adjustment is supported for only one confounding factor. Adjustment for multiple confounding factors will be added in the future.

  2. Adjusting for confounding factors is even slower than plain edgeR (which is already significantly slower than the Wilcoxon method). For example, with group 1 sample size = 259 and group 2 sample size = 539, it takes about 3 min without adjustment and about 15 min with adjustment for confounding factors.

  3. The current implementation seems to work only for simple confounding factors for now. Will also work on this.
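
For context on the first caveat: in a model-based method such as edgeR, adjusting for a confounder generally means adding it as extra columns of the design matrix next to the group indicator. The following is a minimal, library-free Python sketch of that idea only; it is not the code in this PR, and the function name `build_design_matrix` and the sample values are hypothetical.

```python
def build_design_matrix(groups, confounder):
    """Return (column_names, rows) for the model ~ confounder + group.

    groups:     group label per sample (exactly two distinct labels)
    confounder: categorical confounder value per sample
    Treatment coding is used: the first sorted level of each factor
    is the baseline and gets no column of its own.
    """
    if len(groups) != len(confounder):
        raise ValueError("groups and confounder need one entry per sample")
    group_levels = sorted(set(groups))
    if len(group_levels) != 2:
        raise ValueError("expected exactly two groups")
    conf_levels = sorted(set(confounder))

    columns = ["intercept"]
    columns += [f"confounder_{lvl}" for lvl in conf_levels[1:]]
    columns.append(f"group_{group_levels[1]}")

    rows = []
    for g, c in zip(groups, confounder):
        row = [1]  # intercept
        row += [1 if c == lvl else 0 for lvl in conf_levels[1:]]
        row.append(1 if g == group_levels[1] else 0)  # coefficient of interest
        rows.append(row)
    return columns, rows


# Hypothetical four-sample example with a two-level confounder.
columns, rows = build_design_matrix(
    ["sensitive", "sensitive", "resistant", "resistant"],
    ["A", "B", "A", "B"],
)
```

The last column carries the group effect being tested. Each additional confounder would add further columns, so there are more coefficients to estimate per gene.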

Checklist

Check each task that has been performed or verified to be not applicable.

  • Tests: added and/or passed unit and integration tests, or N/A
  • Todos: commented or documented, or N/A
  • Notable Changes: updated release.txt, prefixed a commit message with "fix:" or "feat:", added to an internal tracking document, or N/A

@robinpaul85 robinpaul85 marked this pull request as draft December 20, 2024 19:50
@karishma-gangwani
Contributor

Hi @robinpaul85, thanks for this PR.

  1. When running covariate adjustment, we should run it only with edgeR and not with Wilcoxon (a non-parametric method). Can you make this change in the UI so we don't end up seeing type I errors? For Wilcoxon, just normalizing for batch effects should be enough.
  2. In the UI, why is the default fc now 2 again? I believe we had made it 0 in the previous PR. Can you check this and fix it?
  3. When I switch fc to 0, then switch the method to edgeR and select 'Molecular subtypes' as the covariate, nothing loads. I don't see any client- or server-side errors, using the same example of 259 vs 539 samples for sensitive vs resistant.

I will continue to test.

@robinpaul85
Contributor Author

robinpaul85 commented Dec 31, 2024 via email

@karishma-gangwani
Contributor

Can you test the branch without using a sessions file? I believe then default fc will be 0.

Okay, that seems to work, so 2. and 3. are fine. Can you look into 1.?
Also, another thing we are not handling currently is groups created with overlapping samples. We have no way to run the DE analysis after excluding the overlapping samples from the two groups. We should think of a way to do that and enable the analysis for the non-overlapping samples. I can make a note of it for another PR; let me know. For the overlapping samples, we might have to enable some other kinds of analyses.
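
A minimal sketch of the exclusion step described above, assuming each group is simply a list of sample identifiers. The function name `split_overlap` and the sample IDs are hypothetical, and where such a step would live (the DE app or upstream group creation) is left open.

```python
def split_overlap(group1, group2):
    """Split two sample lists into (only_in_1, only_in_2, shared).

    Input order is preserved for the non-overlapping samples; the
    shared samples are returned sorted so they can be reported or
    routed to a different kind of analysis.
    """
    shared = set(group1) & set(group2)
    only1 = [s for s in group1 if s not in shared]
    only2 = [s for s in group2 if s not in shared]
    return only1, only2, sorted(shared)


# Hypothetical sample IDs; "s3" belongs to both groups.
only1, only2, shared = split_overlap(["s1", "s2", "s3"], ["s3", "s4"])
```

The DE analysis would then run on `only1` versus `only2`, while `shared` stays available for whatever follow-up is chosen for overlapping samples.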

@robinpaul85
Contributor Author

robinpaul85 commented Jan 1, 2025 via email

@karishma-gangwani
Contributor

I think that creation of DE groups has nothing to do with the DE app itself. The DE app only requires two groups. If you want to remove overlap it would have to be handled by some upstream code.

Yes, that is fine. This is not important for now; we can discuss the overlapping samples later. But you should make the fix so that the adjustment runs only with edgeR: when a covariate is selected, the Wilcoxon button should be hidden (for now), since this adjustment is only applicable to edgeR, until we have enabled batch correction or other adjustment models suitable for Wilcoxon.
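
The requested UI rule can be stated as a small pure function. This is an illustrative Python sketch of the logic only, not the client code in this branch; the function name `available_methods` and the method labels are assumptions.

```python
def available_methods(covariates):
    """Return the DE methods the UI should offer.

    Covariate adjustment applies only to edgeR, so the Wilcoxon
    option is hidden whenever at least one covariate is selected.
    """
    if covariates:
        return ["edgeR"]
    return ["wilcoxon", "edgeR"]


no_covariate = available_methods([])
with_covariate = available_methods(["Molecular subtypes"])
```

If an adjustment suitable for Wilcoxon is added later, only this one rule would need to change.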

@karishma-gangwani
Contributor

Also, it takes a significant amount of time to run the DE analysis after adding the variable. Is there a way to speed up the process? It shows 'Loading...' but looks frozen.

@robinpaul85
Contributor Author

> also, it takes a significant amount of time to run the DE analysis after adding the variable. Is there a way to speed up the process? It shows 'Loading...' but looks like it is frozen.

Not that I can think of at the moment. That is why I had suggested precomputing earlier. I can try benchmarking with DESeq2 to see how long it takes.

@karishma-gangwani
Contributor

> > also, it takes a significant amount of time to run the DE analysis after adding the variable. Is there a way to speed up the process? It shows 'Loading...' but looks like it is frozen.
>
> Not that I can think of at the moment. That is why I had suggested precomputing earlier. I can try benchmarking with DESeq2 to see how fast it takes.

You could run DESeq2 and see (preferably in a separate branch). And yes, precomputing may be a better idea if running it on the fly is going to take this long. We can discuss it with Xin. Can you look into batch correction for Wilcoxon? We can test that in this PR as well.
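
One way to picture the precomputing idea is a result cache keyed by the analysis inputs, so a repeated request does not trigger another multi-minute run. The sketch below is a hedged illustration in Python using only the standard library. `de_cache_key`, `DECache`, and `fake_de_run` are hypothetical names, and the in-memory dictionary stands in for whatever storage would actually be used.

```python
import hashlib
import json


def de_cache_key(group1, group2, method, covariates):
    """Build a stable key from the inputs that determine a DE result.

    Sample order within a group does not matter, so lists are sorted.
    group1 and group2 stay separate because swapping them flips the
    direction of the fold change.
    """
    payload = {
        "group1": sorted(group1),
        "group2": sorted(group2),
        "method": method,
        "covariates": sorted(covariates),
    }
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class DECache:
    """In-memory cache; a real deployment would likely persist results."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, group1, group2, method, covariates, compute):
        key = de_cache_key(group1, group2, method, covariates)
        if key not in self._store:
            self._store[key] = compute()  # the slow DE run happens only here
        return self._store[key]


calls = []


def fake_de_run():
    """Stand-in for the slow analysis; records how often it is invoked."""
    calls.append(1)
    return {"status": "done"}


cache = DECache()
first = cache.get_or_compute(["s1", "s2"], ["s3"], "edgeR", ["Molecular subtypes"], fake_de_run)
second = cache.get_or_compute(["s2", "s1"], ["s3"], "edgeR", ["Molecular subtypes"], fake_de_run)
```

Here the second request reuses the stored result even though the samples arrive in a different order. Changing the method or the covariates produces a different key and therefore a fresh run.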
