This repository has been archived by the owner on Jun 16, 2023. It is now read-only.

Updated analysis: optimize run times of the cnv-frequencies module #120

Closed
ewafula opened this issue Jul 20, 2021 · 1 comment

ewafula commented Jul 20, 2021

What analysis module should be updated and why?

Optimize the cnv-frequencies module to improve run times. It currently takes 2 hrs to compute and annotate the cancer_group and cancer_group_cohort gene-level variant frequencies for the autosomes CNV consensus results.

What changes need to be made? Please provide enough detail for another participant to make the update.

Reimplement the slowest portions of the code to be more efficient, mainly the compute_variant_frequencies function, and test the resulting run times.
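One way to test run times before and after a reimplementation is to wrap the call in a small timing helper. The sketch below is only an illustration: `timed` and `slow_sum` are hypothetical names, and `slow_sum` stands in for compute_variant_frequencies, whose actual signature is not shown in this issue.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_sum(n: int) -> int:
    # Hypothetical stand-in for the function being optimized.
    total = 0
    for i in range(n):
        total += i
    return total


result, seconds = timed(slow_sum, 100_000)
print(f"slow_sum took {seconds:.4f} s")
```

Timing the old and new implementations on the same input, and checking that both return identical results, gives a like-for-like comparison.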

What input data should be used? Which data were used in the version being updated?

  • autosomes/allosomes CNV consensus files
  • histology file
  • primary/relapse independents sample files
  • gene and disease annotations files

When do you expect the revised analysis will be completed?

1-2 days

Who will complete the updated analysis?

@ewafula

cc @logstar @jharenza

logstar commented Jul 21, 2021

Run time optimized. Down from 2.38333 hrs (143 min) to 0.65278 hrs (39.167 minutes) using named tuples. Reference issue ticket: PediatricOpenTargets/ticket-tracker#120. Is this acceptable?
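The comment does not show how named tuples were used, so the following is only a minimal sketch of the general pattern: represent each CNV call as a lightweight named tuple and count distinct altered samples per gene in a single pass. The record type `CnvCall`, its fields, and the toy data are assumptions for illustration, not the module's actual code.

```python
from collections import namedtuple

# Hypothetical record layout; the real column names may differ.
CnvCall = namedtuple("CnvCall", ["gene", "sample_id", "status"])

calls = [
    CnvCall("MYCN", "S1", "amplification"),
    CnvCall("MYCN", "S2", "amplification"),
    CnvCall("TP53", "S1", "loss"),
    CnvCall("TP53", "S3", "loss"),
]


def count_altered_samples(records):
    """Count distinct altered samples per (gene, status) in one pass."""
    samples = {}
    for rec in records:
        # Attribute access on a named tuple is cheap and readable.
        samples.setdefault((rec.gene, rec.status), set()).add(rec.sample_id)
    return {key: len(ids) for key, ids in samples.items()}


counts = count_altered_samples(calls)
total_samples = len({rec.sample_id for rec in calls})
frequencies = {key: n / total_samples for key, n in counts.items()}
```

If the data live in a pandas DataFrame, `DataFrame.itertuples()` yields rows as named tuples of this kind, which is generally faster than iterating with `iterrows()`.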

I will start working on: PediatricOpenTargets/ticket-tracker#124

cc @logstar, @jharenza

Source: d3b-center/OpenPedCan-analysis#52 (comment)

@logstar, the annotation function is going to be replaced anyway, so there is no need to optimize it further. All the changes were in the function.

On Wed, Jul 21, 2021 at 4:12 PM Yuanchao Zhang wrote:

Thank you for fixing the errors, @ewafula! The results are now identical to the previously uploaded ones. However, the run time is now about 108 minutes.

    $ time bash analyses/cnv-frequencies/run-cnv-frequencies-analysis.sh
    real 108m58.665s
    user 108m55.404s
    sys 0m4.508s

@jharenza, I wonder if the frequencies and other parts of the results look good. I think this PR is ready for merging. Regarding the run time, we could discuss further at PediatricOpenTargets/ticket-tracker#120.

I agree. The annotation function needs no further optimization, as it will be replaced by the upcoming annotation module. All other parts should run within 40 minutes, so they are also good. I will close the optimization ticket PediatricOpenTargets/ticket-tracker#120.

Source: d3b-center/OpenPedCan-analysis#52 (comment)

@logstar logstar closed this as completed Jul 21, 2021