gCNV WGS cohort germline WDL run outputs vcf files with names that do not correspond to the actual sample inside the file. #5217

Closed
vruano opened this issue Sep 25, 2018 · 10 comments

Comments

@vruano
Contributor

vruano commented Sep 25, 2018

I used the latest version of the WDL with the latest master commit for the docker, and the names of the output files seem to be a random permutation of the actual sample names. The name inside each file (i.e., the one listed in the #CHROM line) seems to be correct.

A previous run using an earlier version of the WDL (probably the one just before the last update) and the latest official gatk docker didn't have this issue.
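
One way to confirm the mismatch described above is to compare the sample column of each VCF's #CHROM line with the sample implied by the file name. The following is a minimal sketch, not part of the GATK tooling; the `genotyped-segments-` prefix, the `.vcf` suffix, and the function names are assumptions made for illustration.

```python
def vcf_sample_names(lines):
    """Return the sample columns from the #CHROM header line of a VCF."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first nine columns are fixed (CHROM through FORMAT); samples follow.
            return fields[9:]
    raise ValueError("no #CHROM header line found")


def sample_from_file_name(file_name, prefix="genotyped-segments-", suffix=".vcf"):
    """Recover the sample name from a file name (hypothetical naming scheme)."""
    if not (file_name.startswith(prefix) and file_name.endswith(suffix)):
        raise ValueError("unexpected file name: " + file_name)
    return file_name[len(prefix):len(file_name) - len(suffix)]


def name_matches_content(file_name, lines):
    """True if the file name agrees with the single sample named in the header."""
    return vcf_sample_names(lines) == [sample_from_file_name(file_name)]


# Illustrative header only; a real VCF has more metadata lines and records.
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_B\n",
]
print(name_matches_content("genotyped-segments-SAMPLE_A.vcf", header))  # mismatch
print(name_matches_content("genotyped-segments-SAMPLE_B.vcf", header))  # match
```

For gzipped outputs, the same functions can be fed the lines from `gzip.open(path, "rt")`.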

@samuelklee
Contributor

Thanks for catching this, @vruano. @asmirnov239 and I will take a look.

@samuelklee
Contributor

@vruano could you point us to the run results, either here or on Slack?

@samuelklee
Contributor

@asmirnov239 I just noticed you changed the call output in the GermlineCNVCaller tasks from a single tar containing all samples to an array of tars (one for each sample): Array[File] gcnv_call_tars = glob("*-gcnv-calls.tar.gz"). Could this be the cause? Also, wouldn't this be an issue in case mode, as well?

@vruano
Contributor Author

vruano commented Sep 25, 2018

The folder containing all the outputs (scripts and staged inputs) in the cloud is in https://console.cloud.google.com/storage/browser/fc-6da0ee01-0ec0-4606-b918-216fe3f7f098/ecdb3b72-7b4b-4612-9c87-1c0124f62708?pli=1

You should be able to browse down to each task's sub-folder.

@samuelklee
Contributor

Thanks, @vruano, but I think you'll have to give me access to the FC workspace and bucket.

I've opened a branch sl_revert_glob that I think will fix the issue. I'll test it out on FC and let you know how it goes.

@asmirnov239
Collaborator

asmirnov239 commented Sep 25, 2018

@vruano Thanks for catching this! I can take a look at it for you. I don't have permission to access the bucket you mentioned, though. It says: "You need the storage.objects.list permission to list objects in this bucket. Ask a project or bucket owner to give you this permission and try again."

@slee I think you're right, it's probably because of the order in which glob captures the files, which I believe is lexicographical.
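
To illustrate the suspected cause: if per-sample tars are collected in lexicographic order but later paired with samples by array position, any index of two or more digits shifts the pairing. The sketch below assumes hypothetical file names of the form `SAMPLE_<index>-gcnv-calls.tar.gz`; the real naming in the WDL may differ.

```python
import re

# Hypothetical per-sample tar names, one per sample index 0..11.
tar_names = ["SAMPLE_%d-gcnv-calls.tar.gz" % i for i in range(12)]

# Lexicographic order compares character by character, so "10" sorts before "2".
lexicographic = sorted(tar_names)


def sample_index(name):
    """Extract the integer sample index from a hypothetical tar name."""
    return int(re.search(r"SAMPLE_(\d+)-", name).group(1))


# Sorting on the parsed integer restores the intended order.
numeric = sorted(tar_names, key=sample_index)

# Array positions where pairing by position would pick the wrong sample.
mismatched = [i for i, name in enumerate(lexicographic) if sample_index(name) != i]

print(lexicographic[:5])
print(mismatched)
```

Under this naming assumption, positions 0 and 1 still line up, which would explain why the names look like a permutation and not a uniform shift, and why a run with ten or fewer samples would not show the problem.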

@vruano
Contributor Author

vruano commented Sep 25, 2018

I'll see what I can do about the permission issue, but it's probably not up to me.

@vruano
Contributor Author

vruano commented Sep 25, 2018

@cwhelan could you give Andrei and Sam access to the FC dsde-methods-sv-dev workspace so they can investigate the bug? Steve and I tried without success.

@samuelklee
Contributor

samuelklee commented Sep 25, 2018

Thanks, we have access now. I'm pretty sure that sl_revert_glob will fix the error. I've rebased my dev branch sl_filter (which includes the filtering steps Jack mentioned in the BSV meeting today) onto sl_revert and am testing cohort mode on FC now. I'll try to test scattered-case mode as well later today if that succeeds.

As @asmirnov239 pointed out to me, this revert leaves #4397 unresolved, so we should go back and clean up at some point. However, our priority now is to get a stable v1 of the SFARI evaluation on FC.
