Deep sequencing #62
Comments
Hi KBT59, |
Hi again, Thanks! Let us know how it goes. |
Thanks, I’m giving that a try today. vsc_min_count_snps and vsc_min_count_indels are already small numbers (2), so I changed only the fraction flags from their default, 0.12, to 0.01, which is the fraction I want. Is that reasonable?
Brad Thomas
From: Pi-Chuan Chang [mailto:notifications@github.com]
Sent: Thursday, April 5, 2018 6:56 PM
To: google/deepvariant <deepvariant@noreply.github.com>
Cc: Brad Thomas <brad.thomas@neogenomics.com>; Author <author@noreply.github.com>
Subject: [EXTERNAL]Re: [google/deepvariant] Deep sequencing (#62)
CAUTION: This email originated from outside the organization. DO NOT click links or open attachments unless you recognize the sender and know the content is safe.
Hi again,
I didn't read carefully so I missed that you said you want to train a model.
If you want to get make_examples to create more candidates, the other flags you need to consider are: vsc_min_count_snps, vsc_min_count_indels, vsc_min_fraction_snps, vsc_min_fraction_indels. With the default values of these flags for VSC (Very Sensitive Caller), you simply won't be able to even get candidates generated for low allele fraction variants. So I would suggest playing around with those flags and see if more candidates come out.
Thanks! Let us know how it goes.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#62 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AifcqU5J11c7Zr-VYS_8CjFPh-UF6VIYks5tlq76gaJpZM4TIm9R>.
This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secured or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. NeoGenomics Laboratories, Suite 5, 12701 Commonwealth Dr, Fort Myers, FL 33913, http://www.neogenomics.com (2017)
|
Lowering the fractions makes sense. Since you're doing something very experimental, you'll need to look at your own metrics to see what threshold makes sense. I think you'll want to confirm that your new setting gives you enough sensitivity, because if something is not picked up by the Very Sensitive Caller, it won't be called later on. |
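To make the effect of the fraction flags concrete, here is a minimal sketch of the kind of check being discussed. It assumes a candidate needs both a minimum number of supporting reads and a minimum allele fraction; the function name and the example site are hypothetical, and this is a simplification rather than DeepVariant's actual code. The flag names and the values 2, 0.12, and 0.01 are the ones mentioned in this thread.

```python
def passes_vsc_thresholds(alt_count, depth, min_count, min_fraction):
    """Simplified candidate test (an assumption, not DeepVariant's code):
    enough supporting reads AND a high enough allele fraction."""
    if depth <= 0:
        return False
    return alt_count >= min_count and (alt_count / depth) >= min_fraction


# Hypothetical site: 20 supporting reads at 1500x depth (about 1.3% allele fraction).
alt_count, depth = 20, 1500

# With the default fraction of 0.12 the site is rejected.
default_result = passes_vsc_thresholds(alt_count, depth, min_count=2, min_fraction=0.12)
# With the fraction lowered to 0.01 the same site is accepted.
lowered_result = passes_vsc_thresholds(alt_count, depth, min_count=2, min_fraction=0.01)

# The four flags named in this thread, written as command-line arguments.
make_examples_flags = [
    "--vsc_min_count_snps=2",
    "--vsc_min_count_indels=2",
    "--vsc_min_fraction_snps=0.01",
    "--vsc_min_fraction_indels=0.01",
]

print(default_result, lowered_result)
```

Under these assumptions, a 1% variant at this depth can never clear a 0.12 fraction threshold no matter how many reads support it, which is why lowering the fraction flags matters more than the count flags here.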
Hello,
With make_examples I believe I have made examples I can use in model_train. The 64 files are named like this: 5PRR-RD_S86.examples.tfrecord-00000-of-00064
My protobuffer file contains this:
name: "my-training-dataset"
tfrecord_path: "/home2/myModelAttempt/output/5PRR-RD_S86.examples.tfrecord"
num_examples: 64
When I run model_train I see this error:
ValueError: Cannot find matching files with the pattern "/home2/myModelAttempt/output/5PRR-RD_S86.examples.tfrecord"
How should I specify the tfrecord_path to get model_train to use the files?
Brad Thomas
|
I think you'll want: tfrecord_path: "/home2/myModelAttempt/output/5PRR-RD_S86.examples.tfrecord-?????-of-00064" |
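The difference between the bare path and the sharded pattern suggested in this thread (ending in `-?????-of-00064`) can be checked offline. The sketch below uses Python's `fnmatch` as a stand-in for whatever glob matching model_train performs, which is an assumption; the shard names are generated from the naming scheme quoted above.

```python
import fnmatch

# The 64 shard names, following the naming scheme quoted in the thread.
shards = [
    "5PRR-RD_S86.examples.tfrecord-{:05d}-of-00064".format(i) for i in range(64)
]

bare_path = "5PRR-RD_S86.examples.tfrecord"
sharded_pattern = bare_path + "-?????-of-00064"


def count_matches(pattern, names):
    """Count names matching a glob-style pattern ('?' matches one character)."""
    return len(fnmatch.filter(names, pattern))


# The bare path matches no shard, because every file has a shard suffix.
print(count_matches(bare_path, shards))
# Each '?' matches one digit of the shard index, so all shards match.
print(count_matches(sharded_pattern, shards))
```

A pattern that matches zero files is consistent with the "Cannot find matching files with the pattern" error reported above.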
OK – that proceeded further, I think. Now the error is
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 27 for 'InceptionV3/Logits/SpatialSqueeze' (op: 'Squeeze') with input shapes: [64,27,1,3]
I hate to keep bothering people about this. Is there documentation on all of this that I can refer to?
Thanks,
Brad Thomas
From: Pi-Chuan Chang [mailto:notifications@github.com]
Sent: Tuesday, April 10, 2018 1:04 PM
To: google/deepvariant <deepvariant@noreply.github.com>
Cc: Brad Thomas <brad.thomas@neogenomics.com>; Author <author@noreply.github.com>
Subject: [EXTERNAL]Re: [google/deepvariant] Deep sequencing (#62)
CAUTION: This email originated from outside the organization. DO NOT click links or open attachments unless you recognize the sender and know the content is safe.
I think you'll want:
tfrecord_path: "/home2/myModelAttempt/output/5PRR-RD_S86.examples.tfrecord-?????-of-00064"
|
Hi,
From a quick look at your error, it doesn't look like anything I've ever
encountered before. If you could set up a reproducible setting
that I can very quickly run, I can try it out and tell you
what might have gone wrong. We don't currently have a tutorial for
training, unfortunately. And to be honest, even if we did, it probably
wouldn't specifically cover this error case.
(from my phone)
|
Hello,
Unfortunately the data I’m using are restricted by Federal regulations and also are proprietary. Apart from sharing data, what can I provide that might help figure this out?
Brad Thomas
From: Pi-Chuan Chang [mailto:notifications@github.com]
Sent: Thursday, April 12, 2018 3:34 PM
To: google/deepvariant <deepvariant@noreply.github.com>
Cc: Brad Thomas <brad.thomas@neogenomics.com>; Author <author@noreply.github.com>
Subject: [EXTERNAL]Re: [google/deepvariant] Deep sequencing (#62)
|
Hi, From earlier discussions, it sounds like the main thing you're changing about the data representation is the pileup_image_height. You can actually do the same thing on the QuickStart or CaseStudy data too. It will just look like a taller image with the bottom being mostly empty. And then, I suspect there's a high probability that you can get the same error on the CaseStudy data if you follow the same steps. Once you're able to do that, post every step (similar to QuickStart and CaseStudy) here, and note the place where you get the error. |
I’m generating a set from the GIAB exome data as you described. I’ll see what happens with it when I try to train with it.
Brad Thomas
From: Pi-Chuan Chang [mailto:notifications@github.com]
Sent: Friday, April 13, 2018 10:14 AM
To: google/deepvariant <deepvariant@noreply.github.com>
Cc: Brad Thomas <brad.thomas@neogenomics.com>; Author <author@noreply.github.com>
Subject: [EXTERNAL]Re: [google/deepvariant] Deep sequencing (#62)
CAUTION: This email originated from outside the organization. DO NOT click links or open attachments unless you recognize the sender and know the content is safe.
Hi,
Originally I was thinking a small/synthetic dataset could be subsampled from your data. I actually don't want the full data anyway (that wouldn't really be a small thing I can try). I understand if you can't even subsample from that.
How about at least posting the commands you used?
From earlier discussions, it sounds like the main thing you're changing about the data representation is the pileup_image_height. You can actually do the same thing on the QuickStart or CaseStudy data too. It will just look like a taller image with the bottom being mostly empty.
(You can use logic like this https://github.com/google/deepvariant/blob/r0.6/docs/visualizing_examples.ipynb to visualize them)
And then, I suspect there's a high probability that you can get the same error on the CaseStudy data if you follow the same steps.
Once you're able to do that, post every step (similar to QuickStart and CaseStudy) here, and note the place where you get the error.
|
Hello,
I did make 64 examples from the GIAB exome data mentioned on the GitHub site. I encountered the same problem I mentioned. I’ve attached an archive, bundle.zip, that has the important files. The file nohup.out shows what was returned when I ran model_train from the command line. Examples were made using the shell script in the bundle: testModeExamples.sh. I’ve included the two Python scripts I’ve altered for my deep sequencing project.
I appreciate your help. Let me know if there is more I should provide.
Thank you,
Brad Thomas
|
Hi, I'm not seeing the zip file. |
Here is the zip file.
Brad Thomas
From: Pi-Chuan Chang [mailto:notifications@github.com]
Sent: Saturday, April 14, 2018 12:09 AM
To: google/deepvariant <deepvariant@noreply.github.com>
Cc: Brad Thomas <brad.thomas@neogenomics.com>; Author <author@noreply.github.com>
Subject: [EXTERNAL]Re: [google/deepvariant] Deep sequencing (#62)
CAUTION: This email originated from outside the organization. DO NOT click links or open attachments unless you recognize the sender and know the content is safe.
Hi,
I'm not seeing the zip file.
|
Hello,
Did you receive the attachment I resent on 4/16? Also, any thoughts on the error I was seeing?
Thank you and best regards,
Brad Thomas
|
Hi Brad, Sometimes SMTP (email) servers block zip files. Just put it on Google Drive or Dropbox and share the link to it. ~p |
Frustrating. We are blocked from using Google Drive or Dropbox. I will send the file from home.
Thanks,
Brad Thomas
From: Paul Grosu [mailto:notifications@github.com]
Sent: Tuesday, May 1, 2018 10:50 AM
To: google/deepvariant <deepvariant@noreply.github.com>
Cc: Brad Thomas <brad.thomas@neogenomics.com>; Author <author@noreply.github.com>
Subject: [EXTERNAL]Re: [google/deepvariant] Deep sequencing (#62)
|
If they are small, you might attach them individually, directly in GitHub, as shown here: https://blog.github.com/2015-09-25-attach-files-to-comments/ |
These are the files I mentioned above. |
Hi, |
Update: |
I've figured out what's going on here and have some good news and bad news.
First, the bad news is that setting the height to 2000 isn't going to work in the short run. This is a limitation coming from inception_v3 itself. At such large image sizes, we would have to run with spatial_squeeze=False to avoid this exception. By doing so we'd essentially end up with a "tile" of deepvariant predictions every 64 rows in the image, and then have to pool them together somehow, which makes sense in the general object detection case but not for us in DeepVariant.
The good news is that the maximum supported depth is 362. So you can get a lot more information into your images than the default 100 value. Give 362 a try and let us know if that works.
I should point out that we use a reservoir sampler to create these images. So a height of 362 means you'll get a random sampling of 362 - 5 [for the reference] reads from your very deep sequencing. It's not ideal if you want to detect things occurring in only 1 or 2 reads, but you get a reasonable number of reads if you are looking for things >1% or so frequency in the reads. Hope that helps! Mark |
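Both points in this explanation can be sketched in a few lines of Python. The first function estimates the height of the feature map that reaches the squeeze op, assuming the standard TF-Slim Inception v3 layout (unpadded 3x3 convolutions and pools, with a final average pool of kernel 8 and stride 2). The layer names in the comments are recalled from that definition and should be treated as assumptions. The second function is a generic textbook reservoir sampler (Algorithm R), not DeepVariant's implementation, and the read names are hypothetical.

```python
import random


def valid_out(n, kernel, stride):
    """Output length of an unpadded ('VALID') convolution or pooling layer."""
    return (n - kernel) // stride + 1


def logits_height(image_height):
    """Height reaching the squeeze op, assuming the TF-Slim Inception v3 layout."""
    h = valid_out(image_height, 3, 2)  # Conv2d_1a_3x3, stride 2
    h = valid_out(h, 3, 1)             # Conv2d_2a_3x3 (Conv2d_2b is padded)
    h = valid_out(h, 3, 2)             # MaxPool_3a_3x3, stride 2
    h = valid_out(h, 3, 1)             # Conv2d_4a_3x3 (Conv2d_3b is 1x1)
    h = valid_out(h, 3, 2)             # MaxPool_5a_3x3, stride 2
    h = valid_out(h, 3, 2)             # Mixed_6a reduction, stride 2
    h = valid_out(h, 3, 2)             # Mixed_7a reduction, stride 2
    kernel = min(h, 8)                 # pool kernel shrinks for small inputs
    return valid_out(h, kernel, 2)     # final average pool, stride 2


def reservoir_sample(items, k, rng):
    """Algorithm R: keep a uniform random sample of k items from a stream."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


print(logits_height(2000), logits_height(362), logits_height(363))

# 362 rows minus 5 reference rows leaves 357 rows for reads (figures from above).
reads = ["read_{}".format(i) for i in range(1500)]  # hypothetical 1500x pileup
sampled = reservoir_sample(reads, 362 - 5, random.Random(0))
print(len(sampled))
```

Under these assumptions a height of 2000 leaves 27 rows at the squeeze op, matching the `[64,27,1,3]` shape in the error, while 362 is the largest height that still reduces to a single row (363 gives 2). With 357 sampled reads, a variant at 1% frequency is expected to appear in only about 3 or 4 of them.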
I am interested in training DeepVariant for deep sequencing on a capture panel. We are interested in lower frequency variants - say 1%. Depth of coverage is on the order of 1000 to 1700 for the data I am using. I have set the default height of the pileup tensors to 2000 via
https://github.com/google/deepvariant/blob/r0.5/deepvariant/make_examples.py#L177
In a set with 497 confirmed 'true' variants I'm getting a much smaller number of variants out of make_examples:
I0404 17:02:18.420840 140137671104256 make_examples.py:1032] Found 487 candidate variants
I0404 17:02:18.421224 140137671104256 make_examples.py:620] ----- VariantCounts -----
I0404 17:02:18.421346 140137671104256 make_examples.py:624] All: 29/29 (100.00%)
I0404 17:02:18.421475 140137671104256 make_examples.py:624] SNPs: 27/29 (93.10%)
I0404 17:02:18.421593 140137671104256 make_examples.py:624] Indels: 2/29 (6.90%)
I0404 17:02:18.421717 140137671104256 make_examples.py:624] BiAllelic: 29/29 (100.00%)
I0404 17:02:18.421834 140137671104256 make_examples.py:624] MultiAllelic: 0/29 (0.00%)
I0404 17:02:18.421953 140137671104256 make_examples.py:624] HomRef: 28/29 (96.55%)
I0404 17:02:18.422069 140137671104256 make_examples.py:624] Het: 1/29 (3.45%)
What, besides setting the pileup height to match my data, should I be looking at?