
Deep sequencing #62

Closed
KBT59 opened this issue Apr 5, 2018 · 22 comments


@KBT59

KBT59 commented Apr 5, 2018

I am interested in training DeepVariant for deep sequencing on a capture panel. We are interested in lower frequency variants - say 1%. Depth of coverage is on the order of 1000 to 1700 for the data I am using. I have set the default height of the pileup tensors to 2000 via
https://github.com/google/deepvariant/blob/r0.5/deepvariant/make_examples.py#L177

In a set with 497 confirmed 'true' variants I'm getting a much smaller number of variants out of make_examples:

I0404 17:02:18.420840 140137671104256 make_examples.py:1032] Found 487 candidate variants
I0404 17:02:18.421224 140137671104256 make_examples.py:620] ----- VariantCounts -----
I0404 17:02:18.421346 140137671104256 make_examples.py:624] All: 29/29 (100.00%)
I0404 17:02:18.421475 140137671104256 make_examples.py:624] SNPs: 27/29 (93.10%)
I0404 17:02:18.421593 140137671104256 make_examples.py:624] Indels: 2/29 (6.90%)
I0404 17:02:18.421717 140137671104256 make_examples.py:624] BiAllelic: 29/29 (100.00%)
I0404 17:02:18.421834 140137671104256 make_examples.py:624] MultiAllelic: 0/29 (0.00%)
I0404 17:02:18.421953 140137671104256 make_examples.py:624] HomRef: 28/29 (96.55%)
I0404 17:02:18.422069 140137671104256 make_examples.py:624] Het: 1/29 (3.45%)

What, besides setting the pileup height to match my data, should I be looking at?
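As a quick back-of-the-envelope check on the numbers above (depths and allele fraction taken from this comment; a sketch, not DeepVariant code):

```python
# How many reads would support a 1% allele-fraction variant
# at the coverage depths described above?
allele_fraction = 0.01
for depth in (1000, 1700):
    print(depth, round(depth * allele_fraction))  # roughly 10 and 17 supporting reads
```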

@pichuan
Collaborator

pichuan commented Apr 5, 2018

Hi KBT59,
Because the released models are trained with the default pileup image height, just changing --pileup_image_height at inference time won't really give you better results. Currently DeepVariant is a germline variant caller, so it's not designed to call variants at 1% frequency.

@pichuan
Collaborator

pichuan commented Apr 5, 2018

Hi again,
I didn't read carefully so I missed that you said you want to train a model.
If you want make_examples to create more candidates, the other flags to consider are vsc_min_count_snps, vsc_min_count_indels, vsc_min_fraction_snps, and vsc_min_fraction_indels. With the default values of these VSC (Very Sensitive Caller) flags, you simply won't get candidates generated for low-allele-fraction variants at all. So I would suggest experimenting with those flags and seeing if more candidates come out.

Thanks! Let us know how it goes.
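As a hedged sketch, the flags above could be combined into a make_examples invocation like the following; the threshold values and file paths are illustrative placeholders, not recommended settings:

```python
# Sketch: assemble a make_examples command line with lowered candidate
# thresholds. Flag names come from the discussion above; values and
# paths are placeholders chosen for a ~1% allele-fraction target.
make_examples_cmd = [
    "python", "deepvariant/make_examples.py",
    "--mode", "calling",
    "--ref", "reference.fasta",           # placeholder
    "--reads", "sample.bam",              # placeholder
    "--examples", "out.examples.tfrecord",
    "--vsc_min_count_snps", "2",
    "--vsc_min_fraction_snps", "0.01",    # illustrative, not a recommendation
    "--vsc_min_count_indels", "2",
    "--vsc_min_fraction_indels", "0.01",
]
print(" ".join(make_examples_cmd))
```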

@KBT59
Author

KBT59 commented Apr 9, 2018 via email

@pichuan
Collaborator

pichuan commented Apr 10, 2018

Lowering the fractions makes sense. Since you're doing something very experimental, you'll need to look at your own metrics to decide what thresholds make sense. In particular, you'll want to confirm that your new settings give you enough sensitivity: if a variant is not picked up by the Very Sensitive Caller, it won't be called later on.
There's a chance that the current model won't work well on your use case at all (and you might need to use a different kind of model), but it's worth a try.
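For a concrete sense of what "enough sensitivity" means here, a minimal sketch of measuring candidate recall against a truth set; the positions below are made up for illustration, and in practice they would come from the truth VCF and the candidates make_examples emits:

```python
# Sketch: candidate-level recall as (truth positions recovered) / (truth positions).
truth = {("chr1", 100), ("chr1", 250), ("chr2", 40)}       # illustrative truth set
candidates = {("chr1", 100), ("chr2", 40), ("chr2", 99)}   # illustrative candidates
recall = len(truth & candidates) / len(truth)
print(f"{recall:.2f}")  # 0.67
```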

@KBT59
Author

KBT59 commented Apr 10, 2018 via email

@pichuan
Collaborator

pichuan commented Apr 10, 2018

I think you'll want:
tfrecord_path: "/home2/myModelAttempt/output/5PRR-RD_S86.examples.tfrecord-?????-of-00064"
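For context, the `?????-of-00064` suffix is a sharded-file pattern: make_examples run with 64 shards writes one file per shard, and each `?` matches a single digit of the shard index. A small sketch (filenames generated to mimic the path above):

```python
import fnmatch

# Each "?" in the pattern matches exactly one character, so the
# five-digit shard index of every file matches "?????".
pattern = "5PRR-RD_S86.examples.tfrecord-?????-of-00064"
shards = [f"5PRR-RD_S86.examples.tfrecord-{i:05d}-of-00064" for i in range(64)]
matched = [s for s in shards if fnmatch.fnmatch(s, pattern)]
print(len(matched))  # 64 -- the pattern picks up every shard
```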

@KBT59
Author

KBT59 commented Apr 10, 2018 via email

@pichuan
Collaborator

pichuan commented Apr 12, 2018 via email

@KBT59
Author

KBT59 commented Apr 13, 2018 via email

@pichuan
Collaborator

pichuan commented Apr 13, 2018

Hi,
Originally I was thinking a small or synthetic dataset could be subsampled from your data. I don't actually want the full data anyway (that wouldn't be a small thing I can try), but I understand if you can't even subsample from your real data.
How about at least posting the commands you used?

From earlier discussions, it sounds like the main thing you're changing about the data representation is the pileup_image_height. You can actually do the same thing on the QuickStart or CaseStudy data too. It will just look like a taller image with the bottom being mostly empty.
(You can use logic like this https://github.com/google/deepvariant/blob/r0.6/docs/visualizing_examples.ipynb to visualize them)
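The decoding step that notebook performs boils down to reshaping raw bytes by the stored shape. Here's a minimal sketch with a placeholder buffer; the shape values are illustrative, and a taller --pileup_image_height would change the first dimension:

```python
import numpy as np

# A DeepVariant example stores the pileup tensor as raw uint8 bytes
# plus its shape. The buffer below is a zero-filled stand-in for the
# encoded bytes; the dimensions are illustrative only.
height, width, channels = 100, 221, 6
encoded = bytes(height * width * channels)  # placeholder for the encoded tensor
image = np.frombuffer(encoded, dtype=np.uint8).reshape(height, width, channels)
print(image.shape)  # (100, 221, 6)
```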

I suspect you can reproduce the same error on the CaseStudy data if you follow the same steps.

Once you're able to do that, post every step (similar to QuickStart and CaseStudy) here, and note where you hit the error.

@KBT59
Author

KBT59 commented Apr 13, 2018 via email

@KBT59
Author

KBT59 commented Apr 13, 2018 via email

@pichuan
Collaborator

pichuan commented Apr 14, 2018

Hi,
I'm not seeing the zip file.

@KBT59
Author

KBT59 commented Apr 16, 2018 via email

@KBT59
Author

KBT59 commented May 1, 2018 via email

@pgrosu

pgrosu commented May 1, 2018

Hi Brad,

Sometimes smtp (email) servers block zip files. Just put it on Google Drive or DropBox and share the link to it.

~p

@KBT59
Author

KBT59 commented May 1, 2018 via email

@pgrosu

pgrosu commented May 1, 2018

If they are small you might attach them individually directly in Github as shown here:

https://blog.github.com/2015-09-25-attach-files-to-comments/

@KBT59
Author

KBT59 commented May 1, 2018

These are the files I mentioned above.

bundle.zip

@pichuan
Collaborator

pichuan commented May 2, 2018

Hi,
I'll take a look. Give me a few days. Please feel free to ping back if you don't hear from me by end of this week.

@pichuan
Collaborator

pichuan commented May 11, 2018

Update:
I can confirm that I'm able to reproduce your error. We're working on a fix. Stay tuned!

@depristo

I've figured out what's going on here and have some good news and bad news.

First, the bad news is that setting the height to 2000 isn't going to work in the short run. This is a limitation coming from inception_v3 itself. At such large image sizes, we would have to run with spatial_squeeze=False to avoid this exception. By doing so we'd essentially end up with a "tile" of deepvariant predictions every 64 rows in the image, and then have to pool them together somehow, which makes sense in the general object detection case but not for us in DeepVariant.

The good news is that the maximum supported height is 362, so you can get a lot more information into your images than the default value of 100. Give 362 a try and let us know if that works.

I should point out that we use a reservoir sampler to create these images. So a height of 362 means you'll get a random sampling of 362 - 5 [for the reference] reads from your very deep sequencing. It's not ideal if you want to detect things occurring in only 1 or 2 reads, but you get a reasonable number of reads if you are looking for things >1% or so frequency in the reads.
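The reservoir sampling described above can be sketched as follows (an illustration of the idea, not DeepVariant's actual sampler):

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniformly sample up to k items from a stream of unknown length.

    Sketch of the reservoir-sampling idea described above (Algorithm R),
    not DeepVariant's implementation.
    """
    rng = rng or random.Random(0)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)     # item survives with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

# Height 362 leaves 362 - 5 = 357 rows for reads after the reference rows,
# so at ~1500x depth each read is kept with probability ~357/1500 (~24%).
kept = reservoir_sample(range(1500), 362 - 5)
print(len(kept))  # 357
```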

Hope that helps!

Mark
