
OpenVINO integration #363

Closed
wants to merge 9 commits into from

Conversation

@dkurt commented Oct 15, 2020

  • Build Docker image with OpenVINO support
docker build -t deepvariant . --build-arg DV_OPENVINO_BUILD=1
  • Run
export INPUT_DIR="${PWD}/quickstart-testdata"
export OUTPUT_DIR="${PWD}/quickstart-output"

docker run \
  -v "${INPUT_DIR}":"/input" \
  -v "${OUTPUT_DIR}:/output" \
  deepvariant \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/ucsc.hg19.chr20.unittest.fasta \
  --reads=/input/NA12878_S1.chr20.10_10p1mb.bam \
  --regions "chr20:10,000,000-10,010,000" \
  --output_vcf=/output/output.vcf.gz \
  --output_gvcf=/output/output.g.vcf.gz \
  --call_variants_extra_args="use_openvino=True" \
  --num_shards=1

(The extra flag --call_variants_extra_args="use_openvino=True" was added compared to the original Getting Started.)
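For context, --call_variants_extra_args passes comma-separated key=value pairs through to call_variants. A minimal sketch of that kind of parsing (hypothetical helper, not DeepVariant's actual implementation):

```python
def parse_extra_args(extra_args):
    """Split a string like 'a=1,b=two' into {'a': '1', 'b': 'two'}."""
    result = {}
    for pair in extra_args.split(","):
        if not pair:
            continue  # tolerate trailing commas
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result

print(parse_extra_args("use_openvino=True"))  # {'use_openvino': 'True'}
```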

@dkurt (Author) commented Oct 15, 2020

Hi! We would like to propose an optional OpenVINO backend for the call_variants step. Do you accept external PRs?

Also, this PR includes a GitHub Actions pipeline. Feel free to enable it at https://github.com/google/deepvariant/actions to run it.

/cc @pichuan

@AndrewCarroll (Collaborator)

Hi @dkurt

First, thank you for your interest in DeepVariant, and for the substantial work that you have put into these modifications. I have some questions for you, and I suspect @pichuan may add some questions and comments as well.

  1. We do not directly accept external PRs, but this is not because we do not accept community additions. The "source of truth" for the DeepVariant GitHub repo resides in internal systems, which are then copied onto GitHub. As a result, to incorporate your changes, we will need to change the internal code and copy that back out. We have done this in the past with community additions, and have attributed the authors for contributions in release notes and other forums.

  2. Can you help me understand the expected benefit of the changes? It looks like this should improve the runtime of call_variants. Do you have any high-level information from benchmark runs to help us understand the percentage improvement to expect?

  3. If we are to incorporate these changes, we would want to make sure this performs well across various hardware. For example, we would want our Docker images to gracefully fall back to working code on a machine with incompatible hardware. Do you expect OpenVINO to have this property (for example, if someone is running on an AMD machine)?

  4. We will also have to think about how these changes interact with any updates we would make to our use of TensorFlow. We're not directly planning anything in the near future, but it's good for us to consider.

I suspect that we will try to run with these changes and see how the performance changes. If you are able to answer some of these questions, it could be helpful for us to understand how to prioritize their assessment.

Thank you again for the work you have put into this. It's quite impressive, and we appreciate your effort.

Andrew

@dkurt (Author) commented Oct 16, 2020

@AndrewCarroll, many thanks for such a quick response!

We do not directly accept external PRs, but this is not because we do not accept community additions. The "source of truth" for the DeepVariant GitHub repo resides in internal systems, which are then copied onto GitHub. As a result, to incorporate your changes, we will need to change the internal code and copy that back out. We have done this in the past with community additions, and have attributed the authors for contributions in release notes and other forums.

I don't see any issues with that. The only important thing is to add the changes as a single commit, separate from other changes, so they can be properly tracked in the git history.

Can you help me understand the expected benefit of the changes? It looks like this should improve the runtime of call_variants. Do you have any high-level information from benchmark runs to help us understand the percentage improvement to expect?

You're right: this PR is only about the efficiency of the deep learning part (call_variants). We need some time to collect benchmark numbers and check what is OK to share publicly. Extra optimizations could probably be applied as well.

If we are to incorporate these changes, we would want to make sure this performs well across various hardware. For example, we would want our Docker images to gracefully fall back to working code on a machine with incompatible hardware. Do you expect OpenVINO to have this property (for example, if someone is running on an AMD machine)?

The proposed changes are non-invasive: by default, OpenVINO is not used. We can test other Intel hardware such as iGPU and HDDL-R. On non-Intel hardware, users can always use the default implementation.
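The fallback behavior described here can be sketched roughly as follows (illustrative names only, not the actual PR code): OpenVINO is used only when it is both explicitly requested and importable; otherwise the default TensorFlow path runs.

```python
def choose_backend(use_openvino=False):
    """Pick the inference backend, falling back to TensorFlow if OpenVINO
    was requested but is unavailable on this machine."""
    if use_openvino:
        try:
            import openvino  # noqa: F401 -- only checking availability
            return "openvino"
        except ImportError:
            # Missing or incompatible OpenVINO: fall back gracefully.
            return "tensorflow"
    return "tensorflow"

print(choose_backend())  # 'tensorflow' (the default)
```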

We will also have to think about how these changes interact with any updates we would make to our use of TensorFlow. We're not directly planning anything in the near future, but it's good for us to consider.

I think it won't be an issue. It makes sense to isolate the OpenVINO-related logic into a separate Python script to reduce conflicts during development.

Additionally, I wanted to ask if you're interested in using GitHub Actions so you can perform initial tests for pull requests. For example, https://github.com/dkurt/deepvariant/blob/master_openvino/.github/workflows/main.yml builds the Docker image, runs WGS on the Getting Started data with both TensorFlow and OpenVINO, and compares the outputs (logs).

@pichuan (Collaborator) commented Oct 17, 2020

Hi @dkurt ,
thank you for sending this PR.

From the discussion between you and Andrew above, here is my current summary:

  1. You are planning to do more benchmarking on this change, and will let us know when you have some numbers on runtime improvement.
  2. You want to know whether we're interested in enabling GitHub Actions.

For 2., I am not familiar with GitHub Actions, but it seems interesting! I'll file an internal issue to look into it. It will likely be lower priority, but I want to let you know that we'll track it and give you updates if there are any.

If there are details you wish to discuss directly, please feel free to email me at pichuan@google.com. We can also continue to communicate here to follow up.

@pichuan (Collaborator) commented Oct 17, 2020

One more follow-up on my comment above for @dkurt:
When we consider adding code, it's important for me to weigh how much complexity it adds (for any future developer on the team) against how much benefit it brings (runtime or accuracy improvement).
So, if you have more context on the impact of this PR, I'll take that into account and adjust my priorities as needed.
Thanks again for sending this PR. Looking through it, I'm impressed with the thoughtfulness and understanding of our codebase!

@pichuan (Collaborator) commented Oct 19, 2020

Hi @dkurt , an update and a question for you:

I was curious about the runtime myself, so over the weekend, I tried building and running the version with OpenVINO to observe the behavior of call_variants.

Specifically, I did something similar to https://github.com/google/deepvariant/blob/r1.0/scripts/run_wgs_runtime_test_docker.sh, but I incorporated your changes, built the Docker image with OpenVINO, and made sure I ran call_variants with OpenVINO.

Here is a strange thing I found in my log:

...
W1019 04:32:52.604453 140673783289600 deprecation.py:323] From /tmp/Bazel.runfiles_mdh0lz62/runfiles/com_google_deepvariant/deepvariant/data_providers.py:375: parallel_interleave >
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Opti>
2020-10-19 04:32:52.727391: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_thread>
W1019 04:32:52.748747 140673783289600 deprecation.py:323] From /tmp/Bazel.runfiles_mdh0lz62/runfiles/com_google_deepvariant/deepvariant/data_providers.py:381: map_and_batch (from >
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the f>
I1019 09:14:02.799481 140673783289600 call_variants.py:520] Writing calls to /tmp/tmpll7hkhwu/call_variants_output.tfrecord.gz
I1019 09:14:02.804919 140673783289600 call_variants.py:538] Processed 1 examples in 1 batches [0.284 sec per 100]
I1019 09:14:04.482554 140673783289600 call_variants.py:538] Processed 15001 examples in 30 batches [0.011 sec per 100]
I1019 09:14:06.172387 140673783289600 call_variants.py:538] Processed 30001 examples in 59 batches [0.011 sec per 100]
I1019 09:14:07.867975 140673783289600 call_variants.py:538] Processed 45001 examples in 88 batches [0.011 sec per 100]
I1019 09:14:09.554191 140673783289600 call_variants.py:538] Processed 60001 examples in 118 batches [0.011 sec per 100]
I1019 09:14:11.247823 140673783289600 call_variants.py:538] Processed 75001 examples in 147 batches [0.011 sec per 100]
I1019 09:14:12.950735 140673783289600 call_variants.py:538] Processed 90001 examples in 176 batches [0.011 sec per 100]
...

The strange thing is:
There seems to be a long lag from timestamp 04:32 to 09:14; it blocked for almost 5 hours?
But after that, the "sec per 100" log seems MUCH faster than in a regular run without OpenVINO.

However, because of that strange long lag, the overall runtime seems worse with OpenVINO.

Any idea what's happening here?
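The nearly five-hour gap can be read straight off the glog timestamps. A small standalone sketch (the helper name glog_time is ours) that computes it from the two adjacent lines in the log above:

```python
from datetime import datetime

def glog_time(line):
    """Parse a glog-style prefix like 'I1019 09:14:02.799481 ...' into a
    datetime. The year is not encoded in the log, so it defaults to 1900."""
    level_mmdd, clock = line.split()[:2]   # e.g. 'I1019', '09:14:02.799481'
    return datetime.strptime(level_mmdd[1:] + " " + clock, "%m%d %H:%M:%S.%f")

# The last setup message and the first call_variants message from the log:
last_setup = glog_time("W1019 04:32:52.748747 140673783289600 deprecation.py:323] ...")
first_call = glog_time("I1019 09:14:02.799481 140673783289600 call_variants.py:520] ...")
gap = first_call - last_setup
print(gap)  # 4:41:10.050734
```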

@dkurt (Author) commented Oct 20, 2020

@pichuan, it might be an effect of the current implementation: all the processing is done at iterator initialization, and then __getitem__ returns the predicted results without delay.
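That pattern, where __getitem__ is cheap because all inference already ran at construction time, can be sketched like this (illustrative only, not the actual PR code):

```python
class EagerResults:
    """All inference happens at construction; iteration just replays results.

    This would produce a long silent pause before the first progress log,
    followed by deceptively fast per-batch timings.
    """

    def __init__(self, batches, predict):
        # The heavy work is done here, at initialization time.
        self._results = [predict(batch) for batch in batches]

    def __getitem__(self, index):
        # Cheap lookup: the result was already computed.
        return self._results[index]

    def __len__(self):
        return len(self._results)

# Usage with a stand-in "model":
double = lambda batch: [x * 2 for x in batch]
results = EagerResults([[1, 2], [3, 4]], double)
print(results[0])  # [2, 4]
```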

We will take a look at overall efficiency and check what can be improved here, thanks!

@dkurt (Author) commented Nov 6, 2020

@pichuan, I'm very sorry for the long delay! I built DeepVariant so that it is portable enough to benchmark on a remote target machine.

These are initial numbers for Intel DevCloud machines and quickstart-testdata:

| Intel® Xeon® Gold 5120 | make_examples | call_variants | postprocess_variants |
|---|---|---|---|
| TensorFlow MKL-DNN | real 0m13.111s, user 0m8.496s, sys 0m4.869s | real 0m19.154s, user 0m23.705s, sys 0m8.424s | real 0m6.662s, user 0m7.946s, sys 0m4.841s |
| OpenVINO | real 0m13.083s, user 0m8.216s, sys 0m4.510s | real 0m9.687s (x1.97), user 0m18.741s (x1.26), sys 0m6.289s (x1.33) | real 0m6.709s, user 0m8.165s, sys 0m4.676s |

So my main question is probably how to interpret the real, user, and sys times. Maybe that will help us understand how to improve the pipeline.
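For reference: real is wall-clock time, while user and sys are CPU time summed over all cores (user-space and kernel, respectively), which is why user can exceed real for well-parallelized steps. A small self-contained demonstration (not DeepVariant code) that spawns CPU-bound children and compares their total CPU time with the wall-clock time:

```python
import os
import subprocess
import sys
import time

# Each child burns ~0.5 s of pure CPU time in a busy loop.
BURN = (
    "import time\n"
    "t0 = time.process_time()\n"
    "while time.process_time() - t0 < 0.5: pass"
)

wall_start = time.time()
children = [subprocess.Popen([sys.executable, "-c", BURN]) for _ in range(4)]
for child in children:
    child.wait()
wall = time.time() - wall_start

t = os.times()
cpu = t.children_user + t.children_system  # CPU time of all 4 children, summed
print(f"wall (real) = {wall:.2f}s, children CPU (user+sys) = {cpu:.2f}s")
```

On a multi-core machine the children run in parallel, so the summed CPU time (roughly 2 s here) exceeds the wall-clock time, mirroring how user can be larger than real in the tables above.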


Here are my steps:

  1. Build locally (see deepvariant-build-test.md). After the build:
./build_release_binaries.sh
tar -cvzf bazel-deepvariant.tar.gz bazel-deepvariant/*
tar -cvzf bazel-genfiles.tar.gz bazel-genfiles/*
  2. Go to another machine (e.g., Intel DevCloud) and clone the repository. Unpack the binaries:
git clone -b master_openvino https://github.com/dkurt/deepvariant --depth 1
cd deepvariant
tar -xf bazel-deepvariant.tar.gz
tar -xf bazel-genfiles.tar.gz
  3. Apply some patches to resolve local paths:
sed -i -E 's|/opt/deepvariant/bin|./bazel-genfiles/deepvariant|' scripts/run_deepvariant.py
sed -i -E 's|/opt/models/wgs/model.ckpt|model.ckpt|' scripts/run_deepvariant.py
ln -s -f $HOME/deepvariant/scripts/ bazel-deepvariant/scripts
  4. Download GNU parallel (if you have no root permissions):
wget http://launchpadlibrarian.net/300780258/parallel_20161222-1_all.deb 
dpkg -x parallel_20161222-1_all.deb parallel
export PATH=$HOME/parallel/usr/bin:$PATH
  5. Install TensorFlow MKL-DNN:
WHEEL_NAME=tensorflow-2.0.0-cp36-cp36m-linux_x86_64.whl
wget "https://storage.googleapis.com/penporn-kokoro/tf-mkl-2.0-py36/${WHEEL_NAME}" -O "/tmp/${WHEEL_NAME}"
pip3 install --upgrade "/tmp/${WHEEL_NAME}"
  6. Run:
export INPUT_DIR="${PWD}/quickstart-testdata"
export OUTPUT_DIR="${PWD}/quickstart-output"
mkdir -p $OUTPUT_DIR

export PYTHONPATH=./bazel-genfiles:$PYTHONPATH
python3 ./bazel-deepvariant/scripts/run_deepvariant.py \
  --model_type=WGS \
  --ref=${INPUT_DIR}/ucsc.hg19.chr20.unittest.fasta \
  --reads=${INPUT_DIR}/NA12878_S1.chr20.10_10p1mb.bam \
  --regions "chr20:10,000,000-10,010,000" \
  --output_vcf=${OUTPUT_DIR}/output.vcf.gz \
  --output_gvcf=${OUTPUT_DIR}/output.g.vcf.gz \
  --call_variants_extra_args="use_openvino=True" \
  --num_shards=1

@pichuan (Collaborator) commented Nov 8, 2020

Hi,
I think the Quick Start data is way too small to capture enough information. Can you run this on the whole genome, or at least chr1 of the whole genome?
Thank you!

Optimize preprocessing
@dkurt (Author) commented Nov 9, 2020

Hi!

I made some optimizations and tested on chr20 from https://github.com/google/deepvariant/blob/r1.0/docs/deepvariant-case-study.md. Can you please tell me whether this is a representative run?

| Intel® Xeon® Gold 6258R | make_examples | call_variants | postprocess_variants |
|---|---|---|---|
| TensorFlow MKL-DNN | 3m7.950s | 5m59.221s | 1m5.724s |
| OpenVINO | 3m6.239s | 3m46.756s (x1.58) | 1m7.640s |

("real" times are in the table)
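The x1.58 factor can be reproduced from the "real" times in the table. A standalone sketch (to_seconds is our helper name, not part of the PR):

```python
import re

def to_seconds(t):
    """Convert a shell `time` value like '5m59.221s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", t)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

tf = to_seconds("5m59.221s")   # TensorFlow MKL-DNN call_variants, 359.221 s
ov = to_seconds("3m46.756s")   # OpenVINO call_variants, 226.756 s
print(f"x{tf / ov:.2f}")       # x1.58
```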

python3 ./bazel-deepvariant/scripts/run_deepvariant.py \
  --model_type=WGS \
  --ref=./reference/GRCh38_no_alt_analysis_set.fasta \
  --reads=./input/HG002.novaseq.pcr-free.35x.dedup.grch38_no_alt.chr20.bam \
  --output_vcf=${OUTPUT_DIR}/HG002.output.vcf.gz \
  --output_gvcf=${OUTPUT_DIR}/HG002.output.g.vcf.gz \
  --num_shards=16 \
  --regions "chr20" \
  --call_variants_extra_args="use_openvino=True"

@pichuan (Collaborator) commented Nov 10, 2020

Hi @dkurt , chr20 is better than the Quick Start!
In v1.0, we consolidated all our runtime comparisons into this doc: https://github.com/google/deepvariant/blob/r1.0/docs/metrics.md, which shows the runtime on all chromosomes. I think the difference will be more noticeable if you compare on one of the bigger runs, for example WGS:

# WGS (should take about 7 hours)
curl -O https://raw.githubusercontent.com/google/deepvariant/r1.0/scripts/run_wgs_case_study_docker.sh
bash run_wgs_case_study_docker.sh

I will certainly plan to compare on that so I know how this works in my setting.

@dkurt I have a question for you: If I get a machine like this: https://github.com/google/deepvariant/blob/r1.0/docs/deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform , do you think I will be able to see the same improvement that you're seeing?

Test WES model
@dkurt (Author) commented Nov 12, 2020

Unfortunately, I have no access to a similar 64-core configuration, but I tried the Xeon 6258R (28 cores) once again, this time on 8 chromosomes:

| Intel® Xeon® Gold 6258R | make_examples | call_variants | postprocess_variants |
|---|---|---|---|
| TensorFlow MKL-DNN | 58m54.584s | 103m44.907s | 19m27.091s |
| OpenVINO | 59m2.299s | 68m25.176s (x1.51) | 19m36.495s |

I think a higher core count will show more speedup.

python3 ./bazel-deepvariant/scripts/run_deepvariant.py \
  --model_type=WGS \
  --ref=./input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz \
  --reads=./input/HG002.novaseq.pcr-free.35x.dedup.grch38_no_alt.bam \
  --output_vcf=${OUTPUT_DIR}/HG002.output.vcf.gz \
  --output_gvcf=${OUTPUT_DIR}/HG002.output.g.vcf.gz \
  --num_shards=16 \
  --regions "chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8" \
  --call_variants_extra_args="use_openvino=True"

@pichuan, unfortunately, the GCP team denied access to a 64-core machine.

@dkurt (Author) commented Nov 26, 2020

I have published an image for the latest state of this PR at https://hub.docker.com/r/dkurtaev/deepvariant. May I ask you to validate it?

@pichuan (Collaborator) commented Nov 27, 2020

@dkurt Happy to try it out. Do I just pull it and run it with a regular command like https://github.com/google/deepvariant/blob/r1.0/scripts/run_wgs_case_study_docker.sh?

Any more flags I need to add?

It seems like I'll probably need to add --call_variants_extra_args="use_openvino=True". Anything else?

@dkurt (Author) commented Nov 27, 2020

@pichuan, you're right, just --call_variants_extra_args="use_openvino=True".

May I ask you to try running on just a single chromosome first, to check whether OpenVINO works faster than TensorFlow? If not, would it be possible to share the lscpu and cat /proc/cpuinfo output?

@pichuan (Collaborator) commented Nov 28, 2020

I tested with chr1 of the WGS script. See below for details.
With use_openvino=true, call_variants runs for ~15m on chr1. Without it, it takes ~21m.

See commands and machine details below:

Commands

Tested on the same machine:

All below were done with command like:

scripts/run_wgs_case_study_docker.sh 2>&1 | tee /tmp/openvino.log

with some code diffs below:

  1. Use your Docker image, use_openvino=true
    The code diff:
$ git diff
diff --git a/scripts/run_wgs_case_study_docker.sh b/scripts/run_wgs_case_study_docker.sh
index 3dc9712..78712d8 100755
--- a/scripts/run_wgs_case_study_docker.sh
+++ b/scripts/run_wgs_case_study_docker.sh
@@ -65,14 +65,14 @@ aria2c -c -x10 -s10 "http://storage.googleapis.com/deepvariant/case-study-testda
 aria2c -c -x10 -s10 "http://storage.googleapis.com/deepvariant/case-study-testdata/${REF}.fai" -d "${INPUT_DIR}"
 
 ## Pull the docker image.
-sudo docker pull google/deepvariant:"${BIN_VERSION}"
+sudo docker pull dkurtaev/deepvariant:latest
 
 echo "Run DeepVariant..."
 sudo docker run \
   -v "${INPUT_DIR}":"/input" \
   -v "${OUTPUT_DIR}:/output" \
-  google/deepvariant:"${BIN_VERSION}" \
-  /opt/deepvariant/bin/run_deepvariant \
+  dkurtaev/deepvariant:latest \
+  /opt/deepvariant/bin/run_deepvariant --call_variants_extra_args=use_openvino=true --make_examples_extra_args=regions=chr1 \
   --model_type=WGS \
   --ref="/input/${REF}.gz" \
   --reads="/input/${BAM}" \
@@ -100,6 +100,6 @@ pkrusche/hap.py /opt/hap.py/bin/hap.py \
   -f "${INPUT_DIR}/${TRUTH_BED}" \
   -r "${UNCOMPRESSED_REF}" \
   -o "${OUTPUT_DIR}/happy.output" \
-  --engine=vcfeval
+  --engine=vcfeval -l chr1
 ) 2>&1 | tee "${LOG_DIR}/happy.log"
 echo "Done."

Runtime:

$ grep '^real' /tmp/open
real    7m38.326s
real    15m12.564s
real    7m15.173s
  2. Use your Docker image, use_openvino=false
    The code diff:
$ git diff
diff --git a/scripts/run_wgs_case_study_docker.sh b/scripts/run_wgs_case_study_docker.sh
index 3dc9712..78712d8 100755
--- a/scripts/run_wgs_case_study_docker.sh
+++ b/scripts/run_wgs_case_study_docker.sh
@@ -65,14 +65,14 @@ aria2c -c -x10 -s10 "http://storage.googleapis.com/deepvariant/case-study-testda
 aria2c -c -x10 -s10 "http://storage.googleapis.com/deepvariant/case-study-testdata/${REF}.fai" -d "${INPUT_DIR}"
 
 ## Pull the docker image.
-sudo docker pull google/deepvariant:"${BIN_VERSION}"
+sudo docker pull dkurtaev/deepvariant:latest
 
 echo "Run DeepVariant..."
 sudo docker run \
   -v "${INPUT_DIR}":"/input" \
   -v "${OUTPUT_DIR}:/output" \
-  google/deepvariant:"${BIN_VERSION}" \
-  /opt/deepvariant/bin/run_deepvariant \
+  dkurtaev/deepvariant:latest \
+  /opt/deepvariant/bin/run_deepvariant --call_variants_extra_args=use_openvino=false --make_examples_extra_args=regions=chr1 \
   --model_type=WGS \
   --ref="/input/${REF}.gz" \
   --reads="/input/${BAM}" \
@@ -100,6 +100,6 @@ pkrusche/hap.py /opt/hap.py/bin/hap.py \
   -f "${INPUT_DIR}/${TRUTH_BED}" \
   -r "${UNCOMPRESSED_REF}" \
   -o "${OUTPUT_DIR}/happy.output" \
-  --engine=vcfeval
+  --engine=vcfeval -l chr1
 ) 2>&1 | tee "${LOG_DIR}/happy.log"
 echo "Done."

Runtime:

$ grep '^real' /tmp/open
real    7m20.986s
real    21m24.429s
real    6m32.705s
  3. Use the v1.0.0 image.
    The code diff:
$ git diff
diff --git a/scripts/run_wgs_case_study_docker.sh b/scripts/run_wgs_case_study_docker.sh
index 3dc9712..88fb0c1 100755
--- a/scripts/run_wgs_case_study_docker.sh
+++ b/scripts/run_wgs_case_study_docker.sh
@@ -72,7 +72,7 @@ sudo docker run \
   -v "${INPUT_DIR}":"/input" \
   -v "${OUTPUT_DIR}:/output" \
   google/deepvariant:"${BIN_VERSION}" \
-  /opt/deepvariant/bin/run_deepvariant \
+  /opt/deepvariant/bin/run_deepvariant --make_examples_extra_args=regions=chr1 \
   --model_type=WGS \
   --ref="/input/${REF}.gz" \
   --reads="/input/${BAM}" \
@@ -100,6 +100,6 @@ pkrusche/hap.py /opt/hap.py/bin/hap.py \
   -f "${INPUT_DIR}/${TRUTH_BED}" \
   -r "${UNCOMPRESSED_REF}" \
   -o "${OUTPUT_DIR}/happy.output" \
-  --engine=vcfeval
+  --engine=vcfeval -l chr1
 ) 2>&1 | tee "${LOG_DIR}/happy.log"
 echo "Done."

Runtime

$ grep '^real' /tmp/openvino.log
real    7m26.887s
real    20m40.889s
real    6m25.257s

Machine details

I got the machine with this command:

gcloud compute instances create "${USER}-openvino-expt" \
  --scopes "compute-rw,storage-full,cloud-platform" \
  --image-family "ubuntu-1604-lts" \
  --image-project "ubuntu-os-cloud" \
  --machine-type "custom-64-131072" \
  --boot-disk-size "300" \
  --zone "us-west1-b" \
  --min-cpu-platform "Intel Skylake"

lscpu

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    32
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) CPU @ 2.00GHz
Stepping:              3
CPU MHz:               2000.178
BogoMIPS:              4000.35
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              39424K
NUMA node0 CPU(s):     0-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities

/proc/cpuinfo has entries for all 64 logical CPUs. I'll just list the first one:

$ cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) CPU @ 2.00GHz
stepping        : 3
microcode       : 0x1
cpu MHz         : 2000.178
cache size      : 39424 KB
physical id     : 0
siblings        : 64
core id         : 0
cpu cores       : 32
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa
bogomips        : 4000.35
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

@pichuan (Collaborator) commented Nov 28, 2020

Following up on my previous comment,
I confirmed that hap.py results in happy.output.summary.csv are the same.

@dkurt One thing similar to what I observed before: call_variants with OpenVINO seems to block at the beginning for quite a while before it starts printing the per-batch logs:


***** Running the command:*****
time /opt/deepvariant/bin/call_variants --outfile "/tmp/tmp0gfwv278/call_variants_output.tfrecord.gz" --examples "/tmp/tmp0gfwv278/make_examples.tfrecord@64.gz" --checkpoint "/opt/models/wgs/model.ckpt" --use_openvino

I1128 03:33:00.717135 139674856871680 call_variants.py:338] Shape of input examples: [100, 221, 6]
2020-11-28 03:33:00.726560: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  AVX2 AVX512F FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-28 03:33:00.742278: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000175000 Hz
2020-11-28 03:33:00.748254: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x419c1f0 executing computations on platform Host. Devices:
2020-11-28 03:33:00.748310: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2020-11-28 03:33:00.752221: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
W1128 03:33:02.766410 139674856871680 deprecation.py:323] From /tmp/Bazel.runfiles_rud4ovxa/runfiles/com_google_deepvariant/deepvariant/data_providers.py:375: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
2020-11-28 03:33:02.961537: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
W1128 03:33:02.980482 139674856871680 deprecation.py:323] From /tmp/Bazel.runfiles_rud4ovxa/runfiles/com_google_deepvariant/deepvariant/data_providers.py:381: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
I1128 03:46:54.531774 139674856871680 call_variants.py:434] Writing calls to /tmp/tmp0gfwv278/call_variants_output.tfrecord.gz
I1128 03:46:54.533843 139674856871680 call_variants.py:452] Processed 1 examples in 1 batches [0.123 sec per 100]
I1128 03:46:56.165715 139674856871680 call_variants.py:452] Processed 15001 examples in 30 batches [0.011 sec per 100]
I1128 03:46:57.810235 139674856871680 call_variants.py:452] Processed 30001 examples in 59 batches [0.011 sec per 100]
I1128 03:46:59.458546 139674856871680 call_variants.py:452] Processed 45001 examples in 88 batches [0.011 sec per 100]
I1128 03:47:01.112938 139674856871680 call_variants.py:452] Processed 60001 examples in 118 batches [0.011 sec per 100]
I1128 03:47:02.774386 139674856871680 call_variants.py:452] Processed 75001 examples in 147 batches [0.011 sec per 100]
I1128 03:47:04.426402 139674856871680 call_variants.py:452] Processed 90001 examples in 176 batches [0.011 sec per 100]
I1128 03:47:06.086514 139674856871680 call_variants.py:452] Processed 105001 examples in 206 batches [0.011 sec per 100]
I1128 03:47:07.738636 139674856871680 call_variants.py:452] Processed 120001 examples in 235 batches [0.011 sec per 100]
I1128 03:47:09.394680 139674856871680 call_variants.py:452] Processed 135001 examples in 264 batches [0.011 sec per 100]
I1128 03:47:11.054500 139674856871680 call_variants.py:452] Processed 150001 examples in 293 batches [0.011 sec per 100]
I1128 03:47:12.715886 139674856871680 call_variants.py:452] Processed 165001 examples in 323 batches [0.011 sec per 100]
I1128 03:47:14.370283 139674856871680 call_variants.py:452] Processed 180001 examples in 352 batches [0.011 sec per 100]
I1128 03:47:16.030028 139674856871680 call_variants.py:452] Processed 195001 examples in 381 batches [0.011 sec per 100]
I1128 03:47:17.690865 139674856871680 call_variants.py:452] Processed 210001 examples in 411 batches [0.011 sec per 100]
I1128 03:47:19.339626 139674856871680 call_variants.py:452] Processed 225001 examples in 440 batches [0.011 sec per 100]
I1128 03:47:20.994633 139674856871680 call_variants.py:452] Processed 240001 examples in 469 batches [0.011 sec per 100]
I1128 03:47:22.652035 139674856871680 call_variants.py:452] Processed 255001 examples in 499 batches [0.011 sec per 100]
I1128 03:47:24.307619 139674856871680 call_variants.py:452] Processed 270001 examples in 528 batches [0.011 sec per 100]
I1128 03:47:25.964561 139674856871680 call_variants.py:452] Processed 285001 examples in 557 batches [0.011 sec per 100]
I1128 03:47:27.619805 139674856871680 call_variants.py:452] Processed 300001 examples in 586 batches [0.011 sec per 100]
I1128 03:47:29.276331 139674856871680 call_variants.py:452] Processed 315001 examples in 616 batches [0.011 sec per 100]
I1128 03:47:30.922740 139674856871680 call_variants.py:452] Processed 330001 examples in 645 batches [0.011 sec per 100]
I1128 03:47:32.582401 139674856871680 call_variants.py:452] Processed 345001 examples in 674 batches [0.011 sec per 100]
I1128 03:47:34.248395 139674856871680 call_variants.py:452] Processed 360001 examples in 704 batches [0.011 sec per 100]
I1128 03:47:35.905924 139674856871680 call_variants.py:452] Processed 375001 examples in 733 batches [0.011 sec per 100]
I1128 03:47:37.563962 139674856871680 call_variants.py:452] Processed 390001 examples in 762 batches [0.011 sec per 100]
I1128 03:47:39.216807 139674856871680 call_variants.py:452] Processed 405001 examples in 792 batches [0.011 sec per 100]
I1128 03:47:40.874265 139674856871680 call_variants.py:452] Processed 420001 examples in 821 batches [0.011 sec per 100]
I1128 03:47:42.549129 139674856871680 call_variants.py:452] Processed 435001 examples in 850 batches [0.011 sec per 100]
I1128 03:47:44.205866 139674856871680 call_variants.py:452] Processed 450001 examples in 879 batches [0.011 sec per 100]
I1128 03:47:45.870136 139674856871680 call_variants.py:452] Processed 465001 examples in 909 batches [0.011 sec per 100]
I1128 03:47:47.526660 139674856871680 call_variants.py:452] Processed 480001 examples in 938 batches [0.011 sec per 100]
I1128 03:47:49.185387 139674856871680 call_variants.py:452] Processed 495001 examples in 967 batches [0.011 sec per 100]
I1128 03:47:50.852418 139674856871680 call_variants.py:452] Processed 510001 examples in 997 batches [0.011 sec per 100]
I1128 03:47:52.511156 139674856871680 call_variants.py:452] Processed 525001 examples in 1026 batches [0.011 sec per 100]
I1128 03:47:54.166230 139674856871680 call_variants.py:452] Processed 540001 examples in 1055 batches [0.011 sec per 100]
I1128 03:47:55.828215 139674856871680 call_variants.py:452] Processed 555001 examples in 1084 batches [0.011 sec per 100]
I1128 03:47:57.485885 139674856871680 call_variants.py:452] Processed 570001 examples in 1114 batches [0.011 sec per 100]
I1128 03:47:59.149574 139674856871680 call_variants.py:452] Processed 585001 examples in 1143 batches [0.011 sec per 100]
I1128 03:48:00.813269 139674856871680 call_variants.py:452] Processed 600001 examples in 1172 batches [0.011 sec per 100]
I1128 03:48:02.468808 139674856871680 call_variants.py:452] Processed 615001 examples in 1202 batches [0.011 sec per 100]
I1128 03:48:04.122274 139674856871680 call_variants.py:452] Processed 630001 examples in 1231 batches [0.011 sec per 100]
I1128 03:48:05.762554 139674856871680 call_variants.py:452] Processed 645001 examples in 1260 batches [0.011 sec per 100]
I1128 03:48:07.409487 139674856871680 call_variants.py:452] Processed 660001 examples in 1290 batches [0.011 sec per 100]
I1128 03:48:08.445094 139674856871680 call_variants.py:455] Processed 669335 examples in 1308 batches [0.011 sec per 100]
I1128 03:48:08.445318 139674856871680 call_variants.py:458] Done calling variants from a total of 669335 examples.

real    15m12.564s
user    763m44.970s
sys     58m35.140s

You can see these lines:

W1128 03:33:02.980482 139674856871680 deprecation.py:323] From /tmp/Bazel.runfiles_rud4ovxa/runfiles/com_google_deepvariant/deepvariant/data_providers.py:381: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map(map_func, num_parallel_calls)` followed by `tf.data.Dataset.batch(batch_size, drop_remainder)`. Static tf.data optimizations will take care of using the fused implementation.
I1128 03:46:54.531774 139674856871680 call_variants.py:434] Writing calls to /tmp/tmp0gfwv278/call_variants_output.tfrecord.gz

Before the 03:46:54.531774 timestamp, the last timestamp was 03:33:02.980482. I don't know if this is expected or not.

I'm curious to run this on the whole genome and see whether the speedup will be more noticeable.

@dkurt
Author

dkurt commented Nov 29, 2020

@pichuan, thank you for the very detailed experiment! Looking forward to seeing the whole-genome results.

@dkurt One thing similar to what I observed before: call_variants with openvino seems to block at the beginning for quite a bit before it starts printing each of the logs:

Yes, this is expected due to all the processing done in the first iteration. It's not critical for performance, but I can change it so it won't confuse users.

@dkurt
Author

dkurt commented Nov 29, 2020


Added a commit that lets you track call_variants progress with the OpenVINO backend. Updated the Docker image correspondingly.

@pichuan
Collaborator

pichuan commented Nov 30, 2020

@dkurt With all chromosomes of WGS, the call_variants runtime change is 266m46.183s --> 198m46.734s.
So the runtime reduction is about 25% as well.

Thanks for the latest change for tracking progress. I'll try it out and let you know if there are any issues.

In terms of getting the code in, I'll see if I can get the code through internal review before the next release (r1.1). If not, it'll be in the one after. Even if this gets in in time for the next release (r1.1), I still don't plan to build our release Docker image with this on by default yet, because I'm not exactly sure what the effect is on all use cases.

@dkurt For future releases, do you think it's safe to turn on OpenVINO by default? What do you expect to happen on non-Intel machines?
Thanks!!

@dkurt
Author

dkurt commented Nov 30, 2020

@pichuan, thank you! It should be safe to build the Docker image with the OpenVINO backend and keep it disabled by default, so users can turn it on manually with --call_variants_extra_args="use_openvino=True".

The OpenVINO import is wrapped in a try/except, so I expect it won't crash on a non-Intel CPU:

try:
  from openvino.inference_engine import IECore, StatusCode
except:
  pass

Anyway, I'll try running it on some public CI to confirm.
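For reference, a guarded import like the one above is typically paired with a module-level availability flag that callers query before choosing a backend. This is a sketch of the pattern, not the exact DeepVariant code; the `is_available` helper mirrors the one visible later in this thread, and the `backend` variable is illustrative:

```python
# Sketch of the guarded-import pattern: record whether OpenVINO could be
# imported, and let callers pick the backend accordingly.
try:
    from openvino.inference_engine import IECore, StatusCode  # noqa: F401
    openvino_available = True
except Exception:
    # Covers both a missing package and platform-specific import failures.
    openvino_available = False


def is_available():
    return openvino_available


# Callers can then fall back gracefully instead of crashing:
backend = 'openvino' if is_available() else 'tensorflow'
```

With this shape, machines without OpenVINO simply keep using the default TensorFlow path.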

@pichuan
Collaborator

pichuan commented Nov 30, 2020

@dkurt Thanks! One small downside is that the Docker image will have to include the old ckpt format as well as the new format. One question for you - do you know if the converted model format can be read and used with the regular Estimator as well?

@dkurt
Author

dkurt commented Nov 30, 2020

@pichuan, I'll take a look. Can you please clarify what the new checkpoint format is? I've only tried the checkpoints available since r0.9.

@pichuan
Collaborator

pichuan commented Nov 30, 2020

@dkurt Sorry for the confusion. I meant:

$ sudo docker run deepvariant:latest ls -lh /opt/models/wgs/
total 449M
-rw-r--r-- 1 root root  84M Nov 30 03:49 model.bin
-rw-r--r-- 1 root root 333M Nov 10 17:09 model.ckpt.data-00000-of-00001
-rw-r--r-- 1 root root  19K Nov 10 17:09 model.ckpt.index
-rw-r--r-- 1 root root  33M Nov 10 17:09 model.ckpt.meta
-rw-r--r-- 1 root root  94K Nov 30 03:49 model.mapping
-rw-r--r-- 1 root root 276K Nov 30 03:49 model.xml

These are the extra files after enabling OpenVINO:

-rw-r--r-- 1 root root  84M Nov 30 03:49 model.bin
-rw-r--r-- 1 root root  94K Nov 30 03:49 model.mapping
-rw-r--r-- 1 root root 276K Nov 30 03:49 model.xml

Our regular Estimator code path currently uses these files (these are what I meant by "old ckpt format"):

-rw-r--r-- 1 root root 333M Nov 10 17:09 model.ckpt.data-00000-of-00001
-rw-r--r-- 1 root root  19K Nov 10 17:09 model.ckpt.index
-rw-r--r-- 1 root root  33M Nov 10 17:09 model.ckpt.meta

If both code paths can use the new (and smaller!) files, that will be very nice.

I also noticed there is an intermediate *.pb format. If it's possible to load that instead, that might be nice too (assuming it's smaller; I actually haven't checked). I looked it up yesterday but haven't found out how yet. If you have a pointer, please let me know. Thank you for all the work!
(And even if not, the new files are not too big. I'll experiment with building with openvino on.)

@dkurt
Author

dkurt commented Nov 30, 2020

@pichuan, got it, good point! The .pb file is intermediate and is removed after the OpenVINO conversion:

rm model.pb;

However, there is a way to generate the .xml + .bin at runtime rather than keeping them in the image. I can also reduce the size of the OpenVINO installation by removing some components.

@pichuan
Collaborator

pichuan commented Nov 30, 2020

@dkurt Keeping them in the image is fine! I'm actually more curious about whether I can get rid of that big model.ckpt.data-00000-of-00001 file. :)

@dkurt One more question for you -- do you see any downside of enabling --use_openvino as the default in our CPU run? Once this is built into our CPU Docker image, it'll be nice to have it as the default. I want to know whether it might crash on non-Intel hardware. (I can also test it myself, but I haven't gotten around to doing that yet.)

@dkurt
Author

dkurt commented Nov 30, 2020

@dkurt Keeping them in the image is fine! I'm actually more curious about whether I can get rid of that big model.ckpt.data-00000-of-00001 file. :)

@pichuan, that's a good question. The checkpoint is used to restore model training, which is why it's so large. Internally, it probably contains not just the weights but also gradients and other per-layer training state. The .pb model can be used for inference, but through the TensorFlow 1.x API; I'm not sure about Estimator, unfortunately.

I moved the OpenVINO conversion to runtime anyway; that now seems simpler and doesn't bloat the image.

@dkurt One more question for you -- do you see any downside of enabling --use_openvino as the default in our CPU run? Once this is built into our CPU Docker image, it'll be nice to have it as the default. I want to know whether it might crash on non-Intel hardware. (I can also test it myself, but I haven't gotten around to doing that yet.)

Just tried the image on an n2d-standard-8 from GCP and it works fine through the OpenVINO backend (AMD EPYC 7B12). So it seems we can safely turn OpenVINO on by default for CPU-only environments. Shall I do it in this PR, or will you switch it separately?

@pichuan
Collaborator

pichuan commented Nov 30, 2020

Just tried the image on an n2d-standard-8 from GCP and it works fine through the OpenVINO backend (AMD EPYC 7B12). So it seems we can safely turn OpenVINO on by default for CPU-only environments. Shall I do it in this PR, or will you switch it separately?

Thanks for testing! What do you think is the best way to change the default for GPU? I was thinking about this, but I'm not sure:
For building, we'll want to keep DV_OPENVINO_BUILD=0 in Dockerfile, right? Because for building GPU, we don't want DV_OPENVINO_BUILD to be on by default. This one is easy to change - I can just change our release process for CPU image building to always add --build-arg DV_OPENVINO_BUILD=1. So we don't need to change the default in Dockerfile.

I wonder what's a good way to change the default of the use_openvino flag, though.
Because of the GPU use case, we don't really want to switch use_openvino to True in call_variants.py either.
I was thinking about optionally adding the --use_openvino flag in the Dockerfile when building for GPU, but I haven't tried whether that'll work. (Ideally I want users to still be able to pass in --use_openvino=false if they want to turn it off.)

If you have a proposed change that works well for CPU as a default but doesn't hurt the GPU use case, feel free to propose a commit here. Internally I'm about to get some of this code through review first, and I can add any incremental changes for internal review later.

Thanks!

@pichuan
Collaborator

pichuan commented Nov 30, 2020

Quick update and FYI for you @dkurt
I ran with the internal latest code (which we switched all metrics to be on HG003 BAMs). Here are the improvements with --use_openvino=true.

  • wgs: 233m14.191s --> 204m35.065s
  • wes: 1m41.381s --> 1m31.513s
  • pacbio: 193m20.407s --> 169m45.878s
  • hybrid_pacbio_illumina: 241m7.426s --> 189m40.148s

This was after your "Process OpenVINO in thread (#8)" change yesterday.
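Converted to seconds, the timings above correspond to reductions of roughly 10% (WES) to 21% (hybrid). A quick check of two of them, using the numbers from the list above:

```python
def to_sec(minutes, seconds):
    # Convert an "XmY.Zs" timing into seconds.
    return 60 * minutes + seconds


def pct_reduction(before, after):
    # Percentage runtime reduction from 'before' to 'after'.
    return 100.0 * (before - after) / before


# wgs: 233m14.191s --> 204m35.065s
wgs = pct_reduction(to_sec(233, 14.191), to_sec(204, 35.065))    # ~12.3%
# hybrid_pacbio_illumina: 241m7.426s --> 189m40.148s
hybrid = pct_reduction(to_sec(241, 7.426), to_sec(189, 40.148))  # ~21.3%
```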

@dkurt
Author

dkurt commented Nov 30, 2020

@pichuan, I'll think it over and propose a flexible solution for OpenVINO acceleration.

Can you please add some details about the latest numbers? Is that a regression or an improvement? Last time we saw 266m46.183s --> 198m46.734s for WGS (call_variants); now it's 233m14.191s --> 204m35.065s.

@pichuan
Collaborator

pichuan commented Nov 30, 2020

@dkurt It does seem like the % of runtime reduction on WGS has gotten worse. Three things have changed:

  1. Previous number was evaluated on HG002; this time on HG003 (but the BAM is similar setting).
  2. The second thing that has changed is your change to improve the logging.
  3. Third, last time my two numbers were from the same GCE instance. This time, the baseline and the experimental numbers were from two different GCE instances (even though I did use the same command to get the same type). Empirically, on different GCE instances, even with the same code, I've sometimes observed up to ~10% runtime difference for call_variants.

I suspect 3 is the main reason here. This could be verified by running the same thing with and without the use_openvino flag on the exact same machine (sequentially). But I likely don't have time to do that again now...

@dkurt
Author

dkurt commented Nov 30, 2020

@pichuan, I got it, thanks! Indeed the experiments are different. I also benchmarked the changes with and without the logging improvements and can confirm there was no efficiency difference, so we don't need additional experiments. Thanks for your time and warm welcome!

I agree with you that the Dockerfile is now in the right configuration: the build is only enabled manually. For the default value of use_openvino, I propose the condition openvino_available and not cuda_available. Just pushed the corresponding commit.
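As a sketch, the proposed default condition would look like this; `openvino_available` and `cuda_available` here are stand-ins for whatever runtime checks the real commit uses, not actual DeepVariant API:

```python
def default_use_openvino(openvino_available, cuda_available):
    # Proposed default: prefer OpenVINO only when it imported successfully
    # and no GPU is present (GPU runs should keep the TensorFlow path).
    return openvino_available and not cuda_available
```

So a CPU-only machine with OpenVINO installed gets the faster backend, while GPU machines and machines without OpenVINO are unaffected.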

# Unit tests use this branch.
predict_hooks = []
if FLAGS.use_openvino:
  ie_estimator = OpenVINOEstimator(checkpoint_path, num_channels_in_checkpoint_model,
Collaborator

This actually doesn't seem to make sense to me. Where is num_channels_in_checkpoint_model from?

Author

This variable is defined above; it is initialized from the checkpoint by the shared code path.

num_channels_in_checkpoint_model = shape_map_for_layers[first_layer][2]

@pichuan
Collaborator

pichuan commented Dec 1, 2020

Hi @dkurt , to give you an update on our discussion in the team, here is my current decision:

  1. I'm getting your first 5 commits (up to 3cfa6c5) reviewed internally. We'll plan to get those 5 commits into our codebase for the upcoming release.
  2. @gunjanbaid has a question about EMA. She'll follow up in this discussion.
  3. Currently, even though the hap.py results are the same, we did notice the VCFs are not exactly the same. I understand that this is likely expected, but to be extra careful, I'm still going to keep use_openvino as False by default.
  4. We'll plan to build our Docker images with --build-arg DV_OPENVINO_BUILD=1 on, and we will plan to add to our documentation so users will be aware that they can try out adding --call_variants_extra_args="use_openvino=true" to speed up their CPU runs.

I really want to get this to be the default :) But release is happening soon and I don't want to break the default behavior, so I'm being extra careful here. Let me know if you have more thoughts about the decisions above.

(Also adding @akolesnikov @AndrewCarroll @gunjanbaid @danielecook FYI about the current status above)

@gunjanbaid
Contributor

@dkurt we noticed some slight differences in the output VCF with and without OpenVINO. The quality scores are different in the example below (46 vs. 46.1) for an internal dataset. These quality scores are derived from the output probabilities. Are slight differences in output probabilities expected with and without OpenVINO? In the past, I've noticed such slight differences for the same hardware when EMA is not loaded in correctly at inference time. I wanted to bring this to your attention in case EMA is the reason for these differences.

-chr1   16895912        .       G       A       46      PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:24:61:10,51:0.836066:46,23,0
+chr1   16895912        .       G       A       46.1    PASS    .       GT:GQ:DP:AD:VAF:PL      1/1:24:61:10,51:0.836066:46,23,0
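For context, VCF QUAL is the Phred-scaled probability that the call is wrong, so a tiny shift in the model's output probability near a rounding boundary can move the reported value between 46 and 46.1. An illustrative sketch (the error probabilities here are made up, not the actual DeepVariant outputs):

```python
import math


def phred_qual(p_error):
    # Standard Phred scaling: QUAL = -10 * log10(P(call is wrong)).
    return -10.0 * math.log10(p_error)


# Two error probabilities differing by only 6e-7 straddle the rounding
# boundary between QUAL 46.0 and 46.1:
q1 = round(phred_qual(2.52e-5), 1)  # 46.0
q2 = round(phred_qual(2.46e-5), 1)  # 46.1
```

This shows how a backend-level numerical difference on the order of 1e-6 in probability space can surface as a visible QUAL difference, even when genotype, GQ, and PL are unchanged.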

@dkurt
Author

dkurt commented Dec 1, 2020

I'm getting your first 5 commits (up to 3cfa6c5 ) reviewed internally. We'll plan to get those 5 commits into our codebase for the upcoming release.

@pichuan, may I ask you to additionally take a look at the Dockerfile? There is the following line I feel unsure about:

sed -i -E 's/from deepvariant import tf_utils//' /opt/deepvariant/deepvariant/modeling.py; \

It would be safer to replace it with something like this:

diff --git a/Dockerfile b/Dockerfile
index 0432fd8..a57364d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -67,7 +67,7 @@ RUN chmod +r /opt/models/hybrid_pacbio_illumina/model.ckpt*
 # Convert model to OpenVINO format
 RUN if [ "${DV_OPENVINO_BUILD}" = "1" ]; then \
       python3 -m pip install networkx defusedxml test-generator==0.1.1; \
-      sed -i -E 's/from deepvariant import tf_utils//' /opt/deepvariant/deepvariant/modeling.py; \
+      sed -i -E 's/from deepvariant import tf_utils/#from deepvariant import tf_utils/' /opt/deepvariant/deepvariant/modeling.py; \
       export PYTHONPATH=/opt/deepvariant:${PYTHONPATH}; \
       for model in wgs wes pacbio hybrid_pacbio_illumina; do \
         cd /opt/models/${model}; \
@@ -79,6 +79,7 @@ RUN if [ "${DV_OPENVINO_BUILD}" = "1" ]; then \
             --scale 128; \
         rm model.pb; \
       done \
+      sed -i -E 's/#from deepvariant import tf_utils/from deepvariant import tf_utils/' /opt/deepvariant/deepvariant/modeling.py; \
     fi

@dkurt
Author

dkurt commented Dec 1, 2020

@gunjanbaid, I'll take a closer look. In our experiments, the maximal absolute difference between TensorFlow and OpenVINO is about 10e-6. Maybe this is some corner case where such a difference can lead to misclassification.

@pichuan
Collaborator

pichuan commented Dec 1, 2020

@dkurt Thanks. Another update for you - I am now trying to incorporate your on-the-fly conversion code:
f0ed018

I think it'll be cleaner, and it also removes the need for #363 (comment). I do have a question about the code. I'll comment inline.

@dkurt
Author

dkurt commented Dec 1, 2020

I think it'll be cleaner, and it also removes the need for #363 (comment). I do have a question about the code. I'll comment inline.

Yeah, I also think that it makes the procedure more transparent.

Would the result be wrong for those models, if the placeholder here has a different height (100) than the actual image?

Good point! Just added a commit which uses input_fn to get the input dimensions and number of channels, so the OpenVINO model will work at the dataset's resolution.

@@ -49,10 +49,11 @@ def is_available():
   return openvino_available


-  def __init__(self, checkpoint_path, num_channels, *, input_fn, model):
-    freeze_graph(model, checkpoint_path, num_channels)
+  def __init__(self, checkpoint_path, *, input_fn, model):
Collaborator

Thank you @dkurt. FYI, this commit has now been reviewed internally and added as well. It'll show up in our next release.

One question for you about this line: what is the "*" arg passed in here? I removed it and it worked fine, but I wanted to check in case there's a reason it has to be there.

Author

@pichuan, thank you so much!

This is an argument delimiter that makes input_fn and model keyword-only parameters, so:

OpenVINOEstimator(checkpoint_path, input_fn=input_fn, model=model)
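A minimal illustration of that bare `*` delimiter (the function name here is made up for the example):

```python
def make_estimator(checkpoint_path, *, input_fn, model):
    # Parameters after the bare * can only be passed by keyword.
    return (checkpoint_path, input_fn, model)


# Keyword form works:
ok = make_estimator('model.ckpt', input_fn=lambda: None, model='inception_v3')

# Positional form is rejected with a TypeError:
try:
    make_estimator('model.ckpt', lambda: None, 'inception_v3')
    raised = False
except TypeError:
    raised = True
```

So removing the `*` still works for existing keyword-style call sites; it only loosens the signature to also accept positional calls.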

@pichuan
Collaborator

pichuan commented Dec 3, 2020

@dkurt A quick update:

I just noticed that the outputs of the multiple runs with OpenVINO are not deterministic. (I confirmed by running the same command 10 times on a WES BAM file with use_openvino on)
I actually wonder if there's something weird with the threading code that you added to make the logging smoother.

(I have confirmed that without OpenVINO, the results are deterministic. I ran another 10 to make sure all VCFs are exactly the same - which is what I expected).

I will go ahead and see if I can make OpenVINO runs deterministic by removing the threading code. If you have some ideas why (or why I shouldn't expect it to be deterministic), please let me know.

@pichuan
Collaborator

pichuan commented Dec 3, 2020


I confirmed that by reverting the changes in 3cfa6c5, my new 10 runs with OpenVINO now produce exactly the same VCFs! 🎉
(Still different from without openvino, but that is expected.)

@dkurt For this upcoming release, I will just print out a message to warn the users that all the logging information will come out towards the end. We can look into improving the logging in the future.

@dkurt
Author

dkurt commented Dec 3, 2020

@pichuan, good catch, thanks a lot! Sorry for this bug. Yes, let's work on it separately.

@dkurt
Author

dkurt commented Dec 4, 2020

Fixed the non-deterministic behavior with the last patch.
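One standard way to keep threaded inference deterministic (a sketch of the general fix, not the actual patch) is to tag each worker's results with their input position and emit them in that order, so thread scheduling cannot reorder the output:

```python
import threading


def infer_batches(batches):
    """Run a fake 'inference' over batches in worker threads, but return
    results in input order regardless of thread scheduling."""
    results = [None] * len(batches)

    def work(i, batch):
        # Stand-in for model inference on one batch.
        results[i] = [x * 2 for x in batch]

    threads = [threading.Thread(target=work, args=(i, b))
               for i, b in enumerate(batches)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Flatten in the original batch order.
    return [y for batch in results for y in batch]
```

Because each thread writes only to its own slot and the final flattening follows the input index, the output is identical run to run.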

akolesnikov pushed a commit that referenced this pull request Dec 7, 2020
@pichuan
Collaborator

pichuan commented Dec 7, 2020

@dkurt FYI , the OpenVINO changes are in https://github.com/google/deepvariant/releases/tag/v1.1.0
Thank you!

@dkurt
Author

dkurt commented Dec 8, 2020

@pichuan, @AndrewCarroll, @gunjanbaid, thank you so much for the very productive collaboration!

@pichuan, I've moved the changes related to threading into #393.

@dkurt dkurt closed this Dec 8, 2020
@dkurt dkurt deleted the master_openvino branch December 14, 2020 12:39