OpenVINO integration #363
Conversation
Hi! We would like to propose an optional OpenVINO backend for call_variants. Also, this PR has an Actions pipeline; feel free to enable it at https://github.com/google/deepvariant/actions to run it. /cc @pichuan
Hi @dkurt First, thank you for your interest in DeepVariant, and for the substantial work you have put into these modifications. I have some questions for you, and I suspect @pichuan may add some questions and comments as well.
I suspect that we will try to run with these changes and see how the performance changes. If you are able to answer some of these questions, it will help us understand how to prioritize their assessment. Thank you again for the work you have put into this. It's quite impressive, and we appreciate your effort. Andrew
@AndrewCarroll, many thanks for such a quick response!
I don't see any issues with that. The only important thing is to add the changes as a single commit, separate from other changes, so they can be properly tracked in git history.
You're right - this PR is only about the efficiency of the deep learning part (call_variants). We need some time to collect benchmark numbers and check what is OK to share publicly. Additional optimizations could probably be applied as well.
The proposed changes are unobtrusive: by default, OpenVINO is not used. We can test other Intel hardware such as iGPU and HDDL-R. On non-Intel hardware, users can always use the default implementation.
I think it won't be an issue. It makes sense to isolate the OpenVINO-related logic into a separate Python script to reduce conflicts during development. Additionally, I just wanted to ask if you're interested in using GitHub Actions so you can perform initial tests for pull requests. For example, https://github.com/dkurt/deepvariant/blob/master_openvino/.github/workflows/main.yml does a Docker build, then runs WGS on the getting-started data with both TensorFlow and OpenVINO and compares the outputs (logs).
Hi @dkurt , From the discussion between you and Andrew above, here is my current summary:
For 2., I am not familiar with GitHub Actions, but it seems interesting! I'll file an internal issue to look into this. It will likely fall under a lower priority, but I want to let you know that we'll track it and give you updates if there are any. If there are more details you wish to discuss directly, please feel free to email me at pichuan@google.com. We can also continue to communicate here to follow up.
And one more follow-up on my comment above for @dkurt:
Hi @dkurt , an update and a question for you: I was curious about the runtime myself, so over the weekend I tried building and running the version with OpenVINO to observe the behavior of call_variants. Specifically, I did something similar to https://github.com/google/deepvariant/blob/r1.0/scripts/run_wgs_runtime_test_docker.sh - but I incorporated your changes, built the Docker image with OpenVINO, and made sure I ran call_variants with OpenVINO. Here is a strange thing I found in my log:
The strange thing is the long initial lag: because of it, the overall runtime seems worse with OpenVINO. Any idea what's happening here?
@pichuan, it might be an effect of the current implementation - all the processing is done at iterator initialization, and the results are then returned one by one. We will take a look at overall efficiency and check what can be improved here, thanks!
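(A minimal sketch of the pattern being described, assuming a generator-backed predict(); the names are illustrative, not the PR's actual code. All batches are inferred while the iterator is constructed, so the caller sees one long silent pause followed by an instant stream of results.)

import time

class EagerPredictIterator:
  """Illustrative only: runs every batch up front, then replays results."""

  def __init__(self, batches, infer_fn):
    # All the heavy work happens here, at iterator initialization...
    self._results = [infer_fn(batch) for batch in batches]

  def __iter__(self):
    # ...so iteration (and any per-batch logging) finishes almost instantly.
    return iter(self._results)

def slow_infer(batch):
  time.sleep(0.5)  # stand-in for a real inference call
  return batch * 2

# The constructor blocks for ~2 seconds; the loop then prints immediately,
# which is why all the log lines appear at the end of the run.
for result in EagerPredictIterator([1, 2, 3, 4], slow_infer):
  print("processed:", result)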
@pichuan, I'm very sorry for the long delay! I tried to build DeepVariant so that it's portable, to benchmark on a remote target machine. These are the initial numbers for Intel DevCloud machines and quickstart-testdata:
So, probably, my main question is how to interpret the real, user, and sys times - maybe it will help us understand how to improve the pipeline. Here are my steps:
# Build DeepVariant release binaries and pack the build outputs
./build_release_binaries.sh
tar -cvzf bazel-deepvariant.tar.gz bazel-deepvariant/*
tar -cvzf bazel-genfiles.tar.gz bazel-genfiles/*
# On the target machine: fetch the OpenVINO branch, unpack the prebuilt
# artifacts, and point the run script at the local binaries and checkpoint
git clone -b master_openvino https://github.com/dkurt/deepvariant --depth 1
cd deepvariant
tar -xf bazel-deepvariant.tar.gz
tar -xf bazel-genfiles.tar.gz
sed -i -E 's|/opt/deepvariant/bin|./bazel-genfiles/deepvariant|' scripts/run_deepvariant.py
sed -i -E 's|/opt/models/wgs/model.ckpt|model.ckpt|' scripts/run_deepvariant.py
ln -s -f $HOME/deepvariant/scripts/ bazel-deepvariant/scripts
# Install GNU parallel without root by unpacking the .deb locally
wget http://launchpadlibrarian.net/300780258/parallel_20161222-1_all.deb
dpkg -x parallel_20161222-1_all.deb parallel
export PATH=$HOME/parallel/usr/bin:$PATH
# Install the MKL-enabled TensorFlow wheel
WHEEL_NAME=tensorflow-2.0.0-cp36-cp36m-linux_x86_64.whl
wget "https://storage.googleapis.com/penporn-kokoro/tf-mkl-2.0-py36/${WHEEL_NAME}" -O "/tmp/${WHEEL_NAME}"
pip3 install --upgrade "/tmp/${WHEEL_NAME}"
# Run the pipeline on the quickstart test data
export INPUT_DIR="${PWD}/quickstart-testdata"
export OUTPUT_DIR="${PWD}/quickstart-output"
mkdir -p $OUTPUT_DIR
export PYTHONPATH=./bazel-genfiles:$PYTHONPATH
python3 ./bazel-deepvariant/scripts/run_deepvariant.py \
--model_type=WGS \
--ref=${INPUT_DIR}/ucsc.hg19.chr20.unittest.fasta \
--reads=${INPUT_DIR}/NA12878_S1.chr20.10_10p1mb.bam \
--regions "chr20:10,000,000-10,010,000" \
--output_vcf=${OUTPUT_DIR}/output.vcf.gz \
--output_gvcf=${OUTPUT_DIR}/output.g.vcf.gz \
--call_variants_extra_args="use_openvino=True" \
--num_shards=1
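(On the real/user/sys question above: real is wall-clock time, while user and sys are CPU seconds - user-space and kernel, respectively - summed over every core the process tree used. With --num_shards=N, user can therefore legitimately approach N times real; real is the number to compare between runs. A minimal, self-contained demonstration, not DeepVariant code:)

import multiprocessing as mp
import time

def burn_one_cpu_second(_):
  deadline = time.process_time() + 1.0
  while time.process_time() < deadline:  # busy-loop for ~1 CPU-second
    pass

if __name__ == "__main__":
  start = time.perf_counter()
  with mp.Pool(4) as pool:  # 4 workers, analogous to --num_shards=4
    pool.map(burn_one_cpu_second, range(4))
  # Wall clock ("real") is ~1s here, but total CPU time ("user") is ~4s.
  print(f"real ~{time.perf_counter() - start:.1f}s, user ~4 CPU-seconds")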
Optimize preprocessing
Hi! I made some optimizations and tested on chr20 from https://github.com/google/deepvariant/blob/r1.0/docs/deepvariant-case-study.md. Can you please tell me if it's a representative run?
("real" times are in the table)
python3 ./bazel-deepvariant/scripts/run_deepvariant.py \
--model_type=WGS \
--ref=./reference/GRCh38_no_alt_analysis_set.fasta \
--reads=./input/HG002.novaseq.pcr-free.35x.dedup.grch38_no_alt.chr20.bam \
--output_vcf=${OUTPUT_DIR}/HG002.output.vcf.gz \
--output_gvcf=${OUTPUT_DIR}/HG002.output.g.vcf.gz \
--num_shards=16 \
--regions "chr20" \
--call_variants_extra_args="use_openvino=True"
Hi @dkurt , chr20 is better than the Quick Start!
I will certainly plan to compare on that so I know how this works in my setting. @dkurt, I have a question for you: if I get a machine like this: https://github.com/google/deepvariant/blob/r1.0/docs/deepvariant-details.md#command-for-a-cpu-only-machine-on-google-cloud-platform , do you think I will be able to see the same improvement that you're seeing?
Test WES model
Unfortunately, I have no access to a similar 64-core configuration, but I tried the Xeon 6258R (28 cores) again, on 8 chromosomes:
I think more cores will show more speedup.
python3 ./bazel-deepvariant/scripts/run_deepvariant.py \
--model_type=WGS \
--ref=./input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz \
--reads=./input/HG002.novaseq.pcr-free.35x.dedup.grch38_no_alt.bam \
--output_vcf=${OUTPUT_DIR}/HG002.output.vcf.gz \
--output_gvcf=${OUTPUT_DIR}/HG002.output.g.vcf.gz \
--num_shards=16 \
--regions "chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8" \
--call_variants_extra_args="use_openvino=True"
@pichuan, the GCP team denied access to a 64-core machine, unfortunately.
I have published an image for the latest state of the PR at https://hub.docker.com/r/dkurtaev/deepvariant. May I ask you to validate it?
@dkurt Happy to try it out. Do I just pull it and run it with a regular command? Are there any more flags I need to add? It seems like I'll probably need to add --call_variants_extra_args="use_openvino=True".
@pichuan, you're right, just add --call_variants_extra_args="use_openvino=True". May I ask you to try running on just a single chromosome first, to check whether OpenVINO works faster than TensorFlow? If not, would it be possible to share the logs?
I tested with chr1 of the WGS script. See commands and machine details below.
Commands
Tested on the same machine. All runs below were done with a command like:
with some code diffs below:
Runtime:
Runtime:
Runtime
Machine details
I got the machine with this command:
Following up on my previous comment, @dkurt : one thing similar to what I observed before is that call_variants with OpenVINO seems to block at the beginning for quite a while before it starts printing each of the logs:
You can see these lines:
Before the first of those lines appears, there is a long pause. I'm curious to run this on the whole genome and see whether the speedup will be more noticeable.
@pichuan, thank you for the very detailed experiment! Looking forward to seeing the whole-genome results.
Yes, this is expected, due to all the processing being done at the first iteration. It's not critical for performance, but I can change it so it won't confuse users.
Added a commit which lets you track progress.
@dkurt With all chromosomes of WGS, the call_variants runtime change is 266m46.183s --> 198m46.734s. Thanks for the latest change for tracking progress. I'll try it out and let you know if there are any issues. In terms of getting the code in, I'll see if I can get it through internal review before the next release (r1.1). If not, it'll be in the one after. Even if this gets in in time for the next release (r1.1), I still don't plan to build our release Docker image with this on by default yet, because I'm not exactly sure what the effect is on all use cases. @dkurt For future releases, do you think it's safe to turn on OpenVINO by default? What do you expect to happen on non-Intel machines?
@pichuan, thank you! It should be safe to build the Docker image with the OpenVINO backend and just keep it disabled by default, so users can turn it on manually (via the use_openvino flag). The OpenVINO import is surrounded by try/except, and I guess it won't crash on a non-Intel CPU:
try:
  from openvino.inference_engine import IECore, StatusCode
except:
  pass
Anyway, I'll try to run on some public CI to confirm.
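(For completeness, a sketch of how such a guard typically feeds a capability check. The is_available() helper matches the name visible in a later diff in this thread; the FLAGS wiring in the trailing comment is an assumption.)

try:
  from openvino.inference_engine import IECore, StatusCode
  openvino_available = True
except:  # bare except mirrors the PR's guard: any import failure disables OpenVINO
  openvino_available = False

def is_available():
  return openvino_available

# Downstream code can then fall back safely, e.g.:
#   if FLAGS.use_openvino and is_available():
#     estimator = OpenVINOEstimator(...)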
@dkurt Thanks! One small downside is that the Docker image will have to include the old ckpt format as well as the new format. One question for you - do you know if the converted model format can be read and used with the regular Estimator as well?
@pichuan, I'll take a look. Can you please clarify what the new checkpoint format is? I've only tried checkpoints from r0.9 onward.
@dkurt Sorry for the confusion. I meant:
These are the extra files after enabling OpenVINO:
Our regular Estimator code paths currently use these files (these are what I meant by "old ckpt format"):
If both code paths can use the new (and smaller!) files, that will be very nice. I also noticed there is an intermediate model.pb file.
@pichuan, I got it, good point!
However, there is a way to generate the OpenVINO model at runtime instead.
@dkurt Keeping them in the image is fine! I'm actually more curious about whether I can get rid of that big model.ckpt.data-00000-of-00001 file. :) @dkurt One more question for you -- do you see any downside to enabling --use_openvino as the default in our CPU run? Once this is built into our CPU Docker image, it'll be nice to have it as the default. I want to know if it might crash on non-Intel hardware or not. (I can also test it myself, but I haven't gotten around to it yet.)
@pichuan, that's a good question. The checkpoint is used to restore model training, and that's why it takes so much space. Internally, it probably contains not just the weights but also gradients and intermediate per-layer outputs. I moved the OpenVINO conversion into runtime anyway - that now seems simpler and doesn't bloat the image.
Just tried the image on an n2d-standard-8 from GCP (AMD EPYC 7B12) and it works fine through the OpenVINO backend. So it seems we can safely turn OpenVINO on by default for CPU-only environments. Shall I do it in this PR, or will you switch it separately?
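(As context for "conversion at runtime", a hedged sketch of what it can look like: freeze the checkpoint to a model.pb, then invoke OpenVINO's Model Optimizer on it. The --scale 128 flag is taken from the Dockerfile snippet later in this thread; the mo_tf.py entry point, paths, and helper name are assumptions and vary by OpenVINO version.)

import subprocess
import tempfile

def convert_to_openvino(frozen_pb_path):
  """Runs Model Optimizer on a frozen graph; returns the IR directory."""
  out_dir = tempfile.mkdtemp()
  subprocess.run(
      ["mo_tf.py",  # assumes Model Optimizer is on PATH; entry point is version-dependent
       "--input_model", frozen_pb_path,
       "--output_dir", out_dir,
       "--scale", "128"],  # same scale the Dockerfile passes
      check=True)
  # The resulting model.xml/model.bin pair is what IECore.read_network loads.
  return out_dir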
Thanks for testing! What do you think is the best way to change the default for GPU? I was thinking about this, but I'm not sure what a good way to change the default of the use_openvino flag would be. If you have a proposed change that works well for CPU as a default but doesn't hurt the GPU use case, feel free to propose a commit here. Internally, I'm about to get some of this code through review first, and I can add any incremental changes for internal review later. Thanks!
Quick update and FYI for you @dkurt
This was after your "Process OpenVINO in thread (#8)" change yesterday.
@pichuan, I'll think about it and propose a flexible solution for OpenVINO acceleration. Can you please add some details about the latest numbers? Are they a regression or an improvement? Last time we saw 266m46.183s --> 198m46.734s for WGS (call_variants); now it's 233m14.191s --> 204m35.065s.
@dkurt It does seem like the percentage of runtime reduction on WGS has gotten worse. Three things have changed:
I suspect 3 is the main reason here. This could be verified by running the same thing with and without the use_openvino flag on exactly the same machine (sequentially), but I likely don't have time to do that again now...
@pichuan, I got it, thanks! Indeed the experiments are different. I also benchmarked the changes with and without the logging improvements and can confirm there was no efficiency difference, so we don't need additional experiments. Thanks for your time and the warm welcome! I agree with you that the Dockerfile is now in the right configuration - an optional build that is enabled manually. Regarding the default value of use_openvino:
deepvariant/call_variants.py
Outdated
# Unit tests use this branch.
predict_hooks = []
if FLAGS.use_openvino:
  ie_estimator = OpenVINOEstimator(checkpoint_path, num_channels_in_checkpoint_model,
This actually doesn't seem to make sense to me. Where is num_channels_in_checkpoint_model from?
This variable is defined above; it is initialized from the checkpoint by the common code path:
num_channels_in_checkpoint_model = shape_map_for_layers[first_layer][2]
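(For readers following along, a hedged sketch of where a map like shape_map_for_layers can come from: reading variable shapes directly out of the checkpoint. The layer name below is an assumed example, not necessarily the one DeepVariant uses.)

import tensorflow.compat.v1 as tf

reader = tf.train.load_checkpoint("model.ckpt")  # illustrative path
shape_map_for_layers = reader.get_variable_to_shape_map()

# Conv2D kernels are stored as [height, width, in_channels, out_channels],
# so index 2 recovers the number of input channels the model was trained with.
first_layer = "InceptionV3/Conv2d_1a_3x3/weights"  # assumed layer name
num_channels_in_checkpoint_model = shape_map_for_layers[first_layer][2]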
Hi @dkurt , to give you an update on our team discussion, here is my current decision:
I really want to get this to be the default :) But the release is happening soon and I don't want to break the default behavior, so I'm being extra careful here. Let me know if you have more thoughts about the decisions above. (Also adding @akolesnikov @AndrewCarroll @gunjanbaid @danielecook FYI about the current status above.)
@dkurt we noticed some slight differences in the output VCF with and without OpenVINO. The quality scores are different in the example below (46 vs. 46.1) for an internal dataset. These quality scores are derived from the output probabilities. Are slight differences in output probabilities expected with and without OpenVINO? In the past, I've noticed such slight differences on the same hardware when EMA is not loaded correctly at inference time. I wanted to bring this to your attention in case EMA is the reason for these differences.
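(For background on the EMA point: the question is whether inference restores the raw variables or their exponential-moving-average "shadow" copies. A hedged TF1-style sketch; the decay value and checkpoint path are assumptions.)

import tensorflow.compat.v1 as tf

# Build the inference graph first, then restore the EMA shadow weights:
ema = tf.train.ExponentialMovingAverage(decay=0.9999)  # assumed decay
# variables_to_restore() maps shadow-variable names onto the model variables,
# so this saver loads the averaged weights. A plain tf.train.Saver() would
# load the raw weights instead, shifting output probabilities slightly -
# enough to nudge a derived QUAL from 46 to 46.1.
saver = tf.train.Saver(ema.variables_to_restore())
with tf.Session() as sess:
  saver.restore(sess, "model.ckpt")  # illustrative path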
@pichuan, may I ask you to additionally take a look at the Dockerfile? There is the following line I feel unsure about:
It would be safer to replace it with something like this:
diff --git a/Dockerfile b/Dockerfile
index 0432fd8..a57364d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -67,7 +67,7 @@ RUN chmod +r /opt/models/hybrid_pacbio_illumina/model.ckpt*
# Convert model to OpenVINO format
RUN if [ "${DV_OPENVINO_BUILD}" = "1" ]; then \
python3 -m pip install networkx defusedxml test-generator==0.1.1; \
- sed -i -E 's/from deepvariant import tf_utils//' /opt/deepvariant/deepvariant/modeling.py; \
+ sed -i -E 's/from deepvariant import tf_utils/#from deepvariant import tf_utils/' /opt/deepvariant/deepvariant/modeling.py; \
export PYTHONPATH=/opt/deepvariant:${PYTHONPATH}; \
for model in wgs wes pacbio hybrid_pacbio_illumina; do \
cd /opt/models/${model}; \
@@ -79,6 +79,7 @@ RUN if [ "${DV_OPENVINO_BUILD}" = "1" ]; then \
--scale 128; \
rm model.pb; \
done \
+ sed -i -E 's/#from deepvariant import tf_utils/from deepvariant import tf_utils/' /opt/deepvariant/deepvariant/modeling.py; \
fi
@gunjanbaid, I'll take a closer look. In our experiments, the maximal absolute difference between TensorFlow and OpenVINO outputs was about 10e-6. Maybe this is some corner case where such a difference leads to misclassification.
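(A quick way to quantify that kind of backend drift, assuming the class probabilities from both runs are dumped to .npy files; the filenames are hypothetical.)

import numpy as np

tf_probs = np.load("tf_probs.npy")  # hypothetical dump from the TensorFlow run
ov_probs = np.load("ov_probs.npy")  # hypothetical dump from the OpenVINO run

diff = np.abs(tf_probs - ov_probs)
print("max abs diff:", diff.max())
# Examples where tiny drift flips the argmax are the candidates for
# quality-score changes like 46 vs. 46.1.
flipped = (tf_probs.argmax(axis=1) != ov_probs.argmax(axis=1)).sum()
print("examples with a different argmax:", flipped)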
@dkurt Thanks. Another update for you - I am now trying to incorporate your on-the-fly conversion code. I think it'll be cleaner, and it also removes the need for #363 (comment). I do have a question about the code; I'll comment inline.
Yeah, I also think that it makes the procedure more transparent.
Good point! Just added a commit which uses |
@@ -49,10 +49,11 @@ def is_available():
   return openvino_available


-  def __init__(self, checkpoint_path, num_channels, *, input_fn, model):
-    freeze_graph(model, checkpoint_path, num_channels)
+  def __init__(self, checkpoint_path, *, input_fn, model):
Thank you @dkurt . FYI, this commit has been reviewed internally and added now as well. It'll show up in our next release.
One question for you about this line: what is the "*" arg passed in here? I removed it and it worked fine, but I wanted to check in case there's a reason it has to be there.
@pichuan, thank you so much!
This is an argument delimiter which makes input_fn and model keyword-only parameters, so the call has to be written as:
OpenVINOEstimator(checkpoint_path, input_fn=input_fn, model=model)
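(A self-contained illustration of that bare * delimiter; the function name is hypothetical.)

def make_estimator(checkpoint_path, *, input_fn, model):
  """Everything after the bare * must be passed by keyword."""
  return checkpoint_path, input_fn, model

make_estimator("model.ckpt", input_fn=lambda: None, model=object())  # OK
# make_estimator("model.ckpt", lambda: None, object())
# -> TypeError: make_estimator() takes 1 positional argument but 3 were given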
@dkurt A quick update: I just noticed that the outputs of multiple runs with OpenVINO are not deterministic. (I confirmed this by running the same command 10 times on a WES BAM file with use_openvino on.) (I have also confirmed that without OpenVINO the results are deterministic: I ran another 10 runs to make sure all the VCFs are exactly the same, which is what I expected.) I will go ahead and see if I can make the OpenVINO runs deterministic by removing the threading code. If you have ideas why (or why I shouldn't expect it to be deterministic), please let me know.
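(An illustrative sketch - not the PR's actual code - of the class of bug that can cause this: a threaded pipeline that emits results in completion order instead of submission order.)

import concurrent.futures as cf
import random
import time

def infer(batch_id):
  time.sleep(random.random() / 100)  # variable per-batch latency
  return batch_id

with cf.ThreadPoolExecutor(max_workers=4) as ex:
  futures = [ex.submit(infer, i) for i in range(8)]
  # Non-deterministic: as_completed() yields in whatever order batches finish.
  print([f.result() for f in cf.as_completed(futures)])
  # Deterministic: consume futures in the order they were submitted.
  print([f.result() for f in futures])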
I confirmed that by reverting the changes in 3cfa6c5 , my new 10 runs with OpenVINO now produce exactly the same VCFs! 🎉 @dkurt For this upcoming release, I will just print out a message warning users that all the logging information will come out towards the end. We can look into improving the logging in the future.
@pichuan, good catch, thanks a lot! Sorry for this bug. Yes, let's work on it separately.
Fixed the non-deterministic behavior with the last patch.
…ttps://github.com/dkurt. PiperOrigin-RevId: 344919234
@dkurt FYI, the OpenVINO changes are in https://github.com/google/deepvariant/releases/tag/v1.1.0
@pichuan, @AndrewCarroll, @gunjanbaid, thank you so much for a very productive collaboration! @pichuan, I've moved the threading-related changes into #393.
(added the extra flag --call_variants_extra_args="use_openvino=True" compared to the original Getting Started)