Merge branch 'openvinotoolkit:master' into flip
Asthestarsfalll authored Feb 21, 2023
2 parents 0d5e369 + 7f3f576 commit ecc0c9d
Showing 52 changed files with 1,594 additions and 763 deletions.
2 changes: 1 addition & 1 deletion .ci/azure/linux.yml
@@ -71,7 +71,7 @@ jobs:
maxParallel: '2'

# About 150% of total time
timeoutInMinutes: '120'
timeoutInMinutes: '180'

pool:
name: LIN_VMSS_VENV_F16S_U20_WU2
6 changes: 3 additions & 3 deletions docs/IE_PLUGIN_DG/AsyncInferRequest.md
@@ -12,7 +12,7 @@ OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequest

OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:

@snippet src/template_async_infer_request.hpp async_infer_request:header
@snippet src/async_infer_request.hpp async_infer_request:header

#### Class Fields

@@ -30,7 +30,7 @@ The main goal of the `AsyncInferRequest` constructor is to define a device pipel
- `waitPipeline` is a CPU non-compute task that waits for a response from a remote device.
- `inferPostprocess` is a CPU compute task.

@snippet src/template_async_infer_request.cpp async_infer_request:ctor
@snippet src/async_infer_request.cpp async_infer_request:ctor
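
The snippet itself is not rendered in this view. A hypothetical, condensed sketch of how a derived constructor can populate the base-class pipeline is given below; all class and member names (`MyAsyncInferRequest`, `MyInferRequest`, `_request`, `_waitExecutor`) are illustrative, not the template plugin's actual source:

```cpp
// Hypothetical sketch only: all class and member names are illustrative.
MyAsyncInferRequest::MyAsyncInferRequest(const MyInferRequest::Ptr& request,
                                         const ITaskExecutor::Ptr& taskExecutor,
                                         const ITaskExecutor::Ptr& waitExecutor,
                                         const ITaskExecutor::Ptr& callbackExecutor)
    : AsyncInferRequestThreadSafeDefault(request, taskExecutor, callbackExecutor),
      _request(request),
      _waitExecutor(waitExecutor) {
    // Each pipeline stage is an {executor, task} pair; stages run in order.
    _pipeline = {
        {taskExecutor, [this] {
             _request->inferPreprocess();   // CPU compute task
             _request->startPipeline();     // kick off execution on the device
         }},
        {_waitExecutor, [this] {
             _request->waitPipeline();      // CPU non-compute task: wait for the device
         }},
        {taskExecutor, [this] {
             _request->inferPostprocess();  // CPU compute task
         }}};
}
```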

The stages are distributed among two task executors in the following way:

@@ -46,4 +46,4 @@ Inference request stages are also profiled using IE_PROFILING_AUTO_SCOPE, which

In the asynchronous request destructor, it is necessary to wait for the pipeline to finish. This can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class.

@snippet src/template_async_infer_request.cpp async_infer_request:dtor
@snippet src/async_infer_request.cpp async_infer_request:dtor
16 changes: 8 additions & 8 deletions docs/IE_PLUGIN_DG/InferRequest.md
@@ -12,7 +12,7 @@ Inference Engine Plugin API provides the helper InferenceEngine::IInferRequestIn
to use as a base class for a synchronous inference request implementation. Based on that, a declaration
of a synchronous request class can look as follows:

@snippet src/template_infer_request.hpp infer_request:header
@snippet src/infer_request.hpp infer_request:header

#### Class Fields

@@ -34,29 +34,29 @@ The example class has several fields:

The constructor initializes helper fields and calls methods which allocate blobs:

@snippet src/template_infer_request.cpp infer_request:ctor
@snippet src/infer_request.cpp infer_request:ctor

> **NOTE**: Call InferenceEngine::CNNNetwork::getInputsInfo and InferenceEngine::CNNNetwork::getOutputsInfo to specify both layout and precision of blobs, which you can set with InferenceEngine::InferRequest::SetBlob and get with InferenceEngine::InferRequest::GetBlob. A plugin uses these hints to determine its internal layouts and precisions for input and output blobs if needed.
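
For context, a minimal user-side sketch of the calls described in this note, using the classic Inference Engine API; `"model.xml"` is a placeholder path and the device name is arbitrary:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core ie;
    // "model.xml" is a placeholder path to an IR model.
    InferenceEngine::CNNNetwork network = ie.ReadNetwork("model.xml");

    // Specify the layout and precision the application wants for its blobs.
    for (auto& input : network.getInputsInfo()) {
        input.second->setPrecision(InferenceEngine::Precision::FP32);
        input.second->setLayout(InferenceEngine::Layout::NCHW);
    }
    for (auto& output : network.getOutputsInfo()) {
        output.second->setPrecision(InferenceEngine::Precision::FP32);
    }

    // The plugin receives these hints and chooses its internal layouts/precisions.
    auto exec_network = ie.LoadNetwork(network, "CPU");
    auto request = exec_network.CreateInferRequest();
    auto input_blob = request.GetBlob(network.getInputsInfo().begin()->first);
    request.Infer();
    return 0;
}
```
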
### `~InferRequest` Destructor

Decrements the number of created inference requests:

@snippet src/template_infer_request.cpp infer_request:dtor
@snippet src/infer_request.cpp infer_request:dtor

### `InferImpl()`

**Implementation details:** The base IInferRequestInternal class implements the public InferenceEngine::IInferRequestInternal::Infer method as follows:
- Checks blobs set by users
- Calls the `InferImpl` method defined in a derived class to call actual pipeline stages synchronously

@snippet src/template_infer_request.cpp infer_request:infer_impl
@snippet src/infer_request.cpp infer_request:infer_impl
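
As a rough, hypothetical sketch of the idea (names are illustrative, not the template plugin's actual source), an `InferImpl` override simply runs the stages described below one after another:

```cpp
// Hypothetical sketch only: MyInferRequest and the stage methods are illustrative.
void MyInferRequest::InferImpl() {
    inferPreprocess();    // common preprocessing (resize, color conversion, ...)
    startPipeline();      // execute the backend pipeline synchronously
    inferPostprocess();   // convert outputs to the user-requested precision
}
```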

#### 1. `inferPreprocess`

Below is the code of the `inferPreprocess` method, which demonstrates how the common Inference Engine preprocessing step is handled:

@snippet src/template_infer_request.cpp infer_request:infer_preprocess
@snippet src/infer_request.cpp infer_request:infer_preprocess

**Details:**
* `InferImpl` must call the InferenceEngine::IInferRequestInternal::execDataPreprocessing function, which executes the common Inference Engine preprocessing step (for example, resize or color conversion operations) if it is set by the user. The output dimensions, layout, and precision match the input information set via InferenceEngine::CNNNetwork::getInputsInfo.
@@ -66,18 +66,18 @@ Below is the code of the `inferPreprocess` method to demonstrate Inference Engin

Executes a pipeline synchronously using the `_executable` object:

@snippet src/template_infer_request.cpp infer_request:start_pipeline
@snippet src/infer_request.cpp infer_request:start_pipeline

#### 3. `inferPostprocess`

Converts output blobs if the precision of the backend output blobs differs from the precision of the blobs passed by the user:

@snippet src/template_infer_request.cpp infer_request:infer_postprocess
@snippet src/infer_request.cpp infer_request:infer_postprocess

### `GetPerformanceCounts()`

The method sets the performance counters measured during execution of the pipeline stages:

@snippet src/template_infer_request.cpp infer_request:get_performance_counts
@snippet src/infer_request.cpp infer_request:get_performance_counts
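
From the application side, these counters can then be retrieved through the public API; a short sketch, assuming the request has already finished `Infer()`:

```cpp
#include <iostream>
#include <map>
#include <string>

#include <inference_engine.hpp>

// Prints the per-stage/per-layer counters reported by the plugin.
void print_perf_counts(InferenceEngine::InferRequest& request) {
    std::map<std::string, InferenceEngine::InferenceEngineProfileInfo> counters =
        request.GetPerformanceCounts();
    for (const auto& entry : counters) {
        // entry.first is the stage/layer name, realTime_uSec the wall-clock time.
        std::cout << entry.first << ": " << entry.second.realTime_uSec << " us\n";
    }
}
```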

The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) class.
121 changes: 77 additions & 44 deletions docs/OV_Runtime_UG/auto_device_selection.md
@@ -44,12 +44,12 @@ The logic behind the choice is as follows:
@endsphinxdirective

To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.
What is important, **AUTO always starts inference with the CPU of the system**, as it provides very low latency and can start inference with no additional delays.
What is important, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.
While the CPU is performing inference, AUTO continues to load the model to the device best suited for the purpose and transfers the task to it when ready.
This way, the devices which are much slower in compiling models, GPU being the best example, do not impede inference at its initial stages.
For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.

Note that if you choose to exclude CPU from the priority list, it will be unable to support the initial model compilation stage.
Note that if you choose to exclude CPU from the priority list or disable the initial CPU acceleration feature via `ov::intel_auto::enable_startup_fallback`, it will be unable to support the initial model compilation stage.

![](../img/autoplugin_accelerate.svg)
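
For illustration, a minimal sketch of compiling a model on AUTO with the default CPU acceleration and with it disabled; `"model.xml"` is a placeholder path, and `ov::intel_auto::enable_startup_fallback` is assumed to be available in the OpenVINO version in use:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_auto/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // Default: AUTO may start inference on the CPU while the preferred
    // device (e.g. GPU) is still compiling the model.
    auto compiled = core.compile_model(model, "AUTO");

    // Disable the initial CPU acceleration stage explicitly.
    auto compiled_no_cpu_start = core.compile_model(
        model, "AUTO", ov::intel_auto::enable_startup_fallback(false));
    return 0;
}
```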

@@ -76,41 +76,56 @@ Following the OpenVINO™ naming convention, the Automatic Device Selection mode

@sphinxdirective

+--------------------------------+----------------------------------------------------------------------+
| | Property | | Values and Description |
+================================+======================================================================+
| | <device candidate list> | | **Values**: |
| | | | empty |
| | | | `AUTO` |
| | | | `AUTO: <device names>` (comma-separated, no spaces) |
| | | | |
| | | | Lists the devices available for selection. |
| | | | The device sequence will be taken as priority from high to low. |
| | | | If not specified, `AUTO` will be used as default, |
| | | | and all devices will be "viewed" as candidates. |
+--------------------------------+----------------------------------------------------------------------+
| | `ov::device:priorities` | | **Values**: |
| | | | `<device names>` (comma-separated, no spaces) |
| | | | |
| | | | Specifies the devices for AUTO to select. |
| | | | The device sequence will be taken as priority from high to low. |
| | | | This configuration is optional. |
+--------------------------------+----------------------------------------------------------------------+
| | `ov::hint::performance_mode` | | **Values**: |
| | | | `ov::hint::PerformanceMode::LATENCY` |
| | | | `ov::hint::PerformanceMode::THROUGHPUT` |
| | | | `ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT` |
| | | | |
| | | | Specifies the performance option preferred by the application. |
+--------------------------------+----------------------------------------------------------------------+
| | `ov::hint::model_priority` | | **Values**: |
| | | | `ov::hint::Priority::HIGH` |
| | | | `ov::hint::Priority::MEDIUM` |
| | | | `ov::hint::Priority::LOW` |
| | | | |
| | | | Indicates the priority for a model. |
| | | | IMPORTANT: This property is not fully supported yet. |
+--------------------------------+----------------------------------------------------------------------+
+---------------------------------------------+----------------------------------------------------------------------+
| | Property | | Values and Description |
+=============================================+======================================================================+
| | <device candidate list> | | **Values**: |
| | | | empty |
| | | | `AUTO` |
| | | | `AUTO: <device names>` (comma-separated, no spaces) |
| | | | |
| | | | Lists the devices available for selection. |
| | | | The device sequence will be taken as priority from high to low. |
| | | | If not specified, `AUTO` will be used as default, |
| | | | and all devices will be "viewed" as candidates. |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::device::priorities` | | **Values**: |
| | | | `<device names>` (comma-separated, no spaces) |
| | | | |
| | | | Specifies the devices for AUTO to select. |
| | | | The device sequence will be taken as priority from high to low. |
| | | | This configuration is optional. |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::hint::performance_mode` | | **Values**: |
| | | | `ov::hint::PerformanceMode::LATENCY` |
| | | | `ov::hint::PerformanceMode::THROUGHPUT` |
| | | | `ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT` |
| | | | |
| | | | Specifies the performance option preferred by the application. |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::hint::model_priority` | | **Values**: |
| | | | `ov::hint::Priority::HIGH` |
| | | | `ov::hint::Priority::MEDIUM` |
| | | | `ov::hint::Priority::LOW` |
| | | | |
| | | | Indicates the priority for a model. |
| | | | IMPORTANT: This property is not fully supported yet. |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::execution_devices` | | Lists the runtime target devices on which the inferences are being |
| | | | executed. |
| | | | Examples of returning results could be `(CPU)`(`(CPU)` is a |
| | | | temporary device, indicating that CPU is used for acceleration at |
| | | | the model compilation stage), `CPU`, `GPU`, `CPU GPU`, `GPU.0`, |
| | | | etc. |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::intel_auto::enable_startup_fallback` | | **Values**: |
| | | | `true` |
| | | | `false` |
| | | | |
| | | | Enables/disables CPU as acceleration (or the helper device) in the |
| | | | beginning. The default value is `true`, indicating that CPU is used|
| | | | as acceleration by default. |
+---------------------------------------------+----------------------------------------------------------------------+

@endsphinxdirective

@@ -122,7 +137,7 @@ The device candidate list enables you to customize the priority and limit the ch
- If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used.
- If `AUTO` without any device names is specified, AUTO assumes all the devices present in the system can be used, and will load the network to all devices and run inference based on their default priorities, from high to low.

To specify the priority of devices, enter the device names in the priority order (from high to low) in `AUTO: <device names>`, or use the `ov::device:priorities` property.
To specify the priority of devices, enter the device names in the priority order (from high to low) in `AUTO: <device names>`, or use the `ov::device::priorities` property.
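
For illustration, a minimal sketch showing both forms; `"model.xml"` is a placeholder path and the device names assume a system with a CPU and a GPU:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // Priority given directly in the device string: GPU first, then CPU.
    auto compiled_a = core.compile_model(model, "AUTO:GPU,CPU");

    // Equivalent form using the ov::device::priorities property.
    auto compiled_b = core.compile_model(model, "AUTO",
                                         ov::device::priorities("GPU,CPU"));
    return 0;
}
```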

See the following code for using AUTO and specifying devices:

@@ -192,25 +207,43 @@ AUTO will then query all available devices and remove CPU from the candidate lis

Note that if you choose to exclude CPU from device candidate list, CPU will not be able to support the initial model compilation stage. See more information in [How AUTO Works](#how-auto-works).

### Performance Hints for AUTO
The `ov::hint::performance_mode` property enables you to specify a performance option for AUTO to be more efficient for particular use cases.
### Checking Target Runtime Devices

> **NOTE**: Currently, the `ov::hint` property is supported by CPU and GPU devices only.
To query the runtime target devices on which AUTO is currently executing the inferences, use the `ov::execution_devices` property with `get_property`, for example:

#### THROUGHPUT
This option prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, such as inference of video feeds or large numbers of images.
@sphinxdirective

.. tab:: C++

.. doxygensnippet:: docs/snippets/AUTO7.cpp
:language: cpp
:fragment: [part7]

.. tab:: Python

> **NOTE**: If no performance hint is set explicitly, AUTO will set THROUGHPUT for devices that have not set `ov::device::properties`. For example, if you have both a CPU and a GPU in the system, this command `core.compile_model("AUTO", ov::device::properties("CPU", ov::enable_profiling(true)))` will set THROUGHPUT for the GPU only. No hint will be set for the CPU although it's the selected device.
.. doxygensnippet:: docs/snippets/ov_auto.py
:language: python
:fragment: [part7]

@endsphinxdirective
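
As an additional illustration of the query, a minimal sketch (`"model.xml"` is a placeholder path):

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path
    auto compiled = core.compile_model(model, "AUTO");

    // Returns the device(s) currently executing the inferences, e.g. "(CPU)"
    // during the initial acceleration stage and "GPU" once the switch happens.
    auto devices = compiled.get_property(ov::execution_devices);
    for (const auto& device : devices)
        std::cout << device << std::endl;
    return 0;
}
```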

### Performance Hints for AUTO
The `ov::hint::performance_mode` property enables you to specify a performance option for AUTO to be more efficient for particular use cases. The default hint for AUTO is `LATENCY`.

#### LATENCY
This option prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, e.g. a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.

> **NOTE**: If no performance hint is set explicitly, AUTO will set the LATENCY hint for devices that have not set `ov::device::properties`, for example, `ov::device::properties(<DEVICE_NAME>, ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY))`.
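
For illustration, a minimal sketch of setting the LATENCY hint on AUTO explicitly (`"model.xml"` is a placeholder path):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // Explicitly request the LATENCY hint (also AUTO's default).
    auto compiled = core.compile_model(
        model, "AUTO",
        ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    return 0;
}
```
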
@sphinxdirective

.. _cumulative throughput:

@endsphinxdirective

#### THROUGHPUT
This option prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, such as inference of video feeds or large numbers of images.

#### CUMULATIVE_THROUGHPUT
While `LATENCY` and `THROUGHPUT` can select one target device with your preferred performance option, the `CUMULATIVE_THROUGHPUT` option enables running inference on multiple devices for higher throughput. With `CUMULATIVE_THROUGHPUT`, AUTO loads the network model to all available devices in the candidate list, and then runs inference on them based on the default or specified priority.
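
For illustration, a minimal sketch (`"model.xml"` is a placeholder path; the candidate list assumes a GPU and a CPU are present):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path

    // AUTO compiles the model to every device in the candidate list and
    // distributes inference requests across them.
    auto compiled = core.compile_model(
        model, "AUTO:GPU,CPU",
        ov::hint::performance_mode(ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT));
    return 0;
}
```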
