Merge branch 'releases/2024/3' into updating-interactive-tutorials-070824-for-24.3
kblaszczak-intel authored Aug 12, 2024
2 parents eca0c99 + 99450a7 commit bd20779
Showing 6 changed files with 59 additions and 14 deletions.
13 changes: 13 additions & 0 deletions README.md
@@ -115,6 +115,7 @@ You can ask questions and get support on:
* OpenVINO channels on the [Intel DevHub Discord server](https://discord.gg/7pVRxUwdWG).
* The [`openvino`](https://stackoverflow.com/questions/tagged/openvino) tag on Stack Overflow\*.


## Additional Resources

* [Product Page](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html)
@@ -123,6 +124,18 @@ You can ask questions and get support on:
* [OpenVINO™ toolkit on Medium](https://medium.com/@openvino)


## Telemetry

OpenVINO™ collects software performance and usage data to improve OpenVINO™ tools.
This data is collected directly by OpenVINO™ or through the use of Google Analytics 4.
You can opt out at any time by running the command:

``` bash
opt_in_out --opt_out
```

More information is available at [OpenVINO™ Telemetry](https://docs.openvino.ai/2024/about-openvino/additional-resources/telemetry.html).

## License

OpenVINO™ Toolkit is licensed under [Apache License Version 2.0](LICENSE).
@@ -1,13 +1,13 @@
Release Policy
=============================================================================

- OpenVINO offers releases of three different types, each targeting a different use case:
+ OpenVINO offers releases of four different types, each targeting a different use case:

* `Regular releases <#regular-releases>`__
* `Long-Term Support <#long-term-support-releases>`__
* `Pre-release releases <#pre-release-releases>`__
* `Nightly <#nightly-releases>`__


Regular releases
####################

@@ -60,6 +60,16 @@ The following elements are not guaranteed to receive updates:
* OpenVINO tools, such as NNCF and OVMS.
* Code samples used in component testing.

Pre-release releases
######################

An OpenVINO pre-release is an early version of a regular release that has not undergone full
release validation or qualification. Pre-release versions are more stable than nightly releases,
but no support is offered for pre-release software. The scope, functionality, and APIs/behavior
are subject to change. A pre-release **should NOT** be incorporated into any production
software/solution and should be used only for:

* Performing early testing and integration.
* Getting early feedback from the community.

Nightly releases
###########################
@@ -184,7 +184,7 @@ mark a conversation session, as you can see in these simple examples:
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path)
- pipe.set_generation_cofnig({'max_new_tokens': 100)
+ pipe.set_generation_config({'max_new_tokens': 100})
pipe.start_chat()
while True:
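A minimal sketch of how such a chat session can be completed end to end (assuming the
``openvino_genai`` Python API with ``start_chat``, ``generate``, and ``finish_chat``, and a local
``model_path``; the loop body here is illustrative, not the exact code hidden in the diff above):

.. code-block:: python

   import openvino_genai as ov_genai

   pipe = ov_genai.LLMPipeline(model_path)             # model_path: directory with the converted model
   pipe.set_generation_config({'max_new_tokens': 100})

   pipe.start_chat()                                   # start_chat/finish_chat mark the conversation session
   while True:
       prompt = input("question:\n")
       if prompt == "Stop!":
           break
       print(pipe.generate(prompt))                    # the pipeline keeps the chat history between calls
   pipe.finish_chat()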
@@ -209,7 +209,7 @@ mark a conversation session, as you can see in these simple examples:
ov::genai::GenerationConfig config = pipe.get_generation_config();
config.max_new_tokens = 100;
- pipe.set_generation_cofnig(config)
+ pipe.set_generation_config(config);
pipe.start_chat();
for (size_t i = 0; i < questions.size(); i++) {
@@ -223,6 +223,8 @@ as in OpenVINO native API:
model.to("GPU")
.. _enabling-runtime-optimizations:

Enabling OpenVINO Runtime Optimizations
############################################################

@@ -237,17 +239,28 @@ includes **Dynamic quantization** of activations of 4/8-bit quantized MatMuls an
insignificant deviation in generation accuracy. Quantization is performed in a group-wise
manner, with configurable group size. It means that values in a group share quantization
parameters. Larger group sizes lead to faster inference but lower accuracy. Recommended
- group size values are ``32``, ``64``, or ``128``. To enable Dynamic quantization, use
- the corresponding inference property as follows:
+ group size values are ``0``, ``32``, ``64``, or ``128``. Dynamic quantization is enabled **by
+ default** on the CPU device. To disable it, use one of the following options (a sketch follows the list):

* **(Primary Option)** Set ``DYNAMIC_QUANTIZATION_GROUP_SIZE`` to ``0``.
* Switch the execution mode from ``PERFORMANCE`` to ``ACCURACY``. Note that this option
  affects inference precision as well. You can learn more about both ``PERFORMANCE``
  and ``ACCURACY`` modes in the :ref:`Precision Control Guide <execution-mode>`.
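A short sketch of both options through ``optimum-intel`` (the ``EXECUTION_MODE_HINT`` key and
``model_path`` are assumptions for illustration; adjust them to your setup):

.. code-block:: python

   from optimum.intel import OVModelForCausalLM

   # Option 1 (primary): disable dynamic quantization by setting the group size to 0
   model = OVModelForCausalLM.from_pretrained(
       model_path,
       ov_config={"DYNAMIC_QUANTIZATION_GROUP_SIZE": "0"},
   )

   # Option 2: request ACCURACY mode (this also changes inference precision)
   model = OVModelForCausalLM.from_pretrained(
       model_path,
       ov_config={"EXECUTION_MODE_HINT": "ACCURACY"},
   )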

To change the group size value (e.g., to ``64``), run the following code:

.. code-block:: python
model = OVModelForCausalLM.from_pretrained(
model_path,
- ov_config={"DYNAMIC_QUANTIZATION_GROUP_SIZE": "32", "PERFORMANCE_HINT": "LATENCY"}
+ ov_config={"DYNAMIC_QUANTIZATION_GROUP_SIZE": "64"}
)
.. note::

As of release 2024.3, dynamic quantization is not enabled for BF16 inference.


* **KV-cache quantization** allows lowering the precision of Key and Value cache in LLMs.
This helps reduce memory consumption during inference, improving latency and throughput.
KV-cache can be quantized into the following precisions: ``u8``, ``bf16``, ``f16``.
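For example, a hedged sketch of requesting a ``u8`` KV-cache through ``optimum-intel``
(the ``KV_CACHE_PRECISION`` key and ``model_path`` are assumptions for illustration):

.. code-block:: python

   from optimum.intel import OVModelForCausalLM

   model = OVModelForCausalLM.from_pretrained(
       model_path,
       ov_config={"KV_CACHE_PRECISION": "u8"},   # store the Key/Value cache in 8-bit precision
   )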
@@ -1,5 +1,3 @@
- .. {#openvino_docs_OV_UG_Precision_Control}
Precision Control
=================

@@ -26,6 +24,7 @@ Advanced Matrix Extensions (AMX) on CPU do not support ``f32``). Also, I/O opera
requires less memory due to the smaller tensor byte size. This guide will focus on how
to control inference precision.

.. _execution-mode:

Execution Mode
##############
@@ -36,8 +35,9 @@ may lower the accuracy for performance reasons (**PERFORMANCE mode**)

* In **ACCURACY mode**, the device cannot convert floating point tensors to a smaller
floating point type, so devices try to keep the accuracy metrics as close as possible to
the original values obtained after training relative to the device's real capabilities.
This means that most devices will infer with ``f32`` precision if your device supports it.
In this mode, :ref:`Dynamic Quantization <enabling-runtime-optimizations>` is disabled.
* In **PERFORMANCE mode**, the device can convert to smaller data types and apply other
optimizations that may have some impact on accuracy rates, although we still try to
minimize accuracy loss and may use mixed precision execution in some cases.
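As a reference point, a minimal sketch of selecting the mode with the OpenVINO Python API
(``model.xml`` and the ``CPU`` device below are placeholders):

.. code-block:: python

   import openvino as ov
   import openvino.properties.hint as hints

   core = ov.Core()
   # Request ACCURACY mode; PERFORMANCE is the default on most devices.
   compiled_model = core.compile_model(
       "model.xml", "CPU",
       {hints.execution_mode: hints.ExecutionMode.ACCURACY},
   )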
@@ -80,14 +80,23 @@ to specify the exact precision the user wants, but is less portable. For example
CPU supports ``f32`` inference precision and ``bf16`` on some platforms, GPU supports ``f32``
and ``f16``, so if a user wants an application that uses multiple devices, they have
to handle all these combinations manually or let OV do it automatically by using the higher
- level ``execution_mode`` property. Another thing is that ``inference_precision`` is also
- a hint, so the value provided is not guaranteed to be used by Runtime (mainly in cases
- where the current device does not have the required hardware capabilities).
+ level ``execution_mode`` property.

.. note::

When using ``execution_mode``, be aware that **ACCURACY mode** enables ``f32`` inference
precision but also disables :ref:`dynamic quantization <enabling-runtime-optimizations>`.
This may significantly affect inference performance, especially on Intel® Xeon® platforms
and Intel® GPU devices.

Another thing is that ``inference_precision`` is also a hint, so the value provided is not guaranteed
to be used by Runtime (mainly in cases where the current device does not have the required hardware
capabilities).
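A hedged sketch of setting this hint directly (the device and precision below are examples only):

.. code-block:: python

   import openvino as ov
   import openvino.properties.hint as hints

   core = ov.Core()
   # Ask the GPU plugin for f16 inference; as a hint, the value may be ignored
   # if the device cannot execute the model in that precision.
   compiled_model = core.compile_model(
       "model.xml", "GPU",
       {hints.inference_precision: ov.Type.f16},
   )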

.. note::

All devices only support floating-point data types (``f32``, ``f16``, ``bf16``) as a value
- for ``inference_precision`` attribute, because quantization cannot be done in Runtime.
+ for the ``inference_precision`` attribute.


.. _limited_inference_precision:
Binary file not shown.
