Fix memory access violations in the CPU float16 min and max operators #22135

adamreeve · 2024-09-18T22:29:53Z

Description

Fixes the logic for getting the number of elements for the input and output spans in the MinMaxMLFloat16 method. The method incorrectly used the total number of elements in the output rather than the number of elements in the current span, which worked for 1D inputs but broke with 2D inputs.

As the BroadcastLooper iterated over spans, each call to MinMaxMLFloat16 started further into the input and output but still processed the full element count, so it read past the end of the input and wrote past the end of the output. This caused the asan error in #21558 and sometimes segfaults in larger examples.
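To illustrate the span-length mistake described above, here is a minimal sketch, not the actual onnxruntime code. It uses plain `float` instead of MLFloat16, and `MaxSpan`, `MaxRowWise` and `OverrunOnRow` are hypothetical names standing in for the per-span kernel and the broadcast loop; only the indexing pattern is meant to match the description.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-span kernel: element-wise max over `span_len` elements.
void MaxSpan(const float* a, const float* b, float* out, std::size_t span_len) {
  for (std::size_t i = 0; i < span_len; ++i) {
    out[i] = std::max(a[i], b[i]);
  }
}

// Hypothetical stand-in for the broadcast loop: visits a rows x cols
// tensor one row (span) at a time.
std::vector<float> MaxRowWise(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t rows, std::size_t cols) {
  std::vector<float> out(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    const std::size_t offset = r * cols;
    // Correct: the kernel is given the length of the current span (cols).
    // Passing out.size() here instead would reproduce the bug: for r > 0
    // the kernel would run past the end of all three buffers.
    MaxSpan(a.data() + offset, b.data() + offset, out.data() + offset, cols);
  }
  return out;
}

// Number of elements the buggy call would touch past the end of a buffer
// on row r: (offset + total) - total, which simplifies to r * cols.
std::size_t OverrunOnRow(std::size_t cols, std::size_t r) {
  return r * cols;
}
```

With a single span (a 1D input, so only `r == 0`) the overrun is zero, which is consistent with the bug going unnoticed until 2D inputs were tested.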

Motivation and Context

Fixes #21558.

Further testing showed that this issue not only causes asan errors in tests but also causes segfaults with larger inputs.

snnn · 2024-09-18T23:01:29Z

/azp run Big Models, Linux Android Emulator QNN CI Pipeline, Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline

snnn · 2024-09-18T23:01:39Z

/azp run Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline

snnn · 2024-09-18T23:01:46Z

/azp run Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline

azure-pipelines · 2024-09-18T23:01:54Z

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines · 2024-09-18T23:02:01Z

Azure Pipelines successfully started running 5 pipeline(s).

azure-pipelines · 2024-09-18T23:02:15Z

Azure Pipelines successfully started running 9 pipeline(s).

snnn · 2024-09-19T14:43:08Z

/azp run Big Models, Linux Android Emulator QNN CI Pipeline, Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline

snnn · 2024-09-19T14:43:15Z

/azp run Linux OpenVINO CI Pipeline, Linux QNN CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows ARM64 QNN CI Pipeline

snnn · 2024-09-19T14:43:37Z

/azp run Windows CPU CI Pipeline, Windows GPU CUDA CI Pipeline, Windows GPU DML CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows x64 QNN CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline

azure-pipelines · 2024-09-19T14:43:38Z

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines · 2024-09-19T14:43:40Z

Azure Pipelines successfully started running 5 pipeline(s).

azure-pipelines · 2024-09-19T14:44:10Z

Azure Pipelines successfully started running 9 pipeline(s).

snnn

Thanks.

This makes min and max with NaN for either operand always return NaN for float16 data, matching the behaviour of float and double.

The behaviour for floats and doubles was previously fixed for the CPU provider in #21492 and the CUDA provider in #19984, but these PRs didn't fix the behaviour for float16 because the tests caused asan errors. The memory access violations with float16 data have now been fixed in #22135, so this PR is a follow-up that makes float16 min and max behave the same as float and double for both the CPU and CUDA providers, now that tests can be added for this.

Motivation and Context

Relevant previous issues (not float16 specific):

* #21455
* onnx/onnx#6003
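The NaN rule stated for the follow-up PR (NaN in either operand yields NaN) can be sketched as follows. This is an illustrative sketch on plain `float`, not the onnxruntime implementation; `NanPropagatingMin` and `NanPropagatingMax` are hypothetical names.

```cpp
#include <cmath>
#include <limits>

// Returns NaN if either operand is NaN, otherwise the smaller value.
float NanPropagatingMin(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return a < b ? a : b;
}

// Returns NaN if either operand is NaN, otherwise the larger value.
float NanPropagatingMax(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return a > b ? a : b;
}
```

The explicit check matters because the standard helpers behave differently: `std::fmin` and `std::fmax` return the non-NaN operand when exactly one argument is NaN, and the result of `std::min` or `std::max` with a NaN depends on argument order.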

Fix incorrect output size in float16 min/max operations

025e422

tianleiwu reviewed Sep 19, 2024

onnxruntime/test/providers/cpu/math/element_wise_ops_test.cc (outdated, resolved)

Make some tests have a different number of rows to columns

8c00cd9

tianleiwu approved these changes Sep 19, 2024

snnn approved these changes Sep 19, 2024

tianleiwu merged commit f3cbe76 into microsoft:main Sep 20, 2024
71 checks passed

adamreeve deleted the float16_minmax branch September 20, 2024 01:40

adamreeve mentioned this pull request Sep 20, 2024

Fix NaN propagation for float16 min and max operators #22161

Merged
