Incorrect subtraction #16134

RomanSteinberg · 2019-09-10T12:11:24Z

Description

The result of a subtraction is influenced by third-party code.

Environment info (Required)

----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 8.0.1 20180414 (experimental) [trunk revision 259383
Build        : ('default', 'Jan 14 2019 11:02:34')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /home/deploy/insightface/venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/deploy/insightface/venv/lib/python3.6/site-packages/mxnet
Commit Hash   : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
Library      : ['/home/deploy/insightface/venv/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
----------System Info----------
Platform     : Linux-4.18.0-15-generic-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : finapolis
release      : 4.18.0-15-generic
version      : #16~18.04.1-Ubuntu SMP Thu Feb 7 14:06:04 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               58
Model name:          Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping:            9
CPU MHz:             2970.775
CPU max MHz:         3400,0000
CPU min MHz:         1600,0000
BogoMIPS:            6784.38
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0032 sec, LOAD: 0.8174 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2182 sec, LOAD: 1.2723 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2506 sec, LOAD: 1.0109 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1522 sec, LOAD: 0.5201 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.2439 sec, LOAD: 1.0428 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.2124 sec, LOAD: 0.4703 sec.
----------Environment----------
KMP_DUPLICATE_LIB_OK="True"

Package used (Python/R/Scala/Julia):
I'm using Python

Build info (Required if built from source)

None

Error Message:

None

Minimum reproducible example

import mxnet as mx
import numpy as np
import sys

from PyQt5.QtWidgets import QApplication


QApplication(sys.argv)
x = mx.symbol.Variable('x', shape=(10,))
x = x - 127.5
x_val = mx.nd.array(np.array([0., 0., 1., 1., 0., 0., 0., 1., 1., 0., 0.], dtype=np.float32))
mod = x.bind(ctx=mx.cpu(), args={'x': x_val})
mod.forward()
print(mod.outputs[0].asnumpy())

Steps to reproduce

source ~/insightface/venv/bin/activate
python test_mxnet.py

The observed result is [-127. -127. -126. -126. -127. -127. -127. -126. -126. -127. -127.], which is incorrect. Expected [-127.5 -127.5 -126.5 -126.5 -127.5 -127.5 -127.5 -126.5 -126.5 -127.5 -127.5].

What have you tried to solve it?

If you comment out QApplication(sys.argv), the code produces correct results.
If you move QApplication(sys.argv) to the end of the script, the code produces correct results.
I tried to run this code on 4 computers. Only 3 of them reproduce the error.
I tried switching between GPU and CPU. The observations are the same.

Note: I understand that you are not responsible for the PyQt5 codebase, but I think I have found a vulnerability in mxnet, which is not good.


mxnet-label-bot · 2019-09-10T12:11:28Z

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Example

leezu · 2019-09-10T12:35:08Z

I can't reproduce this on Ubuntu 18.04 with either MXNet 1.5 from PyPI or a local build from the latest master.

Could you please share more information about the version of PyQt5 as well as the systems on which you experience the issue? How does the 4th computer on which you can't repro differ from the other 3 on which you experience the issue?

ChaiBapchya · 2019-09-10T20:53:22Z

@mxnet-label-bot add [Bug, Question]

RomanSteinberg · 2019-09-11T08:00:28Z

There are 2 computers with a GPU and 2 without. All run Linux, and Python runs in virtual environments. For simplicity, I will consider only the two computers that have a GPU. The computer "deploy" has the problem, and I provided its info above. The environment of my computer (roman) is as follows:

----------Python Info----------
Version : 3.6.8
Compiler : GCC 8.3.0
Build : ('default', 'Aug 20 2019 17:12:48')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 9.0.1
Directory : /home/roman/dev/venv_fin/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.5.0
Directory : /home/roman/dev/venv_fin/lib/python3.6/site-packages/mxnet
Commit Hash : 75a9e18
Library : ['/home/roman/dev/venv_fin/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✔ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✔ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✖ DEBUG
----------System Info----------
Platform : Linux-5.0.0-27-generic-x86_64-with-Ubuntu-18.04-bionic
system : Linux
node : vs-roman
release : 5.0.0-27-generic
version : #28~18.04.1-Ubuntu SMP Thu Aug 22 03:00:32 UTC 2019
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
Stepping: 9
CPU MHz: 3465.639
CPU max MHz: 3500.0000
CPU min MHz: 800.0000
BogoMIPS: 6000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0007 sec, LOAD: 0.7133 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0011 sec, LOAD: 1.0205 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0009 sec, LOAD: 0.9150 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0007 sec, LOAD: 0.9638 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0008 sec, LOAD: 0.8264 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0009 sec, LOAD: 0.2840 sec.
----------Environment----------
KMP_DUPLICATE_LIB_OK="True"

PyQt5 (pip show pyqt5)

Name: PyQt5
Version: 5.12.1
Summary: Python bindings for the Qt cross platform UI and application toolkit
Home-page: https://www.riverbankcomputing.com/software/pyqt/
Author: Riverbank Computing Limited
Author-email: info@riverbankcomputing.com
License: GPL v3
Location: /home/roman/dev/venv/lib/python3.6/site-packages
Requires: PyQt5-sip
Required-by: PyQtWebEngine, PyQt5-stubs

PyQt5 is the same on both computers.

How does the 4th computer on which you can't repro differ from the other 3 on which you experience the issue?

I have been asking myself this question for several days, and I haven't figured it out yet.

UPD: Looking at the results of your diagnostic script, the main differences are the CPU and the Ubuntu version. Can you see anything more important?

leezu · 2019-09-11T10:30:25Z

Can you try upgrading to PyQt5 5.13.0?

RomanSteinberg · 2019-09-12T07:37:47Z

@leezu the results are the same. Once again, I want to point out that I can find a workaround, but that doesn't remove the vulnerability from the mxnet codebase. If any software/library/framework can influence calculations, then there may be other programs that can exploit this vulnerability.

PS: there's still a chance that I'm doing something wrong :)

RomanSteinberg · 2019-09-12T07:53:41Z

By the way, has anyone managed to reproduce the error? If not, I'll try to find a way to "copy" the system on the "deploy" computer.

You can use the thumbs-up and thumbs-down reactions to vote.

marcoabreu · 2019-09-12T18:37:38Z

You could maybe provide a Dockerfile that recreates the environment? That would make it easier to replicate and debug the error.

RomanSteinberg · 2019-09-13T07:08:42Z

@marcoabreu yes I know and I tried to do it, but it doesn't reproduce the problem. I'm looking for a way to 'copy' the OS and all the environment which reproduces the problem.

ei-grad · 2019-09-23T20:02:58Z

Looks like a bug with locale settings somewhere in mxnet.

reproduce.py:

import mxnet as mx
import numpy as np

import locale
locale.setlocale(locale.LC_ALL, '')
print(locale.getlocale())

x = mx.symbol.Variable('x', shape=(10,))
x = x - 127.5
x_val = mx.nd.array(np.array([0, .5], dtype=np.float32))
mod = x.bind(ctx=mx.cpu(), args={'x': x_val})
mod.forward()
print(mod.debug_str())
print(mod.outputs[0].asnumpy())

With C locale:

$ LANG=C python reproduce.py          
(None, None)
Symbol Outputs:
	output[0]=_minusscalar0(0)
Variable:x
--------------------
Op:_minus_scalar, Name=_minusscalar0
Inputs:
	arg[0]=x(0) version=0
Attrs:
	scalar=127.5
Total 0 MB allocated
Total 11 TempSpace resource requested

[-127.5 -127. ]

With Russian locale:

$ LANG=ru_RU.UTF-8 python reproduce.py
('ru_RU', 'UTF-8')
Symbol Outputs:
	output[0]=_minusscalar0(0)
Variable:x
--------------------
Op:_minus_scalar, Name=_minusscalar0
Inputs:
	arg[0]=x(0) version=0
Attrs:
	scalar=127.5
Total 0 MB allocated
Total 11 TempSpace resource requested

[-127.  -126.5]
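One plausible reading of the two outputs above, not confirmed at this point in the thread: the debug output shows the scalar carried as the text attribute `scalar=127.5`, and if that text is parsed back with a locale-sensitive C routine such as `strtod`, a locale whose decimal separator is a comma would stop reading at the `.` and yield 127. The pure-Python sketch below only simulates that behavior; `parse_like_strtod` is a hypothetical helper written for illustration, not MXNet code.

```python
DIGITS = "0123456789"


def parse_like_strtod(text, decimal_point="."):
    """Parse the longest leading decimal number in `text`, accepting only
    `decimal_point` as the radix character (a simplified model of strtod)."""
    pos = 0
    if text[:1] in ("+", "-"):
        pos = 1
    # Integer part.
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    # Fractional part, only if the locale's radix character comes next.
    if text.startswith(decimal_point, pos):
        end = pos + len(decimal_point)
        while end < len(text) and text[end] in DIGITS:
            end += 1
        pos = end
    consumed = text[:pos].replace(decimal_point, ".")
    return float(consumed) if any(c in DIGITS for c in consumed) else 0.0


scalar_text = "127.5"   # assumption: the scalar travels as this string
inputs = [0.0, 0.5]     # same inputs as reproduce.py

c_scalar = parse_like_strtod(scalar_text, decimal_point=".")
ru_scalar = parse_like_strtod(scalar_text, decimal_point=",")

c_result = [x - c_scalar for x in inputs]
ru_result = [x - ru_scalar for x in inputs]

print("C-like parse:    ", c_scalar, c_result)
print("comma-like parse:", ru_scalar, ru_result)
```

Under this assumption the simulated results match the two runs above: the scalar is read as 127.5 with a `.` separator and as 127.0 with a `,` separator.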

RomanSteinberg · 2019-09-24T11:12:10Z

Thank you, @ei-grad, but I see different behavior. When I run your script on the deploy computer, it reproduces the bug for both locales.

ei-grad · 2019-09-24T11:19:44Z

Oops, you probably also have the LC_CTYPE env variable set to the Russian locale. This should be more reliable (but it still needs the ru_RU.UTF-8 locale to be generated):

import mxnet as mx
import numpy as np
import locale

locale.setlocale(locale.LC_ALL, 'C')

x = mx.symbol.Variable('x', shape=(2,))
x = x - 127.5
x_val = mx.nd.array(np.array([0, .5], dtype=np.float32))
mod = x.bind(ctx=mx.cpu(), args={'x': x_val})
mod.forward()
print('C locale:', mod.outputs[0].asnumpy())

locale.setlocale(locale.LC_ALL, 'ru_RU.UTF-8')

x = mx.symbol.Variable('x', shape=(2,))
x = x - 127.5
x_val = mx.nd.array(np.array([0, .5], dtype=np.float32))
mod = x.bind(ctx=mx.cpu(), args={'x': x_val})
mod.forward()
print('ru_RU.UTF-8 locale:', mod.outputs[0].asnumpy())
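To compare machines that do and do not reproduce the problem, it may help to print the locale-related environment variables and the decimal separator the process is actually using. This is a minimal sketch using only the standard library; `describe_numeric_locale` is a hypothetical helper name. Note that a fresh Python process normally reports the C numeric locale until something calls `setlocale`, so the call is most informative right before the MXNet symbol is built.

```python
import locale
import os


def describe_numeric_locale():
    """Collect the settings that decide which decimal separator C code sees."""
    env_names = ("LC_ALL", "LC_NUMERIC", "LC_CTYPE", "LANG")
    return {
        # Environment variables consulted when a program calls setlocale(..., '').
        "env": {name: os.environ.get(name) for name in env_names},
        # Calling setlocale with no locale argument only queries the current value.
        "LC_NUMERIC": locale.setlocale(locale.LC_NUMERIC),
        "decimal_point": locale.localeconv()["decimal_point"],
    }


info = describe_numeric_locale()
for key, value in info.items():
    print(key, "=", value)
```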

RomanSteinberg · 2019-09-24T11:28:23Z

Exactly!

$ python reproduce.py 
C locale: [-127.5 -127. ]
ru_RU.UTF-8 locale: [-127.  -126.5]

So, @marcoabreu it looks like Qt influences locale and locale influences mxnet calculations.
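A possible workaround on the application side, assuming the cause is that creating the Qt application switches the process to the system locale: reset LC_NUMERIC to "C" right after constructing QApplication and before building the MXNet symbol. The sketch below uses only the standard library so that it runs without PyQt5 or MXNet; `reset_numeric_locale` is a hypothetical helper, and this does not fix the underlying MXNet behavior.

```python
import locale


def reset_numeric_locale():
    """Force LC_NUMERIC back to 'C' and return the previous setting."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query only
    locale.setlocale(locale.LC_NUMERIC, "C")
    return previous


# Intended call order in the original script (left as comments so this
# sketch runs without PyQt5 or MXNet installed):
#   QApplication(sys.argv)
#   reset_numeric_locale()
#   x = mx.symbol.Variable('x', shape=(10,))
#   ...
previous = reset_numeric_locale()
print("previous LC_NUMERIC:", previous)
print("decimal point now:", locale.localeconv()["decimal_point"])
```

Only LC_NUMERIC is changed here, so other locale categories that the GUI may rely on are left alone.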

marcoabreu · 2019-09-24T13:07:00Z

Interesting, thanks for providing the steps to reproduce. The locale certainly shouldn't influence numerical results.

@szha any idea?

marcoabreu · 2019-09-24T13:07:46Z

Also @pengzhao-intel any idea?

leezu · 2020-05-10T01:54:27Z

Thanks to @nickguletskii CentOS CI now verifies MXNet with a different locale to prevent such issues in the future.

All the tests marked with xfail_when_nonstandard_decimal_separator in eab068b are still subject to the problem and the associated operators need to be exposed via the new FFI to fix the issue.

leezu · 2020-05-12T01:28:21Z

@nickguletskii @yzhliu @szha I suspect the issue will still persist on 2.x with the new FFI when serializing to / loading from a JSON symbol.

marcoabreu added Bug Question labels Sep 10, 2019

RomanSteinberg mentioned this issue Sep 11, 2019

Gender-Age random predictions deepinsight/insightface#899

Closed

zachgk assigned szha Sep 23, 2019

marcoabreu added Operator and removed Question labels Sep 24, 2019

This was referenced Dec 23, 2019

mx.nd._internal._mul_scalar and other scalar operators give incorrect results when PySide2 is used #17140

Closed

[1.x] Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator #17177

Merged

This was referenced Apr 18, 2020

Change LC_NUMERIC for CentOS CI jobs to verify locale invariance #18097

Merged

[v1.7.x] Backport #17177 to 1.7.x (Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator) #18147

Merged

leezu added the v2.0 label May 10, 2020
