Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Change LC_NUMERIC for CentOS CI jobs to verify locale invariance #18097

Merged
merged 11 commits into from
May 10, 2020

Conversation

nickguletskii
Copy link
Contributor

@nickguletskii nickguletskii commented Apr 18, 2020

Description

This pull request changes all tests to load the user's locale and changes the CentOS CI jobs to run with LC_NUMERIC set to en_DK.UTF8. en_DK.UTF8 is a locale that uses , as the decimal separator. This should weed out any bugs similar to #16134, #17140 and #18079. A partial fix for v1.x is available in #17177. Migrating operators and other API calls to the new FFI in MXNet 2.0 should resolve most locale-related issues related to API calls.

This pull request makes many tests fail because they only pass on systems with LC_NUMERIC set to a locale that uses . as the decimal separator.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Tests updated to call locale.setlocale on initialization.
  • CentOS jobs updated to run with LC_NUMERIC set to en_DK.UTF8.
  • xfail/cancel tests which fail due to the incorrect handling of decimal separators.

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@mxnet-bot
Copy link

Hey @nickguletskii , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, centos-gpu, clang, unix-cpu, sanity, windows-gpu, miscellaneous, centos-cpu, website, edge, windows-cpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@nickguletskii
Copy link
Contributor Author

@mxnet-bot run ci [centos-cpu, centos-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-cpu, centos-gpu]

@marcoabreu
Copy link
Contributor

Is there some other method than inserting it at the top of every test file? That's a lot of copy&paste.

@leezu
Copy link
Contributor

leezu commented Apr 20, 2020

@nickguletskii it should be easier to address the code duplication problem once #18025 is merged. I suggest you wait until it's merged.

@leezu
Copy link
Contributor

leezu commented Apr 24, 2020

@nickguletskii the pytest PR is merged now. I also consolidated the CentOS dockerfiles into ci/docker/Dockerfile.build.centos7

@nickguletskii nickguletskii force-pushed the feature/ci-centos-locale-change branch from c2c8e9d to 2bbbc66 Compare April 24, 2020 23:55
@nickguletskii
Copy link
Contributor Author

@leezu Thank you, I've rewritten this PR to utilize pytest's conftests.py and updated the CentOS dockerfile accordingly.

@leezu
Copy link
Contributor

leezu commented Apr 25, 2020

42 failed tests on centos-cpu. Seems there are still quite a few locale-unsafe parts

@leezu
Copy link
Contributor

leezu commented Apr 28, 2020

@nickguletskii do you know how to reproduce the CI runs locally? For the one you're interested in, just run

git clean -ffxd; python ci/build.py --platform centos7_cpu /work/runtime_functions.sh build_centos7_cpu
python ci/build.py --platform centos7_cpu /work/runtime_functions.sh unittest_centos7_cpu

@nickguletskii
Copy link
Contributor Author

@leezu It's actually even easier than that. It's sufficient to set the LC_NUMERIC environment variable before running pytest.

However, I don't think that fixing these issues is in scope of this PR. As I've already mentioned in the PR that introduces a hack to fix at least some of these issues (#17177), a permanent solution is to port all old operators to the new FFI.

@leezu
Copy link
Contributor

leezu commented May 2, 2020

@nickguletskii yes, the scope of this PR doesn't need to be "fix all the issues" ;) But for the PR to be merged, you need to find a way for it to pass the CI. One idea is to use the pytest xfail features, marking the tests invoking broken behavior. You'd want to set the condition on when the test is xfail based on the operating system (centos) or the environment variable in question. Reference https://docs.pytest.org/en/latest/reference.html#pytest-mark-xfail-ref

Then the PR will pass the CI and can be merged. At the same time @yzhliu and other contributors will have more data on prioritizing which operators to port to the new FFI.

@nickguletskii nickguletskii force-pushed the feature/ci-centos-locale-change branch from 2bbbc66 to fba512c Compare May 5, 2020 18:08
@nickguletskii
Copy link
Contributor Author

Hmm, it seems that something has changed in how the CentOS image is built since the last CI run. The required locale doesn't seem to be available anymore. I will try experimenting with the Docker image locally once I find some free time.

@nickguletskii nickguletskii force-pushed the feature/ci-centos-locale-change branch from 1e93c88 to 22255cb Compare May 8, 2020 18:43
@nickguletskii
Copy link
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [windows-gpu]

@nickguletskii
Copy link
Contributor Author

@leezu Should be ready for review now. Sorry for the delay and thanks for the advice!

Copy link
Contributor

@leezu leezu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nickguletskii!

@leezu leezu merged commit eab068b into apache:master May 10, 2020
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
…che#18097)

* Load the user's locale before performing tests

* Change the locale for the CentOS CI jobs to weed out locale-related bugs

* Mark tests that fail due to the decimal point with xfail

* Run localedef when generating the CentOS CI image

* Cancel some Scala tests when C locale uses a non-standard decimal sep.

* Rename xfail helper to xfail_when_nonstandard_decimal_separator

* Fix scalastyle errors

* Disable more Python tests that fail due to locale-related issues

* Move assumeStandardDecimalSeparator into separate object to fix scaladoc

* Disable the "symbol pow" test when running with non-standard decimal sep

* Disable new tests that fail due to locale-related issues
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants