add patch to fix regression in GCC 12.x on AVX512 systems #19180

Merged


@Flamefire
Contributor

Flamefire commented Nov 9, 2023

(created using eb --new-pr)
Fix a typo in GCC that inverts the semantics of the generated code when code (even AVX2 code) is compiled for AVX512 systems, e.g. via -march=native.
The changeset is taken from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

It manifests, for example, in failing PyTorch tests: the sgn/sign tests involving uint8_t fail, such as

  • test_reference_numerics_normal_sgn_cpu_uint8
  • test_reference_numerics_normal_sign_cpu_uint8
  • test_reference_numerics_small_sgn_cpu_uint8
  • test_reference_numerics_small_sign_cpu_uint8
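
As a rough illustration of what "inverted semantics" can do to a sign computation on unsigned 8-bit values, here is a toy pure-Python model. It assumes, purely for illustration, that sign is built from two mask-based selects; the helper names (`blend`, `blend_inverted`, `sign_u8`) are hypothetical and this is not the GCC or PyTorch code.

```python
def blend(a, b, mask):
    """Element-wise select: take b[i] where mask[i] is set, otherwise a[i]."""
    return [bi if m else ai for ai, bi, m in zip(a, b, mask)]


def blend_inverted(a, b, mask):
    """Same select with the two branches swapped (the 'inverted' case)."""
    return [ai if m else bi for ai, bi, m in zip(a, b, mask)]


def sign_u8(values, select):
    """Toy sign() for uint8 values, with wrap-around arithmetic modulo 256."""
    zero = [0] * len(values)
    one = [1] * len(values)
    positive = select(zero, one, [v > 0 for v in values])
    negative = select(zero, one, [v < 0 for v in values])  # never true for uint8
    return [(p - n) % 256 for p, n in zip(positive, negative)]


values = [0, 1, 7, 255]
expected = sign_u8(values, blend)          # [0, 1, 1, 1]
broken = sign_u8(values, blend_inverted)   # [0, 255, 255, 255]
print(expected, broken)
```

In this model every non-zero input yields 255 instead of 1 once the select is inverted. Whether the real miscompilation produces exactly these values is not stated in this PR.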

The log summarizes these failures as

== 2023-10-26 07:54:15,898 pytorch.py:303 WARNING Found 20 individual tests with failed assertions: test_consistency_SparseBSC_sgn_cpu_uint8, test_consistency_SparseBSC_sign_cpu_uint8, test_consistency_SparseBSR_sgn_cpu_uint8, test_consistency_SparseBSR_sign_cpu_uint8, test_consistency_SparseCSC_sgn_cpu_uint8, test_consistency_SparseCSC_sign_cpu_uint8, test_consistency_SparseCSR_sgn_cpu_uint8, test_consistency_SparseCSR_sign_cpu_uint8, test_contig_vs_every_other_sgn_cpu_uint8, test_contig_vs_every_other_sign_cpu_uint8, test_non_contig_sgn_cpu_uint8, test_non_contig_sign_cpu_uint8, test_qthreshold, test_reference_numerics_normal_sgn_cpu_uint8, test_reference_numerics_normal_sign_cpu_uint8, test_reference_numerics_small_sgn_cpu_uint8, test_reference_numerics_small_sign_cpu_uint8, test_sigmoid_non_observed, test_sparse_consistency_sgn_cpu_uint8, test_sparse_consistency_sign_cpu_uint8
== 2023-10-26 07:54:18,542 pytorch.py:445 WARNING 20 test failures, 0 test errors (out of 129477):
test_quantization (997 total tests, failures=2, skipped=74)
test_sparse (2618 total tests, failures=2, skipped=329)
test_unary_ufuncs (12664 total tests, failures=8, skipped=644, expected failures=14)
test_sparse_csr (4357 total tests, failures=8, skipped=716)
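
To cross-check the reported total against the per-suite lines, a small Python sketch like the following could be used. The regular expression and the function name `parse_summary` are assumptions that fit only the four summary lines quoted above; this is not part of EasyBuild.

```python
import re

# Per-suite summary lines copied from the log excerpt above.
SUMMARY = """\
test_quantization (997 total tests, failures=2, skipped=74)
test_sparse (2618 total tests, failures=2, skipped=329)
test_unary_ufuncs (12664 total tests, failures=8, skipped=644, expected failures=14)
test_sparse_csr (4357 total tests, failures=8, skipped=716)
"""

# Hypothetical pattern matching only the line format shown in this excerpt.
LINE_RE = re.compile(r"^(?P<suite>\S+) \((?P<total>\d+) total tests, (?P<fields>.*)\)$")


def parse_summary(text):
    """Return {suite: {"total": n, "failures": n, ...}} for each matching line."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore lines that are not per-suite summaries
        counts = {"total": int(match.group("total"))}
        for field in match.group("fields").split(", "):
            key, _, value = field.partition("=")
            counts[key] = int(value)
        result[match.group("suite")] = counts
    return result


suites = parse_summary(SUMMARY)
total_failures = sum(counts.get("failures", 0) for counts in suites.values())
print(total_failures)  # 20, matching "20 test failures" in the log
```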

@branfosj added this to the next release (4.9.0?) milestone on Nov 9, 2023
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
n1519 - Linux RHEL 8.7 (Ootpa), x86_64, Intel(R) Xeon(R) Platinum 8470 (icelake), Python 3.8.13
See https://gist.github.com/Flamefire/9ab2cb0ff01ade178935360187634b5e for a full test report.

@branfosj
Member

branfosj commented Nov 9, 2023

Test report by @branfosj
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
bear-pg0105u03b - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/43eaec9c218951bc501e78fcd6586102 for a full test report.

@branfosj
Member

branfosj commented Nov 9, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@branfosj: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19180 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19180 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3698

Test results coming soon (I hope)...

- notification for comment with ID 1803979487 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@branfosj
Member

branfosj commented Nov 9, 2023

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@branfosj: Request for testing this PR well received on login1

PR test command 'EB_PR=19180 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19180 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12126

Test results coming soon (I hope)...

- notification for comment with ID 1803986090 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Nov 9, 2023

@Flamefire For good measure, can you add some more info to the description of this PR on the context in which this fixes a problem (I think it's mostly about failing PyTorch tests)? If possible, please add the relevant error messages too, so people searching for answers to the problem they're seeing will easily hit this PR...

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
jsczen2c1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/7b9bf0bfe32d7cd3e38c949782e03fea for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/732584ee213266f5080b0df2ec66ae1b for a full test report.

@boegel
Member

boegel commented Nov 10, 2023

Test report by @boegel
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
node3120.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/b3fc9a0c82b1dba27a13f9f0fa463e62 for a full test report.

@branfosj
Member

Going in, thanks @Flamefire!

@branfosj merged commit d3c06ee into easybuilders:develop on Nov 10, 2023
9 checks passed
@Flamefire deleted the 20231109105147_new_pr_GCCcore1210 branch on November 10, 2023 11:45
@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 5 out of 5 (5 easyconfigs in total)
taurusi8014 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/5883925afbe8b896aa31eaf880c00204 for a full test report.

schiotz added a commit to schiotz/easybuild-easyconfigs that referenced this pull request Nov 11, 2023
@boegel changed the title from "fix regression in GCC 12+ on AVX512 systems" to "add patch to fix regression in GCC 12.x on AVX512 systems" on Nov 22, 2023