Workaround: Fix library zlib test failures on Ubuntu jammy s390x. #1084

Closed
wants to merge 2 commits into from

junaruga
Member

This PR fixes #1083 and the ruby/ruby test failures on the Ubuntu jammy s390x server on RubyCI. The log is here.

ruby/spec fails on Ubuntu jammy s390x, where the zlib deb package is configured with ./configure --dfltcc.

A zlib built this way produces a different (but still valid) compressed byte stream, which causes the test failures in ruby/zlib. As a workaround, we need to set the environment variable DFLTCC=0 for the failing tests, which disables the DFLTCC implementation in zlib on s390x.
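
To illustrate what "different but still valid" means here, the following minimal Ruby sketch (an illustration only, not code from this PR) compresses the same input in two ways. Using two compression levels as a stand-in for the software and DFLTCC code paths is an assumption made so that the example runs on any platform: the compressed bytes differ, yet both streams inflate back to the same data.

```ruby
require "zlib"

data = "abcdefghijklm" * 20

# Two different deflate settings stand in for "software zlib" vs. "DFLTCC zlib".
stored     = Zlib::Deflate.deflate(data, Zlib::NO_COMPRESSION)
compressed = Zlib::Deflate.deflate(data, Zlib::BEST_COMPRESSION)

# The byte streams differ, so a spec comparing exact bytes would fail ...
bytes_differ = stored != compressed

# ... but both are valid zlib streams that round-trip to the original input.
round_trips = [stored, compressed].all? { |s| Zlib::Inflate.inflate(s) == data }

puts "bytes differ: #{bytes_differ}, round-trips: #{round_trips}"
```

A spec that asserts on the exact compressed bytes is therefore sensitive to the zlib build, while a round-trip assertion is not.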

Note that the tests need to run in a child Ruby process started with ruby_exe, so that they run with the DFLTCC=0 set by the parent Ruby process.
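
The child-process approach can be sketched without mspec as follows. This is a hedged illustration, not the code in this PR: the helper name run_ruby_with_dfltcc_disabled is hypothetical, and it uses Open3.capture2 and RbConfig.ruby from the standard library in place of mspec's ruby_exe. DFLTCC=0 is passed in the child's environment, so the variable is already set when the child process starts.

```ruby
require "open3"
require "rbconfig"
require "zlib"

# Hypothetical helper (not part of mspec): run a Ruby snippet in a child
# process with DFLTCC=0 in its environment and return the child's stdout.
def run_ruby_with_dfltcc_disabled(code)
  out, status = Open3.capture2({ "DFLTCC" => "0" }, RbConfig.ruby, "-e", code)
  raise "child Ruby failed: #{status.inspect}" unless status.success?
  out
end

# The deflate call happens in the child; the parent only receives hex text.
child_code = <<~'RUBY'
  require "zlib"
  print Zlib::Deflate.deflate("abcdefghijklm").unpack1("H*")
RUBY

hex = run_ruby_with_dfltcc_disabled(child_code)
puts hex
```

The cost of this design is one extra Ruby startup per example, which is the slowdown discussed later in this thread.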

See ruby/zlib@9f3b9c470c for details.

This PR has 2 commits. The 1st commit adds Travis CI s390x. As far as I know, Travis CI is the only choice for testing on native s390x on pull requests.

Travis CI log before the commit.
https://app.travis-ci.com/github/junaruga/ruby-spec/jobs/610640347

3715 files, 32206 examples, 181145 expectations, 12 failures, 0 errors, 0 tagged

Travis CI log after the commit.
https://app.travis-ci.com/github/junaruga/ruby-spec/builds/266237275

3715 files, 32206 examples, 158515 expectations, 0 failures, 0 errors, 0 tagged

It seems that right now Travis CI is enabled only for ruby/ruby. I am asking @hsbt to enable Travis CI for ruby/*, that is, all the repositories under the Ruby project.

The only choice to test s390x on pull-request is currently Travis CI as far as I know.

@eregon
Member

eregon commented Sep 28, 2023

Thank you for the great issue and this PR.
Does this issue only happen on s390x? It seems so.

I however have multiple concerns about this PR:

  • I do not want to run TravisCI, it's too slow, I want a single CI system and IMO TravisCI betrayed the OSS community
  • When googling s390x it seems discontinued
  • This change means it's a lot slower to run those specs and it's harder to debug them with the extra subprocesses (+ no syntax highlighting of the code), even more so on alternative Ruby implementations (where startup tends to be slower).

So I think a practical solution here is wrapping those specs in platform_is_not :s390x or so (that matches by substring of RUBY_PLATFORM IIRC), or documenting that on s390x ruby/spec must be run with DFLTCC=0 like DFLTCC=0 mspec .... That seems a much less invasive approach with no downside for "fully-compatible standard zlib" implementations.
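
As a rough sketch of the second option, a spec helper could detect the situation and print a hint. The helper names below are hypothetical, and the platform check is a plain substring match on RUBY_PLATFORM written for illustration; it is not mspec's guard implementation. It can only tell that DFLTCC may be active, since that also depends on how the system zlib was built.

```ruby
# Hypothetical helper: true when the platform string mentions the given name.
def platform_matches?(name, platform = RUBY_PLATFORM)
  platform.include?(name.to_s)
end

# Hypothetical helper: DFLTCC may be in use on s390x unless DFLTCC=0 is set.
# Whether it really is in use depends on how the system zlib was configured.
def dfltcc_possibly_active?(platform: RUBY_PLATFORM, env: ENV)
  platform_matches?(:s390x, platform) && env["DFLTCC"] != "0"
end

if dfltcc_possibly_active?
  warn "s390x detected: run the zlib specs with DFLTCC=0, e.g. DFLTCC=0 mspec ..."
end
```

The platform and env keyword arguments exist only so the check can be exercised on machines that are not s390x.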

@junaruga
Member Author

junaruga commented Sep 28, 2023

> Thank you for the great issue and this PR. Does this issue only happen on s390x? It seems so.

Thank you for raising the concerns. I can understand it.

> So I think a practical solution here is wrapping those specs in platform_is_not :s390x or so (that matches by substring of RUBY_PLATFORM IIRC), or documenting that on s390x ruby/spec must be run with DFLTCC=0 like DFLTCC=0 mspec .... That seems a much less invasive approach with no downside for "fully-compatible standard zlib" implementations.

I will change the PR to remove .travis.yml, following your suggestion. I can understand that we want to keep maintaining the repository simple.

> I however have multiple concerns about this PR:
>
> * I do not want to run TravisCI, it's too slow, I want a single CI system and IMO TravisCI betrayed the OSS community

In my testing, Travis CI with one build was faster than GitHub Actions with 9 builds. And when comparing the one build between Travis and GitHub Actions, Travis CI's job is faster.

Travis: 3 minutes 24 seconds.
https://app.travis-ci.com/github/junaruga/ruby-spec/builds/266237275

GitHub Actions: macos 3.0.6: 4 minutes 41 seconds.
https://github.com/ruby/spec/actions/runs/6330557388/job/17193217499?pr=1084

Could you tell me more about "TravisCI betrayed the OSS community" in your mind?

> * When googling s390x it seems [discontinued](https://en.wikipedia.org/wiki/IBM_System/390)

Looking at the IBM_System/390 Wikipedia page, I don't think that applies to s390x. I can see "Discontinued | December 31, 2004[1]" on the page, citing this reference. I think the s390 and s390x CPU architectures are different. In my understanding, the s390x CPU is currently used in the IBM Z and LinuxONE systems.

For example, if you look at the ruby deb package in the latest Ubuntu version, mantic, you can see that s390x is one of its 7 supported CPU architectures. Ubuntu also has an s390x page.

In the Fedora project, s390x is one of the supported CPU architectures. I need to take care of s390x when building the Ruby RPM package on Fedora. Here is the latest Ruby RPM build on the latest Fedora version. You can see that s390x is one of the 5 CPU architectures.

> * This change means it's a lot slower to run those specs and it's harder to debug them with the extra subprocesses (+ no syntax highlighting of the code), even more so on alternative Ruby implementations (where startup tends to be slower).

I definitely agree with this. This PR makes the specs much harder to debug and to maintain.

@junaruga
Member Author

> So I think a practical solution here is wrapping those specs in platform_is_not :s390x or so (that matches by substring of RUBY_PLATFORM IIRC), or documenting that on s390x ruby/spec must be run with DFLTCC=0 like DFLTCC=0 mspec .... That seems a much less invasive approach with no downside for "fully-compatible standard zlib" implementations.

I will change the PR to remove .travis.yml, following your suggestion. I can understand that we want to keep maintaining the repository simple.

The failures didn't happen on Ubuntu focal (20.04), and it seems that they also didn't happen on Fedora rawhide (Fedora 40), even though the upstream zlib patch implementing this feature is applied there and zlib is configured with --dfltcc. So I think that printing a warning about DFLTCC=0 in the tests is another possible fix.

https://src.fedoraproject.org/rpms/zlib/blob/rawhide/f/zlib.spec#_24

# IBM Z hardware-accelerated deflate
# ref: https://github.com/madler/zlib/pull/410
Patch19: zlib-1.2.13-IBM-Z-hw-accelerated-deflate.patch

@junaruga
Member Author

I will close this PR, as I opened another PR, #1088.

@eregon
Member

eregon commented Sep 29, 2023

> In my testing, Travis CI with one build was faster than GitHub Actions with 9 builds. And when comparing the one build between Travis and GitHub Actions, Travis CI's job is faster.
>
> Travis: 3 minutes 24 seconds. https://app.travis-ci.com/github/junaruga/ruby-spec/builds/266237275
>
> GitHub Actions: macos 3.0.6: 4 minutes 41 seconds. https://github.com/ruby/spec/actions/runs/6330557388/job/17193217499?pr=1084

That's not comparing the same thing: the GitHub Actions job also runs the "Run C-API specs as C++" step.
If we compare times for the step running the specs it's 2min (GHA) vs 3min (TravisCI).

> Could you tell me more about "TravisCI betrayed the OSS community" in your mind?

Well, they were acquired by a company which doesn't care, they stopped providing CI for free for the most common architectures for OSS, TravisCI Windows support is crap, etc, etc.

> Looking at the IBM_System/390 Wikipedia page, I don't think that applies to s390x. I can see "Discontinued | December 31, 2004[1]" on the page, citing this reference. I think the s390 and s390x CPU architectures are different. In my understanding, the s390x CPU is currently used in the IBM Z and LinuxONE systems.

Right, that's quite confusing.

@junaruga
Member Author

> > In my testing, Travis CI with one build was faster than GitHub Actions with 9 builds. And when comparing the one build between Travis and GitHub Actions, Travis CI's job is faster.
> > Travis: 3 minutes 24 seconds. https://app.travis-ci.com/github/junaruga/ruby-spec/builds/266237275
> > GitHub Actions: macos 3.0.6: 4 minutes 41 seconds. https://github.com/ruby/spec/actions/runs/6330557388/job/17193217499?pr=1084
>
> That's not comparing the same thing: the GitHub Actions job also runs the "Run C-API specs as C++" step. If we compare times for the step running the specs it's 2min (GHA) vs 3min (TravisCI).

Ah, right.
How about comparing with the GitHub Actions Windows jobs?
https://github.com/ruby/spec/actions/runs/6330557388/job/17193218336?pr=1084
Run Specs (Windows) is 6:49 min vs 3 min (Travis CI).

> > Could you tell me more about "TravisCI betrayed the OSS community" in your mind?
>
> Well, they were acquired by a company which doesn't care, they stopped providing CI for free for the most common architectures for OSS, TravisCI Windows support is crap, etc, etc.

I can understand it. Travis's pipeline for x86_64 (amd64), the common CPU architecture, is not free any more (reference), while other small CI services such as Circle CI and Drone CI are still free for that architecture.

Oh, I just found a GitHub Actions helper to run non-x86_64 architectures on QEMU on GitHub here. It doesn't mean I want to apply it to this repository. I am just sharing it.

@eregon eregon mentioned this pull request Oct 5, 2023

Successfully merging this pull request may close these issues.

Ubuntu jammy s390x: Test failures
2 participants