Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fortnightly build for wheels failed / PyBaMM import kills process on macOS M-series #4091

Closed
github-actions bot opened this issue May 15, 2024 · 13 comments · Fixed by #4092 or #4128
Closed

Fortnightly build for wheels failed / PyBaMM import kills process on macOS M-series #4091

github-actions bot opened this issue May 15, 2024 · 13 comments · Fixed by #4092 or #4128
Assignees
Labels
bug Something isn't working difficulty: medium Will take a few days priority: high To be resolved as soon as possible release blocker Issues that need to be addressed before the creation of a release

Comments

@github-actions
Copy link
Contributor

github-actions bot commented May 15, 2024

The build is failing with the following logs - https://github.com/pybamm-team/PyBaMM/actions/runs/9093914939

Note: after #4092, this is failing for the self-hosted runners as well after the wheels were fixed: https://github.com/pybamm-team/PyBaMM/actions/runs/9184757819/job/25257629262 most likely due to a SIGKILL or some sort of memory leak.

@github-actions github-actions bot added bug Something isn't working priority: high To be resolved as soon as possible labels May 15, 2024
@agriyakhetarpal
Copy link
Member

Something is up either on the cibuildwheel side or when we build SuiteSparse and SUNDIALS. Looks like MACOSX_DEPLOYMENT_TARGET wasn't set for the latter. I shall take a look at this.

@agriyakhetarpal agriyakhetarpal added the release blocker Issues that need to be addressed before the creation of a release label May 15, 2024
@agriyakhetarpal agriyakhetarpal self-assigned this May 15, 2024
@agriyakhetarpal agriyakhetarpal added the difficulty: medium Will take a few days label May 15, 2024
@agriyakhetarpal
Copy link
Member

Reopening this from #3959 (comment) because I seem to be hitting a memory leak of some kind with OpenMP 16.0.4 on my macOS machine (M-series) and my terminal (zsh) keeps getting killed with a -9 exit code. I don't know for now why this is the case, because we had confirmed that import pybamm and the IDAKLU solver was working before merging the PR for both macOS amd64 and macOS arm64 (it's even tested with cibuildwheel in the wheel builder job). I am not entirely sure, but I will keep looking.

Could someone else with a macOS device test this out (by running the unit tests with nox -s unit) so that we can identify whether this is specific to my machine or happens everywhere? For additional context, I updated my macOS version to Sonoma 14.5 a few hours ago, so maybe that was what broke something on my end, but even after re-installing Python and deleting older files, I still seem to hit this case.

@agriyakhetarpal agriyakhetarpal changed the title Fortnightly build for wheels failed Fortnightly build for wheels failed / PyBaMM import kills process on macOS M-series May 20, 2024
@agriyakhetarpal
Copy link
Member

It is to be noted that I was able to fix the original error in #4092, but that is just for the wheels because that is where the error was relevant – I did not make this change in the PR tests because we do not distribute wheels from there.

@agriyakhetarpal
Copy link
Member

The custom libgfortran.dylib that we are bundling also has libomp.dylib and omp.h present for further use. Maybe we can use that to fix everything? I am not sure if we can avoid the trouble of not doing it for the regular builds and I feel that this makes the installation a bit difficult as well, so it is a step backward from the previous changes that we've been planning through #3564.

@agriyakhetarpal
Copy link
Member

The above comment is assuming that the build is broken for everyone and not just me, that is...

@kratman, I am tagging you because you have an M-series machine as well, could you please try building the recent tip of PyBaMM (i.e., anything after #4092) as well and check if you are able to run the unit tests with and/or import PyBaMM with IDAKLU enabled?

@kratman
Copy link
Contributor

kratman commented May 20, 2024

@agriyakhetarpal I should have time in a day or two to take a look at this. If I forget then just ping me again

@agriyakhetarpal
Copy link
Member

Note to self: try using scikit-image's LLVM-OpenMP solution here

scikit-image/scikit-image@765e568/.github/workflows/wheels_recipe.yml#L134

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented May 22, 2024

The above comment is assuming that the build is broken for everyone and not just me, that is...

I think it's not just me, I checked the logs from today and the self-hosted runner is also failing on all Python versions: https://github.com/pybamm-team/PyBaMM/actions/runs/9184757819/job/25257629262.

@BradyPlanden, could you please tell me the Mac OS X version that's installed in the runner, and whether it was upgraded recently in any case? The issue is that the macOS 14 GitHub runners are not reporting the failure and it works there, but Mac OS X had a recent problem with Java-based apps with 14.4 (which was fixed in 14.5). I am trying to investigate if the macos-14 label is running an older OS version and then why libomp.dylib breaks on the newer version.

P.S. It would be great if we could use cibuildwheel with the self-hosted runner as an additional testing resource for PyBaMM's wheels. It won't be easy to integrate, though, because cibuildwheel installs and uninstalls its own Python distributions from the official installer every time...

@BradyPlanden
Copy link
Member

The self-hosted runner is on 14.3.1, and has not been updated recently. I have a M-series laptop running 14.4.1 that I can test on if you are looking for another datapoint.

P.S. It would be great if we could use cibuildwheel with the self-hosted runner as an additional testing resource for PyBaMM's wheels. It won't be easy to integrate, though, because cibuildwheel installs and uninstalls its own Python distributions from the official installer every time...

Interesting, any thoughts on if we would need to sandbox this so it doesn't affect the PyBOP scheduled tests? At the moment, I'm using pyenv for python installations for both PyBaMM and PyBOP tests.

@agriyakhetarpal
Copy link
Member

The self-hosted runner is on 14.3.1, and has not been updated recently. I have a M-series laptop running 14.4.1 that I can test on if you are looking for another datapoint.

Thanks, Brady! Yes, if you could try to run the unit tests for PyBaMM with the latest changes from develop with the IDAKLU solver set up on your laptop, that would be helpful – it should be just nox -s pybamm-requires and nox -s unit – if the error code received is -9, that should confirm it, and I can then say that I broke everything (😅). 14.3.1 is stable enough, but it would be good to confirm on 14.4.1 as well.

Interesting, any thoughts on if we would need to sandbox this so it doesn't affect the PyBOP scheduled tests? At the moment, I'm using pyenv for python installations for both PyBaMM and PyBOP tests.

I imagine it should be more doable after pypa/cibuildwheel#1718 – we would be able to use any distribution of Python (or we can patch cibuildwheel on our own to do this – we have to remove the restrictions they have put up). For example, Homebrew-Python is blocked from using cibuildwheel (https://cibuildwheel.pypa.io/en/stable/setup/#macos-windows-builds). https://github.com/indygreg/python-build-standalone is another option we can set up using rye and uv. However, if pyenv is installing the official CPython distribution with its environments, we should be good to try it (it would resolve the sandboxing part).

@BradyPlanden
Copy link
Member

Hmm, nox -s unit runs as expected on my laptop with the idaklu installed (nox -s pybamm-requires, and pybamm.have_idaklu()), on the current state of develop. This was tested within a new pyenv virtualenv as well to double-check.

@agriyakhetarpal
Copy link
Member

@kratman, a gentle reminder after a couple of days like you had requested – I hope this is a good time to ping you again about this

@agriyakhetarpal
Copy link
Member

Let me try this again this week – I think we can use the same solution as the one implemented in #4092 and then reverted in #4109 – but this time, we can use a different libomp.dylib, and use it only when we are building the wheels (and not for local development). I'm assigned to this issue as a reminder, I'll take note of this before we close #4112.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working difficulty: medium Will take a few days priority: high To be resolved as soon as possible release blocker Issues that need to be addressed before the creation of a release
Projects
None yet
3 participants