
Update wcEcoli from Python 3.8 to 3.11.3 and up-to-date pip libraries #1367

Merged
22 commits merged into master from `python-3.11.3` on May 18, 2023

Conversation


@1fish2 1fish2 commented Apr 17, 2023

Fixes #1311
Fixes #1365

  • Python improvements include faster startup, faster runtime, better error messages, Structural Pattern Matching, dict union operators, and type hints like list[str] and str | int. (This PR doesn't use the new features.)
    Python 3.11 is between 10-60% faster than Python 3.10. On average, we measured a 1.25x speedup on the standard benchmark suite. -- That's on top of the 3.9 and 3.10 speedups.
    BUT a 1-CPU Sherlock node ran Parca + 2 sim gens only 4% faster. Maybe most of the model's computation is already within numpy, scipy, numba, cython, and aesara.
  • The updated libraries fix the security vulnerabilities noted by GitHub's Dependabot.

SHERLOCK ENVIRONMENT MODULES:

  • I added openssl/3.0.7 to the module /home/groups/mcovert/modules/wcEcoli/python3 (for CPython 3.10+) and commented out glpk/4.55-sherlock2 and openblas/0.2.19, leftover from Python 2.
    • NOTE: Anyone still running Python 2 will need to load those 2 modules.
  • There's a case for updating to newer native libs: libpng, freetype, sqlite, libuuid, ncurses, gcc.

TODO

  • Update to numba==0.57.0 + llvmlite==0.40.0 when they're released, and re-test. That team has release candidates and they're fixing bugs, esp. on Windows.
  • Test on Sherlock.
  • Test on Intel Mac.
  • Test on M1 Mac.
  • Test on M2 Mac.
  • Test the macOS 13.3+ version of Apple's Accelerate library vs. OpenBLAS. Supposedly it's now usable and many times faster, at least on M1 and M2.
  • Test on a Linux desktop.
  • Ideally, test on Windows.
  • Ideally, test on Google Cloud Compute Engine.
  • Remove PYTHONWARNINGS=default from ecoli-pull-request.sh and run-fireworks.sh if it's too spammy.
  • Rebuild the wcEcoli pyenv before merging into master. (This PR uses wcEcoli-staging.)

1fish2 and others added 13 commits April 9, 2023 21:42
* Use Python 3.11.3 and the latest pip libraries.
* For now, this includes numba rc1 and llvmlite rc1, as needed for Python 3.11 compatibility. llvmlite doesn't yet load on Windows.
* `np.bool`, `np.int`, and `np.float` are gone. Use `bool` et al.
* `open()` mode `"U"` is gone. It was for Python 2.
* `bokeh.models.Panel` is now `bokeh.models.TabPanel`.
* For PR builds, use pyenv `wcEcoli3-staging`.
* Remove the old numpy stubs and install type stubs via `mypy --install-types`.
* mypy no longer allows implicit `Optional[]`. That change caught some bugs.
* Don't shadow built-in name `id`.
* Fix mixed indentation in `rRNA_operon_expression.py`. (The project style uses tabs but this file uses spaces except one line, so change that line.)
* Plotly `append_trace()` is deprecated in favor of `add_trace()`, which is the same except the (unused) return value.
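A couple of the migration items above, sketched (illustrative snippets, not code from the repo):

```python
import numpy as np
from typing import Optional

# NumPy removed the deprecated aliases np.bool, np.int, np.float in 1.24;
# use the builtins (or explicit dtypes like np.int64) instead.
flags = np.zeros(3, dtype=bool)    # was dtype=np.bool
counts = np.zeros(3, dtype=int)    # was dtype=np.int

# mypy no longer accepts implicit Optional: a `= None` default must be
# spelled out in the annotation.
def first_gen(seed: Optional[int] = None) -> int:  # was `seed: int = None`
    return 0 if seed is None else seed
```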

**TODO:** "... Bio.Alphabet was therefore removed from Biopython in release 1.78. Instead, the molecule type is included as an annotation on SeqRecords where appropriate." See https://biopython.org/wiki/Alphabet for more information.
  For now, revert to `biopython==1.77`. "This release of Biopython supports Python 3.6, 3.7 and 3.8", so it might not work on 3.11.
Running all `ACTIVE` analysis classes except the `analysisComparison` category mostly produced plausible plots.

* `Ignoring an unknown analysis class: No module named 'models.ecoli.analysis.multigen.new_gene_counts.pypolycistronic_transcription'; 'models.ecoli.analysis.multigen.new_gene_counts' is not a package` -- For want of a `,`.
  * (Searching the other `__init__.py` files didn't find another instance of this goof but it's good practice to have a `,` after the last entry in one-entry/line lists so adding another entry makes fewer merge conflicts.)
* `KeyError: 'operons'` in `start_codon_distribution.py`, `metadata['operons']` -- Does a fireworks run set metadata's 'operons' key? `runscripts/manual/runParca.py` does not. Anyway, `sim_data.operons_on` is simple and reliable.
* `UserWarning: color is redundantly defined by the 'color' keyword argument and the fmt string "ob" (-> color='b'). The keyword argument will take precedence.` in `kineticsFluxComparison`.
* `UserWarning: linestyle is redundantly defined by the 'linestyle' keyword argument and the fmt string " " (-> linestyle=' '). The keyword argument will take precedence.` in `growth_condition_comparison_validation.py`.
* Remove `print()` debugging code to remove output clutter. (I recommend the PyCharm debugger. FYI, format strings support a handy debug feature, e.g. `f"Analysis with {exclude_timeout_cells=}, {exclude_early_gens=}"`.)
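The f-string debug feature mentioned above, for reference (variable names illustrative):

```python
exclude_timeout_cells = True
exclude_early_gens = False

# The f-string `=` specifier renders "name=value", which beats
# throwaway print() calls for quick debugging:
msg = f"Analysis with {exclude_timeout_cells=}, {exclude_early_gens=}"
```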

There are some more warnings like `RuntimeWarning: invalid value encountered in divide`, `RuntimeWarning: divide by zero encountered in divide`, and `No artists with labels found to put in legend.`
Setting `PYTHONWARNINGS=default` should "Show all warnings (even those ignored by default)", in particular more deprecation warnings, although maybe just the first occurrence for each source location. [What's the actual default?]

Let's see how useful vs. spammy this is.

https://docs.python.org/3/library/warnings.html#the-warnings-filter
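One way to see the effect: the stock filter ignores `PendingDeprecationWarning` everywhere, so it only surfaces with the env var set.

```shell
# Prints nothing: the default filter ignores PendingDeprecationWarning.
python3 -c "import warnings; warnings.warn('quiet', PendingDeprecationWarning)"

# Prints the warning: "default" shows all categories, first occurrence
# per source location.
PYTHONWARNINGS=default python3 -c "import warnings; warnings.warn('shown', PendingDeprecationWarning)"
```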
* Add pymongo's ocsp dependents to fix Sherlock's access to MongoDB Atlas servers.
  (A better error message or a doc could've saved a lot of debugging time.)
* Matplotlib's seaborn- styles are deprecated since 3.6 but live on as `seaborn-v0_8-` styles.
* Speed up `make clean` via `find ... -exec ... {} +`, which is like xargs.
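The `find` idiom, for reference:

```shell
# `-exec ... {} +` appends many matched files to one rm invocation
# (like piping to xargs) instead of forking rm once per file, which
# speeds up `make clean` on large output trees.
find . -name '*.pyc' -exec rm -f {} +
```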
Fireworks hasn't provided release notes in a long time, and it turns out they made at least one incompatible API change: The `LaunchPad()` constructor no longer has `ssl*` arguments. All the SSL and (later) TLS args now go into a `mongoclient_kwargs` arg to pass through to the `pymongo` class `MongoClient`.

After digging through their repo to find this change and the Issues on it, it was a reaction to the `pymongo` 4.0 API removing the SSL args in favor of TLS args.

Fix here: Call `LaunchPad.from_file()` or `LaunchPad.from_dict()` which extract only the relevant keys from the YAML-provided config.

**Recommendation:** Use `uri_mode=True` in your `my_launchpad.yaml` files (and any other way to specify the MongoDB connection). Otherwise, put TLS args into a `mongoclient_kwargs` nested dict.

See https://materialsproject.github.io/fireworks/security_tutorial.html#add-tls-ssl-configuration-to-your-launchpad-file
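The recommendation above might look like this in `my_launchpad.yaml` (a sketch only; host and credentials are placeholders, field names per the linked FireWorks security tutorial):

```yaml
# uri_mode: the whole connection, TLS included, lives in the URI
host: mongodb+srv://USER:PASSWORD@cluster0.example.mongodb.net/fireworks
uri_mode: true

# Alternatively, without uri_mode, pass TLS options straight through
# to pymongo's MongoClient:
# mongoclient_kwargs:
#   tls: true
#   tlsCAFile: /path/to/ca.pem
```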
* In a couple cases like `or ''` fixing the type errors fixed runtime problems, but they're probably in unexercised cases.
* Fixing `units` required using PEP 647 `TypeGuard`, which is new in Python 3.10.
* mypy warnings `By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]` required adding type signatures to some functions that already contained a little typing for PyCharm's type inspector. Turning on `--check-untyped-defs` would be a larger project.
@1fish2

1fish2 commented Apr 24, 2023

To @rjuenemann and anyone else interested, while waiting for the Numba release it'd be great to test its release candidate (https://pypi.org/project/numba/#history) and the rest of this Python 3.11 PR on Apple Silicon and other platforms.

Furthermore, in macOS 13.3, Apple's Accelerate library can now substitute for OpenBLAS and should be considerably faster than OpenBLAS on Apple Silicon. On my Intel Mac, using Accelerate ran Parca + 2 sim gens 6% faster, which is probably just measurement variability.

To test with Accelerate: Build one pyenv per requirements.txt in this branch (python-3.11.3) and a second pyenv with the following install-Numpy command to compile Numpy to use Accelerate:

  1. Save this text in ~/.numpy-site.cfg (commenting out any existing settings):
    [accelerate]
    libraries = Accelerate, vecLib
    
  2. pip install --no-cache-dir numpy==1.24.2 --no-binary numpy

Optional steps to examine/test a pyenv:

python runscripts/debug/summarize_environment.py  # did numpy link to Accelerate?
runscripts/debug/time_libraries.sh
pytest wholecell/tests/utils/test_openblas_threads.py
pytest

Compare their Parca + sim elapsed CPU run times:

pyenv local wcEcoli-3.11  # or whatever you named it
make clean compile  # clear caches and .pyc files
python runscripts/manual/runParca.py -c6 python3.11  # or pick a different number of CPU cores than -c6
python runscripts/manual/runSim.py -g2 python3.11

pyenv local wcEcoli-3.11-accelerate  # or whatever you named it
make clean compile
python runscripts/manual/runParca.py -c6 python3.11-accel
python runscripts/manual/runSim.py -g2 python3.11-accel

Optionally compare outputs:

  1. Parca output: runscripts/debug/comparePickles.py out/python3.11/kb out/python3.11-accel/kb
  2. Sim output (though I don't expect them to match): runscripts/debug/diff_simouts.py out/python3.11-2/wildtype_000000/000000/generation_000001/000000/simOut/ out/python3.11-accel/wildtype_000000/000000/generation_000001/000000/simOut/
  3. Run some analysis plots and compare them visually.

Thanks, and go ahead and log results in this PR.

@rjuenemann

Thank you for the update @1fish2 ! I'll test the 3.11 PR and Accelerate on my M2 and report back.

@rjuenemann

rjuenemann commented Apr 28, 2023

Hi @1fish2 -

I tested this out on my M2 last night. The setup went smoothly and I was able to install the requirements at their specified versions. I did encounter some warnings in the pytest:

wcEcoli3.11

=== warnings summary ===
wholecell/io/tablereader.py:2
  /Users/rjuene/dev/wcEcoli/wholecell/io/tablereader.py:2: DeprecationWarning: 'chunk' is deprecated and slated for removal in Python 3.13
    from chunk import Chunk

wholecell/tests/utils/test_openblas_threads.py::Test_openblas_threads::test_openblas
  /Users/rjuene/dev/wcEcoli/wholecell/tests/utils/test_openblas_threads.py:66: UserWarning: Didn't reproduce the expected dot product using 1 OpenBLAS thread, 0.01668380558411259 != 0.016683805584112754, so simulation results aren't portable.
    warnings.warn(f"Didn't reproduce the expected dot product using 1"

wcEcoli3.11-accel

=== warnings summary===
wholecell/io/tablereader.py:2
  /Users/rjuene/dev/wcEcoli/wholecell/io/tablereader.py:2: DeprecationWarning: 'chunk' is deprecated and slated for removal in Python 3.13
    from chunk import Chunk

wholecell/tests/utils/test_openblas_threads.py::Test_openblas_threads::test_openblas
  /Users/rjuene/dev/wcEcoli/wholecell/tests/utils/test_openblas_threads.py:66: UserWarning: Didn't reproduce the expected dot product using 1 OpenBLAS thread, 0.0166838055841127 != 0.016683805584112754, so simulation results aren't portable.
    warnings.warn(f"Didn't reproduce the expected dot product using 1"

Using Accelerate unfortunately did not lead to a speedup for me. It does look like numpy linked to Accelerate.

python runscripts/manual/runParca.py -c6 python3.11 -  Elapsed time 265.60 sec (0:04:25.595855); CPU 129.43 sec
python runscripts/manual/runSim.py -g2 python3.11 - Elapsed time 1283.02 sec (0:21:23.024364); CPU 1282.77 sec

python runscripts/manual/runParca.py -c6 python3.11-accel - Elapsed time 261.72 sec (0:04:21.720864); CPU 124.16 sec
python runscripts/manual/runSim.py -g2 python3.11-accel - Elapsed time 1250.76 sec (0:20:50.755607); CPU 1251.34 sec

The Parca outputs seemed to have quite a few differences (7230 lines). I’ve attached the comparison. parcaCompare.txt

For runtime comparison on my laptop with the master branch and older Python version:

python runscripts/manual/runParca.py -c6 python3old - Elapsed time 278.50 sec (0:04:38.497969); CPU 138.29 sec
python runscripts/manual/runSim.py -g2 python3old - Elapsed time 1224.85 sec (0:20:24.846500); CPU 1224.67 sec

Let me know what you think. I can test on my Windows machine next.

@1fish2

1fish2 commented Apr 29, 2023

@rjuenemann That's very helpful!

  • Go ahead and try Windows. See this issue if it fails to load llvmlite.dll.
  • I'll copy the chunk library to handle the (sad but true) deprecation.
  • ... and add a timer to test_openblas_threads.py for a quick measure of Numpy + BLAS speed without running the WCM.
  • Maybe we need to install numpy with --no-use-pep517 as described in these emails to get Accelerate speedup, but other discussions said --no-use-pep517 won't work, and it's not clear what that switch does.
  • Floating point repeatability might be intractable. Any change to evaluation order can change the rounding of results. test_openblas_threads.py computes just one dot product, asking OpenBLAS to use different numbers of threads, and that suffices to get different results. I wonder if we can control the number of Accelerate threads (BNNSFilterParameters?) from Numpy.
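The non-associativity point is easy to demonstrate in plain Python; reordering a sum changes the rounding, which is why a dot product split differently across threads needn't match bit-for-bit:

```python
# Float addition isn't associative, so any change in evaluation order
# (e.g. a BLAS library splitting a dot product across threads) can
# change the rounded result:
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c     # 0.6000000000000001
right = a + (b + c)    # 0.6
assert left != right
```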

@rjuenemann

Hi @1fish2,

I’m happy to try to install numpy with --no-use-pep517 and accelerate if we want another data point. Adding the timer to test_openblas_threads.py sounds useful!

I’ve now tried out the Python 3.11 version on my Windows machine.

I didn’t run into an issue with llvmlite.dll. I did encounter an issue installing the new Python version with pyenv: `No module named '_lzma'`.
Following some Stack Overflow pages, I ran sudo apt install liblzma-dev, then uninstalled and reinstalled 3.11.3 with pyenv, which seemed to fix the issue.

summarize_environment and time_libraries looked okay. All pytests passed except test_openblas_threads. Test output attached.
windowsOpenBlasTest.txt

Again, I'm not seeing a noticeable difference in runtimes:

python runscripts/manual/runParca.py -c6 python3.11 - Elapsed time 477.51 sec (0:07:57.508392); CPU 210.47 sec
python runscripts/manual/runSim.py -g2 python3.11 - Elapsed time 1982.94 sec (0:33:02.936445); CPU 1982.12 sec

Compared to old runtimes on this machine:

python runscripts/manual/runParca.py -c6 python3old -  Elapsed time 437.31 sec (0:07:17.307120); CPU 200.79 sec
python runscripts/manual/runSim.py -g2 python3old - Elapsed time 2036.66 sec (0:33:56.664326); CPU 2034.45 sec

Thanks!

This will give a quick reading on BLAS performance, in particular whether we've succeeded in getting Apple's Accelerate library to outperform OpenBLAS. Much quicker than running WCM.
@1fish2

1fish2 commented May 3, 2023

numba 0.57.0 and llvmlite 0.40.0 were finally released. Now:

  • Test with numba 0.57.0 and llvmlite 0.40.0, plus numpy 1.24.3 and other recent library updates.
  • Update installation docs.
  • Fork the chunk library.

Installation notes:

  • @rjuenemann was that sudo apt install liblzma-dev in WSL on Windows?
  • Experimenting on Intel Mac, pip install --no-binary numpy numpy==1.24.2 builds a Numpy that links to macOS Accelerate even without a .numpy-site.cfg file. So Accelerate must now be the default.
  • So to make Numpy use OpenBLAS requires (1) pip install numpy==1.24.3 (precompiled wheels) or (2) pip install --no-binary numpy numpy==1.24.3 with a .numpy-site.cfg file configured for OpenBLAS. That setup took twice as long to compute dot products in test_openblas_threads.py.
  • The --no-use-pep517 argument didn't have any obvious impact.

Note: Starting in numpy 1.24.3, integer conversions check for overflow, e.g. np.array([3000], dtype=np.int8) will give a DeprecationWarning. This could make for a very noisy PR build.

Update to the newly released versions of numba & llvmlite, the first versions compatible with Python 3.11.

Get other recent python library updates including numpy 1.24.3.
@rjuenemann

@1fish2 - yes that was sudo apt install liblzma-dev on WSL (Ubuntu distribution) on Windows.

Thanks for the update - I will retest with the new library updates.

@rjuenemann

Hi @1fish2

Retested with the new library updates. I didn’t encounter anything unexpected. Still having warnings with OpenBLAS thread test on M2 and failing on Windows.

M2 installing numpy with Accelerate

=== warnings summary ===
wholecell/io/tablereader.py:2
/Users/rjuene/dev/wcEcoli/wholecell/io/tablereader.py:2: DeprecationWarning: 'chunk' is deprecated and slated for removal in Python 3.13
from chunk import Chunk

wholecell/tests/utils/test_openblas_threads.py::Test_openblas_threads::test_openblas
/Users/rjuene/dev/wcEcoli/wholecell/tests/utils/test_openblas_threads.py:87: UserWarning: Didn't reproduce the expected dot product using 1 OpenBLAS thread, 0.0166838055841127 != 0.016683805584112754, so simulation results aren't portable.
warnings.warn(f"Didn't reproduce the expected dot product using 1"

ParCa: Elapsed time 263.65 sec (0:04:23.653752); CPU 125.52 sec
2-gen simulation: Elapsed time 1282.43 sec (0:21:22.429618); CPU 1282.44 sec

M2 installing numpy with built-in OpenBLAS

=== warnings summary ===
wholecell/io/tablereader.py:2
/Users/rjuene/dev/wcEcoli/wholecell/io/tablereader.py:2: DeprecationWarning: 'chunk' is deprecated and slated for removal in Python 3.13
from chunk import Chunk

wholecell/tests/utils/test_openblas_threads.py::Test_openblas_threads::test_openblas
/Users/rjuene/dev/wcEcoli/wholecell/tests/utils/test_openblas_threads.py:87: UserWarning: Didn't reproduce the expected dot product using 1 OpenBLAS thread, 0.01668380558411259 != 0.016683805584112754, so simulation results aren't portable.
warnings.warn(f"Didn't reproduce the expected dot product using 1"

ParCa: Elapsed time 290.86 sec (0:04:50.857963); CPU 130.23 sec

2-gen simulation: Elapsed time 1291.40 sec (0:21:31.399535); CPU 1291.17 sec

Windows installing numpy with built-in OpenBlas

OpenBLAS thread test still failing; output attached: testOpenBlasUpdate.txt

ParCa: Elapsed time 447.08 sec (0:07:27.084221); CPU 206.07 sec

2-gen simulation: Elapsed time 2049.56 sec (0:34:09.559818); CPU 2048.94 sec

@1fish2

1fish2 commented May 9, 2023

Thanks, @rjuenemann!

I posted to the numpy discussion group asking how to make Accelerate really accelerate. I forked the Chunk library, fixed up the docstrings, and will add it to the repo after writing a unit test; maybe later move it to a PyPI library.

Not reproducing the expected dot product means results aren't portable: Jenkins builds could differ from runs on our development computers. Bummer for debugging.

@1fish2

1fish2 commented May 9, 2023

I'm wondering whether it's worth renaming test_openblas_threads.py since it actually tests whatever BLAS library Numpy is linked to...

@rjuenemann

> I'm wondering whether it's worth renaming test_openblas_threads.py since it actually tests whatever BLAS library Numpy is linked to...

Might not be a bad idea if it's not too big of a hassle

* Fork the deprecated `chunk` library to fix the deprecation warning.
* Add a unit test.
* Fold in the .pyi type decls and type Chunk's `file` parameter.
* Make `seek()` return the new position, per the `IO` protocol.
* Fix relative `seek()` for the case when it's called after reading the pad byte.
* Clarify and correct docstrings and comments to match the code and the file format standards.
* Switch `tablereader` to use it.

Also: Add a tip from @rjuenemann about installing `liblzma-dev` for installing in WSL on Windows.

TODO: Make `chunk.py` into a PyPI library. Meanwhile, it still uses spaces for indentation since that's the standard in the Python Standard Library and independent libraries.
@1fish2

1fish2 commented May 12, 2023

It turns out that to use the new BLAS/LAPACK interfaces via macOS Accelerate, numpy has to define the C macro ACCELERATE_NEW_LAPACK before including the Accelerate and vecLib headers. Without that and maybe other numpy changes, I think it'll hit "bugs that cause wrong output under easily reproducible conditions" and miss out on 64-bit APIs.

The numpy folks expect a numpy PR to enable that "soon."

We could clone the numpy repo and tinker in the meantime or just wait for the PR.

@rjuenemann

> It turns out to use the new BLAS/LAPACK interfaces via macOS Accelerate, numpy has to define the C macro ACCELERATE_NEW_LAPACK before including the Accelerate and vecLib headers. Without that and maybe other numpy changes, I think it'll hit "bugs that cause wrong output under easily reproducible conditions" and miss out on 64-bit APIs.
>
> The numpy folks expect a numpy PR to enable that "soon."
>
> We could clone the numpy repo and tinker in the meantime or just wait for the PR.

Interesting. So this could explain why we weren't seeing a substantial speedup using Accelerate?

@1fish2

1fish2 commented May 12, 2023

It seems that compiling with ACCELERATE_NEW_LAPACK (+ ACCELERATE_LAPACK_ILP64?) is needed to access the new 64-bit libraries which use M1 hardware features like matrix hardware.

There are lots of variations in installing numpy (e.g. via conda), build details, math operations (should we benchmark matrix multiply?), array size, floating point precision, hardware (M1/M2 -/Pro/Max), and bugs, so I'm unsure about it.

1fish2 added 4 commits May 14, 2023 21:52
* Update pips, but stick with `urllib3<2` since 2.0 has an incompatible API which `docker` 6.1.2 doesn't say it's ready for. `requests` 2.30.0 is ready for it, so delay that update.
* `sympy==1.12` has [a lot of changes](https://github.com/sympy/sympy/wiki/release-notes-for-1.12). Only a few of them are documented as BREAKING.
* Rename `test_openblas_threads.py` to `test_blas.py`. The test calls numpy, and whether numpy in turn calls OpenBLAS depends on how it was installed.
* Don't encourage people to use Apple's Accelerate library yet. Numpy isn't quite ready.
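The pin strategy in the first bullet, as a `requirements.txt` sketch (the exact `requests` pin is illustrative):

```
urllib3<2        # urllib3 2.0 changed APIs; docker 6.1.2 isn't ready for it
requests<2.30    # 2.30.0 moves to urllib3 2, so hold this back for now too
```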
@1fish2 1fish2 changed the title Update wcEcoli from Python 3.8 to 3.11.3 and up-to-date pip libraries [DRAFT] Update wcEcoli from Python 3.8 to 3.11.3 and up-to-date pip libraries May 15, 2023
@1fish2 1fish2 requested review from rjuenemann and ggsun May 15, 2023 20:47
@1fish2

1fish2 commented May 15, 2023

@rjuenemann and @ggsun if you escaped the room then please do look at this PR now that it's ready.

Is there a more or less convenient time to merge it in? At that time, I'll rebuild the wcEcoli3 pyenv on Sherlock, and everyone will need to update their local pyenvs when they pull from GitHub.

@rjuenemann

Thanks @1fish2! I don't have a strong opinion about a convenient time to merge it in - anytime should be fine on my end.

Comment on lines 242 to 246
**Recommendation:** On macOS 13.3+, use
`pip install numpy==1.24.3 --no-binary numpy`
to link numpy to the macOS Accelerate library. Otherwise use
`pip install numpy==1.24.3`
to use numpy's built-in copy of OpenBLAS.

@rjuenemann rjuenemann May 17, 2023


I'm a little confused on the wording in this section vs line 192 in this file - are we recommending Mac users to use Accelerate? Or still use OpenBLAS for now?

@1fish2


Good catch! I'll remove this until the Numpy team makes it work properly on Accelerate.

1fish2 added 2 commits May 17, 2023 17:55
Now that the `wcEcoli3` pyenv matches this PR's.
@1fish2 1fish2 merged commit e03c717 into master May 18, 2023
@1fish2 1fish2 deleted the python-3.11.3 branch May 18, 2023 07:06