{lib}[foss/2023a] TensorFlow v2.13.0 w/ CUDA 12.1.1 #19182
Closed
VRehnberg wants to merge 1 commit into easybuilders:develop from VRehnberg:20231109160257_new_pr_TensorFlow2130
250 changes: 250 additions & 0 deletions
easybuild/easyconfigs/t/TensorFlow/TensorFlow-2.13.0-foss-2023a-CUDA-12.1.1.eb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
easyblock = 'PythonBundle' | ||
|
||
name = 'TensorFlow' | ||
version = '2.13.0' | ||
versionsuffix = '-CUDA-%(cudaver)s' | ||
|
||
homepage = 'https://www.tensorflow.org/' | ||
description = "An open-source software library for Machine Intelligence" | ||
|
||
toolchain = {'name': 'foss', 'version': '2023a'} | ||
toolchainopts = {'pic': True} | ||
|
||
builddependencies = [ | ||
('Bazel', '6.3.1'), | ||
# git 2.x required, see also https://github.com/tensorflow/tensorflow/issues/29053 | ||
('git', '2.41.0', '-nodocs'), | ||
('pybind11', '2.11.1'), | ||
('UnZip', '6.0'), | ||
# Required to build some of the extensions | ||
('poetry', '1.5.1'), | ||
# System protobuf doesn't seem to work: https://github.com/tensorflow/tensorflow/issues/61593 | ||
# So don't add it here | ||
] | ||
dependencies = [ | ||
('CUDA', '12.1.1', '', SYSTEM), | ||
('cuDNN', '8.9.2.26', versionsuffix, SYSTEM), | ||
('NCCL', '2.18.3', versionsuffix), | ||
('Python', '3.11.3'), | ||
('h5py', '3.9.0'), | ||
('cURL', '8.0.1'), | ||
('dill', '0.3.7'), | ||
('double-conversion', '3.3.0'), | ||
('flatbuffers', '23.5.26'), | ||
('flatbuffers-python', '23.5.26'), | ||
('giflib', '5.2.1'), | ||
('hwloc', '2.9.1'), | ||
('ICU', '73.2'), | ||
('JsonCpp', '1.9.5'), | ||
('libjpeg-turbo', '2.1.5.1'), | ||
('NASM', '2.16.01'), | ||
('nsync', '1.26.0'), | ||
('SQLite', '3.42.0'), | ||
('patchelf', '0.18.0'), | ||
('protobuf-python', '4.24.0'), | ||
('libpng', '1.6.39'), | ||
('snappy', '1.1.10'), | ||
('zlib', '1.2.13'), | ||
# Dependencies of grpcio | ||
('OpenSSL', '1.1', '', SYSTEM), | ||
('RE2', '2023-08-01'), | ||
] | ||
|
||
use_pip = True | ||
sanity_pip_check = True | ||
|
||
# Dependencies created and updated using findPythonDeps.sh: | ||
# https://gist.github.com/Flamefire/49426e502cd8983757bd01a08a10ae0d | ||
exts_list = [ | ||
('wrapt', '1.15.0', { | ||
'checksums': ['d06730c6aed78cee4126234cf2d071e01b44b915e725a6cb439a879ec9754a3a'], | ||
}), | ||
('termcolor', '2.3.0', { | ||
'source_tmpl': SOURCE_PY3_WHL, | ||
'checksums': ['3afb05607b89aed0ffe25202399ee0867ad4d3cb4180d98aaf8eefa6a5f7d475'], | ||
}), | ||
('tensorflow-estimator', version, { | ||
'source_tmpl': 'tensorflow_estimator-%(version)s-py2.py3-none-any.whl', | ||
'checksums': ['6f868284eaa654ae3aa7cacdbef2175d0909df9fcf11374f5166f8bf475952aa'], | ||
}), | ||
('Werkzeug', '2.3.7', { | ||
'source_tmpl': SOURCELOWER_TAR_GZ, | ||
'checksums': ['2b8c0e447b4b9dbcc85dd97b6eeb4dcbaf6c8b6c3be0bd654e25553e0a2157d8'], | ||
}), | ||
('tensorboard-plugin-wit', '1.8.1', { | ||
'source_tmpl': 'tensorboard_plugin_wit-%(version)s-py3-none-any.whl', | ||
'checksums': ['ff26bdd583d155aa951ee3b152b3d0cffae8005dc697f72b44a8e8c2a77a8cbe'], | ||
}), | ||
('tensorboard-data-server', '0.7.1', { | ||
'source_tmpl': 'tensorboard_data_server-%(version)s-py3-none-any.whl', | ||
'checksums': ['9938bd39f5041797b33921066fba0eab03a0dd10d1887a05e62ae58841ad4c3f'], | ||
}), | ||
('Markdown', '3.4.4', { | ||
'checksums': ['225c6123522495d4119a90b3a3ba31a1e87a70369e03f14799ea9c0d7183a3d6'], | ||
}), | ||
('grpcio', '1.57.0', { | ||
'modulename': 'grpc', | ||
'preinstallopts': "GRPC_PYTHON_BUILD_EXT_COMPILER_JOBS=%(parallel)s " + | ||
# Required to avoid building with non-default C++ standard but keep other flags, | ||
# see https://github.com/grpc/grpc/issues/34256 | ||
"GRPC_PYTHON_CFLAGS='-fvisibility=hidden -fno-wrapv -fno-exceptions' " + | ||
" ".join(["GRPC_PYTHON_BUILD_SYSTEM_%s=True" % i for i in | ||
( | ||
'OPENSSL', | ||
'ZLIB', | ||
'RE2', | ||
# 'ABSL', | ||
)]), | ||
'checksums': ['4b089f7ad1eb00a104078bab8015b0ed0ebcb3b589e527ab009c53893fd4e613'], | ||
}), | ||
('oauthlib', '3.2.2', { | ||
'checksums': ['9859c40929662bec5d64f34d01c99e093149682a3f38915dc0655d5a633dd918'], | ||
}), | ||
('requests-oauthlib', '1.3.1', { | ||
'checksums': ['75beac4a47881eeb94d5ea5d6ad31ef88856affe2332b9aafb52c6452ccf0d7a'], | ||
}), | ||
('rsa', '4.9', { | ||
'checksums': ['e38464a49c6c85d7f1351b0126661487a7e0a14a50f1675ec50eb34d4f20ef21'], | ||
}), | ||
('pyasn1-modules', '0.3.0', { | ||
'source_tmpl': 'pyasn1_modules-%(version)s.tar.gz', | ||
'checksums': ['5bd01446b736eb9d31512a30d46c1ac3395d676c6f3cafa4c03eb54b9925631c'], | ||
}), | ||
('cachetools', '5.3.1', { | ||
'checksums': ['dce83f2d9b4e1f732a8cd44af8e8fab2dbe46201467fc98b3ef8f269092bf62b'], | ||
}), | ||
('google-auth', '2.22.0', { | ||
'modulename': 'google.auth', | ||
'checksums': ['164cba9af4e6e4e40c3a4f90a1a6c12ee56f14c0b4868d1ca91b32826ab334ce'], | ||
}), | ||
('google-auth-oauthlib', '1.0.0', { | ||
'checksums': ['e375064964820b47221a7e1b7ee1fd77051b6323c3f9e3e19785f78ab67ecfc5'], | ||
}), | ||
('absl-py', '1.4.0', { | ||
'modulename': 'absl', | ||
'checksums': ['d2c244d01048ba476e7c080bd2c6df5e141d211de80223460d5b3b8a2a58433d'], | ||
}), | ||
('tensorboard', version, { | ||
'source_tmpl': SOURCE_PY3_WHL, | ||
'checksums': ['ab69961ebddbddc83f5fa2ff9233572bdad5b883778c35e4fe94bf1798bd8481'], | ||
}), | ||
('opt-einsum', '3.3.0', { | ||
'source_tmpl': 'opt_einsum-%(version)s.tar.gz', | ||
'checksums': ['59f6475f77bbc37dcf7cd748519c0ec60722e91e63ca114e68821c0c54a46549'], | ||
}), | ||
('keras', '2.13.1', { | ||
'source_tmpl': SOURCE_PY3_WHL, | ||
'checksums': ['5ce5f706f779fa7330e63632f327b75ce38144a120376b2ae1917c00fa6136af'], | ||
}), | ||
('google-pasta', '0.2.0', { | ||
'modulename': 'pasta', | ||
'checksums': ['c9f2c8dfc8f96d0d5808299920721be30c9eec37f2389f28904f454565c8a16e'], | ||
}), | ||
('astunparse', '1.6.3', { | ||
'checksums': ['5ad93a8456f0d084c3456d059fd9a92cce667963232cbf763eac3bc5b7940872'], | ||
}), | ||
# Required by tests | ||
('portpicker', '1.5.2', { | ||
'checksums': ['c55683ad725f5c00a41bc7db0225223e8be024b1fa564d039ed3390e4fd48fb3'], | ||
}), | ||
# System dependencies | ||
('tblib', '2.0.0', { | ||
'checksums': ['a6df30f272c08bf8be66e0775fad862005d950a6b8449b94f7c788731d70ecd7'], | ||
}), | ||
('astor', '0.8.1', { | ||
'checksums': ['6a6effda93f4e1ce9f618779b2dd1d9d84f1e32812c23a29b3fff6fd7f63fa5e'], | ||
}), | ||
# Optional profile plugin + dependency | ||
('gviz-api', '1.10.0', { | ||
'source_tmpl': 'gviz_api-%(version)s.tar.gz', | ||
'checksums': ['846692dd8cc73224fc31b18e41589bd934e1cc05090c6576af4b4b26c2e71b90'], | ||
}), | ||
('tensorboard-plugin-profile', '2.13.1', { | ||
'source_tmpl': 'tensorboard_plugin_profile-%(version)s.tar.gz', | ||
'checksums': ['472d1cb85d7087c5294131eb640bd771f5515ecc4867030c7904718be7fc19c1'], | ||
}), | ||
(name, version, { | ||
'source_tmpl': 'v%(version)s.tar.gz', | ||
'source_urls': ['https://github.com/tensorflow/tensorflow/archive/'], | ||
'patches': [ | ||
'TensorFlow-2.1.0_fix-cuda-build.patch', | ||
'TensorFlow-2.4.0_dont-use-var-lock.patch', | ||
'TensorFlow-2.9.1_remove-duplicate-gpu-tests.patch', | ||
'TensorFlow-2.11.0_disable-avx512-extensions.patch', | ||
'TensorFlow-2.13.0_add-default-shell-env.patch', | ||
'TensorFlow-2.13.0_add-missing-system-absl-py-target.patch', | ||
'TensorFlow-2.13.0_add-missing-system-protobuf-targets.patch', | ||
'TensorFlow-2.13.0_exclude-xnnpack-on-ppc.patch', | ||
'TensorFlow-2.13.0_fix-protobuf-compatibility.patch', | ||
'TensorFlow-2.13.0_remove-io-gcs-filesystem-dep.patch', | ||
'TensorFlow-2.13.0_remove-libclang-dep.patch', | ||
'TensorFlow-2.13.0_fix-numpy-2.15.compat.patch', | ||
'TensorFlow-2.13.0_remove-typing_extensions-upper-bound.patch', | ||
'TensorFlow-2.13.0_revert-to-flatbuffers-2.0.6.patch', | ||
'TensorFlow-2.13.0_unpin-gast-version.patch', | ||
], | ||
'checksums': [ | ||
{'v2.13.0.tar.gz': 'e58c939079588623e6fa1d054aec2f90f95018266e0a970fd353a5244f5173dc'}, | ||
{'TensorFlow-2.1.0_fix-cuda-build.patch': | ||
'78c20aeaa7784b8ceb46238a81e8c2461137d28e0b576deeba8357d23fbe1f5a'}, | ||
{'TensorFlow-2.4.0_dont-use-var-lock.patch': | ||
'b14f2493fd2edf79abd1c4f2dde6c98a3e7d5cb9c25ab9386df874d5f072d6b5'}, | ||
{'TensorFlow-2.9.1_remove-duplicate-gpu-tests.patch': | ||
'6fe50faab28387c622c68dc3fc0cbfb2a51000cd750c1a82f8420b54fcd2509f'}, | ||
{'TensorFlow-2.11.0_disable-avx512-extensions.patch': | ||
'fb8e7694b5d2377cc44e6674ff85a7c50dc725f2f507cbcfda65f129f534b1cc'}, | ||
{'TensorFlow-2.13.0_add-default-shell-env.patch': | ||
'a94b2e007bff5a08ec4e6ec3043985907a69e9eeaea69dc4fe2aa15d15b75aef'}, | ||
{'TensorFlow-2.13.0_add-missing-system-absl-py-target.patch': | ||
'94bc3b155840af942437d06c43830dabf41d94391daf61e1d0add0a7bf20a538'}, | ||
{'TensorFlow-2.13.0_add-missing-system-protobuf-targets.patch': | ||
'77d8c8a5627493fc7c38b4de79d49e60ff6628b05ff969f4cd3ff9857176c459'}, | ||
{'TensorFlow-2.13.0_exclude-xnnpack-on-ppc.patch': | ||
'd0818206846911d946666ded7d3216c0546e37cee1890a2f48dc1a9d71047cad'}, | ||
{'TensorFlow-2.13.0_fix-protobuf-compatibility.patch': | ||
'a9658c035b663da1b7d1983a8e37883cc40c1c0cfa22132bb7fe19c4cbc9712a'}, | ||
{'TensorFlow-2.13.0_remove-io-gcs-filesystem-dep.patch': | ||
'39f1cbecad4b3723481b30f18f16363ab1837c8749ee197ec88b92b493e9df67'}, | ||
{'TensorFlow-2.13.0_remove-libclang-dep.patch': | ||
'f0d067d129e817b0d371c4e48a4a1ac08f80a2c137d52b05a3c7c4370dcbd1e5'}, | ||
{'TensorFlow-2.13.0_fix-numpy-2.15.compat.patch': | ||
'4023be57bc8e33ae55ccac54b51d6532fea7ac4a32cb1125e3e42da0dec1669a'}, | ||
{'TensorFlow-2.13.0_remove-typing_extensions-upper-bound.patch': | ||
'ed48464ed6f4cdbd0dde93ffc413c394d363278039502d77540ff7206c2048ae'}, | ||
{'TensorFlow-2.13.0_revert-to-flatbuffers-2.0.6.patch': | ||
'f22757250181b6165e4b2ef1e199bd4cb344a9429be5a1086638f25bcbf650fc'}, | ||
{'TensorFlow-2.13.0_unpin-gast-version.patch': | ||
'61e0c9b67aa6c48176fcbb429bf6aa36c4fdde604c82c02f58a043412fecf285'}, | ||
], | ||
'test_script': 'TensorFlow-2.x_mnist-test.py', | ||
'test_tag_filters_cpu': '-gpu,-tpu,-no_cuda_on_cpu_tap,' | ||
'-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only', | ||
'test_tag_filters_gpu': 'gpu,-no_gpu,-nogpu,-gpu_cupti,-no_cuda11,' | ||
'-no_pip,-no_oss,-oss_serial,-benchmark-test,-v1only', | ||
'test_targets': [ | ||
'//tensorflow/core/...', | ||
'-//tensorflow/core:example_java_proto', | ||
'-//tensorflow/core/example:example_protos_closure', | ||
'//tensorflow/cc/...', | ||
'//tensorflow/c/...', | ||
'//tensorflow/python/...', | ||
'-//tensorflow/c/eager:c_api_test_gpu', | ||
'-//tensorflow/c/eager:c_api_distributed_test', | ||
'-//tensorflow/c/eager:c_api_distributed_test_gpu', | ||
'-//tensorflow/c/eager:c_api_cluster_test_gpu', | ||
'-//tensorflow/c/eager:c_api_remote_function_test_gpu', | ||
'-//tensorflow/c/eager:c_api_remote_test_gpu', | ||
'-//tensorflow/core/common_runtime:collective_param_resolver_local_test', | ||
'-//tensorflow/core/kernels/mkl:mkl_fused_ops_test', | ||
'-//tensorflow/core/kernels/mkl:mkl_fused_batch_norm_op_test', | ||
'-//tensorflow/core/ir/importexport/tests/roundtrip/...', | ||
], | ||
# Need to have $HOME set for tests on PPC: https://github.com/tensorflow/tensorflow/issues/61814 | ||
'testopts': "--test_env=HOME=/tmp --test_timeout=3600 --test_size_filters=small", | ||
'testopts_gpu': "--test_env=HOME=/tmp --test_timeout=3600 --test_size_filters=small " | ||
"--run_under=//tensorflow/tools/ci_build/gpu_build:parallel_gpu_execute", | ||
'with_xla': True, | ||
}), | ||
] | ||
|
||
moduleclass = 'lib' |
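The `preinstallopts` value for `grpcio` in the diff above is built by string concatenation plus a `join` over a tuple, which makes the final command prefix hard to read at a glance. The following is a minimal sketch that reproduces that expression in plain Python and prints the result. The variable name `system_libs` and the value `8` used to resolve `%(parallel)s` are illustrative assumptions; in a real build, EasyBuild resolves that template itself.

```python
# Reproduces the string concatenation from the grpcio entry's 'preinstallopts'.
# '%(parallel)s' is an EasyBuild template; it is left unresolved at first.
system_libs = (
    'OPENSSL',
    'ZLIB',
    'RE2',
    # 'ABSL',  # commented out in the easyconfig as well
)

preinstallopts = (
    "GRPC_PYTHON_BUILD_EXT_COMPILER_JOBS=%(parallel)s "
    + "GRPC_PYTHON_CFLAGS='-fvisibility=hidden -fno-wrapv -fno-exceptions' "
    + " ".join(["GRPC_PYTHON_BUILD_SYSTEM_%s=True" % i for i in system_libs])
)

print(preinstallopts)

# Illustration only: resolve the template with plain %-formatting,
# assuming a hypothetical value of 8 parallel jobs.
print(preinstallopts % {'parallel': 8})
```

Each name in the tuple becomes one `GRPC_PYTHON_BUILD_SYSTEM_<NAME>=True` assignment, so re-enabling the commented-out `'ABSL'` entry would add one more variable to the prefix.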
@VRehnberg It seems like this `cuDNN` version is causing trouble, I'm getting:

see also tensorflow/tensorflow#60832, where they suggest downgrading to an older `cuDNN` (ugh...)
There are no easyconfigs yet that use a `2023a` toolchain and have a `cuDNN` dependency, so we still have the freedom to stick to `cuDNN` 8.6.* here...
We also need to stick to CUDA 11.8 though, since cuDNN 8.6 is only paired with CUDA 10.2 and 11.8 it seems, see https://developer.download.nvidia.com/compute/redist/cudnn/v8.6.0/local_installers/
And CUDA 11.8 is a problem with GCC 12.x, hitting this when installing NCCL on top of CUDA 11.8.0 with `GCCcore/12.3.0`:

So that tells me we're doomed to stick to `foss/2022a` for TensorFlow 2.13.0?
Meh, I'll close this one then.
Hmm, or go with another CUDA, I suppose. That's what the CUDA version suffix is for, I guess. I can't find anything about which GCC versions are compatible with CUDA 12.3, but extrapolating from what I could find, it will probably work. CUDA 12.3 isn't listed for cuDNN 8.9.6, but it could possibly work.
We can work around this by forcing NVCC to accept the "incompatible" compiler: https://github.com/easybuilders/easybuild-easyconfigs/pull/18853/files#diff-c0833191974a98d7eddf20cecac9d27ec670e369f43f75f3a4bafb2261b1135fR27
Of course there is a risk that the compiler really is incompatible...
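One way such a workaround could look is sketched below. It relies on nvcc's `-allow-unsupported-compiler` option and the `NVCC_APPEND_FLAGS` environment variable, both of which nvcc documents. Whether the linked PR uses exactly this mechanism is an assumption, the helper name `nvcc_env_prefix` is hypothetical, and passing the result through `prebuildopts` is only one possible place to apply it.

```python
# Sketch only: build an environment-variable prefix that makes nvcc skip its
# host-compiler version check. The helper name is hypothetical.
NVCC_FLAG = '-allow-unsupported-compiler'  # nvcc option that disables the check


def nvcc_env_prefix(extra_flags=()):
    """Return a shell prefix that sets NVCC_APPEND_FLAGS for one command."""
    flags = ' '.join((NVCC_FLAG,) + tuple(extra_flags))
    return "NVCC_APPEND_FLAGS='%s' " % flags


# Assumption: an easyconfig could prepend this to its build command,
# for example through 'prebuildopts'.
prebuildopts = nvcc_env_prefix()
print(prebuildopts + 'make')
```

The flag only silences the version check; it does not make an unsupported host compiler work, so the risk noted above still applies.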