2.10.0.rc1 #260

Closed
wants to merge 19 commits into from

ngam
Contributor

@ngam ngam commented Aug 5, 2022

notable changes:


MAJOR CHANGES:

RIP cuda102 (I am not going to bother looking into this)

Error in fail: The following libraries cannot be linked either statically or dynamically:
@cub_archive//:cub
To ignore which libraries get linked statically for now, add the following to 'static_deps':
        "@cub_archive//:__subpackages__",


Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

  • It looks like the 'tensorflow-base' output doesn't have any tests.

@ngam
Contributor Author

ngam commented Aug 5, 2022

@conda-forge-admin, please rerender

@ngam
Contributor Author

ngam commented Aug 5, 2022

@hmaarrfk @h-vetinari I lost track of what is ready for c++17 and what is not. The cuda11x builds are failing with c++17 errors. As you see below the compiler call includes c++17, but the error is obviously c++17 specific. It could be something else bringing c++14 (or something else) with it?

Any advice? How to make sure cuda11x builds actually work/compile with c++17?

2022-08-05T04:57:58.4401174Z ERROR: /home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/work/tensorflow/core/kernels/image/BUILD:202:18: Compiling tensorflow/core/kernels/image/crop_and_resize_op_gpu.cu.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
2022-08-05T04:57:58.4403587Z   (cd /home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_build_env/share/bazel/0f5c1a31abfcd9c29af0c744fb45460a/execroot/org_tensorflow && \
2022-08-05T04:57:58.4405110Z   exec env - \
2022-08-05T04:57:58.4405970Z     CUDA_TOOLKIT_PATH=/usr/local/cuda-11.2 \
2022-08-05T04:57:58.4407036Z     GCC_HOST_COMPILER_PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_build_env/bin/x86_64-conda-linux-gnu-gcc \
2022-08-05T04:57:58.4412337Z     PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/work:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/condabin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_build_env:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_build_env/bin:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac:/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/conda/bin:/usr/local/cuda/bin \
2022-08-05T04:57:58.4417198Z     PWD=/proc/self/cwd \
2022-08-05T04:57:58.4418830Z     PYTHON_BIN_PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/bin/python \
2022-08-05T04:57:58.4420886Z     PYTHON_LIB_PATH=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/lib/python3.10/site-packages \
2022-08-05T04:57:58.4421860Z     TF2_BEHAVIOR=1 \
2022-08-05T04:57:58.4422564Z     TF_CUDA_COMPUTE_CAPABILITIES=sm_35,sm_50,sm_60,sm_62,sm_70,sm_72,sm_75,sm_80,sm_86,compute_86 \
2022-08-05T04:57:58.4423946Z     TF_CUDA_PATHS=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac,/usr/local/cuda \
2022-08-05T04:57:58.4424845Z     TF_CUDA_VERSION=11.2 \
2022-08-05T04:57:58.4425260Z     TF_CUDNN_VERSION=8 \
2022-08-05T04:57:58.4425630Z     TF_NCCL_VERSION=2.13 \
2022-08-05T04:57:58.4426484Z     TF_SYSTEM_LIBS=absl_py,astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_protobuf,curl,cython,dill_archive,flatbuffers,gast_archive,gif,icu,libjpeg_turbo,org_sqlite,png,pybind11,snappy,zlib \
2022-08-05T04:57:58.4449976Z   custom_toolchain/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/tensorflow/core/kernels/image/_objs/crop_and_resize_op_gpu/crop_and_resize_op_gpu.cu.pic.d '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/kernels/image/_objs/crop_and_resize_op_gpu/crop_and_resize_op_gpu.cu.pic.o' -fPIC -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -DTF_USE_SNAPPY -DTENSORFLOW_USE_CUSTOM_CONTRACTION_KERNEL -DTENSORFLOW_USE_MKLDNN_CONTRACTION_KERNEL -DGEMM_KERNEL_H '-DEIGEN_ALTIVEC_USE_CUSTOM_PACK=0' -iquote . -iquote bazel-out/k8-opt/bin -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/com_google_absl -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/gif -iquote bazel-out/k8-opt/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/k8-opt/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/k8-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/k8-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/k8-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/k8-opt/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -iquote external/snappy -iquote bazel-out/k8-opt/bin/external/snappy -iquote external/double_conversion -iquote bazel-out/k8-opt/bin/external/double_conversion -iquote external/local_config_rocm -iquote bazel-out/k8-opt/bin/external/local_config_rocm -iquote external/local_config_tensorrt -iquote bazel-out/k8-opt/bin/external/local_config_tensorrt -iquote external/png -iquote bazel-out/k8-opt/bin/external/png -iquote external/mkl_dnn_v1 
-iquote bazel-out/k8-opt/bin/external/mkl_dnn_v1 -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/k8-opt/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -isystem third_party/eigen3/mkl_include -isystem bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/nsync/public -isystem bazel-out/k8-opt/bin/external/nsync/public -isystem external/farmhash_archive/src -isystem bazel-out/k8-opt/bin/external/farmhash_archive/src -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_rocm/rocm -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/roctracer -isystem external/mkl_dnn_v1/include -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/include -isystem external/mkl_dnn_v1/src -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/src -isystem external/mkl_dnn_v1/src/common -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/src/common -isystem external/mkl_dnn_v1/src/common/ittnotify -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/src/common/ittnotify -isystem external/mkl_dnn_v1/src/cpu -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/src/cpu -isystem external/mkl_dnn_v1/src/cpu/gemm -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/src/cpu/gemm -isystem 
external/mkl_dnn_v1/src/cpu/x64/xbyak -isystem bazel-out/k8-opt/bin/external/mkl_dnn_v1/src/cpu/x64/xbyak -isystem /home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include '-march=nocona' '-mtune=haswell' -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/work=/usr/local/src/conda/tensorflow-split-2.10.0.rc0' '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac=/usr/local/src/conda-prefix' -isystem /usr/local/cuda/include -DNDEBUG '-D_FORTIFY_SOURCE=2' -O2 -isystem /home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include -isystem /usr/local/cuda/include -fvisibility-inlines-hidden '-std=c++17' '-fmessage-length=0' '-march=nocona' '-mtune=haswell' -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem 
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/work=/usr/local/src/conda/tensorflow-split-2.10.0.rc0' '-fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac=/usr/local/src/conda-prefix' -isystem /usr/local/cuda/include -DNDEBUG '-D_FORTIFY_SOURCE=2' -O2 -isystem /home/conda/feedstock_root/build_artifacts/tensorflow-split_1659668861211/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include -isystem /usr/local/cuda/include -w -DAUTOLOAD_DYNAMIC_KERNELS '-march=nocona' '-mtune=haswell' '-std=c++17' -x cuda '-DGOOGLE_CUDA=1' '-Xcuda-fatbinary=--compress-all' '--cuda-gpu-arch=sm_35' '--cuda-gpu-arch=sm_50' '--cuda-gpu-arch=sm_60' '--cuda-gpu-arch=sm_62' '--cuda-gpu-arch=sm_70' '--cuda-gpu-arch=sm_72' '--cuda-gpu-arch=sm_75' '--cuda-gpu-arch=sm_80' '--cuda-gpu-arch=sm_86' '--cuda-include-ptx=sm_86' '--cuda-gpu-arch=sm_86' -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DGOOGLE_CUDA=1' '-DTENSORFLOW_USE_NVCC=1' '-DTENSORFLOW_USE_XLA=1' -DINTEL_MKL -msse3 -pthread '-nvcc_options=relaxed-constexpr' '-nvcc_options=ftz=true' -c tensorflow/core/kernels/image/crop_and_resize_op_gpu.cu.cc -o bazel-out/k8-opt/bin/tensorflow/core/kernels/image/_objs/crop_and_resize_op_gpu/crop_and_resize_op_gpu.cu.pic.o)
2022-08-05T04:57:58.4479524Z # Configuration: 813bf7b852fa211e0a6b6de193b15a0f7335c893bd7ccb5689e6309f66dfdcb8
2022-08-05T04:57:58.4480194Z # Execution platform: @local_execution_config_platform//:platform
2022-08-05T04:57:58.4481480Z nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
2022-08-05T04:57:58.4483034Z nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
2022-08-05T04:57:58.4483864Z ./tensorflow/stream_executor/dnn.h(790): error: namespace "std" has no member "optional"
2022-08-05T04:57:58.4484247Z 
2022-08-05T04:57:58.4484645Z ./tensorflow/stream_executor/dnn.h(790): error: expected a ")"
2022-08-05T04:57:58.4484930Z 
2022-08-05T04:57:58.4485372Z ./tensorflow/stream_executor/dnn.h(801): error: namespace "std" has no member "optional"
2022-08-05T04:57:58.4485699Z 
2022-08-05T04:57:58.4486128Z ./tensorflow/stream_executor/dnn.h(801): error: expected a ")"
2022-08-05T04:57:58.4486417Z 
2022-08-05T04:57:58.4486865Z ./tensorflow/stream_executor/dnn.h(807): error: qualified name is not allowed
2022-08-05T04:57:58.4487181Z 
2022-08-05T04:57:58.4487665Z ./tensorflow/stream_e
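
The nvcc diagnostics in the log above share a `file(line): error: message` shape, so they can be grouped per file to check whether everything traces back to a single header. A minimal Python sketch, assuming that shape holds for the rest of the (truncated) log; `DIAG` and `group_errors` are illustrative names, not part of any tool used in this feedstock:

```python
import re
from collections import defaultdict

# Matches lines such as: ./path/file.h(790): error: message
# An optional leading CI timestamp (ending in "Z") is skipped.
DIAG = re.compile(
    r'^(?:\S+Z\s+)?(?P<file>\S+)\((?P<line>\d+)\): error: (?P<msg>.+)$'
)

def group_errors(log_text):
    """Group 'file(line): error: msg' diagnostics by file name."""
    grouped = defaultdict(list)
    for raw in log_text.splitlines():
        match = DIAG.match(raw.strip())
        if match:
            grouped[match["file"]].append((int(match["line"]), match["msg"]))
    return dict(grouped)

# Three diagnostics copied from the log above.
log = '''\
2022-08-05T04:57:58.4483864Z ./tensorflow/stream_executor/dnn.h(790): error: namespace "std" has no member "optional"
2022-08-05T04:57:58.4484645Z ./tensorflow/stream_executor/dnn.h(790): error: expected a ")"
2022-08-05T04:57:58.4486865Z ./tensorflow/stream_executor/dnn.h(807): error: qualified name is not allowed
'''

errors = group_errors(log)
for path, items in errors.items():
    print(path, len(items), "errors, first at line", items[0][0])
```

Here every diagnostic points at the same header, and the first one names `std::optional`, which fits a language-standard mismatch better than a set of unrelated source bugs.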

ngam and others added 3 commits August 5, 2022 12:32
Not gonna bother with fixing this... 

Error in fail: The following libraries cannot be linked either statically or dynamically:
@cub_archive//:cub
To ignore which libraries get linked statically for now, add the following to 'static_deps':
        "@cub_archive//:__subpackages__",
@h-vetinari
Member

The recipe currently contains both:

  # We specify a version of abseil_cpp that is compatible with C++17
  skip: true  # [abseil_cpp != '20211102.0']

as well as

  - name: libtensorflow
    [...]
      host:
        - abseil-cpp 20211102.0

which seems contradictory.

> @hmaarrfk @h-vetinari I lost track of what is ready for c++17 and what is not. The cuda11x builds are failing with c++17 errors. As you see below the compiler call includes c++17, but the error is obviously c++17 specific. It could be something else bringing c++14 (or something else) with it?
>
> Any advice? How to make sure cuda11x builds actually work/compile with c++17?

the abseil shared builds should be C++17 already. Based on the error you pasted, it's that the code itself depends on a C++17 facility (std::optional), but perhaps something in our infra or bazel or ...? is still injecting a C++14 somewhere.
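
One way to start looking for a stray language-standard flag is to list every `-std=` value in a compile command. A minimal Python sketch, assuming the command is available as a single shell-quoted string; `std_flags` is a hypothetical helper and the sample command is shortened and invented, not copied from the build log:

```python
import shlex

def std_flags(command):
    """Return every -std=... value found in a compiler command line, in order."""
    values = []
    for token in shlex.split(command):
        # Flags may be passed directly or forwarded, e.g. -Xcompiler=-std=c++14
        idx = token.find("-std=")
        if idx != -1:
            values.append(token[idx + len("-std="):])
    return values

# Hypothetical, shortened command line.
sample = "wrapper -O2 '-std=c++17' -x cuda '-std=c++17' -c op_gpu.cu.cc -o op_gpu.o"
print(std_flags(sample))  # ['c++17', 'c++17']
```

This only inspects the command handed to the wrapper script. If the wrapper rewrites or drops flags before invoking nvcc, the same check would have to be run on the command the wrapper itself emits.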

@ngam
Contributor Author

ngam commented Aug 5, 2022

> The recipe currently contains both:
>
>   # We specify a version of abseil_cpp that is compatible with C++17
>   skip: true  # [abseil_cpp != '20211102.0']
>
> as well as
>
>   - name: libtensorflow
>     [...]
>       host:
>         - abseil-cpp 20211102.0
>
> which seems contradictory.

it's largely okay, but should not be there for future iterations, so fixed it

> > @hmaarrfk @h-vetinari I lost track of what is ready for c++17 and what is not. The cuda11x builds are failing with c++17 errors. As you see below the compiler call includes c++17, but the error is obviously c++17 specific. It could be something else bringing c++14 (or something else) with it?
> > Any advice? How to make sure cuda11x builds actually work/compile with c++17?
>
> the abseil shared builds should be C++17 already. Based on the error you pasted, it's that the code itself depends on a C++17 facility (std::optional),
>
> but perhaps something in our infra or bazel or ...? is still injecting a C++14 somewhere.

yep, see
a5700d3

I think we will need to work on this custom toolchain, I may need to get the actual package finally and apply fixes globally...

good points, also the nvcc constraints break stuff

https://github.com/conda-forge/bazel-toolchain-feedstock

@ngam
Contributor Author

ngam commented Aug 5, 2022

Maybe we will finally manage to build on the CI??? LOL

tensorflow/tensorflow#55611

integrating in #261

@ngam ngam changed the title from "try 2.10.0.rc0" to "2.10.0.rc0" Aug 5, 2022
@ngam
Contributor Author

ngam commented Aug 6, 2022

The conclusion from this: We will be ready once 2.10 is out, but we need to fine-tune both the custom_toolchain and our deps (e.g. protobuf and libprotobuf). I will leave these PRs (this and #261) open for a bit longer and then I will close them.

@h-vetinari
Member

Thanks a lot for your work on this! Very timely, as previously. 🙃

> The conclusion from this: We will be ready once 2.10 is out, but we need to fine-tune both the custom_toolchain and our deps (e.g. protobuf and libprotobuf). I will leave these PRs (this and #261) open for a bit longer and then I will close them.

Why close them if we can take them as the basis for iterating? What protobuf changes are necessary? C++17 builds?

@ngam
Contributor Author

ngam commented Aug 6, 2022

> What protobuf changes are necessary? C++17 builds?

I am still not sure. The error is clearer in the jaxlib feedstock (conda-forge/jaxlib-feedstock#122), and I think it is related. I suspect we cannot build with 3.20 and have to pin lower than that. Or, as recommended by the jax people, bundle it altogether, since different libraries (e.g. jax, tensorflow, etc.) re-export the shared symbols, which then clash and segfault. That has serious implications for other shared libraries (e.g. zlib), as you can see from the history of my attempts to figure out the jaxlib problems.
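
To make the symbol-clash concern concrete: two shared libraries that both define the same exported symbol can be spotted by intersecting their dynamic symbol tables. A minimal Python sketch, assuming each library's `nm -D --defined-only` output has already been captured as text; the excerpts, addresses, and the `defined_symbols` helper are hypothetical, and the mangled name is only an illustrative stand-in for a protobuf symbol:

```python
def defined_symbols(nm_output):
    """Collect symbol names from `nm -D --defined-only` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Expected columns: address, type letter, symbol name
        if len(parts) == 3:
            symbols.add(parts[2])
    return symbols

# Hypothetical excerpts; real output would come from running nm on each library.
lib_a = """\
0000000000001000 T _ZN6google8protobuf7Message5ClearEv
0000000000002000 T lib_a_only_symbol
"""
lib_b = """\
0000000000003000 T _ZN6google8protobuf7Message5ClearEv
0000000000004000 T lib_b_only_symbol
"""

clashes = defined_symbols(lib_a) & defined_symbols(lib_b)
print(sorted(clashes))
```

A non-empty intersection does not prove a crash, but it flags symbols that two libraries both provide and that the dynamic loader will resolve to only one of them.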

We also have problems with cuda 11.1 for some bizarre reason. I will investigate later whether it actually carries over to 11.2 and 11.0 (I disabled 10.2 completely). If the tensorflow team moved like the JAX team did, then we should likely only build 11.2+ going forward. (Most jax builds <11.2 failed.) I will keep looking into the commit history to see what clues, if any, there are about this.

Yes, we can keep them open :) I am just allergic to keeping things hanging; I like the sense of "completeness" when things are closed, but let's keep them open since we can reuse them for rc1 and rc2 (as applicable)

@h-vetinari
Member

Ugh, symbol re-exporting is just blech. If it's not your lib, don't provide the symbols (vendoring something and using it internally is one thing, but seizing other projects' symbol namespaces is quite another).

@ngam
Contributor Author

ngam commented Aug 14, 2022

FYI

This actually fails with an eerily similar error to conda-forge/jaxlib-feedstock#122

something about abseil, grpc, and protobuf

Will truncate and post the error message

@ngam
Contributor Author

ngam commented Aug 14, 2022

Didn't allow me to post the error, so: https://gist.github.com/ngam/bcaf0a446e260f1bd43788dc2e411d51

@ngam ngam mentioned this pull request Aug 14, 2022
5 tasks
@ngam
Contributor Author

ngam commented Aug 14, 2022

@h-vetinari @hmaarrfk let me know if the errors look familiar to you at all. It seems to me either:

  • something went wrong with our custom_toolchain
  • we need to pin harder and better for abseil, grpc, and protobuf (I say this because the error messages have always seemed to be about a combination of these three)
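
The second hypothesis can be checked roughly by tallying which of the three libraries show up in undefined-reference lines, and which abseil LTS inline namespace those symbols carry. A minimal Python sketch, assuming GNU-ld-style messages with demangled names; the sample lines are hypothetical and not taken from the linked gist, and `summarize` is an illustrative helper:

```python
import re
from collections import Counter

# Illustrative patterns for the three libraries named above.
PATTERNS = {
    "abseil": re.compile(r"\babsl::"),
    "grpc": re.compile(r"\bgrpc::"),
    "protobuf": re.compile(r"\bgoogle::protobuf::"),
}
# Abseil LTS releases wrap their symbols in an inline namespace such as lts_20211102.
ABSL_LTS = re.compile(r"\babsl::(lts_\d+)::")

def summarize(linker_log):
    """Count undefined-reference lines per library and collect abseil LTS tags."""
    counts = Counter()
    lts_tags = set()
    for line in linker_log.splitlines():
        if "undefined reference" not in line:
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        lts_tags.update(ABSL_LTS.findall(line))
    return counts, lts_tags

# Hypothetical linker output.
sample_log = """\
undefined reference to `absl::lts_20211102::Mutex::~Mutex()'
undefined reference to `google::protobuf::Message::~Message()'
undefined reference to `grpc::Status::OK'
undefined reference to `absl::lts_20211102::Status::~Status()'
"""

counts, tags = summarize(sample_log)
print(dict(counts), sorted(tags))
```

If the LTS tag in the missing symbols differs from the abseil version installed in the host environment, that points at a pinning mismatch rather than a toolchain problem.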

@h-vetinari
Member

The linker not finding the destructor reminds me of problems I had during conda-forge/sentencepiece-feedstock#26, and the stuff that shows up as missing virtual functions on Windows.

This answer might be relevant. Beyond that, I'd use the newest abseil builds, and also try the static ones matching the C++ version used to compile tf

@ngam ngam changed the title from "2.10.0.rc0" to "2.10.0.rc1" Aug 15, 2022
@ngam
Copy link
Contributor Author

ngam commented Aug 17, 2022

> Beyond that, I'd use the newest abseil builds, and also try the static ones matching the C++ version used to compile tf

newest abseil builds did the trick, thanks!!!

We need to pin both jaxlib and tensorflow at abseil 2022 going forward.

@ngam ngam mentioned this pull request Aug 17, 2022
5 tasks
@ngam
Contributor Author

ngam commented Aug 17, 2022

closing in favor of #264
