
Version mismatch when I build from source code #1083

Open
ehuaa opened this issue Mar 19, 2023 · 7 comments

ehuaa commented Mar 19, 2023

After I git cloned this project, I tried to compile it from source.
When I ran bash ./scripts/build_pytorch_blade.sh, I got a version-mismatch error (screenshot of the error omitted), while my PyTorch version is:

torch.__version__
'2.1.0.dev20230316+cpu'
and torchvision.__version__ is
'0.16.0.dev20230316+cpu'

which are the latest nightly versions straight from pip.
Can you tell me a way to install BladeDISC successfully without downgrading my PyTorch or torchvision, by changing build_pytorch_blade.sh or requirements.txt a little bit? Thank you very much!
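
For reference, roughly the steps I followed (a sketch; the working directory is my assumption based on the repo layout, since the script lives under pytorch_blade/scripts):

# clone the repo and run the build script from the pytorch_blade directory
git clone https://github.com/alibaba/BladeDISC.git
cd BladeDISC/pytorch_blade
bash ./scripts/build_pytorch_blade.sh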

ehuaa (Author) commented Mar 19, 2023

@wyzero

tanyokwok (Collaborator) commented

@ehuaa The issue arises because the script installs torch==1.7.1+cu110 by default; this is configured via TORCH_BLADE_CI_BUILD_TORCH_VERSION, see https://github.com/alibaba/BladeDISC/blob/main/pytorch_blade/scripts/build_pytorch_blade.sh#L32.

BladeDISC already supports torch 2.0; you can skip the torch pip installation in the script build_pytorch_blade.sh.
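
A rough sketch of that idea (TORCH_BLADE_SKIP_TORCH_INSTALL below is a made-up guard for illustration, not an existing option of the script; the actual pip install step is at the line linked above):

# Keep the locally installed nightly torch instead of replacing it.
# The simplest route is to comment out the pip install line in
# scripts/build_pytorch_blade.sh; the guard below just illustrates the idea.
if [ -z "${TORCH_BLADE_SKIP_TORCH_INSTALL:-}" ]; then
  python -m pip install "torch==${TORCH_BLADE_CI_BUILD_TORCH_VERSION}"   # approximate original behaviour
else
  echo "Using already-installed torch: $(python -c 'import torch; print(torch.__version__)')"
fi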

ehuaa (Author) commented Mar 19, 2023


Okay, thanks, I will try it later. Also, the Docker image was not used during my build from source; I wonder if I missed some steps...

ehuaa (Author) commented Mar 22, 2023

ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: no such package '@llvm-raw//utils/bazel': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz, https://github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz] to /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/external/llvm-raw/temp14641944661529926033/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz: Premature EOF
Traceback (most recent call last):
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 151, in
setup(
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/init.py", line 108, in setup
return distutils.core.setup(**attrs)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
super().run_command(command)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 107, in run
self.cpp_run()
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 91, in cpp_run
build.test()
File "/home/banach/BladeDISC/pytorch_blade/bazel_build.py", line 283, in test
subprocess.check_call(test_cmd, shell=True, env=env, executable="/bin/bash")
File "/home/banach/anaconda3/envs/p310/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -e; set -o pipefail; source .bazel_pyenv/bin/activate; bazel test --action_env PYTHON_BIN_PATH=/home/banach/anaconda3/envs/p310/bin/python3 --action_env BAZEL_LINKLIBS=-lstdc++ --action_env CC=/usr/bin/gcc --action_env CXX=/usr/bin/g++ --action_env DISC_FOREIGN_MAKE_JOBS=32 --copt=-DPYTORCH_VERSION_STRING="2.1.0.dev20230319+cu117" --copt=-DPYTORCH_MAJOR_VERSION=2 --copt=-DPYTORCH_MINOR_VERSION=1 --copt=-DTORCH_BLADE_CUDA_VERSION=11.7 --action_env TORCH_BLADE_TORCH_INSTALL_PATH=/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/torch --copt=-DPYBIND11_COMPILER_TYPE="_gcc" --copt=-DPYBIND11_STDLIB="_libstdcpp" --copt=-DPYBIND11_BUILD_ABI="_cxxabi1011" --config=torch_debug --config=torch_tensorrt --action_env TENSORRT_INSTALL_PATH=/usr/local/TensorRT/ --action_env NVCC=/usr/local/cuda/bin/nvcc --config=torch_enable_quantization --config=torch_cxx11abi_0 --config=torch_cuda //tests/mhlo/... //pytorch_blade:torch_blade_test_suite //tests/torch-disc-pdll/tests/... //tests/torchscript/...' returned non-zero exit status 2.
I ran into this error after commenting out the torch pip installation, and the URL above cannot be reached. Is there anything wrong with my installation steps? I didn't run the NVIDIA Docker image. @tanyokwok

tanyokwok (Collaborator) commented

@ehuaa The BladeDISC workspace is built with Bazel, so we use Bazel to resolve many of the project's third-party dependencies.

The error looks like it was caused by a download failure. Please check your network and retry.
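
If the download keeps failing, one possible workaround (a sketch, not an officially documented step) is to pre-fetch the archive Bazel could not download and point Bazel at a local directory of archives via --distdir, which matches files by their basename:

# download the tarball that Bazel failed to fetch (URL taken from the error log above)
mkdir -p /tmp/bazel-distdir
wget -P /tmp/bazel-distdir https://github.com/llvm/llvm-project/archive/8c712296fb75ff73db08f92444b35c438c01a405.tar.gz
# re-run with --distdir so Bazel looks there before hitting the network; since the
# build is driven by setup.py, you may need to add this flag where bazel is invoked
# (e.g. in pytorch_blade/bazel_build.py)
bazel test --distdir=/tmp/bazel-distdir //tests/mhlo/... //pytorch_blade:torch_blade_test_suite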

ehuaa (Author) commented Mar 22, 2023

//tests/torchscript:since_1_14.graph.test FAILED in 0.8s
/home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/testlogs/tests/torchscript/since_1_14.graph.test/test.log

Executed 37 out of 37 tests: 36 tests pass and 1 fails locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Build completed, 1 test FAILED, 12791 total actions
Traceback (most recent call last):
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 151, in
setup(
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/init.py", line 108, in setup
return distutils.core.setup(**attrs)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
super().run_command(command)
File "/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 107, in run
self.cpp_run()
File "/home/banach/BladeDISC/pytorch_blade/setup.py", line 91, in cpp_run
build.test()
File "/home/banach/BladeDISC/pytorch_blade/bazel_build.py", line 283, in test
subprocess.check_call(test_cmd, shell=True, env=env, executable="/bin/bash")
File "/home/banach/anaconda3/envs/p310/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -e; set -o pipefail; source .bazel_pyenv/bin/activate; bazel test --action_env PYTHON_BIN_PATH=/home/banach/anaconda3/envs/p310/bin/python3 --action_env BAZEL_LINKLIBS=-lstdc++ --action_env CC=/usr/bin/gcc --action_env CXX=/usr/bin/g++ --action_env DISC_FOREIGN_MAKE_JOBS=32 --copt=-DPYTORCH_VERSION_STRING="2.1.0.dev20230319+cu117" --copt=-DPYTORCH_MAJOR_VERSION=2 --copt=-DPYTORCH_MINOR_VERSION=1 --copt=-DTORCH_BLADE_CUDA_VERSION=11.7 --action_env TORCH_BLADE_TORCH_INSTALL_PATH=/home/banach/anaconda3/envs/p310/lib/python3.10/site-packages/torch --copt=-DPYBIND11_COMPILER_TYPE="_gcc" --copt=-DPYBIND11_STDLIB="_libstdcpp" --copt=-DPYBIND11_BUILD_ABI="_cxxabi1011" --config=torch_debug --config=torch_tensorrt --action_env TENSORRT_INSTALL_PATH=/usr/local/TensorRT/ --action_env NVCC=/usr/local/cuda/bin/nvcc --config=torch_enable_quantization --config=torch_cxx11abi_0 --config=torch_cuda //tests/mhlo/... //pytorch_blade:torch_blade_test_suite //tests/torch-disc-pdll/tests/... //tests/torchscript/...' returned non-zero exit status 3.

After I fixed the network problem, the build failed with the one test above. Is this failure the cause of the traceback above?
The info in since_1_14.graph.test/test.log is:

TEST 'MLIR torchscript :: since_1_14.graph' FAILED
Script:

: 'RUN: at line 1'; /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/shape_analysis_tool --since 1.14.0 -f /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph | /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/llvm-project/llvm/FileCheck /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph

Exit Code: 1

Command Output (stderr):

terminate called after throwing an instance of 'torch::jit::ErrorReport'
what():
Couldn't find an operator for aten::var.correction(Tensor self, int[1]? dim, *, int? correction, bool keepdim=False) -> Tensor. Do you have to update a set of hardcoded JIT ops? failed shape propagation in this context. The above operation:
%4 : Tensor = aten::amax(%p1, %3, %1)
The inputs are:
Float(*, *, *, *) from %p1 : Float(*, *, *, *) = prim::Param()
int[] from %3 : int[] = prim::ListConstruct(%2)
bool from %1 : bool = prim::Constant[value=1]()
:
%cst_1: int = prim::Constant[value=-1]()
%dims : int[] = prim::ListConstruct(%cst_1)
%1 : Tensor = aten::amax(%p1, %dims, %true)
~~~~ <--- HERE
return (%1)

/home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph:15:17: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: graph
^
<stdin>:1:6: note: scanning from here
graph(%self : Float(*, *)):
^
<stdin>:1:16: note: possible intended match here
graph(%self : Float(*, *)):
^

Input file: <stdin>
Check file: /home/banach/.cache/bazel/_bazel_banach/73d137a07d4a9c12dceaec8145974e25/execroot/org_torch_blade/bazel-out/k8-dbg/bin/tests/torchscript/since_1_14.graph.test.runfiles/org_torch_blade/tests/torchscript/since_1_14.graph

-dump-input=help explains the following input dump.

Input was:
<<<<<<
1: graph(%self : Float(*, *)):
label:15'0 X~~~~~~~~~~~~~~~~~~~~~~ error: no match found
label:15'1 ? possible intended match
2: %1 : int = prim::Constant[value=0]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3: %2 : int = prim::Constant[value=1]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4: %3 : int = prim::Constant[value=32]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5: %4 : int = prim::Constant[value=512]()
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6: %5 : int[] = prim::ListConstruct(%3, %4, %2)
label:15'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.
.
.
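
For reference, one way to re-run just this failing target while debugging (the target name comes from the log above; invoking Bazel directly instead of via setup.py is an assumption and may need the same --config/--action_env flags as the full command shown in the traceback):

# activate the same virtualenv the build uses, then run only the failing FileCheck test
source .bazel_pyenv/bin/activate
bazel test //tests/torchscript:since_1_14.graph.test --test_output=all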

Looking forward to your reply, thanks! @tanyokwok

ehuaa (Author) commented Mar 22, 2023

I fixed this problem by pulling the latest PR you committed last week, thanks!
