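Do not load other Intel libraries (for example, a system-wide Intel compiler or MKL installation) when your environment is managed by conda, as conda may provide its own Intel (MKL) packages, and mixing the two can lead to undefined-symbol errors such as these.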
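A minimal sketch of this advice, assuming the external Intel libraries reach the process through `LD_LIBRARY_PATH` (for example, after sourcing an Intel environment script or loading an environment module) and that their directories can be recognized by the hypothetical substrings in `INTEL_MARKERS`. It keeps anything under `$CONDA_PREFIX` and drops other Intel/MKL directories:

```python
import os

# Hypothetical markers used to recognize Intel compiler/MKL directories;
# adjust them to match how Intel is installed on your cluster.
INTEL_MARKERS = ("intel", "mkl", "oneapi")


def clean_ld_library_path(ld_path: str, conda_prefix: str) -> str:
    """Drop Intel/MKL directories that live outside the conda environment.

    Entries under conda_prefix are kept, since conda may ship its own MKL
    packages that the environment's PyTorch expects.
    """
    prefix = os.path.normpath(conda_prefix)
    kept = []
    for entry in ld_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments, e.g. from "a::b" or a trailing ":"
        norm = os.path.normpath(entry)
        inside_conda = norm == prefix or norm.startswith(prefix + os.sep)
        looks_intel = any(marker in norm.lower() for marker in INTEL_MARKERS)
        if looks_intel and not inside_conda:
            continue  # external Intel/MKL directory: leave it out
        kept.append(entry)
    return os.pathsep.join(kept)


if __name__ == "__main__":
    conda_prefix = os.environ.get("CONDA_PREFIX", "")
    if not conda_prefix:
        print("# CONDA_PREFIX is not set; activate the conda environment first")
    else:
        cleaned = clean_ld_library_path(os.environ.get("LD_LIBRARY_PATH", ""), conda_prefix)
        # Emit a shell command to review, then run in the shell before `dp`.
        print(f'export LD_LIBRARY_PATH="{cleaned}"')
```

The printed `export` line has to be applied in the shell before starting `dp`: the dynamic loader reads `LD_LIBRARY_PATH` when the process starts, so changing it from inside an already running Python process does not affect later library loading. Starting a fresh shell without the Intel environment loaded achieves the same result.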
Dear Developers,
I am using DeePMD-kit v3.0.0b3 for DPA-2 pretraining and fine-tuning. I installed DeePMD-kit v3.0.0b3 from source successfully, but an error occurs when I run it. Can anybody help me fix it? Thanks a lot!
command: dp --pt train input_torch.json --finetune ./OpenLAM_2.2.0_27heads_beta3.pt --model-branch H2O_H2O-PD
output:
Traceback (most recent call last):
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/cxx_op.py", line 39, in load_library
    torch.ops.load_library(module_file)
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/torch/_ops.py", line 1295, in load_library
    ctypes.CDLL(path)
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /share/apps/intel2020u4/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so: undefined symbol: mkl_sparse_z_bsr_ng_avx521_sp_mv_i4
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/bin/dp", line 8, in <module>
    sys.exit(main())
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/main.py", line 916, in main
    deepmd_main = BACKENDS[args.backend]().entry_point_hook
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/backend/pytorch.py", line 66, in entry_point_hook
    from deepmd.pt.entrypoints.main import main as deepmd_main
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/__init__.py", line 4, in <module>
    from deepmd.pt.cxx_op import (
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/cxx_op.py", line 95, in <module>
    ENABLE_CUSTOMIZED_OP = load_library("deepmd_op_pt")
  File "/share/home/xxx/apps/miniconda3/envs/dp-v300b3/lib/python3.9/site-packages/deepmd/pt/cxx_op.py", line 90, in load_library
    raise RuntimeError(error_message) from e
RuntimeError: This deepmd-kit package is inconsitent with PyTorch Runtime, thus an error is raised when loading deepmd_op_pt. You need to rebuild deepmd-kit against this PyTorch runtime.
Also, the OSError message varies depending on which Intel version is loaded:
1) OSError: /share/apps/intel_ips_xe_2018u3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so: undefined symbol: mkl_blas_zgemm_blk_info_hi_thr_bdz
2) OSError: /share/home/xxx/apps/intel/oneapi_20240201/mkl/2024.2/lib/libmkl_intel_thread.so: undefined symbol: mkl_sparse_d_csr_seq_sym_u_full_fw_sor_i4
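All of the OSError messages quoted here share the shape "<library>.so: undefined symbol: <name>", and in each case the failing `libmkl_intel_thread.so` sits outside the conda environment. A small, hypothetical helper (not part of DeePMD-kit) can pull the library and symbol out of such a message and report whether the library comes from the active conda prefix, which is a quick way to check whether an external Intel installation is being picked up:

```python
import os
import re

# Matches "<absolute path to a .so>: undefined symbol: <symbol>",
# the shape of the OSError messages in this thread.
UNDEFINED_SYMBOL = re.compile(
    r"(?P<lib>/\S+?\.so(?:\.\d+)*): undefined symbol: (?P<sym>\S+)"
)


def diagnose(message: str, conda_prefix: str):
    """Return (library, symbol, inside_conda), or None if the message does not match."""
    match = UNDEFINED_SYMBOL.search(message)
    if match is None:
        return None
    lib = os.path.normpath(match.group("lib"))
    prefix = os.path.normpath(conda_prefix)
    inside_conda = lib.startswith(prefix + os.sep)
    return lib, match.group("sym"), inside_conda


if __name__ == "__main__":
    prefix = "/share/home/xxx/apps/miniconda3/envs/dp-v300b3"
    messages = [
        "OSError: /share/apps/intel2020u4/compilers_and_libraries_2020.4.304/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so: undefined symbol: mkl_sparse_z_bsr_ng_avx521_sp_mv_i4",
        "OSError: /share/apps/intel_ips_xe_2018u3/compilers_and_libraries_2018.3.222/linux/mkl/lib/intel64_lin/libmkl_intel_thread.so: undefined symbol: mkl_blas_zgemm_blk_info_hi_thr_bdz",
        "OSError: /share/home/xxx/apps/intel/oneapi_20240201/mkl/2024.2/lib/libmkl_intel_thread.so: undefined symbol: mkl_sparse_d_csr_seq_sym_u_full_fw_sor_i4",
    ]
    for msg in messages:
        lib, sym, inside = diagnose(msg, prefix)
        where = "inside" if inside else "outside"
        # Report which library failed and where it was loaded from.
        print(f"{lib} ({where} the conda env) lacks {sym}")
```

The check is purely textual: it only looks at the path printed in the error, so it cannot tell which other libraries were already loaded into the process.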