
【PIR API adaptor No.140-142】 Migrate lstsq/lu/lu_unpack into pir #58815

Merged (7 commits) Nov 20, 2023

Conversation

@DrRyanHuang (Member) commented Nov 8, 2023

PR types

Others

PR changes

Others

Description

Full rollout of the PIR API migration.

lstsq / lu: all tests pass.
lu_unpack: the following problem remains, check_grad does not pass:

TestLU_UnpackOp
TestLU_UnpackOp2
TestLU_UnpackOp3
TestLU_UnpackOp4

check_grad is disabled for all four tests above, because the gradients do not match in either precision or shape:
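For context on the shape mismatch, here is a minimal numpy sketch of the conventional lu_unpack output shapes for a wide m x n input (k = min(m, n)); the [10, 12] vs [10, 10] shapes in the traceback below follow this convention. This is an illustration only, not Paddle's kernel code.

```python
import numpy as np

# Conventional LU unpack shapes for a wide input X of shape (m, n), m < n:
#   P: (m, m) permutation, L: (m, k) unit lower-triangular, U: (k, n) upper.
m, n = 10, 12
k = min(m, n)

P = np.eye(m)                   # pivot indices expanded to a permutation
L = np.tril(np.ones((m, k)))    # lower-triangular part (unit diagonal)
U = np.triu(np.ones((k, n)))    # upper-triangular part

print(P.shape, L.shape, U.shape)  # (10, 10) (10, 10) (10, 12)
```

The gradient kernel has to reconcile a d(U) of shape (10, 12) with a d(L) of shape (10, 10), which is where the broadcast complaint below originates.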

2023-11-15 16:17:10 ======================================================================
2023-11-15 16:17:10 ERROR: test_check_grad (test_lu_unpack_op.TestLU_UnpackOp4)
2023-11-15 16:17:10 ----------------------------------------------------------------------
2023-11-15 16:17:10 Traceback (most recent call last):
2023-11-15 16:17:10   File "/workspace/Paddle/build/test/legacy_test/test_lu_unpack_op.py", line 176, in test_check_grad
2023-11-15 16:17:10     self.check_grad(['X'], ['L', 'U'], check_pir=True)
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/op_test.py", line 2878, in check_grad
2023-11-15 16:17:10     self.check_grad_with_place(
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/op_test.py", line 3135, in check_grad_with_place
2023-11-15 16:17:10     pir_grad = self._get_ir_gradient(
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/op_test.py", line 3675, in _get_ir_gradient
2023-11-15 16:17:10     outs = executor.run(
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/paddle/base/executor.py", line 1707, in run
2023-11-15 16:17:10     res = self._run_pir_impl(
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/paddle/base/executor.py", line 2022, in _run_pir_impl
2023-11-15 16:17:10     ret = new_exe.run(list(feed.keys()), return_numpy)
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/paddle/base/executor.py", line 821, in run
2023-11-15 16:17:10     tensors = self._new_exe.run(feed_names)._move_to_list()
2023-11-15 16:17:10 ValueError: 
2023-11-15 16:17:10 
2023-11-15 16:17:10   Compile Traceback (most recent call last):
2023-11-15 16:17:10 
2023-11-15 16:17:10 --------------------------------------
2023-11-15 16:17:10 C++ Traceback (most recent call last):
2023-11-15 16:17:10 --------------------------------------
2023-11-15 16:17:10 0   paddle::framework::ThreadPoolTempl<paddle::framework::StlThreadEnvironment>::WorkerLoop(int)
2023-11-15 16:17:10 1   paddle::framework::PirInterpreter::RunInstructionBaseAsync(unsigned long)
2023-11-15 16:17:10 2   paddle::framework::PirInterpreter::RunInstructionBase(paddle::framework::InstructionBase*)
2023-11-15 16:17:10 3   paddle::framework::PhiKernelInstruction::Run()
2023-11-15 16:17:10 4   phi::KernelImpl<void (*)(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, bool, bool, phi::DenseTensor*), &(void phi::LUUnpackGradKernel<double, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, bool, bool, phi::DenseTensor*))>::Compute(phi::KernelContext*)
2023-11-15 16:17:10 5   void phi::LUUnpackGradKernel<double, phi::CPUContext>(phi::CPUContext const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, phi::DenseTensor const&, bool, bool, phi::DenseTensor*)
2023-11-15 16:17:10 6   phi::funcs::GetMidDims(phi::DDim const&, phi::DDim const&, int, int*, int*, int*, int*)
2023-11-15 16:17:10 7   phi::enforce::EnforceNotMet::EnforceNotMet(phi::ErrorSummary const&, char const*, int)
2023-11-15 16:17:10 8   phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
2023-11-15 16:17:10 
2023-11-15 16:17:10 ----------------------
2023-11-15 16:17:10 Error Message Summary:
2023-11-15 16:17:10 ----------------------
2023-11-15 16:17:10 InvalidArgumentError: Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [10, 12] and the shape of Y = [10, 10]. Received [12] in X is not equal to [10] in Y.
2023-11-15 16:17:10   [Hint: Expected y_dims[i] == 1 || x_dims[i + axis] == 1 == true, but received y_dims[i] == 1 || x_dims[i + axis] == 1:0 != true:1.] (at ../paddle/phi/kernels/funcs/elementwise_utils.h:55)
2023-11-15 16:17:10   [operator < pd_kernel.phi_kernel > error]
2023-11-15 16:17:10 
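The broadcast failure in the error summary above can be reproduced with plain numpy, whose broadcasting rule rejects the same shape pair (a standalone sketch, independent of Paddle):

```python
import numpy as np

# Same shapes as in the error summary: X = [10, 12], Y = [10, 10].
x = np.ones((10, 12))
y = np.ones((10, 10))
try:
    _ = x + y  # trailing dims 12 vs 10: neither is 1, so no broadcast
except ValueError as exc:
    print("broadcast failed:", exc)
```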

2023-11-15 16:17:10 ======================================================================
2023-11-15 16:17:10 FAIL: test_check_grad (test_lu_unpack_op.TestLU_UnpackOp2)
2023-11-15 16:17:10 ----------------------------------------------------------------------
2023-11-15 16:17:10 Traceback (most recent call last):
2023-11-15 16:17:10   File "/workspace/Paddle/build/test/legacy_test/test_lu_unpack_op.py", line 176, in test_check_grad
2023-11-15 16:17:10     self.check_grad(['X'], ['L', 'U'], check_pir=True)
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/op_test.py", line 2878, in check_grad
2023-11-15 16:17:10     self.check_grad_with_place(
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/op_test.py", line 3159, in check_grad_with_place
2023-11-15 16:17:10     self._assert_is_close(
2023-11-15 16:17:10   File "/workspace/Paddle/build/python/op_test.py", line 2836, in _assert_is_close
2023-11-15 16:17:10     self.assertLessEqual(max_diff, max_relative_error, err_msg())
2023-11-15 16:17:10 AssertionError: 1.0 not less than or equal to 1e-07 : Operator lu_unpack error, Gradient Check On Place(cpu) variable X (shape: (2, 10, 10), dtype: float64) max gradient diff 1.000000e+00 over limit 1.000000e-07, the first error element is 10, expected 2.500000e-03, but got 0.000000e+00.
2023-11-15 16:17:10 
2023-11-15 16:17:10 ----------------------------------------------------------------------
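For context, a check_grad-style utility compares the framework's analytic gradient against a central finite-difference estimate; when the analytic value comes back as 0.0 (as in the assertion above, expected 2.5e-03 but got 0.0), the relative diff saturates at 1.0. A minimal numpy sketch of that numeric reference (an illustration, not op_test.py's actual code):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    """Central finite differences: the reference a check_grad-style
    utility compares analytic gradients against."""
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        i = it.multi_index
        x[i] += eps
        fp = f(x)
        x[i] -= 2 * eps
        fm = f(x)
        x[i] += eps  # restore the perturbed entry
        g[i] = (fp - fm) / (2 * eps)
        it.iternext()
    return g

x = np.array([1.0, 2.0, 3.0])
analytic = 2 * x  # gradient of sum(x**2)
numeric = numeric_grad(lambda v: float((v ** 2).sum()), x.copy())
max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)  # tiny for a correct analytic gradient
```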

(A comment from @DrRyanHuang was marked as off-topic.)

@0x45f (Contributor) commented Nov 14, 2023

The place-related code inside the lstsq API needs to be discussed internally to decide how to handle it.

@0x45f (Contributor) commented Nov 14, 2023

> The place-related code inside the lstsq API needs to be discussed internally to decide how to handle it.

Could you change place = framework._current_expected_place() to place = framework._current_expected_place_() in the get_device function of python/paddle/device/__init__.py, then rerun CI and see what happens?
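The suggested one-line change can be sketched as follows; the framework object here is a stand-in stub for illustration, since only the swap of _current_expected_place() for _current_expected_place_() is specified in the comment above:

```python
# Stand-in stub for paddle.base.framework, for illustration only.
class _FrameworkStub:
    def _current_expected_place(self):
        return "place from _current_expected_place"

    def _current_expected_place_(self):
        return "place from _current_expected_place_"

framework = _FrameworkStub()

def get_device():
    # before the suggested change:
    #   place = framework._current_expected_place()
    place = framework._current_expected_place_()  # suggested replacement
    return place

print(get_device())
```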

@DrRyanHuang (Member, Author) commented:

@0x45f Sorry to bother you. I have fixed the problem of the dtype not being adapted to PIR, but now a "Broadcast dimension mismatch" error appears. Could you help take a look?

@0x45f (Contributor) commented Nov 17, 2023

The tests whose check_grad fails can be skipped for now. Please mark them in the PR description, and I will ask the relevant colleagues to take a look.

Review thread on python/paddle/tensor/linalg.py (resolved)
0x45f merged commit 1585549 into PaddlePaddle:develop on Nov 20, 2023
28 checks passed
DrRyanHuang deleted the 140-142 branch on November 20, 2023 at 06:40