[XPU][Phi Kernel] xpu::nonzero support simulator XPUSIM_SKIP_RUN mode #60388

Merged
10 changes: 9 additions & 1 deletion paddle/phi/kernels/xpu/masked_select_kernel.cc
@@ -14,6 +14,8 @@

#include "paddle/phi/kernels/masked_select_kernel.h"

#include "glog/logging.h"

#include "paddle/phi/backends/xpu/enforce_xpu.h"
#include "paddle/phi/core/kernel_registry.h"

@@ -54,7 +56,13 @@ void MaskedSelectKernel(const Context& dev_ctx,
mask.place(),
static_cast<void*>(out_size),
sizeof(int32_t));

if (std::getenv("XPUSIM_SKIP_RUN") &&
std::strcmp(std::getenv("XPUSIM_SKIP_RUN"), "1") == 0) {
VLOG(3) << "WARNING: In the simulator mode, the variable out_size_cpu "
"stores an uninitialized value. To avoid allocating a memory of "
"random size, we assign numel to out_size_cpu";
out_size_cpu = mask.numel();
Contributor: The effect of hard-coding the maximum on downstream operators still needs to be evaluated when a real model is run.

Contributor Author: This amounts to simulating the performance of the slowest case.

Contributor: Performance is secondary; this may change the operator's problem size.

Contributor Author: In skip_run mode the real size cannot be obtained, and the size of the nonzero_count-related operators may also differ from step to step.

Contributor: Good point. When time permits, we can look at where this operator sits in the model and how far its return value propagates.

Contributor Author: OK.

}
DDim out_dim{out_size_cpu};
out->Resize(out_dim);
auto out_data = reinterpret_cast<XPUType*>(dev_ctx.template Alloc<T>(out));
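
The same `XPUSIM_SKIP_RUN` check appears in every kernel touched by this PR. As a minimal sketch only, the check and the fallback to `numel` could be factored into small helpers like the ones below. `IsXpuSimSkipRun` and `ResolveOutputCount` are hypothetical names invented for this illustration; they are not part of Paddle and are not what the PR merged.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of Paddle): true only when the environment
// variable XPUSIM_SKIP_RUN is set to exactly "1".
inline bool IsXpuSimSkipRun() {
  const char* value = std::getenv("XPUSIM_SKIP_RUN");  // read the variable once
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical helper: chooses the element count used to size an output.
// In skip-run mode the count copied back from the device is uninitialized,
// so the upper bound `numel` is used instead.
inline int64_t ResolveOutputCount(int64_t copied_count, int64_t numel) {
  return IsXpuSimSkipRun() ? numel : copied_count;
}
```

With helpers of this shape, a kernel would call `ResolveOutputCount(out_size_cpu, mask.numel())` before building the output dimensions. The trade-off raised in the review still applies: in skip-run mode the output is sized for the worst case, not for the real number of selected elements.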
5 changes: 2 additions & 3 deletions paddle/phi/kernels/xpu/nonzero_kernel.cc
@@ -46,9 +46,8 @@ void NonZeroKernel(const Context& dev_ctx,
std::strcmp(std::getenv("XPUSIM_SKIP_RUN"), "1") == 0) {
VLOG(3) << "WARNING: In the simulator mode, the variable true_num_cpu "
"stores an uninitialized value. To avoid allocating a memory of "
- "random size, we limit the value of true_num_cpu to the range 0 "
- "<= true_num_cpu < numel";
- true_num_cpu = std::min(std::max(true_num_cpu, 0), static_cast<int>(numel));

Contributor Author: In some cases the value would be set to 0, which may cause errors in downstream model code, so it is set to the maximum value instead. The unit test is in this PR: #60224

+ "random size, we assign numel to true_num_cpu";
+ true_num_cpu = numel;
}

out->Resize(common::make_ddim({static_cast<int64_t>(true_num_cpu), rank}));
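
This hunk replaces the clamp on `true_num_cpu` with a plain assignment of `numel`. The sketch below contrasts the two strategies on an uninitialized value. `ClampCount` and `AssumeAllNonZero` are hypothetical names used only for illustration and do not exist in Paddle.

```cpp
#include <algorithm>
#include <cstdint>

// Strategy before this PR: clamp the uninitialized value into [0, numel].
// A negative garbage value therefore becomes 0.
inline int ClampCount(int garbage, int64_t numel) {
  return std::min(std::max(garbage, 0), static_cast<int>(numel));
}

// Strategy after this PR: ignore the uninitialized value and use numel.
inline int AssumeAllNonZero(int /*garbage*/, int64_t numel) {
  return static_cast<int>(numel);
}
```

For a negative garbage value, `ClampCount` yields 0, which is the case the author describes as breaking downstream model code, whereas `AssumeAllNonZero` always yields a non-empty, worst-case size. Note also that the removed log message stated the range as `0 <= true_num_cpu < numel`, while the clamp itself allowed the value to equal `numel`.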
@@ -16,6 +16,8 @@

#include "paddle/phi/kernels/sigmoid_cross_entropy_with_logits_grad_kernel.h"

#include "glog/logging.h"

#include "paddle/phi/backends/xpu/enforce_xpu.h"
#include "paddle/phi/backends/xpu/xpu_context.h"
#include "paddle/phi/core/kernel_registry.h"
@@ -79,6 +81,14 @@ void SigmoidCrossEntropyWithLogitsGradKernel(
dev_ctx.GetPlace(),
static_cast<void*>(non_zero),
sizeof(int));
if (std::getenv("XPUSIM_SKIP_RUN") &&
std::strcmp(std::getenv("XPUSIM_SKIP_RUN"), "1") == 0) {
VLOG(3)
<< "WARNING: In the simulator mode, the variable non_zero_cpu "
"stores an uninitialized value. To avoid allocating a memory of "
"random size, we assign numel to non_zero_cpu";
non_zero_cpu = x.numel();
Contributor Author: A unit test for sigmoid_cross_entropy_with_logits is hard to write; I tried several approaches and none of them could invoke the kernel easily. The same pattern has already been tested in the other cases without problems.

}
r = xpu::scale(dev_ctx.x_context(),
reinterpret_cast<const XPUType*>(in_grad->data<T>()),
reinterpret_cast<XPUType*>(in_grad->data<T>()),
@@ -16,6 +16,8 @@

#include "paddle/phi/kernels/sigmoid_cross_entropy_with_logits_kernel.h"

#include "glog/logging.h"

#include "paddle/phi/backends/xpu/enforce_xpu.h"
#include "paddle/phi/backends/xpu/xpu_context.h"
#include "paddle/phi/core/kernel_registry.h"
@@ -75,7 +77,14 @@ void SigmoidCrossEntropyWithLogitsKernel(
dev_ctx.GetPlace(),
static_cast<void*>(non_zero),
sizeof(int));

if (std::getenv("XPUSIM_SKIP_RUN") &&
std::strcmp(std::getenv("XPUSIM_SKIP_RUN"), "1") == 0) {
VLOG(3)
<< "WARNING: In the simulator mode, the variable non_zero_cpu "
"stores an uninitialized value. To avoid allocating a memory of "
"random size, we assign numel to non_zero_cpu";
non_zero_cpu = x.numel();
}
r = xpu::scale(dev_ctx.x_context(),
reinterpret_cast<const XPUType*>(out->data<T>()),
reinterpret_cast<XPUType*>(out->data<T>()),
15 changes: 15 additions & 0 deletions test/xpu/test_masked_select_op_xpu.py
@@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import unittest

import numpy as np
@@ -108,6 +109,20 @@ def test_static_mode(self):
)
self.assertEqual(np.allclose(res, np_out), True)

    def test_simulator_skip_run_mode(self):
        os.environ['XPUSIM_SKIP_RUN'] = '1'
        paddle.disable_static(paddle.XPUPlace(0))
        shape = (88, 6, 8)
        np_x = np.random.random(shape).astype('float32')
        np_mask = np.array(np.random.randint(2, size=shape, dtype=bool))
        x = paddle.to_tensor(np_x)
        mask = paddle.to_tensor(np_mask)
        out = paddle.masked_select(x, mask)
        # only check the numel of output
        np.testing.assert_equal(out.numpy().size, np_x.size)
        paddle.enable_static()
        del os.environ['XPUSIM_SKIP_RUN']


class TestMaskedSelectError(unittest.TestCase):
def test_error(self):