gpu: generic: sycl: lnorm Intel GPU precision issues #2071
base: main
To give a little additional context: I found that in this line, the division makes v_variance slightly different from the reference computed in benchdnn (the threshold is thr=0 for these cases). This may be caused by a compiler optimization.
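A minimal sketch of how a division alone can produce a one-ULP difference. It assumes, without confirmation from the PR, that the non-IEEE-compliant path behaves like computing the reciprocal first and then multiplying; the actual GPU code generation is not shown here. For reference, OpenCL C allows single-precision `x / y` an error of up to 2.5 ULP unless correctly rounded division is requested (for example with the `-cl-fp32-correctly-rounded-divide-sqrt` build option); whether that is the option used in this PR is not visible here. All function names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Correctly rounded IEEE-754 division: a single rounding of x / n.
float divide_ieee(float x, float n) { return x / n; }

// Hypothetical stand-in for a non-compliant division: round 1/n first, then
// multiply. That is two roundings, so the result can differ by one ULP.
float divide_via_reciprocal(float x, float n) {
    const float inv_n = 1.0f / n;
    return x * inv_n;
}

// Relative difference between two results, computed in double.
double relative_difference(float a, float b) {
    assert(a != 0.0f);
    return std::fabs(static_cast<double>(a) - static_cast<double>(b))
            / std::fabs(static_cast<double>(a));
}
```

For `x = 5.0f` and `n = 3.0f` the two results are adjacent floats (about 1.66666663 and 1.66666675), a relative difference of roughly 7e-8. That is tiny, but it is enough to fail a comparison that uses `thr=0`.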
Looks like an improvement. However, I would just set the threshold to 5e-7 for all cases, as 0 does not make sense as soon as floating-point calculations are involved.
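To put the proposed 5e-7 in perspective: one f32 ULP is at most 2^-23 ≈ 1.19e-7 relative to the value, so a relative threshold of 5e-7 tolerates up to 4 ULPs at any normal magnitude, whereas `thr=0` rejects any rounding difference at all. The sketch below uses a simplified, hypothetical pass/fail rule (relative difference only); benchdnn's real comparison logic differs in details.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical, simplified comparison: `got` passes when its relative
// difference from `expected` is at most `thr`.
bool within_threshold(float expected, float got, double thr) {
    assert(expected != 0.0f); // relative difference is undefined at zero
    const double diff = std::fabs(static_cast<double>(got) - expected);
    const double rdiff = diff / std::fabs(static_cast<double>(expected));
    return rdiff <= thr;
}

// Returns `x` moved `k` ULPs towards +infinity.
float ulps_above(float x, int k) {
    for (int i = 0; i < k; ++i)
        x = std::nextafter(x, INFINITY);
    return x;
}
```

Just above a power of two (for example at 1.0f), 4 ULPs is a relative difference of about 4.77e-7 and passes with 5e-7, while 5 ULPs (about 5.96e-7) does not.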
Do you know what is the default setting for
@kala855, benchdnn is very sensitive to numerical issues by design, so in cases like this it's important to understand where the difference comes from before considering a threshold change.
We have been investigating this carefully over the past few days. Comparing the assembly generated by icpx for the OCL and SYCL versions of the implementation, we found that:
Any suggestions or feedback are more than welcome. Thanks.
Thanks for checking. I would suggest:
afef1ae to 030a62b
Hi @mgouicem, I did what you suggested to get the SYCL lnorm benchdnn tests working. The issue was opened and a workaround was pushed in this PR. Thanks for your feedback.
LGTM, but I still believe it would be better to have a nonzero threshold for all cases.
030a62b to db382e0
We found that
LGTM. Maybe we could mention in the comment before the option that it is specifically required for Intel devices.
@@ -84,9 +84,13 @@ status_t ref_batch_normalization_fwd_t::init(impl::engine_t *engine) {
= ::sycl::get_kernel_id<batch_normalization_fwd_kernel_vec_t>();
CHECK(create_kernel(engine, kid, &kernel_));
} else {
// Enabling the IEEE div compliant implementation
setenv("SYCL_PROGRAM_COMPILE_OPTIONS",
I believe using setenv here is not thread-safe (e.g., another concurrent call to the same primitive might unset it after this one sets it).
const auto kid = ::sycl::get_kernel_id<
batch_normalization_fwd_kernel_vec_t1>();
CHECK(create_kernel(engine, kid, &kernel_));
unsetenv("SYCL_PROGRAM_COMPILE_OPTIONS");
Also, this does not restore the environment variable to its original value.
Is there a programmatic way to control the same knob that does not involve an environment variable? Modifying environment variables is typically not thread-safe and should be avoided as much as possible, IMHO.
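A sketch that addresses only the "does not restore the original value" point, assuming the environment-variable approach were kept. The class name `scoped_env_var_t` is hypothetical. It saves the previous state (value or absence), restores it on scope exit, and serializes callers that go through the same helper. It does not make the environment thread-safe: POSIX states that `setenv` need not be thread-safe, and any other thread (including code inside the SYCL runtime) that reads or writes the environment without this mutex is unprotected. It is therefore no substitute for the programmatic knob asked about above. POSIX only.

```cpp
#include <cassert>
#include <cstdlib> // std::getenv; POSIX setenv/unsetenv
#include <mutex>
#include <optional>
#include <string>

// Hypothetical RAII helper: sets an environment variable for the lifetime of
// the object and, on destruction, restores the previous state (old value, or
// no variable at all). All instances share one mutex, so callers using this
// helper are serialized; code touching the environment without it is NOT.
// Not reentrant: nesting two guards in the same thread would deadlock.
class scoped_env_var_t {
public:
    scoped_env_var_t(const char *name, const char *value)
        : lock_(env_mutex()), name_(name) {
        assert(value != nullptr);
        // Remember the previous state before overwriting it.
        if (const char *old = std::getenv(name)) old_value_ = std::string(old);
        setenv(name, value, /*overwrite=*/1);
    }

    ~scoped_env_var_t() {
        // The body runs before members are destroyed, so lock_ is still held.
        if (old_value_)
            setenv(name_.c_str(), old_value_->c_str(), /*overwrite=*/1);
        else
            unsetenv(name_.c_str());
    }

    scoped_env_var_t(const scoped_env_var_t &) = delete;
    scoped_env_var_t &operator=(const scoped_env_var_t &) = delete;

private:
    static std::mutex &env_mutex() {
        static std::mutex m;
        return m;
    }

    // Declared first so it is acquired first and released last.
    std::lock_guard<std::mutex> lock_;
    std::string name_;
    std::optional<std::string> old_value_;
};
```

In the diff above, such a guard would be created in a block around the `create_kernel(...)` call, e.g. `scoped_env_var_t guard("SYCL_PROGRAM_COMPILE_OPTIONS", options);`, where `options` stands in for the option string that is elided in the hunk.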
|| (kind == VAR && prb->dir & FLAG_FWD))
? 5e-7
: 0);
#else
Is this still needed with the new option given to the kernel?
The current PR relies on setting an environment variable, which is not thread-safe.
Description
When run on Intel PVC, the SYCL layer normalization kernel has precision issues in the variance computation.
[ 14][VAR][14] exp_f32: 2.68618e-05 exp: 2.68618e-05 got: 2.68618e-05 diff:1.81899e-12 rdiff:6.77165e-08
[COMPARE_STATS][VAR]: trh=0 err_max_diff:1.81899e-12 err_max_rdiff:6.77165e-08 all_max_diff:1.81899e-12 all_max_rdiff:6.77165e-08
8471:FAILED (errors:1 total:75) __REPRO: --lnorm --engine=gpu --dt=f32:bf16 --tag=axb --stat_tag=abx --flags=CH 15x3_n"lnorm_ci_0d:0"
The above are just a couple of the failing examples.
As a proposed fix, the variance threshold is changed so that the failing tests pass.
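The numbers in the failing log are consistent with a one-ULP difference in f32: the expected value 2.68618e-05 lies in [2^-16, 2^-15), where consecutive floats are 2^-39 ≈ 1.81899e-12 apart, which is exactly the reported `diff`. A small sketch to check this arithmetic (the helper name `ulp_at` is hypothetical; the constants are copied from the log):

```cpp
#include <cassert>
#include <cmath>

// Size of one ULP at `x`: the distance to the next float towards +infinity.
float ulp_at(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, INFINITY) - x;
}
```

Dividing that single ULP by the expected value gives about 6.77e-08, matching the reported `rdiff`. It fails `trh=0` but is well below 5e-7.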