fix: check for NaNs in emd loss matrix #623
base: master
ot/lp/__init__.py
Outdated
    if np.isnan(M).any():
        raise ValueError('The loss matrix should not contain NaN values.')
Failing early here ensures that we do not segfault in the accelerated emd_c call.
I did not look too deeply into the emd_c implementation, but my assumption is that this check is somewhat pessimistic. It may be possible to formulate problems in which a subset of the values in the loss matrix is never accessed (possibly because the graph is disconnected). In those cases we could support NaN values. @rflamary what is your opinion on this?
If the graph is disconnected, then the parts that are not used should have an infinite value (which is handled by the C++ solver). I'm OK with not handling NaNs.
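As a rough illustration of the suggestion above, unused entries of the loss matrix can be set to infinity rather than NaN before the matrix reaches the solver. This is a minimal sketch using only NumPy; the helper `mask_unused_costs` and the boolean `allowed` mask are hypothetical names introduced for this example and are not part of POT.

```python
import numpy as np


def mask_unused_costs(M, allowed):
    """Return a copy of M in which disallowed entries are +inf.

    Hypothetical helper for illustration only; not part of POT.
    """
    M = np.asarray(M, dtype=np.float64)
    allowed = np.asarray(allowed, dtype=bool)
    # Keep the cost where the edge is allowed, otherwise use +inf.
    return np.where(allowed, M, np.inf)


# Costs for the unused edges are unknown (NaN) in the raw input.
M_raw = np.array([[0.0, np.nan],
                  [np.nan, 1.0]])
allowed = np.array([[True, False],
                    [False, True]])

M_inf = mask_unused_costs(M_raw, allowed)

# The masked matrix has no NaNs, so a NaN check would not reject it.
has_nan = bool(np.isnan(M_inf).any())
```

Because `np.isnan` is false for infinite values, a matrix prepared this way would pass the NaN check proposed in this PR.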
A few comments. Thanks @bobluppes for the PR
ot/lp/__init__.py
Outdated
@@ -302,6 +304,9 @@ def emd(a, b, M, numItermax=100000, log=False, center_dual=True, numThreads=1, c
    ot.optim.cg : General regularized OT
    """

    if np.isnan(M).any():
A problem here is that you are using numpy on arrays that might not be numpy arrays (see the backend function below). You should do the test later in the function, on the OT loss matrix that has been converted to numpy, to avoid backend errors.
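The ordering requested in this comment could look roughly like the sketch below: convert first, then check. It is not the actual POT code. `to_numpy` is a hypothetical stand-in for whatever backend conversion `emd` performs, and `check_loss_matrix` is a name introduced only for this example.

```python
import numpy as np


def to_numpy(x):
    """Stand-in for the backend-to-NumPy conversion (assumption, not POT's API)."""
    return np.asarray(x, dtype=np.float64)


def check_loss_matrix(M):
    """Convert M to NumPy first, then reject NaNs (hypothetical helper)."""
    M_np = to_numpy(M)
    # np.isnan is only applied to the converted NumPy array.
    if np.isnan(M_np).any():
        raise ValueError('The loss matrix should not contain NaN values.')
    return M_np


# A valid matrix passes through unchanged.
M_ok = check_loss_matrix([[0.0, 1.0], [1.0, 0.0]])

# A matrix with a NaN entry is rejected before any solver call.
try:
    check_loss_matrix([[0.0, float('nan')], [1.0, 0.0]])
    rejected = False
except ValueError:
    rejected = True
```

Running the check after conversion means `np.isnan` never sees a non-NumPy array, which is the backend error the reviewer wants to avoid.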
Types of changes
This PR introduces an additional check for NaNs in the loss matrix of the emd computation. If NaNs are detected, we raise an error in order to protect against segfaults in the C++ backend.
Motivation and context / Related issue
The motivation of this PR is to fail more gracefully in cases of NaN costs.
Closes #469
How has this been tested (if it applies)
Added new tests.
PR checklist
CONTRIBUTING.md