Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Round bhattacharayya #4340

Merged
merged 1 commit into from
Jan 17, 2020
Merged

Conversation

AndrejaKovacic
Copy link
Contributor

Issue

Due to floating point precision, bhattacharayya distance would sometimes return negative distances.

Description of changes

Rounding numbers to 13th digit. I think this is precise enough for our purposes, otherwise, we could use decimal class instead of floats.

Includes
  • Code changes

@codecov
Copy link

codecov bot commented Jan 16, 2020

Codecov Report

Merging #4340 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4340      +/-   ##
==========================================
+ Coverage   86.85%   86.91%   +0.06%     
==========================================
  Files         396      396              
  Lines       71828    71991     +163     
==========================================
+ Hits        62383    62571     +188     
+ Misses       9445     9420      -25

Copy link
Contributor

@janezd janezd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be solved by clipping, not rounding.

return -np.log(np.sum(np.sqrt(a.multiply(b))))
return -np.log(np.sum(np.sqrt(a * b)))
return np.clip(-np.log(np.sum(np.sqrt(a.multiply(b)))), 0, None)
return np.clip(-np.log(np.sum(np.sqrt(a * b))), 0, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm being an a..., I mean, annoying, but I would find it much nicer if you changed the code to something like

if sp.issparse(a):
    prod = a.multiply(b)
else:
    prod = a * b
return np.clip(-np.log(np.sum(np.sqrt(prod))

Or even prod = a.multiply(a) if sp.issparse(a) else a * b.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does look nicer, i changed it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, is the condition even needed? Wouldn't a.multiply(b) work to with sparse and dense matrices?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to think I haven't tried. :)

>>> import numpy as np
>>> a = np.arange(20).reshape(4, 5)
>>> a.multiply(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'multiply'

There is np.multiply, but it doesn't work for sparse matrices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, numpy arrays don't have multiply function, only the numpy module has it.

@janezd janezd merged commit 31cc232 into biolab:master Jan 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants