Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exact KNN is providing wrong result (rounding issue?) #2

Open
Laurae2 opened this issue May 6, 2019 · 0 comments
Open

Exact KNN is providing wrong result (rounding issue?) #2

Laurae2 opened this issue May 6, 2019 · 0 comments

Comments

@Laurae2
Copy link

Laurae2 commented May 6, 2019

It seems the exact KNN is providing wrong results when distances are very close to each other (~1e-8).

I am providing the following code for reproducibility:

library(atriar)

my_mat <- readRDS("my_mat.rds")
my_tree <- readRDS("my_tree.rds")

nrows <- 10000
ncols <- 100
k <- 100

tree_searcher <- atriar::create_searcher(my_mat,
                                         metric = "manhattan",
                                         cluster_max_points = 64,
                                         seed = 1)

system.time({atira_tree <- atriar::search_k_neighbors(
  searcher = tree_searcher,
  k = k,
  query_points = my_mat,
  exclude = cbind(seq_len(nrows), seq_len(nrows)),
  epsilon = 0
)})

release_searcher(tree_searcher)

all.equal(my_tree$Index + 1, atira_tree$index)
which(my_tree$Index + 1 != atira_tree$index)
all.equal(my_tree$Index[9815, 75:76] + 1, atira_tree$index[9815, 75:76])

my_tree$Index[9815, 75:76]
atira_tree$index[9815, 75:76]
sum(abs(my_mat[9815, ] - my_mat[9271, ])) - sum(abs(my_mat[9815, ] - my_mat[2452, ]))
sum(abs(my_mat[9815, ] - my_mat[9271, ])) < sum(abs(my_mat[9815, ] - my_mat[2452, ]))
# TRUE, hence 9271 < 2452 in knn ordering, while atria returns 2452 < 9271

The last 4 lines show the following error:

> my_tree$Index[9815, 75:76]
[1] 9270 2451
> atira_tree$index[9815, 75:76]
[1] 2452 9271
> sum(abs(my_mat[9815, ] - my_mat[9271, ])) - sum(abs(my_mat[9815, ] - my_mat[2452, ]))
[1] -8.249827e-08
> sum(abs(my_mat[9815, ] - my_mat[9271, ])) < sum(abs(my_mat[9815, ] - my_mat[2452, ]))
[1] TRUE
> # TRUE, hence 9271 < 2452 in knn ordering, while atria returns 2452 < 9271

You can find my debug files below containing:

  • my_mat.rds: the matrix to compute the KNN on (10K rows x 100 columns)
  • my_tree.rds: the expected exact KNN results for each row (k = 100, 0-indexed)

bugs.zip

The issue is on row 9815, at k = 75 and 76, where the distances are the following:

> print(sum(abs(my_mat[9815, ] - my_mat[9271, ])), digits = 18)
[1] 54.6829330527285649
> print(sum(abs(my_mat[9815, ] - my_mat[2452, ])), digits = 18)
[1] 54.6829331352268326
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant