Implementation of KNN based on the Spark-ML-LSH #3

Open
Victor0118 opened this issue Dec 7, 2017 · 0 comments


Victor0118 commented Dec 7, 2017

Hash function: h_i(x) = floor(r_i.dot(x) / bucketLength), where r_i is the i-th random unit vector
threshold = 2000
k = number of nearest neighbors
W = bucketLength
NHT = number of hash tables
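
As a rough illustration of the hash function above, here is a minimal pure-Python sketch. The helper names (`random_unit_vector`, `hash_vector`) and the example values are hypothetical and chosen only for this note; Spark's own implementation may differ in how it draws the projection vectors.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a Gaussian vector and scale it to unit L2 norm (assumed scheme)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def hash_vector(x, projections, bucket_length):
    """Return one bucket index per hash table: floor(r_i . x / bucketLength)."""
    return [
        math.floor(sum(r_j * x_j for r_j, x_j in zip(r, x)) / bucket_length)
        for r in projections
    ]


# Hypothetical settings: NHT = 3 hash tables, W = 2.0, 4-dimensional input.
rng = random.Random(0)
NHT, W, dim = 3, 2.0, 4
projections = [random_unit_vector(dim, rng) for _ in range(NHT)]
signature = hash_vector([1.0, 2.0, 3.0, 4.0], projections, W)
```

Each input vector gets NHT bucket indices, one per projection. A larger W makes each bucket wider, so more vectors share a bucket.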

  • The number of buckets will be (max L2 norm of input vectors) / bucketLength.
  • If the input vectors are normalized, a reasonable bucketLength is 1 to 10 times pow(numRecords, -1/inputDim).
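
The two rules of thumb above can be turned into a small calculation. This is only a sketch: the function names and the example numbers (10,000 records, 2 dimensions, a max norm of 10) are made up for illustration and are not taken from the experiments in this issue.

```python
import math


def suggested_bucket_lengths(num_records, input_dim):
    """Return the (low, high) range: 1x to 10x pow(numRecords, -1/inputDim)."""
    base = math.pow(num_records, -1.0 / input_dim)
    return base, 10.0 * base


def approx_num_buckets(max_l2_norm, bucket_length):
    """Approximate bucket count: (max L2 norm of input vectors) / bucketLength."""
    return max_l2_norm / bucket_length


# Hypothetical example: 10,000 normalized records in 2 dimensions.
low, high = suggested_bucket_lengths(10_000, 2)
# Hypothetical example: max L2 norm of 10 with bucketLength 2.
buckets = approx_num_buckets(10.0, 2.0)
```

Note that the heuristic applies to normalized inputs only. For unnormalized data, the first rule (max norm divided by bucketLength) is the more useful guide to how coarse the hashing will be.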
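
| k | NHT | W | Accuracy_train | Accuracy_test | T_index | T_query |
|---|-----|---|----------------|---------------|---------|---------|
| 1 | 3 | 2 | - | 0.9087 | 54 | 175848 |
| 5 | 3 | 2 | - | 0.893 | 54 | 174651 |
| 9 | 3 | 2 | - | 0.8808 | 54 | 155673 |
| 1 | 5 | 2 | - | 0.9291 | 29 | 251302 |
| 5 | 5 | 2 | - | 0.9137 | 29 | 275162 |
| 9 | 5 | 2 | - | 0.9036 | 29 | 367008 |
| 1 | 7 | 2 | - | 0.9372 | 34 | 523696 |
| 5 | 7 | 2 | - | 0.9238 | 34 | 460986 |
| 9 | 7 | 2 | - | 0.9145 | 34 | 485565 |
| 1 | 3 | 5 | - | 0.9357 | 30 | 367245 |
| 5 | 3 | 5 | - | 0.9263 | 30 | 340930 |
| 9 | 3 | 5 | - | 0.9171 | 30 | 341963 |
| 1 | 5 | 5 | - | 0.9459 | 41 | 596984 |
| 5 | 5 | 5 | - | 0.9401 | 41 | 559091 |
| 9 | 5 | 5 | - | 0.93 | 41 | 561646 |
| 1 | 7 | 5 | - | 0.9496 | 22 | 770659 |
| 5 | 7 | 5 | - | 0.9465 | 22 | 787571 |
| 9 | 7 | 5 | - | 0.9385 | 22 | 841044 |
| 1 | 3 | 8 | - | 0.9419 | 37 | 439672 |
| 5 | 3 | 8 | - | 0.9348 | 37 | 417642 |
| 9 | 3 | 8 | - | 0.9253 | 37 | 422822 |
| 1 | 5 | 8 | - | 0.9481 | 24 | 605899 |
| 5 | 5 | 8 | - | 0.9438 | 24 | 609686 |
| 9 | 5 | 8 | - | 0.9358 | 24 | 609061 |
| 1 | 7 | 8 | - | 0.9511 | 22 | 780209 |
| 5 | 7 | 8 | - | 0.9447 | 22 | 769710 |
| 9 | 7 | 8 | - | 0.9409 | 22 | 769710 |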
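
To make the roles of k, NHT and W in these results more concrete, here is a toy, single-machine sketch of kNN classification on top of bucketed random projections. It is not the Spark code used for the measurements; every name in it (`bucket_ids`, `build_index`, `knn_predict`) and the tiny dataset are assumptions made for illustration. Candidates are the points that share a bucket with the query in at least one hash table, and the k closest candidates vote on the label.

```python
import math
from collections import Counter, defaultdict


def bucket_ids(x, projections, bucket_length):
    """One bucket index per hash table: floor(r_i . x / bucketLength)."""
    return [
        math.floor(sum(r_j * x_j for r_j, x_j in zip(r, x)) / bucket_length)
        for r in projections
    ]


def build_index(points, projections, bucket_length):
    """Map (table, bucket) -> list of row numbers that hash there."""
    index = defaultdict(list)
    for row, x in enumerate(points):
        for table, b in enumerate(bucket_ids(x, projections, bucket_length)):
            index[(table, b)].append(row)
    return index


def knn_predict(query, k, points, labels, index, projections, bucket_length):
    """Majority vote among the k nearest candidates; None if no candidates."""
    candidates = set()
    for table, b in enumerate(bucket_ids(query, projections, bucket_length)):
        candidates.update(index.get((table, b), []))
    if not candidates:
        return None
    nearest = sorted(candidates, key=lambda row: math.dist(query, points[row]))[:k]
    return Counter(labels[row] for row in nearest).most_common(1)[0][0]


# Toy data with fixed, axis-aligned projections (NHT = 2, W = 2.0).
points = [[0.1, 0.2], [0.3, 0.1], [5.1, 5.2], [5.3, 5.0]]
labels = ["a", "a", "b", "b"]
projections = [[1.0, 0.0], [0.0, 1.0]]
W = 2.0
index = build_index(points, projections, W)
pred_near_origin = knn_predict([0.2, 0.2], 1, points, labels, index, projections, W)
pred_far = knn_predict([5.0, 5.0], 3, points, labels, index, projections, W)
pred_empty = knn_predict([100.0, 100.0], 1, points, labels, index, projections, W)
```

In a scheme like this, adding hash tables or widening the buckets would typically enlarge the candidate set. That may be one reason the measurements show higher accuracy together with higher T_query as NHT and W grow, though the sketch itself does not establish this for the Spark runs.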