
[QST] Performance of NearestNeighbours kneighbours lookup #4021

Open
thomasaarholt opened this issue Jul 1, 2021 · 6 comments
Labels: inactive-90d, question (Further information is requested)

@thomasaarholt

thomasaarholt commented Jul 1, 2021

What is your question?
Is the NearestNeighbors.kneighbors problem well-suited for GPU? I find it quite slow for big data on the GPU. I've run the following on a single RTX 2080 Ti and a 32-core AMD processor. The .fit() method is very fast, but the .kneighbors() method is not. See below for some context and benchmarks. Note that in #4020 I show that the other GPU algorithms crash the kernel.

We can use:

from cuml.neighbors import NearestNeighbors
model = NearestNeighbors()
model.fit(points)
model.kneighbors(new_points)

to find the neighbors of new_points among points, where both are arrays of coordinates (with shape (N, 2)).
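For readers unfamiliar with the API: both sklearn's and cuML's kneighbors return a (distances, indices) pair, each of shape (n_queries, n_neighbors). A minimal NumPy sketch of the equivalent brute-force computation (illustrative only, not cuML's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((1000, 2))    # reference set (the fitted data)
new_points = rng.random((5, 2))   # query set
k = 3

# Pairwise squared Euclidean distances, shape (n_queries, n_samples)
diff = new_points[:, None, :] - points[None, :, :]
sq_dist = (diff ** 2).sum(axis=-1)

# Indices of the k smallest distances per query row, and the distances themselves
idx = np.argsort(sq_dist, axis=1)[:, :k]
dist = np.sqrt(np.take_along_axis(sq_dist, idx, axis=1))

print(dist.shape, idx.shape)  # (5, 3) (5, 3)
```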

While the creation and fitting of the model to the data is very quick with cuML, the kneighbors lookup is relatively slow when compared with sklearn's CPU implementation:

import numpy as np, cupy as cp
from sklearn.neighbors import NearestNeighbors as NearestNeighborsCPU
from cuml.neighbors import NearestNeighbors as NearestNeighborsGPU
from time import time

n_neighbors = 10
n_samples = 1_000_000
n_unknown = 100_000

X = 1000*np.random.random((n_samples, 2))
unknown = 1000*np.random.random((n_unknown, 2))

print('Sklearn results')
for algo in ['auto', 'ball_tree', 'kd_tree']:  # 'brute' takes a *really* long time, so we skip it
    t1 = time()
    knn = NearestNeighborsCPU(n_neighbors=n_neighbors, algorithm=algo)
    knn.fit(X)
    t2 = time()
    estimates = knn.kneighbors(unknown)
    print(f"{algo}: {t2-t1:.2f} and {time()-t2:.2f}")

print('\ncuML results')
X = cp.asarray(X)
unknown = cp.asarray(unknown)

for algo in ['brute']: # see cuML #4020 for why I don't include other algorithms
    t1 = time()
    knn = NearestNeighborsGPU(n_neighbors=n_neighbors, algorithm=algo)
    knn.fit(X)
    t2 = time()
    estimates = knn.kneighbors(unknown)
    print(f"{algo}: {t2-t1:.2f} and {time()-t2:.2f}")

Running a simple benchmark with "medium" and "large" numbers of samples, I see the following results:

# times in seconds, for [.fit] and [.kneighbors]
# n_samples = 1_000_000
# n_unknown = 100_000
Sklearn results
auto: 0.95 and 0.62
ball_tree: 0.91 and 1.40
kd_tree: 0.97 and 0.70

cuML results
brute: 0.01 and 2.19

# n_samples = 10_000_000
# n_unknown = 1_000_000
Sklearn results
auto: 39.92 and 31.47
ball_tree: 36.40 and 43.35
kd_tree: 38.36 and 14.45

cuML results
brute: 0.03 and 200.59
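These numbers are consistent with the algorithmic costs involved: brute force evaluates every query/reference distance pair, so the 10x-larger problem costs about 100x more (2.19 s to 200.59 s), while tree lookups grow roughly as n_unknown * log(n_samples). A rough back-of-the-envelope check (the logarithmic tree-cost model is a simplification):

```python
import math

small = (1_000_000, 100_000)     # (n_samples, n_unknown)
large = (10_000_000, 1_000_000)

def brute_pairs(n_samples, n_unknown):
    # Brute force computes every query/reference distance
    return n_samples * n_unknown

def tree_queries(n_samples, n_unknown):
    # Tree lookups are roughly logarithmic per query
    return n_unknown * math.log2(n_samples)

print(brute_pairs(*large) / brute_pairs(*small))    # 100.0, matching the ~100x jump in cuML brute time
print(tree_queries(*large) / tree_queries(*small))  # ~11.7, matching the milder CPU growth
```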
@thomasaarholt thomasaarholt added ? - Needs Triage Need team to review and classify question Further information is requested labels Jul 1, 2021
@teju85
Member

teju85 commented Jul 1, 2021

@mdoijade can you take a look at this and see why kneighbors is so slow?

@dantegd
Member

dantegd commented Jul 1, 2021

@teju85 wouldn't the difference be because this is comparing different algorithms? i.e. brute in GPU (which takes too long in CPU) vs kd_tree/ball_tree?

@teju85
Member

teju85 commented Jul 2, 2021

:smh: good catch @dantegd! I overlooked the array in line 14. Yes, this is indeed because the comparison is being made with brute-force against indexing methods. We don't yet have indexing based methods implemented.

@thomasaarholt
Author

Right, I should have made that clearer. Just in case this wasn't noticed either, I created #4020, pointing out that the other GPU algorithms crash python on my system.

@hcho3 hcho3 removed the ? - Needs Triage Need team to review and classify label Jul 2, 2021
@cjnolet
Member

cjnolet commented Jul 29, 2021

@thomasaarholt, as @dantegd pointed out, the differences here are strictly down to the underlying algorithms. While it may increase training time, we do support the ivfpq algorithm, which should at least significantly reduce the query time.
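For context on why an index helps: IVF-family methods (which ivfpq belongs to) cluster the reference points into coarse cells, then search only the few cells nearest each query instead of the whole dataset. A toy NumPy sketch of that idea (the cell and probe counts are illustrative; this is not cuML's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
points = rng.random((10_000, 2))
query = rng.random(2)
n_cells, n_probe, k = 16, 2, 5

# Coarse "quantizer": a few k-means-style iterations to place cell centroids
centroids = points[rng.choice(len(points), n_cells, replace=False)]
for _ in range(5):
    assign = np.linalg.norm(points[:, None] - centroids[None], axis=-1).argmin(axis=1)
    for c in range(n_cells):
        members = points[assign == c]
        if len(members):
            centroids[c] = members.mean(axis=0)

# Inverted lists: the point indices belonging to each cell
inv_lists = [np.flatnonzero(assign == c) for c in range(n_cells)]

# Query time: probe only the n_probe nearest cells, brute-force inside them
cell_order = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
candidates = np.concatenate([inv_lists[c] for c in cell_order])
d = np.linalg.norm(points[candidates] - query, axis=1)
top = candidates[d.argsort()[:k]]
print(len(candidates), "of", len(points), "points searched")
```

The trade-off is approximation: a true nearest neighbor can fall in an unprobed cell, which is why IVF methods expose a probe count to tune recall against speed.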

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
