Segmentation fault #540
I don't see an obvious reason why this could go wrong.
I also get a segmentation fault (also using CPU). This happens only if the faiss index is a member of an object, for instance here:

import faiss
import numpy as np

class Products:
    def __init__(self, path):
        self.embeddings = np.load(f'{path}/embeddings.npy')  # this loads a ~100000x512 float32 array
        quantizer = faiss.IndexFlatIP(512)
        self.index = faiss.IndexIVFFlat(quantizer, 512, 100, faiss.METRIC_L2)
        self.index.train(self.embeddings)
        self.index.add(self.embeddings)

    def find_nearest(self, index, n):
        return self.index.search(self.embeddings[index].reshape(1, -1), n)

p = Products('path/to/the/npy')
p.find_nearest(100, 10)  # segfault happens here

When implementing the same thing without a class, there is no segmentation fault:

import faiss
import numpy as np

path = 'path/to/the/npy'
embeddings = np.load(f'{path}/embeddings.npy')  # this loads a ~100000x512 float32 array

quantizer = faiss.IndexFlatIP(512)
index = faiss.IndexIVFFlat(quantizer, 512, 100, faiss.METRIC_L2)
index.train(embeddings)
index.add(embeddings)

def find_nearest(i, n):
    return index.search(embeddings[i].reshape(1, -1), n)

find_nearest(100, 10)  # no segfault, works as expected
@wuhu This is because the quantizer gets garbage-collected by Python. You could keep a reference to it, e.g. store it as self.quantizer in __init__, so it stays alive as long as the index does.
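A minimal sketch of that workaround, applied to the Products class from the comment above (the only change is storing the quantizer on self so Python cannot collect it while the IVF index still uses it internally):

import faiss
import numpy as np

class Products:
    def __init__(self, path):
        self.embeddings = np.load(f'{path}/embeddings.npy')
        # Keeping the quantizer as an attribute prevents it from being
        # garbage-collected while self.index still points at it.
        self.quantizer = faiss.IndexFlatIP(512)
        self.index = faiss.IndexIVFFlat(self.quantizer, 512, 100, faiss.METRIC_L2)
        self.index.train(self.embeddings)
        self.index.add(self.embeddings)

    def find_nearest(self, index, n):
        return self.index.search(self.embeddings[index].reshape(1, -1), n)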
@beauby Thanks for the quick reply! That worked.
This seems to be another case of crashes in Python code. That makes me wonder: what would be the consequences of having the bindings automatically keep references to nested indexes on construction of new indexes (as in, applying the above workaround automatically)?
It is on our TODO list. It requires a SWIG trick that adds that reference whenever an object is passed in by reference to any of a few dozen functions and constructors.
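To make the idea concrete, here is a purely hypothetical Python-side sketch of what "keeping the reference automatically" amounts to; this is not how the faiss bindings implement it, and the helper name and attribute are made up for illustration:

import faiss

def make_ivf_flat(quantizer, d, nlist, metric):
    # Hypothetical wrapper: build the IVF index and stash the Python-side
    # quantizer object on it, so it lives at least as long as the index.
    index = faiss.IndexIVFFlat(quantizer, d, nlist, metric)
    index._kept_quantizer = quantizer  # attribute name is arbitrary
    return index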
Sorry for the late reply. I double-checked my code and found the same problem as @wuhu; it was not caused by the line I mentioned.
@godkillok Great – can we close this issue then?
While the automatic reference-keeping is not implemented yet, this behavior should at least be documented somewhere.
@asanakoy It is actually done now. If you are encountering the same behavior, it is a bug, in which case please open a separate issue. |
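For anyone who wants to check their own build, a small self-contained test along the lines of the examples above (assuming a reasonably recent faiss release); the quantizer is created inline and never bound to a Python variable, so it relies entirely on the bindings keeping that reference:

import faiss
import numpy as np

d = 64
xb = np.random.random((10000, d)).astype('float32')

# Nothing on the Python side holds the quantizer below.
index = faiss.IndexIVFFlat(faiss.IndexFlatIP(d), d, 100, faiss.METRIC_L2)
index.train(xb)
index.add(xb)
D, I = index.search(xb[:5], 10)  # should return results, not segfault
print(I.shape)  # (5, 10)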
Segmentation fault
Running on: CPU
Interface: Python
training_vectors.shape is (2357720, 100). Training finishes, but when I search with index.search(training_vectors[0:10000], 100), it always reports "Segmentation fault".
The code is as follows:
import time
import logging

import faiss
from faiss import normalize_L2

# training_vectors: a (2357720, 100) array of embeddings, loaded elsewhere
(num, d) = training_vectors.shape
t1 = time.time()

nlist = max(5, int(num / 500))
normalize_L2(training_vectors)
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(training_vectors)
index.nprobe = max(1, int(nlist * 0.6))  # default nprobe is 1, try a few more
index.add(training_vectors)

t2 = time.time()
logging.info('{} times is {}'.format('add and train', t2 - t1))

t1 = time.time()
score, sim_id = index.search(training_vectors[0:10000], 100)  # this line goes wrong
t2 = time.time()
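Given the resolution above (the Python-side quantizer being garbage-collected), one way to sidestep the problem entirely is to let faiss build the quantizer internally via index_factory, so there is no separate Python quantizer object to collect. A sketch, assuming the same nlist of 4715 implied by the numbers above:

import faiss

d = 100
nlist = 4715  # max(5, int(2357720 / 500)) from the snippet above
index = faiss.index_factory(d, f"IVF{nlist},Flat", faiss.METRIC_INNER_PRODUCT)
# then index.train(...), index.add(...) and index.search(...) as before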