Segmentation fault #540
I don't see an obvious reason why this could go wrong.
I also get a segmentation fault (also using CPU). This happens only if the faiss index is a member of an object, for instance here:

import faiss
import numpy as np

class Products:
    def __init__(self, path):
        self.embeddings = np.load(f'{path}/embeddings.npy')  # this loads a ~100000x512 float32 array
        quantizer = faiss.IndexFlatIP(512)
        self.index = faiss.IndexIVFFlat(quantizer, 512, 100, faiss.METRIC_L2)
        self.index.train(self.embeddings)
        self.index.add(self.embeddings)

    def find_nearest(self, index, n):
        return self.index.search(self.embeddings[index].reshape(1, -1), n)

p = Products('path/to/the/npy')
p.find_nearest(100, 10)  # segfault happens here

When implementing the same thing without a class, there is no segmentation fault:

import faiss
import numpy as np

path = 'path/to/the/npy'
embeddings = np.load(f'{path}/embeddings.npy')  # this loads a ~100000x512 float32 array

quantizer = faiss.IndexFlatIP(512)
index = faiss.IndexIVFFlat(quantizer, 512, 100, faiss.METRIC_L2)
index.train(embeddings)
index.add(embeddings)

def find_nearest(i, n):
    return index.search(embeddings[i].reshape(1, -1), n)

find_nearest(100, 10)  # no segfault, works as expected
@wuhu This is because the quantizer gets garbage-collected by Python. You could keep a reference to it, e.g. store it as self.quantizer in __init__, so it stays alive as long as the index does.
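A minimal sketch of that workaround, applied to the Products class from the comment above (the only change is storing the quantizer on self so Python cannot collect it while the IVF index still uses it internally):

import faiss
import numpy as np

class Products:
    def __init__(self, path):
        self.embeddings = np.load(f'{path}/embeddings.npy')
        # Keeping the quantizer as an attribute prevents it from being
        # garbage-collected while self.index still points at it.
        self.quantizer = faiss.IndexFlatIP(512)
        self.index = faiss.IndexIVFFlat(self.quantizer, 512, 100, faiss.METRIC_L2)
        self.index.train(self.embeddings)
        self.index.add(self.embeddings)

    def find_nearest(self, index, n):
        return self.index.search(self.embeddings[index].reshape(1, -1), n)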
@beauby Thanks for the quick reply! That worked.
This seems to be another case of crashes in Python code. That makes me wonder: what would be the consequences of having the bindings automatically keep references to nested indexes on construction of new indexes (as in, applying the above workaround automatically)?
It is on our TODO list. It requires a SWIG trick that adds that reference whenever an object is passed in by reference to any of a few dozen functions and constructors.
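To make the idea concrete, here is a purely hypothetical Python-side sketch of what "keeping the reference automatically" amounts to; this is not how the faiss bindings implement it, and the helper name and attribute are made up for illustration:

import faiss

def make_ivf_flat(quantizer, d, nlist, metric):
    # Hypothetical wrapper: build the IVF index and stash the Python-side
    # quantizer object on it, so it lives at least as long as the index.
    index = faiss.IndexIVFFlat(quantizer, d, nlist, metric)
    index._kept_quantizer = quantizer  # attribute name is arbitrary
    return index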
Sorry for the late reply. I double-checked my code and found the same problem as @wuhu; it was not caused by the line I mentioned.
@godkillok Great – can we close this issue then?
While the automatic reference-keeping is not implemented yet, this behavior should at least be documented somewhere.
@asanakoy It is actually done now. If you are encountering the same behavior, it is a bug, in which case please open a separate issue. |
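For anyone who wants to check their own build, a small self-contained test along the lines of the examples above (assuming a reasonably recent faiss release); the quantizer is created inline and never bound to a Python variable, so it relies entirely on the bindings keeping that reference:

import faiss
import numpy as np

d = 64
xb = np.random.random((10000, d)).astype('float32')

# Nothing on the Python side holds the quantizer below.
index = faiss.IndexIVFFlat(faiss.IndexFlatIP(d), d, 100, faiss.METRIC_L2)
index.train(xb)
index.add(xb)
D, I = index.search(xb[:5], 10)  # should return results, not segfault
print(I.shape)  # (5, 10)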
Segmentation fault
Running on: CPU
Interface: Python
training_vectors.shape is (2357720, 100). Training finishes, but when I search with index.search(training_vectors[0:10000], 100), it always reports "Segmentation fault".
The code is as follows:
import time
import logging

import faiss
from faiss import normalize_L2

# training_vectors: a (2357720, 100) array of embeddings, loaded elsewhere
(num, d) = training_vectors.shape
t1 = time.time()

nlist = max(5, int(num / 500))
normalize_L2(training_vectors)
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(training_vectors)
index.nprobe = max(1, int(nlist * 0.6))  # default nprobe is 1, try a few more
index.add(training_vectors)

t2 = time.time()
logging.info('{} times is {}'.format('add and train', t2 - t1))

t1 = time.time()
score, sim_id = index.search(training_vectors[0:10000], 100)  # this line goes wrong
t2 = time.time()
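Given the resolution above (the Python-side quantizer being garbage-collected), one way to sidestep the problem entirely is to let faiss build the quantizer internally via index_factory, so there is no separate Python quantizer object to collect. A sketch, assuming the same nlist of 4715 implied by the numbers above:

import faiss

d = 100
nlist = 4715  # max(5, int(2357720 / 500)) from the snippet above
index = faiss.index_factory(d, f"IVF{nlist},Flat", faiss.METRIC_INNER_PRODUCT)
# then index.train(...), index.add(...) and index.search(...) as before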