Why open transactions (txn) repeatedly? #22

cnlinxi · 2020-10-11T08:10:43Z

    txn = db.begin(write=True)
    for idx, data in enumerate(data_loader):
        # print(type(data), data)
        image, label = data[0]
        txn.put(u'{}'.format(idx).encode('ascii'), dumps_pyarrow((image, label)))
        if idx % write_frequency == 0:
            print("[%d/%d]" % (idx, len(data_loader)))
            txn.commit()
            txn = db.begin(write=True)

Here you repeatedly commit the data and re-open the transaction. Is that to keep the file from growing too large? Is it necessary? In practice I have not seen LMDB crash from using too much memory, though the dataset I used may simply be too small.
I'm just puzzled, because the code here looks rather weird.

Lyken17 · 2020-10-20T15:39:46Z

As I recall, this is a snippet I found somewhere on Stack Overflow; it was written this way because committing directly led to crashes on some versions.

If you can help test on a large-scale dataset (e.g., ImageNet) to confirm that the current lmdb works smoothly, I think we can safely remove the last line.
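
For anyone testing this, the periodic-commit pattern discussed above can be sketched as below. This is only a sketch under assumptions, not the repository's code: `write_in_batches` is a hypothetical helper name, and `env` is assumed to be any object exposing the `begin(write=True)` / `put` / `commit` interface that py-lmdb environments provide. Unlike the snippet in the question, it commits after every `write_frequency` puts (rather than when `idx % write_frequency == 0`, which also fires at `idx == 0`) and always commits the remainder at the end.

```python
def write_in_batches(env, items, write_frequency=5000):
    """Write (key, value) byte pairs, committing every `write_frequency` puts.

    `env` is assumed to behave like a py-lmdb Environment.
    Returns the number of commits performed (including the final one).
    """
    commits = 0
    pending = 0  # puts made in the currently open transaction
    txn = env.begin(write=True)
    for key, value in items:
        txn.put(key, value)
        pending += 1
        if pending == write_frequency:
            # Flush this batch, then open a fresh write transaction.
            txn.commit()
            commits += 1
            pending = 0
            txn = env.begin(write=True)
    # Commit whatever is left so the last partial batch is not lost.
    txn.commit()
    commits += 1
    return commits
```

With py-lmdb this could be called as `write_in_batches(lmdb.open(path, map_size=...), pairs)`, where `path`, the `map_size` value, and `pairs` are placeholders. Setting `write_frequency` larger than the dataset gives the single-transaction behaviour, which is one way to compare the two approaches on a large dataset.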
