Why open transactions (txn) repeatedly? #22

Open
cnlinxi opened this issue Oct 11, 2020 · 1 comment


@cnlinxi

cnlinxi commented Oct 11, 2020

    txn = db.begin(write=True)
    for idx, data in enumerate(data_loader):
        # print(type(data), data)
        image, label = data[0]
        txn.put(u'{}'.format(idx).encode('ascii'), dumps_pyarrow((image, label)))
        if idx % write_frequency == 0:
            print("[%d/%d]" % (idx, len(data_loader)))
            txn.commit()
            txn = db.begin(write=True)

Here you repeatedly commit the data and re-open the transaction. Is this meant to prevent the file from becoming too large, and is it actually necessary? In practice I have not seen LMDB crash from using too much memory, though the dataset I used may simply have been too small.
I am just puzzled, since the code here looks rather awkward.

@Lyken17
Owner

Lyken17 commented Oct 20, 2020

As I recall, this is a snippet I found somewhere on Stack Overflow, because committing directly would crash on some versions.

If you can help test on a large-scale dataset (e.g., ImageNet) to confirm that the current LMDB works smoothly, I think we can safely remove the last line.
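
For anyone who wants to try this, here is a minimal sketch of the batched-commit pattern from the question, written so it can be exercised without LMDB installed. The names `write_in_batches`, `FakeEnv`, and `FakeTxn` are hypothetical stand-ins and not part of this repository; `pickle` is swapped in for `dumps_pyarrow` purely to keep the example self-contained. With py-lmdb, the environment returned by `lmdb.open(...)` could be passed in place of `FakeEnv`, since it exposes the same `begin(write=True)`, `put`, and `commit` calls.

```python
import pickle


class FakeTxn:
    """Hypothetical in-memory stand-in for an LMDB write transaction."""

    def __init__(self, store):
        self._store = store
        self._pending = {}

    def put(self, key, value):
        # Buffer the write until commit(), as a real transaction would.
        self._pending[key] = value

    def commit(self):
        self._store.update(self._pending)


class FakeEnv:
    """Hypothetical stand-in for the environment returned by lmdb.open()."""

    def __init__(self):
        self.store = {}

    def begin(self, write=False):
        return FakeTxn(self.store)


def write_in_batches(db, samples, write_frequency=5000):
    """Write samples, committing after every `write_frequency` puts.

    Returns the number of commits performed, including the final one.
    """
    commits = 0
    txn = db.begin(write=True)
    for idx, sample in enumerate(samples):
        # pickle is an assumption here; the original code uses dumps_pyarrow.
        txn.put(str(idx).encode("ascii"), pickle.dumps(sample))
        if (idx + 1) % write_frequency == 0:
            txn.commit()
            commits += 1
            # A committed transaction cannot be reused, so open a new one.
            txn = db.begin(write=True)
    # Final commit flushes whatever is left in the last batch.
    txn.commit()
    commits += 1
    return commits


env = FakeEnv()
n_commits = write_in_batches(env, [(i, i * 2) for i in range(10)], write_frequency=4)
```

Setting `write_frequency` larger than the dataset size reduces this to a single commit at the end, which may be a convenient way to compare the two behaviours on a large dataset.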
