This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

node: The number of files in Chain_xxx folder keeps growing rapidly when the plugin RpcNep5Tracker was installed. #419

Closed
nicolegys opened this issue Jul 30, 2019 · 12 comments

@nicolegys
Contributor

When I restart neo-cli, the chain folder shrinks to about 12 GB and the number of files drops to about 7k. For example:
image
image

But after that, the number of files and the folder size keep growing rapidly.
Here are screenshots taken about 24 hours after my last restart.
image
image
image
And as time goes on, it keeps growing and growing.
It seems that, given a large enough disk, the number of files would continue to increase indefinitely.

Here is a screenshot of the LOG file; it seems nothing is deleted after compaction.
image
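
The same check can be done without reading the LOG by eye, by counting compaction and deletion messages. This is a minimal sketch, assuming the LevelDB info log writes a `Compacted ...` line for each finished compaction and a `Delete type=...` line for each removed file; the function name `leveldb_log_summary` is made up for illustration.

```shell
#!/bin/bash
# Summarise compaction vs. deletion activity in a LevelDB info log.
# Assumption: finished compactions are logged as "Compacted ..." and
# removed files as "Delete type=...". Adjust the patterns if your log differs.
leveldb_log_summary() {
    local log_file="$1"
    local compacted deleted
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    compacted=$(grep -c 'Compacted ' "$log_file" || true)
    deleted=$(grep -c 'Delete type=' "$log_file" || true)
    echo "compacted=${compacted} deleted=${deleted}"
}
```

Calling it on the LOG file inside the chain folder prints one line; many compactions with zero deletions would match what the screenshot shows.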

Then we found that when the RpcNep5Tracker plugin is not installed, this issue does not appear.

@shargon
Member

shargon commented Jul 30, 2019

This is expected: we index more information, and it is stored there.

@nicolegys
Contributor Author

> This is expected: we index more information, and it is stored there.

But the size will keep growing; someone saw 158 GB today.
image
The disk will fill up someday because its size is limited. For example, my disk is only 100 GB.
I think we should fix the issue; otherwise we will have to keep restarting indefinitely. TwT

@Qiao-Jin
Contributor

Qiao-Jin commented Jul 30, 2019

> This is expected: we index more information, and it is stored there.

Based on observation and experiments, the direct cause is probably that unreleased snapshots prevent leveldb compaction from deleting outdated ldb files.

After looking into the leveldb log file on the affected client, I observed that compactions failed to delete any outdated ldb files. I guessed that something such as snapshots was preventing file deletion. My experiments and their results are as follows:

  1. I set up a local leveldb environment and kept inserting entries while creating snapshots without releasing them. During this period I triggered compaction, but it failed to delete outdated files, as I expected. The number of ldb files kept rising even when I was only inserting duplicate entries.

  2. The problem in this issue no longer occurs after removing ALL usage of the function leveldb_create_snapshot from the code.

So the cause of this issue is most likely a conflict between snapshots and compaction.

I also observed that there are multiple places in the code where snapshots are created but never released. We are testing to see whether the problem recurs after correcting them.
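
The effect described in experiment 1 can be illustrated with a toy model. This is not LevelDB itself, only a sketch under one assumption about its compaction rule: an overwritten entry can be dropped only when the newer entry that shadows it has a sequence number no greater than that of the oldest unreleased snapshot. The function name `surviving_entries` and the input format are invented for this example.

```shell
#!/bin/bash
# Toy model (not LevelDB): count the entries that survive one compaction.
# stdin: one "key sequence" pair per line, one line per write.
# $1:    sequence number of the oldest unreleased snapshot; pass the
#        largest written sequence number when no snapshot is held.
surviving_entries() {
    local oldest_snapshot="$1"
    # Group by key, newest write first within each key.
    sort -k1,1 -k2,2nr | awk -v snap="$oldest_snapshot" '
        $1 != key { key = $1; newer = "" }   # newest entry of a key: nothing shadows it
        {
            # Drop only if a newer entry exists that the oldest snapshot can already see.
            drop = (newer != "" && newer + 0 <= snap + 0)
            newer = $2
            if (!drop) kept++
        }
        END { print kept + 0 }'
}
```

With five duplicate writes to the same key (sequences 1 to 5), `surviving_entries 5` models the case with no snapshot held and `surviving_entries 1` models a snapshot taken after the first write and never released. In the second case every duplicate has to be carried forward, which is consistent with the ldb file count rising even for duplicate inserts.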

@erikzhang
Member

@Qiao-Jin So is it a plugin bug?

@Qiao-Jin
Contributor

Qiao-Jin commented Jul 31, 2019

> @Qiao-Jin So is it a plugin bug?

The problem might be caused indirectly by something in the plugin, such as exceptions, but the direct cause should be that some db snapshots are never released. I'm looking for those snapshots in the code.

@superboyiii
Member

> @Qiao-Jin So is it a plugin bug?

For now, RpcNep5Tracker is what exposes this bug. neo-cli works well without RpcNep5Tracker.

@erikzhang
Member

So you believe the bug is in RpcNep5Tracker? I had a quick look at the RpcNep5Tracker code and found no problems.

@superboyiii
Member

> So you believe the bug is in RpcNep5Tracker? I had a quick look at the RpcNep5Tracker code and found no problems.

Yes, although there seems to be no obvious relationship between this plugin and this issue. I've run tests many times over three days. You could try syncing two neo-cli instances on two different servers, one with RpcNep5Tracker and one without, letting both sync to the latest height and then waiting four or five hours. You will see completely different amounts of available disk space.
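
To make that comparison repeatable, each server can log the chain folder's file count and size at intervals. The sketch below is one way to do it; the function name `dir_footprint`, the path `/opt/neo-cli/Chain_xxx`, and the 10-minute interval are placeholders, not taken from neo-cli.

```shell
#!/bin/bash
# Print "<file count> <size in KiB>" for a directory tree.
dir_footprint() {
    local dir="$1"
    local files size_kib
    files=$(find "$dir" -type f | wc -l)
    size_kib=$(du -sk "$dir" | awk '{ print $1 }')
    # $((files)) strips the padding some wc implementations add
    echo "$((files)) ${size_kib}"
}

# Example loop (hypothetical path and interval), run on both servers:
#   while true; do
#       echo "$(date -u +%FT%TZ) $(dir_footprint /opt/neo-cli/Chain_xxx)"
#       sleep 600
#   done >> footprint.log
```

Comparing the two resulting logs shows whether the node with the plugin diverges steadily or only in bursts.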

@Qiao-Jin
Contributor

Qiao-Jin commented Aug 2, 2019

We ran the version with ALL usage of the function leveldb_create_snapshot removed for a whole day, and the problem never occurred.

@HayesData

HayesData commented Aug 3, 2019

FWIW I'm seeing the exact same issue on our node pool as well.

This is a horribly hacky workaround, but we need the RpcNep5Tracker plugin on our nodes.

Until somebody resolves the issue, this is the (again, admittedly horribly hacky) way I've worked around the problem.

It's simply fired by a cron job every 15 minutes.

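#!/bin/bash
# Restart the neo service when the root filesystem is nearly full.

THRESHOLD=90   # restart at or above this percentage used

# Capacity column of "df -P /" with the trailing % removed, e.g. 87
PERCENT_USED=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

if (( PERCENT_USED >= THRESHOLD )); then
        echo "$(/bin/date) - TIME TO RESTART NEO, PRIMARY PARTITION ${PERCENT_USED}% FULL"
        /usr/sbin/service neo stop
        /bin/sleep 1
        /usr/sbin/service neo start
        echo "$(/bin/date) - NEO HAS BEEN RESTARTED"
fi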

Note: I should also mention that these commands assume the neo-cli RPC server is being run as a systemd daemon.

@Qiao-Jin
Contributor

Qiao-Jin commented Aug 9, 2019

Some similar bugs reported by leveldb users:
Level/leveldown#273
google/leveldb#164

@shargon changed the title from "The number of files in Chain_xxx folder keeps growing rapidly when the plugin RpcNep5Tracker was installed." to "node: The number of files in Chain_xxx folder keeps growing rapidly when the plugin RpcNep5Tracker was installed." on Dec 5, 2023
@shargon added the node label on Dec 5, 2023
@shargon
Member

shargon commented Dec 5, 2023

This is old; if the problem remains, please re-open.

@shargon closed this as completed on Dec 5, 2023