
RocksDB is not freeing up space after a delete. #8041

Closed
ilkoiliev951 opened this issue Mar 8, 2021 · 7 comments

@ilkoiliev951


Hello,

Most of our services use a Kafka store, which, as you know, uses RocksDB under the hood. We try to delete outdated and wrongly formatted records every 6 hours in order to free up space. Even though the record gets deleted from RocksDB (a tombstone is added and the record is no longer available), we see no change in disk usage.

I suppose that a compaction needs to be triggered in order to compact away the deleted records. However, as far as I know, a leveled compaction is triggered only when the number of L0 files reaches level0_file_num_compaction_trigger. Because my service consumes almost no data (on the dev environment), I believe that a compaction cannot be triggered and therefore the "deleted" records remain.

Please note that we are using only the default RocksDB configuration. I also noticed that when I use options.setDeleteObsoleteFilesPeriodMicros() in a custom RocksDB config, the size of the local store drops dramatically. However, I am not sure what that method does exactly. I also read that there is an option for periodic compaction.

Any help would be appreciated. Thank you in advance.
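For context, the custom config mentioned above is applied through Kafka Streams' RocksDBConfigSetter hook. A minimal sketch of that wiring, assuming the standard Kafka Streams and RocksJava APIs (the 10-minute period is purely illustrative):

```java
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

// Registered via the Streams property "rocksdb.config.setter"
// (StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG).
public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // Scan for and delete obsolete files every 10 minutes instead of
        // the default of 6 hours. Note this only reclaims files that are
        // already obsolete (e.g. after a compaction); it does not trigger
        // compactions by itself.
        options.setDeleteObsoleteFilesPeriodMicros(10L * 60 * 1_000_000L);
    }
}
```

This explains why setting the option appeared to shrink the store: obsolete files were simply being collected sooner.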

@riversand963
Contributor

riversand963 commented Mar 17, 2021

Have you considered deletion-triggered compaction? A simple code example is below.

Options options;
options.create_if_missing = true;
// NewCompactOnDeletionCollectorFactory(sliding_window_size, deletion_trigger, deletion_ratio):
// mark an SST file for compaction when 90 of any 100 consecutive entries
// are deletions, or when at least half of the file's entries are deletions.
options.table_properties_collector_factories.emplace_back(
    NewCompactOnDeletionCollectorFactory(100, 90, /*deletion_ratio=*/0.5));
// DestroyAndReopen, Put, Delete, Flush, etc. are helpers from RocksDB's
// internal test harness (DBTestBase); in application code, call the
// corresponding methods on a DB instance.
DestroyAndReopen(options);
for (int i = 0; i < 100; ++i) {
  ASSERT_OK(Put("key" + std::to_string(i), "value"));
}
ASSERT_OK(Flush());
for (int i = 0; i < 50; ++i) {
  ASSERT_OK(Delete("key" + std::to_string(i)));
}
// The second flush produces an SST file dominated by tombstones, which the
// collector marks for compaction; wait for that compaction to finish.
ASSERT_OK(Flush());
ASSERT_OK(dbfull()->TEST_WaitForCompact());
ASSERT_EQ(1, NumTableFilesAtLevel(1));

By the way, some clarification is needed:

> Even though the record gets deleted from RocksDB (a tombstone gets added and the record is no longer available)

> my service is consuming almost no data.

These two sound contradictory, since a deletion still writes a tombstone to your DB, and those writes can trigger compaction.

You can also look at TTL compaction which compacts data to the bottommost level even if there is no write to trigger compaction (https://github.com/facebook/rocksdb/blob/6.18.fb/include/rocksdb/advanced_options.h#L721).
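Since the service configures RocksDB from Java, the TTL knob would look roughly like this. This is a hedged sketch: it assumes your rocksdbjni version exposes setTtl on org.rocksdb.Options (check the javadoc for your release), and the 24-hour value is illustrative:

```java
import org.rocksdb.Options;

final Options options = new Options();
options.setCreateIfMissing(true);
// Files whose data is older than the TTL become eligible for compaction
// even when no new writes would otherwise trigger one. Value in seconds;
// 0 disables the feature.
options.setTtl(24L * 60 * 60);
```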

@ilkoiliev951
Author

Alright, I am going to explain in more detail. What I mean by "the record is deleted" is that it is no longer available in the local Kafka KeyValueStore when we try to retrieve it. By "my service is consuming almost no data" I mean that my local instance consumes almost no data from the Kafka broker. However, this is becoming a huge problem in our production environment, where some of the stores exceed 2 GB.

The main problem is that the old SST files do not get deleted. When I checked, I saw that my Java application still holds file handles to the old SST files. I explicitly closed all the iterators; however, the SST files remain. After setting the RocksDB log level to DEBUG, I saw that a compaction did happen. Is it possible that the open Kafka KeyValue store still holds references to the old SST files, thereby preventing them from being deleted?

Is there a way to implement a Java-based deletion-triggered compaction?

@riversand963
Contributor

Do you see lines like the following for those files in the LOG?

2021/03/22-13:10:38.661781 7f58421fe700 [le/delete_scheduler.cc:77] Deleted file /tmp/rocksdbtest-148062/dbstress/000185.sst immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000

or something like

[JOB 34] Delete /tmp/rocksdbtest-148062/dbstress/000185.sst type=2 number=..

@ilkoiliev951
Author

ilkoiliev951 commented Mar 22, 2021

Yes, but only for the deletion of the MANIFEST file. I see no jobs scheduled for the deletion of SST files.

@riversand963
Contributor

If the files are not in use, the compaction job will try to delete them once compaction finishes. Something must still be holding references to the files...

@linas

linas commented Apr 13, 2021

FYI, there appears to be a file descriptor leak; I describe this in issues #3216 and #4112. On Unix/Linux, you can delete a file, but as long as a process holds an open file descriptor to it, its disk blocks will not be freed. The OS assumes the process will continue to access the file, so the blocks cannot be reclaimed even though the directory entry is removed (so ls -la no longer shows the file).

You can view deleted-but-still-open files with lsof -p <pid> | grep deleted; in my case, I see hundreds or more of these.

Resolved: see my own comment #8041 (comment) immediately below. There was a file descriptor leak because my own code had an iterator leak. Failing to delete iterators leaves deleted files whose descriptors remain open (eating disk space) and which remain memory-mapped (eating RAM).

You can "watch this happen": run RocksDB for a while and wait until lsof -p <pid> | grep sst | wc gets large. Force a compaction (I can do this by closing the database, reopening it, and closing it again, without exiting the app). Verify that lsof -p <pid> | grep sst | grep deleted | wc is a large number, while the actual number of undeleted SST files on disk is small: find your-rocks.rdb | grep sst | wc stays "reasonable" for your dataset.

Now run df to look at your filesystem. Then exit the app that is using RocksDB and run df again. Notice that dozens or hundreds of GBytes of disk space are now free, while find your-rocks.rdb | grep sst | wc has not changed at all. All of that freed disk space comes from the deleted-but-still-mapped SST files.

@linas

linas commented Apr 13, 2021

Update: after this comment (#3216 (comment)) I came to realize that I had made a complete newbie mistake in my C++ code: RocksDB iterators are NOT smart pointers that delete themselves when they go out of scope. They must be explicitly deleted! Upon fixing this complete-newbie mistake, all my disk and RAM usage problems went away. Wow!

I suggest that anyone else reading this take a close look at their iterators, and review <rocksdb/db.h> for anything else that needs an explicit delete.
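The same pitfall exists on the Java side of this thread: RocksJava objects wrap native handles and must be closed explicitly, since garbage collection does not promptly release them. A sketch of the safe pattern using try-with-resources (the database path is illustrative):

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.RocksIterator;

public class IteratorCleanup {
    public static void main(final String[] args) throws RocksDBException {
        // Options, RocksDB, and RocksIterator all implement AutoCloseable;
        // try-with-resources closes them in reverse order, so the iterator
        // is closed before the DB, releasing its pin on obsolete SST files.
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/iter-demo");
             RocksIterator it = db.newIterator()) {
            for (it.seekToFirst(); it.isValid(); it.next()) {
                // ... read it.key() / it.value() ...
            }
        } // all native handles released here
    }
}
```

A leaked RocksIterator keeps the SST files it references alive exactly as described above, even after compaction has deleted them from the directory.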
