Abnormal vulnerability detector database size during re-downloading. #21663

Closed · MiguelazoDS opened this issue on Jan 30, 2024 · 6 comments · Fixed by #22360
Labels: level/subtask, type/enhancement (New feature or request)

@MiguelazoDS (Member) commented:

Description

It was found that, after interrupting and resuming the feed processing, the database grows far larger than expected. This behavior needs to be evaluated because the database may be corrupted, which could compromise the entire scanning process.

Expected behavior

As of now, the database feed size is around 4.5 GB. Interrupting and resuming the feed processing shouldn't cause the size to exceed that value.


Actual behavior

After interrupting and resuming the feed processing, the database feed size grows beyond 4.5 GB.

Steps to reproduce

  • Install manager 4.8.0 (62a5a67).
  • Let the feed processing start.
  • Interrupt the process before it ends (wazuh-control stop). Note that the logs do not report how far the processing has progressed.
  • Restart the manager. It will re-download the feed:
      • At 89k, the size is almost 4.5 GB.
      • At 200k, the size is 5.0 GB.
      • At the end, the size is 5.6 GB.

DoD

  • Analyze the root cause of this behavior.
  • Identify the reason.
  • Propose a fix if needed.
@GabrielEValenzuela self-assigned this on Feb 14, 2024
@sebasfalcone (Member) commented:

Moved ETA

  • Issue requires more investigation

@sebasfalcone (Member) commented on Feb 20, 2024:

On hold until we have more time to investigate.

This issue has more depth than expected: it seems RocksDB has some issues with the deletion of databases that hold a high volume of data.

@GabrielEValenzuela will append a report with all these findings to the issue.

@GabrielEValenzuela (Member) commented:

Investigation of disk space management issue in RocksDB

Abstract

This report investigates a concerning issue observed in RocksDB, a popular embedded database library developed by Facebook. The issue pertains to the ineffective management of disk space, wherein the delete operation fails to free up disk space, consequently leading to an increase in the database size. This report aims to analyze the root cause of the issue and propose potential solutions to mitigate its impact.

Introduction

RocksDB is a high-performance, persistent key-value store designed for fast storage systems such as flash drives and hard disk drives. It is widely used in applications ranging from web servers to distributed systems due to its efficiency and reliability. However, recent observations have revealed a critical issue regarding disk space management within RocksDB; for example, questions on Stack Overflow such as [1], which involves Apache Kafka as a service, point to the same issue.

Problem Statement

The primary problem identified in RocksDB involves the inefficiency of the delete operation in reclaiming disk space. Despite deleting data entries from the database, the corresponding disk space occupied by these entries remains allocated, leading to an accumulation of unused space and an increase in the overall database size over time. Furthermore, new data inserted into the database does not appear to overwrite the previously deleted entries, exacerbating the disk space utilization issue.
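
To make the symptom concrete, here is a minimal, self-contained sketch (not taken from the Wazuh code base; the path, key names, and sizes are illustrative) that uses RocksDB's size properties to show that point deletes only add tombstones, so the on-disk size does not shrink until a compaction runs:

#include <rocksdb/db.h>
#include <rocksdb/options.h>

#include <cassert>
#include <iostream>
#include <memory>
#include <string>

int main()
{
    rocksdb::Options options;
    options.create_if_missing = true;

    rocksdb::DB* rawDb {nullptr};
    auto status = rocksdb::DB::Open(options, "/tmp/tombstone_demo", &rawDb);
    assert(status.ok());
    std::unique_ptr<rocksdb::DB> db {rawDb};

    // Insert a batch of keys and flush them to SST files.
    for (int i = 0; i < 100000; ++i)
    {
        db->Put(rocksdb::WriteOptions(), "key" + std::to_string(i), std::string(256, 'x'));
    }
    db->Flush(rocksdb::FlushOptions());

    // Delete every key. This only writes tombstones.
    for (int i = 0; i < 100000; ++i)
    {
        db->Delete(rocksdb::WriteOptions(), "key" + std::to_string(i));
    }
    db->Flush(rocksdb::FlushOptions());

    // The SST files still hold the original values plus the tombstones,
    // so the total size grows even though the logical content is now empty.
    uint64_t sstSize {0};
    db->GetIntProperty(rocksdb::DB::Properties::kTotalSstFilesSize, &sstSize);
    std::cout << "SST size after deletes: " << sstSize << " bytes\n";

    // Only after a full compaction is the space actually reclaimed.
    db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
    db->GetIntProperty(rocksdb::DB::Properties::kTotalSstFilesSize, &sstSize);
    std::cout << "SST size after compaction: " << sstSize << " bytes\n";

    return 0;
}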

Findings

Based on the investigation, the inefficient disk space management in RocksDB may be attributed to several factors, including suboptimal delete operations, inefficient compaction strategies, and potential limitations in managing disk space utilization. Here's a summary of the key findings and potential solutions; a short sketch illustrating points 1 to 4 follows the list:

  1. Use of the DeleteRange function: Using DeleteRange instead of performing a range scan and deleting keys one by one records the deletion of a whole key range as a single range tombstone. This native operation speeds up deletion and improves write performance, making it preferable on performance-sensitive write paths. Currently, this call throws a "CF not supported" error in our setup.

  2. Compaction Strategies: Implementing trigger compaction on deletes can help mitigate the performance skew caused by excessive tombstones in SST files. By tracking long close-by tombstones and triggering compaction when necessary, the database can efficiently manage tombstone cleanup and improve range scan performance.

  3. Explicit Deletion of Iterators: Ensuring that iterators are explicitly deleted when they go out of scope can prevent memory leaks and optimize resource utilization within the database. [2] [3]

  4. Management of max_bytes_for_level_base Variable: Adjusting the max_bytes_for_level_base variable to optimize space and write amplification during compactions can improve overall performance and prevent excessive compactions that degrade performance over time. [4]

  5. Consideration of Probability in Delete Operations: Since the issue occurs with low probability, using standard delete operations in C++ may serve as a reliable alternative, considering the potential trade-offs in performance and resource utilization.
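
The sketch below is illustrative only, not Wazuh code: the database path, key prefixes, window sizes, and level size are assumptions chosen to show how points 1 to 4 map onto the public RocksDB C++ API.

#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/utilities/table_properties_collectors.h>

#include <cassert>
#include <memory>

int main()
{
    rocksdb::Options options;
    options.create_if_missing = true;

    // (4) Tune the target size of level 1 so compactions keep space
    //     amplification under control. 256 MB is an example value.
    options.max_bytes_for_level_base = 256 * 1024 * 1024;

    // (2) Compact SST files that accumulate many tombstones: here, a file is
    //     marked for compaction once 10k deletions fall inside a 100k-entry window.
    options.table_properties_collector_factories.emplace_back(
        rocksdb::NewCompactOnDeletionCollectorFactory(/*sliding_window_size=*/100000,
                                                      /*deletion_trigger=*/10000));

    rocksdb::DB* rawDb {nullptr};
    auto status = rocksdb::DB::Open(options, "/tmp/feed_db_example", &rawDb);
    assert(status.ok());
    std::unique_ptr<rocksdb::DB> db {rawDb};

    // (1) Delete a whole key range with a single range tombstone instead of
    //     scanning the range and issuing one point delete per key.
    status = db->DeleteRange(rocksdb::WriteOptions(),
                             db->DefaultColumnFamily(),
                             "cve-2020-", "cve-2021-");
    assert(status.ok());

    // (3) Wrap iterators in a smart pointer so they are always released when
    //     they go out of scope.
    {
        std::unique_ptr<rocksdb::Iterator> it {db->NewIterator(rocksdb::ReadOptions())};
        for (it->SeekToFirst(); it->Valid(); it->Next())
        {
            // ... process it->key() / it->value() ...
        }
    }

    return 0;
}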

Conclusion

In conclusion, addressing disk space management in RocksDB likely requires a combination of optimized delete operations, refined compaction strategies, and fine-tuned configuration parameters that balance space utilization and performance. Implementing the suggested solutions can help mitigate the observed issues and improve RocksDB's overall disk space management efficiency and reliability. While this is being addressed, we will continue our research by bringing the issue to Meta's team of developers so that we can get a first-hand answer.


@sebasfalcone (Member) commented:

@GabrielEValenzuela

Let's retry this investigation with these changes: 7714e2a

@GabrielEValenzuela (Member) commented:

Investigation of disk space management issue in RocksDB - New analysis

Abstract

This is a follow-up to the previous report on ineffective disk space management in RocksDB, where the delete operation failed to free disk space and the database size kept growing.

Introduction

After the merging of the changes referenced above, we ran a test to check whether the problem was solved.

Findings

The size problem was solved successfully; we leave the proof in this comment.

Before restart: (screenshot, 2024-03-14)

After restart: (screenshot)

Conclusion

We can close this issue and open a new one related to the hash of the downloaded files:

// Just process the new file if the hash is different from the last one.
auto inputFileHash {Utils::asciiToHex(Utils::hashFile(outputFilePath))};
if (context.spUpdaterBaseContext->downloadedFileHash != inputFileHash)
{
    // Store new hash.
    context.spUpdaterBaseContext->downloadedFileHash = std::move(inputFileHash);

    // Download finished: Insert path into context.
    context.data.at("paths").push_back(outputFilePath.string());

    return;
}

logDebug2(WM_CONTENTUPDATER,
          "File '%s' didn't change from last download so it won't be published",
          outputFilePath.string().c_str());

If the download process exits prematurely, it should restart with the same file and only store the hash once the process finishes successfully.
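
A possible shape for that change, sketched below with hypothetical names (candidateFileHash and the surrounding flow are assumptions for illustration, not the actual content_manager code), is to keep the new hash aside and persist it only after the whole update chain has finished successfully:

// Hypothetical sketch: defer persisting the hash until the update succeeds.
auto inputFileHash {Utils::asciiToHex(Utils::hashFile(outputFilePath))};
if (context.spUpdaterBaseContext->downloadedFileHash == inputFileHash)
{
    logDebug2(WM_CONTENTUPDATER,
              "File '%s' didn't change from last download so it won't be published",
              outputFilePath.string().c_str());
    return;
}

// Remember the candidate hash, but do not persist it yet (hypothetical field).
context.candidateFileHash = std::move(inputFileHash);

// Download finished: Insert path into context.
context.data.at("paths").push_back(outputFilePath.string());

// ... and only once every later stage of the chain has succeeded:
// context.spUpdaterBaseContext->downloadedFileHash = std::move(context.candidateFileHash);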

@sebasfalcone (Member) commented:

Great analysis Gabi! Good to know this issue was solved 😅

Conclusion

Issues generated from the analysis
