Abnormal vulnerability detector database size during re-downloading #21663
Comments
Moved ETA
On hold until we have more time to investigate. This issue has more depth than expected; it seems like RocksDB has some issues regarding the deletion of DBs with a high volume of data. @GabrielEValenzuela will append a report with all these findings to the issue.
Investigation of disk space management issue in RocksDB

Abstract
This report investigates a concerning issue observed in RocksDB, a popular embedded database library developed by Facebook. The issue pertains to ineffective management of disk space: the delete operation fails to free disk space, which leads to growth of the database size. This report analyzes the root cause of the issue and proposes potential solutions to mitigate its impact.

Introduction
RocksDB is a high-performance, persistent key-value store designed for fast storage systems such as flash drives and hard disk drives. It is widely used in applications ranging from web servers to distributed systems due to its efficiency and reliability. However, recent observations have revealed a critical issue regarding disk space management within RocksDB; for example, StackOverflow questions such as [1], from users running Apache Kafka (which uses RocksDB internally), are evidence of this issue.

Problem Statement
The primary problem identified in RocksDB is the inefficiency of the delete operation in reclaiming disk space. Although data entries are deleted from the database, the disk space those entries occupy remains allocated, leading to an accumulation of unused space and an increase in the overall database size over time. Furthermore, newly inserted data does not appear to overwrite the previously deleted data, exacerbating the disk space utilization issue.

Findings
Based on these observations, the inefficient disk space management in RocksDB may be attributed to several factors: suboptimal delete operations (deletes write tombstones rather than reclaiming space immediately), inefficient compaction strategies, and potential limitations in managing disk space utilization. A minimal sketch of the tombstone behavior follows.
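To make the delete/compaction interplay concrete, here is a minimal sketch (the DB path and key name are illustrative; this is not the Wazuh code, only the generic RocksDB pattern the report describes). Delete() only writes a tombstone, and the space is reclaimed when compaction rewrites the affected SST files, which a manual CompactRange() over the full key range forces:

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <iostream>

int main() {
    rocksdb::Options options;
    options.create_if_missing = true;

    rocksdb::DB* db = nullptr;
    auto status = rocksdb::DB::Open(options, "/tmp/feed_db", &db);
    if (!status.ok()) {
        std::cerr << status.ToString() << '\n';
        return 1;
    }

    // Delete() only writes a tombstone marker; the space occupied by the
    // old value stays allocated in the SST files until compaction
    // rewrites them.
    db->Delete(rocksdb::WriteOptions(), "some_key");

    // Forcing a compaction over the full key range (nullptr..nullptr)
    // drops tombstoned entries and shrinks the on-disk footprint.
    db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);

    delete db;
    return 0;
}
```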
Conclusion
In conclusion, addressing disk space management in RocksDB likely requires a combination of optimizing delete operations, refining compaction strategies, and fine-tuning database configuration parameters to balance space utilization and performance. Implementing the suggested solutions can help mitigate the observed issues and improve RocksDB's overall disk space management efficiency and reliability. While this is being addressed, we continue our research by raising this issue with Meta's team of developers so that we can get a first-hand answer.
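As an illustration of the configuration tuning mentioned above, here is a hedged sketch using real rocksdb::Options fields; the specific values are assumptions for demonstration, not recommendations drawn from this investigation:

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
    rocksdb::Options options;
    options.create_if_missing = true;

    // Keep level sizes proportional to the data actually stored, which
    // reduces wasted space across LSM levels.
    options.level_compaction_dynamic_level_bytes = true;

    // Periodically compact files even if they are not otherwise picked,
    // so old tombstones are eventually dropped (value assumed: 1 day).
    options.periodic_compaction_seconds = 24 * 60 * 60;

    // Allow more concurrent background compactions and flushes
    // (value assumed).
    options.max_background_jobs = 4;

    rocksdb::DB* db = nullptr;
    auto status = rocksdb::DB::Open(options, "/tmp/feed_db", &db);
    if (status.ok()) { delete db; }
    return status.ok() ? 0 : 1;
}
```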
Let's re-try this investigation with these changes: 7714e2a
Investigation of disk space management issue in RocksDB - New analysis

Abstract
This report follows up on the disk space management issue previously observed in RocksDB, wherein the delete operation failed to free disk space, leading to an increase in database size. It verifies whether the recent changes resolve the issue.

Introduction
RocksDB is a high-performance, persistent key-value store designed for fast storage systems such as flash drives and hard disk drives, widely used in applications ranging from web servers to distributed systems due to its efficiency and reliability. After the merging of:

we ran a test to check whether the problem was solved.

Findings
The size problem was solved successfully; we left proof in this comment.

Conclusion
We can close this issue and open a new one, related to the hash of the downloaded files:
wazuh/src/shared_modules/content_manager/src/components/offlineDownloader.hpp
Lines 148 to 162 in aca9796
If the download process exits before finishing, it should re-start with the same file, and the hash should only be stored once the process ends successfully.
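A minimal sketch of that ordering, using hypothetical helper names (downloadFile, computeHash, decompressAndLoad, persistHash) rather than the actual Content Manager API:

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Stub stand-ins for the Content Manager internals (hypothetical names).
std::string downloadFile(const std::string& url) { return "/tmp/feed.xz"; }
std::string computeHash(const std::string& path) { return "deadbeef"; }
void decompressAndLoad(const std::string& path) { /* may throw on failure */ }
void persistHash(const std::string& hash) { std::cout << "stored " << hash << '\n'; }

void downloadAndTrack(const std::string& url)
{
    const auto filePath = downloadFile(url);

    // Process the file first; persist its hash only once the whole
    // pipeline has succeeded. If anything throws, the hash is never
    // stored, so a restarted run downloads and processes the same file
    // again instead of wrongly skipping it.
    decompressAndLoad(filePath);
    persistHash(computeHash(filePath));
}

int main()
{
    try { downloadAndTrack("https://example.com/feed.xz"); }
    catch (const std::exception& e) { std::cerr << e.what() << '\n'; }
}
```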
Great analysis, Gabi! Good to know this issue was solved 😅

Conclusion
Issues generated from the analysis
Description
It was found that after interrupting and resuming the feed processing, the size of the database is too large. This behavior must be evaluated because the database may be corrupted, which could disrupt the entire scanning process.
Expected behavior
As of now, the database feed size is around 4.5GB. Interrupting and resuming the feed processing shouldn't make it exceed that value.
Actual behavior
After interrupting and resuming the feed processing, the database feed size grows beyond 4.5GB.
Steps to reproduce
The logs do not report the progress of the process.
At 89k the size is almost 4.5GB
At 200k the size is 5.0GB
At the end the size is 5.6GB
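As an aside, the on-disk footprint behind the sizes above can also be checked from code; a hedged sketch using real RocksDB properties (the DB path is assumed):

```cpp
#include <rocksdb/db.h>
#include <iostream>
#include <string>

int main() {
    rocksdb::Options options;
    rocksdb::DB* db = nullptr;
    auto status = rocksdb::DB::OpenForReadOnly(options, "/tmp/feed_db", &db);
    if (!status.ok()) {
        std::cerr << status.ToString() << '\n';
        return 1;
    }

    std::string total, live;
    // Total size of all SST files vs. the size of only the live
    // (non-obsolete) files; a large gap suggests space waiting to be
    // reclaimed by compaction or obsolete-file deletion.
    db->GetProperty("rocksdb.total-sst-files-size", &total);
    db->GetProperty("rocksdb.live-sst-files-size", &live);
    std::cout << "total: " << total << " bytes, live: " << live << " bytes\n";

    delete db;
    return 0;
}
```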
DoD