-
Notifications
You must be signed in to change notification settings - Fork 801
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add an option to delete metrics from multiproc files by the provided key #988
feat: Add an option to delete metrics from multiproc files by the provided key #988
Conversation
Hi @csmarchbanks ! May I ask you to take a look at the PR ? Thanks in advance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I don't think this is safe, if a scrape came along on a different process while writing the left/right data I think it could end up being corrupted?
If that is true a different approach could be taken, for example we could use a specific NaN value to signal that a metric has been deleted, possibly 0x7ff0000000000002
since that is the stale NaN used in Prometheus. That wouldn't lower the disk usage but would be safer.
@csmarchbanks Thank you for looking into this. I do not think that would be corrupted because we lock the deletion process of a specific metric. In addition, I am making this request so that I do not need to monkey patch the |
I believe the lock is only in the process updating, not in the process doing the read so unless we can guarantee an atomic write I have concerns about this. It would be fairly rare to have happen (metric deletion during a scrape). Another approach would be to create a new file during deletion and then move it to the same name as the old file, like what is down for the textfile. That should be atomic and would clean up disk space in the long term (though double it in the short term). |
@csmarchbanks Hi. Thank you for your review. I went with the temporary file approach. May I ask you to take another look? |
@csmarchbanks Hi. Could you please review the changes ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! I left one comment, but haven't had the time to fully test this yet due to some leave. Hopefully next week.
…vided key Signed-off-by: Aleksei A <aapaseev@bk.ru>
Hi, @csmarchbanks . Thanks for the review. Could you please give it another look ? Thanks in advance. |
Hi @csmarchbanks . The suggested change has been applied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, I gave this a further review today and have a couple more questions/concerns:
- How do you intend to use this? To fully remove a metric I think each PID would have to do the removal. For example, add a second PID into one of your tests.
- At a minimum we would need documentation that it is necessary to delete each PID individually, but it might just be too confusing/prone to errors.
- I don't think this is safe to do for counters, and could cause an inappropriate counter reset. This is why
mark_process_dead
only removes gauges.
Hi, @csmarchbanks . I think this pr is a possible solution to solve this issue(#916 ). Is that right? |
Yes, it would be a solution to those, or at least |
Hi @anChaOs @csmarchbanks I use this approach in my env for the Gauge type of metrics only. So should I leave the functionality for this type only? |
Probably only Gauge to start with. I am also not sure if it should be implemented for any of the Gauge aggregations? Those still behave similarly where a removal of just one pid might be confusing when the output is aggregated. I am also torn about just implementing |
Changes Made: This merge request introduces a new feature in the multiprocess mode that allows metrics deletion. The new functionality is encapsulated in a method called
delete_value
within theMmapedDict
class.Motivation: Until now, there was no option to remove individual metrics once they are written in multiprocess mode. It limited the flexibility of our system in dynamic scenarios where metrics might need to be removed post-write based on certain conditions. This created unnecessary clutter and could lead to memory inefficiencies over time.
How it Works: The
delete_value
method takes a key as an argument, which corresponds to the metric you want to delete. If the metric exists, it removes it from the dictionary and decreases the overall used memory size accordingly.Testing: New unittest cases have been added in
TestMmapedDict
. These tests verify correct deletion of existing metrics, handle attempts of deleting non-existing keys, check the unaffected state of other metrics upon deletion of one, and write-new after delete operations.Impact: The new deletion feature will allow more flexible and efficient handling of metrics in multiprocess mode. However, it should be used judiciously considering the operation involves memory manipulation.
The update doesn't affect the existing feature set and maintains backward compatibility.