Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: hash vuln db only once on load #2054

Conversation

lucasrod16
Copy link
Contributor

@lucasrod16 lucasrod16 commented Aug 14, 2024

Fixes #1502

This PR aims to improve the performance of LoadVulnerabilityDB when ValidateByHashOnGet is set to true

As described in the linked issue, GetStore and Status both hash the db. This PR changes Status to not hash the db by removing the call to c.Validate

As far as I can tell, Status is only used in a few places:

  1. LoadVulnerabilityDB

    This should not be an issue since GetStore already hashes the db

    status := dbCurator.Status()

  2. runDBStatus

    I am unsure whether hashing the db is desired behavior for grype db status. If so, perhaps a new field could be added to db.Config to allow CLI and library users to configure this behavior?

    status := dbCurator.Status()

  3. Test_DifferDirectory

    This test doesn't appear to rely on the hashing behavior in Status

    baseStatus := d.baseCurator.Status()

    targetStatus := d.targetCurator.Status()

@lucasrod16
Copy link
Contributor Author

Based on my local benchmark results and a sample program that calls LoadVulnerabilityDB executed with time, removing hash validation from Status reduced runtime duration by ~0.8s

go test ./grype -bench=BenchmarkLoadVulnerabilityDB -benchtime=20x

@lucasrod16
Copy link
Contributor Author

As suggested in the linked issue, I also ran benchmarks using io.CopyBuffer and a bufio.Reader with buffer sizes ranging from 64kb - 1mb in HashFile, and there were no noticeable performance improvements.

Signed-off-by: Lucas Rodriguez <lucas.rodriguez9616@gmail.com>
@lucasrod16 lucasrod16 force-pushed the 1502-improve-performance-of-LoadVulnerabilityDB branch from 0298e35 to d94742b Compare September 8, 2024 00:59
@lucasrod16
Copy link
Contributor Author

Hi @wagoodman,

I noticed that this pull request has been pending for a while. Is there anything needed from me? Please let me know if there are any updates required.

Thanks!

Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the work on this @lucasrod16 I've also attached this to a v6 schema issue to help prevent regression here

@wagoodman wagoodman merged commit 60c0682 into anchore:main Sep 17, 2024
10 checks passed
@lucasrod16 lucasrod16 deleted the 1502-improve-performance-of-LoadVulnerabilityDB branch September 17, 2024 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LoadVulnerabilityDB could be faster with ValidateByHashOnGet
2 participants