Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics regexp usage for progress can be optimized #2596

Closed
tonistiigi opened this issue Jul 13, 2024 · 3 comments · Fixed by #2641
Closed

metrics regexp usage for progress can be optimized #2596

tonistiigi opened this issue Jul 13, 2024 · 3 comments · Fixed by #2641
Assignees
Milestone

Comments

@tonistiigi
Copy link
Member

Currently https://github.com/docker/buildx/blob/master/util/progress/metricwriter.go uses heavy regexp functions a lot.

This show both in inittrace, and with Find/Match functions when retrieving progress. It is present independently if tracing is enabled or not.

init github.com/docker/buildx/util/progress @30 ms, 0.077 ms clock, 41992 bytes, 224 allocs

For init we should avoid regexp.Compile in the init functions as it allocates lots of memory and blocks rest of the loading. While individually this isn't that significant, these bad practices by packages add up. This can be solved by adding sync.Once so the regexp instances are created one time on first access.

For the matchers, it looks like all the regexp always try to match against all of the vertex records coming in progress. This should only need to happen once per vertex. If vertex is type=exec it can not be type=source at the same time and does not need to be repeatedly rechecked. If a type for a specific digest has already been found, it does not need to be rechecked again when new logs are arriving.

I think the sync.Once can cover all the regexp together that then resolves the type info (and possible type specific values parsed from name) that is then memorized by digest.

While this would also be faster by doing a simpler string comparison in most cases, I don't think that is a good idea as the code becomes more error-prone. Instead, we should finish up returning type as a separate field in progress in buildkit side as was discussed in the original proposal for this. The buildx side of the code can already prepare for this (as it would need to keep regexp code for backwards compatibility anyway), so that there is a single function that determines the type based on string parsing that can then be skipped if the type is already set by buildkit.

@jsternberg

@crazy-max
Copy link
Member

could we use moby/moby#48166 or similar?

@jsternberg
Copy link
Collaborator

That code for the lazy regular expression is pretty good. We could either just copy the one function or we could ask them to include it in a non-internal package, but it's probably easier to just copy it for now.

@tonistiigi
Copy link
Member Author

There are a couple of other regexp that could be updated with lazy method, but for this probably a manual sync.Once makes more sense as we would need only one once for all the regexp instead of one per each. Bigger change probably is on the matching side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants