metrics regexp usage for progress can be optimized #2596

tonistiigi · 2024-07-13T01:47:54Z

Currently https://github.com/docker/buildx/blob/master/util/progress/metricwriter.go uses heavy regexp functions a lot.

This show both in inittrace, and with Find/Match functions when retrieving progress. It is present independently if tracing is enabled or not.

init github.com/docker/buildx/util/progress @30 ms, 0.077 ms clock, 41992 bytes, 224 allocs

For init we should avoid regexp.Compile in the init functions as it allocates lots of memory and blocks rest of the loading. While individually this isn't that significant, these bad practices by packages add up. This can be solved by adding sync.Once so the regexp instances are created one time on first access.

For the matchers, it looks like all the regexp always try to match against all of the vertex records coming in progress. This should only need to happen once per vertex. If vertex is type=exec it can not be type=source at the same time and does not need to be repeatedly rechecked. If a type for a specific digest has already been found, it does not need to be rechecked again when new logs are arriving.

I think the sync.Once can cover all the regexp together that then resolves the type info (and possible type specific values parsed from name) that is then memorized by digest.

While this would also be faster by doing a simpler string comparison in most cases, I don't think that is a good idea as the code becomes more error-prone. Instead, we should finish up returning type as a separate field in progress in buildkit side as was discussed in the original proposal for this. The buildx side of the code can already prepare for this (as it would need to keep regexp code for backwards compatibility anyway), so that there is a single function that determines the type based on string parsing that can then be skipped if the type is already set by buildkit.

@jsternberg

The text was updated successfully, but these errors were encountered:

crazy-max · 2024-07-19T10:10:24Z

could we use moby/moby#48166 or similar?

jsternberg · 2024-07-19T16:07:51Z

That code for the lazy regular expression is pretty good. We could either just copy the one function or we could ask them to include it in a non-internal package, but it's probably easier to just copy it for now.

tonistiigi · 2024-07-19T16:38:14Z

There are a couple of other regexp that could be updated with lazy method, but for this probably a manual sync.Once makes more sense as we would need only one once for all the regexp instead of one per each. Bigger change probably is on the matching side.

jsternberg self-assigned this Aug 5, 2024

crazy-max added this to the v0.17.0 milestone Aug 6, 2024

crazy-max mentioned this issue Aug 6, 2024

metricwriter: compile regular expressions only on first use #2641

Merged

crazy-max closed this as completed in #2641 Aug 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

metrics regexp usage for progress can be optimized #2596

metrics regexp usage for progress can be optimized #2596

tonistiigi commented Jul 13, 2024

crazy-max commented Jul 19, 2024

jsternberg commented Jul 19, 2024

tonistiigi commented Jul 19, 2024

metrics regexp usage for progress can be optimized #2596

metrics regexp usage for progress can be optimized #2596

Comments

tonistiigi commented Jul 13, 2024

crazy-max commented Jul 19, 2024

jsternberg commented Jul 19, 2024

tonistiigi commented Jul 19, 2024