mountinfo: simpify, speedup, and fix unescape #144

kolyshkin · 2024-07-26T02:55:52Z

This is an alternative to #143.

It has been pointed out that unescape implementation is unnecessarily
complex, and I tend to agree. Plus, we never expect any errors during
unescaping, and if there are any, we can always return the string as is.

So, drop the error return, and simplify the algorithm. Modify the test
case and the benchmark accordingly.

NOTE this also fixes the issue of trying to convert escape sequences
where the character code is >= utf8.RuneSelf (128 decimal, 0200 octal).
Adding those to the output may result in invalid utf-8 strings. The new
code doesn't try to unescape those (and the tests are amended to check
that).

Add some more test cases.

Here's benchstat output:

name               old time/op    new time/op    delta
Unescape-20           851ns ± 1%     358ns ± 3%  -57.94%  (p=0.008 n=5+5)

name               old alloc/op   new alloc/op   delta
Unescape-20            640B ± 0%      255B ± 0%  -60.16%  (p=0.000 n=5+4)

name               old allocs/op  new allocs/op  delta
Unescape-20            31.0 ± 0%      21.0 ± 0%  -32.26%  (p=0.008 n=5+5)

kolyshkin · 2024-07-26T02:59:02Z

Basically, the differences between this and #143 are:

mountinfo: implement unescape() using strings.Replacer() #143 is way simpler and thus easier to maintain down the road;
this one is slightly faster and more future-proof (in case kernel or some weird fs decides to escape some more characters).

In my defense, this time I really tried to make this as simple as possible.

thaJeztah · 2024-07-29T08:39:16Z

mountinfo/mountinfo_linux_test.go

+		// NOTE Go interprets \NNN inside double quotes exactly
+		// the same way as the function being tested, so:
+		//  - for input strings, we must either escape all backslash
+		//    characters ("\\") or use backticks `\`;
+		//  - for output strings, we can write something like "\040",
+		//    which is identical to " ".


Nit: perhaps put this above testCases as this doesn't apply to this test-case specifically

thaJeztah · 2024-07-29T09:32:17Z

mountinfo/mountinfo_linux_test.go

+		{"\\\\", "\\\\"},
+		{`\177`, "\177"}, // Max allowed ASCII value.
+		{`\222`, `\222`}, // Too large value.
+		{`Это\040камон\040какой-то`, "Это камон какой-то"}, // Some UTF-8.


Heh; of course I was curious what this was; funny how translate is inconsistent on c or k for kamon. And now I'm curious what "kamon" means 😂

😆 There's no such word, this is just a Russian approximation of how to say "come on".

The whole phrase is from an old MC Vspishkin & Nikiforovna song, in which he said something like "это какой-то чек вис аут! это комон какой-то!" in which "чек вис аут" and "комон" are approximation of English pronunciation of "check this out" and "come on". Overall it means nothing.

Here's a fragment with a link to the complete song: https://youtube.com/clip/UgkxjT1WPmAYApCJBD2N21iATuwL5aryYm7-?si=zwYUH4f21ugLPTPU

PS I just realized I wrote it as камон while he says it with a distinct "о" (комон). I'm taking it very seriously, so let me fix this.

thaJeztah · 2024-07-29T09:39:42Z

mountinfo/mountinfo_linux_test.go

+		{`/path/"with'quotes,\040space,\011tab`, `/path/"with'quotes, space,	tab`},
+		{`\12`, `\12`}, // Not enough digits.
+		{`\134`, `\`},  // Backslash.
+		{`"'"'"'`, `"'"'"'`},


Wondering; as we're updating all of these, should we also use keys here? I know it would be a lot more verbose, but it's generally "best practice" (I think?).

We could even consider adding a doc field to describe test-cases (instead of adding a comment), but I realise most don't have a description if they have a specific purpose 🤔

Happy to hear what you think though; if it's too verbose?

// NOTE Go interprets \NNN inside double quotes exactly // the same way as the function being tested, so: // - for input strings, we must either escape all backslash // characters ("\\") or use backticks `\`; // - for output strings, we can write something like "\040", // which is identical to " ". testCases := []struct { doc, input, output string }{ { input: "", output: "", }, { input: "/", output: "/", }, { input: "/some/longer/path", output: "/some/longer/path", }, { input: `/path\040with\040spaces`, output: "/path\040with\040spaces", }, { input: "/path/with\\134backslash", output: "/path/with\\backslash", }, { input: "/tab\\011in/path", output: "/tab\011in/path", }, { input: `/path/"with'quotes`, output: `/path/"with'quotes`, }, { input: `/path/"with'quotes,\040space,\011tab`, output: `/path/"with'quotes, space, tab`, }, { doc: "Not enough digits", input: `\12`, output: `\12`, }, { doc: "Backslash", input: `\134`, output: `\`, }, { input: `"'"'"'`, output: `"'"'"'`, }, { doc: "Backslash with extra digit", input: `/\1345`, output: `/\5`, }, { input: `/\12x`, output: `/\12x`, }, { doc: "Not enough digits", input: `\0`, output: `\0`, }, { doc: "NUL (min allowed ASCII value)", input: `\000\000`, output: "\000\000", }, { input: `\x`, output: `\x`, }, { input: "\\\\", output: "\\\\", }, { doc: "Max allowed ASCII value", input: `\177`, output: "\177", }, { doc: "Too large value", input: `\222`, output: `\222`, }, { doc: "Some UTF-8", input: `Это\040камон\040какой-то`, output: "Это камон какой-то", }, }

Yeah, with two fields only I think this will be overly verbose and taking too much vertical space.

thaJeztah · 2024-07-29T09:40:54Z

mountinfo/mountinfo_linux_test.go

+func BenchmarkUnescape(b *testing.B) {
+	testCases := []string{


FWIW; I copied these test cases from the other test. We could possibly consider making those test-cases a package-variable, so that we can use the same ones here as part of the benchmark (if we don't want to repeat the cases).

I thought about it, but:

When we change that variable, we change benchmark's input, making it somewhat harder to compare before/after.

Ultimately a test case and a benchmark should have different inputs. For test cases, we look for various ways to break things. For benchmarks, we look for either most typical or most complex (time consuming) cases, or both.

kolyshkin · 2024-08-13T00:12:20Z

Updated; PTAL @thaJeztah

go test -v -test.benchmem -count=10 -run ^$ -bench BenchmarkUnescape . go: downloading golang.org/x/sys v0.1.0 goos: linux goarch: arm64 pkg: github.com/moby/sys/mountinfo BenchmarkUnescape BenchmarkUnescape-10 1000000 1068 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 992292 1082 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1050 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1038 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1034 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1045 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1071 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1033 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1032 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1070 ns/op 640 B/op 31 allocs/op PASS ok github.com/moby/sys/mountinfo 10.653s Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

It has been pointed out that unescape implementation is unnecessarily complex, and I tend to agree. Plus, we never expect any errors during unescaping, and if there are any, we can always return the string as is. So, drop the error return, and simplify the algorithm. Modify the test case and the benchmark accordingly. NOTE this also fixes the issue of trying to convert escape sequences where the character code is >= utf8.RuneSelf (128 decimal, 0200 octal). Adding those to the output may result in invalid utf-8 strings. The new code doesn't try to unescape those (and the tests are amended to check that). Finally, add some more test cases, and improve the documentation. Here's benchstat output: name old time/op new time/op delta Unescape-20 851ns ± 1% 358ns ± 3% -57.94% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Unescape-20 640B ± 0% 255B ± 0% -60.16% (p=0.000 n=5+4) name old allocs/op new allocs/op delta Unescape-20 31.0 ± 0% 21.0 ± 0% -32.26% (p=0.008 n=5+5) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2024-08-14T06:11:34Z

Rebased; check that golangci-linter v1.60.1 (from #147) produces no warnings.

kolyshkin · 2024-09-06T21:19:12Z

PTAL @thaJeztah, I think I've addressed all your comments.

austinvazquez

LGTM, nice results.

thaJeztah

LGTM, thanks!

kolyshkin requested a review from thaJeztah July 26, 2024 02:55

kolyshkin mentioned this pull request Jul 26, 2024

mountinfo: implement unescape() using strings.Replacer() #143

Closed

1 task

kolyshkin force-pushed the unescape-speedup branch from 4733520 to 99a7c2b Compare July 26, 2024 03:03

kolyshkin changed the title ~~Unescape speedup~~ mountinfo: simpify, speedup, and fix unescape Jul 26, 2024

thaJeztah reviewed Jul 29, 2024

View reviewed changes

kolyshkin force-pushed the unescape-speedup branch from 99a7c2b to a6f9c75 Compare August 13, 2024 00:12

kolyshkin mentioned this pull request Aug 14, 2024

Add Go 1.23, drop 1.21, bump golangci-lint #147

Merged

thaJeztah and others added 2 commits August 13, 2024 23:05

kolyshkin force-pushed the unescape-speedup branch from a6f9c75 to 4446de2 Compare August 14, 2024 06:05

kolyshkin requested a review from thaJeztah September 6, 2024 21:19

austinvazquez approved these changes Sep 8, 2024

View reviewed changes

thaJeztah approved these changes Sep 9, 2024

View reviewed changes

thaJeztah merged commit a9393fd into moby:main Sep 9, 2024
19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mountinfo: simpify, speedup, and fix unescape #144

mountinfo: simpify, speedup, and fix unescape #144

kolyshkin commented Jul 26, 2024

kolyshkin commented Jul 26, 2024 •

edited

Loading

thaJeztah Jul 29, 2024

thaJeztah Jul 29, 2024

kolyshkin Aug 12, 2024

thaJeztah Jul 29, 2024

kolyshkin Aug 13, 2024

thaJeztah Jul 29, 2024

kolyshkin Aug 13, 2024

kolyshkin commented Aug 13, 2024

kolyshkin commented Aug 14, 2024

kolyshkin commented Sep 6, 2024

austinvazquez left a comment

thaJeztah left a comment

		func BenchmarkUnescape(b *testing.B) {
		testCases := []string{

mountinfo: simpify, speedup, and fix unescape #144

mountinfo: simpify, speedup, and fix unescape #144

Conversation

kolyshkin commented Jul 26, 2024

kolyshkin commented Jul 26, 2024 • edited Loading

thaJeztah Jul 29, 2024

Choose a reason for hiding this comment

thaJeztah Jul 29, 2024

Choose a reason for hiding this comment

kolyshkin Aug 12, 2024

Choose a reason for hiding this comment

thaJeztah Jul 29, 2024

Choose a reason for hiding this comment

kolyshkin Aug 13, 2024

Choose a reason for hiding this comment

thaJeztah Jul 29, 2024

Choose a reason for hiding this comment

kolyshkin Aug 13, 2024

Choose a reason for hiding this comment

kolyshkin commented Aug 13, 2024

kolyshkin commented Aug 14, 2024

kolyshkin commented Sep 6, 2024

austinvazquez left a comment

Choose a reason for hiding this comment

thaJeztah left a comment

Choose a reason for hiding this comment

kolyshkin commented Jul 26, 2024 •

edited

Loading