-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mountinfo: simpify, speedup, and fix unescape #144
Conversation
Basically, the differences between this and #143 are:
In my defense, this time I really tried to make this as simple as possible. |
4733520
to
99a7c2b
Compare
mountinfo/mountinfo_linux_test.go
Outdated
// NOTE Go interprets \NNN inside double quotes exactly | ||
// the same way as the function being tested, so: | ||
// - for input strings, we must either escape all backslash | ||
// characters ("\\") or use backticks `\`; | ||
// - for output strings, we can write something like "\040", | ||
// which is identical to " ". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: perhaps put this above testCases
as this doesn't apply to this test-case specifically
mountinfo/mountinfo_linux_test.go
Outdated
{"\\\\", "\\\\"}, | ||
{`\177`, "\177"}, // Max allowed ASCII value. | ||
{`\222`, `\222`}, // Too large value. | ||
{`Это\040камон\040какой-то`, "Это камон какой-то"}, // Some UTF-8. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 There's no such word, this is just a Russian approximation of how to say "come on".
The whole phrase is from an old MC Vspishkin & Nikiforovna song, in which he said something like "это какой-то чек вис аут! это комон какой-то!" in which "чек вис аут" and "комон" are approximation of English pronunciation of "check this out" and "come on". Overall it means nothing.
Here's a fragment with a link to the complete song: https://youtube.com/clip/UgkxjT1WPmAYApCJBD2N21iATuwL5aryYm7-?si=zwYUH4f21ugLPTPU
PS I just realized I wrote it as камон while he says it with a distinct "о" (комон). I'm taking it very seriously, so let me fix this.
{`/path/"with'quotes,\040space,\011tab`, `/path/"with'quotes, space, tab`}, | ||
{`\12`, `\12`}, // Not enough digits. | ||
{`\134`, `\`}, // Backslash. | ||
{`"'"'"'`, `"'"'"'`}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering; as we're updating all of these, should we also use keys here? I know it would be a lot more verbose, but it's generally "best practice" (I think?).
We could even consider adding a doc
field to describe test-cases (instead of adding a comment), but I realise most don't have a description if they have a specific purpose 🤔
Happy to hear what you think though; if it's too verbose?
// NOTE Go interprets \NNN inside double quotes exactly
// the same way as the function being tested, so:
// - for input strings, we must either escape all backslash
// characters ("\\") or use backticks `\`;
// - for output strings, we can write something like "\040",
// which is identical to " ".
testCases := []struct {
doc, input, output string
}{
{
input: "",
output: "",
},
{
input: "/",
output: "/",
},
{
input: "/some/longer/path",
output: "/some/longer/path",
},
{
input: `/path\040with\040spaces`,
output: "/path\040with\040spaces",
},
{
input: "/path/with\\134backslash",
output: "/path/with\\backslash",
},
{
input: "/tab\\011in/path",
output: "/tab\011in/path",
},
{
input: `/path/"with'quotes`,
output: `/path/"with'quotes`,
},
{
input: `/path/"with'quotes,\040space,\011tab`,
output: `/path/"with'quotes, space, tab`,
},
{
doc: "Not enough digits",
input: `\12`,
output: `\12`,
},
{
doc: "Backslash",
input: `\134`,
output: `\`,
},
{
input: `"'"'"'`,
output: `"'"'"'`,
},
{
doc: "Backslash with extra digit",
input: `/\1345`,
output: `/\5`,
},
{
input: `/\12x`,
output: `/\12x`,
},
{
doc: "Not enough digits",
input: `\0`,
output: `\0`,
},
{
doc: "NUL (min allowed ASCII value)",
input: `\000\000`,
output: "\000\000",
},
{
input: `\x`,
output: `\x`,
},
{
input: "\\\\",
output: "\\\\",
},
{
doc: "Max allowed ASCII value",
input: `\177`,
output: "\177",
},
{
doc: "Too large value",
input: `\222`,
output: `\222`,
},
{
doc: "Some UTF-8",
input: `Это\040камон\040какой-то`,
output: "Это камон какой-то",
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, with two fields only I think this will be overly verbose and taking too much vertical space.
func BenchmarkUnescape(b *testing.B) { | ||
testCases := []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW; I copied these test cases from the other test. We could possibly consider making those test-cases a package-variable, so that we can use the same ones here as part of the benchmark (if we don't want to repeat the cases).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it, but:
- When we change that variable, we change benchmark's input, making it somewhat harder to compare before/after.
- Ultimately a test case and a benchmark should have different inputs. For test cases, we look for various ways to break things. For benchmarks, we look for either most typical or most complex (time consuming) cases, or both.
99a7c2b
to
a6f9c75
Compare
Updated; PTAL @thaJeztah |
go test -v -test.benchmem -count=10 -run ^$ -bench BenchmarkUnescape . go: downloading golang.org/x/sys v0.1.0 goos: linux goarch: arm64 pkg: github.com/moby/sys/mountinfo BenchmarkUnescape BenchmarkUnescape-10 1000000 1068 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 992292 1082 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1050 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1038 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1034 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1045 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1071 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1033 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1032 ns/op 640 B/op 31 allocs/op BenchmarkUnescape-10 1000000 1070 ns/op 640 B/op 31 allocs/op PASS ok github.com/moby/sys/mountinfo 10.653s Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
It has been pointed out that unescape implementation is unnecessarily complex, and I tend to agree. Plus, we never expect any errors during unescaping, and if there are any, we can always return the string as is. So, drop the error return, and simplify the algorithm. Modify the test case and the benchmark accordingly. NOTE this also fixes the issue of trying to convert escape sequences where the character code is >= utf8.RuneSelf (128 decimal, 0200 octal). Adding those to the output may result in invalid utf-8 strings. The new code doesn't try to unescape those (and the tests are amended to check that). Finally, add some more test cases, and improve the documentation. Here's benchstat output: name old time/op new time/op delta Unescape-20 851ns ± 1% 358ns ± 3% -57.94% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Unescape-20 640B ± 0% 255B ± 0% -60.16% (p=0.000 n=5+4) name old allocs/op new allocs/op delta Unescape-20 31.0 ± 0% 21.0 ± 0% -32.26% (p=0.008 n=5+5) Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
a6f9c75
to
4446de2
Compare
Rebased; check that golangci-linter v1.60.1 (from #147) produces no warnings. |
PTAL @thaJeztah, I think I've addressed all your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This is an alternative to #143.
It has been pointed out that unescape implementation is unnecessarily
complex, and I tend to agree. Plus, we never expect any errors during
unescaping, and if there are any, we can always return the string as is.
So, drop the error return, and simplify the algorithm. Modify the test
case and the benchmark accordingly.
NOTE this also fixes the issue of trying to convert escape sequences
where the character code is >=
utf8.RuneSelf
(128 decimal, 0200 octal).Adding those to the output may result in invalid utf-8 strings. The new
code doesn't try to unescape those (and the tests are amended to check
that).
Add some more test cases.
Here's benchstat output: