mountinfo: implement unescape() using strings.Replacer() #143

Closed
thaJeztah wants to merge 2 commits from the simplify_unescape branch

Member

@thaJeztah thaJeztah commented Jul 21, 2024

mountinfo: add benchmark for unescape

go test -v -test.benchmem -count=10 -run ^$ -bench BenchmarkUnescape .
go: downloading golang.org/x/sys v0.1.0
goos: linux
goarch: arm64
pkg: github.com/moby/sys/mountinfo
BenchmarkUnescape
BenchmarkUnescape-10     1000000      1068 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10      992292      1082 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1050 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1038 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1034 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1045 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1071 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1033 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1032 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1070 ns/op     640 B/op      31 allocs/op
PASS
ok  github.com/moby/sys/mountinfo 10.653s

mountinfo: implement unescape() using strings.Replacer()

So, this mostly was because I saw this approach in another project, and liked
how simple/clean it was.

There are some downsides though:

  • performance looks to be slightly less
  • we're no longer bothering with invalid escape sequences

For the latter, I'm actually wondering how much validation we should do; should
we take invalid escape sequences as literal strings? What does Linux itself do
with these?

I know in the past we added too much complication at times, and sometimes were
validating things we should not care about (or even validating inconsistently
with the host itself).
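
As a rough illustration of the approach discussed here (not the code from this PR), a Replacer-based unescape could look like the sketch below. The name fstabUnescape comes from the commit message; the exact table of escape sequences is an assumption. An unrecognized sequence such as \777 simply passes through unchanged, which is the "no validation" trade-off mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// fstabUnescape maps three-digit octal escapes to the characters they stand
// for. The table contents are an assumption made for this sketch.
var fstabUnescape = strings.NewReplacer(
	`\040`, " ",
	`\011`, "\t",
	`\012`, "\n",
	`\134`, `\`,
)

// unescape replaces the known escape sequences; anything else is left as-is.
func unescape(s string) string {
	return fstabUnescape.Replace(s)
}

func main() {
	fmt.Printf("%q\n", unescape(`/mnt/my\040disk`)) // known escape
	fmt.Printf("%q\n", unescape(`/mnt/bad\777`))    // unknown escape, kept literally
	fmt.Printf("%q\n", unescape(`/mnt/a\134040b`))  // the unescaped backslash is not re-scanned
}
```

Because strings.Replacer performs non-overlapping replacements in a single pass, an escaped backslash followed by digits is not unescaped a second time.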

Comparison between the old unescape() and the new one, which uses fstabUnescape.Replace():

Before:

go test -v -test.benchmem -count=10 -run ^$ -bench BenchmarkUnescape .
go: downloading golang.org/x/sys v0.1.0
goos: linux
goarch: arm64
pkg: github.com/moby/sys/mountinfo
BenchmarkUnescape
BenchmarkUnescape-10     1000000      1068 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10      992292      1082 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1050 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1038 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1034 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1045 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1071 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1033 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1032 ns/op     640 B/op      31 allocs/op
BenchmarkUnescape-10     1000000      1070 ns/op     640 B/op      31 allocs/op
PASS
ok  github.com/moby/sys/mountinfo 10.653s

After:

go test -v -test.benchmem -count=10 -run ^$ -bench BenchmarkUnescape .
goos: linux
goarch: arm64
pkg: github.com/moby/sys/mountinfo
BenchmarkUnescape
BenchmarkUnescape-10     1000000      1141 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      977841      1132 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10     1000000      1160 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      901806      1131 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      980247      1137 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      988596      1135 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      975658      1139 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      934603      1161 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      997353      1123 ns/op     520 B/op      32 allocs/op
BenchmarkUnescape-10      986551      1131 ns/op     520 B/op      32 allocs/op
PASS
ok  	github.com/moby/sys/mountinfo	11.245s

From the above:

  • new version is ~90 ns/op slower (mean of roughly 1139 vs 1052 ns/op, about 8%)
  • new version has one more allocation (32 vs 31)
  • new version uses 120B less memory (520B vs 640B)
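
The ns/op difference can be checked by averaging the ten runs of each benchmark. The sketch below does this with the values copied from the output above; the variable and function names are only for illustration. A tool such as benchstat would give a more rigorous comparison.

```go
package main

import "fmt"

// ns/op values copied from the "Before" and "After" benchmark runs.
var (
	before = []int{1068, 1082, 1050, 1038, 1034, 1045, 1071, 1033, 1032, 1070}
	after  = []int{1141, 1132, 1160, 1131, 1137, 1135, 1139, 1161, 1123, 1131}
)

// sum adds up all values in xs.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// mean returns the arithmetic mean of xs.
func mean(xs []int) float64 {
	return float64(sum(xs)) / float64(len(xs))
}

func main() {
	b, a := mean(before), mean(after)
	fmt.Printf("before: %.1f ns/op\n", b)
	fmt.Printf("after:  %.1f ns/op\n", a)
	fmt.Printf("delta:  %.1f ns/op (%.1f%%)\n", a-b, (a-b)/b*100)
}
```

This works out to means of about 1052 and 1139 ns/op, a difference of roughly 87 ns/op (around 8%).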

@thaJeztah
Member Author

Just cleaning up some old branches. I'm not 100% sure about this one: I wasn't sure how strict we wanted to be, or whether simplifying the code is worth the slightly reduced performance.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah force-pushed the simplify_unescape branch from b2d94c1 to a437d73 Compare July 24, 2024 08:07
Collaborator

@kolyshkin kolyshkin left a comment

This is interesting. I like the way it simplifies the code, and it seems the additional overhead is insignificant.

One small concern is that the current function interprets all escape sequences, while the new one is more limited. I need to take a look at a modern kernel; hopefully they haven't added more codes.

@kolyshkin
Collaborator

OK, we unescape the following fields:

  • (4) root: root of the mount within the filesystem
  • (5) mount point: mount point relative to the process's root
  • (9) filesystem type: name of filesystem of the form "type[.subtype]"
  • (10) mount source: filesystem specific information or "none"

In general, I would not rely on man pages in such a delicate subject. We have to drink raw and unfiltered C code from the kernel source tree.

It looks like mountinfo is shown by show_mountinfo. Let's only look at the fields we're interested in:

  • Root is shown via show_path -> seq_dentry -> mangle_path; the characters being escaped are \, \n, \t and (space). Note that some filesystems have their own implementation of show_path and may escape different characters, although I can't find any filesystems doing that in the vanilla kernel source tree.
  • Mount point is shown via seq_path_root -> mangle_path, same escape characters as above.
  • FS type is shown by show_type -> mangle, which escapes (space), \t, \n, \\ and #. NOTE the #.
  • Mount source (aka "device") is shown either via fs-specific show_devname function, or via mangle with same rules as for the FS type above (i.e. it also includes #).
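
To make the escaping side concrete, here is a small Go sketch of the behaviour described in the list above: each character in the escape set is written as a backslash followed by three octal digits. This is an illustration only, not a port of the kernel's mangle or mangle_path; the Go function name and signature are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// mangle writes every byte of s that appears in esc as \NNN (three octal
// digits) and copies all other bytes unchanged.
func mangle(s, esc string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if strings.IndexByte(esc, c) >= 0 {
			fmt.Fprintf(&b, `\%03o`, c)
		} else {
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	const pathSet = " \t\n\\"  // set described for root and mount point
	const typeSet = " \t\n\\#" // set described for fs type and mount source

	fmt.Println(mangle("/mnt/my disk#1", pathSet))
	fmt.Println(mangle("/mnt/my disk#1", typeSet))
}
```

With the second set, # is emitted as \043, which is the sequence an unescape table would need to recognize.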

Looking into fs-specific show_devname, I found this:

  • afs: no escaping
  • bcachefs: no escaping
  • btrfs: escapes " \t\n\\" (as usual)
  • nfs[4]: escapes " \t\n\\" (as usual)
  • cifs: escapes " \t" only

Now, this analysis might be wrong, and it is definitely incomplete, as there are filesystems whose source code is not in the vanilla kernel source tree (e.g. aufs), and they may have their own escaping rules.

At the very least, this patch should add # to the replace list.
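
A minimal sketch of what that change would mean, assuming the replace list is a strings.Replacer over three-digit octal escapes (the tables and variable names below are hypothetical, not taken from the patch):

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// withoutHash is an assumed table that does not know about \043.
	withoutHash = strings.NewReplacer(
		`\040`, " ", `\011`, "\t", `\012`, "\n", `\134`, `\`,
	)
	// withHash is the same table with \043 (octal for '#') added.
	withHash = strings.NewReplacer(
		`\040`, " ", `\011`, "\t", `\012`, "\n", `\134`, `\`, `\043`, "#",
	)
)

func main() {
	in := `my\043source` // hypothetical mount source containing an escaped '#'
	fmt.Println(withoutHash.Replace(in)) // the escape is left in place
	fmt.Println(withHash.Replace(in))    // the escape becomes '#'
}
```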

@kolyshkin
Collaborator

> should we take invalid escape sequences as literal strings? What does Linux itself do
> with these?

It looks like the kernel uses string_unescape(..., UNESCAPE_OCTAL). Apparently it just replaces any \NNN with the character whose octal value is NNN (NNN can be 1 to 3 digits long), with no error reporting (if it cannot convert something, it just leaves it as is).

Note though that the kernel always outputs 3 octal digits.
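
The lenient behaviour described above can be approximated in Go as follows. This is a sketch under the assumptions stated in this comment (1 to 3 octal digits, no errors, unconvertible input left as is); it is not a port of the kernel's string_unescape, and the function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

func isOctal(c byte) bool { return c >= '0' && c <= '7' }

// unescapeOctal replaces \N, \NN and \NNN with the byte of that octal value.
// A backslash that is not followed by an octal digit is copied unchanged.
func unescapeOctal(s string) string {
	if !strings.Contains(s, `\`) {
		return s // fast path: nothing to unescape
	}
	var b strings.Builder
	b.Grow(len(s))
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		// Collect up to three octal digits, stopping early if one more
		// digit could push the value past a single byte.
		v, n := 0, 0
		for n < 3 && i+1+n < len(s) && isOctal(s[i+1+n]) && v < 32 {
			v = v*8 + int(s[i+1+n]-'0')
			n++
		}
		if n == 0 {
			b.WriteByte('\\') // not an escape: keep the backslash literally
			continue
		}
		b.WriteByte(byte(v))
		i += n // skip the digits that were consumed
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", unescapeOctal(`/mnt/my\040disk`))
	fmt.Printf("%q\n", unescapeOctal(`type\043sub`))
	fmt.Printf("%q\n", unescapeOctal(`not\an\escape`))
}
```

Unlike a fixed replace list, this handles any octal code without needing to know the kernel's escape set in advance, at the cost of a hand-written loop.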

I will see if I can simplify my code.

@kolyshkin
Collaborator

I ended up with #144, PTAL

@thaJeztah
Member Author

> I like the way it simplifies the code, and it seems the additional overhead is insignificant.

Yes! I honestly don't recall where I found this approach, but I saw it somewhere and thought "if it doesn't add a lot of overhead, then it might be an option".

FWIW: I don't really mind the existing code (it's still fairly readable, and if it's for performance, sometimes it's ok to have more verbose code for that), but thought I'd give it a try.

> I ended up with #144, PTAL

Thx! Will give it a look 👍

@thaJeztah
Member Author

Closing in favour of #144

@thaJeztah thaJeztah closed this Sep 9, 2024
@thaJeztah thaJeztah deleted the simplify_unescape branch September 9, 2024 10:43