
basenc: perform faster, streaming encoding #6719

Open
wants to merge 4 commits into
base: main
from


andrewliebenow
Contributor

Improve the memory usage and speed of encoding in the basenc (except in --z85 mode), base32, and base64 programs.

These programs now encode in a buffered, streaming manner, so the amount of input they can encode is no longer limited by available memory.
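For readers unfamiliar with the approach, here is a minimal, std-only Rust sketch of what streaming base64 encoding with line wrapping can look like. It is not the code in this PR: the hand-rolled encoder, the names encode_into, read_full, write_wrapped, and encode_stream, and the 24 KiB chunk size are all hypothetical. It illustrates two ideas: every full input chunk has a length that is a multiple of 3, so '=' padding can only appear after the final chunk, and the current output column is carried across chunks, so wrapping stays correct at chunk boundaries.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical input chunk size: a multiple of 3, so '=' padding can only
/// be produced by the final (possibly short) chunk.
const CHUNK: usize = 3 * 8 * 1024;

/// Appends the standard (RFC 4648) base64 encoding of `input` to `out`.
fn encode_into(input: &[u8], out: &mut Vec<u8>) {
    for group in input.chunks(3) {
        let b1 = *group.get(1).unwrap_or(&0);
        let b2 = *group.get(2).unwrap_or(&0);
        let n = (u32::from(group[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(ALPHABET[(n >> 18) as usize & 63]);
        out.push(ALPHABET[(n >> 12) as usize & 63]);
        out.push(if group.len() > 1 { ALPHABET[(n >> 6) as usize & 63] } else { b'=' });
        out.push(if group.len() > 2 { ALPHABET[n as usize & 63] } else { b'=' });
    }
}

/// Fills `buf` as far as possible; returns fewer than `buf.len()` bytes only at EOF.
fn read_full<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            Ok(0) => break,
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(filled)
}

/// Writes `data`, inserting a newline every `wrap` columns (0 disables wrapping).
/// `column` carries the length of the current output line across calls.
fn write_wrapped<W: Write>(
    writer: &mut W,
    mut data: &[u8],
    wrap: usize,
    column: &mut usize,
) -> io::Result<()> {
    if wrap == 0 {
        return writer.write_all(data);
    }
    while !data.is_empty() {
        let take = (wrap - *column).min(data.len());
        writer.write_all(&data[..take])?;
        data = &data[take..];
        *column += take;
        if *column == wrap {
            writer.write_all(b"\n")?;
            *column = 0;
        }
    }
    Ok(())
}

/// Streams base64 from `reader` to `writer`. Memory use is bounded by CHUNK
/// (plus one encoded chunk), not by the size of the input.
fn encode_stream<R: Read, W: Write>(mut reader: R, mut writer: W, wrap: usize) -> io::Result<()> {
    let mut buf = vec![0u8; CHUNK];
    let mut encoded = Vec::with_capacity(CHUNK / 3 * 4);
    let mut column = 0;
    loop {
        let n = read_full(&mut reader, &mut buf)?;
        if n == 0 {
            break;
        }
        encoded.clear();
        encode_into(&buf[..n], &mut encoded);
        write_wrapped(&mut writer, &encoded, wrap, &mut column)?;
        if n < CHUNK {
            break; // a short read from read_full means EOF
        }
    }
    // This sketch ends a partial last line with a newline only when wrapping is on.
    if wrap > 0 && column > 0 {
        writer.write_all(b"\n")?;
    }
    writer.flush()
}
```

Because write_wrapped issues small writes for every output line, a caller would pass a buffered writer, for example BufWriter::new(io::stdout().lock()), together with io::stdin().lock() as the reader.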

@andrewliebenow
Contributor Author

Setup

❯ dd if=/dev/urandom of=/dev/shm/one-random-gibibyte bs=1024 count=1048576

❯ du -k /dev/shm/one-random-gibibyte
1048576 /dev/shm/one-random-gibibyte

❯ cargo build --bin coreutils --features base64 --no-default-features --profile release

No wrapping

New implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.70
Maximum resident set size (kbytes): 2716

❯ /usr/bin/time --verbose -- ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.56
        System time (seconds): 0.13
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.70
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2716
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 119
        Voluntary context switches: 1
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Existing implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.99
Maximum resident set size (kbytes): 2452180

❯ /usr/bin/time --verbose -- coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.61
        System time (seconds): 1.36
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.99
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2452180
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 611915
        Voluntary context switches: 1
        Involuntary context switches: 157
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
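A quick consistency check on the existing implementation's numbers, under the assumption that it holds the entire input and the entire encoded output in memory at the same time: base64 output is 4/3 the size of its input, and the sum of the two lands within about 5.4 MiB of the measured peak RSS. The minor page-fault count multiplied by the 4 KiB page size gives nearly the same figure.

```latex
\begin{aligned}
\text{input} &= 1\,048\,576\ \text{KiB} \\
\text{encoded output} &= \tfrac{4}{3} \times 1\,048\,576\ \text{KiB} \approx 1\,398\,101\ \text{KiB} \\
\text{input} + \text{output} &\approx 2\,446\,677\ \text{KiB} \quad (\text{measured peak RSS: } 2\,452\,180\ \text{KiB}) \\
611\,915\ \text{minor faults} \times 4\ \text{KiB} &= 2\,447\,660\ \text{KiB}
\end{aligned}
```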

GNU coreutils implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.72
Maximum resident set size (kbytes): 1996

❯ /usr/bin/time --verbose -- /usr/bin/base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "/usr/bin/base64 --wrap 0 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.61
        System time (seconds): 0.10
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.72
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1996
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 103
        Voluntary context switches: 2
        Involuntary context switches: 14
        Swaps: 0
        File system inputs: 16
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Default wrapping (76 characters)

New implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.40
Maximum resident set size (kbytes): 2592

❯ /usr/bin/time --verbose -- ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 3.22
        System time (seconds): 0.16
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.40
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2592
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 122
        Voluntary context switches: 1
        Involuntary context switches: 29
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Existing implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.04
Maximum resident set size (kbytes): 2452504

❯ /usr/bin/time --verbose -- coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "coreutils base64 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 3.21
        System time (seconds): 4.78
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.04
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2452504
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 611918
        Voluntary context switches: 1
        Involuntary context switches: 491
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

GNU coreutils implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.99
Maximum resident set size (kbytes): 1912

❯ /usr/bin/time --verbose -- /usr/bin/base64 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "/usr/bin/base64 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.79
        System time (seconds): 0.19
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.99
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1912
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 102
        Voluntary context switches: 1
        Involuntary context switches: 52
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@sylvestre
Sponsor Contributor

Please use hyperfine for benchmarking: time isn't reliable enough for measuring performance.

@andrewliebenow
Contributor Author

(I was having trouble getting poop to redirect output to /dev/null without using a shell script.)

benchmark.sh:

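#!/bin/sh

# -C: refuse to overwrite existing files with >; -e: exit on the first failing command;
# -f: disable pathname expansion; -u: treat unset variables as errors;
# -x: print each command before running it
set -C -e -f -u -x

# BASE_SIX_FOUR_BINARY is deliberately left unquoted so that a value such as
# 'coreutils base64' is split into a program name and its first argument
# (with -f set, the resulting words are not glob-expanded).
${BASE_SIX_FOUR_BINARY:?} \
	--wrap \
	"${BASE_SIX_FOUR_WRAP_ARGUMENT:?}" \
	-- \
	/dev/shm/one-random-gibibyte \
	1>/dev/null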

No wrapping

poop

New implementation

❯ BASE_SIX_FOUR_BINARY='./target/release/coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' poop --duration 10000 './benchmark.sh'
Benchmark 1 (14 runs): ./benchmark.sh
  measurement          mean ± σ            min … max           outliers
  wall_time           724ms ± 8.12ms     711ms …  736ms          0 ( 0%)        
  peak_rss           3.48MB ± 99.1KB    3.33MB … 3.65MB          0 ( 0%)        
  cpu_cycles         2.19G  ± 12.2M     2.18G  … 2.22G           0 ( 0%)        
  instructions       9.55G  ±  167      9.55G  … 9.55G           0 ( 0%)        
  cache_references   35.7M  ± 1.44M     33.8M  … 38.1M           0 ( 0%)        
  cache_misses        328K  ± 24.6K      306K  …  385K           0 ( 0%)        
  branch_misses      2.33M  ± 3.70K     2.32M  … 2.33M           0 ( 0%)        

Existing implementation

❯ BASE_SIX_FOUR_BINARY='coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' poop --duration 10000 './benchmark.sh' 
Benchmark 1 (5 runs): ./benchmark.sh
  measurement          mean ± σ            min … max           outliers
  wall_time          2.01s  ± 66.5ms    1.95s  … 2.13s           0 ( 0%)        
  peak_rss           2.51GB ±  108KB    2.51GB … 2.51GB          0 ( 0%)        
  cpu_cycles         2.38G  ± 27.6M     2.36G  … 2.42G           0 ( 0%)        
  instructions       9.40G  ±  370      9.40G  … 9.40G           0 ( 0%)        
  cache_references    116M  ±  320K      115M  …  116M           0 ( 0%)        
  cache_misses       1.44M  ± 46.4K     1.40M  … 1.51M           0 ( 0%)        
  branch_misses       372K  ±  294       372K  …  373K           0 ( 0%)        

hyperfine

New implementation

❯ BASE_SIX_FOUR_BINARY='./target/release/coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' hyperfine -- './benchmark.sh'         
Benchmark 1: ./benchmark.sh
  Time (mean ± σ):     714.1 ms ±  11.2 ms    [User: 535.9 ms, System: 172.8 ms]
  Range (min … max):   699.6 ms … 731.7 ms    10 runs

Existing implementation

❯ BASE_SIX_FOUR_BINARY='coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' hyperfine -- './benchmark.sh'        
Benchmark 1: ./benchmark.sh
  Time (mean ± σ):      1.975 s ±  0.042 s    [User: 0.565 s, System: 1.396 s]
  Range (min … max):    1.929 s …  2.084 s    10 runs

@sylvestre
Sponsor Contributor

You should call hyperfine once with both the old and the new implementation; it will compare the results.

@andrewliebenow
Contributor Author

No wrapping

❯ hyperfine ' ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null ' ' coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null '
Benchmark 1:  ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):     733.6 ms ±  21.8 ms    [User: 545.9 ms, System: 182.4 ms]
  Range (min … max):   710.7 ms … 772.3 ms    10 runs
 
Benchmark 2:  coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      2.051 s ±  0.022 s    [User: 0.565 s, System: 1.469 s]
  Range (min … max):    2.013 s …  2.082 s    10 runs
 
Summary
   ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null  ran
    2.80 ± 0.09 times faster than  coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null 

Default wrapping (76 characters)

❯ hyperfine ' ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null ' ' coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null '           
Benchmark 1:  ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      3.288 s ±  0.157 s    [User: 3.093 s, System: 0.177 s]
  Range (min … max):    3.023 s …  3.461 s    10 runs
 
Benchmark 2:  coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      8.115 s ±  0.067 s    [User: 3.234 s, System: 4.831 s]
  Range (min … max):    8.006 s …  8.220 s    10 runs
 
Summary
   ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null  ran
    2.47 ± 0.12 times faster than  coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null 
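The "times faster" figures in the two summaries can be reproduced from the reported means and standard deviations. The ± values match first-order error propagation for a ratio of two independent measurements; this assumes that is how hyperfine derives them, which the numbers below agree with to the reported precision.

```latex
\begin{aligned}
r &= \frac{\mu_{\text{old}}}{\mu_{\text{new}}}, \qquad
\sigma_r = r \sqrt{\left(\frac{\sigma_{\text{new}}}{\mu_{\text{new}}}\right)^2 + \left(\frac{\sigma_{\text{old}}}{\mu_{\text{old}}}\right)^2} \\[4pt]
\text{No wrapping: } r &= \frac{2.051}{0.7336} \approx 2.80, \qquad
\sigma_r \approx 2.80 \sqrt{0.0297^2 + 0.0107^2} \approx 0.09 \\[4pt]
\text{Default wrapping: } r &= \frac{8.115}{3.288} \approx 2.47, \qquad
\sigma_r \approx 2.47 \sqrt{0.0478^2 + 0.0083^2} \approx 0.12
\end{aligned}
```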

