Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new Base64 encoder and decoder #24285

Closed
wants to merge 12 commits into from
Closed

Conversation

bicycle1885
Copy link
Member

This is a rewrite of the current base64 encoder and decoder tools. The implementation is placed in the stdlib directory so that we can deprecate the old tools as per #24222. This pull request will finally deprecate the current tools but it is not yet done.

The performance is also improved, really, dramatically, though there is still more room for fine-tuning:

using BenchmarkTools
import Base64

long = read("doc/src/assets/logo.png")
long_b64 = Base.base64encode(long)
e1_long = run(tune!(@benchmarkable Base.base64encode(long)))
e2_long = run(tune!(@benchmarkable Base64.base64encode(long)))
d1_long = run(tune!(@benchmarkable Base.base64decode(long_b64)))
d2_long = run(tune!(@benchmarkable Base64.base64decode(long_b64)))

short = long[1:10]
short_b64 = Base.base64encode(short)
e1_short = run(tune!(@benchmarkable Base.base64encode(short)))
e2_short = run(tune!(@benchmarkable Base64.base64encode(short)))
d1_short = run(tune!(@benchmarkable Base.base64decode(short_b64)))
d2_short = run(tune!(@benchmarkable Base64.base64decode(short_b64)))
~/v/julia (base64|…) $ ./julia --compiled-modules=no --depwarn=no -L benchmark.jl
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.2272 (2017-10-23 10:31 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit c4798ae1d3* (0 days old master)
|__/                   |  x86_64-apple-darwin16.7.0

julia> e1_long
BenchmarkTools.Trial:
  memory estimate:  34.83 KiB
  allocs estimate:  26
  --------------
  minimum time:     669.348 μs (0.00% GC)
  median time:      672.605 μs (0.00% GC)
  mean time:        719.194 μs (0.94% GC)
  maximum time:     32.348 ms (97.74% GC)
  --------------
  samples:          6940
  evals/sample:     1

julia> e2_long
BenchmarkTools.Trial:
  memory estimate:  36.95 KiB
  allocs estimate:  161
  --------------
  minimum time:     42.240 μs (0.00% GC)
  median time:      44.704 μs (0.00% GC)
  mean time:        56.495 μs (12.98% GC)
  maximum time:     33.658 ms (99.63% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> ratio(median(e1_long), median(e2_long))
BenchmarkTools.TrialRatio:
  time:             15.045734162491053
  gctime:           1.0
  memory:           0.9424947145877378
  allocs:           0.16149068322981366

julia> d1_long
BenchmarkTools.Trial:
  memory estimate:  64.30 KiB
  allocs estimate:  15
  --------------
  minimum time:     8.401 ms (0.00% GC)
  median time:      8.635 ms (0.00% GC)
  mean time:        9.018 ms (0.17% GC)
  maximum time:     20.998 ms (0.00% GC)
  --------------
  samples:          555
  evals/sample:     1

julia> d2_long
BenchmarkTools.Trial:
  memory estimate:  76.56 KiB
  allocs estimate:  652
  --------------
  minimum time:     41.063 μs (0.00% GC)
  median time:      45.289 μs (0.00% GC)
  mean time:        63.643 μs (18.58% GC)
  maximum time:     31.145 ms (99.73% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> ratio(median(d1_long), median(d2_long))
BenchmarkTools.TrialRatio:
  time:             190.66942668830524
  gctime:           1.0
  memory:           0.8397959183673469
  allocs:           0.023006134969325152

julia> e1_short
BenchmarkTools.Trial:
  memory estimate:  512 bytes
  allocs estimate:  14
  --------------
  minimum time:     880.289 ns (0.00% GC)
  median time:      974.811 ns (0.00% GC)
  mean time:        1.578 μs (29.80% GC)
  maximum time:     1.633 ms (90.98% GC)
  --------------
  samples:          10000
  evals/sample:     45

julia> e2_short
BenchmarkTools.Trial:
  memory estimate:  1.02 KiB
  allocs estimate:  12
  --------------
  minimum time:     749.547 ns (0.00% GC)
  median time:      966.523 ns (0.00% GC)
  mean time:        1.556 μs (29.33% GC)
  maximum time:     2.423 ms (91.29% GC)
  --------------
  samples:          10000
  evals/sample:     86

julia> ratio(median(e1_short), median(e2_short))
BenchmarkTools.TrialRatio:
  time:             1.0085749155514918
  gctime:           1.0
  memory:           0.49230769230769234
  allocs:           1.1666666666666667

julia> d1_short
BenchmarkTools.Trial:
  memory estimate:  1.67 KiB
  allocs estimate:  10
  --------------
  minimum time:     5.968 μs (0.00% GC)
  median time:      6.419 μs (0.00% GC)
  mean time:        8.133 μs (14.06% GC)
  maximum time:     7.598 ms (98.35% GC)
  --------------
  samples:          10000
  evals/sample:     5

julia> d2_short
BenchmarkTools.Trial:
  memory estimate:  2.73 KiB
  allocs estimate:  44
  --------------
  minimum time:     1.161 μs (0.00% GC)
  median time:      1.371 μs (0.00% GC)
  mean time:        3.179 μs (50.93% GC)
  maximum time:     5.862 ms (96.38% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> ratio(median(d1_short), median(d2_short))
BenchmarkTools.TrialRatio:
  time:             4.683400218898212
  gctime:           1.0
  memory:           0.6114285714285714
  allocs:           0.22727272727272727

Some open problems I found while writing it:

  • Do we still need Base64 or base64 prefixes? They are now under the Base64 module so their prefixes are almost redundant.
  • Should padding be optional? Currently, padding is optional but rfc4648 states "Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.". For example, the base64 module of Python 3.6.1 requires padding.
  • The current implementation returns funny results for malformed base64 inputs. For example,
julia> Base.base64decode("\0\0")
1-element Array{UInt8,1}:
 0x00

julia> Base64.base64decode("\0\0")
0-element Array{UInt8,1}

I believe this is a bug and should not be retained in the new implementation.

@JeffBezanson
Copy link
Member

Very nice!!

Also needs entries in doc/make.jl.

@ararslan ararslan added io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster stdlib Julia's standard library labels Oct 23, 2017
@bicycle1885
Copy link
Member Author

Markdown's rich renderer indirectly depends on the base64 encoding function here:

print(io, """<img src="data:image/png;base64,""")
print(io, stringmime(m, img))
. So, it may be impossible to separate the Base64 module from Base without breaking this. Does anybody know a workaround for this problem?

@TotalVerb
Copy link
Contributor

Move Markdown to stdlib too?

@bicycle1885
Copy link
Member Author

That sounds unlikely to me since Markdown is tightly bounded to the documentation system. But some rich renderers like this may be separable from Base.

@stevengj
Copy link
Member

Maybe keep the new implementation in Base for now, then we can work out how/whether to refactor it in a separate PR.

@StefanKarpinski
Copy link
Member

Since @bicycle1885 has already done most of the work to move it out – which is in my view the more important change – why not just carry on until we're done here? We can change Base64 implementations any time but moving the API out of Base needs to happen as soon as possible.

@stevengj
Copy link
Member

stevengj commented Oct 26, 2017

@StefanKarpinski, how do you propose to move it out if markdown rendering depends on stringmime (for images in HTML output)?

@StefanKarpinski
Copy link
Member

We should move Markdown into stdlib as well and Markdown can depend on Base64.

@stevengj
Copy link
Member

stevengj commented Oct 26, 2017

@StefanKarpinski, but isn't at least the Markdown.MD type required in Base, to define the docstrings? I guess you could put the rendering part into a stdlib package, but then that package would be perpetrating type piracy...

@JeffBezanson
Copy link
Member

There usually isn't anything wrong with type piracy. I haven't thought much about the best way to factor this particular code, but the type piracy in this case seems harmless.

@bicycle1885
Copy link
Member Author

I don't know when parsing of docstrings happens, but if it is possible to parse them lazily, it might be possible to move Markdown (and REPL?) into stdlib. If the goal of this excision is to cut down the size of Base, moving Markdown and REPL would be a great deal.

@JeffBezanson
Copy link
Member

Started looking at this a bit. I think we should separate the Docs module into code for writing doc strings vs. code for searching and displaying doc strings --- similar to how the code for building HTML manuals is a separate package (Documenter). Doc strings are already parsed lazily, we just need to better separate the code that uses the parsed representation. Then it would be easy to move Markdown and Base64 out of Base.

@JeffBezanson
Copy link
Member

#24361 should be a version of this that can be merged immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster stdlib Julia's standard library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants