new Base64 encoder and decoder #24285

bicycle1885 · 2017-10-23T12:33:21Z

This is a rewrite of the current base64 encoder and decoder tools. The implementation is placed in the stdlib directory so that we can deprecate the old tools as per #24222. This pull request will finally deprecate the current tools but it is not yet done.

The performance is also improved, really, dramatically, though there is still more room for fine-tuning:

using BenchmarkTools
import Base64

long = read("doc/src/assets/logo.png")
long_b64 = Base.base64encode(long)
e1_long = run(tune!(@benchmarkable Base.base64encode(long)))
e2_long = run(tune!(@benchmarkable Base64.base64encode(long)))
d1_long = run(tune!(@benchmarkable Base.base64decode(long_b64)))
d2_long = run(tune!(@benchmarkable Base64.base64decode(long_b64)))

short = long[1:10]
short_b64 = Base.base64encode(short)
e1_short = run(tune!(@benchmarkable Base.base64encode(short)))
e2_short = run(tune!(@benchmarkable Base64.base64encode(short)))
d1_short = run(tune!(@benchmarkable Base.base64decode(short_b64)))
d2_short = run(tune!(@benchmarkable Base64.base64decode(short_b64)))

~/v/julia (base64|…) $ ./julia --compiled-modules=no --depwarn=no -L benchmark.jl
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.2272 (2017-10-23 10:31 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit c4798ae1d3* (0 days old master)
|__/                   |  x86_64-apple-darwin16.7.0

julia> e1_long
BenchmarkTools.Trial:
  memory estimate:  34.83 KiB
  allocs estimate:  26
  --------------
  minimum time:     669.348 μs (0.00% GC)
  median time:      672.605 μs (0.00% GC)
  mean time:        719.194 μs (0.94% GC)
  maximum time:     32.348 ms (97.74% GC)
  --------------
  samples:          6940
  evals/sample:     1

julia> e2_long
BenchmarkTools.Trial:
  memory estimate:  36.95 KiB
  allocs estimate:  161
  --------------
  minimum time:     42.240 μs (0.00% GC)
  median time:      44.704 μs (0.00% GC)
  mean time:        56.495 μs (12.98% GC)
  maximum time:     33.658 ms (99.63% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> ratio(median(e1_long), median(e2_long))
BenchmarkTools.TrialRatio:
  time:             15.045734162491053
  gctime:           1.0
  memory:           0.9424947145877378
  allocs:           0.16149068322981366

julia> d1_long
BenchmarkTools.Trial:
  memory estimate:  64.30 KiB
  allocs estimate:  15
  --------------
  minimum time:     8.401 ms (0.00% GC)
  median time:      8.635 ms (0.00% GC)
  mean time:        9.018 ms (0.17% GC)
  maximum time:     20.998 ms (0.00% GC)
  --------------
  samples:          555
  evals/sample:     1

julia> d2_long
BenchmarkTools.Trial:
  memory estimate:  76.56 KiB
  allocs estimate:  652
  --------------
  minimum time:     41.063 μs (0.00% GC)
  median time:      45.289 μs (0.00% GC)
  mean time:        63.643 μs (18.58% GC)
  maximum time:     31.145 ms (99.73% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> ratio(median(d1_long), median(d2_long))
BenchmarkTools.TrialRatio:
  time:             190.66942668830524
  gctime:           1.0
  memory:           0.8397959183673469
  allocs:           0.023006134969325152

julia> e1_short
BenchmarkTools.Trial:
  memory estimate:  512 bytes
  allocs estimate:  14
  --------------
  minimum time:     880.289 ns (0.00% GC)
  median time:      974.811 ns (0.00% GC)
  mean time:        1.578 μs (29.80% GC)
  maximum time:     1.633 ms (90.98% GC)
  --------------
  samples:          10000
  evals/sample:     45

julia> e2_short
BenchmarkTools.Trial:
  memory estimate:  1.02 KiB
  allocs estimate:  12
  --------------
  minimum time:     749.547 ns (0.00% GC)
  median time:      966.523 ns (0.00% GC)
  mean time:        1.556 μs (29.33% GC)
  maximum time:     2.423 ms (91.29% GC)
  --------------
  samples:          10000
  evals/sample:     86

julia> ratio(median(e1_short), median(e2_short))
BenchmarkTools.TrialRatio:
  time:             1.0085749155514918
  gctime:           1.0
  memory:           0.49230769230769234
  allocs:           1.1666666666666667

julia> d1_short
BenchmarkTools.Trial:
  memory estimate:  1.67 KiB
  allocs estimate:  10
  --------------
  minimum time:     5.968 μs (0.00% GC)
  median time:      6.419 μs (0.00% GC)
  mean time:        8.133 μs (14.06% GC)
  maximum time:     7.598 ms (98.35% GC)
  --------------
  samples:          10000
  evals/sample:     5

julia> d2_short
BenchmarkTools.Trial:
  memory estimate:  2.73 KiB
  allocs estimate:  44
  --------------
  minimum time:     1.161 μs (0.00% GC)
  median time:      1.371 μs (0.00% GC)
  mean time:        3.179 μs (50.93% GC)
  maximum time:     5.862 ms (96.38% GC)
  --------------
  samples:          10000
  evals/sample:     10

julia> ratio(median(d1_short), median(d2_short))
BenchmarkTools.TrialRatio:
  time:             4.683400218898212
  gctime:           1.0
  memory:           0.6114285714285714
  allocs:           0.22727272727272727

Some open problems I found while writing it:

Do we still need Base64 or base64 prefixes? They are now under the Base64 module so their prefixes are almost redundant.
Should padding be optional? Currently, padding is optional but rfc4648 states "Implementations MUST include appropriate pad characters at the end of encoded data unless the specification referring to this document explicitly states otherwise.". For example, the base64 module of Python 3.6.1 requires padding.
The current implementation returns funny results for malformed base64 inputs. For example,

julia> Base.base64decode("\0\0")
1-element Array{UInt8,1}:
 0x00

julia> Base64.base64decode("\0\0")
0-element Array{UInt8,1}

I believe this is a bug and should not be retained in the new implementation.

JeffBezanson · 2017-10-23T15:00:30Z

Very nice!!

Also needs entries in doc/make.jl.

bicycle1885 · 2017-10-24T02:29:21Z

Markdown's rich renderer indirectly depends on the base64 encoding function here:

julia/base/markdown/render/rich.jl

Lines 12 to 13 in d3afdfd

    
           print(io, """<img src="data:image/png;base64,""") 
        
           print(io, stringmime(m, img))

. So, it may be impossible to separate the Base64 module from Base without breaking this. Does anybody know a workaround for this problem?

TotalVerb · 2017-10-24T02:46:02Z

Move Markdown to stdlib too?

bicycle1885 · 2017-10-24T03:00:54Z

That sounds unlikely to me since Markdown is tightly bounded to the documentation system. But some rich renderers like this may be separable from Base.

stevengj · 2017-10-26T14:36:15Z

Maybe keep the new implementation in Base for now, then we can work out how/whether to refactor it in a separate PR.

StefanKarpinski · 2017-10-26T15:41:57Z

Since @bicycle1885 has already done most of the work to move it out – which is in my view the more important change – why not just carry on until we're done here? We can change Base64 implementations any time but moving the API out of Base needs to happen as soon as possible.

stevengj · 2017-10-26T16:40:22Z

@StefanKarpinski, how do you propose to move it out if markdown rendering depends on stringmime (for images in HTML output)?

StefanKarpinski · 2017-10-26T17:00:26Z

We should move Markdown into stdlib as well and Markdown can depend on Base64.

stevengj · 2017-10-26T19:47:49Z

@StefanKarpinski, but isn't at least the Markdown.MD type required in Base, to define the docstrings? I guess you could put the rendering part into a stdlib package, but then that package would be perpetrating type piracy...

JeffBezanson · 2017-10-26T20:55:59Z

There usually isn't anything wrong with type piracy. I haven't thought much about the best way to factor this particular code, but the type piracy in this case seems harmless.

bicycle1885 · 2017-10-27T01:53:33Z

I don't know when parsing of docstrings happens, but if it is possible to parse them lazily, it might be possible to move Markdown (and REPL?) into stdlib. If the goal of this excision is to cut down the size of Base, moving Markdown and REPL would be a great deal.

JeffBezanson · 2017-10-27T14:50:54Z

Started looking at this a bit. I think we should separate the Docs module into code for writing doc strings vs. code for searching and displaying doc strings --- similar to how the code for building HTML manuals is a separate package (Documenter). Doc strings are already parsed lazily, we just need to better separate the code that uses the parsed representation. Then it would be easy to move Markdown and Base64 out of Base.

JeffBezanson · 2017-10-27T17:27:27Z

#24361 should be a version of this that can be merged immediately.

bicycle1885 added 9 commits October 23, 2017 19:31

add Base64 module

aece31d

separate files

61cd43c

add docs

04037ee

formatting

c4798ae

fix type

0b39e7b

circumvent inference problem

ae34156

clean up

80e85ec

fix

4dfe963

fix2

3e546f9

bicycle1885 added 3 commits October 24, 2017 02:13

fix doc/make.jl

a63435c

temporarily unexport names from Base64 to build docs

9d69b43

fix doc/make.jl

9d46c87

ararslan added io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster stdlib Julia's standard library labels Oct 23, 2017

JeffBezanson mentioned this pull request Oct 27, 2017

replace Base.Base64 with stdlib Base64 module #24361

Merged

JeffBezanson closed this Oct 27, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new Base64 encoder and decoder #24285

new Base64 encoder and decoder #24285

bicycle1885 commented Oct 23, 2017

JeffBezanson commented Oct 23, 2017

bicycle1885 commented Oct 24, 2017

TotalVerb commented Oct 24, 2017

bicycle1885 commented Oct 24, 2017

stevengj commented Oct 26, 2017

StefanKarpinski commented Oct 26, 2017

stevengj commented Oct 26, 2017 •

edited

Loading

StefanKarpinski commented Oct 26, 2017

stevengj commented Oct 26, 2017 •

edited

Loading

JeffBezanson commented Oct 26, 2017

bicycle1885 commented Oct 27, 2017

JeffBezanson commented Oct 27, 2017

JeffBezanson commented Oct 27, 2017

new Base64 encoder and decoder #24285

new Base64 encoder and decoder #24285

Conversation

bicycle1885 commented Oct 23, 2017

JeffBezanson commented Oct 23, 2017

bicycle1885 commented Oct 24, 2017

TotalVerb commented Oct 24, 2017

bicycle1885 commented Oct 24, 2017

stevengj commented Oct 26, 2017

StefanKarpinski commented Oct 26, 2017

stevengj commented Oct 26, 2017 • edited Loading

StefanKarpinski commented Oct 26, 2017

stevengj commented Oct 26, 2017 • edited Loading

JeffBezanson commented Oct 26, 2017

bicycle1885 commented Oct 27, 2017

JeffBezanson commented Oct 27, 2017

JeffBezanson commented Oct 27, 2017

stevengj commented Oct 26, 2017 •

edited

Loading

stevengj commented Oct 26, 2017 •

edited

Loading