Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer.write(buffer, 'base64') #11866

Closed
jorangreef opened this issue Mar 15, 2017 · 16 comments
Closed

buffer.write(buffer, 'base64') #11866

jorangreef opened this issue Mar 15, 2017 · 16 comments
Labels
buffer Issues and PRs related to the buffer subsystem. question Issues that look for answers.

Comments

@jorangreef
Copy link
Contributor

jorangreef commented Mar 15, 2017

What's the fastest way to convert a buffer containing base64-encoded data to a buffer containing binary data?

Something like this?

var base64Buffer = ...; // e.g. A buffer containing base64 provided by a mime decoder.
var binaryBuffer = new Buffer(base64Buffer.toString('ascii'), 'base64')

Is there a way to write the base64 buffer directly into another buffer, without creating any interim strings?

I thought of binaryBuffer.write(base64Buffer, 'base64') but write() only accepts strings.

@mscdex mscdex added buffer Issues and PRs related to the buffer subsystem. question Issues that look for answers. labels Mar 15, 2017
@seishun
Copy link
Contributor

seishun commented Mar 15, 2017

@jorangreef There's always going to be an interim string, since you have to first "decode" ASCII by going from Buffer to string, then decode base64 by going from string to Buffer.

Your first snippet is fine, but I'd recommend using the new API: Buffer.from(base64Buffer.toString(), 'base64')

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2017

@seishun I think he means without creating a V8 String object. It's entirely possible, it's just not something that's currently available AFAIK.

@targos
Copy link
Member

targos commented Mar 15, 2017

This is partly available in the buffer.transcode API, but base64 encoding is not supported.

@seishun
Copy link
Contributor

seishun commented Mar 15, 2017

This is partly available in the buffer.transcode API, but base64 encoding is not supported.

Because it uses ICU, and base64 isn't a way to encode text as binary data, it's the other way around. IMO supporting it in buffer.transcode would be kind of confusing.

@jasnell
Copy link
Member

jasnell commented Mar 15, 2017

buffer.transcode() intentionally only deals with efficient conversion of char sets. It does not deal with transformations like hex or base64 and never should.

@jorangreef
Copy link
Contributor Author

Your first snippet is fine, but I'd recommend using the new API: Buffer.from(base64Buffer.toString(), 'base64')

@seishun thanks, using the constructor was a typo from habit.

I think he means without creating a V8 String object. It's entirely possible, it's just not something that's currently available AFAIK.

@mscdex yes, that's what I was getting at, avoiding the String allocation. It should be possible, but not provided by the api currently.

buffer.transcode() intentionally only deals with efficient conversion of char sets. It does not deal with transformations like hex or base64 and never should.

@jasnell agreed, hence the suggestion for buffer.write(buffer, 'base64') or another alternative. I don't think transcode is the way to do it.

@seishun
Copy link
Contributor

seishun commented Mar 16, 2017

buffer.write(buffer, 'base64') would be pretty confusing too. You're writing a buffer into a buffer, yet there's an encoding argument.

@jasnell
Copy link
Member

jasnell commented Mar 16, 2017

to be honest, I'd like to see a revision to Hex and Base64 support in the APIs because things are fairly inconsistent as they are now (e.g. Buffer.from([...]).toString('utf8') does not actually convert the bytes to utf8 but Buffer.from[...]).toString('hex') does convert to hex).

What would be more correct would be something like:

Buffer.from([...]).encode('base64');
Buffer.from([...]).encode('hex');

Unfortunately toString('base64') is one of the most common uses of Buffer in the ecosystem so this is likely something that can never actually change.

@addaleax
Copy link
Member

I agree, it’s been bugging me since I started node that toString('utf8') decodes from UTF-8 and toString('hex') encodes as Hex. If somebody has a good idea for how to tackle that wrt our API I’d love to see it. :)

@seishun
Copy link
Contributor

seishun commented Mar 16, 2017

I agree it's confusing, but on the other hand, toString(enc) just says "convert to string using enc" which makes sense for both hex and UTF-8.

@jasnell
Copy link
Member

jasnell commented Mar 16, 2017

@addaleax ... A require('buffer').encode(buf, 'hex') and require('buffer').encode(buf, 'base64') could easily deal with the discrepancy and would pair nicely with require('buffer').transcode(). The return value would be another Buffer with the encoded bytes. Introducing a new API that makes more sense is not really the difficult issue, however. It's transitioning people off the existing buf.toString('hex') and buf.toString('base64') that's the real difficulty.

Buffer creation is equally problematic. Buffer.from(bytes, 'utf8') and Buffer.from(bytes, 'hex') are quite different from one another. The inconsistency is irritating and better APIs could certainly help but getting userland to actually use those improved APIs has been a huge difficulty.

@seishun
Copy link
Contributor

seishun commented Mar 16, 2017

@jasnell I'd be against adding new APIs unless we are also planning to phase out the old ones, and we know well that it's not as easy as we'd wish. And the existing API isn't exactly bad either.

@jasnell
Copy link
Member

jasnell commented Mar 16, 2017

Yep, which is why I haven't pursued it. The current API is not painful enough to justify the level of disruption changing it would cause.

@addaleax
Copy link
Member

Can this be closed as an answered question? All the ways currently exposed by Node use an temporary string, and that makes sense, because as @seishun explained this question is really asking about base64+ascii “double” encoding.

@seishun
Copy link
Contributor

seishun commented Mar 20, 2017

Indeed, closing as an answered question.

@jorangreef
Copy link
Contributor Author

In case anyone has an issue with the lack of buffer-to-buffer Base64 encoding/decoding support in Node, I just finished https://github.com/ronomon/base64 as a stop-gap until these and other base64 issues are addressed. The encoder/decoder accepts and returns buffers without allocating any interim V8 Strings. The throughput gain from avoiding unnecessary string allocation is substantial over 4 MB+ buffers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. question Issues that look for answers.
Projects
None yet
Development

No branches or pull requests

6 participants