Skip to content
This repository has been archived by the owner on Jul 31, 2018. It is now read-only.

Buffer.from/Buffer.alloc/Buffer.allocUnsafe/Buffer() soft-deprecate #7

Closed
wants to merge 1 commit into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Jan 25, 2016

Add the Buffer.from/alloc/allocUnsafe EPS (correctly this time, I hope)

### Buffer.alloc(size[, fill])

Allocates a new initialized buffer of `size` bytes. If `fill` is specified
and is a string, the Buffer is allocated off the shared pool by calling:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and is a string

I would expect it to work as Buffer(size).fill(fill) does, in all cases (everything else is implementation details), except for the cases where .fill() doesn't work (e.g. .fill('') doesn't fill).

For example, I would expect Buffer.alloc(2, 10) to give the same result as Buffer(2).fill(10)<Buffer 0a 0a>.

Also, is Buffer.alloc(1e8) really allocated from the pool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is Buffer.alloc(1e8) really allocated from the pool?

Allocations are of size Buffer.poolSize >> 1 or smaller. So it could, depending on what the user sets poolSize to. But by default no.

@jasnell Since we're here, may want to add Buffer.alloc(size[, fill[, encoding]]) so the encoding of the string can be specified. I should have done this on initial implementation but didn't think about it then.

Also, if alloc() is used then I suggest we also don't use a pool. Since other Buffer's contents can be accessed via buffer.buffer it might be considered a security violation, and the whole point of this thing is to not do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevnorris ... +1 SGTM
On Jan 25, 2016 4:27 PM, "Trevor Norris" notifications@github.com wrote:

In XXX-buffer-alloc-from.md
#7 (comment):

+* Soft-Deprecate the existing Buffer() constructors and introduce new

  • Buffer.from(), Buffer.alloc() and Buffer.allocUnsafe() factory methods
  • to address Buffer API usability and security concerns.
    +* Add --zero-fill-buffers command line option to node to force all Buffers
  • to zero-fill when created

+## Details
+
+The details of the proposal are captured by [Pull Request 4682][] in the
+nodejs/node repository (https://github.com/jasnell/node/tree/buffer-safe).
+
+### Buffer.alloc(size[, fill])
+
+Allocates a new initialized buffer of size bytes. If fill is specified
+and is a string, the Buffer is allocated off the shared pool by calling:

Also, is Buffer.alloc(1e8) really allocated from the pool?

Allocations are of size Buffer.poolSize >> 1 or smaller. So it could,
depending on what the user sets poolSize to. But by default no.

@jasnell https://github.com/jasnell Since we're here, may want to add Buffer.alloc(size[,
fill[, encoding]]) so the encoding of the string can be specified. I
should have done this on initial implementation but didn't think about it
then.

Also, if alloc() is used then I suggest we also don't use a pool. Since
other Buffer's contents can be accessed via buffer.buffer it might be
considered a security violation, and the whole point of this thing is to
not do that.


Reply to this email directly or view it on GitHub
https://github.com/nodejs/node-eps/pull/7/files#r50779681.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevnorris

But by default no.

I made a mistype, was talking about Buffer.alloc(1e8, 'a'). If that's not allocated off the shared pool, then the text here should be changed:

If fill is specified and is a string, the Buffer is allocated off the shared pool by calling

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh... yep, I keep forgetting the little nuances there of what is pooled vs. what isn't. I'll get the docs updated properly in the next round :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trevnorris ... buf.fill() does not currently accept an encoding argument does it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell it doesn't. i'm working on that patch right now to fix the API hole.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to echo @ChALkeR's point that buf.fill(val) currently accepts number values and this API should do that too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the proposal PR has been updated to account for that. For
completeness, @trevnorris and I are also working on allowing both fill()
and alloc(size, fill) to accept a Buffer and to accept an encoding
when a string is passed in, e.g:

// these are silly examples, but you should get the idea:
Buffer.alloc(10, 'foo', 'ascii');
Buffer.alloc(10, Buffer.from('foo'));
Buffer.alloc(10).fill('foo', 'ascii');
Buffer.alloc(10).fill(Buffer.from('foo'));

@ChALkeR
Copy link
Member

ChALkeR commented Jan 26, 2016

+1 on this, but the text should be clarified, details are a bit off.


```
Buffer.allocUnsafe(size).fill(fill);
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. Instead a simple call to new Uint8Array(), without setting the kNoZeroFill flag, and simply setting the prototype of the uint8array to Buffer.prototype, using Object.setPrototypeOf().

EDIT: nm. read this out of context. the above simply states the allocation mechanics. which is also stated below.

@trevnorris
Copy link
Contributor

@jasnell Thanks much for working on this. Nice to see a simple EPS that's to the point.

@rvagg
Copy link
Member

rvagg commented Jan 26, 2016

I still object to the word "unsafe" here ("safe" as well) but if I'm the only one left pushing this barrow I won't hold up progress with it.

@seishun
Copy link

seishun commented Jan 26, 2016

I'm also still against the word "unsafe". Either call the functions according to what they do without subjective assessments, or just don't provide the non-zero-filling version at all.

@ChALkeR
Copy link
Member

ChALkeR commented Jan 26, 2016

@seishun That's pretty much what it does, as @joepie91 noted here: #4 (comment), #4 (comment), and all over that thread.

Also, from my point of view the technical aspect doesn't really matter that much — what matters is how that would affect the ecosystem security, and explicitly marking at a Unsafe will surely have a positive effect on that.

@ChALkeR
Copy link
Member

ChALkeR commented Jan 26, 2016

/cc @evilpacket, @feross, @mafintosh.

@trevnorris
Copy link
Contributor

@ChALkeR Accept that there are those who disagree that Unsafe is a proper technical description. Comments like:

An unitialized memory allocation is always unsafe from a memory safety point of view.

are fundamentally opinion of a developer's aptitude and don't have technical merit. I've agreed to proceed with using Unsafe, despite the fact agree with @rvagg and @seishun that it's an incorrect description of the operation, because it shouldn't be holding up the API document as a whole. Maybe I'll fight it more once the document gets closer to completion, but in the mean time want to proceed with getting real work done.

@jasnell
Copy link
Member Author

jasnell commented Jan 27, 2016

At this point re: the naming, the focus needs to be primarily on refining the actual changes and documentation as opposed to bikeshedding on the name. I don't think we're going to identify a single name that will make everyone happy so at some point we're going to need a compromise on it.

@feross
Copy link

feross commented Jan 28, 2016

Apologies if I've missed this discussion from another thread, but what's the rationale for accepting an optional fill value to Buffer.alloc?

This seems clearer and simpler:

Buffer.alloc(5).fill('hello')

than this:

Buffer.alloc(5, 'hello')

And it keeps the API surface area a bit smaller and simpler.

Is there some kind of optimization we can do at the C++ level if we know the fill string?

@jasnell
Copy link
Member Author

jasnell commented Jan 28, 2016

I'm working on seeing if we can optimize it down to a single step. We may not be able to or it may not be worth the trouble -- in which case we should likely not accept the fill on the alloc. It's there currently to facilitate experimentation.

@trevnorris
Copy link
Contributor

@feross That API would require a double-fill. if users don't care about the performance implications then that's fine, but the decision was made around preventing as much impact as possible. I highly doubt there's a magical way I haven't heard of to do operation look-ahead and know the user is following up the alloc() with a fill().

@feross
Copy link

feross commented Feb 1, 2016

@trevnorris My concern is the API is more complicated than it needs to be. What I don't like in the current proposal is that alloc() and allocUnsafe() don't take the same arguments:

Buffer.alloc(size, [fill], [encoding])
Buffer.allocUnsafe(size)

If we eliminate the extra arguments to alloc() and make it parallel to allocUnsafe(), we don't lose anything.

Buffer.allocUnsafe(size).fill(fill)

If users don't care about the performance implications, they can do:

Buffer.alloc(size).fill(fill)

@feross
Copy link

feross commented Feb 1, 2016

Of course, if there's some way to optimize an alloc + non-zero fill value into one step, then that's a good argument for keeping the additional arguments. But if it's just calling fill() internally, then why support the extra arguments?

@jasnell
Copy link
Member Author

jasnell commented Feb 1, 2016

@feross ... i'm starting to lean that way but I have a couple of ideas of optimizing that I want to try out first. Hopefully I'll be getting to those this week.

@feross
Copy link

feross commented Feb 1, 2016

@jasnell Nice - thanks for the update.

@trevnorris
Copy link
Contributor

@feross Even if the operation is completely performed at the C++ level, it will essentially be no more than a Buffer.allocUnsafe(n).fill().

@jasnell
Copy link
Member Author

jasnell commented Feb 3, 2016

@feross @trevnorris ... OK, here's the current breakdown:

  1. Buffer.alloc(size) -- Always calls the underlying createBuffer() with flags[kNoZeroFill] = 0, which allocates a zero-filled Buffer using calloc.
  2. Buffer.alloc(size, fill[, encoding]) -- Always calls the underlying createBuffer() with flags[kNoZeroFill] = 0, which allocates a non zero-filled Buffer using malloc but immediately calls buf.fill(). This never slices off the shared pool so there's no risk of accidental disclosure via the buffer.buffer property.
  3. Buffer.allocUnsafe(size) -- Equivalent to the current new Buffer(size) which slices off the shared Buffer pool if size <= (Buffer.poolSize / 2), otherwise it calls createBuffer() with flags[kNoZeroFill] = 1.

While Buffer.alloc(size[, fill[, encoding]]) and Buffer.allocUnsafe(size).fill(fill) are very similar, they differ in that the former will never slice off the share Buffer pool so it will never have the buffer.buffer. This means that it will be safer-but-slower in general than Buffer.allocUnsafe(size).

In general, most developers will typically end up using Buffer.alloc(size) or Buffer.alloc(size[, fill[, encoding]]). The Buffers will be safe and generally perform rather well. For developers who need to eek out that extra performance, they can use Buffer.allocUnsafe(size). Most uses of new Buffer(size).fill(n) can simply be replaced by Buffer.alloc(size, fill).

@rvagg
Copy link
Member

rvagg commented Feb 3, 2016

The asymmetry also serves the purpose of highlighting that they are quite different operations with different implications. We can even explain why the same arguments are not supported in the non-filled version.

@jasnell
Copy link
Member Author

jasnell commented Feb 3, 2016

One additional difference that I just implemented in Buffer.alloc(size[, fill[, encoding]]) is that fill cannot be a zero-length string (it will throw). The buffer.fill() method, on the other hand, will accept a zero-length string, in which case the buffer will not be filled. By forbidding zero-length string in Buffer.alloc() we ensure that the expectation of a filled Buffer is always fulfilled

@feross
Copy link

feross commented Feb 3, 2016

@jasnell The asymmetry makes sense to me now.

The buffer.fill() method, on the other hand, will accept a zero-length string, in which case the buffer will not be filled.

Wow.

@jasnell
Copy link
Member Author

jasnell commented Feb 3, 2016

@feross .. yep, I just discovered that a couple weeks ago myself ;-).

@jasnell
Copy link
Member Author

jasnell commented Feb 3, 2016

Slight correction ... Buffer.alloc(size, fill[, encoding]) will call createBuffer() with flags[kNoZeroFill] = 0 if fill is undefined, but it will call it with flags[kNoZeroFill] = 1 if fill is defined, this is to protect against the buffer being double filled.

@ChALkeR
Copy link
Member

ChALkeR commented Feb 3, 2016

@jasnell I found that out only a week ago (#7 (comment)), and that looks like a bug to me, actually. I would say that the correct behavior on .fill('') is to throw, and the compatible behavior is to fill with zeros.

@rvagg
Copy link
Member

rvagg commented Feb 3, 2016

calling fill() is a pretty good indication that you actually want it filled .. with something. I'd go with something equivalent of if (val == null || val == 0) val = 0; to cover all of the odd cases, undefined, null, '', '0', [] (wut?).

@jasnell
Copy link
Member Author

jasnell commented Feb 3, 2016

I certainly don't disagree with that, but I'll submit a separate PR to fix
that behavior. I don't want to get it wrapped up into this one too much.
On Feb 3, 2016 12:33 AM, "Rod Vagg" notifications@github.com wrote:

calling fill() is a pretty good indication that you actually want it
filled .. with something. I'd go with something equivalent of if (val
== null || val == 0) val = 0; to cover all of the odd cases, undefined,
null, '', '0', .


Reply to this email directly or view it on GitHub
#7 (comment).

@trevnorris
Copy link
Contributor

Unfortunate that the EP couldn't serve its purpose of hashing out the API. C'est la vie.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants