-
Notifications
You must be signed in to change notification settings - Fork 66
Buffer.from/Buffer.alloc/Buffer.zalloc/Buffer() soft-deprecate #4
Conversation
You have my +1 on this |
+1 for the primary goal, except for the new method names. That is not a nitpick, I consider those to be very important. With the currently included method names, this proposal fixes only problem 2, but does not fix problem 1 from my note — inexperienced users will still use an unsafe method in a dangereous way without reading the docs first, and there will be more bugs in the community modules. See also q/a question 18. Also, I am a bit concerned about the secondary goal — even if that change is sometime pushed to v0.12.n+1, users that would use v0.12.n will be endangered. That does not change over time, replace «n» with the latest released version to that time (and «v0.12» with any branch). This point is valid until |
Is the intention to define function Buffer() {}
console.log(Buffer.from === undefined); // true
// See buffer.js#L47-L71, nodejs/node @ 66b9c0d (linked above)
Object.setPrototypeOf(Buffer.prototype, Uint8Array.prototype);
Object.setPrototypeOf(Buffer, Uint8Array);
console.log(Buffer.from === undefined); // false
console.log(Buffer.from === Uint8Array.from); // true See also: |
Yes, it overrides the built in from that is inherited from Uint8Array.
|
Thank you for clarifying these semantics. If this proposal is accepted, the docs should make mention that |
I would also like to add that I think it should be made marginally "harder" to use That is: If a developer only glances at the API without properly reading docs, it would be beneficial if they picked the method which is safest. Those who need / want the extra performance can find it by reading the documentation more closely. Someone who wants to allocate some memory may not read every buffer method - they'll likely stop when they find one which fits the bil Something like |
Looks good! This would have solved the security problems we've found in bittorrent-dht and ws. Thanks for turning this into an EPS @jasnell. |
+1 on this. Again, I feel that 'unsafe' is the most accurate and beginner-friendly way to convey this, rather than 'raw' or other more 'neutral' terms. |
This proposal does not actually fix the bug. new Buffer(variable) will continue to result in information disclosure vulnerabilities until existing code is changed. I still don't understand why this behavior change isn't being made. |
+1. |
Nice work. This solves the core problem that affected ws and bittorrent-dht, which is |
@nodejs/ctc this is probably a good place to move technical discussion for now. @jasnell can we have clarification on what "soft-deprecation" means? Is it doc-deprecation? With "hard-deprecation" meaning (apologies for the long prose to follow but this seems like the appropriate to finally lay out my current thoughts) Backporting to v0.10, v0.12, and v4 could actually make this more painful than it should be, we'll then have a pretty big feature-difference between patch-versions. I'm guessing the intention is to make the new factory functions available to everyone using Node.js but that makes the assumption that the majority of users are using the latest release in each line, which is not the case at all. I've seen companies justify the ongoing use of older v0.10 in production with typical risk-aversion combined with claims that being behind load-balancers and TLS terminators protect them from most of the security flaws that would normally force people to upgrade. There's also situations where Node is being used completely disconnected and therefore less at risk to security vulnerabilities. Even beyond commercial use, I'd really prefer we stick to semver as much as possible as it solves a lot of these kinds of problems. At this stage I think my preference, if we are determined to act, would be something like:
I don't think the accidental- To re-state what I said above, introducing factory methods to v0.10 and v0.12 will not solve the version discrepancies because it assumes people use the latest, by not adding the factory methods to those versions people are much more likely to write and use feature-detection mechanisms, which is good, and we could even encourage this in the docs ( While I accept that we have a usability problem here and that core has to take some responsibility for the way people use Node, I don't accept that we should treat such problems in the same way that we treat core security vulnerabilities. Hence my insistence on taking a longer view on this with v0.10 and v0.12 to prioritise greater ecosystem stability over the possibility of ecosystem packages containing similar API misuse to what we are seeing. Vulnerabilities in individual npm packages are almost always going to be of much lesser impact that vulnerabilities in core and they are also much easier to patch and distribute without the kind of breakage and uncertainty that happens when we do it with core. And lastly, something I keep on repeating, it's essential that we do our best to educate people to think about Node as a server-side programming platform where evil can wreak havoc if you don't think through what you're doing carefully and be attune to the kinds of common vulnerabilities that can be created. Node.js is a platform that attracts users who are used to a browser sandbox and are not accustomed to thinking as carefully about input sanitisation and careful interaction with system resources. This is going to be a continual theme (TrendMicro being the current big news on this front) and while we can tackle it in part by making the API safer we also need to accept that nannying users risks lulling them into a perception that Node.js is more of a sandbox than it is. We must accept a caveat emptor approach. |
This is what I meant in all my messages, and I hope that this is what it means here.
v0.10 and v0.12 are not semver, but I understand your concerns. Though, note that this EPS does not describe backporting this feature to v0.10/v0.12, and I personally view that as a separate question. If you want to know my opinion on this — backporting the new API methods to v0.10 and v0.12 would assert faster ecosystem migration to the safer methods, and will motivate people who are using older patch releases of v0.10 and v0.12 to update to the latest patch version (which I also view as a positive change), but only when they will update their libraries. Also note the the popular libraries could have these concerns themselves, so they could intoduce migration to the new Buffer api in a semver-major version of the library. Also, merging this to all the supported branches will allow new libraries to use the safer API from the start. Still, I view this discussion as controversial and less important, thus I do not want to discuss this in the current pull, unless backporting new API changes is included to the EPS itself.
Of course it wouldn't be hit straight away, partially because the old methods would be only soft-deprecated for quite a while, partially because people generally don't react very fast and there are a lot of packages that need to be changed. The primary goal of this is to provide safer API for use in the future (as it seems to me),
Again: I view this to be very important, because choosing wrong method names would fix only the second problem (accidental
Please, no. While new API will cause the previous versions (in the branches it gets backported to) to crash (if libraries start relying on new API), this change will make them leak uninitialized buffers unnoticed (if the libraries start relying on zero-fill-by-default), which is far far worse. Also note that you can't stop them from relying on zero-fill-by-default once it's introduced — almost noone will want to have two zero-fills.
Strongly agreed, moreover, there are several more specific education-related issues that I wanted to discuss after this gets fixed (and one other change).
The problem is (as you already said) that they already have this perception, and there are no reasons not to guard them from common pitfalls (like the Buffer API), leaving more dangereous methods to be the less obvious to use ones and reachable by more experienced developers. Btw, if your statement was in protection of not changing the API, then it was a very bad reasoning for not making the ecosystem safer — those effects are incomparable, |
@rvagg Well, @ChALkeR has said most of it, but just to weigh in on the naming issue once again: it is critical that the That being said, I do not particularly care about using |
Do you think this is a good case for doing something similar to ie. a widely used userland module which provides these semantics, using native methods where available. |
I realize you weren't asking me specifically, but: it's actually not entirely clear to me what the purpose of the Aside from that, many people still perceive a 'cost' to using a userland module, even if there isn't any such cost. "I don't want to depend on third-party modules for this" is still a commonly heard argument in #Node.js, and it is still an extra hurdle for people to take. The same concern applies to "recommending people to write a polyfill". |
@rvagg ... thank you for the lengthy review... I think you and I are on the same page. @ChALkeR @joepie91 and others... the key challenge with the Currently I'm thinking:
|
No, it isn't. An unitialized memory allocation is always unsafe from a memory safety point of view. That you can mitigate the issue doesn't change that the operation itself is unsafe. That there are 'safe uses' also doesn't change that. None of that concerns the operation itself. EDIT: And if you (general you, not personal you) are sufficiently competent to understand how to use an uninitialized allocation, then you are also capable of understanding that "unsafe" concerns memory safety. Yes, this makes it "look scary" - that is precisely the point. There's no technical argument not to. |
@jasnell I think the rust approach to this is quite nice: https://doc.rust-lang.org/book/unsafe.html The general theme is that pretty much everything else is safe, and unchecked usage of this function is unsafe. Therefore the method gets marked unsafe, and you are required to wrap it in some safety. Node.js and rust have quite different target audiences and use-cases, but I think in this case there is some overlap. Another argument for making them more different is that if you're skimming through code, a single letter difference isn't always immediately visible. Something that can potentially expose runtime memory should look visually distinct, and jump out at the reader. As with everything else I've written, this is all IMO. |
To be clear, personally I have no problem with using |
@jasnell I understand - I believe you've explicitly remarked on representing objections from others in the previous thread, so I've been more or less assuming that throughout the course of the threads :) Hence my attempt to re-explain why the use of the If there are people who still object to the usage of the term I've made this argument several times now, but technical counter-arguments have not been forthcoming - and it seems silly to block the decision on an objection that's not backed by a technical argument, or which may even already have been retracted because of the "memory safety" distinction. |
I have no hard feeling on the need of the That is valid if and only if the deprecation message (both the soft-deprecation docs one and the future hard-deprecation code one) of Let's wait for what the opponents of the |
I really like
And for bonus points, doesn't include 👍 |
Yes but |
I take offence at your insistence that usage of uninitialized memory is always unsafe. Take the following: var b = Buffer.alloc(n).fill('abc'); or maybe this: var b = Buffer.alloc(n);
fs.read(fd, b, 0, b.length, 0, cb); or possibly I'm just not being lazy and actually keep track of what I'm doing. The rebuttal of having a moment of potentially "unsafe" memory is pointless. There's a potential for anything to happen if a developer isn't careful. Allowing unchecked variables to be processed from an outside source is a pretty big mess up regardless of what the results are. Don't tell me what's unsafe. Just tell me what's technically happening. That said, |
@jasnell There was a mess-up in how these EPS's are supposed to be processed. The document is supposed to start with |
What the programer did was assume that new Buffer(object) always serializes a representation of object. That doesn't happen when object is a number. Pointless pontificating about "server-side can unleash evil" and the as-yet unsuccessful idea of sufficiently smart programmers doesn't change the underlying problem, which can be fixed by a simple change that requires no action by any programmer. The current approach in this branch doesn't fix this. Instead it creates new constructors, which existing code will need to be rewritten to use. In the meantime existing code remains vulnerable. The funny overload behavior will remain. |
@jasnell Also before this is merged the API documentation (not full, just technical overview) should be placed directly in this EPS. |
I just found a few more modules affected by this issue:
Not a huge deal, since I think they're pretty hard to exploit. But wanted to share the additional data points. |
@jasnell I've thought about the backporting strategy a bit more, overnight. I can see the following options:
All of the above options assume that an opt-in flag exists (and is backported) to forcibly zero-fill all buffers, application-wide. I think we're all in agreement about that part. @ChALkeR You say that |
About 8% of npm modules use I don't think that we can numerically estimate the potential damage introduced by people removing the zero-fill. |
@jasnell Note on why |
|
I've got Buffer.from working correctly in the PR with no issues. All tests
|
Is |
It seems like the most significant argument against a backported change to However, the conclusion drawn from this has been to paint any backported change as dangerous. I think this can be revisited. -- A silly proposal -- Consider this: the only requirement for a less-dangerous This was touched upon in nodejs/node#4660 (comment):
Zeroes are indeed a valid value. So are ones. So are the bytes in the ASCII string "banana", repeated over and over. Or the current Node.js version string, followed by all zeroes. You name it. Now don't get me wrong, zeroing out is absolutely the most sensible choice in every other way. It's backwards compatible, in the sense that all-zeroes was always a valid outcome with uninitialized memory. It's much more performant than repeating an ASCII string, or some other silly initialization. The only drawback is that it creates a new pitfall, making people wrongly assume that zeroing out happens in all versions. If (and only if) that pitfall is determined to be a deal breaker, I propose that we initialize -- Why that's undesirable --
-- The tradeoff -- Is the depending-on-all-zeroes pitfall so great that we can sacrifice some more performance in initialization? |
But it could throw on all values. Eventually. |
That makes no actual difference with being zero-filled.
I do not view this (zero-fill-all-the-buffers) approach to be a performance vs security tradeoff. I view it as a bad ecosystem security vs even worse ecosystem security tradeoff. When I mention performance concerns, I mention that those are the concerns that some people would have when updating to new Node.js (hence delaying update and staying on the unpatched version) and other people would have when deciding whether or not keep their own initialization of allocated Also, the proponents of just-zero-fill-all-the-buffer-and-dont-touch-the-api approach are forgetting that it alone would not even fix the issue completely even in the long term (taking only patched versions of Node.js into an account). |
Proposal PR has been updated to use the names |
Thinking further about the issue over the default behavior of |
@lime: The issue is not so much that people would expect a zero-filled Buffer, but that they would expect a safely filled Buffer - ie. not containing stuff that was previously in memory. That problem is not resolved by filling the Buffer with non-zero (or even random) data, unfortunately, and the same consequences remain applicable. @jasnell, @trevnorris: How likely is it that Aside from that:
While I like this idea, we should expect some backlash from some users who don't like warnings to appear in their |
@joepie91 & @ChALkeR: Since my proposal is slightly tongue-in-cheek, I won't make any big attempts to defend it. However, I'd like to try clarifying my intent just a bit. The way I see it, the documented behaviour could go something like this:
Now, the problem at hand is that documentation isn't enough. Instead of reading / remembering this documented behaviour, people will take a look at the observable behaviour of If the observable behaviour of |
The suggested deprecation warning printed to |
That would work only in an ideal world where everyone reads docs and follows them. But, as I already said above, it should be good to zero-fill |
Given how many of modules are currently using |
I believe I specifically said, in the comment you quoted, that documentation isn't enough. That is why I suggest that That something can be a different kind of initialization. FWIW, I don't think this would be desirable. That something can also be a deprecation warning, as suggested by @jasnell. Personally, I think this sounds like the best approach. |
@ChALkeR Buffer at minimum must be the Uint8Array constructor. ES6 inheritance requires this. So throwing on all values also isn't an option. |
For completeness we need to talk about the native |
@rvagg ... Yep, I was going to turn my attention to the native side next
|
Closed this because I realized I had screwed the pull request up. Opening a new corrected PR. Discussion can continue here, of course. |
Refs: #7 |
Refs: nodejs/node#4682
Refs: nodejs/node#4660