provide a way to introduce new encodings for `new Buffer('someString', 'someEncoding')` #2835

Mithgol · 2015-09-12T16:28:08Z

In the constructor `new Buffer('someString', 'someEncoding')`, Node.js v4.0.0 itself supports a limited number of encodings: 'ascii', 'utf8', 'utf16le' (aka 'ucs2'), 'base64', 'hex', 'binary'. That's half a dozen. That's not plenty.

And that's why a widely used package, iconv-lite (450+ direct dependents, ≈240+ thousand daily downloads), provides a method (`.extendNodeEncodings()`) that adds support for many other known encodings to the Buffer API.

However, iconv-lite does not seem to work well enough in Node v4.0.0. Any attempt to use an iconv-lite-provided encoding in `new Buffer('someLatinString', 'encodingName')` results in seemingly random output such as the following:

It seems to me that either 0fa6c4a was not enough to fix #1547 or a separate deeper issue exists.

Currently iconv-lite's extend-node.js changes the behaviour of the following methods:

| in `SlowBuffer` | in `Buffer` |
| --- | --- |
| `SlowBuffer.prototype.toString` | `Buffer.prototype.toString` |
| `SlowBuffer.prototype.write` | `Buffer.prototype.write` |
| `SlowBuffer.byteLength` | `Buffer.byteLength` |
| | `Buffer.isEncoding` |

Why were those changes enough in Node v0.10 and v0.12 but not in Node v4.0.0?

I may be wrong, but… it seems to me that in Node v0.12 the Buffer constructor used `this.write(subject, encoding)` internally, but in the current Node v4.0.0 neither the Buffer constructor nor its `fromString` helper does that. They seem to use `binding.createFromString(string, encoding)` (where necessary) or `allocPool.write(string, poolOffset, encoding)`, and both of these come from `process.binding('buffer')` and aren't replaced by iconv-lite. And it won't be easy to replace them from userland, I suppose.

Is my assumption correct?
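
A rough way to probe this on whichever Node.js version is at hand is to wrap `Buffer.prototype.write` and count how often construction from a string actually goes through it. This is only a sketch: it uses `Buffer.from` (the modern replacement for `new Buffer(string, encoding)`), the counter names are made up for illustration, and the count seen during construction depends on the Node.js version, so no particular value is claimed here.

```javascript
'use strict';

// Wrap Buffer.prototype.write so that every call through the prototype is counted.
const originalWrite = Buffer.prototype.write;
let patchedWriteCalls = 0;

Buffer.prototype.write = function patchedWrite(...args) {
  patchedWriteCalls += 1;
  return originalWrite.apply(this, args);
};

let callsDuringConstruction;
let callsDuringWrite;
try {
  // Construction from a string. If this reaches the patched method,
  // the counter goes up; if core uses internal helpers, it stays at zero.
  Buffer.from('hello', 'utf8');
  callsDuringConstruction = patchedWriteCalls;

  // An explicit buf.write() call always goes through the prototype.
  const target = Buffer.alloc(5);
  target.write('hello', 'utf8');
  callsDuringWrite = patchedWriteCalls - callsDuringConstruction;
} finally {
  // Always undo the patch, even if something above throws.
  Buffer.prototype.write = originalWrite;
}

console.log({ callsDuringConstruction, callsDuringWrite });
```

If `callsDuringConstruction` is zero, a patched `write` cannot influence what the constructor produces, which would match the behaviour described above.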

What should be done in iconv-lite (or in Node.js, or in both) for a multitude of encodings to work in the new Buffer('someString', 'encodingName') constructor correctly?

Mithgol · 2015-09-12T19:37:55Z

And a status badge goes to #2798.

Mithgol · 2015-09-13T05:13:15Z

This issue may be seen as a spiritual successor to nodejs/node-v0.x-archive#1772.

bnoordhuis · 2015-09-13T10:08:07Z

I'm inclined to say this is solely on iconv-lite's head for trying to monkey-patch core, and I say this as the author of a module that does something similar. Opinions, anyone?

ChALkeR · 2015-09-13T11:17:31Z

Monkey-patching is not supported, and will never be for obvious reasons. That includes both overriding builtin methods and adding new methods to builtins (both Node.js and v8). Such code could and will break in minor or patch versions, or even within the same Node.js version.

It's entirely iconv-lite's fault for breaking here.

On the other hand, a supported method of defining new encodings seems like a valid idea, provided that overriding an already defined (either built-in or third-party) encoding produces an error.

Fishrock123 · 2015-09-13T12:57:00Z

Introducing new encoding options from userland seems valid, so long as it doesn't require any V8 monkey-business.

Patching core, especially in this case, isn't exactly supported. Buffer changes were forced on us via V8, nothing we can really do.

ChALkeR · 2015-09-13T13:12:22Z

lib/buffer.js#L191 — .isEncoding. This could be patched to be aware of user-defined encodings.
lib/buffer.js#L342 — .toString. This could be patched to support user-defined .toString for a new encoding.
lib/buffer.js#L87 — .fromString. This could be patched to support user-defined .fromString for a new encoding.
lib/buffer.js#L534 — .write. This could be patched to support user-defined .write for a new encoding.

We could add an API like `Buffer.registerEncoding(encoding, toBinary, fromBinary)`, which would register the encoding if it doesn't exist yet and throw an error if that encoding is already defined.
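
To make the proposed semantics concrete, here is a minimal userland sketch of such a registry. `Buffer.registerEncoding` does not exist in Node.js; the `EncodingRegistry` class, its method names, and the toy `rot13` encoding are all hypothetical and only illustrate the "register once, throw on redefinition" rule without touching `Buffer` itself.

```javascript
'use strict';

// Hypothetical userland registry; nothing here is part of Node.js core.
class EncodingRegistry {
  constructor() {
    this.codecs = new Map();
  }

  // toBinary: (string) => Buffer, fromBinary: (Buffer) => string
  register(encoding, toBinary, fromBinary) {
    const name = String(encoding).toLowerCase();
    // Refuse to override built-in encodings or earlier registrations.
    if (Buffer.isEncoding(name) || this.codecs.has(name)) {
      throw new Error(`Encoding already defined: ${name}`);
    }
    this.codecs.set(name, { toBinary, fromBinary });
  }

  isEncoding(encoding) {
    const name = String(encoding).toLowerCase();
    return Buffer.isEncoding(name) || this.codecs.has(name);
  }

  // Registered codecs take precedence; everything else falls back to Buffer.
  encode(string, encoding = 'utf8') {
    const codec = this.codecs.get(String(encoding).toLowerCase());
    return codec ? codec.toBinary(string) : Buffer.from(string, encoding);
  }

  decode(buffer, encoding = 'utf8') {
    const codec = this.codecs.get(String(encoding).toLowerCase());
    return codec ? codec.fromBinary(buffer) : buffer.toString(encoding);
  }
}

// Toy encoding for demonstration only: ROT13 over ASCII letters.
const rot13 = (s) => s.replace(/[a-z]/gi, (c) => {
  const base = c <= 'Z' ? 65 : 97;
  return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
});

const registry = new EncodingRegistry();
registry.register(
  'rot13',
  (s) => Buffer.from(rot13(s), 'ascii'),
  (b) => rot13(b.toString('ascii'))
);
```

Callers would then use `registry.encode(str, 'rot13')` and `registry.decode(buf, 'rot13')` explicitly, while a second `registry.register('rot13', ...)` or an attempt to register `'utf8'` throws.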

ChALkeR · 2015-09-13T13:17:43Z

On the other hand, I see no actual profit in the above. I am just saying that doing so is possible.

As always, partial (~⅓) usage of .extendNodeEncodings:

a/alinator-0.2.2.tgz/lib/requests/scheduleRequest.js:9:iconv.extendNodeEncodings();
a/appbuilder-2.9.3-426.tgz/lib/common/mobile/android/android-emulator-services.js:29:        iconv.extendNodeEncodings();
c/ctbc_payment-0.2.1.tgz/lib/ctbc_creditcard.js:5:iconv.extendNodeEncodings();
c/ctbc_payment-0.2.1.tgz/lib/ctbc_unionpay.js:5:iconv.extendNodeEncodings();
d/delatinise-cli-0.1.11.tgz/lib/delatinise.js:159:      iconv.extendNodeEncodings();
f/fiunis-2.1.1.tgz/fiunis.js:1:require('iconv-lite').extendNodeEncodings();
f/fidonet-squish-0.0.39.tgz/fidonet-squish.js:5:require('iconv-lite').extendNodeEncodings();
f/fidonet-jam-3.8.7.tgz/fidonet-jam.js:6:require('iconv-lite').extendNodeEncodings();
h/hzip-1.1.0.tgz/encoding/iconv-lite/lib/extend-node.js:7:    iconv.extendNodeEncodings = function extendNodeEncodings() {
h/hzip-1.1.0.tgz/encoding/iconv-lite/lib/extend-node.js:187:            throw new Error("require('iconv-lite').undoExtendNodeEncodings(): Nothing to undo; extendNodeEncodings() is not called.")
i/iconv-lite-0.4.10.tgz/lib/extend-node.js:8:    iconv.extendNodeEncodings = function extendNodeEncodings() {
i/iconv-lite-0.4.10.tgz/lib/extend-node.js:179:            throw new Error("require('iconv-lite').undoExtendNodeEncodings(): Nothing to undo; extendNodeEncodings() is not called.")
l/linez-3.2.1.tgz/js/linez.js:4:iconv.extendNodeEncodings();
n/node-captions-0.4.2.tgz/captions.js:12:iconv.extendNodeEncodings();
n/node-cmpp-3-1.1.7.tgz/cmppSocket.js:17:iconv.extendNodeEncodings();
n/nativescript-1.0.2.tgz/lib/common/mobile/android/android-emulator-services.js:29:        iconv.extendNodeEncodings();
n/node-ral-0.0.36.tgz/lib/ral.js:25:iconv.extendNodeEncodings();
n/node-ral-0.0.36.tgz/test/protocol/http_protocol_post_test.js:17:iconv.extendNodeEncodings();
r/rastreiojs-0.2.4.tgz/lib/index.js:21:iconv.extendNodeEncodings();
s/simteconf-0.6.10.tgz/simteconf.js:10:require('iconv-lite').extendNodeEncodings();
s/slap-0.1.28.tgz/lib/cli.js:80:    iconv.extendNodeEncodings();
s/sro-0.1.4.tgz/lib/sro.js:13:iconv.extendNodeEncodings();
u/umpay-transfer-1.0.4.tgz/index.js:10:iconv.extendNodeEncodings();
u/umpay-1.1.1.tgz/index.js:10:iconv.extendNodeEncodings();

That's actually pretty low (18 modules). I expect the total count to be around 50-60 modules.

@Mithgol Could you explain, for reference, how iconv.extendNodeEncodings() is better than manually converting those encodings?

Mithgol · 2015-09-13T15:14:17Z

@ChALkeR

It's better because it's easier to introduce to an existing project. It feels almost infinitely easier.

You just write require('iconv-lite').extendNodeEncodings() once and suddenly it just works everywhere like a charm.

You don't have to do anything else.

For example, you don't have to remember every place where Buffer(string, encoding) was written previously and carefully upgrade them to iconvLite.encode while simultaneously keeping in mind that Buffer(string) (without encoding) defaults to UTF-8 but iconvLite.encode doesn't.

seishun · 2015-10-15T18:21:50Z

I don't think it should be supported to change the behavior of core functions from userland. We don't do that anywhere else as far as I'm aware.

danielgindi · 2015-10-18T07:28:41Z

I think that allowing a module to add more encodings to Buffer makes so much sense that I can't understand the resistance.
Take for example a giant project, which has many points where encodings are used. It reads CSVs, it creates log files and Excel files, it is maybe reading outputs of other system modules or remote modules over the web. Encodings may vary! And you don't want to write so much code to handle encodings in so many places, it almost seems like a hack itself.

The proper solution would be to use the native Node.js features for handling Buffers and Encoding. But the thing that is missing is the support for new encodings.

Mithgol · 2015-10-21T09:11:26Z

@seishun

That's true, Node.js does not support extending the behaviour of its core functions anywhere else, but that's probably because Node.js does not have an obviously limited support of an obviously vast area (such as encodings) anywhere else.

Also, most other such core modules are wrappers around third-party libraries, and that's their excuse. For example,

How should I extend crypto with SHA-3? — crypto is a wrapper around OpenSSL, just wait until that hash becomes supported in the upstream.
How should I extend Intl with a support for Tengwar? — We merely wrap around ICU, you should wait until Tengwar becomes supported by the Unicode Consortium according to their roadmap (subject to change).
How should I extend zlib with a support for unpacking .ZIP files in addition to its current inflating deflated streams? — zlib is merely a wrapper around the well-known zlib library and thus it's weird to add some features that are missing in the upstream.

Mithgol · 2015-12-21T20:15:43Z

This comment is a nudge after a couple of months.

trevnorris · 2015-12-28T18:33:05Z

@Mithgol Basically taking @ChALkeR's API, would `Buffer.registerEncoding(encoding, toBuffer, toString)` work? Where `toBuffer` returns a Buffer and `toString` returns a String, both of which are propagated to the user.

Mithgol · 2015-12-28T18:43:28Z

Yes.

seishun · 2015-12-28T18:59:34Z

I don't think Buffer should be used for encoding/decoding beyond the few common encodings that it provides for convenience. It's simply outside of its scope of responsibility.

> For example, you don't have to remember every place where Buffer(string, encoding) was written previously and carefully upgrade them to iconvLite.encode while simultaneously keeping in mind that Buffer(string) (without encoding) defaults to UTF-8 but iconvLite.encode doesn't.

TBH if one ends up with Buffer calls all over the codebase and needs to change the encoding in all of them, it seems like code smell. It's not node's job to help with bad architectural decisions.

And even in your example, you'd still have to carefully "upgrade" all the Buffer(string) calls to Buffer(string, 'whatever-encoding') anyway.

It's such a rare edge case that adding an API that will likely cause many issues down the road for questionable benefit doesn't seem worth it.

jasnell · 2016-03-16T00:16:37Z

Recommending that this be closed. Modules that want to provide support for other encodings can do so easily without monkey patching node or having node provide any kind of extension mechanism. I don't think we should be encouraging this anti-pattern further.

bnoordhuis · 2016-03-16T09:02:32Z

Let's close then.

Mithgol · 2016-03-16T23:11:37Z

Quick summary of changes that are necessary to do without require('iconv-lite').extendNodeEncodings():

- Obvious replacements:
  - `Buffer.isEncoding` → `iconv.encodingExists`
  - `buf.toString(encoding)` → `iconv.decode(buf, encoding)`
  - `Buffer(str, encoding)` → `iconv.encode(str, encoding)`
- Cannot use method chaining; for example, `Buffer(str).toString('utf7')` becomes `iconv.decode(iconv.encode(str, 'utf8'), 'utf7')`.
- Cannot use the encoding options of fs methods; for example,
  - `fs.readFileSync(filename, {encoding: encoding})` → `iconv.decode(fs.readFileSync(filename), encoding)`
  - `fs.writeFileSync(filename, content, {encoding: encoding})` → `fs.writeFileSync(filename, iconv.encode(content, encoding))`
- Cannot use the simple `buf.toString(encoding, start, end)` conversion; nest `iconv.decode(buf.slice(start, end), encoding)` instead.
- Sometimes it becomes necessary to add iconv-lite to the dependencies and check `iconv.encodingExists` explicitly in places where `Buffer.isEncoding` silently worked because of `require('iconv-lite').extendNodeEncodings()` in another dependency.
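
The replacements above could be gathered in one small module so that call sites stay short. The following is a sketch under assumptions, not iconv-lite's own API: `createTextHelpers` and its method names are hypothetical, and the `codec` argument is expected to expose `encode`, `decode` and `encodingExists` the way iconv-lite does (in real use, `createTextHelpers(require('iconv-lite'))`). A stand-in codec built on Buffer's own encodings is used here only so that the sketch runs without dependencies.

```javascript
'use strict';
const fs = require('fs');

// Hypothetical helper factory wrapping an iconv-lite-shaped codec.
function createTextHelpers(codec) {
  return {
    // Replaces Buffer.isEncoding(encoding).
    isEncoding(encoding) {
      return codec.encodingExists(encoding);
    },
    // Replaces Buffer(str, encoding); the UTF-8 default is made explicit.
    toBuffer(str, encoding = 'utf8') {
      return codec.encode(str, encoding);
    },
    // Replaces buf.toString(encoding, start, end).
    toText(buf, encoding = 'utf8', start = 0, end = buf.length) {
      return codec.decode(buf.subarray(start, end), encoding);
    },
    // Replaces fs.readFileSync(filename, {encoding}).
    readTextSync(filename, encoding) {
      return codec.decode(fs.readFileSync(filename), encoding);
    },
    // Replaces fs.writeFileSync(filename, content, {encoding}).
    writeTextSync(filename, content, encoding) {
      fs.writeFileSync(filename, codec.encode(content, encoding));
    },
  };
}

// Stand-in codec limited to Buffer's built-in encodings (for demonstration).
const builtinCodec = {
  encode: (str, encoding) => Buffer.from(str, encoding),
  decode: (buf, encoding) => buf.toString(encoding),
  encodingExists: (encoding) => Buffer.isEncoding(encoding),
};

const text = createTextHelpers(builtinCodec);
```

With helpers like these, the chained `Buffer(str).toString('utf7')` from the list becomes `text.toText(text.toBuffer(str), 'utf7')` once a codec that knows UTF-7 is passed in.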

¯\_(ツ)_/¯

ChALkeR · 2017-07-19T17:27:22Z

@Mithgol #13644 seems to solve this in a nicer (and more standardized) way. Does it cover the use cases which you have in mind for this?

Mithgol · 2017-07-20T14:19:56Z

@ChALkeR Unfortunately, not all of them, because the list of WHATWG-supported encodings seems to be quite limited.

For example, UTF-7 is not supported and hence I cannot use it for Fidonet Unicode substrings.

I'd better stay on iconv-lite.
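
The limitation can be checked directly, assuming the API in question is the WHATWG `TextDecoder` that current Node.js versions expose as a global. The helper name `whatwgSupports` is made up for this sketch; it relies on the constructor throwing a `RangeError` for labels it does not accept.

```javascript
'use strict';

// Returns true if the built-in WHATWG TextDecoder accepts the given label.
function whatwgSupports(label) {
  try {
    new TextDecoder(label);
    return true;
  } catch (err) {
    // Unsupported labels are rejected with a RangeError.
    if (err instanceof RangeError) return false;
    throw err;
  }
}

// Decoding an encoding that the standard does cover works as expected.
const utf16Text = new TextDecoder('utf-16le').decode(Buffer.from('hi', 'utf16le'));

console.log(whatwgSupports('utf-16le'), whatwgSupports('utf-7'), utf16Text);
```

UTF-7 is not part of the WHATWG Encoding Standard, so a check like this reports it as unsupported, and a userland converter remains necessary for it.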

Mithgol mentioned this issue Sep 12, 2015

use g++-4.8 (support C++11) and start testing against Node.js version 4.0 on Travis CI ashtuchkin/iconv-lite#105

Merged

vkurchatkin added feature request Issues that request new features to be added to Node.js. buffer Issues and PRs related to the buffer subsystem. labels Sep 12, 2015

Mithgol mentioned this issue Sep 12, 2015

Packages that currently do not work with Node.js v4.0 [List] #2798

Closed

Mithgol added a commit to Mithgol/node-fidonet-jam that referenced this issue Sep 12, 2015

don't test against iojs on Travis CI (nodejs/node#2835 is the reason)

0f43741

Mithgol added a commit to Mithgol/fiunis that referenced this issue Sep 12, 2015

don't test above Node 0.12 on Travis (nodejs/node#2835 is the reason)

8f1c5a1

Mithgol added a commit to Mithgol/node-twi2fido that referenced this issue Sep 12, 2015

warining: don't use above Node 0.12 (nodejs/node#2835 is the reason)

48cb70c

Mithgol added a commit to Mithgol/echolist-csv2hpt that referenced this issue Sep 12, 2015

warning: don't use above Node 0.12 (nodejs/node#2835 is the reason)

5212b69

Mithgol added a commit to Mithgol/fido2rss that referenced this issue Sep 12, 2015

warning: don't use above Node 0.12 (nodejs/node#2835 is the reason)

f7f2374

Mithgol added a commit to Mithgol/fiunis that referenced this issue Sep 13, 2015

v2.1.2: support Node.js and io.js by working around nodejs/node#2835

a8a06f2

Mithgol added a commit to Mithgol/node-fidonet-jam that referenced this issue Sep 13, 2015

v3.8.9: support Node v4 and io.js by working around nodejs/node#2835

5e67f87

Mithgol added a commit to Mithgol/node-twi2fido that referenced this issue Sep 13, 2015

v1.3.3: upgrade fiunis to version ~2.1.2 (avoid nodejs/node#2835)

29054a9

Mithgol added a commit to Mithgol/fido2rss that referenced this issue Sep 13, 2015

v0.8.2: upgrade some dependencies, avoid nodejs/node#2835 (JAM only)

6aea1e1

Mithgol added a commit to Mithgol/echolist-csv2hpt that referenced this issue Sep 13, 2015

Buffer's constructor is not used and thus nodejs/node#2835 is avoided

89271f3

Mithgol added a commit to Mithgol/fido2rss that referenced this issue Sep 13, 2015

Squish support avoids nodejs/node#2835: Buffer's constructor not used

6cd837b

Mithgol mentioned this issue Sep 27, 2015

breaking changes in iconv-lite version 0.4.12 ashtuchkin/iconv-lite#107

Open

alexlamsl mentioned this issue Feb 6, 2016

UTF not BOM kangax/html-minifier#372

Closed

ChALkeR mentioned this issue Mar 2, 2016

src: remove BINARY encoding #5504

Closed

4 tasks

bnoordhuis closed this as completed Mar 16, 2016

Mithgol added a commit to Mithgol/dauria that referenced this issue Mar 16, 2016

won't extend Node's supported encodings: nodejs/node#2835 is closed

944ea2d

Mithgol added a commit to Mithgol/fiunis that referenced this issue Mar 16, 2016

won't extend Node's supported encodings: nodejs/node#2835 is closed

d102b25

Mithgol added a commit to Mithgol/simteconf that referenced this issue Mar 16, 2016

won't extend Node's supported encodings: nodejs/node#2835 is closed

b7bf87a

Mithgol added a commit to Mithgol/echolist-csv2hpt that referenced this issue Mar 16, 2016

won't extend Node's supported encodings: nodejs/node#2835 is closed

ea7469d

Mithgol added a commit to Mithgol/node-fidonet-jam that referenced this issue Mar 16, 2016

won't extend Node's supported encodings: nodejs/node#2835 is closed

545811f

Mithgol added a commit to Mithgol/node-twi2fido that referenced this issue Mar 16, 2016

won't extend Node's supported encodings: nodejs/node#2835 is closed

676400e

Mithgol added a commit to Mithgol/node-twi2fido that referenced this issue Mar 16, 2016

v2.3.5: use iconv-lite explicitly after nodejs/node#2835 is closed

da5a97d

trevnorris mentioned this issue Mar 30, 2016

Request UTF-32 Character Encoding #5956

Closed

seishun mentioned this issue Mar 7, 2018

feature request: register custom encodings nodejs/help#1133

Closed
