Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

buffer: Invalid bytes length calculation #11165

Closed
avchugaev opened this issue Feb 4, 2017 · 11 comments
Closed

buffer: Invalid bytes length calculation #11165

avchugaev opened this issue Feb 4, 2017 · 11 comments
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors.

Comments

@avchugaev
Copy link

Node Version: v7.5.0
Platform: Windows 10 x64

It seems that Buffer.byteLength() method and buffer.byteLength returns different values for base64 encoding. Maybe for some others too.

Here is TypeScript class that works with encodings:

export type EncodingName = 'utf8' | 'utf16le' | 'ascii' | 'latin1' | 'ucs2' | 'hex' | 'base64' | 'binary';


export default class Encoding {
    public static UTF8: Encoding = new Encoding('utf8');
    public static UTF16LE: Encoding = new Encoding('utf16le');
    public static ASCII: Encoding = new Encoding('ascii');
    public static LATIN1: Encoding = new Encoding('latin1');
    public static UCS2: Encoding = new Encoding('ucs2');
    public static HEX: Encoding = new Encoding('hex');
    public static BASE64: Encoding = new Encoding('base64');
    public static BINARY: Encoding = new Encoding('binary');


    public static convert(originalString: string, originalEncoding: Encoding, targetEncoding: Encoding): string {
        return originalEncoding.getBytes(originalString).toString(targetEncoding.encodingName);
    }


    private _encodingName: EncodingName;


    get encodingName(): EncodingName {
        return this._encodingName;
    }


    constructor(encodingName: EncodingName = 'utf8') {
        this._encodingName = encodingName;
    }


    public getBytesCount(text: string): number {
        return Buffer.byteLength(text, this.encodingName);
    }


    public getBytes(text: string): Buffer {
        return Buffer.from(text, this.encodingName);
    }
}

let source: string = 'abc123 \u03C0';               // abc123 π
let base64String: string = Encoding.convert(source, Encoding.UTF8, Encoding.BASE64);
let originalString: string = Encoding.convert(base64String, Encoding.BASE64, Encoding.UTF8);

console.log(
    Encoding.UTF8.getBytes(source).byteLength,      // 9
    Encoding.UTF8.getBytesCount(source)             // 9
);

console.log(
    Encoding.BASE64.getBytes(source).byteLength,    // 5
    Encoding.BASE64.getBytesCount(source)           // 6  - Why different length?
);

console.log(originalString);                        // abc123 π
console.log(base64String);                          // YWJjMTIzIM+A
@addaleax addaleax added buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors. labels Feb 4, 2017
@addaleax
Copy link
Member

addaleax commented Feb 4, 2017

It seems that Buffer.byteLength() method and buffer.byteLength returns different values for base64 encoding. Maybe for some others too.

That’s mostly because your example string isn’t valid Base64, so Buffer.from(text, encoding) won’t be able to create a buffer with the appropriate content.

For base64 the Buffer.byteLength function only checks the input string length, not whether it’s actually reading valid base64 (it also doesn’t ignore whitespace in the input). This should probably be documented.

@avchugaev
Copy link
Author

Don't you think, this two ways must return the same value? Even their names say that (byteLength). If we have 2 options that looks similar, but returns different values, this may cause errors and bring misunderstanding.

@avchugaev
Copy link
Author

In general, I assume that Buffer.from(text, encoding).byteLength === Buffer.byteLength(text, encoding).

@avchugaev
Copy link
Author

For base64 the Buffer.byteLength function only checks the input string length, not whether it’s actually reading valid base64 (it also doesn’t ignore whitespace in the input). This should probably be documented.

Does this means, that one of approaches calculates wrong length?

@addaleax
Copy link
Member

addaleax commented Feb 4, 2017

@achugaev93 Oh, I agree – this naming collision is pretty unfortunate.

Node didn’t choose the .byteLength property name, it was introduced when Buffer started inheriting from Uint8Array; Buffer.byteLength() had already been there for a long time before that. So that’s why these two names exist but have different semantics right now.

In general, I assume that Buffer.from(text, encoding).byteLength === Buffer.byteLength(text, encoding).

Fwiw, that’s true for valid input (at least for a strict interpretation of “valid”)…

(And by the way: You can use buffer.byteLength, but buffer.length is a bit more idiomatic I think?)

Does this means, that one of approaches calculates wrong length?

Maybe? Depends on what you mean by “wrong”, I guess. I find it pretty hard to tell what the “right” length of Buffer.from('abc123 π', 'base64') should be…

@avchugaev
Copy link
Author

@addaleax
Copy link
Member

addaleax commented Feb 4, 2017

https://en.wikipedia.org/wiki/Base64 :)

Umm… what are you trying to say? I do know what Base64 is 😄

@seishun
Copy link
Contributor

seishun commented Feb 6, 2017

@achugaev93 First, it's preferable to use JavaScript in test cases, since not everyone here knows or is interested in learning TypeScript. Second, your test case is far from minimal. If I understood your issue correctly, you could easily shorten it to a few lines.

Regarding the return value of Buffer.byteLength for 'base64', I'm quite sure it's by design. It would be much slower if it had to validate the base64 string, so it assumes it's valid. There are multiple packages on npm that validate base64.

I agree that this behavior could be documented though.

@avchugaev
Copy link
Author

Ok, it seems that only Buffer.from(text, this.encodingName).length returns correct value, right? From other side it seems that Buffer.byteLength returns wrong value.

@seishun
Copy link
Contributor

seishun commented Feb 6, 2017

Both return correct length given a valid base64-encoded string. For invalid strings, there's no "correct" length.

seishun added a commit that referenced this issue Feb 9, 2017
PR-URL: #11238
Refs: #11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
@seishun
Copy link
Contributor

seishun commented Feb 9, 2017

The behavior is documented now: 271d50a

@seishun seishun closed this as completed Feb 9, 2017
italoacasas pushed a commit that referenced this issue Feb 13, 2017
PR-URL: #11238
Refs: #11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
italoacasas pushed a commit to italoacasas/node that referenced this issue Feb 14, 2017
PR-URL: nodejs#11238
Refs: nodejs#11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
krydos pushed a commit to krydos/node that referenced this issue Feb 25, 2017
PR-URL: nodejs#11238
Refs: nodejs#11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
jasnell pushed a commit that referenced this issue Mar 7, 2017
PR-URL: #11238
Refs: #11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
jasnell pushed a commit that referenced this issue Mar 7, 2017
PR-URL: #11238
Refs: #11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
MylesBorins pushed a commit that referenced this issue Mar 9, 2017
PR-URL: #11238
Refs: #11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
MylesBorins pushed a commit that referenced this issue Mar 9, 2017
PR-URL: #11238
Refs: #11165
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
thetutlage added a commit to adonisjs/core that referenced this issue Feb 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. doc Issues and PRs related to the documentations. good first issue Issues that are suitable for first-time contributors.
Projects
None yet
Development

No branches or pull requests

3 participants