Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Babylon] Use char codes contants #6727

Merged
merged 20 commits into from
Nov 16, 2017
Merged

[Babylon] Use char codes contants #6727

merged 20 commits into from
Nov 16, 2017

Conversation

xtuc
Copy link
Member

@xtuc xtuc commented Nov 2, 2017

Q                       A
Fixed Issues? #6726 (comment)
Patch: Bug Fix?
Major: Breaking Change?
Minor: New Feature?
Tests Added + Pass? Yes
Documentation PR
Any Dependency Changes?
License MIT

Adds a constant file where every charCode is exported.

@@ -924,7 +925,7 @@ export default class Tokenizer extends LocationParser {

readNumber(startsWithDot: boolean): void {
const start = this.state.pos;
let octal = this.input.charCodeAt(start) === 0x30; // '0'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the hex representation.

@@ -873,7 +874,7 @@ export default class Tokenizer extends LocationParser {
val = code - 97 + 10; // a
} else if (code >= 65) {
val = code - 65 + 10; // A
} else if (code >= 48 && code <= 57) {
} else if (charCodes.isIn09(code)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some helpers for ranges.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho this should be called isDigit

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDigit looks more meaningful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@@ -328,31 +329,31 @@ export default class Tokenizer extends LocationParser {
loop: while (this.state.pos < this.input.length) {
const ch = this.input.charCodeAt(this.state.pos);
switch (ch) {
case 32: // space
case 160: // non-breaking space
case charCodes.space:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the number representation.

@xtuc xtuc added pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories PR: Polish 💅 A type of pull request used for our changelog categories labels Nov 2, 2017
@xtuc xtuc changed the title feat: setup constants [Babylon] Use char codes contants Nov 2, 2017
@@ -0,0 +1,26 @@
// @flow
Copy link
Member

@rajasekarm rajasekarm Nov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a separate npm package.
I didn't find any module which does the inverse operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any either. We could technically publish it on NPM.

I don't want to use an existing module because we don't use every char code and I wanted to add functions for ranges. That way we only have the necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would love to have this published separately and supporting all (most?) char codes

im always adding such constants in my projects, would be cool to just have a shareable module

That way we only have the necessary.

imho keeping more is a non-issue, maintenance costs of such package should be really low too as this stuff never changes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, we can do this. Do we publish a module from the Babel org? It seems not really related.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on what kind of control do you want to have over it. Im really fine with any "owner" for it - if you feel it doesn't belong to babel maybe just release it under ur name, make it a dep and invite whole babel as collab.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created https://github.com/xtuc/charcodes, ping me if you need collab/publish.

Can I invite a whole Babel group at once?


type Char = number;

const charCodes = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of exporting an object it would be better to export them as named exports to support better tree-shakeability of those

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also enable better inlining of this values i think.

@nicolo-ribaudo
Copy link
Member

Can't we just use strings? input[pos] instead of input.charCodeAt(pos) and then use strings in comparisons.

@Andarist
Copy link
Member

Andarist commented Nov 2, 2017

some strings are hard to express with a string, i.e. paragraph separator - String.fromCharCode(0x2029) just looks like an empty string

@xtuc xtuc requested a review from danez November 2, 2017 16:27
@@ -4,6 +4,7 @@

import type { Options } from "../options";
import type { Position } from "../util/location";
import charCodes from "../util/charCodes";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be charCodes, { isDigit }?

Guess once we make them individual exports we can just do * as charCodes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mentally planned to use the * import.

@babel-bot
Copy link
Collaborator

babel-bot commented Nov 2, 2017

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/5864/

@hzoo
Copy link
Member

hzoo commented Nov 2, 2017

There are also some cases in src/plugins/jsx/index.js

@xtuc
Copy link
Member Author

xtuc commented Nov 2, 2017

@hzoo yes of course, I just started.

@xtuc
Copy link
Member Author

xtuc commented Nov 3, 2017

I was wondering what would be the best representation for our constants. Currently they are using numbers maybe hex would be faster?

@bmeurer, do you know? 😇

@bmeurer
Copy link
Member

bmeurer commented Nov 3, 2017

This will definitely be slower in Node currently, because

  1. you penalize the baseline performance, because now you need at least two loads to get to the constant, whereas before you just had a literal there, and
  2. even the optimized code will to do at least one load, since V8 has machinery for constant-field tracking, but it's not shipping yet, so it doesn't know that those properties are constants.

@nicolo-ribaudo
Copy link
Member

nicolo-ribaudo commented Nov 3, 2017

We could replace them at compile time (maybe with babel-macros?)

@bmeurer
Copy link
Member

bmeurer commented Nov 3, 2017

That'd work.

@rajasekarm
Copy link
Member

I'll take src/tokenizer/index.js

@xtuc
Copy link
Member Author

xtuc commented Nov 8, 2017

Does anyone has a clue about the error in the CI: https://travis-ci.org/babel/babel/jobs/299171334#L929-L962? I don't think it's actually related to this PR.

case 34:
case 39: // '"', "'"
case charCodes.questionMark:
case charCodes.apostrophe:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That error is because here it should be " and ', not ? and '.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @nicolo-ribaudo, thanks you so much!

@nicolo-ribaudo
Copy link
Member

Is that any number in the source code other than charcodes? If not we could enable no-magic-numbers

@@ -114,7 +117,7 @@ function codePointToString(code: number): string {
} else {
return String.fromCharCode(
((code - 0x10000) >> 10) + 0xd800,
((code - 0x10000) & 1023) + 0xdc00,
((code - 0x10000) & charCodes.lowercaseF) + 0xdc00,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f is 102, not 1023

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's due to my search-and-replace. I ended up with charCodes.lowercaseF3 at this place. Thanks, i'll update it.

@@ -230,7 +233,7 @@ export default class Tokenizer extends LocationParser {
readToken(code: number): void {
// Identifier or keyword. '\uXXXX' sequences are allowed in
// identifiers, so '\' also dispatches to that.
if (isIdentifierStart(code) || code === 92 /* '\' */) {
if (isIdentifierStart(code) || code === charCodes.backslash /* '\' */) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the comment can be removed

@xtuc
Copy link
Member Author

xtuc commented Nov 9, 2017

@nicolo-ribaudo I would like to get rid of all the magic numbers. But I guess that will be an iterative process because they are still magic numbers to me (I don't know what to replace them).

@@ -363,7 +368,7 @@ export default class Tokenizer extends LocationParser {

default:
if (
(ch > 8 && ch < 14) ||
(ch > charCodes.backSpace && ch < charCodes.shiftOut) ||
(ch >= 5760 && nonASCIIwhitespace.test(String.fromCharCode(ch)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5760 can be converted to a constant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have it here https://github.com/xtuc/charcodes/blob/master/packages/charcodes/src/index.js#L103. But it's kind of a strange char. I would prefer to use a function which tests against a range. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the function's name though 🤣

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point 😂 no idea

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charcodes.isUnusualWhiteSpace? 🤔 😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make it handle also the charCodes.space and charCodes.nonBreakingSpace and call it charCodes.isSpaceSeparator (https://www.compart.com/en/unicode/category/Zs)

@xtuc
Copy link
Member Author

xtuc commented Nov 11, 2017

src/tokenizer/index.js is ok to me, left is:

  • readHexChar I'm not really sure how we can remove it, what do you guys think?
  • Maybe missing a function as said here, I replaced it with the oghamSpaceMark constant.

@rajzshkr any update on your files? Can I help you?

@nicolo-ribaudo
Copy link
Member

readHexChar I'm not really sure how we can remove it, what do you guys think?

Why do you think we should remove it? It doesn't use char codes, it just says "read an hex int of a given length".

Maybe missing a function as said here, I replaced it with the oghamSpaceMark constant.

After thinking about it a bit, I think a function with a descriptive name (like charCodes.isSpaceSeparator) would be better, since I not many people know what oghamSpaceMark is and why it is there. (I just learned what Ogham is lol)

@xtuc
Copy link
Member Author

xtuc commented Nov 13, 2017

charCodes.isSpaceSeparator

I don't think that's a good idea actually, we are going to support more chars than Babylon is currently doing. And I will end up just the moving nonASCIIwhitespace list from Babylon's source.

And I did the JSX plugin file.

@xtuc
Copy link
Member Author

xtuc commented Nov 15, 2017

Can we merge this?

I'm not sure what to do with isSpaceSeparator, but we can merge this for now.

Copy link
Member

@Andarist Andarist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im only wondering if transform-charcodes could possibly get more generalized, probably would be useful for people to have such 'transform-inline-modules' or something

that ofc is just an idea and not something to do within this PR

@nicolo-ribaudo
Copy link
Member

@Andarist transform-inline-modules is more or less rollup 😛

@Andarist
Copy link
Member

@Andarist transform-inline-modules is more or less rollup 😛

Nice one! It won't inline variables though :P

@@ -25,6 +25,8 @@
"devDependencies": {
"@babel/helper-fixtures": "7.0.0-beta.31",
"babel-plugin-transform-for-of-as-array": "1.0.4",
"babel-plugin-transform-charcodes": "0.0.7",
"charcodes": "0.0.6",
Copy link
Member

@nicolo-ribaudo nicolo-ribaudo Nov 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes right. The transform doesn't require a specific version of charCodes, it's probably a good idea to use a peer dependency here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I updated with ^

@xtuc
Copy link
Member Author

xtuc commented Nov 16, 2017

Ok, I'll go ahead and merge this.

Thanks for the reviews.

@xtuc xtuc merged commit bb89364 into master Nov 16, 2017
@xtuc xtuc deleted the feat-use-charcode-constants branch November 16, 2017 09:35
@lock lock bot added the outdated A closed issue/PR that is archived due to age. Recommended to make a new issue label Oct 5, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
outdated A closed issue/PR that is archived due to age. Recommended to make a new issue pkg: parser PR: Internal 🏠 A type of pull request used for our changelog categories PR: Polish 💅 A type of pull request used for our changelog categories
Projects
None yet
Development

Successfully merging this pull request may close these issues.