Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow dashes in identifiers, mangle them. #2370

Closed
wants to merge 2 commits into from
Closed

Allow dashes in identifiers, mangle them. #2370

wants to merge 2 commits into from

Conversation

paulmillr
Copy link

Wtf is this shit?

This patch enables using dashes in identifier names and mangles letters after them to uppercase.
E.g. for-each would be compiled to forEach.

This greatly improves readability sticking to idiomatic js output, because you're able to write code in idiomatic js style and everyone would be able to use it without bullshit.

See underscores are stupid. tl;dr: you need to use shift to write underscores and camelcase etc.

Simplicity

The main point of this pull request is that when you do shit simply, everything is ok.

One simple rule is used: identifiers always mangled. Others aren't.

  • {parse-int} would be compiled to {parseInt: parseInt}.
  • {'parse-int'} would be compiled to {'parse-int': 'parse-int'} because string ain't an identifier.
  • Same with object['to-string'], it's compiled to object['to-string']

What about interoperability with javascript?

If you always have to do the mental translation between the two names, interoperability suffers

is weak argument because you already need to do mental translation for:

  • yes, on compiles to true, no, off to false
  • return is automatically applied to every last statement in function and everything is a statement
  • there is no var so you can accidentally reassign some function and spend a hour looking for the bug
  • there are inclusive and exclusive ranges. and ... / .. don't give any clarity which one is inclusive.
  • there is isnt and there is is not, but they are not the same
  • for-in isn't the same as in js, for-of isn't the same as in ecmascript 6
  • a- b vs a -b

Backwards compat

Existing codebases are easy to transfer. Just replace (\w)([+-*/])(\w) with $1 $2 $3 in all files.

Example 1:

CoffeeScript source rewritten with dashes: https://github.com/paulmillr/coffee-script/tree/topics/use-dashes.

Script used for auto-replacing: https://gist.github.com/2901194.

Example 2:

user = document.query-selector('.user')
user.add-event-listener 'click', ((event) -> console.log event), yes

id = window.set-interval ->
  console.log 'hello'
, 1000
window.set-timeout (-> window.clear-timeout id), 1500

encoded = encode-URI-component string
number = parse-int '5'
string = number.to-string()

page-text = document.query-selector('body').inner-text
logged-in = $('body').has-class 'logged-in'
obj = {a: 1, b: 2}
descriptors = Object.get-own-property-names(obj)
  .map((name) -> Object.get-property-descriptor obj, name)

compiles to

var descriptors, encoded, id, loggedIn, number, obj, pageText, string, user;

user = document.querySelector('.user');

user.addEventListener('click', (function(event) {
  return console.log(event);
}), true);

id = window.setInterval(function() {
  return console.log('hello');
}, 1000);

window.setTimeout((function() {
  return window.clearTimeout(id);
}), 1500);

encoded = encodeURIComponent(string);
number = parseInt('5');
string = number.toString();

pageText = document.querySelector('body').innerText;

loggedIn = $('body').hasClass('logged-in');

obj = {
  a: 1,
  b: 2
};

descriptors = Object.getOwnPropertyNames(obj).map(function(name) {
  return Object.getPropertyDescriptor(obj, name);
});

Closes #2345. Thanks @goatslacker for initial work.

@domenic
Copy link
Contributor

domenic commented Jun 6, 2012

my-obj =
    key-a: "foo"
    key-b: "bar"

get-prop-name = (external-condition) ->
    if external-condition then "keyA" else "keyB" # wut

alert(my-obj[get-prop-name()])

#--

my-other-obj =
    key-a: "foo"
    keyA: "bar" # SyntaxError: you said "keyA" twice. really, you did.

#--

my-third-obj =
   key-a: "foo"

my-third-obj.keyA = "baz"
alert(my-obj.key-a) # wut

#--

my-fourth-obj =
    key-a: "foo"

`var myObjFromExternalJavaScriptLibrary = {
    keyA: "bar"
};`

_.extend(my-fourth-obj, myObjFromExternalJavaScriptLibrary)
# or should I say, my-obj-from-external-java-script-library?

alert(my-fourth-obj.key-a) # damn

#--

four = 4
five = 5

one = five-four # ReferenceError: fiveFour is not defined
one = five-4 # ReferenceError: five4 is not defined
one = 5-four # works, ayup

@paulmillr
Copy link
Author

@domenic welp. a little problem as compared to readability improvement and implying coffeescript already have inconsistencies of this sort.

@goatslacker
Copy link

+1 for this as it aids readability and that adds to programmer happiness.

There can be confusion as it's been previously mentioned in other threads:

encode-uRI-component encode-u-r-i-component would both eval to the same identifier. And as @domenic and @jashkenas points out it breaks interoperability.

I'm in the camp that the readability benefit completely outweighs the confusion. It's a simple rule to remember.

@goatslacker
Copy link

@domenic

my-other-obj =
    key-a: "foo"
    keyA: "bar"

Shouldn't even compile.

@domenic
Copy link
Contributor

domenic commented Jun 6, 2012

@goatslacker Good call, that's a pretty big wtf by itself.

@rwz
Copy link

rwz commented Jun 6, 2012

I personally do not see how it improves readability. I don't have any problems with camelCase as well.

@rwz
Copy link

rwz commented Jun 6, 2012

Btw, i see how it adds a lot of confusion.

@homakov
Copy link

homakov commented Jun 6, 2012

I see profit in making spaces compulsory, if you use - as minus be sure to leave spaces, if no - it is a variable. readability-is-higher - than-it-was

@piranha
Copy link

piranha commented Jun 6, 2012

I always love languages which require spaces around minus since this frees minus to be used in identifiers. So I guess I'm +1 on this.

@gkz
Copy link

gkz commented Jun 6, 2012

LiveScript has this, but allows no spaces around the minus when one of the operands is a number. This fits with the only common use case of a no-spaced minus I can imagine, doing something like list[i-1]. Thus, 1-x, x-1 are allowed as subtraction, but a-b is an identifier.

@michaelficarra
Copy link
Collaborator

I like the LiveScript implementation. +1 for adopting that.

@osuushi
Copy link

osuushi commented Jun 6, 2012

One qualm I would have about the LiveScript behavior is that I tend to be wary of rules that cause surprises when making small mutations to code. For example, say you have a function for getting a previous item in an array, like this:

prev = (list, x) -> list[x-1]

Later, you decide to make the function more general by adding a step argument like so:

prev = (list, x, step=1) -> list[x-step]

Under the LiveScript rules, the original version is valid and has expected behavior, but the modified version gets translated to list[xStep]. The additional complexity of the rule means that programmers have to think harder about making small changes. With a simpler rule, the original version would have been list[x - 1] in the first place, and it could be modified without the programmer having to think about unrelated language features.

@paulmillr
Copy link
Author

I agree with @osuushi and @piranha, -1 on implementing livescript version. There's only one simple rule currently and it shouldn't be harder than that. Also:

new Uint-16-array
new Int-8-array
get-10-users()

As for one = 5-four # works, ayup, this should be disallowed.

@goatslacker
Copy link

I'm afraid the LiveScript version is our only option as decremental operators, which are widely used, would break with this implementation.

I think it's bad enough that list[x-step] has some confusion. Compiling x-- as just x would make matters worse.

The RegExp should only allow dashes followed by a letter or number.

As far as x-step being seen as a negative: I see it as a good thing. Whitespace your code, it's significant and helps readability.

@paulmillr
Copy link
Author

@goatslacker I agree, we should disallow x-- dashes. Would do it today.

@satyr
Copy link
Collaborator

satyr commented Jun 6, 2012

Perl6 allows hyphens in names, but only when followed by an alphabetic character:

$ perl6
> sub a-b {}
a-b
> sub a-1 {}
Malformed block at line 1, near "-1 {}\n"
> sub -a {}
Malformed block at line 1, near "-a {}\n"
> sub a- {}
Malformed block at line 1, near "- {}\n"

http://perlcabal.org/syn/S02.html#Names

@josher19
Copy link
Contributor

josher19 commented Jun 6, 2012

I'm generally -1 on this proposal because Javascript already has 4 uses of minus '-' : binary (a - b), unary ( -a ), prefix decrement ( --a ) and postfix decrement ( a-- ). In coffeescript, using it to also mean camel-case (a-b becomes aB instead of a - b ) seems to break the principle of least astonishment for non-Perl developers, and it can also break existing code:

    [3,1,4,1,5,9].sort (a,b) -> a-b

Principle of least astonishment - The principle of least astonishment is usually referenced in regards to the user interface, but the same principle applies to written code. Code should surprise the reader as little as possible. The means following standard conventions, code should do what the comments and name suggest, and potentially surprising side effects should be avoided as much as possible.

@paulmillr
Copy link
Author

@josher19 just add spaces around your operators. It is a good practice, actually.

@osuushi
Copy link

osuushi commented Jun 6, 2012

@goatslacker why not simply say "identifiers may not begin or end with a hyphen, and may not contain consecutive hyphens"? You're going to need those rule anyway (otherwise green--frog and green-frog would both translate to the same greenFrog, and that-guy- would be the same as that-guy), and with them in place, I can't see any situation where you'd have a clash with decrementing.

@josher19
Copy link
Contributor

josher19 commented Jun 7, 2012

@paulmillr That's easy to do for projects going forward, but a bit more difficult for existing code. Recently I upgraded someone else's code to v1.3.3 and had to spend some time finding all of the places where new keywords (such as a variable named "package") were causing compile time errors. The example I gave above would produce a harder to debug run-time error rather than a compile time error or warning, but other scenarios could fail even less gracefully.

Would there be a way to disable the "dasherize" option from the command line or generate warnings where a match is found?

Adding spaces around all minus signs ('-') for existing projects might be almost as easy as something like:

    file.toString().replace /([a-z])-([a-z])/ig, "$1 - $2"

Although text in quotes or triple quotes (""") may cause a slight problem:

    "coffee-script" becomes "coffee - script"

@gkz: Current LiveScript implementation compiles x--y to x- and throws a parse error for x-- * y (bad) and leaves x-- as well as x-- + y unchanged (good).

@goatslacker
Copy link

@josher19

The tokens produced by CoffeeScript and LiveScript's lexer are identical for both x--y and x-- * y so this should be due to some parsing bug.

It's also trivial to write a tool that analyzes coffee-script source code for possible clashes with this new feature. I volunteer to make a node module for it if this makes it in. I hope that's not a show stopper :)

@gkz
Copy link

gkz commented Jun 7, 2012

@josher19 I recently pushed a fix for the first issue. a--b is not recognized as an identifier - no more consecutive dashes in identifiers. Regarding x-- * y, that error is inherited from Coco it seems (http://satyr.github.com/cup/#t:a--%20*%20b).

EDIT:
Merged a fix @satyr just pushed up for x-- * y - everything is good now. Thanks satyr

@gkz gkz mentioned this pull request Jun 7, 2012
@3den
Copy link

3den commented Jun 8, 2012

IMHO we dont need one more way to write identifiers. Using hyphens on identifiers gives no real benefit to coffeescript it just adds more redundancy.

@connec
Copy link
Collaborator

connec commented Jun 10, 2012

Any 'dynamic' usage would need to use both versions, which has a considerable impact on comprehensibility (see @domenic's first comment for more examples):

obj = some-prop: 'whatever'
console.log obj['some-prop'] # Doesn't work, but why? some-prop is given explicitly...
console.log obj['someProp'] # ... does work? But I... I mean... I didn't add it?

Also in traces, your obj.some-prop will show up as obj.someProp. Doing it this way requires people coding in CoffeeScript to also think about how it will translate to Javascript, which is not a good thing.

A more consistent change would be to allow hyphens in object properties, where no mangling is required:

class A
  set-name: (@the-name) ->
  get-name: -> @the-name

a = new A
a.set-name 'Bob'
alert a.get-name()

Compiles to:

var A, a;

A = (function() {
  function A() {}
  A.prototype['set-name'] = function(name) {
    this['the-name'] = name;
  };
  A.prototype['get-name'] = function() {
    return this['the-name'];
  };
  return A;
})();

a = new A;
a['set-name']('Bob');
console.log(a['get-name']());

This would allow people who want to incorporate hyphenated syntax to do so to a reasonable degree (only plain old variables cannot be hyphenated), and creates no major surprises (although it still raises the same syntax issues - a.b-c.d !== a.b - c.d etc.).

This has parallels to how keywords are treated at the moment:

> coffee -cbe "a.true()"
a["true"]();

> coffee -cbe "true = 'value'"
SyntaxError: reserved word "true" can't be assigned on line 1

@paulmillr
Copy link
Author

@connec i've told about this issue already. Just one simple rule: identifiers are mangled. String is obviously not an, innit?

@connec
Copy link
Collaborator

connec commented Jun 10, 2012

@paulmillr

Unless I'm misunderstanding something, in your example you say document.query-selector will compile to document.querySelector, so I'm assuming for consistency {query-selector: null} would compile to {querySelector: null} (if it doesn't then take o = {}; o.query-selector = null, same principle applies).

Therefore if you declare an object with hyphenated properties o = {some-prop: null}, to access that property using a string value you must use o['someProp'], despite the code saying right there that o['some-prop'] should exist.

In addition, consider the case were you actually do define a hyphenated property o = {'some-prop': null}. To access that property you must use a string value, even though o.some-prop would be completely valid syntax it would point to something else entirely (o.someProp). Again despite the code saying otherwise, o.some-prop would not exist.

This is a fairly simple rule, but it adds a great deal of inconsistency and potential for confusion, and the benefits of mangling over simply allowing hyphenated properties would, I feel, be lost in the inconsistencies created with respect to string access to properties, names in stack traces, etc.

@osuushi
Copy link

osuushi commented Jun 10, 2012

Doing it this way requires people coding in CoffeeScript to also think about how it will translate to Javascript, which is not a good thing.

But coding CoffeeScript really already requires you to think about the JS translation, and I'm not sure that fact is going to change in the foreseeable future. One of the reasons I love CoffeeScript but could never get behind languages that compile more opaquely to JS is that CS doesn't feel like a different language; it feels like a really warm and fuzzy method for generating JavaScript code, just like C is a warm and fuzzy way of generating assembly.

As long as the conversion rules are simple, I don't think that having to think about the JS translation is really a problem.

@danibrear
Copy link

I can't tell if this is a really elaborate joke or not.

@3den
Copy link

3den commented Jun 11, 2012

@paulmillr underscore is accepted in javascript, and unlike dashes breaks nothing. But CamelCase is the standard thats why most developers use it.

Functions without parentheses dont break any javascript feature, and IMHO it is very nice to use on functions that have no arguments.

The only point of merging this PR is if are going to start using it as coffee-script standard, otherwise any one who uses it are just fucking with javascript conventions. This may be nice to you but many developers will never use dashes unless it is the standard and I see no point in this-dummy-standard...

Imaging teaching coffee-script to a new developer a-b != a - b, a+b == a + b, a*b == a * b, they would ask "why minus dont work like the other operations?" the answer is just so may to write identifiers-like-this...

And than you go on ant teach him to debug so you have to tell him that this-identifier will compile to thisIdentifier and the dash dont even exist. Any student with brains would tell you, man this is dummy!

So this PR breaks many features and the only benefit is that it looks pretty to some people.

@paulmillr
Copy link
Author

For those who are interested how CoffeeScript source looks when written with dashes, here it is

https://github.com/paulmillr/coffee-script/tree/topics/use-dashes

I've used this script https://gist.github.com/2901194 to convert stuff and then just fixed comments & strings.

I especially like how rewriter.coffee looks like now:

@3den
Copy link

3den commented Jun 11, 2012

Alt coffee-script

@paulmillr
Copy link
Author

@3den dashes with identifiers don't break any javascript feature because dash identifiers are disallowed in javascript.

I think we should disallow calling operators without spacing. Stuff a+b should be prohibited for consistency. But i'm waiting for Jeremy response — maybe he'll close this pull request and I won't need to write code for it at all.

@paulmillr
Copy link
Author

@jashkenas what's your opinion on the stuff and prohibiting of a+b?

@ajoslin
Copy link

ajoslin commented Jun 11, 2012

This is enough to say 'no' to it for me:

obj = { cool-stuff: 1 }
# ... later ....
a = obj['cool-stuff'] # undefined???

@paulmillr
Copy link
Author

@ajoslin

obj = {0xff: 1}
obj['0xff']  # => undefined???

@piranha
Copy link

piranha commented Jun 11, 2012

obj = {cool-stuff: 1, 'cool-stuff': 2}
obj.cool-stuff # 1
obj.coolStuff # 1
obj['cool-stuff'] # 2

@ajoslin
Copy link

ajoslin commented Jun 11, 2012

obj = { cool-stuff: 1, coolStuff: 2 }
obj.cool-stuff # what should this be?
obj.coolStuff # what should this be?

@danibrear
Copy link

@ajoslin not defending them but your example would be:

obj = { coolStuff: 1, coolStuff: 2}

in javascript if I'm understanding this idea correctly.

@piranha
Copy link

piranha commented Jun 11, 2012

I guess it should have same reaction, as obj = {coolStuff: 1, coolStuff: 2} has right now? I.e. syntax error.

@ajoslin
Copy link

ajoslin commented Jun 11, 2012

Ok, true, compile error. Still confusing though. They definitely look different...

@jashkenas
Copy link
Owner

Sorry to have let this drag on so long before putting this ticket out of its misery ;)

CoffeeScript isn't going to mangle identifiers for the same reasons that have been mentioned for underscores and dashes in previous tickets.

Most importantly, it breaks rough compatibility with JavaScript. At the moment, JavaScript APIs can be used from CoffeeScript and vice-versa, transparently ... with all of the documentation lining up and being accurate. If we mangled identifiers, you'd always have to be translating into (or out of) camelCase in your head as you read the documentation.

Of secondary importance, out of all of the things we might mangle, identifiers are one of the most problematic. Straightforward things like string indexing into objects -- as demonstrated above -- become more painful than they need to be.

obj.first-name = "Bob"
obj['first-name'] is undefined

@jashkenas jashkenas closed this Jun 11, 2012
@connec
Copy link
Collaborator

connec commented Jun 11, 2012

If you want dashes use something more consistent like: obj.cool-stuff compiling to obj['cool-stuff'], (still needs spacing around arithmetic operators, which is also nice). I can't see any advantage to mangling.

@3den
Copy link

3den commented Jun 11, 2012

yay!!!!!

@simen
Copy link

simen commented Jun 11, 2012

Thank god.

@ricroberts
Copy link

Sanity prevails!

@rupakg
Copy link

rupakg commented Jun 15, 2012

Phew!

@brendte
Copy link

brendte commented Jun 15, 2012

If this were a couple of months ago, I'd be waiting for the "April Fools"....

@vassilvk
Copy link

So this wasn't a joke?

@osuushi
Copy link

osuushi commented Jun 15, 2012

Guys, what is with all the gloating? I was lukewarm on this idea myself, for reasons of personal taste, but coming into a discussion you didn't bother to participate in and gloating and guffawing after the matter has been closed, especially when the subject in question is something someone actually took the time to write up into a pull request including tests... that's just tacky.

@brendte
Copy link

brendte commented Jun 15, 2012

Well, for my part, I would have commented if I'd come across it before it was already closed. Sorry you find the comment tacky. I find your fatherly scolding and air of moral superiority tacky, so I guess we're even.

@danibrear
Copy link

Come on guys, its settled. I might not have agreed with it either but hell yeah for the effort Paul!

@brendte
Copy link

brendte commented Jun 15, 2012

You're right. Excellent work Paul.

Take care,

Brendten Eickstaedt

On Jun 15, 2012, at 7:08 PM, David Brear
reply@reply.github.com
wrote:

Come on guys, its settled. I might not have agreed with it either but hell yeah for the effort Paul!


Reply to this email directly or view it on GitHub:
#2370 (comment)

@3den
Copy link

3den commented Jun 15, 2012

I dint like this PR neither but @paulmillr did a good job to add this feature and I respect that.

@xk
Copy link

xk commented Jun 18, 2012

It just needs a small tweak: please consider a Reopen:

This patch enables using + in identifier names and mangles letters after them to uppercase.
E.g. for+each would be compiled to forEach.

(...)

@3den
Copy link

3den commented Jun 18, 2012

@xk remember to allow identifiers with * and /, LOL!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for dashes in variable names, mangle them