This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Separator: = vs. : #8

Closed
mathiasbynens opened this issue Jul 30, 2016 · 19 comments

@mathiasbynens
Member

mathiasbynens commented Jul 30, 2016

Perl does both:

$ perl -Mutf8 -E 'say "π" =~ /\p{Script=Greek}/'
1

$ perl -Mutf8 -E 'say "π" =~ /\p{Script:Greek}/'
1

We only want to support one, but which one? The current proposal uses =, but why not :?

00:35:54 <bterlson> what is the rationale for `=` in `Script=` btw?
00:37:40 <bterlson> I mean, why not `Script:foo`
00:38:09 <mathiasbynens> no strong preference, but I think we should only support either `:` or `=` but not both: https://github.com/mathiasbynens/es-regexp-unicode-property-escapes#why-not-support--as-a-separator-in-addition-to-
00:38:29 <bterlson> both is absurd
00:38:40 <mathiasbynens> Perl does both!
00:38:46 <bterlson> absurd
00:38:50 <mathiasbynens> :)
00:39:06 <bterlson> `:` aligns with property syntax
00:43:06 <mathiasbynens> hmm yeah that makes sense… although property name grammar in \p{} is much more restrictive than Identifier
@mathiasbynens
Member Author

: aligns with property syntax, but that’s where the similarity ends — property name/value grammar in \p{…} is much more restrictive than Identifier.

= on the other hand reminds me of SQL, where \p{property=value} becomes something like SELECT * FROM symbols WHERE property = 'value';, i.e. match all symbols whose value for the property $property is $value. I like the mental model of querying the Unicode Database.
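
To illustrate the "query" mental model, here is a minimal sketch. It assumes an engine that implements the = form together with the u flag; the sample array and variable names are made up for this example and are not part of the proposal.

```javascript
// Mental model: \p{Script=Greek} "selects" every symbol whose Script
// property has the value Greek, much like a WHERE clause.
const greek = /\p{Script=Greek}/u;

// Hypothetical sample "table" of symbols (\u03A9 is GREEK CAPITAL LETTER OMEGA).
const symbols = ['π', 'a', '\u03A9', '1'];

// Rough equivalent of: SELECT * FROM symbols WHERE Script = 'Greek';
const matches = symbols.filter((symbol) => greek.test(symbol));

console.log(matches); // [ 'π', 'Ω' ]
```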

@mathiasbynens
Member Author

mathiasbynens commented Aug 12, 2016

@bterlson @littledan @hashseed @patch Thoughts?

@hashseed

I'd say it's arbitrary. Any separator would do.

@bterlson
Member

I still prefer : slightly as I like to think about it like creating an options bag, but my only strong preference is to not do both.

@littledan
Member

I prefer = slightly, but that may just be because that's the first syntax I saw @hashseed implement and it looked nice to me.

@bterlson
Member

Time for a twitter poll! :-P

@bterlson
Member

bterlson commented Aug 12, 2016

https://twitter.com/bterlson/status/764184006095048704

ECMAScript’s RegExps are learning more about Unicode with the \p proposal. What syntax should it use?

330 votes:

  • 52% /\p{Script:Greek}/
  • 28% /\p{Script=Greek}/
  • 20% Why not both?

@bterlson
Member

This seems possibly confusing as @bmeck points out.

let foo;
`${foo=1}`; // foo = 1
/\p{foo=1}/; // syntax error?

@hashseed

hashseed commented Aug 12, 2016

Not really sure why that’s confusing… one is a string template and the other is a regexp literal. Syntax is entirely different…

@bterlson
Member

It's possibly confusing because in order to understand what foo=1 is doing you have to understand that the syntax is entirely different despite looking identical (and even the surrounding syntax is similar what with the curlies and all).
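
For illustration only (this is not from the original comments), the contrast can be sketched as follows. The sketch assumes an engine that implements \p{…} under the u flag; the variable names are hypothetical. Without the u flag, \p is just an escaped p and the braces are literal characters, so whether foo=1 is a syntax error depends on the flag.

```javascript
let foo;

// Template literal: the braces hold an expression, so this assigns 1 to foo.
const text = `${foo = 1}`;
console.log(text, foo); // 1 1

// RegExp with the u flag: the braces hold a property query, and "foo"
// is not a Unicode property name, so the pattern is rejected.
let regexpError = null;
try {
  new RegExp('\\p{foo=1}', 'u');
} catch (error) {
  regexpError = error;
}
console.log(regexpError instanceof SyntaxError); // true

// RegExp without the u flag: the pattern matches the literal text "p{foo=1}".
const legacyMatch = /\p{foo=1}/.test('p{foo=1}');
console.log(legacyMatch); // true
```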

@patch

patch commented Aug 12, 2016

In theory I like : better, but in practice I use and teach = because that's what I see much more frequently in the wild and more regex engines support it. I think of regex as a language of its own embedded within other languages without any syntactic relationship to the languages that embed it. Note that the = in \p{…=…} aligns with the = in (?=…) for positive lookaheads and (?<=…) for positive lookbehinds.
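
A small sketch of that alignment, assuming an engine that supports the = form of \p{…} and lookbehind assertions; the pattern names and the sample string are invented for this example:

```javascript
// The = in \p{Script=Greek} sits next to the = in (?=…) and (?<=…).
const greekBeforeDigit = /\p{Script=Greek}(?=\d)/u; // positive lookahead
const digitAfterGreek = /(?<=\p{Script=Greek})\d/u; // positive lookbehind

const letter = 'π2'.match(greekBeforeDigit)[0];
const digit = 'π2'.match(digitAfterGreek)[0];

console.log(letter); // π
console.log(digit); // 2
```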

I performed an extremely unscientific survey of my locally checked-out git projects (which of course includes my own code):

$ ack -ch '\\p\{\w+=\w+\}'
814
$ ack -ch '\\p\{\w+:\w+\}'
121
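
For readers without ack, a rough JavaScript sketch of the same kind of count (the number of lines containing \p{name=value} or \p{name:value}) might look like this. The helper name and the sample lines are hypothetical and unrelated to the numbers above.

```javascript
// Count the lines of `text` that contain \p{name<separator>value},
// using the same pattern shape as the ack commands above.
function countMatchingLines(text, separator) {
  const pattern = new RegExp(`\\\\p\\{\\w+${separator}\\w+\\}`);
  return text.split('\n').filter((line) => pattern.test(line)).length;
}

// Hypothetical sample input, not taken from any real project.
const sample = [
  'my $greek = qr/\\p{Script=Greek}/;',
  'my $latin = qr/\\p{Script:Latin}/;',
  'my $upper = qr/\\p{Lu}/;',
  'my $cyrl = qr/\\p{Script=Cyrillic}/;',
].join('\n');

const equalsCount = countMatchingLines(sample, '=');
const colonCount = countMatchingLines(sample, ':');

console.log(equalsCount); // 2
console.log(colonCount); // 1
```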

Also, the regex docs for Java and ICU only include =, as do the specification for Unicode Sets and the Unicode CLDR data files that use it. Lastly, I've never seen a regex engine that solely supports :, but I would love to hear about it if anyone knows of one.

@mathiasbynens
Member Author

> In theory I like : better […]

Seems like most people feel that way.

@patch makes a very good point in favor of =, though:

> I think of regex as a language of its own embedded within other languages without any syntactic relationship to the languages that embed it. Note that the = in \p{…=…} aligns with the = in (?=…) for positive lookaheads and (?<=…) for positive lookbehinds.

I’m slightly leaning towards sticking to = now.

@bterlson What do you think?

@bterlson
Member

I find @patch's arguments the most persuasive so far and am convinced that regexp experts will generally prefer =. I'm not sure JS developers would generally find it more approachable, because they may not be regexp experts, have experience with other engines, know much about Unicode, see the correspondence with lookaheads, etc.

I cannot argue strongly in favor of : so I support moving forward with =. The twitter poll is clearly in favor of :, though, fwiw :)

mathiasbynens added a commit that referenced this issue Aug 22, 2016
@mathiasbynens
Member Author

mathiasbynens commented Sep 28, 2016

@littledan
Member

littledan commented Sep 28, 2016

At TC39, we decided to reverse the judgement here and go with :.

@bterlson
Member

I feel like we didn't represent the FAQ entry contents well... do you @littledan? If not maybe we can do a quick re-check?

@mathiasbynens
Member Author

A quick re-check sounds good! If the decision made in this issue is reversed, I’d love to hear the rationale for it.

@littledan
Member

littledan commented Sep 28, 2016

OK, I'll see if we have time to discuss this at this TC39 meeting later. The rationale was that = is used for property set, but the examples in the FAQ seem to show that RegExps already assign a new meaning to =.

@littledan
Member

littledan commented Sep 28, 2016

Cc @allenwb who made the point for : rather than =.

mathiasbynens added a commit that referenced this issue Feb 16, 2017