Can we use _ (underscore) instead of void? #11

Open
rbuckton opened this issue Jun 14, 2024 · 7 comments

Comments

@rbuckton
Collaborator

rbuckton commented Jun 14, 2024

An overwhelming majority of programming languages with a "Discard"-like concept use _ as the discard marker, including C#, F#, Python, Rust, Go, and now Java, so there has been some interest in whether it might be possible to pursue _ in ECMAScript as well.

The Problem

In ECMAScript, _ is a legal identifier, which makes it difficult to reserve for this purpose without potentially "breaking the web". Popular packages like underscore and lodash are frequently referenced using _, including in the package documentation:

Documentation for _.each() in Underscore.js

There is regular pushback against repurposing any identifier in ES as a result of these concerns, so it is unlikely we can follow any kind of deprecation path for _ as did, for example, Java in its implementation.

Allow Duplicate Names: A Possible Solution?

As I briefly discussed in both the April and June 2024 plenary sessions, there is a potential path forward to use _ for discards, though not without some limitations. One way to enable this behavior is to change the semantics for duplicate lexical bindings. Today, a static semantics rule produces an early error if you declare a lexical binding with the same name as another lexical binding in the same block scope, or with the same name as a "var" declared name in the same "var" scope. A separate rule produces an early error if you duplicate a parameter name in a strict function parameter list:

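// Duplicate lexical bindings in the same scope:
let _, _; // SyntaxError: Identifier '_' has already been declared

// A lexical binding that collides with a "var" hoisted into the same scope:
let _;
{
  var _; // SyntaxError: Identifier '_' has already been declared
}

// Duplicate parameter names in a strict function:
"use strict";
function f(_, _) { // SyntaxError: Duplicate parameter name not allowed in this context
}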
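
The early errors above can be observed in current engines without aborting the surrounding program. The following is a minimal sketch, assuming a standard JavaScript runtime; the helper name parsesToday is hypothetical and exists only for illustration. It relies on the Function constructor, which parses its argument as a non-strict function body and throws a SyntaxError for early errors:

```javascript
// Hypothetical helper: reports whether `source` parses as a function body today.
function parsesToday(source) {
  try {
    new Function(source); // parsing happens here; the body is never run
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so surface it
  }
}

console.log(parsesToday("let _, _;"));                         // false
console.log(parsesToday("let _; { var _; }"));                 // false
console.log(parsesToday('"use strict"; function f(_, _) {}')); // false
console.log(parsesToday("function f(_, _) {}"));               // true: non-strict code allows duplicate parameters
```

Because every rejected form fails at parse time, no running program can currently depend on what these forms would do.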

Since these are already errors, there is no code that currently executes successfully in these conditions, and it's unlikely that any code has a dependency on this behavior. As a result, we could relax this restriction and allow lexical bindings to be duplicated, but have such duplication poison the binding name in the current lexical environment:

let _, _; // ok
console.log(_); // SyntaxError: Identifier '_' cannot be referenced in this context

What about underscore/lodash?

Rather than restrict this to _, we can enforce these rules for all identifiers. This means that you could use __ or ___ or foo or any other variable name you want as a discard without shadowing an outer _ declaration.

What about typeof?

Since typeof is intended to work even for undeclared variables, it is expected that

let _, _;
console.log(typeof _);

would print "undefined" just as it would if _ were not declared.
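
For comparison, this is how typeof behaves today for an identifier that has no declaration at all, which is the behavior a poisoned binding would mimic under this idea. A minimal sketch; neverDeclaredAnywhere is a made-up name that is assumed not to exist in the running environment:

```javascript
// `neverDeclaredAnywhere` is a hypothetical name with no declaration in scope.
function typeofUndeclared() {
  return typeof neverDeclaredAnywhere; // no ReferenceError is thrown
}

function readUndeclared() {
  try {
    return neverDeclaredAnywhere; // a plain read of the same name does throw
  } catch (e) {
    return e.name;
  }
}

console.log(typeofUndeclared()); // "undefined"
console.log(readUndeclared());   // "ReferenceError"
```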

What about shadowing?

While typeof might treat _ in let _, _ as if it were not declared, we are still creating a poisoned binding in the current environment, which means that _ remains poisoned in nested scopes. However, a redeclaration in a nested scope would shadow the poisoned binding:

let _, _;
{
  let _ = "foo";
  console.log(_); // prints "foo"
}

Advantages

This approach has several advantages:

  • No single identifier is treated uniquely within the specification as this would apply to all identifiers.
  • No need to use the void keyword.
  • No need to change existing code to use discards.
  • No special syntax or prologue directive to enable this behavior.

Disadvantages

Unfortunately, this is not a complete solution, as it has several disadvantages compared to void:

  • Cannot be used as a discard in an AssignmentPattern
  • Cannot be used as-is in pattern matching
  • A single _ at the top level of a Script becomes a global

Cannot be used as a discard in an AssignmentPattern

Any IdentifierReference in an AssignmentPattern must still be a valid assignment target. Since _ is still treated as a regular identifier, an assignment pattern like { x: _, y: _, ...z } = obj would still attempt to write to an identifier named _. Rather than acting as a discard, that would either (a) overwrite an existing variable named _, (b) introduce a new global variable (in non-strict code), or (c) throw due to a missing declaration (in strict code).

Since that is very context-dependent, it is inherently unsafe to ignore the error in (c), so we cannot loosen this restriction like we could for lexical bindings.
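
The three outcomes (a), (b), and (c) can be reproduced in today's engines. This is a sketch under stated assumptions: __sloppyDiscard and __undeclaredDiscard are hypothetical names assumed not to be declared anywhere, and the non-strict case goes through the Function constructor so that it runs as non-strict code even if the surrounding file is strict:

```javascript
// (a) An existing `_` in scope is silently overwritten.
function overwritesExisting() {
  let _ = "library";
  let z;
  ({ x: _, y: _, ...z } = { x: 1, y: 2, w: 3 });
  return { _, z };
}

// (b) In non-strict code, assigning to an undeclared name creates a global.
function createsGlobalInSloppyMode() {
  new Function("({ x: __sloppyDiscard } = { x: 1 });")();
  const leaked = globalThis.__sloppyDiscard;
  delete globalThis.__sloppyDiscard; // clean up after the demonstration
  return leaked;
}

// (c) In strict code, the same kind of assignment throws.
function throwsInStrictMode() {
  "use strict";
  try {
    ({ x: __undeclaredDiscard } = { x: 1 });
    return "no error";
  } catch (e) {
    return e.name;
  }
}

console.log(overwritesExisting());        // { _: 2, z: { w: 3 } }
console.log(createsGlobalInSloppyMode()); // 1
console.log(throwsInStrictMode());        // "ReferenceError"
```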

Cannot be used as-is in pattern matching

In pattern matching, an identifier that is not a property name must be an IdentifierReference. As a result, a pattern like

when { x: _, y: _ }

would actually be matching the values of the x and y properties to whatever the value of _ is, so we would not be able to use _ as-is as a discard.

Instead, we would have to leverage let patterns as proposed by the pattern matching proposal, which is far less ideal, e.g.:

when { x: let _, y: let _ }

A single _ at the top level of a Script becomes a global

The purpose of a discard is to evaluate but not bind; however, the following introduces a global _ variable:

<!DOCTYPE html>
<html>
<head>
<script>
    let { x: _, ...obj } = { x: 1, y: 2 };
</script>
<script>
    console.log(_);
</script>
</head>
</html>

Users might mistakenly assume that _ was actually a discard when in fact it introduced a global variable that is visible to all code running on the page. This can be mitigated by forcibly poisoning the declaration via, e.g., let _, _;, but that is not going to be obvious to most developers.
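
The cross-script visibility can also be reproduced outside a browser. The sketch below assumes Node.js and its built-in vm module, and uses two vm.runInContext calls as a stand-in for the two script elements; treat it as an approximation of the HTML example rather than an exact equivalent:

```javascript
const vm = require("vm");

// One shared context plays the role of the page's global scope.
const context = vm.createContext({});

// First "script": `_` is intended as a discard.
vm.runInContext("let { x: _, ...obj } = { x: 1, y: 2 };", context);

// Second "script": `_` is visible here as well.
const leaked = vm.runInContext("_", context);
const rest = vm.runInContext("JSON.stringify(obj)", context);

console.log(leaked); // 1
console.log(rest);   // {"y":2}
```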

@ljharb
Member

ljharb commented Jun 14, 2024

It's an existing invariant that typeof can't ever generate an error; that's why typeof NonExistentGlobal === 'undefined' has always been a safe check. That must not ever change.

@rbuckton
Collaborator Author

rbuckton commented Jun 14, 2024

> It's an existing invariant that typeof can't ever generate an error; that's why typeof NonExistentGlobal === 'undefined' has always been a safe check. That must not ever change.

That is a fair point. typeof should work and probably treat the poisoned declaration as if it were merely missing. I'll make a note in the OP and update my examples.

@rbuckton
Collaborator Author

> It's an existing invariant that typeof can't ever generate an error; that's why typeof NonExistentGlobal === 'undefined' has always been a safe check. That must not ever change.
>
> I'll make a note in the OP and update my examples.

Updated. See the sections marked "What about typeof?" and "What about shadowing?", above.

@michaelficarra
Member

@rbuckton I don't see what I consider to be the main benefit of void listed here. _ is a refactoring hazard. There exist today many large files that refer to a library as _ throughout. When working in such a file, if someone wants to introduce a function wrapper around some of the code, and that wrapper uses a discard binding, references to _ within that code will now be overridden. void doesn't have this inconsistent behaviour. So even if we allow duplicate _ bindings, we still need to have void for discard in circumstances like these. I'm not a fan of having both, but I also think it's unacceptable to have just _.

@rbuckton
Collaborator Author

This is covered by the proposal in the OP. I am not restricting this to _. Any identifier could be used as a discard, so the refactoring hazard is minimal. A linter or checker could easily detect a duplicate _ (or other ID) and warn you if you've shadowed a nested reference.
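
Both the hazard and the rename-based workaround can be illustrated with today's semantics. This is a minimal sketch, assuming a hypothetical stand-in object bound to _ that only provides an each-like function; it is not the real underscore or lodash API:

```javascript
// Hypothetical stand-in for a library conventionally bound to `_`.
const _ = { each: (list, fn) => list.forEach(fn) };

// The callback ignores its first parameter by naming it `_`,
// which shadows the library inside the callback body.
function hazard(rows) {
  return rows.map((_, index) => {
    try {
      _.each([index], () => {}); // `_` is now the row, not the library
      return "ok";
    } catch (e) {
      return e.name;
    }
  });
}

// Using a different identifier for the ignored parameter leaves `_` intact.
function renamed(rows) {
  return rows.map((__, index) => {
    const seen = [];
    _.each([index], (value) => seen.push(value));
    return seen[0];
  });
}

console.log(hazard(["a"]));       // [ 'TypeError' ]
console.log(renamed(["a", "b"])); // [ 0, 1 ]
```

A linter rule that flags a parameter or binding named _ when an outer _ is referenced in the same body would catch the first case.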

@michaelficarra
Member

Yes and in my example you could also rename the inner _ references to an alias that was captured outside the scope. But that has significantly more complicated semantics than void just always meaning discard.

@rbuckton
Collaborator Author

> Yes and in my example you could also rename the inner _ references to an alias that was captured outside the scope. But that has significantly more complicated semantics than void just always meaning discard.

I would argue it has roughly the same semantics as renaming any other variable in JS. It does not introduce a new hazard.


3 participants