Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDScript: Fix local can be used before declared #73427

Conversation

dalexeev
Copy link
Member

Closes #54944.

@akien-mga
Copy link
Member

Intuitively, this makes the behavior unexpected for me. While shadowing a member variable should be avoided (and it raises a warning), until the local variable is defined, I would expect x in your example to be the member variable - it hasn't been shadowed by a local yet.

var x = 1

func test():
	print(x)
	@warning_ignore("unused_variable", "shadowed_variable")
	var x = 2

I'd expect this to print 1, not raise an error IMO. It can however emit a warning for shadowed variable, which it seems to.

That being said, the behavior implemented here seems consistent with Python 3:

>>> x = 1
>>> def test():
...     print(x)
...     x = 2
...     print(x)
... 
>>> test()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in test
UnboundLocalError: local variable 'x' referenced before assignment

But it's worth noting that there is greater ambiguity in Python as there's no var x syntax to declare a new member/local.
In GDScript, we can access and modify the member variables without shadowing them.

@dalexeev
Copy link
Member Author

dalexeev commented Mar 6, 2023

JavaScript:

In GDScript, we can access and modify the member variables without shadowing them.

You can use self.a to access a shadowed member.

@kleonc
Copy link
Member

kleonc commented Mar 6, 2023

And in C++, C# it works as expected according to akien's comment.

I think when making decision what's the most expected behavior in here we should look at the GDScript itself. And here, in case there's no shadowing involved, the local variable (its identifier) is said to be "not declared in the current scope" if it's used before the declaration:
Godot_v4 0-stable_win64_cN5n0rrNd1
Godot_v4 0-stable_win64_ohNkK3RDj9

So conceptually (according to the above) the local variable doesn't exist before the declaration (assuming we treat variable/identifier as one), which means there's no local variable to shadow anything yet. Currently this PR is inconsistent with that.

Either other places should be made consistent with this PR (e.g. the above error should be like Cannot use identifier "a" before it was declared.), or it should be made to work like akien suggested (which I'd prefer as it's intuitive for me too).

@dalexeev
Copy link
Member Author

dalexeev commented Mar 6, 2023

And here, in case there's no shadowing involved, the local variable (its identifier) is said to be "not declared in the current scope"

It's just a wrong error message, this PR fixes that case too.

https://github.com/godotengine/godot/blob/6fc420d030147430d7b73ffaeb704e3d2ae20ea8/modules/gdscript/gdscript_analyzer.cpp#L3551-L3558

https://github.com/godotengine/godot/blob/6fc420d030147430d7b73ffaeb704e3d2ae20ea8/modules/gdscript/gdscript_analyzer.cpp#L3560

https://github.com/godotengine/godot/blob/6fc420d030147430d7b73ffaeb704e3d2ae20ea8/modules/gdscript/gdscript_analyzer.cpp#L3693-L3695

The bug was that not all possible locals were checked. Therefore, the search for the identifier continued further (members, globals, etc.) and a generic error message was displayed. The presence of a same named identifier in a higher scope made it possible to deceive the analyzer, but the compiler/codegen still treats it as a local identifier.

@vnen
Copy link
Member

vnen commented Mar 6, 2023

I agree with @akien-mga here, the member should be accessible before declaration of the local. That's the most intuitive and how it works in 3.x. It was broken by the same cause of this bug, so it was an oversight.

Maybe we can merge this as it is to fix the bug, but at some point soon we need to fix the behavior.

@dalexeev
Copy link
Member Author

dalexeev commented Mar 6, 2023

It seems wrong to me that the same identifier within the same scope can have different meanings. Even if it worked in 3.x, I think it should be an error, like in JavaScript and Python. There is no right use case for this behavior, you must either choose a different name for the identifier, or specify self to distinguish the member from the local.

@vnen
Copy link
Member

vnen commented Apr 28, 2023

Bringing this up again. I can accept this since it's more restrictive (and the alternative is already not working anyway). But one thing that we may discuss is what is the "scope".

For instance, variables declared in a nested block is not available to the outer scope:

if true:
	var foo = 0
print(foo) # Error: "foo" not declared

This is expected. However, you can declare the same name in the outer scope:

if true:
	var foo = 0
var foo = 1
print(foo) # Prints "1"

OTOH, you cannot shadow a variable from the outer scope:

var foo = 1
if true:
	var foo = 0 # Error: "foo" is already declared in this scope

One particular oddity is this:

if true:
	var foo = 0
var foo = 1
if true:
	var foo = 2

The var foo = 2 is disallowed because of shadowing of outer scope, but the var foo = 0 works fine.

Considering all this, in a situation like the following:

var foo = 0
func _ready():
	print(foo)
	if true:
		var foo = 1
	var foo = 2

The declaration of both foo are valid as per the previous example. But print(foo) would be invalid according to this PR because it's using a variable before it is declared. However, the declaration of var foo = 1 in the nested block still works fine even though the foo name is already reserved in the outer scope (given print(foo) earlier is forbidden).

For me this is inconsistent. Either we disallow shadowing of inner scopes even if the outer one is declared later, or we go back to the previous behavior of the name pointing the the outer scope if the current one is not yet declared.

@dalexeev dalexeev force-pushed the gds-fix-use-local-before-declared branch from 6fc420d to aae58e1 Compare June 15, 2023 11:16
@dalexeev
Copy link
Member Author

Bringing this up again. I can accept this since it's more restrictive (and the alternative is already not working anyway). But one thing that we may discuss is what is the "scope".

Done. I added a check for variables declared in the parent block below the current block. The check is not possible in the parser, only in the analyzer.

I think with this additional restriction we shouldn't cherry-pick this to 4.0.

@capnm
Copy link
Contributor

capnm commented Jun 16, 2023

I'd expect this to print 1, not raise an error IMO.

Agree.

See a correct go program (without error or any warnings):

package main
import "fmt"

var a = "Martin!"

func main() {
    fmt.Println("Hi, ", a)
    var a = 2    
    fmt.Println(a)
    if a == 2 {
    	var a = 42
    	fmt.Println(a)
    }
}

Hi,  Martin!
2
42

https://go.dev/ref/spec#Declarations_and_scope

@adamscott
Copy link
Member

@vnen @dalexeev Is it ready to be merged / stable enough in 4.1?

@dalexeev
Copy link
Member Author

@adamscott From the technical side, I think it's ready to be merged. But some people prefer to be able to shadow identifiers and treat scopes differently. Perhaps this should be discussed at the GDScript team meeting?

@adamscott adamscott modified the milestones: 4.1, 4.2 Jun 17, 2023
@adamscott
Copy link
Member

I put the milestone to 4.2 as this is a pending discussion. Thanks @dalexeev.
I added this PR to the PRs to team review.

@dalexeev
Copy link
Member Author

dalexeev commented Jul 9, 2023

Summary

Questions for discussion

  1. Scope of local variable/identifier.
  2. Identifier shadowing.

Common points for both options

  1. A local variable can shadow member variable.
  2. In one block, several variables with the same name cannot be defined (with shading of the previous one, exists in some programming languages).
  3. A local variable cannot shadow a parent block's local variable, this is considered an error.
  4. Two blocks, neither of which is a parent of the other (for example, sibling blocks), can have variables with the same name.

Current state

The parser/analyzer and the compiler/runtime treat variable scopes differently, which leads to non-obvious behavior and bugs in edge cases related to variable shadowing and its use before definition.

Also shadowing an identifier is one of the most common and hardly noticeable mistakes. We have a warning when a local variable shadowing a member variable, but as the issue showed, users don't really pay attention to it, as they first notice the bug.

Options

Light background indicates a variable scope. A piece of code where an identifier can be used and means the same thing.
Dark background (white text) indicates scope before declaration (Option 2 only). Using the identifier here will result in an error.
Strikethrough text indicates errors. After an error, scopes are highlighted as if there was no error.

Option 1 (mentioned above): A local variable exists from definition to the end of the block in which it is defined.

Option 2 (this PR): A local variable exists in the block in which it is defined, but cannot be used until it is defined. (The same as Option 1, but with an additional restriction.)

Pros of Option 2:

  1. Reducing the cases where the same identifier in nearby places means different things.
  2. Simple implementation. Minimal changes in the parser/analyzer, no changes in the compiler.

Cons of Option 2:

  1. Disallows previously possible code:
if true:
    var a = 1
var a = 2

Another interesting question about loops, since in these blocks the control flow returns to the beginning of the block.

var a = 1

func _ready():
    for _i in 3:
        print(a)
        var a = 2
        print(a)

What should the following code output:

  1. The error "Cannot use local variable "a" before it was declared.";
  2. 1 2 1 2 1 2;
  3. 1 2 2 2 2 2?

@nlupugla
Copy link
Contributor

Re the loop example, I would say option 3 makes the most sense to me, ie: 1 2 2 2 2 2.

In the same way that the parser is only allowed to look 1 token behind because humans reading the code have a small short-term memory, I think analysis like this should be consistent with what a human with a short memory reading the code would expect.

Similarly, maybe a good principle to follow would be that, to the extent possible, code "in the future" shouldn't raise an error on code "in the past".

@dalexeev
Copy link
Member Author

Re the loop example, I would say option 3 makes the most sense to me, ie: 1 2 2 2 2 2.

In my opinion, this is non-obvious behavior. I don't think there are many languages that would implement it this way. In addition, this will significantly and unnecessarily complicate the implementation, since we would have to generate different code for the first and subsequent iterations of the loop.

@kleonc
Copy link
Member

kleonc commented Jul 10, 2023

var a = 1

func _ready():
    for _i in 3:
        print(a)
        var a = 2
        print(a)
    

What should the following code output:

1. The error "Cannot use local variable "a" before it was declared.";

2. `1 2 1 2 1 2`;

3. `1 2 2 2 2 2`?

Poll:

  1. 🎉
  2. 🚀
  3. ❤️

@dalexeev
Copy link
Member Author

I want to clarify: the current behavior is not Option 1 and not Option 2. It's an Option 0 that has a bug due to the fact that the parser/analyzer and compiler/runtime treat scopes differently.

To me, both options seem acceptable, but in this PR I made Option 2 because it's easier to implement and more restrictive. We can choose Option 1 if we decide it's better. It just takes time for me or someone else to implement it.

There is also an "Option Go" (see comment), which suggesting the ability to shadow local variables. But this is not currently possible and is not suggested in either Option 1 or Option 2. Therefore, we do not consider the option in this discussion.

@nlupugla
Copy link
Contributor

nlupugla commented Jul 11, 2023

Edit 2023/07/11 10:31am: clarified what I meant by "option 1".

Out of curiosity, just tested the looping example in Python to see which of the three outputs it gave between

1. The error "Cannot use local variable "a" before it was declared.";

2. `1 2 1 2 1 2`;

3. `1 2 2 2 2 2`?

When a is defined in the global scope, it gives output 1.

image

When a is a class member, it also gives output 1.

image

to be clear, I'm not arguing for any of these 3 outputs, just presenting this test as some more data to consider.

@dalexeev
Copy link
Member Author

dalexeev commented Jul 11, 2023

Out of curiosity, just tested this in Python and its approach seems to be option 1.

1. The error "Cannot use local variable "a" before it was declared."; — is Option 2.
2. 1 2 1 2 1 2; — is Option 1.

Now we are completely confused. 🙃

@AThousandShips
Copy link
Member

AThousandShips commented Jul 11, 2023

Probably because you swapped the order for your "what should it print" 🙃

And the visual example is far easier to process than the description

@nlupugla
Copy link
Contributor

Oh, I see what I did. When I said option 1, I meant option 1/3 of the poll, not option 1/2 as outlined in the summary. I'll edit my post to clarify.

@vnen
Copy link
Member

vnen commented Jul 11, 2023

For me the most intuitive behavior is that a local variable is only available after it is declared. Before that it is either an error (if it does not exist) or it accessing an outer scope.

However, I believe being more restrictive is a better approach overall, as it gives less chances for the user to make mistakes. Doing this sort of shadowing intentionally is rare and usually frowned upon since it's harder to follow the code (the same name in the same function might refer to different things depending on the line).

So while I originally said it should follow the more intuitive route, I'm now leaning towards the more restrictive solution.

@dalexeev
Copy link
Member Author

Obviously shadowing is a bad practice that we should not encourage. But should we disallow it? We already have a warning (which some users may not notice, but is this a problem?). It's a question of balance between "language allows you to shoot yourself in the foot" and "language is unnecessarily intrusive."

It's funny, but after discussion I leaned towards the intuitive option.

@nlupugla
Copy link
Contributor

I think it would be useful to see how more languages handle a similar situation.

Normally, I would advocate for more help from the analyzer over less, but something feels "off" about allowing code "in the future" to affect code "in the past".

@vnen
Copy link
Member

vnen commented Jul 11, 2023

Obviously shadowing is a bad practice that we should not encourage. But should we disallow it? We already have a warning (which some users may not notice, but is this a problem?).

I don't think we should disallow shadowing of class variables, but we should still disallow shadowing of outer blocks. But this PR is not really about shadowing in general, it is about a solution to an issue that it made it more restrictive than it was before (hence the generated discussion).

So we if there are concerns about shadowing in general we should go the usual route of proposal and discussion.

This is PR is actually about lifecycle of locals: (1) do they exist only after they're declared or (2) are they present anywhere in the scope no matter where they are declared.

I still think (1) is more intuitive but (2) is more restrictive which is better overall as it allows less mistakes.

Normally, I would advocate for more help from the analyzer over less, but something feels "off" about allowing code "in the future" to affect code "in the past".

While you are writing a function linearly this might feel odd, but if you think that you may edit the function multiple times in the future then the actual line order become less relevant. Once you start editing it, you may not be aware anymore of what came before so raising an error in this case is more likely to help you than getting in your way.

@dalexeev
Copy link
Member Author

dalexeev commented Jul 17, 2023

JavaScript - Option 2.

I'm pretty sure the idea came to me because of JavaScript.

Uncaught ReferenceError: can't access lexical declaration 'a' before initialization.

Example 1
let a = 1;

function _ready() {
    // Uncaught ReferenceError: can't access lexical declaration 'a' before initialization.
    let a = a + 1;
    console.log(a);
}

_ready();
Example 2
let a = 1;

function _ready() {
    for (let i = 0; i < 3; i++) {
        // Uncaught ReferenceError: can't access lexical declaration 'a' before initialization.
        console.log(a);
        let a = 2;
        console.log(a);
    }
}

_ready();

Python - Option 2.

To be fair, Python doesn't have a variable declaration keyword (var/let).

3.8: UnboundLocalError: local variable 'a' referenced before assignment
3.11: UnboundLocalError: cannot access local variable 'a' where it is not associated with a value

Example 1
a = 1

def _ready():
    # UnboundLocalError: cannot access local variable 'a' where it is not associated with a value
    a = a + 1
    print(a)

_ready()
Example 2
a = 1

def _ready():
    for i in range(3):
        # UnboundLocalError: cannot access local variable 'a' where it is not associated with a value
        print(a)
        a = 2
        print(a)

_ready()

C++ - Option 1 like, but variable is shadowed already in the definition initializer.

Example 1
#include <stdio.h>

int a = 1;

int main() {
    printf("%d\n", a); // 1
    int a = a + 1; // The right "a" is a local variable, not a global one!
    printf("%d\n", a); // 32745 // Or 1 with other compilers.

    return 0;
}
Example 2
#include <stdio.h>

int a = 1;

int main() {
    for (int i = 0; i < 3; i++) {
        printf("%d\n", a);
        int a = 2;
        printf("%d\n", a);
    }
    // 1 2 1 2 1 2

    return 0;
}

C# - Option 2.

A local variable 'a' cannot be used before it is declared.

Example 1
using System;
class HelloWorld {
    static int a = 1;
    static void Main() {
        // error CS0844: A local variable `a' cannot be used before it is declared.
        // Consider renaming the local variable when it hides the member `HelloWorld.a'
        Console.WriteLine(a);
        int a = a + 1;
        Console.WriteLine(a);
    }
}
Example 2
using System;
class HelloWorld {
    static int a = 1;
    static void Main() {
        for (int i = 0; i < 3; i++) {
            // error CS0844: A local variable `a' cannot be used before it is declared.
            // Consider renaming the local variable when it hides the member `HelloWorld.a'
            Console.WriteLine(a);
            int a = 2;
            Console.WriteLine(a);
        }
    }
}

@dalexeev
Copy link
Member Author

dalexeev commented Jul 24, 2023

v3.5.2.stable.official:

Test 1. Using a variable before the definition when there is a shadowed member
extends Node

var a = 1

func _ready():
    print(a) # Prints 1.
    var a = 2
    print(a) # Prints 2.
Test 2. Same as Test 1, but in for loop
extends Node

var a = 1

func _ready():
    for _i in 3:
        print(a)
        var a = 2
        print(a)
    # Prints 1 2 1 2 1 2.
Test 3. Definition in nested block before definition in outer block
extends Node

func _ready():
    if true:
        var a = 1
        print(a) # Prints 1.
    var a = 2
    print(a) # Prints 2.
Test 4. Definition in nested block after definition in outer block
extends Node

func _ready():
    var a = 1
    print(a)
    if true:
        var a = 2 # Compile time error: Variable "a" already defined in the scope (at line 4).
        print(a)
Test 5. Using the same variable in the initializer when there is a shadowed member
extends Node

var a = 1

func _ready():
    print(a)
    var a = a + 1 # Runtime error: Invalid operands 'Nil' and 'int' in operator '+'.
    print(a)

The behavior corresponds to Option 1 ("intuitive"), except for Test 5.

gdscript-test-scopes-3.x.zip

@adamscott
Copy link
Member

adamscott commented Jul 24, 2023

Discussed during a GDScript Meeting. Here's the resolution that we agreed upon:

  • Concerning the issue about parent scope having a declaration, then used in the child scope before declaration:
    • we agreed to make it usable before declaration with the parent's scope value;
    • but we issue a warning about it.
  • For situations where preceding child scope is "shadowing" a variable declared afterwards in the parent scope
    • we agreed to the same resolution: make it usable
    • but we still issue a warning about the use of a variable with the same name as declared in the parent.

We chose to not implement a "Swift" or "Lua" way right now about variable declaration scope, as it is not in the scope of the issue at hand.

And we decided to warn instead of throwing errors to maintain compatibility with existing code.

@dalexeev
Copy link
Member Author

I think we can close this PR. Whether it's me or someone else who implements it, I'd rather open a new PR anyway. Thank you all for the interesting discussion and the consensus we have reached!

@dalexeev dalexeev closed this Jul 24, 2023
@dalexeev dalexeev deleted the gds-fix-use-local-before-declared branch July 24, 2023 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessing a member variable before declaring a variable that would overshadow returns null
8 participants