Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TokenType: Tests and optimization for pointer types #8060

Merged
merged 8 commits into from
Sep 28, 2023

Conversation

martin-strecker-sonarsource
Copy link
Contributor

Follow up to #7369

Copy link
Contributor

@antonioaversa antonioaversa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting additional tests. I have also made a few educational questions that are lower priority, not blocking the merge of the PR, and can be addressed at a later stage.

@@ -1286,4 +1286,26 @@ public void M(object o)
}
}
""", allowSemanticModel);

[DataTestMethod]
[DataRow("var d = [u:dateTimePointer]->Date;", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: could you please briefly explain when a token is of an unknown type (i.e. TokenType.UnknownTokentype), and why the method parameter dateTimePointer is of such type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unknown is the default tokentype and is used for all tokens that are not of any of the other types:

  public enum TokenType {
    [pbr::OriginalName("UNKNOWN_TOKENTYPE")] UnknownTokentype = 0,
    [pbr::OriginalName("TYPE_NAME")] TypeName = 1,
    [pbr::OriginalName("NUMERIC_LITERAL")] NumericLiteral = 2,
    [pbr::OriginalName("STRING_LITERAL")] StringLiteral = 3,
    [pbr::OriginalName("KEYWORD")] Keyword = 4,
    [pbr::OriginalName("COMMENT")] Comment = 5,
  }

"Unknown" tokens are skipped when the info is written to the disc because there is no difference.


[DataTestMethod]
[DataRow("var d = [u:dateTimePointer]->Date;", false)]
[DataRow("var d = dateTimePointer->[u:Date];", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: I guess Date is of unknown type because dateTimePointer is of unknown type. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnknownTokentype here means the classification of the token isn't any of the other TokenTypes. It doesn't have anything to do with the actual .Net types involved. The enum is badly named. It should be called TokenClassification instead of TokenType.

Also:
For all identifier tokens, there can only be two possible TokenType outcomes: "TypeName" or "UnknownTokentype" *). If we can conclude that the token can never bind to a type, we are done. A pointer can never be a type. (On a normal member access, we can not be sure: Math.Pi here Math can bind to a type or a property. Here we need to consult the semantic model for Math and see what it binds to.

*) One exception is "value" in property setters, which is classified as a keyword.

[DataRow("var d = [u:dateTimePointer]->Date;", false)]
[DataRow("var d = dateTimePointer->[u:Date];", false)]
[DataRow("var d = (*[u:dateTimePointer]).Date;", false)]
[DataRow("var d = (*dateTimePointer).[u:Date];", false)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: [u:dateTimePointer] and [u:Date] could be assessed in the same test, correct? Decorating the token dateTimePointer with the token type assertion syntax doesn't impact the token Date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I try to assert only single tokens as it eases the test debugging. Also, the allowSemanticModel value may differ between the tokens. I just prefer the test isolation here (test a single thing).

[DataRow("System.[t:Int32]* iPointer;", false)]
[DataRow("[k:void]* voidPointer;", false)]
[DataRow("DateTime d = default; M(&[u:d]);", false)]
public void IdentifierToken_Unsafe_Pointers(string statement, bool allowSemanticModel) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some additional test cases that come to mind:

int?* x1; // Pointer of a nullable value type ("?" shouldn't be part of the token, right?
Nullable<int>* x2; // Nullable is of type t, whereas int is of type k, both are recognized
int** x3; // ** is not included in the token int, which is of type k

And relative assignments:

_ = &x1; // x1 of type u
_ = (*(&x1))->Value; // x1 and Value of type u
_ = (*x1).Value; // x1 and Value of type u

There are also ways of mixing @ and &, and we may want to make sure we only highlight the right thing:

var @int = (int*)null; // 2nd int and null of type k, 1st int of type u (?) with @ included
_ = *@int; // @int of type u (?), * not included, @ included
_ = &@int; // @int of type u (?), & not included, @ included

You can also similarly mix [] with * and &.

Hereafter is my understanding of one of the cases mentioned above (to be verified):

[DataRow("var [u:@int] = ([k:int]*)[k:null];_ = *[u:@int];", false)]

Would it make sense to cover (some of) them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a couple of tests. The @ tests are a bit unrelated and I separated those. The @ is just parsed as part of the identifier token, so it doesn't really change anything.

@@ -196,6 +196,8 @@ name switch
// that can not bind to a type. The only things that can bind to a type are SimpleNames (Identifier or GenericName)
// or pre-defined types. None of the pre-defined types have a nested type, so we can exclude these as well.
{ Parent: MemberAccessExpressionSyntax x } when AnyMemberAccessLeftIsNotASimpleName(x) => TokenType.UnknownTokentype,
// The left side of a pointer member access must be a pointer and can not be a type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Educational: this assertion still holds when using using unsafe from C# 12, right?

using System;
using unsafe Ptr = int*;

internal class Program
{
    private unsafe static void Main(string[] args)
    {
        int x = 3;
        Ptr y = &x;
        Console.WriteLine($"x = {y->ToString()}");
    }
}

Ptr should still be considered TokenType.UnknownTokentype, and not a type. Correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just works. Ptr is in a type position and is therefore classified as type (There is a distinguishing happening before ClassifyMemberAccess is called. An IdentfierSymtax can either be in an expression or a type context. In a type context, a single IdentfierName resolves to TokenType.TypeName and in an expression context to UnknownTokentype).

@sonarcloud
Copy link

sonarcloud bot commented Sep 25, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@sonarcloud
Copy link

sonarcloud bot commented Sep 25, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

Copy link
Contributor

@antonioaversa antonioaversa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot for your explanations, they made the code much clearer to me!

@antonioaversa
Copy link
Contributor

@martin-strecker-sonarsource Converted to draft temporarily, to avoid merging by mistake, while the CI runs.
The latest failure of UTs doesn't seem to have anything to do with the changes of this PR, and they seem to be related to Azure machines provisioning. I am rerunning the UTs. As soon as they are green, I am going to undraft and merge the PR.

@antonioaversa antonioaversa marked this pull request as ready for review September 28, 2023 08:37
@antonioaversa antonioaversa merged commit 24ba91d into master Sep 28, 2023
25 checks passed
@antonioaversa antonioaversa deleted the Martin/TokenType_Unsafe branch September 28, 2023 08:37
@andrei-epure-sonarsource andrei-epure-sonarsource added this to the 9.12 milestone Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants