Fix multiline semantic highlighting for 'record' #2789

hopehadfield · 2023-08-03T19:54:11Z

Before:

After:

hopehadfield · 2023-08-03T20:38:45Z

I believe this fixes redhat-developer/vscode-java#1442 as well, though I couldn't reproduce the automatic line-splitting for inner records mentioned, even with version 0.61.0. The highlight colours are now correct.

rgrunber · 2023-08-04T19:03:14Z

It looks very reasonable. I'll just have a look at some other way to properly apply the same colouring as class gets. It could be that it's only possible from the client side.

0dinD · 2023-08-07T17:55:37Z

When calling addToken, you need to make sure that the tokens are added in the correct order (the order of appearance in the document). That's because the tokens are encoded using line/column deltas, and negative deltas are not supported since the Language Server Protocol specifies that they are unsigned integers.

Currently, you first add the record keyword token, and then you return true, which means that the visitor will continue visiting the child nodes of the RecordDeclaration, some of which appear before the keyword token, which will cause problems when encoding the tokens. For example, this PR causes a regression where the public modifier keyword before the record keyword is missing from the semantic tokens. I believe the same thing will happen with semantic tokens in the Javadoc comment of the RecordDeclaration, which also appears before the record keyword.

In fact, you can see that the tokens have been incorrectly encoded, because there is a test failure for SemanticTokensHandlerTest#testSemanticTokens_Constructors. First, it complains that it expected the private modifier keyword before the record keyword. And even if you add an assertion for the record keyword, it will complain that Token deltaColumn should not be negative for the private token.

Here is the correct visitation order: https://github.com/eclipse-jdt/eclipse.jdt.core/blob/4c26c0761103220355168e77cd6e556274b131a7/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/RecordDeclaration.java#L305-L312

The record keyword appears between the modifiers and the name/identifier, so I would propose something like this:

	@Override
	public boolean visit(RecordDeclaration node) {
		acceptNode(node.getJavadoc());
		acceptNodeList(node.modifiers());

		// Here goes the code that calls addToken for the "record" keyword

		acceptNode(node.getName());
		acceptNodeList(node.typeParameters());
		acceptNodeList(node.recordComponents());
		acceptNodeList(node.superInterfaceTypes());
		acceptNodeList(node.bodyDeclarations());

		return false;
	}

0dinD · 2023-08-07T18:18:49Z

Also, what's the reason for adding semantic token modifiers such as public, static etc to the record token? The way I see it, those modifiers should only apply to the name/identifier token, i.e. "Tuple" in your screenshots. The record itself may be public, static etc, but the record keyword is just a keyword, I don't think there is any need for further semantic information.

Regarding this code comment: Token type 'MODIFIER' is used (rather than 'RECORD') to provide the correct highlight colour.

The RECORD token type would never be correct for any keyword in any context, it is meant for a name/identifier token (and identifiers are not keywords), such as "Tuple" in your screenshots. It is analogous to the class, enum, interface etc. token types: https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#standard-token-types-and-modifiers

0dinD · 2023-08-07T18:33:07Z

@rgrunber:

I'll just have a look at some other way to properly apply the same colouring as class gets. It could be that it's only possible from the client side.

The class declaration keyword is defined as storage.modifier.java in the TextMate grammar. That is to say, it has the same TextMate token type as the modifier keywords, public, static, final etc. In my opinion, this is incorrect (yet another problem with the Java TextMate grammar). If you look at a language like TypeScript, the class declaration keyword is defined to have the storage.type.class.ts TextMate token type. For reference, here are the TextMate conventions: https://macromates.com/manual/en/language_grammars#naming_conventions

But that feels like a separate discussion. The fact is that currently, the class, record etc. keyword is treated like a modifier keyword in the TextMate grammar. And so if we want the semantic tokens to be consistent with the TextMate tokens, we will have to use the modifier token type, as is currently done in this PR. It is mapped to the storage.modifier.java TextMate token: https://github.com/redhat-developer/vscode-java/blob/8fa2464e16028b248fd89efc14622aeab251e890/package.json#L148

Signed-off-by: Hope Hadfield <hhadfiel@redhat.com>

hopehadfield · 2023-08-08T18:26:45Z

@0dinD Thanks for the input. I've fixed the order of the token addition, removed the modifiers for the record token, and updated the SemanticTokensHandlerTest#testSemanticTokens_Constructors test so that it passes now.

rgrunber · 2023-08-09T19:24:04Z

I suspect the choice for class was guided by Eclipse :

Thanks for the reference to semanticTokenScopes. That's what I was curious to know.

@0dinD I just have one question about the approach. If the semantic tokens must be added in order of appearance in the document, why don't we just sort them at the very end, instead of making the reading process more convoluted ? I can't imagine we're saving that much time by avoiding the additional sort at the end.

0dinD · 2023-08-10T14:29:34Z

I suspect the choice for class was guided by Eclipse

The TextMate grammar we use was originally created for the Atom editor, but yeah, perhaps the authors took some inspiration from Eclipse. Whatever the reason might be, I think it's still the wrong solution. If the goal was to apply the same color to the modifiers and the class declaration keyword, the correct solution would have been to apply the same color to two different TextMate token types (as is being done in most other languages, such as TypeScript). The current solution where the grammar does not differentiate between the token types is not only wrong semantically (the class declaration keyword is, in fact, not a modifier), but it also prevents theme authors from applying different colors to the class declaration keyword compared to the modifier.

I just have one question about the approach. If the semantic tokens must be added in order of appearance in the document, why don't we just sort them at the very end, instead of making the reading process more convoluted ? I can't imagine we're saving that much time by avoiding the additional sort at the end.

That thought has occurred to me. Sorting the list of tokens would allow us to visit nodes without needing to worry about ordering, but sorting a list with thousands of items (possibly at a high rate when the user is typing, which causes many token requests) will have a performance cost.

On the one hand

Originally (before I contributed patches for the semantic tokens), ordering was not a problem we had to think about, because the SemanticTokensVisitor always delegated the logic involved with visiting AST child nodes to the superclass, ASTVisitor, by not returning false from the visitation methods. The ASTVisitor class already guarantees that children are visited in the correct order: https://github.com/eclipse-jdt/eclipse.jdt.core/blob/0ba960caeb3c1ef51361af2db5ada06e11556744/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/ASTVisitor.java#L55-L62

Basically, the AST is already sorted (by nature). It feels wrong to traverse the tree in the wrong order, and then "cover it up" by sorting the result. Like, why not just do an in-order traversal?

Also, due to the nature of sorting, the performance impact would not scale linearly. For extremely large source files with a large number of semantic tokens, the performance impact could grow quite large. Of course, it's hard to say too much about this performance impact without actually measuring it.

Nowadays, we do have to worry about visitation order, because we sometimes choose not to delegate to ASTVisitor for the child visitation logic. This is because we sometimes need to pass additional information from parent nodes down into the visitation logic for child nodes (due to bugs and limitations in the JDT Core API). I mean, an alternative solution would be for the child nodes to query information from their parent, but this quickly leads to more code complexity, because one parent has many child nodes (and even more grandchild nodes, and so-on).

making the reading process more convoluted

I don't know exactly what you're referring to here, but keep in mind that even if we sort the list, we still have to call acceptNode / acceptNodeList on all the children manually, in order to avoid visiting the same node twice. Like, we can either visit the children ourselves via acceptNode / acceptNodeList, or we can delegate to ASTVisitor (doing both will lead to visiting the same node twice).

In this specific case with the record declaration keyword, we actually can delegate to ASTVisitor and remove the acceptNode / acceptNodeList calls, given that we also sort the list of tokens at the end. But that's only because the record declaration keyword is not an AST node. In most other cases, the "reading process" will still be "convoluted" (the way I see it), even if we sort the list of tokens.

On the other hand

I do agree that sorting the list would make the code a bit easier to maintain. If it turns out that the performance cost is acceptable, it's probably a good idea.

I suspect (without having done much testing) that, for normal-sized source files, the cost of sorting the list is dwarfed by the costs involved with waiting for the shared AST provider and other document lifecycle jobs. On the other hand, I have some ideas for how those performance costs could be removed, which I would like to explore in the future.

In order to get a relevant performance metric, it would probably be good to look at the performance of SemanticTokensVisitor itself. That is to say: How long does it take to convert ASTs of different sizes to semantic token data, with and without sorting the list?

0dinD · 2023-08-10T14:37:58Z

@hopehadfield:

I've fixed the order of the token addition, removed the modifiers for the record token, and updated the SemanticTokensHandlerTest#testSemanticTokens_Constructors test so that it passes now.

Great, this PR looks good to me now! On paper, I haven't had time to test it.

It looks like the CI tests have not run for this PR (or at least, I can't see them linked here on GitHub, all I see is CodeQL and ECA). Do you have any idea why that might be @rgrunber? Maybe because this PR was a draft PR at first, and then got converted into a "normal" PR?

rgrunber · 2023-08-14T13:29:16Z

Test this please.

We were running into https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/3449#note_1196262 due to the org migration. Should be fixed now.

0dinD · 2023-08-16T00:47:16Z

Ah, that explains it!

Anyways, I tried the latest version of this PR out, and it works as expected. Looks good to me!

0dinD · 2023-08-16T17:27:25Z

I have some things to add to this discussion: #2789 (comment)

After thinking about it some more, I realize that we could probably just change the implementation of addToken so that it performs Insertion Sort while building the list. This should have a very minimal performance impact (an order of magnitude less than sorting the entire list after building it), because most tokens will be inserted in the correct order, due to the ordered nature of the AST and ASTVisitor (which gives Insertion Sort near-linear performance).

Also, think we can probably get rid of most (if not all) of the manual child-visitation logic (the acceptNode / acceptNodeList calls), and instead rely on StructuralPropertyDescriptors in the AST. I hadn't realized until now that they can give us a lot of information about the relation between a child and its parent, using ASTNode#getLocationInParent(). This means that many of my previous workarounds that involve manually visiting child nodes from the parent can probably be refactored.

All of which is to say that I think there are a bunch of improvements that could be made to SemanticTokensVisitor to simplify the code and reduce the future maintenance burden. I'll investigate this more when I have time, and probably send a PR if all goes well.

But that discussion is quite off-topic from this PR. I still think that the current implementation of this PR is good, given the current state of SemanticTokensVisitor (where we need to manage token insertion order "manually").

rgrunber

This looks correct to me. After I took some time to realize you're just copying the visitation order defined by the node for its children, it makes sense.

For now, I think we should be fine to respect visitation order. It seems only tokens not part of the AST model are at risk of being visited out of order (we may have access to their info at the "root" prior to visiting any children that come before it.

0dinD · 2023-08-16T23:30:03Z

It seems only tokens not part of the AST model are at risk of being visited out of order

No, the risk of out-of-order visitation is introduced in all cases (even for tokens derived from AST nodes) where we avoid delegating the child visitation logic (returning false from a visitor method), and instead visit the children manually (via acceptNode / acceptNodeList).

As a reminder, here's one of my previous comments:

Nowadays, we do have to worry about visitation order, because we sometimes choose not to delegate to ASTVisitor for the child visitation logic. This is because we sometimes need to pass additional information from parent nodes down into the visitation logic for child nodes (due to bugs and limitations in the JDT Core API). I mean, an alternative solution would be for the child nodes to query information from their parent, but this quickly leads to more code complexity, because one parent has many child nodes (and even more grandchild nodes, and so-on).

Here's an example where there is a risk of out-of-order visitation even though we are only visiting AST nodes:

eclipse.jdt.ls/org.eclipse.jdt.ls.core/src/org/eclipse/jdt/ls/core/internal/semantictokens/SemanticTokensVisitor.java

Lines 316 to 323 in 6fe5425

    
           @Override 
        
           public boolean visit(ModuleDeclaration node) { 
        
           	acceptJavdocOfModuleDeclaration(node); 
        
           	acceptNodeList(node.annotations()); 
        
           	addTokenToSimpleNamesOfName(node.getName(), TokenType.NAMESPACE); 
        
           	acceptNodeList(node.moduleStatements()); 
        
           	return false; 
        
           }

Now, I'm not sure if I explained what I meant very well in the above paragraph that I quoted. And I'm not sure if I agree with what I said anymore either way, now that I became aware of StructuralPropertyDescriptor and the information they can give us in terms of querying parent information from the child nodes. I think there might be ways in which the SemanticTokensVisitor code can be simplified, in regards to child visitation logic. It would be good if we could eliminate the risk of out-of-order visitation, by always delegating the child visitation logic. I'll investigate this in the future, when I have time.

hopehadfield marked this pull request as draft August 3, 2023 19:58

hopehadfield force-pushed the 1444-record branch from d2931da to e1d57ae Compare August 3, 2023 20:14

hopehadfield force-pushed the 1444-record branch from e1d57ae to cb2222c Compare August 4, 2023 16:15

Fix multiline semantic highlighting for 'record'

a219b76

Signed-off-by: Hope Hadfield <hhadfiel@redhat.com>

hopehadfield force-pushed the 1444-record branch from cb2222c to a219b76 Compare August 8, 2023 18:16

hopehadfield marked this pull request as ready for review August 8, 2023 18:26

rgrunber added this to the End August 2023 milestone Aug 16, 2023

rgrunber added bug semantic highlighting labels Aug 16, 2023

rgrunber approved these changes Aug 16, 2023

View reviewed changes

rgrunber merged commit 6fe5425 into eclipse-jdtls:master Aug 16, 2023
4 checks passed

hopehadfield deleted the 1444-record branch August 17, 2023 13:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix multiline semantic highlighting for 'record' #2789

Fix multiline semantic highlighting for 'record' #2789

hopehadfield commented Aug 3, 2023 •

edited

Loading

hopehadfield commented Aug 3, 2023

rgrunber commented Aug 4, 2023

0dinD commented Aug 7, 2023

0dinD commented Aug 7, 2023 •

edited

Loading

0dinD commented Aug 7, 2023

hopehadfield commented Aug 8, 2023

rgrunber commented Aug 9, 2023 •

edited

Loading

0dinD commented Aug 10, 2023

0dinD commented Aug 10, 2023

rgrunber commented Aug 14, 2023 •

edited

Loading

0dinD commented Aug 16, 2023

0dinD commented Aug 16, 2023

rgrunber left a comment

0dinD commented Aug 16, 2023

Fix multiline semantic highlighting for 'record' #2789

Fix multiline semantic highlighting for 'record' #2789

Conversation

hopehadfield commented Aug 3, 2023 • edited Loading

hopehadfield commented Aug 3, 2023

rgrunber commented Aug 4, 2023

0dinD commented Aug 7, 2023

0dinD commented Aug 7, 2023 • edited Loading

0dinD commented Aug 7, 2023

hopehadfield commented Aug 8, 2023

rgrunber commented Aug 9, 2023 • edited Loading

0dinD commented Aug 10, 2023

On the one hand

On the other hand

0dinD commented Aug 10, 2023

rgrunber commented Aug 14, 2023 • edited Loading

0dinD commented Aug 16, 2023

0dinD commented Aug 16, 2023

rgrunber left a comment

Choose a reason for hiding this comment

0dinD commented Aug 16, 2023

hopehadfield commented Aug 3, 2023 •

edited

Loading

0dinD commented Aug 7, 2023 •

edited

Loading

rgrunber commented Aug 9, 2023 •

edited

Loading

rgrunber commented Aug 14, 2023 •

edited

Loading