Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Tokenize punctuation (and generics) #22

Merged
merged 24 commits into from
Apr 19, 2016
Merged

Tokenize punctuation (and generics) #22

merged 24 commits into from
Apr 19, 2016

Conversation

winstliu
Copy link
Contributor

@winstliu winstliu commented Oct 1, 2015

This PR tokenizes a lot of the missing Java punctuation, such as parentheses, square and curly brackets, angle brackets and commas (for generics), etc.

Fixes #19
Fixes #23
Fixes #29

@chbk
Copy link
Contributor

chbk commented Oct 5, 2015

Looks great, except the square brackets aren't tokenized everywhere. Maybe that's what you mean with char[]?

This is my (quick) test case:

javabraces

public class MyClass {

    public static void main(String[] args) {

        String[] array;

        HashMap<Integer,String> map = new HashMap<>();

        boolean[] list = new boolean[10];

        list[i] = (sum == n ? true : false);
    }
}

Brackets on line 3 aren't scoped:

line3

But on line 5 they are:

line5

I also noticed bugs on lines 7 and 9 while typing this, i'll open new issues for that.

@winstliu
Copy link
Contributor Author

winstliu commented Oct 5, 2015

Yeah, that's basically what I was talking about. Though it looks like they sometimes get tokenized, so I'll try to fix that :).

@kevinsawicki
Copy link
Contributor

Just a heads up, this no longer merges now that #21 has been merged.

@winstliu
Copy link
Contributor Author

winstliu commented Nov 3, 2015

@chbk your example seems to pass now. I have one question though: can you please take a look at this pattern and see if brackets also need to be matched in that? I can't figure out what it's supposed to match at all.

@chbk
Copy link
Contributor

chbk commented Nov 9, 2015

I'm not sure about that pattern. I'm guessing it tries to find variable names in bulk, so this would pass
int abc[], test = 9;
I don't think you should bother with brackets there since they seem to be already tokenized by the "code" section.

@winstliu
Copy link
Contributor Author

winstliu commented Dec 9, 2015

Updates!

I finally got back around to working on this. This should 100% fix #19 now and specs are passing 🍏 which is a great start, but I still want to include #23 before merging.

Sorry for the delay @chbk ;(.

@chbk
Copy link
Contributor

chbk commented Dec 9, 2015

Thanks for taking the time to work on this @50Wliu

# Conflicts:
#	spec/java-spec.coffee
@winstliu
Copy link
Contributor Author

#23 should be fixed as well now 🎉!

@chbk could you test this out and report if there's still any places that aren't getting tokenized correctly? Thanks! 🙇

@@ -694,20 +724,7 @@
'object-types-inherited':
'patterns': [
{
'begin': '\\b((?:[a-z]\\w*\\.)*[A-Z]+\\w*)<'
'end': '>|[^\\w\\s,<]'
'name': 'entity.other.inherited-class.java'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I couldn't find a way to keep this name with the new generics include. @kevinsawicki do you know if this should be possible, or was it intentionally not allowed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know if this should be possible, or was it intentionally not allowed?

At first glance, it does appear this would have to be removed to get the new generics patterns working. Nothing jumps out to my as to why this scope would be helpful so I think it is okay to remove it.

@winstliu winstliu changed the title Tokenize braces Tokenize punctuation (and generics) Dec 10, 2015
@winstliu
Copy link
Contributor Author

The latest commit a67ef17 changes all commas to be punctuation.separator.delimiter. Not sure if that's the best naming for it though: /cc @atom/languages (first time I'm using this, yay!).

@winstliu
Copy link
Contributor Author

Just realized that this doesn't tokenize the comma in all scenarios yet, such as int a, b = 3.

Object types weren't getting matched in certain places (eg return types)
because of a change in 1bf118e

Fix all primitive types always getting matched as arrays
@winstliu
Copy link
Contributor Author

@chbk Fixed, fixed, and fixed 😄.

Your Map example was something that I caused by accident in 1bf118e when moving object-types around. I've fixed that (hopefully) by placing it in all-types.

Same thing with String - objects weren't getting matched in return types.
boolean being tagged as an array was a silly bug. [] is now required to call something an array, instead of being optional...

@chbk
Copy link
Contributor

chbk commented Jan 13, 2016

It works as expected now. I combed through my files and found another live one:

t

abstract class Document implements Comparable,Serializable {}       // Comma isn't tokenized

@winstliu
Copy link
Contributor Author

When in doubt, keep fixing.

@winstliu
Copy link
Contributor Author

@chbk just wanted to ping you in case you didn't see my last comment and commit 😉.

@chbk
Copy link
Contributor

chbk commented Jan 27, 2016

Sorry I didn't get back to you, your last commit works fine 👍
Unfortunately I haven't been working with Java lately, so I haven't encountered any other bugs for the moment.

@chbk
Copy link
Contributor

chbk commented Feb 1, 2016

I got back to this and found 2 other bugs:

p

package com.apple.quicktime.v2;     // Periods aren't scoped

b

class Test {

    public String[][] return2DStringArray() {    // Brackets

        String[][] array = new String[2][2];
        return array;
    }

    public int[][] return2DintArray() {          // Brackets

        int[][] array = new int[2][2];
        return array;
    }
}

@winstliu
Copy link
Contributor Author

winstliu commented Feb 1, 2016

I think I can do a quick-fix for the brackets but the real underlying issue is #24 (the array regex is really broken right now).

@winstliu
Copy link
Contributor Author

winstliu commented Feb 2, 2016

Apparently I was wrong...multidimensional arrays turned out to be a lot easier than expected to implement.

Think there's anything else to tackle @chbk?

This is slowly turning into a full-blown rewrite of the entire grammar 😆.

At least, I'm pretty sure it's not needed since `#code` should match it
@chbk
Copy link
Contributor

chbk commented Feb 2, 2016

All these changes are starting to add up indeed! I don't see anything else, besides #24 and perhaps these minor enhancements:

  • The package wildcard could use a scope: import mypackage.*;
  • instanceof is just keyword.operator, making it hard to differentiate from others.

`instanceof` is now `keyword.operator.instanceof.java`
Fixed a few implementation bugs in the `package` invalid character
regexes
@winstliu
Copy link
Contributor Author

winstliu commented Feb 2, 2016

@chbk hopefully this will be the last test!

'match': '\\s'
'name': 'invalid.illegal.character_not_allowed_here.java'
'match': '\\*'
'name': 'punctuation.wildcard.java'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaximSokolov Do you think this naming is ok?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't variable.language.wildcard be better? I don't think it's a punctuation mark.
Note how syntax themes would highlight it:

screen shot 2016-02-03 at 11 05 44
DuoTone dark

@MaximSokolov
Copy link

How about wrapping brackets with meta.brace:

{
  'begin': '(\\[)'
  'beginCaptures':
    '0':
      'name': 'meta.brace.square.scope-is-for-back-capability.java'
    '1':
      'name': 'punctuation.bracket.square.java'
  # ...
}

This allows to use punctuation scope (now both meta.brace and punctuation should be styled) and won't break themes which style meta.brace (e.g. Solarized)

@@ -448,10 +492,56 @@
'captures':
'1':
'name': 'keyword.operator.dereference.java'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how about changing it to punctuation.separator.period I think period are less important than other operators:

screen shot 2016-02-04 at 10 32 44
Monokai

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about punctuation.separator.deference? Or are we going with labeling what the actual character is now?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

punctuation.separator.accessor maybe?

`meta.brace` is now `punctuation.bracket`
All brackets now contain the type of bracket in their scope
`keyword.other.dereference` is now `punctuation.separator.period`
@winstliu
Copy link
Contributor Author

winstliu commented Feb 9, 2016

I think this is ready for review now.

@winstliu
Copy link
Contributor Author

🚢ing to get some exposure. I'm fairly confident this works as expected, and there's extensive spec coverage.

@winstliu winstliu merged commit a672147 into master Apr 19, 2016
@winstliu winstliu deleted the wl-braces branch April 19, 2016 16:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants