Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go target generated code for Java.g4 grammar doesn't compile #1397

Closed
ereyes01 opened this issue Nov 22, 2016 · 22 comments
Closed

Go target generated code for Java.g4 grammar doesn't compile #1397

ereyes01 opened this issue Nov 22, 2016 · 22 comments

Comments

@ereyes01
Copy link
Contributor

Before submitting an issue to ANTLR, please check off these boxes:

  • [*] I am not submitting a question on how to use ANTLR; instead, go to antlr4-discussion google group or ask at stackoverflow
  • [*] I have done a search of the existing issues to make sure I'm not sending in a duplicate

Expected behavior

antlr4 -Dlanguage=Go Java.g4 should generate Go code that compiles. Java.g4 was taken from github.com/antlr/grammars-v4

Other info:

antlr4 command is: alias antlr4='java -Xmx500M -cp "/home/ereyes/.m2/repository/org/antlr/antlr4/4.6-SNAPSHOT/antlr4-4.6-SNAPSHOT.jar:$CLASSPATH" org.antlr.v4.Tool'

$ go version
go version go1.7.1 linux/amd64

Actual behavior

The generated code does not compile. Here's what the Go compiler says:

$ go install ./
# /home/ereyes/code/antlr4-java/parser
./java_lexer.go:726: syntax error: unexpected _input, expecting comma or )
./java_lexer.go:739: syntax error: unexpected _input, expecting comma or )

Steps to reproduce the behavior

$ cd ~/code
$ git clone https://github.com/antlr/grammars-v4
$ mkdir -p antlr4-java/src/parser
$ export GOPATH=$HOME/code/antlr4-java
$ cd antlr4-java/src/parser
$ cp ~/code/grammars-v4/java/Java.g4 ./
$ antlr4 -Dlanguage=Go Java.g4
$ go get github.com/antlr/antlr4/runtime/Go/antlr
$ go install ./

I looked at the error and attempted to fix by applying the patch below, but it still doesn't fix the problem:

diff --git a/java_lexer.go b/java_lexer.go
index b7b70c6..4b5114f 100644
--- a/java_lexer.go
+++ b/java_lexer.go
@@ -723,7 +723,7 @@ func (p *JavaLexer) JavaLetter_Sempred(localctx antlr.RuleContext, predIndex int
 			return Character.isJavaIdentifierStart(_input.LA(-1))
 
 	case 1:
-			return Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))
+			return Character.isJavaIdentifierStart(Character.toCodePoint((char)(_input.LA(-2)), (char)(_input.LA(-1))))
 
 	default:
 		panic("No predicate with index: " + fmt.Sprint(predIndex))
@@ -736,7 +736,7 @@ func (p *JavaLexer) JavaLetterOrDigit_Sempred(localctx antlr.RuleContext, predIn
 			return Character.isJavaIdentifierPart(_input.LA(-1))
 
 	case 3:
-			return Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))
+			return Character.isJavaIdentifierPart(Character.toCodePoint((char)(_input.LA(-2)), (char)(_input.LA(-1))))
 
 	default:
 		panic("No predicate with index: " + fmt.Sprint(predIndex))

... I tried to fix the parentheses in the cast in java_lexer.go. I also wondered if the intention was to cast to rune instead of char, which isn't a built-in type in Go. Anyways, here are the compiler errors after applying this patch:

./java_lexer.go:723: undefined: Character in Character.isJavaIdentifierStart
./java_lexer.go:723: undefined: _input in _input.LA
./java_lexer.go:726: undefined: Character in Character.isJavaIdentifierStart
./java_lexer.go:726: undefined: char
./java_lexer.go:726: undefined: _input in _input.LA
./java_lexer.go:726: undefined: char
./java_lexer.go:726: undefined: _input in _input.LA
./java_lexer.go:736: undefined: Character in Character.isJavaIdentifierPart
./java_lexer.go:736: undefined: _input
./java_parser.go:14393: no new variables on left side of :=
./java_lexer.go:736: too many errors

... and after changing char to rune:

./java_lexer.go:723: undefined: Character in Character.isJavaIdentifierStart
./java_lexer.go:723: undefined: _input in _input.LA
./java_lexer.go:726: undefined: Character in Character.isJavaIdentifierStart
./java_lexer.go:726: undefined: _input in _input.LA
./java_lexer.go:736: undefined: Character in Character.isJavaIdentifierPart
./java_lexer.go:736: undefined: _input in _input.LA
./java_lexer.go:739: undefined: Character in Character.isJavaIdentifierPart
./java_lexer.go:739: undefined: _input in _input.LA
./java_parser.go:14393: no new variables on left side of :=
@ereyes01
Copy link
Contributor Author

I should also note, I have gotten JSON.g4 to work with the Go target, and have been able to play with the generated parser apis...

@KvanTTT
Copy link
Member

KvanTTT commented Nov 23, 2016

You should remove target-specific (Java) semantic predicates from grammar such as {Character.isJavaIdentifierStart(_input.LA(-1))}? and other.

@ereyes01
Copy link
Contributor Author

I see, this makes sense. Thanks for the quick reply! I'll try this out later this evening...

@ereyes01
Copy link
Contributor Author

I removed the inline Java code from the grammar and now I'm left with:

> go install .
# parser
./java_parser.go:14393: no new variables on left side of :=

In the generated code:

decl1>>>	_alt := p.GetInterpreter().AdaptivePredict(p.GetTokenStream(), 139, p.GetParserRuleContext())

		for _alt != 2 && _alt != antlr.ATNInvalidAltNumber {
	...
		}
		p.SetState(1232)
		p.GetErrorHandler().Sync(p)
decl2>>>	_alt := p.GetInterpreter().AdaptivePredict(p.GetTokenStream(), 140, p.GetParserRuleContext())

The code above was inside the parser for this rule in Java.g4:

arrayCreatorRest
    :   '['
        (   ']' ('[' ']')* arrayInitializer
        |   expression ']' ('[' expression ']')* ('[' ']')*
        )
    ;

Not sure if this is the same problem encountered in #1401 ?

@parrt
Copy link
Member

parrt commented Nov 28, 2016

@pboyer for you, sir!

@pboyer
Copy link
Contributor

pboyer commented Nov 30, 2016

Thank you @ereyes01! I'll take a look at this tomorrow.

@pboyer
Copy link
Contributor

pboyer commented Nov 30, 2016

@ereyes01 Would you mind including your modified Java.g4 grammar as well?

@ereyes01
Copy link
Contributor Author

Here's the patch I applied to Java.g4:

diff --git a/java/parser/Java.g4 b/java/parser/Java.g4
index 8743761..986dee6 100644
--- a/java/parser/Java.g4
+++ b/java/parser/Java.g4
@@ -980,10 +980,8 @@ JavaLetter
     :   [a-zA-Z$_] // these are the "java letters" below 0x7F
     |   // covers all characters above 0x7F which are not a surrogate
         ~[\u0000-\u007F\uD800-\uDBFF]
-        {Character.isJavaIdentifierStart(_input.LA(-1))}?
     |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
         [\uD800-\uDBFF] [\uDC00-\uDFFF]
-        {Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
     ;
 
 fragment
@@ -991,10 +989,8 @@ JavaLetterOrDigit
     :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0x7F
     |   // covers all characters above 0x7F which are not a surrogate
         ~[\u0000-\u007F\uD800-\uDBFF]
-        {Character.isJavaIdentifierPart(_input.LA(-1))}?
     |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
         [\uD800-\uDBFF] [\uDC00-\uDFFF]
-        {Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
     ;
 
 //

Thanks!

@pboyer
Copy link
Contributor

pboyer commented Nov 30, 2016

@ereyes01 If you get the chance, do you think you can try out this fix? pboyer#90

If not, I should be able to get to it tomorrow.

@ereyes01
Copy link
Contributor Author

ereyes01 commented Dec 1, 2016

@pboyer it works 👍 thanks!

@pboyer pboyer mentioned this issue Dec 1, 2016
@pboyer
Copy link
Contributor

pboyer commented Dec 1, 2016

Thanks so much for checking, @ereyes01!

parrt added a commit that referenced this issue Dec 1, 2016
@parrt
Copy link
Member

parrt commented Dec 1, 2016

Fixed by #1436

@BurtHarris
Copy link

BurtHarris commented Jun 18, 2017

I believe this still isn't good. It's using a predicate when a simple character class will do: according to the Java language specification there's no need for anything but ASCII letters and numbers.. See PR antlr/grammars-v4#762.

@cowang
Copy link

cowang commented Jun 20, 2017

@BurtHarris No. Identifiers may be written using any of the world's languages. For more than you ever wanted to know, see my comments elsewhere.

@Nhoya
Copy link

Nhoya commented Feb 22, 2019

This issue seems still there.. I'm using ANTLR 4.7.2 with go

go version
go1.11.5 windows/amd64

I generated the code with

java -jar antlr-4.7.2-complete.jar -Dlanguage=Go parser .\Java8.g4

The errors are in java8_lexer.go

syntax error: unexpected _input, expecting comma or ) [756:86]
syntax error: unexpected _input, expecting comma or ) [769:70]

Snippet:

L 756

return Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))

L 769

return Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))

@KvanTTT
Copy link
Member

KvanTTT commented Feb 22, 2019

Try to use Java.g4 grammar instead of Java8.g4. It does not contain actions, semantic predicates and it's suitable for 8 version.

@Nhoya
Copy link

Nhoya commented Feb 22, 2019

Hi @KvanTTT, thanks for the reply

Unfortunately Java.g4 is not available in https://github.com/antlr/grammars-v4/tree/master/java

@KvanTTT
Copy link
Member

KvanTTT commented Feb 22, 2019

I mean the ordinary java grammar that consists of JavaLexer.g4 and JavaParser.g4.

@Nhoya
Copy link

Nhoya commented Feb 22, 2019

I'm trying to build a Java.g4 from Lexer and Parser.. Apparently this is not well documented in the README (please correct me if I'm wrong) and the only valid reference seems this one #638 but is really messy... I tried with

java -jar antlr-4.7.2-complete.jar .\grammars-v4\java\JavaLexer.g4 .\grammars-v4\java\JavaLexer.g4 -o .

but without any output... Any hint about it?

EDIT:
solved with:

java -jar antlr-4.7.2-complete.jar -Dlanguage=Go -lib .\grammars-v4\java\*.g4 -o parser

@Nhoya
Copy link

Nhoya commented Feb 22, 2019

Unfortunately the command ahead only generates

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----        2/22/2019     15:17          52118 JavaParser.interp
-a----        2/22/2019     15:17           2344 JavaParser.tokens
-a----        2/22/2019     15:17          36037 javaparser_base_listener.go
-a----        2/22/2019     15:17          28371 javaparser_listener.go
-a----        2/22/2019     15:17         511445 java_parser.go

As you can see java_lexer.go is missing @KvanTTT

@KvanTTT
Copy link
Member

KvanTTT commented Feb 22, 2019

You should generate lexer at first.

@Nhoya
Copy link

Nhoya commented Feb 22, 2019

If I use

java -jar 'C:\Javalib\antlr-4.7.2-complete.jar' -lib .\JavaLexer.g4 .\JavaParser.g4 -o build

I expect to see both the java_lexer.go and java_parser.go to be generated in the build directory. If this is not the case I'm missing something. Can you please link me the part of the documentation I'm skipping?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants