-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parsing code with an invalid encoding raises #998
Comments
What Ruby does also looks wrong to me but > Ripper.sexp("# encoding: utf\n")
-:1: unknown encoding name: utf (ArgumentError) To me this error looks very special, a set of encodings that is supported by your Ruby implementation/parser is not defined by any spec, and if it's not well-defined then it can't be a What should be thrown/returned in your opinion? |
I'm not quite sure what the right thing to do would be to be honest. This is what Prism returns: irb(main):002> Prism.parse("# encoding: utf\n")
=>
#<Prism::ParseResult:0x000075467d799cb0
@comments=[#<Prism::InlineComment @location=#<Prism::Location @start_offset=0 @length=15 start_line=1>>],
@data_loc=nil,
@errors=
[#<Prism::ParseError @message="unknown or invalid encoding in the magic comment" @location=#<Prism::Location @start_offset=12 @length=3 start_line=1> @level=:argument>],
@magic_comments=[#<Prism::MagicComment @key="encoding" @value="utf">],
@source=#<Prism::Source:0x000075467d8a2af8 @offsets=[0, 16], @source="# encoding: utf\n", @start_line=1>,
@value=
@ ProgramNode (location: (1,0)-(1,0))
├── locals: []
└── statements:
@ StatementsNode (location: (1,0)-(1,0))
└── body: (length: 0),
@warnings=[]> They appear to have the luxury of defining each encoding themselves and since they only support one Ruby version currently it's not that complicated https://github.com/ruby/prism/blob/a09fc1f913fed81f820c23534674f3d1c1695aa2/src/encoding.c#L5011-L5198 I don't think that doing this here is worth the effort. Before reporting I made this commit Earlopain@ce9f99a which just returns nil when an unrecognized encoding is encountered. This results in the default encoding of the ruby version to be used. Does that sound like an acceptable solution to you? |
No, it makes parser swallow the error (which IMO is the worst of "return something special"/"throw something"/"return nothing" options). Invalid encoding should result in an error of some kind, but I'm not sure that we can find a way that is reasonable and backward-compatible at the same time (even the former is not easy by itself). |
A subclass of This would make the error more actionable. You can currently catch the |
Agree, makes sense to me. It doesn't make this error "less special" but at least it'll be trivial to catch it. Feel free to send a PR. |
Will do, thank you 👍 |
Note, Prism performs error-tolerant parsing. It would be good to keep in mind that there can be differences due to encoding: Invalid Encoding$ ruby -rprism -e 'p Prism.parse_success?("# encoding: utf")'
false
$ ruby -rprism -e 'p Prism.parse("# encoding: utf")'
#<Prism::ParseResult:0x00007fe565ab7c08 @value=@ ProgramNode (location: (1,0)-(1,0))
├── locals: []
└── statements:
@ StatementsNode (location: (1,0)-(1,0))
└── body: (length: 0)
, @comments=[#<Prism::InlineComment @location=#<Prism::Location @start_offset=0 @length=15 start_line=1>>], @magic_comments=[#<Prism::MagicComment @key="encoding" @value="utf">], @data_loc=nil, @errors=[#<Prism::ParseError @message="unknown or invalid encoding in the magic comment" @location=#<Prism::Location @start_offset=12 @length=3 start_line=1> @level=:argument>], @warnings=[], @source=#<Prism::Source:0x00007fe565abc140 @source="# encoding: utf", @start_line=1, @offsets=[0]>> Valid Encoding$ ruby -rprism -e 'p Prism.parse_success?("# encoding: utf-8")'
true
$ ruby -rprism -e 'p Prism.parse("# encoding: utf-8")'
#<Prism::ParseResult:0x00007fd2ff857c88 @value=@ ProgramNode (location: (1,0)-(1,0))
├── locals: []
└── statements:
@ StatementsNode (location: (1,0)-(1,0))
└── body: (length: 0)
, @comments=[#<Prism::InlineComment @location=#<Prism::Location @start_offset=0 @length=17 start_line=1>>], @magic_comments=[#<Prism::MagicComment @key="encoding" @value="utf-8">], @data_loc=nil, @errors=[], @warnings=[], @source=#<Prism::Source:0x00007fd2ff85c080 @source="# encoding: utf-8", @start_line=1, @offsets=[0]>> |
@koic Of course I fully support the idea of returning as much data in case of an error as possible, but what prism does can be unsafe, especially this part from your output above: @value="utf-8" There are no guarantees that what comes after IMO the only safe context of the error are source maps which are somewhat obvious, and like I said before I think this error is very special, the whole parsing output should look like this:
which at the end makes no sense. This error is so unusual that it becomes a leaky abstraction that infects types and contracts, probably it can only be "fixed" by deprecating it. |
Ruby does raise an error when trying to execute the file:
It doesn't look like a
SyntaxError
, not quite sure what this actually is, where it's coming from. However I believe the parser gem here shouldn't raise anArgumentError
in this case. This makes tools like RuboCop raise as well when trying to analyze these invalid files.The text was updated successfully, but these errors were encountered: