tokenizer doesn't handle numeric literals if they start with 0 #331

Y-Nak · 2021-03-23T13:47:01Z

What is wrong?

The tokenizer doesn't tokenize numeric literals correctly if they start with 0 followed by a non-zero digit.
For example,

    02

is tokenized into

    [
        Token {
            typ: NUMBER,
            string: "0",
            span: Span {
                start: 0,
                end: 1,
            },
            line: "02\n",
        },
        Token {
            typ: NUMBER,
            string: "2",
            span: Span {
                start: 1,
                end: 2,
            },
            line: "02\n",
        },
        Token {
            typ: NEWLINE,
            string: "\n",
            span: Span {
                start: 2,
                end: 3,
            },
            line: "02\n",
        },
        Token {
            typ: ENDMARKER,
            string: "",
            span: Span {
                start: 3,
                end: 3,
            },
            line: "",
        },
    ],
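To make the failure easier to see, here is a small hand-written emulation of the current `DECNUMBER` pattern. It avoids the `regex` crate so it runs standalone. `match_decnumber` and `split_numbers` are illustrative names, not functions from the fe tokenizer, and the sketch assumes the tokenizer simply re-applies the number pattern at the position where the previous match ended.

```rust
/// Hand-written emulation of the current DECNUMBER pattern
/// `(?:0(?:_?0)*|[1-9](?:_?[0-9])*)`, anchored at the start of `s`.
/// Returns the byte length of the match, or `None` if nothing matches.
fn match_decnumber(s: &str) -> Option<usize> {
    let b = s.as_bytes();
    match *b.first()? {
        b'0' => {
            // First alternative: 0(?:_?0)*  (only zeros may follow a leading 0)
            let mut i = 1;
            loop {
                match (b.get(i), b.get(i + 1)) {
                    (Some(&b'0'), _) => i += 1,
                    (Some(&b'_'), Some(&b'0')) => i += 2,
                    _ => break,
                }
            }
            Some(i)
        }
        b'1'..=b'9' => {
            // Second alternative: [1-9](?:_?[0-9])*
            let mut i = 1;
            loop {
                match (b.get(i), b.get(i + 1)) {
                    (Some(&c), _) if c.is_ascii_digit() => i += 1,
                    (Some(&b'_'), Some(&c)) if c.is_ascii_digit() => i += 2,
                    _ => break,
                }
            }
            Some(i)
        }
        _ => None,
    }
}

/// Splits `src` into NUMBER lexemes by re-applying `match_decnumber`
/// where the previous match ended (hypothetical simplification of the tokenizer loop).
fn split_numbers(src: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < src.len() {
        match match_decnumber(&src[pos..]) {
            Some(len) => {
                out.push(&src[pos..pos + len]);
                pos += len;
            }
            None => break,
        }
    }
    out
}

fn main() {
    // `02` comes out as two tokens, matching the dump above.
    let tokens = split_numbers("02");
    assert_eq!(tokens, vec!["0", "2"]);
    println!("{:?}", tokens);
}
```

The leading `0` can only be extended by more zeros, so the match ends after one byte and the `2` starts a fresh NUMBER token.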

Also, it would be good to discuss the spec for numeric literals (sorry if this has already been discussed).
Currently, the tokenizer accepts hex/binary/octal/decimal numbers while the analyzer accepts only decimal numbers, which is inconsistent.
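A rough sketch of that mismatch. `radix_of` mirrors the prefixes the tokenizer patterns recognize, `analyzer_accepts` is a hypothetical stand-in for the analyzer's decimal-only rule, and `parse_literal` shows one possible way (an assumption, not something decided here or in #333) for both stages to agree by validating the digits with `u128::from_str_radix`. None of these names come from the fe codebase.

```rust
/// Radix of a numeric literal, inferred from its prefix the same way the
/// HEXNUMBER / BINNUMBER / OCTNUMBER / DECNUMBER patterns distinguish them.
fn radix_of(lit: &str) -> u32 {
    match lit.get(..2).map(|p| p.to_ascii_lowercase()).as_deref() {
        Some("0x") => 16,
        Some("0b") => 2,
        Some("0o") => 8,
        _ => 10,
    }
}

/// Hypothetical analyzer rule reflecting the current behaviour:
/// only decimal literals are accepted.
fn analyzer_accepts(lit: &str) -> bool {
    radix_of(lit) == 10
}

/// Hypothetical consistent handling: strip the radix prefix and `_`
/// separators, then let `from_str_radix` check the digits for that radix.
fn parse_literal(lit: &str) -> Option<u128> {
    let radix = radix_of(lit);
    let digits = if radix == 10 { lit } else { &lit[2..] };
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    u128::from_str_radix(&cleaned, radix).ok()
}

fn main() {
    for lit in ["42", "0x1f", "0b1010", "0o17"] {
        // Every one of these passes the tokenizer, but only "42" passes the analyzer.
        println!(
            "{lit}: analyzer_accepts={} value={:?}",
            analyzer_accepts(lit),
            parse_literal(lit)
        );
    }
}
```

Note that `parse_literal("02")` would still return `Some(2)`, so whether a leading zero is allowed needs its own rule in the spec.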

EDIT: opened #333 for further discussion of numeric literal representations.

How can it be fixed

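It could be fixed by tweaking `DECNUMBER` in the code below. Its first alternative, `0(?:_?0)*`, only lets a leading 0 be followed by more zeros, so for `02` the match stops after `0` and `2` is picked up as a separate NUMBER token.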

fe/parser/src/tokenizer/regex.rs

Lines 18 to 21 in 3a7fa34

    pub const HEXNUMBER: &str = r"0[xX](?:_?[0-9a-fA-F])+";
    pub const BINNUMBER: &str = r"0[bB](?:_?[01])+";
    pub const OCTNUMBER: &str = r"0[oO](?:_?[0-7])+";
    pub const DECNUMBER: &str = r"(?:0(?:_?0)*|[1-9](?:_?[0-9])*)";
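A minimal sketch of one possible fix, assuming the goal is for `02` to come out as a single NUMBER token so a later stage (compare #330) can decide whether to reject it. It emulates a relaxed pattern `[0-9](?:_?[0-9])*` in place of `DECNUMBER`; the relaxed pattern and both function names are illustrative assumptions, not necessarily what #334 merged.

```rust
/// Emulates a relaxed decimal pattern `[0-9](?:_?[0-9])*` anchored at the
/// start of `s`: any digit may lead, so `02` is consumed as one token.
/// Returns the byte length of the match, or `None` if `s` doesn't start with a digit.
fn match_relaxed_decnumber(s: &str) -> Option<usize> {
    let b = s.as_bytes();
    if !b.first()?.is_ascii_digit() {
        return None;
    }
    let mut i = 1;
    loop {
        match (b.get(i), b.get(i + 1)) {
            (Some(&c), _) if c.is_ascii_digit() => i += 1,
            (Some(&b'_'), Some(&c)) if c.is_ascii_digit() => i += 2,
            _ => break,
        }
    }
    Some(i)
}

/// Hypothetical follow-up check: true if a decimal literal has a leading 0
/// in front of a non-zero digit (e.g. `02`, `0_1`), so it can be reported
/// as an error instead of being silently split. `0` and `00` are not flagged.
fn leading_zero_before_nonzero(lit: &str) -> bool {
    let digits: Vec<u8> = lit.bytes().filter(|&c| c != b'_').collect();
    digits.len() > 1 && digits[0] == b'0' && digits.iter().any(|&c| c != b'0')
}

fn main() {
    let src = "02\n";
    let len = match_relaxed_decnumber(src).expect("source starts with a digit");
    let lit = &src[..len];
    println!("{lit:?} flagged={}", leading_zero_before_nonzero(lit));
}
```

Keeping the token whole and rejecting it afterwards gives a clearer error than two adjacent NUMBER tokens, which would otherwise surface as a confusing parse error.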


This was referenced Mar 23, 2021

Properly reject octal number literals #330 (merged)
Properly tokenize numeric literals when they start with 0 #334 (merged)

cburgdorf closed this as completed in #334 Mar 25, 2021
