Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scanner hang on very large numbers #2134

Merged
merged 3 commits into from
Oct 18, 2017

Conversation

tsholmes
Copy link
Member

@tsholmes tsholmes commented Oct 17, 2017

Fixes #2080

The regex used for parsing doubles had an extremely high branching factor, resulting in an out-of-memory error when parsing longer strings. This replaces it with a lower branching regex of:

  • 0 or 1
    • 1 or more digit
    • 0 or more
      • _
      • 0 or more digits
  • .
  • 1 or more digit
  • 0 or more
    • _
    • 0 or more digits

This maintains the behavior of allowing weird numbers like 0_.0_

@Dunbaratu
Copy link
Member

Dunbaratu commented Oct 18, 2017

It never occurred to me to look at the regex for Double as the source of the problem with Integers. (Big doubles parsed just fine, only big integers didn't). I guess it couldn't decide whether it was integer or double until too far into the string.

I'm still baffled why it was running into the problem with big numbers in comments. I assumed the [skip] directive for the comment rule ends up circumventing all the other rules, putting it into a mode where it doesn't even bother checking all the other rules until the comment is over. That's usually how comment skip rules work in most parser generators, although I admit I'm not too familiar with the inner workings of TinyPG.

Perhaps the parser doesn't realize that the regex //[^\n]*\n? cannot possibly fail to find a match once you get the //. The way that regex is set up, it's not possible to fail the match - literally everything till the end of line, no matter what it is, must match. But the system might still keep branching the tree to check the other rules just in case the comment rule fails (which it cannot, but maybe it doesn't know that).

Edit: Yeah that must be it. In other tools when you want to declare a skip section (like a comment) you do it by specifying a start pattern that begins a new special state in which the only active rule is the one that looks for the end pattern to get back out of that state. Once you hit the start pattern, all the other rules are deactivated until you hit the end pattern and get out of the skipping state. In the TinyPG file, that doesn't seem to be how you do it.

@Dunbaratu
Copy link
Member

Oh, by the way, I didn't say it, but thanks for this. I don't think I'd have ever found it. I should have a chance to give it a try and merge it tomorrow.

@tsholmes
Copy link
Member Author

Happy to help! Also @Dunbaratu check out #2135 for why it happened even in comments and some huge speed improvements

@Dunbaratu
Copy link
Member

In testing this, I've discovered that the code we had in the compiler to switch to using a Double when the value was too big for an Integer never really got implemented. (Probably because it couldn't parse the big integer in the first place). I'm adding that clause in as part of my merge.

@Dunbaratu Dunbaratu merged commit 134af67 into KSP-KOS:develop Oct 18, 2017
@Dunbaratu Dunbaratu added this to the v1.1.4.0 milestone Dec 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants