Is it possible to switch tokenizers in a parser? #11

jessicah · 2021-09-10T03:59:00Z

I'm trying to write a parser where the syntax is generally whitespace-delimited tokens; however, in certain contexts the whitespace is significant. Quoted strings are one example, but I also have cases where a whole block is delimited by tokens:

action ... {
  /* start of whitespace significant block
   * where we read everything until the closing brace
   */
}

Would I perhaps need two tokenizers? One that basically looks for these special conditions, and then another that feeds these in and generates a more specific token stream?

I'm trying to rewrite a parser that uses yacc, and it has three "parsing modes" in what appears to be a character-by-character scanning function.
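To make the "parsing modes" idea concrete, here is a rough Python sketch of a single scanner that switches between a normal whitespace-delimited mode, a quoted-string mode, and a raw block mode. It is only an illustration under assumptions: the `action` keyword spelling, the token kind names, and the brace-counting rule for finding the end of the block are all hypothetical and not taken from the existing yacc parser or from this library.

```python
def tokenize(text):
    """Yield (kind, value) pairs, switching scan mode based on context."""
    i, n = 0, len(text)
    pending_action = False  # True after the `action` keyword, until its `{`
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1  # normal mode: whitespace only separates tokens
        elif ch == '"':
            # string mode: whitespace is kept, scan to the closing quote
            end = text.find('"', i + 1)
            if end == -1:
                raise ValueError("unterminated string")
            yield ("STRING", text[i + 1:end])
            i = end + 1
        else:
            # normal mode: a word runs up to the next whitespace character
            j = i
            while j < n and not text[j].isspace():
                j += 1
            word = text[i:j]
            i = j
            if word == "{" and pending_action:
                # raw mode: capture the body verbatim up to the matching brace
                # (assumption: nested braces inside the body are balanced)
                depth, start = 1, i
                while i < n and depth > 0:
                    if text[i] == "{":
                        depth += 1
                    elif text[i] == "}":
                        depth -= 1
                    i += 1
                if depth != 0:
                    raise ValueError("unterminated action block")
                yield ("LBRACE", "{")
                yield ("RAW", text[start:i - 1])
                yield ("RBRACE", "}")
                pending_action = False
            else:
                if word == "action":
                    pending_action = True
                yield ("WORD", word)


if __name__ == "__main__":
    sample = 'action Copy {\n  cp  $(>)  $(<)\n}\nrule "a b" ;'
    for token in tokenize(sample):
        print(token)
```

The point of the sketch is that the mode switch lives in one place: the scanner remembers that it has seen `action`, and the next `{` hands the whole body to the parser as a single raw token instead of splitting it on whitespace.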


lexected · 2022-04-10T14:55:04Z

Hi @jessicah,

Apologies for the very late response, but perhaps better late than never?

It is not immediately clear to me what you're trying to achieve.

If you wish to ignore whitespace that is not inside quotes or comments (such as those of C), you can simply use a single machine, writing:

- an ignored root for general whitespace
- an ordinary root for quoted strings
- a root or a submachine reference for C-style block comments that captures the body of the comment into a raw field

But I am not sure if this is what you want.

jessicah · 2022-04-11T07:07:26Z

Hmm, my grammar is https://github.com/jessicah/jam-redux/blob/master/jamgram.astir.

Basically, everything is whitespace delimited except for quoted strings and "action" blocks, which are blocks of shell script. For example: https://github.com/haiku/buildtools/blob/master/jam/Jamfile#L260-L275, or as a more complete example of a file I'm trying to parse: https://github.com/haiku/buildtools/blob/master/jam/Jambase.

lexected self-assigned this Apr 10, 2022
