Is it possible to switch tokenizers in a parser? #11

Open
jessicah opened this issue Sep 10, 2021 · 2 comments

@jessicah

I'm trying to write a parser where the syntax is generally whitespace-delimited tokens; however, in certain contexts the whitespace is significant. Quoted strings are one example, but I also have cases where a block is delimited by tokens:

action ... {
  /* start of whitespace significant block
   * where we read everything until the closing brace
   */
}

Would I perhaps need two tokenizers? One that basically looks for these special conditions, and then another that feeds these in and generates a more specific token stream?

I'm trying to rewrite a parser that uses yacc, and it has three "parsing modes" in what appears to be a character-by-character scanning function.
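
For illustration, the kind of mode switch described above could look roughly like the following Python sketch. It is neither astir code nor the original yacc scanner: the function name `tokenize`, the `action` trigger keyword, the `(kind, value)` token tuples, and the choice to count nested braces inside the raw block are all assumptions made for the example.

```python
def tokenize(text, block_keyword="action"):
    """Hypothetical three-mode scanner: words, quoted strings, raw blocks."""
    tokens = []
    i, n = 0, len(text)
    raw_pending = False  # True once block_keyword was seen; the next "{" opens a raw block
    while i < n:
        ch = text[i]
        if ch.isspace():
            # Normal mode: whitespace only separates tokens.
            i += 1
            continue
        if ch == '"':
            # String mode: keep everything, whitespace included, up to the closing quote.
            j = text.find('"', i + 1)
            if j == -1:
                raise ValueError("unterminated string")
            tokens.append(("STRING", text[i + 1:j]))
            i = j + 1
            continue
        # Normal mode: read one whitespace-delimited word.
        j = i
        while j < n and not text[j].isspace():
            j += 1
        word = text[i:j]
        i = j
        tokens.append(("WORD", word))
        if word == block_keyword:
            raw_pending = True
        elif word == "{" and raw_pending:
            # Raw mode: copy text verbatim until the matching closing brace.
            # Counting nested braces is an assumption of this sketch.
            depth = 1
            j = i
            while j < n and depth:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated block")
            tokens.append(("RAW", text[i:j - 1]))
            tokens.append(("WORD", "}"))
            i = j
            raw_pending = False
    return tokens
```

The scanner stays a single function, and the parser only ever sees one token stream; the "modes" are just branches that decide how far to read before emitting the next token.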

@lexected lexected self-assigned this Apr 10, 2022
@lexected
Owner

Hi @jessicah,

Apologies for the very late response, but perhaps better late than never?

It is not immediately clear to me what you're trying to achieve.

If you wish to ignore whitespace that is not inside quotes or comments (such as C-style ones), you can simply use a single machine, writing

  • an ignored root for general whitespace
  • an ordinary root for quoted strings
  • a root or a submachine reference for C-style block comments that captures the body of the comment into a raw field

But I am not sure if this is what you want.
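
As a rough analogy only, the three rules listed above can be sketched as a single-pass tokenizer in Python. This is not astir syntax: the names `TOKEN_RE` and `scan`, the catch-all `WORD` rule, and the token tuples are assumptions for the example, and escape sequences inside strings are left unprocessed.

```python
import re

# One pattern, tried in order at each position (hypothetical rule names).
TOKEN_RE = re.compile(r'''
    (?P<WS>\s+)                      # general whitespace, ignored
  | (?P<STRING>"(?:[^"\\]|\\.)*")    # quoted string, whitespace kept
  | (?P<COMMENT>/\*.*?\*/)           # C-style block comment, body kept raw
  | (?P<WORD>[^\s"]+)                # any other whitespace-delimited word
''', re.VERBOSE | re.DOTALL)


def scan(text):
    """Return (kind, value) tokens; whitespace between tokens is dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind = m.lastgroup
        if kind == "STRING":
            tokens.append(("STRING", m.group()[1:-1]))   # strip the quotes
        elif kind == "COMMENT":
            tokens.append(("COMMENT", m.group()[2:-2]))  # strip /* and */
        elif kind == "WORD":
            tokens.append(("WORD", m.group()))
        # kind == "WS": emit nothing
        pos = m.end()
    return tokens
```

Because every rule lives in one pattern, no tokenizer switch is needed for this case: whitespace is discarded only where none of the whitespace-preserving rules has already claimed it.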

@jessicah
Author

Hmm, my grammar is https://github.com/jessicah/jam-redux/blob/master/jamgram.astir.

Basically, everything is whitespace-delimited except for quoted strings and "action" blocks, which are blocks of shell script. For example: https://github.com/haiku/buildtools/blob/master/jam/Jamfile#L260-L275, or as a more complete example of a file I'm trying to parse: https://github.com/haiku/buildtools/blob/master/jam/Jambase.
