Changing root feature #304

Closed
micrenda opened this issue Aug 23, 2024 · 5 comments

I would like to ask if it is possible to pass a specific target rule instead of using the main priority chain when parsing a string.

Let me clarify with an example:

Suppose I have the following rule set:

species    <- molecule ( '(' excitatopm ')' )?
molecule <- # Description of a molecule
excitation <- excitation_ele / excitation_vib / excitation_rot
excitation_ele <- # something
excitation_vib <- # something
excitation_rot <- # something

Usually, in my code, I would do something like this:

pegParser = peg::parser();
pegParser.load_grammar(s);
std::any result;
pegParser.parse("H2O(2V1)", result);

This works fine. However, in my unit tests or in other parts of the code, I might want to parse according to a specific rule. In that case, I would like to do something like this:

pegParser = peg::parser();
pegParser.load_grammar(s);
std::any result;
pegParser.parse("2V1", "excitation_vib", result);

This way, I would use excitation_vib as the root rule and expect an exception if excitation_vib does not fully consume the input.

Is this possible? With the current implementation, to achieve something like this, I would need to change the grammar by making the target rule the new root. However, I was wondering if there is a better way to do it.

micrenda pushed a commit to micrenda/cpp-peglib that referenced this issue Aug 29, 2024
Author

micrenda commented Aug 29, 2024

I added PR #305, which implements this feature; it may need some rework.

Owner

yhirose commented Aug 30, 2024

@micrenda thanks for the feedback, but I don't understand the example grammar... The grammar isn't valid. ('excitatopm' is not defined, and 'excitation' is not referenced.) So pegParser.load_grammar(s); doesn't work due to the incorrect grammar. cpp-peglib doesn't allow such incorrect grammar...

Author

micrenda commented Aug 30, 2024

Hello

In the example I wrote, I omitted the actual rule bodies because they were not important (and I also made a typo!). Let me give you a valid grammar:

species    <- molecule ( ' ' '(' excitation ')' )?
molecule <- ([A-Z] [a-z]? [0-9]?)+
excitation <- excitation_ele / excitation_vib / excitation_rot
excitation_ele <- 'A' / 'B' / 'C'
excitation_vib <- [0-9]* 'V' [0-9]+
excitation_rot <- 'J' [0-9]+

In my code, now I can do something like this:

pegParser = peg::parser();
pegParser.load_grammar(s);
std::any result;
pegParser.parse("H2O (2V1)", result);

And it will work perfectly.

However, with PR #305, it is now also possible to do this in unit tests or in other sections of the code:

pegParser = peg::parser();
pegParser.load_grammar(s);
std::any result;
pegParser.parse("2V1", result, nullptr, "excitation_vib");

For me this is a life saver :-)

Owner

yhirose commented Aug 31, 2024

Thanks for the clear explanation. I now fully understand what you would like to do. (By the way, I put comments in your pull request to fix problems that I found, and the following sample uses the revised version.)

Unfortunately, there are some situations where the parser doesn't work properly with this approach. The %whitespace feature is one of them.

// sample.cc
#include <iostream>
#include <peglib.h>

using namespace peg;

int main(void) {
  parser parser(R"(
Start       <- A
A           <- B (',' B)*
B           <- '[one]' / '[two]'
%whitespace <- [ \t\n]*
  )");

  std::cout << std::boolalpha;

  std::cout << parser.parse("[one],[two]") << std::endl;
  std::cout << parser.parse(" [one] , [two] ") << std::endl;

  std::cout << parser.parse("[one],[two]", nullptr, "A") << std::endl;
  std::cout << parser.parse(" [one] , [two] ", nullptr, "A") << std::endl;
}

> ./sample
true
true
true
false

As you can see, %whitespace only works with Start. This is because cpp-peglib applies some special treatment only to the start rule. You can see what is added to the start rule in the perform_core function.

std::shared_ptr<Grammar> perform_core(const char *s, size_t n,
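
The effect described above can be illustrated with a toy model. The sketch below is not cpp-peglib code: `ToyParser`, `parse_from_start`, and `parse_from_A` are hypothetical names, and the hand-written matcher only assumes that whitespace skipping is switched on when parsing enters through the start rule. Under that assumption, entering directly at `A` leaves whitespace unhandled.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy model (NOT cpp-peglib internals) of the grammar
//   Start <- A,  A <- B (',' B)*,  B <- '[one]' / '[two]'
// Assumption: whitespace skipping is enabled only via the start rule.
struct ToyParser {
  std::string_view in;
  std::size_t pos = 0;
  bool skip_ws = false; // set to true only when entering through Start

  // Skip spaces, tabs and newlines, but only if skipping is enabled.
  void ws() {
    if (!skip_ws) return;
    while (pos < in.size() &&
           (in[pos] == ' ' || in[pos] == '\t' || in[pos] == '\n'))
      ++pos;
  }

  // Match a literal token, then skip trailing whitespace.
  bool lit(std::string_view s) {
    if (in.substr(pos, s.size()) != s) return false;
    pos += s.size();
    ws();
    return true;
  }

  bool B() { return lit("[one]") || lit("[two]"); }

  bool A() {
    if (!B()) return false;
    for (;;) {
      std::size_t save = pos;
      if (lit(",") && B()) continue;
      pos = save; // backtrack a partial "',' B" match
      break;
    }
    return true;
  }
};

// Entering through Start: enable whitespace skipping, skip leading whitespace.
bool parse_from_start(std::string_view s) {
  ToyParser p{s};
  p.skip_ws = true;
  p.ws();
  return p.A() && p.pos == s.size();
}

// Entering directly at A: the start-rule setup never runs.
bool parse_from_A(std::string_view s) {
  ToyParser p{s};
  return p.A() && p.pos == s.size();
}
```

In this model, `parse_from_start` accepts both "[one],[two]" and " [one] , [two] ", while `parse_from_A` accepts only the input without whitespace, which mirrors the true/true/true/false output of the sample.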

yhirose added a commit that referenced this issue Sep 1, 2024
Owner

yhirose commented Sep 1, 2024

@micrenda I made a change in #306 that allows users to specify the start rule name in the parser constructor and in the load_grammar method. (Unfortunately, we cannot do the same in the parse method, for the reason I explained in the comment above. But I hope this pull request satisfies your needs.)

auto grammar = R"(
  Start       <- A
  A           <- B (',' B)*
  B           <- '[one]' / '[two]'
  %whitespace <- [ \t\n]*
)";

peg::parser parser(grammar, "A"); // Start Rule is "A"

  or

peg::parser parser;
parser.load_grammar(grammar, "A"); // Start Rule is "A"

parser.parse(" [one] , [two] "); // OK

Could you take a look at it when you have time? Thanks!

yhirose added a commit that referenced this issue Sep 2, 2024
yhirose added a commit that referenced this issue Sep 2, 2024
yhirose closed this as completed in 79eb37c Sep 3, 2024