Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser): add CustomPayloadResolver to parser services #1067

Conversation

Yokozuna59
Copy link
Contributor

@Yokozuna59
Copy link
Contributor Author

There is still a case hasn't been handled, the ability to preform consume on an empty capture group.

For example, the following regex:

const accessibilityTitleRegex = /accTitle[\t ]*:[\t ]*([^\n]*)/y;

Assuming accTitle: was passed, the payload would be undefined, then the if statement here would set the text to the image instead of consuming it.

https://github.com/langium/langium/blob/2d56cbb4c998a097f568d9ea36eda2baafe78b63/packages/langium/src/parser/langium-parser.ts#L195

Copy link
Member

@msujew msujew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I didn't resolve that issue where the payload might be an object rather than just a string since I'm not sure what the best approach is..

I think we should deal with this in a fashion that is more in line with the rest of the framework. I would propose to introduce an optional CustomTokenPatternResolver service, something like this:

interface TokenPayloadResolver {
  resolvePayload(payload: any): string
}

The service should be optional and can be used downstream to implement custom behavior for any kind of payload.

@msujew
Copy link
Member

msujew commented May 30, 2023

Assuming accTitle: was passed, the payload would be undefined, then the if statement here would set the text to the image instead of consuming it.

We can use the 'payload' in token check for that. This will always return whether the property was set, even if is undefined.

@Yokozuna59
Copy link
Contributor Author

Yokozuna59 commented May 30, 2023

Sounds great! I'm not really aware of the Langium structure, so I'm not sure I'd be able to help much with that. If you could point me out, or you could take the lead if you have time.

@msujew
Copy link
Member

msujew commented May 31, 2023

@Yokozuna59 I've pushed some changes to your branch.

@spoenemann Care to take a look at the changes?

@msujew msujew requested a review from spoenemann May 31, 2023 09:12
@msujew msujew added the parser Parser related issue label May 31, 2023
@Yokozuna59
Copy link
Contributor Author

@Yokozuna59 I've pushed some changes to your branch.

Awesome! But you haven't created a test case for object payload (although it's working fine; I've tested it). Should I add it, or is it an unnecessary test case?

@Yokozuna59
Copy link
Contributor Author

Yokozuna59 commented May 31, 2023

I have added LINE_BREAKS to prevent chevrotain from logging that the option is missing:

Warning: Warning: A Custom Token Pattern should specify the <line_breaks> option.
        The problem is in the <Payload> Token Type
        For details See: https://chevrotain.io/docs/guide/resolving_lexer_errors.html#CUSTOM_LINE_BREAK

@msujew
Copy link
Member

msujew commented May 31, 2023

But you haven't created a test case for object payload (although it's working fine; I've tested it). Should I add it, or is it an unnecessary test case?

Given that the payload resolver is implemented on user site (we only implement the infrastructure), I don't see the need to test it. The test only shows that the Parser -> PayloadResolver -> string control flow works as expected.

I have added LINE_BREAKS to prevent chevrotain from logging that the option is missing

Thanks!

@Yokozuna59
Copy link
Contributor Author

Yokozuna59 commented May 31, 2023

One last thing: shouldn't the name of the service be PayloadResolver instead of CustomPayloadResolver? By doing so, it'll be more in line with the other services since all of them should provide custom services, not the name itself.

@Yokozuna59 Yokozuna59 changed the title fix(parser): use payload instead of image if exists in consume fix(parser): add CustomPayloadResolver to parser services May 31, 2023
@Yokozuna59
Copy link
Contributor Author

Yokozuna59 commented May 31, 2023

@msujew The 'payload' in token isn't working as expected. It returns false even if payload was set to undefined.

It's working fine within the Object.assign statement, but after returning the value it's appear that it has been trimmed out.

@Yokozuna59
Copy link
Contributor Author

Yokozuna59 added a commit to Yokozuna59/mermaid-lint that referenced this pull request Jun 1, 2023
see eclipse-langium/langium#1067

Signed-off-by: Yokozuna59 <u.yokozuna@gmail.com>
Yokozuna59 added a commit to Yokozuna59/mermaid-lint that referenced this pull request Jun 1, 2023
see eclipse-langium/langium#1067

Signed-off-by: Yokozuna59 <u.yokozuna@gmail.com>
@msujew
Copy link
Member

msujew commented Jun 1, 2023

@Yokozuna59 I'm actually thinking about this again, and have a few reservations against continuing with this change: We already have a ValueConverter interface for the exact same use case. I think this issue can be easily solved with the value converter instead. What do you think?

@Yokozuna59
Copy link
Contributor Author

Yokozuna59 commented Jun 1, 2023

@Yokozuna59 I'm actually thinking about this again, and have a few reservations against continuing with this change: We already have a ValueConverter interface for the exact same use case. I think this issue can be easily solved with the value converter instead. What do you think?

I saw it about an 8 hours ago and thought the same thing. But I'm not sure what the best approach is yet; is there a best practice thing or something? I would go with it since the undefined payload isn't supported from the chevrotain side. But I'm not sure if you should ignore the payload feature since it was built on top of chevrotain.

@msujew
Copy link
Member

msujew commented Jun 2, 2023

While it's not completely unreasonable to support token.payload, having two separate features that kind of do the same thing is undesirable from a framework perspective, especially since the ValueConverter is way easier to use. In your case, you just want to extract the ID part from title ID, so it makes perfect sense to use the ValueConverter.

The only use case that I can imagine to support the payload resolver is that the payload contains information that doesn't already exist within the token image string. Though I wonder whether this is a reasonable request or just a hack around some very, very weird language. In that case, I would recommend against using Langium to parse it at all.

@Yokozuna59
Copy link
Contributor Author

Oh yes, chevrotain can return the matched tokens as the third argument. Anyway, since I don't need other matched tokens while parsing the current, I think I'll stick with ValueConvert as you suggested.

Thanks! I'll close the PR.

@Yokozuna59 Yokozuna59 closed this Jun 2, 2023
@Yokozuna59 Yokozuna59 deleted the use-payload-over-image-if-exists branch June 2, 2023 10:10
@Yokozuna59
Copy link
Contributor Author

I tried using ValueConverter, but it's only support regular typescript types:

https://github.com/langium/langium/blob/main/packages/langium/src/parser/value-converter.ts#L23

I can't return objects or undefined.

@msujew
Copy link
Member

msujew commented Jun 2, 2023

@Yokozuna59 I'm a bit confused what the use case for both of them is. When running the Langium CLI, the generated type for a property that is parsed from a terminal can only ever be string | number | boolean | bigint | Date, so why would want to return an object? You would use a parser rule to create objects, terminals - by convention - can only return primitives. As for undefined, what is your use case for that?

@msujew
Copy link
Member

msujew commented Jun 2, 2023

Note that if you really want to return undefined, you can just do undefined!.

@Yokozuna59
Copy link
Contributor Author

@Yokozuna59 I'm a bit confused what the use case for both of them is. When running the Langium CLI, the generated type for a property that is parsed from a terminal can only ever be string | number | boolean | bigint | Date, so why would want to return an object? You would use a parser rule to create objects, terminals - by convention - can only return primitives.

Because it can be returned using token.payload that's why I thought of using it. Anyway I don't have a usage for it now so don't bother about it.

As for undefined, what is your use case for that?

The empty capture. If the regex has captured an empty matched group, I want to set the value as undefined.

@msujew
Copy link
Member

msujew commented Jun 2, 2023

Because it can be returned using token.payload that's why I thought of using it. Anyway I don't have a usage for it now so don't bother about it.

I see. Reasonable thought, but that will lead to a mismatch in type of the generated AST property and the actual value within. The value converter ensures that these types stay in sync.

The empty capture. If the regex has captured an empty matched group, I want to set the value as undefined.

I see why you would do that, though I would recommend to return something like an empty string instead. There is a semantic difference between something that was parsed but it's empty, and something that wasn't parsed at all. Returning undefined in both cases might lead to confusion down the line. But if that's semantically equivalent, then returning undefined is reasonable. We should probably add that as a valid return type for the ValueConverter.

@Yokozuna59
Copy link
Contributor Author

I see. Reasonable thought, but that will lead to a mismatch in type of the generated AST property and the actual value within. The value converter ensures that these types stay in sync.

Then it's make sense to just ignore it.

I see why you would do that, though I would recommend to return something like an empty string instead. There is a semantic difference between something that was parsed but it's empty, and something that wasn't parsed at all. Returning undefined in both cases might lead to confusion down the line. But if that's semantically equivalent, then returning undefined is reasonable.

I could set the returned value to an empty string, and use || operator validate it has value somewhere else, but what about other types for other cases? for example, 0? I thing it would be better to return a undefined because it's effects other parts.

We should probably add that as a valid return type for the ValueConverter.

It would be great!

Thanks!

@Yokozuna59
Copy link
Contributor Author

I came across an weird case in ValueConverter, for example, the following test case:

https://github.com/Yokozuna59/mermaid-lint/blob/47f35cdf2a7c71abf4b13b39d5842a6663d61814/packages/mermaid-parser/tests/common/title.test.ts#L143-L154

In the matcher function, it matches both first and second title, then goes to ValueConverter with the first match ignoring the second, Which results in making chevrotain throwing an exception indicating that not all code was parsed.

@msujew
Copy link
Member

msujew commented Jun 2, 2023

@Yokozuna59 RegExp used in these matcher functions should be sticky (y flag) and only return a single match. Chevrotain (it's not Langium this time) isn't able to deal with multiple matches.

@Yokozuna59
Copy link
Contributor Author

Yokozuna59 commented Jun 2, 2023

@Yokozuna59 RegExp used in these matcher functions should be sticky (y flag) and only return a single match. Chevrotain (it's not Langium this time) isn't able to deal with multiple matches.

I tried it locally before saying that. If there is no y flag it matches the first title twice, but still the offset changes.

Whereas when there is y, it matches both and second one also, but still uses the first.

@Yokozuna59
Copy link
Contributor Author

the title regex:

const titleRegex = /title(?:[\t ]+([^\n\r]*))?/y;

The entered input:

pie title test
title sample

Enter matcher functions:

  • First match:
    image

  • Second match:
    image

Enter ValueConverter

I added console.log in ValueConverter to see how many times have been entered, only once:

image

Then stray to building AST. It's occurs when there is more than one value need to be converted.

@Yokozuna59
Copy link
Contributor Author

The only use case that I can imagine to support the payload resolver is that the payload contains information that doesn't already exist within the token image string. Though I wonder whether this is a reasonable request or just a hack around some very, very weird language. In that case, I would recommend against using Langium to parse it at all.

I think I found a use case (although not sure if it's the best approach), as discussed here mermaid-js/mermaid#4450 (comment), if the user want parse a rule and there is a another rule have similar pattern.

So from that example: TIMELINE_PERIOD_EVENT only occurs after TIMELINE_PERIOD_TITLE, so when a TIMELINE_PERIOD_TITLE is matched, I want to make sure that the previous rule isn't TIMELINE_PERIOD_TITLE.

flowchart LR
    title --> event
    title --> title2[another title]
    title2[another title] -->|tokenize as event| event
Loading

@msujew
Copy link
Member

msujew commented Jul 21, 2023

@Yokozuna59 See my answer here. I believe there's in error in your regex definition.

@Yokozuna59
Copy link
Contributor Author

@Yokozuna59 See my answer here. I believe there's in error in your regex definition.

@msujew replied here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Parser related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants