Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(feat) add semver.org example #300

Merged
merged 1 commit into from
Jun 20, 2022
Merged

Conversation

dselman
Copy link
Contributor

@dselman dselman commented Jun 18, 2022

Signed-off-by: Dan Selman danscode@selman.org

Adds an example grammar that describes a SemVer.org v2 semantic version.

Signed-off-by: Dan Selman <danscode@selman.org>
Copy link
Contributor

@hildjj hildjj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good now.

Minor nits. If you like, I can merge it as-is.

@@ -6,6 +6,9 @@ This file documents all notable changes to Peggy.
Unreleased
----------

- [#299](https://github.com/peggyjs/peggy/issues/299) Add example grammar for a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this up after merge. Apparently I didn't leave a template in the changelog after 2.0.1, because that was done i n such a rush.

= major:$numericIdentifier '.' minor:$numericIdentifier '.' patch:$numericIdentifier
{
return {
major: parseInt(major, 10),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor parseInt into numericIdentifier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving parseInt to numericIdentifier may be not so good, because that rule is also used in contexts where you do not want a number, but only a match result. I'm fine to leave parsing here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then an intermediate rule that does the parsing, I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, the $ should go in numericIdentifier.

= [0-9]

positiveDigit
= [1-9]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline at end, please

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please create an issue to set up husky and some kind of linter, so that nobody has to enforce code style by hand?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a linter for .peggy files yet, as far as I know. There should be one.

We lint all of the JS on each GHA run.

= digit* nonDigit identifierChar*

numericIdentifier
= '0' / (positiveDigit digit*)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this?

numericIdentifier
  = num:$(positiveDigit digit*) { return parseInt(num) }
  / '0' ![0-9] { return 0 }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't understand how this works right now. positiveDigit digit* should return ['1', ['2', '3']], and parseInt(['1', ['2', '3']], 10) = 1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest a bit different code, though:

numericIdentifier = num:$('0' / [1-9][0-9]*) { return parseInt(num); }

IMO you don't have to write a lot of code to make it more readable. "Positive digit" doesn't add any more information to [1-9].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to keep it closer to https://semver.org/spec/v2.0.0.html I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't understand how this works right now.

The magic in the $ operator that returns the text from the input instead of results of subexpressions.

"Positive digit" doesn't add any more information to [1-9].

Agreed, especially since this rule is used only once

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed $ up above.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I didn't find a place in the spec that would limit numbers by any limit, so I suspect parseInt doesn't formally comply to the spec :/

preRelease
= head:$preReleaseIdentifier tail:('.' @$preReleaseIdentifier)*
{
return [head, ...tail];
Copy link

@reverofevil reverofevil Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this. I automatically kept using return tail.unshift(head), tail;.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the docs now, look for "Parsing Lists".

= alphanumericIdentifier
/ digit+

alphanumericIdentifier
Copy link

@reverofevil reverofevil Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to spec, it should be identifierChar* nonDigit identifierChar*.

If you substitute everything into alphanumericIdentifier, it turns out the only reason why it exists is to avoid parse conflict in preReleaseIdentifier. If there is no nonDigit, it might match same things as numericIdentifier.

In PEG parsers there is no parse conflicts, it can be greatly simplified.

Copy link

@reverofevil reverofevil Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is all you need to comply to the spec:

semver = major:numericId "." minor:numericId "." patch:numericId prerelease:("-" @prerelease)? build:("+" @build)? {
	return {major, minor, patch, prerelease, build};
}
prerelease = head:prereleaseId tail:("." @prereleaseId)* { return [head, ...tail]; }
build = head:alnumId tail:("." @alnumId)* { return [head, ...tail]; }
prereleaseId = alnumId / $("0" / [1-9][0-9]*)
alnumId = $[0-9a-z-]i+

Copy link

@reverofevil reverofevil Jun 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually buildId is just alnumId. It will never hit the second branch. The only reason why grammar in spec distinguishes them, is to stress that numeric identifiers are sorted in numeric order. As this is a parser, and not comparator, it shouldn't really care.

Copy link
Contributor

@MarcelBolten MarcelBolten Jun 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is all you need to comply to the spec:

semver = major:numericId "." minor:numericId "." patch:numericId prerelease:("-" @prerelease)? build:("+" @build)? {
	return {major, minor, patch, prerelease, build};
}
prerelease = head:prereleaseId tail:("." @prereleaseId)* { return [head, ...tail]; }
build = head:alnumId tail:("." @alnumId)* { return [head, ...tail]; }
prereleaseId = alnumId / $("0" / [1-9][0-9]*)
alnumId = $[0-9a-z-]i+

rule numericId is missing, maybe like so:

semver = major:numericId "." minor:numericId "." patch:numericId prerelease:("-" @prerelease)? build:("+" @build)? {
	return {major, minor, patch, prerelease, build};
}
prerelease = head:prereleaseId tail:("." @prereleaseId)* { return [head, ...tail]; }
build = head:alnumId tail:("." @alnumId)* { return [head, ...tail]; }
prereleaseId = alnumId / numericId 
alnumId = $[0-9a-z-]i+
numericId = $("0" / [1-9][0-9]*)

However, while this will successfully parse all valid examples it will also successfully parse invalid examples (1.2.3-0123, 1.2.3-0123.0123) but "numeric identifiers MUST NOT include leading zeroes" in the pre-release part.

pre:('-' @preRelease)?
build:('+' @build)?
{
return { versionCore, pre, build };

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think that hiding most important information into a deeper object with non-descript name is a good idea. At least {...versionCore, pre, build}, please.

@hildjj
Copy link
Contributor

hildjj commented Jun 19, 2022

@dselman sorry this is getting so many nitpicks from all of us here. I think it's just because this seems so useful to have as an example!

@reverofevil
Copy link

@hildjj Say for yourself. I personally just like nitpicking (and very sorry for that).

@dselman
Copy link
Contributor Author

dselman commented Jun 20, 2022

@dselman sorry this is getting so many nitpicks from all of us here. I think it's just because this seems so useful to have as an example!

No problem, at all. I'm enjoying the discussion and learning! I suggest this is merged (functionally correct) and then another PR can be created to refactor it to be more concise and/or elegant. That takes me off the critical path. How does that sound?

@hildjj
Copy link
Contributor

hildjj commented Jun 20, 2022

another PR can be created to refactor it to be more concise and/or elegant

We can do that. I think the most efficient way would be for me to create a new branch here on the main repo, edit this PR to target that branch, we iterate there until we're done, then another PR to pull that branch into main. I'm starting that process now, with the branch name semver-grammar.

@hildjj hildjj changed the base branch from main to semver-grammar June 20, 2022 17:02
@hildjj hildjj merged commit 4b98f92 into peggyjs:semver-grammar Jun 20, 2022
@hildjj
Copy link
Contributor

hildjj commented Jun 20, 2022

I'm going to go through all of the comments here, expect a PR with my changes, which I'll link here.

hildjj added a commit to hildjj/peggy that referenced this pull request Jun 20, 2022
@dselman dselman deleted the example-semver branch June 22, 2022 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants