Thoughts on building on top of the regex library?
#1
Comments
Hey, thanks for bringing this up, I am totally up to collaborate! I also found
I tried to replace For |
Cool! I'm also totally up to collaborate if/where it makes sense! I'll say up front though that it's possible
You used it right! You're getting a bunch of "Invalid escape" and "Invalid character in character class" errors because Extending from the setup in your linked diff, you can change
const re = regex({
  flags: 'dg',
  subclass: true,
  unicodeSetsPlugin: null,
  disable: {
    x: true,
    n: true,
    v: true,
  },
})({ raw: [p] })
This adds
regex({
  flags: 'dg',
  subclass: true,
  plugins: [str => str.replace(/(\\\\)|\\(['"`#])/g, '$1$2')],
  unicodeSetsPlugin: null,
  disable: {
    x: true,
    n: true,
    v: true,
  },
})({ raw: [p] })
The plugin used above un-escapes the characters
Edit: I've built on the above code in a comment on the relevant Shiki PR. |
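To see what the string-rewriting plugin in the snippet above does in isolation, here is a minimal sketch that applies the same `replace` call to a few sample patterns. The helper name `unescapeQuotesAndHash` and the sample inputs are made up for illustration; only the regular expression and the replacement string are taken from the code above.

```javascript
// Hypothetical helper name; the replace call itself is the one used as a plugin above.
const unescapeQuotesAndHash = (str) =>
  str.replace(/(\\\\)|\\(['"`#])/g, '$1$2');

// String.raw keeps the backslashes exactly as written in the source.
const samples = [
  String.raw`\#include`,   // escaped "#" loses its backslash
  String.raw`\\#`,         // an escaped backslash followed by "#" is left unchanged
  String.raw`say \"hi\"`,  // escaped double quotes lose their backslashes
  String.raw`\d+\'`,       // other escapes such as \d are not touched
];

for (const s of samples) {
  console.log(s, '=>', unescapeQuotesAndHash(s));
}
```

The first alternative, `(\\\\)`, matches an escaped backslash and puts it back unchanged via `$1`, so a literal backslash that happens to precede a quote or `#` is never mistaken for an escape.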
Aside: Note that I would therefore advise against this. IMO it's much better to just not support such regexes than to make them dangerous. But using |
Yeah, I agree. Would |
That's a great idea, and yes. Supporting an option to return a string instead of a regex is totally doable. (Tentative option name I'm traveling at the moment but will try to add such a feature in the next few days. There are a few things I need to review as part of adding this to ensure it works well. Question: Do TextMate grammars ever do search and replace with backreferences in a user-provided replacement string? If not (and regexes are only used to search), this would be much easier. If yes, |
Do you aim to have a single exported API for It's a bit awkward to use
I am not very sure if they would have backreferences or not. But for Shiki it only needs Thanks a lot for all the information and the willingness to collaborate, even though my usage can be a bit tricky. Enjoy your travel! We can talk more about it after that. |
Good feedback. I've just published a new version of I'd recommend something like this:
import { rewrite } from 'regex'
str = rewrite(str, {
  flags: 'dg',
  unicodeSetsPlugin: null,
  disable: {
    n: true,
    v: true,
    x: true,
  },
}).expression
This leaves the pattern alone apart from transforming atomic groups, possessive quantifiers, subroutines, and subroutine definition groups. Differences from Oniguruma (all are edge cases):
If you wanted, you could disable subroutine and subroutine definition group processing via option |
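For readers wondering why atomic groups and possessive quantifiers need transforming at all: native JavaScript regexes have no syntax for them, but they can be emulated with a lookahead plus a backreference, because a lookahead is never re-entered by backtracking once it has matched. The sketch below shows that general technique with hand-written patterns. It is an illustration only, and not necessarily the exact output that `rewrite` produces.

```javascript
// Oniguruma: a(?>bc|b)c   (atomic group; not valid native JS syntax)
// Emulation: capture inside a lookahead, then consume the capture via \1.
const atomic = /a(?=(bc|b))\1c/;
// The same pattern with an ordinary, backtracking group, for comparison.
const plain = /a(?:bc|b)c/;

console.log(plain.test('abc'));   // true: the group backtracks from "bc" to "b"
console.log(atomic.test('abc'));  // false: the group commits to "bc", leaving no "c"
console.log(atomic.test('abcc')); // true: "bc" is consumed, then "c" matches

// Oniguruma: a++a   (possessive quantifier, equivalent to (?>a+)a)
const possessive = /(?=(a+))\1a/;

console.log(/a+a/.test('aaa'));      // true: the greedy a+ gives back one "a"
console.log(possessive.test('aaa')); // false: a++ never gives anything back
```

One caveat of doing this by hand is that the emulation adds a capturing group, so any numbered groups or backreferences that come after it in the pattern shift by one.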
Clear and concise description of the problem
Great project! Idea: Building on regex (which has some overlap; it's a library for extending native JS syntax and emitting native JS regexes) would give you free support for several complex features that are hard to reimplement robustly. Additionally, regex has an existing and easy to use plugin system that would allow extending it with additional Oniguruma features.
Suggested solution
Extended Oniguruma syntax and flags supported by regex:
x. On by default, but can be disabled via an option. The handling is more robust/correct than oniguruma-to-js's current handling since whitespace and comments still separate tokens (e.g. with \0 1), etc. See the collapsed details under "Show more details" here, but note that this is modeled on PCRE/Perl's flag xx and doesn't work the same as Oniguruma within character classes.
\g<name>. This is the same syntax as Oniguruma, but note that it's modeled on PCRE/Perl subroutines which work a little differently (better) than Oniguruma for edge cases related to things like backreferences to groups defined within the subroutine. But for probably 95% of use, it works the same.
Extended flags and syntax supported by regex but not by Oniguruma:
n ("named capture only" mode). On by default, but can be disabled via an option. This makes (...) noncapturing, and additionally makes numbered backreferences to named groups an error (which Oniguruma does even without flag n).
Existing oniguruma-to-js features not supported by regex:
v and emitting them as nested classes, they would correctly error when they're on a range boundary, and correctly be treated as a set if on a set subtraction/intersection boundary.
\h. Again trivial to add via a plugin, with support both inside and outside character classes, and correctly handling them as an error when on a character class range or set operation boundary.
regex also already supports this via interpolation of RegExp instances that have different flags than the outer regex.
If you think building on regex could be a good fit, it would also be possible to extend the regex library with new options that would make its features work more similarly to Oniguruma/Onigmo.
Alternative
No response
Additional context
No response
Validations