Filterx regexp_subst match group changes #409

Open
wants to merge 4 commits into
base: main

bshifter
Member

@bshifter bshifter commented Dec 5, 2024

continuation of #405

  • the substitution match group feature is enabled by default
  • the match group control flag argument groups is inverted and renamed to nogroups
  • match group indexes are now supported up to 999 instead of 9 (multiple digits)
  • the replacement pattern is checked at construction time, and the match group feature is disabled if the pattern does not need it

@bshifter bshifter force-pushed the fx-rgx-subst-matchgrps branch from 3826e2f to bb55648 Compare December 9, 2024 09:43
@bshifter bshifter mentioned this pull request Dec 9, 2024
@alltilla
Member

Really cool, thanks for this!

> the match group control flag argument groups is inverted and renamed to nogroups

I might have miscommunicated this, sorry if so.
Can we keep it as groups, but make it default to true? The fact that we set it to false when the replacement has no group reference is not relevant to the user: it is just a performance optimization and has no functional impact.
I think introducing negation into option names usually causes confusion.

…to allow up to 3 digits.

The parser now supports up to 3 digits for match group references.
If a reference has more digits than this limit, the parser stops after the
third digit and treats the first three digits as the reference. This prevents
excessive greediness when processing match group references and improves
performance by limiting the reference size.

Additionally, match group references can now include leading zeros, so users can
reference a match group with fewer than 3 digits even when it is directly
followed by literal digits. For example, \100000000 is parsed as match group 100,
whereas with a zero-padded prefix, as in \00100000000, the parser identifies
match group 1.

Signed-off-by: shifter <shifter@axoflow.com>
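
The 3-digit rule described in the commit message above can be sketched as a small tokenizer. This is only an illustration in Python, not the actual implementation from this PR: the names `parse_replacement` and `MAX_REF_DIGITS` are hypothetical, and the sketch assumes that any digits left over after the third one are kept as literal text.

```python
DIGITS = "0123456789"
MAX_REF_DIGITS = 3  # limit stated in the commit message


def parse_replacement(pattern):
    """Split a replacement pattern into literal strings and int group indexes.

    Hypothetical sketch: a backslash followed by 1 to 3 digits is a match
    group reference; digits beyond the third are assumed to stay literal.
    Other escape sequences are not modeled here.
    """
    tokens = []
    literal = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern) and pattern[i + 1] in DIGITS:
            # Consume at most MAX_REF_DIGITS digits after the backslash.
            start = i + 1
            end = start
            while (end < len(pattern)
                   and end - start < MAX_REF_DIGITS
                   and pattern[end] in DIGITS):
                end += 1
            if literal:
                tokens.append("".join(literal))
                literal = []
            # int() drops leading zeros, so "001" becomes group 1.
            tokens.append(int(pattern[start:end]))
            i = end
        else:
            literal.append(pattern[i])
            i += 1
    if literal:
        tokens.append("".join(literal))
    return tokens


print(parse_replacement("\\100000000"))    # group 100, then literal zeros
print(parse_replacement("\\00100000000"))  # group 1, then literal zeros
```

Capping the digit count keeps the scan bounded per reference, and the leading-zero form gives users a way to end a short reference before literal digits.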
@bshifter bshifter force-pushed the fx-rgx-subst-matchgrps branch 5 times, most recently from 2ed52df to 1a0ad06 Compare December 12, 2024 00:52
…present

The function now automatically checks the replacement pattern at config
parsing time. If no match group references are found in the pattern, it
behaves as if the 'groups' option were disabled, skipping match group
handling for performance reasons.

By default, match groups are enabled in regexp_subst. However, this behavior
can now be explicitly disabled by setting the optional named argument
groups=false, as shown:
`regexp_subst(target, match_pattern, replace_pattern, groups=false);`

Signed-off-by: shifter <shifter@axoflow.com>
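
The interaction between the user-facing groups argument and the parse-time check could look roughly like the following. This is a hedged Python sketch rather than the code in this PR: `groups_enabled` and `GROUP_REF` are hypothetical names, and the regular expression assumes that a reference is a backslash followed by one to three digits.

```python
import re

# Assumed reference syntax: backslash followed by 1 to 3 digits.
GROUP_REF = re.compile(r"\\[0-9]{1,3}")


def groups_enabled(replacement, groups=True):
    """Decide once, at construction time, whether match group handling is needed.

    groups mirrors the optional named argument (default true). Even when it
    is true, handling is switched off if the replacement has no references,
    which changes performance only, not the substitution result.
    """
    return groups and GROUP_REF.search(replacement) is not None


print(groups_enabled("static text"))          # no references in the pattern
print(groups_enabled(r"\1-suffix"))           # reference present, default groups
print(groups_enabled(r"\1", groups=False))    # explicitly disabled by the user
```

Because the check runs once when the configuration is parsed, the per-message substitution path does not need to rescan the replacement pattern.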
@bshifter bshifter force-pushed the fx-rgx-subst-matchgrps branch from 1a0ad06 to 231c51a Compare December 12, 2024 00:56