Basic regex functionality via the pcre2 library.
Requires the pcre2 library to be installed and CBQN compiled with FFI support.
Running make produces shared object files for character widths 8, 16 and 32 by default.
Use make UTF8 to build only for UTF8, or similarly make UTF16 and make UTF32.
To import in UTF mode use pcre2 ← {utf⇐8} •Import "pcre2.bqn", and similarly utf⇐16 or utf⇐32.
𝕩 - string containing a valid pcre2 regular expression.
𝕨 (optional) - options namespace {option⇐value, ...}.
Returns a namespace containing the compiled expression and the following functions:
Test: Returns 1 if 𝕩 matches, 0 otherwise.
Match: Returns a list containing the first match in text 𝕩 and any capture groups.
IMatch: Returns a list containing pairs of indices of the first match and capture groups.
MatchAll: Global match. Returns a list of all matches in 𝕩.
_Replace: Replace first match in 𝕩 according to replacement pattern 𝕗.
_ReplaceAll: Global replace. Replace all matches in 𝕩 according to pattern 𝕗.
Free: Free the compiled expression.
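A minimal sketch of how the returned namespace might be used. It assumes the compiling function is exposed as Compile (the name used in the options section), that the library was built for UTF8 and pcre2.bqn sits in the current directory, and that Free can be called with a dummy argument. The pattern, the text and the $1-style group references in the replacement (pcre2 substitution syntax) are illustrative only.

```bqn
# Assumption: pcre2.bqn and its UTF8 shared object are in the current directory
pcre2 ← {utf⇐8} •Import "pcre2.bqn"

# Compile once, then reuse the namespace for several operations
re ← pcre2.Compile "(\w+)@(\w+)"
text ← "alice@example bob@test"

•Show re.Test text                 # 1 if the pattern matches, 0 otherwise
•Show re.Match text                # first match and its capture groups
•Show re.IMatch text               # index pairs for the first match and groups
•Show re.MatchAll text             # every match in the text
•Show "$2:$1" re._Replace text     # replace the first match only
•Show "$2:$1" re._ReplaceAll text  # replace every match

re.Free @   # assumption: Free ignores its argument
```

Compiling once and calling several functions on the result avoids recompiling the pattern for each operation; the expression should be freed when it is no longer needed.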
𝕗 - string containing a valid pcre2 regular expression.
𝕩 - text to match against.
𝕨 (optional) - options namespace {option⇐value, ...}.
Compiles expression 𝕗 and calls MatchAll on 𝕩, then calls Free when finished.
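The one-shot form described above could be used as in the following sketch. It assumes this 1-modifier is the one the options section calls _MatchAll and that it is reachable as pcre2._MatchAll; the patterns and texts are made up for illustration.

```bqn
# Assumption: pcre2.bqn and its UTF8 shared object are in the current directory
pcre2 ← {utf⇐8} •Import "pcre2.bqn"

# Compile "\d+", collect all matches in the text, then free the expression
•Show "\d+" pcre2._MatchAll "a1 b22 c333"

# Options for this call only are passed as 𝕨
•Show {caseless⇐1} "abc" pcre2._MatchAll "ABC abc Abc"
```

This form is convenient for a pattern used once; for repeated use of the same pattern, compiling it once and calling MatchAll on the result avoids recompilation.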
𝕗 - string containing a valid pcre2 regular expression.
𝕘 - replacement pattern.
𝕩 - text to match against.
𝕨 (optional) - options namespace {option⇐value, ...}.
Compiles expression 𝕗 and calls _ReplaceAll with pattern 𝕘 on 𝕩, then calls Free when finished.
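A short sketch of the one-shot replacement form, assuming this 2-modifier is the one the options section calls _ReplaceAll_ and that it is reachable as pcre2._ReplaceAll_. The name Squeeze and the sample strings are hypothetical.

```bqn
# Assumption: pcre2.bqn and its UTF8 shared object are in the current directory
pcre2 ← {utf⇐8} •Import "pcre2.bqn"

# Bind pattern (𝕗) and replacement (𝕘) to get an ordinary function
Squeeze ← "\s+" pcre2._ReplaceAll_ " "
•Show Squeeze "a   b     c"

# One-off call with per-call options passed as 𝕨
•Show {caseless⇐1} ("hello" pcre2._ReplaceAll_ "bye") "Hello hello"
```

Note that, per the description above, the expression is compiled and freed on every call, so a bound function like Squeeze recompiles the pattern each time it is applied.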
Options are given as a namespace {option⇐value, ...}. They can be passed as 𝕨 to •Import to set defaults, or passed as 𝕨 to Compile, _MatchAll, and _ReplaceAll_.
utf - Must be set when calling •Import. Sets encoding width. Possible values: 8, 16, 32.
jit - Enable JIT compiling of regular expressions. Default: jit⇐1.
ucp - Use Unicode properties to determine character types for \w, \d, as well as character cases. Default: ucp⇐0.
multiline - Multiline matching mode. Default: multiline⇐1.
greedy - If 0, inverts greedy modifiers. Default: greedy⇐1.
anchored - Force pattern anchoring. Default: anchored⇐0.
caseless - Ignore cases when matching. Default: caseless⇐0.
extended - Ignore whitespace and comments in regular expressions. Default: extended⇐0.
substitute_extended - Extended replacement processing mode. Default: substitute_extended⇐0.
Other options:
bufsize - Initial size of the output buffer when doing replacement. The buffer is resized automatically if it is too small. Default: bufsize⇐0.
overlap - After each match, MatchAll advances the offset by only 1 character rather than to the end of the match. Default: overlap⇐0.
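The sketch below illustrates the two places options can be supplied. It assumes that a partial options namespace is merged with the defaults, that the functions are reachable as pcre2.Compile and pcre2._MatchAll, and that Free accepts a dummy argument; the patterns and texts are illustrative.

```bqn
# Defaults for all later calls are set at import time; utf is required here
pcre2 ← {utf⇐8, caseless⇐1} •Import "pcre2.bqn"

# Per-call override: with overlap⇐1, MatchAll steps one character after each match
•Show "aa" pcre2._MatchAll "aaaa"
•Show {overlap⇐1} "aa" pcre2._MatchAll "aaaa"

# Options can also be given to Compile as 𝕨
re ← {anchored⇐1} pcre2.Compile "\d+"
•Show re.Test "42 abc"
•Show re.Test "abc 42"
re.Free @   # assumption: Free ignores its argument
```

Setting common options at import keeps individual calls short, while per-call namespaces change behaviour for a single expression.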