A single-file regex library written in and for Zig.
Note: This library is still in development. For now I would recommend using another library if you are looking for something robust.
This library was mainly inspired by the rough implementation of Ken Thompson's algorithm outlined in this article by Russ Cox, though not everything is based on it.
Just download the regex.zig file and include it in your project however you want.
const std = @import("std");
const regex = @import("regex.zig");

pub fn main() !void
{
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    const rx = try regex.Regex.compile("ab*c", arena.allocator());
    defer rx.deinit();

    if (rx.match("abc"))
    {
        std.debug.print("Hooray!\n", .{});
    }
    else
    {
        std.debug.print("Uh oh...\n", .{});
    }
}
As in all regular expressions, adjacent non-operator characters are concatenated (so abc would match "abc"). Grouping is done with ordinary parentheses (i.e., ()).
The following regex operators are currently supported:
| - If R and S are regular expressions, then R|S matches R or S. (Note: This has the lowest precedence, so a|bc* is equivalent to a|(bc*), not (a|b)c*.)
* - If R is a regular expression, then R* matches 0 or more repetitions of R.
+ - Same as *, but 1 or more repetitions.
? - If R is a regular expression, then R? matches 0 or 1 appearances of R.
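The operator rules above can be tried out with a short program. This is only a sketch: it reuses the compile, match and deinit calls from the usage example, assumes regex.zig sits next to the source file, and assumes match tests the whole input string. The comments describe what each pattern is documented to mean; the printed results have not been checked against the library.

```zig
const std = @import("std");
const regex = @import("regex.zig");

pub fn main() !void
{
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // a|bc* groups as a|(bc*): either "a" alone, or 'b' followed by any number of 'c'.
    const alt = try regex.Regex.compile("a|bc*", arena.allocator());
    defer alt.deinit();
    std.debug.print("a|bc* on \"a\": {}\n", .{alt.match("a")});
    std.debug.print("a|bc* on \"bcc\": {}\n", .{alt.match("bcc")});

    // Parentheses change the grouping: 'a' or 'b', then any number of 'c'.
    const grouped = try regex.Regex.compile("(a|b)c*", arena.allocator());
    defer grouped.deinit();
    std.debug.print("(a|b)c* on \"acc\": {}\n", .{grouped.match("acc")});

    // b+ needs at least one 'b'; d? allows zero or one 'd'.
    const reps = try regex.Regex.compile("ab+cd?", arena.allocator());
    defer reps.deinit();
    std.debug.print("ab+cd? on \"abbc\": {}\n", .{reps.match("abbc")});
    std.debug.print("ab+cd? on \"ac\": {}\n", .{reps.match("ac")});
}
```

Comparing the output for a|bc* and (a|b)c* on the same inputs is a quick way to see the effect of precedence.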
The following character classes are currently supported:
. - represents any character.
[] - represents any of the characters inside these brackets (e.g., [abc] would mean 'a', 'b' or 'c').
- - use inside square brackets to denote a range of characters (e.g., [a-z] would mean any character from 'a' to 'z').
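A small sketch of the character classes in use, again built only from the calls shown in the usage example and assuming regex.zig is in the same directory. The comments state what each pattern is documented to mean rather than verified output.

```zig
const std = @import("std");
const regex = @import("regex.zig");

pub fn main() !void
{
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // '.' stands for any single character between 'a' and 'c'.
    const dot = try regex.Regex.compile("a.c", arena.allocator());
    defer dot.deinit();
    std.debug.print("a.c on \"abc\": {}\n", .{dot.match("abc")});
    std.debug.print("a.c on \"axc\": {}\n", .{dot.match("axc")});

    // [abc] accepts exactly one of 'a', 'b' or 'c', followed here by 'x'.
    const set = try regex.Regex.compile("[abc]x", arena.allocator());
    defer set.deinit();
    std.debug.print("[abc]x on \"bx\": {}\n", .{set.match("bx")});
    std.debug.print("[abc]x on \"dx\": {}\n", .{set.match("dx")});

    // [a-z]+ is one or more characters in the range 'a' to 'z'.
    const range = try regex.Regex.compile("[a-z]+", arena.allocator());
    defer range.deinit();
    std.debug.print("[a-z]+ on \"hello\": {}\n", .{range.match("hello")});
}
```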
/ is an escape character. It can be used to escape any character that would normally represent an operator or character class* (e.g., /* would match "*"), and it can also represent the following control codes:
/n - recognises the ASCII newline character.
/r - recognises the ASCII carriage return character.
/t - recognises the ASCII tab character.
*Note: What requires escaping depends on whether you're inside [] or not. For example, * does not need escaping when inside [], but - does.
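The escaping rules are easy to mix up with Zig's own string escapes, so here is a hedged sketch. It assumes the same compile, match and deinit calls as the usage example and that regex.zig is in the same directory. Note that the pattern uses a forward slash (/n), while the Zig input string uses a backslash (\n). The comments give the documented meaning of each pattern, not verified output.

```zig
const std = @import("std");
const regex = @import("regex.zig");

pub fn main() !void
{
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // Outside [], /* is a literal '*' rather than the repetition operator.
    const star = try regex.Regex.compile("a/*b", arena.allocator());
    defer star.deinit();
    std.debug.print("a/*b on \"a*b\": {}\n", .{star.match("a*b")});

    // /n in the pattern stands for the newline character.
    const newline = try regex.Regex.compile("a/nb", arena.allocator());
    defer newline.deinit();
    std.debug.print("a/nb on 'a', newline, 'b': {}\n", .{newline.match("a\nb")});

    // Inside [], '*' is taken literally but '-' must be escaped as /-.
    const set = try regex.Regex.compile("[*/-]", arena.allocator());
    defer set.deinit();
    std.debug.print("[*/-] on \"*\": {}\n", .{set.match("*")});
    std.debug.print("[*/-] on \"-\": {}\n", .{set.match("-")});
}
```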
Planned features (not yet supported):
- Substring matching
- Subexpressions (probably with {}).
- UTF8 support
- Any other operators I can think of or remember that are commonly used or are useful.