Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to use RE2 (defaults on native regex engine) #1684

Closed
wants to merge 15 commits into from

Conversation

efebarlas
Copy link
Contributor

@efebarlas efebarlas commented Jul 9, 2021

Signed-off-by: Efe Barlas ebarlas@purdue.edu

What issue does this pull request resolve?
#1683
What changes did you make?
-I added an option on using RE2 or the native regex engine. This option defaults on the native regex engine.
-I added a dependency to node-re2. The unpacked size of node-re2 is 1.26 MB, but I do not know the minified size.
-I changed the 'usePattern' function to read the option, and to try and compile the regex in RE2. If RE2 throws an error, the system produces a log message, and falls back to the native regex engine.
Is there anything that requires more attention while reviewing?
-I haven't added tests which demonstrates the correctness of regex validation when RE2 is used. I'm waiting for a green light from the maintainers of ajv to do so.
Answer to this comment:
-When RE2 is enabled, some test cases in json-schema-test fail, because RE2 and ECMA 262, section 15.10.1 differ in their definition of whitespace (JS includes \v, the other does not).

efebarlas added 3 commits July 9, 2021 11:28
Signed-off-by: Efe Barlas <ebarlas@purdue.edu>
Signed-off-by: Efe Barlas <ebarlas@purdue.edu>
Signed-off-by: Efe Barlas <ebarlas@purdue.edu>
lib/core.ts Outdated Show resolved Hide resolved
lib/vocabularies/code.ts Outdated Show resolved Hide resolved
lib/vocabularies/code.ts Outdated Show resolved Hide resolved
@davisjam
Copy link

davisjam commented Jul 9, 2021

Questions and notes:

  • If RE2 is used as the default engine, does the test suite pass?
  • If merged, updates should be made to the documentation on the AJV website that describes ReDoS concerns, to document this option.

Signed-off-by: Efe Barlas <ebarlas@purdue.edu>
@epoberezkin
Copy link
Member

epoberezkin commented Jul 15, 2021

I am not terribly excited about the idea of making node-re2 the dependency... I understand the problem though.

A better approach could be to make an option regExp to pass a function that given a string returns an object that has the same methods as RegExp object; this option would default to (s) => opts.unicodeRegExp ? new RegExp(s, "u") : new RegExp(s). It would also simplify the code in usePattern a bit - opts.unicodeRegExp would only be checked only if regExp option is not passed and only at instance construction.

Some other comments in code

const {useRE2} = opts
if (u === "u" && useRE2) {
try {
const engine = new RE2(pattern)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably incorrect to call it "engine", RE2 is an engine, this one is probably just "regExp" or something...

})
} catch (e) {
self.logger.log(
"Warning: One of the regexes in the schema is not supported by RE2. Falling back to the native regex engine"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fallback would be just handled in the passed function

@epoberezkin
Copy link
Member

@efebarlas thanks - getting there! Possibly, having re2.ts file might create the expectation that re2 doesn't have to be imported separately... I guess it can be all covered in the docs.

docs/options.md Outdated Show resolved Hide resolved
Copy link
Member

@epoberezkin epoberezkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - looks good - I can finalise

lib/core.ts Show resolved Hide resolved
epoberezkin added a commit that referenced this pull request Nov 13, 2021
commit 1835f3517ffb750ea4c75ce3ee8d9c262374e8f4
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sat Nov 13 18:04:08 2021 +0000

    simplify regExp option

commit e7f1eb9
Merge: 98f04d3 f68ef8f
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sat Nov 13 17:20:15 2021 +0000

    Merge branch 'master' into master

commit 98f04d3
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sat Nov 13 17:20:04 2021 +0000

    Update docs/options.md

commit 0ff99ed
Merge: d9ea90c 8fccddb
Author: Efe Barlas <43009963+efebarlas@users.noreply.github.com>
Date:   Wed Nov 10 00:15:33 2021 -0500

    Merge branch 'master' into master

commit d9ea90c
Author: efebarlas <ebarlas@purdue.edu>
Date:   Wed Nov 10 00:09:17 2021 -0500

    prettier:write to pass CI

    Signed-off-by: efebarlas <ebarlas@purdue.edu>

commit b29cd91
Merge: f50eb43 20089ed
Author: efebarlas <ebarlas@purdue.edu>
Date:   Tue Nov 9 21:54:45 2021 -0500

    Merge branch 'master' of github.com:efebarlas/ajv
    Tests added for code.regExp option

commit f50eb43
Author: efebarlas <ebarlas@purdue.edu>
Date:   Tue Nov 9 21:54:28 2021 -0500

    Tests added

    Signed-off-by: efebarlas <ebarlas@purdue.edu>

commit 20089ed
Author: Efe Barlas <43009963+efebarlas@users.noreply.github.com>
Date:   Tue Nov 9 21:53:34 2021 -0500

    Update options.md

commit fd3e290
Merge: 41dd4bc 6ef0c66
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sun Sep 12 19:07:28 2021 +0100

    Merge branch 'master' into master

commit 41dd4bc
Merge: 698f411 a9f38cd
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sun Sep 12 11:35:20 2021 +0100

    Merge branch 'master' into master

commit 698f411
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Thu Aug 12 14:55:17 2021 -0400

    dev-dependency to node-re2 added

commit a0720f8
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Thu Aug 12 14:43:39 2021 -0400

    re2 runtime lib + regExp code option added

commit 1470c23
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 14:14:45 2021 -0400

    variable name changes

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

commit 8f7ca34
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 13:22:38 2021 -0400

    minor changes

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

commit 9791cce
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 13:20:47 2021 -0400

    remove comments

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

commit b07542d
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 11:28:29 2021 -0400

    added: RE2 Option with fallback

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>
epoberezkin added a commit that referenced this pull request Nov 13, 2021
commit 1835f3517ffb750ea4c75ce3ee8d9c262374e8f4
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sat Nov 13 18:04:08 2021 +0000

    simplify regExp option

commit e7f1eb9
Merge: 98f04d3 f68ef8f
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sat Nov 13 17:20:15 2021 +0000

    Merge branch 'master' into master

commit 98f04d3
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sat Nov 13 17:20:04 2021 +0000

    Update docs/options.md

commit 0ff99ed
Merge: d9ea90c 8fccddb
Author: Efe Barlas <43009963+efebarlas@users.noreply.github.com>
Date:   Wed Nov 10 00:15:33 2021 -0500

    Merge branch 'master' into master

commit d9ea90c
Author: efebarlas <ebarlas@purdue.edu>
Date:   Wed Nov 10 00:09:17 2021 -0500

    prettier:write to pass CI

    Signed-off-by: efebarlas <ebarlas@purdue.edu>

commit b29cd91
Merge: f50eb43 20089ed
Author: efebarlas <ebarlas@purdue.edu>
Date:   Tue Nov 9 21:54:45 2021 -0500

    Merge branch 'master' of github.com:efebarlas/ajv
    Tests added for code.regExp option

commit f50eb43
Author: efebarlas <ebarlas@purdue.edu>
Date:   Tue Nov 9 21:54:28 2021 -0500

    Tests added

    Signed-off-by: efebarlas <ebarlas@purdue.edu>

commit 20089ed
Author: Efe Barlas <43009963+efebarlas@users.noreply.github.com>
Date:   Tue Nov 9 21:53:34 2021 -0500

    Update options.md

commit fd3e290
Merge: 41dd4bc 6ef0c66
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sun Sep 12 19:07:28 2021 +0100

    Merge branch 'master' into master

commit 41dd4bc
Merge: 698f411 a9f38cd
Author: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
Date:   Sun Sep 12 11:35:20 2021 +0100

    Merge branch 'master' into master

commit 698f411
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Thu Aug 12 14:55:17 2021 -0400

    dev-dependency to node-re2 added

commit a0720f8
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Thu Aug 12 14:43:39 2021 -0400

    re2 runtime lib + regExp code option added

commit 1470c23
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 14:14:45 2021 -0400

    variable name changes

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

commit 8f7ca34
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 13:22:38 2021 -0400

    minor changes

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

commit 9791cce
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 13:20:47 2021 -0400

    remove comments

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

commit b07542d
Author: Efe Barlas <ebarlas@purdue.edu>
Date:   Fri Jul 9 11:28:29 2021 -0400

    added: RE2 Option with fallback

    Signed-off-by: Efe Barlas <ebarlas@purdue.edu>

Co-authored-by: Evgeny Poberezkin <2769109+epoberezkin@users.noreply.github.com>
@epoberezkin
Copy link
Member

merged

@marcospassos
Copy link

marcospassos commented Apr 6, 2022

@epoberezkin and @efebarlas, passing a custom engine will also affect the validation of format: 'regex', right? I couldn't find any information about this in the docs.

@efebarlas
Copy link
Contributor Author

efebarlas commented Apr 7, 2022

After some experimenting, I don't think the validation of format: 'regex' is affected by changing the engine, because it seems that regexes with features RE2 doesn't support, such as (?=.@.), pass validation when RE2 is used. I think with formats, we just use the default regex engine.

I might work on a patch for this at some point, but I'm unfortunately busy at the moment.

@epoberezkin
Copy link
Member

Thank you!

@reed-lawrence
Copy link

What is the RegExtEngine.code field intended to be used with? The following code is currently throwing an error:

import Ajv from 'ajv';
import RE2 from 're2';

const ajv = new Ajv({ code: { regExp: RE2 } });

Property 'code' is missing in type 'RE2Constructor' but required in type 'RegExpEngine'.

@davisjam
Copy link

@reed-lawrence , can you clarify whether you are asking "What is the purpose of this feature?" or something else?

@reed-lawrence
Copy link

reed-lawrence commented Sep 27, 2023

@davisjam Thanks for the reply.

I guess I am wondering if the declared interface is correct or how to reconcile that the code sample in the documentation for ReDoS attack mitigation does not build. What would be the proper integration of RE2 given the error posted in my prior comment?

Is the .code field required or optional? Given that the re2 library does not implement it. If the .code field is required, what is the proper implementation of an external regex engine such as re2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants