
RLike: Fall back to CPU for regex that would produce incorrect results #4044

Merged (15 commits) on Nov 9, 2021
Changes from all commits
78 changes: 21 additions & 57 deletions docs/compatibility.md
@@ -257,14 +257,32 @@ The plugin supports reading `uncompressed`, `snappy` and `gzip` Parquet files and will not
fall back to the CPU when reading an unsupported compression format, and will error out in that
case.

## Regular Expressions
The RAPIDS Accelerator for Apache Spark currently supports string literal matches, not wildcard
matches.
## LIKE

If a null char `'\0'` is in a string that is being matched by a regular expression, `LIKE` sees it as
the end of the string. This will be fixed in a future release. The issue is tracked
[here](https://github.com/NVIDIA/spark-rapids/issues/119).

## Regular Expressions

### regexp_replace

For the `regexp_replace` function, the RAPIDS Accelerator for Apache Spark currently supports
string literal matches only, not wildcard matches, and will fall back to the CPU if a regular
expression pattern is provided.

### RLike

The GPU implementation of `RLike` has the following known issues where behavior is not consistent with Apache Spark,
so this expression is disabled by default. It can be enabled by setting `spark.rapids.sql.expression.RLike=true`.

- `.` matches `\r` on the GPU but not on the CPU ([cuDF issue #9619](https://github.com/rapidsai/cudf/issues/9619))
- `$` does not match the end of string if the string ends with a line-terminator
([cuDF issue #9620](https://github.com/rapidsai/cudf/issues/9620))

`RLike` will fall back to CPU if any regular expressions are detected that are not supported on the GPU
or would produce different results on the GPU.
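
Since the GPU implementation is opt-in, it must be enabled explicitly. A minimal `spark-defaults.conf` fragment is shown below purely as an illustration; the same property can equally be passed with `--conf` on `spark-submit`:

```properties
# Opt in to the GPU implementation of RLike.
# Patterns that are unsupported on the GPU, or that would produce
# different results there, still fall back to the CPU automatically.
spark.rapids.sql.expression.RLike=true
```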

## Timestamps

Spark stores timestamps internally relative to the JVM time zone. Converting an arbitrary timestamp
@@ -569,60 +587,6 @@ distribution. Because the results are not bit-for-bit identical with the Apache
`approximate_percentile`, this feature is disabled by default and can be enabled by setting
`spark.rapids.sql.expression.ApproximatePercentile=true`.

## RLike

The GPU implementation of RLike has a number of known issues where behavior is not consistent with Apache Spark,
so this expression is disabled by default. It can be enabled by setting `spark.rapids.sql.expression.RLike=true`.

A summary of known issues is shown below but this is not intended to be a comprehensive list. We recommend that you
do your own testing to verify whether the GPU implementation of `RLike` is suitable for your use case.

We plan on improving the RLike functionality over time to make it more compatible with Spark, so this feature should
be used at your own risk with the expectation that the behavior will change in future releases.

### Multi-line handling

The GPU implementation of RLike supports `^` and `$` to represent the start and end of lines within a string but
Spark uses `^` and `$` to refer to the start and end of the entire string (equivalent to `\A` and `\Z`).

| Pattern | Input | Spark on CPU | Spark on GPU |
|---------|--------|--------------|--------------|
| `^A` | `A\nB` | Match | Match |
| `A$` | `A\nB` | No Match | Match |
| `^B` | `A\nB` | No Match | Match |
| `B$` | `A\nB` | Match | Match |

As a workaround, `\A` and `\Z` can be used instead of `^` and `$`.
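
The table above can be reproduced outside of Spark with Python's `re` module, purely as an illustration: its default mode mirrors the Spark-on-CPU behavior (anchors bind to the whole string) and its `re.MULTILINE` flag mirrors the GPU behavior (anchors bind to each line). Spark itself uses Java's regex engine, so this is a stand-in sketch, not the actual code path.

```python
import re

s = "A\nB"
for pattern in ["^A", "A$", "^B", "B$"]:
    cpu_like = bool(re.search(pattern, s))                # whole-string anchors, like Spark on CPU
    gpu_like = bool(re.search(pattern, s, re.MULTILINE))  # per-line anchors, like the GPU
    print(f"{pattern}: cpu={cpu_like} gpu={gpu_like}")
# A$ and ^B are the rows where the two modes disagree, matching the table.
```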

### Null support

The GPU implementation of RLike supports null characters in the input but does not support null characters in
the regular expression and will fall back to the CPU in this case.

### Quantifiers with nothing to repeat

Spark supports quantifiers in cases where there is nothing to repeat. For example, Spark supports the
possessive pattern `a*+`, which will match all inputs. The GPU implementation of RLike does not support this
syntax and will throw an exception with the message `nothing to repeat at position 0`.

### Stricter escaping requirements

The GPU implementation of RLike has stricter requirements around escaping special characters in some cases.

| Pattern | Input | Spark on CPU | Spark on GPU |
|-----------|--------|--------------|--------------|
| `a[-+]` | `a-` | Match | No Match |
| `a[\-\+]` | `a-` | Match | Match |
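
Python's `re` module agrees with Spark on the CPU for these two cases (again a stand-in illustration; Spark uses Java regex), and shows that the explicitly escaped form is accepted either way, making it the portable spelling:

```python
import re

# The CPU path accepts an unescaped '-' at the edge of a character class:
assert re.search(r"a[-+]", "a-")
# Both paths accept the explicitly escaped form, so prefer it for GPU compatibility:
assert re.search(r"a[\-\+]", "a-")
```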

### Empty groups

The GPU implementation of RLike does not support empty groups correctly.

| Pattern | Input | Spark on CPU | Spark on GPU |
|-----------|--------|--------------|--------------|
| `z()?` | `a` | No Match | Match |
| `z()*` | `a` | No Match | Match |
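
Python's `re` module shows the CPU-side expectation here (only a stand-in for Java's engine): the literal `z` is still required, so the input `a` should not match either pattern.

```python
import re

# A trailing empty group does not make the required literal 'z' optional:
assert re.search(r"z()?", "a") is None
assert re.search(r"z()*", "a") is None
```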

## Conditionals and operations with side effects (ANSI mode)

In Apache Spark condition operations like `if`, `coalesce`, and `case/when` lazily evaluate
24 changes: 17 additions & 7 deletions integration_tests/src/main/python/string_test.py
@@ -492,22 +492,30 @@ def test_rlike_embedded_null():
        conf={'spark.rapids.sql.expression.RLike': 'true'})

@allow_non_gpu('ProjectExec', 'RLike')
def test_rlike_fallback_null_pattern():
    gen = mk_str_gen('[abcd]{1,3}')
    assert_gpu_fallback_collect(
        lambda spark: unary_op_df(spark, gen).selectExpr(
            'a rlike "a\u0000"'),
        'RLike',
        conf={'spark.rapids.sql.expression.RLike': 'true'})

@allow_non_gpu('ProjectExec', 'RLike')
def test_rlike_fallback_empty_group():
    gen = mk_str_gen('[abcd]{1,3}')
    assert_gpu_fallback_collect(
        lambda spark: unary_op_df(spark, gen).selectExpr(
            'a rlike "a()?"'),
        'RLike',
        conf={'spark.rapids.sql.expression.RLike': 'true'})

def test_rlike_escape():
    gen = mk_str_gen('[ab]{0,2}[\\-\\+]{0,2}')
    assert_gpu_and_cpu_are_equal_collect(
        lambda spark: unary_op_df(spark, gen).selectExpr(
            'a rlike "a[\\\\-]"'),
        conf={'spark.rapids.sql.expression.RLike': 'true'})

@pytest.mark.xfail(reason='cuDF supports multiline by default but Spark does not - https://github.com/rapidsai/cudf/issues/9439')
def test_rlike_multi_line():
    gen = mk_str_gen('[abc]\n[def]')
    assert_gpu_and_cpu_are_equal_collect(
@@ -518,18 +526,20 @@ def test_rlike_multi_line():
            'a rlike "e$"'),
        conf={'spark.rapids.sql.expression.RLike': 'true'})

@pytest.mark.xfail(reason='cuDF has stricter requirements around escaping - https://github.com/rapidsai/cudf/issues/9434')
def test_rlike_missing_escape():
    gen = mk_str_gen('a[\\-\\+]')
    assert_gpu_and_cpu_are_equal_collect(
        lambda spark: unary_op_df(spark, gen).selectExpr(
            'a rlike "a[-]"',
            'a rlike "a[+-]"',
            'a rlike "a[a-b-]"'),
        conf={'spark.rapids.sql.expression.RLike': 'true'})

@allow_non_gpu('ProjectExec', 'RLike')
def test_rlike_fallback_possessive_quantifier():
    gen = mk_str_gen('(\u20ac|\\w){0,3}a[|b*.$\r\n]{0,2}c\\w{0,3}')
    assert_gpu_fallback_collect(
        lambda spark: unary_op_df(spark, gen).selectExpr(
            'a rlike "a*+"'),
        'RLike',
        conf={'spark.rapids.sql.expression.RLike': 'true'})