Commit

Improve performance for replace-multi for long strings (#12858)
Adds a more efficient algorithm for the multi-string version of `cudf::strings::replace` when the strings are long (more than 256 bytes per row on average).

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Yunsong Wang (https://github.com/PointKernel)
  - Bradley Dice (https://github.com/bdice)

URL: #12858
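
For readers unfamiliar with the multi-string replace, the following is a minimal host-side sketch of the behavior the description refers to. It is not the cudf GPU implementation and does not reproduce the new algorithm. The names `replace_multi` and `is_long_string_column` are hypothetical, the sketch assumes one replacement per target, and it assumes the 256-byte figure is applied as a plain average over all rows.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side reference for a multi-target replace.
// At each byte position the targets are tried in order; the first target
// that matches is replaced by the replacement at the same index.
// Assumes targets.size() == repls.size().
std::string replace_multi(std::string const& input,
                          std::vector<std::string> const& targets,
                          std::vector<std::string> const& repls)
{
  std::string out;
  std::size_t pos = 0;
  while (pos < input.size()) {
    bool matched = false;
    for (std::size_t t = 0; t < targets.size(); ++t) {
      auto const& tgt = targets[t];
      // Empty targets are skipped so the scan always makes progress.
      if (!tgt.empty() && input.compare(pos, tgt.size(), tgt) == 0) {
        out += repls[t];
        pos += tgt.size();
        matched = true;
        break;
      }
    }
    if (!matched) { out += input[pos++]; }  // no target here: copy one byte
  }
  return out;
}

// Hypothetical check for "longer strings": true when the average row size
// exceeds the threshold (256 bytes in the commit description).
bool is_long_string_column(std::vector<std::string> const& rows,
                           std::size_t threshold = 256)
{
  if (rows.empty()) { return false; }
  std::size_t total = 0;
  for (auto const& r : rows) { total += r.size(); }
  // Compare totals to avoid losing precision to integer division.
  return total > threshold * rows.size();
}
```

For example, `replace_multi("the quick brown fox", {"quick", "fox"}, {"slow", "dog"})` yields `"the slow brown dog"`. A caller could use a check like `is_long_string_column` to choose between two code paths, though how the library actually makes that choice is not shown in this commit view.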
davidwendt authored Mar 17, 2023
1 parent 3540613 commit 8881cb6
Showing 5 changed files with 636 additions and 114 deletions.
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -547,6 +547,7 @@ add_library(
src/strings/regex/regex_program.cpp
src/strings/repeat_strings.cu
src/strings/replace/backref_re.cu
+src/strings/replace/multi.cu
src/strings/replace/multi_re.cu
src/strings/replace/replace.cu
src/strings/replace/replace_re.cu
4 changes: 2 additions & 2 deletions cpp/benchmarks/string/replace.cpp
@@ -1,5 +1,5 @@
/*
-* Copyright (c) 2021-2022, NVIDIA CORPORATION.
+* Copyright (c) 2021-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -69,7 +69,7 @@ static void generate_bench_args(benchmark::internal::Benchmark* b)
int const row_mult = 8;
int const min_rowlen = 1 << 5;
int const max_rowlen = 1 << 13;
-int const len_mult = 4;
+int const len_mult = 2;
generate_string_bench_args(b, min_rows, max_rows, row_mult, min_rowlen, max_rowlen, len_mult);
}
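
To see what the `len_mult` change does to the benchmark sweep, here is a small sketch. It assumes `generate_string_bench_args` steps the row length geometrically from `min_rowlen` to `max_rowlen`, multiplying by `len_mult` each time; that helper's body is not shown in this diff, so `row_lengths` below is a hypothetical stand-in.

```cpp
#include <vector>

// Hypothetical stand-in for the row-length loop: returns every length
// visited when starting at min_len and multiplying by mult while the
// value does not exceed max_len. Requires min_len > 0 and mult > 1.
std::vector<int> row_lengths(int min_len, int max_len, int mult)
{
  std::vector<int> lengths;
  for (int len = min_len; len <= max_len; len *= mult) {
    lengths.push_back(len);
  }
  return lengths;
}
```

Under that assumption, `row_lengths(1 << 5, 1 << 13, 4)` visits 32, 128, 512, 2048, 8192, while `row_lengths(1 << 5, 1 << 13, 2)` visits 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192. The smaller multiplier nearly doubles the number of row lengths measured per row count and adds intermediate sizes such as 256 and 1024.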
