Optimize JSON writer #13144

Merged
merged 13 commits into from
Apr 24, 2023

Conversation

karthikeyann
Contributor

Description

Reduce benchmark runtime by skipping unnecessary combinations.
Optimize the List and Struct conversion functions.

  • Create an array of string views, use make_strings_column to build the joined strings, then update the offsets to produce the concatenated row strings.
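
The approach in the bullet above can be illustrated with a minimal host-only sketch. This is not the libcudf implementation (which runs on the GPU and calls make_strings_column); it only shows the idea of gathering string views into one flat array, materializing them as a single chars-plus-offsets buffer, and then keeping every k-th offset so that each row becomes one concatenated string. The names join_rows and joined_rows are hypothetical; row_prefix, row_suffix and value_separator are borrowed from the snippet reviewed later in this PR. The sketch assumes every row has the same number of already-formatted values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Flat character buffer plus row offsets, mirroring the chars/offsets layout
// of a strings column. Row i occupies [offsets[i], offsets[i + 1]).
struct joined_rows {
  std::string chars;
  std::vector<std::size_t> offsets;
};

// Hypothetical helper (not a libcudf API): joins pre-formatted values per row.
joined_rows join_rows(std::vector<std::vector<std::string>> const& rows,
                      std::string_view row_prefix,
                      std::string_view row_suffix,
                      std::string_view value_separator)
{
  std::size_t const num_rows = rows.size();
  std::size_t const num_cols = rows.empty() ? 0 : rows.front().size();
  // prefix + values + separators between values + suffix
  std::size_t const pieces_per_row = (num_cols == 0) ? 2 : 2 * num_cols + 1;

  // Step 1: one flat array of string views; no characters are copied yet.
  std::vector<std::string_view> pieces;
  pieces.reserve(num_rows * pieces_per_row);
  for (auto const& row : rows) {
    assert(row.size() == num_cols);  // assumption: rectangular input
    pieces.push_back(row_prefix);
    for (std::size_t c = 0; c < num_cols; ++c) {
      if (c > 0) { pieces.push_back(value_separator); }
      pieces.push_back(row[c]);
    }
    pieces.push_back(row_suffix);
  }

  // Step 2: materialize all views into one buffer with per-piece offsets
  // (the role make_strings_column plays in the PR description).
  std::string chars;
  std::vector<std::size_t> piece_offsets{0};
  piece_offsets.reserve(pieces.size() + 1);
  for (auto const piece : pieces) {
    chars.append(piece);
    piece_offsets.push_back(chars.size());
  }

  // Step 3: keep every pieces_per_row-th offset, so each row's pieces
  // collapse into a single concatenated string without another copy.
  std::vector<std::size_t> row_offsets;
  row_offsets.reserve(num_rows + 1);
  for (std::size_t i = 0; i <= num_rows; ++i) {
    row_offsets.push_back(piece_offsets[i * pieces_per_row]);
  }
  return {std::move(chars), std::move(row_offsets)};
}
```

For example, calling join_rows on two rows of two values each with "{", "}\n" and "," yields one buffer holding both JSON lines and three row offsets. The point of the design is that the character data is written once and only the offsets are rewritten to form rows.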

Update default rows_per_chunk in Cython.

Runtime for writing a 512 MB dataframe reduced from 84 s to 2.7 s.
JSON_WRITER benchmark runtime reduced from ~15 min to ~5 min.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@karthikeyann karthikeyann added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. 4 - Needs Review Waiting for reviewer to review or respond 4 - Needs cuIO Reviewer Performance Performance related issue strings strings issues (C++ and Python) non-breaking Non-breaking change labels Apr 16, 2023
@karthikeyann karthikeyann self-assigned this Apr 16, 2023
@github-actions github-actions bot added the Python Affects Python cuDF API. label Apr 16, 2023
@karthikeyann karthikeyann changed the title reduce benchmark runtime by skipping unrequired combinations Optimize JSON writer Apr 16, 2023
@karthikeyann karthikeyann added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function Cython and removed 2 - In Progress Currently a work in progress labels Apr 18, 2023
@karthikeyann
Contributor Author

Nsight profile:
Before: [image]
After: [image]

@karthikeyann karthikeyann marked this pull request as ready for review April 18, 2023 13:45
@karthikeyann karthikeyann requested review from a team as code owners April 18, 2023 13:45
cpp/src/io/json/write_json.cu (resolved review threads)
Member

@PointKernel PointKernel left a comment

LGTM

Contributor

@bdice bdice left a comment

Minor suggestions. Approving.

cpp/benchmarks/io/json/json_writer.cpp (outdated, resolved)
Comment on lines 143 to 145
string_view const row_prefix; //{
string_view const row_suffix; //} or }\n for json-lines
string_view const value_separator; //,
Contributor

Suggested change
- string_view const row_prefix; //{
- string_view const row_suffix; //} or }\n for json-lines
- string_view const value_separator; //,
+ string_view const row_prefix; // {
+ string_view const row_suffix; // } or }\n for json-lines
+ string_view const value_separator; // ,

Contributor Author

How is this not caught in clang-format?

Contributor

Seems like it should. I noticed the new clang-format doing some odd things myself.

Contributor

@bdice bdice Apr 23, 2023

cuSpatial and cugraph-ops noticed a bug in clang-format 16.0.1 relating to spacing around * (it made some multiplications look like pointers). We can revert to another version if you think there are issues with the version we've chosen. For those two repos, we switched to 15.0.7. You can alter .pre-commit-config.yaml and run pre-commit run --all-files clang-format if you'd like to test it.

Contributor

I've not seen anything concerning that is worth reverting to an older version.

cpp/src/io/json/write_json.cu (outdated, resolved)
@karthikeyann karthikeyann added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs Review Waiting for reviewer to review or respond 4 - Needs cuIO Reviewer labels Apr 21, 2023
@karthikeyann
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 2a7fd5b into rapidsai:branch-23.06 Apr 24, 2023
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue Python Affects Python cuDF API. strings strings issues (C++ and Python)
6 participants