[REVIEW] Switch engine=cudf to the new JSON reader #12509

Merged: 22 commits into rapidsai:branch-23.02 on Jan 23, 2023

Conversation

galipremsagar
Contributor

@galipremsagar galipremsagar commented Jan 9, 2023

Description

Fixes: #12470
This PR:

  • Switches the cudf engine in the JSON reader to map to the newest JSON reader, and introduces the cudf_legacy engine to map to the old JSON reader.
  • Fixes an issue with _fsspec_data_transfer and compression. The fix is required for the switch, and the failures are already caught by tests.

Note: When engine='auto' and lines=False, the pandas JSON reader is used. To override that selection, pass engine='cudf'.
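The engine selection described above can be sketched as a small standalone function. This is only an illustration, not cuDF's actual implementation: resolve_json_engine is a hypothetical helper, the returned labels are invented for the example, and the behavior of engine='auto' with lines=True is an assumption, since the note only covers lines=False.

```python
def resolve_json_engine(engine="auto", lines=False):
    """Return a label naming the reader that would handle the request.

    Hypothetical helper for illustration only; not part of cuDF.
    """
    if engine == "auto":
        # Stated in the PR note: 'auto' with lines=False uses pandas.
        # Assumption (not stated in the note): 'auto' with lines=True
        # selects the cudf reader.
        return "new-cudf-reader" if lines else "pandas-reader"
    if engine == "cudf":
        # After this PR, 'cudf' maps to the newest JSON reader.
        return "new-cudf-reader"
    if engine == "cudf_legacy":
        # The old reader stays reachable under a new name.
        return "old-cudf-reader"
    if engine == "pandas":
        return "pandas-reader"
    raise ValueError(f"Unsupported engine: {engine!r}")
```

Under these assumptions, resolve_json_engine("auto", lines=False) selects the pandas reader, while passing engine="cudf" explicitly overrides that choice regardless of lines.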

Dependent on :

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@github-actions github-actions bot added the Python Affects Python cuDF API. label Jan 9, 2023
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function breaking Breaking change labels Jan 10, 2023
@codecov

codecov bot commented Jan 10, 2023

Codecov Report

Base: 86.58% // Head: 85.70% // Decreases project coverage by -0.88% ⚠️

Coverage data is based on head (c01e0ff) compared to base (b6dccb3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12509      +/-   ##
================================================
- Coverage         86.58%   85.70%   -0.88%     
================================================
  Files               155      155              
  Lines             24368    24870     +502     
================================================
+ Hits              21098    21316     +218     
- Misses             3270     3554     +284     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) ⬇️
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) ⬇️
python/cudf/cudf/core/buffer/spillable_buffer.py 91.07% <0.00%> (-1.78%) ⬇️
python/cudf/cudf/utils/dtypes.py 77.85% <0.00%> (-1.61%) ⬇️
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) ⬇️
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) ⬇️
python/cudf/cudf/io/json.py 91.04% <0.00%> (-1.02%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) ⬇️
python/dask_cudf/dask_cudf/io/csv.py 96.34% <0.00%> (-1.00%) ⬇️
python/dask_cudf/dask_cudf/io/parquet.py 91.81% <0.00%> (-0.59%) ⬇️
... and 46 more

@galipremsagar galipremsagar changed the title from "[WIP] Switch default to new json reader" to "[REVIEW] Switch default to new json reader" on Jan 20, 2023
@galipremsagar galipremsagar changed the title from "[REVIEW] Switch default to new json reader" to "[REVIEW] Switch default engine to the new JSON reader" on Jan 20, 2023
@galipremsagar galipremsagar marked this pull request as ready for review January 20, 2023 22:04
@galipremsagar galipremsagar requested review from a team as code owners January 20, 2023 22:04
Contributor

@vuule vuule left a comment

Looks good.
The only suggestion is to remove all engine='cudf' and engine="cudf" :)

docs/cudf/source/user_guide/io/read-json.md
python/cudf/cudf/tests/test_json.py
Comment on lines 47 to 49
# TODO: Deprecated in 23.02, please
# give some time until `cudf_legacy`
# support can be removed completely.
Contributor

@bdice bdice Jan 20, 2023

"give some time until cudf_legacy support can be removed completely"

Will our one-release deprecation policy be enough for this, or do you think a special exception is warranted? If more time is needed, let's try to indicate the release in which support should be removed in the error message or perhaps a code comment.

Contributor Author

Yeah, we definitely want to give this a special exception. I don't think we have a specific release decided yet to remove it completely. I updated the comment here.

cc: @GregoryKimball @vuule too.

Contributor

I think we'll pick the removal release based on 23.02 feedback. In general, we'll remove the old reader as soon as there are no user issues specific to the new one.

Contributor

That sounds reasonable! I just wanted to make sure we discussed this topic before merging. 👍
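The idea raised in this thread, stating the removal release in the message once one is chosen, could look roughly like the sketch below. Everything here is hypothetical: warn_if_legacy and LEGACY_REMOVAL_RELEASE are illustrative names, the message wording is invented, and the constant is left unset because the thread says no removal release has been decided.

```python
import warnings

# Hypothetical placeholder. The discussion above says the removal release
# has not been decided, so it stays None in this sketch.
LEGACY_REMOVAL_RELEASE = None


def warn_if_legacy(engine):
    """Emit a FutureWarning when the deprecated legacy engine is requested."""
    if engine == "cudf_legacy":
        if LEGACY_REMOVAL_RELEASE is not None:
            when = f"in release {LEGACY_REMOVAL_RELEASE}"
        else:
            when = "in a future release"
        warnings.warn(
            f"engine='cudf_legacy' is deprecated and will be removed {when}; "
            "use engine='cudf' instead.",
            FutureWarning,
            stacklevel=2,
        )
    # The engine is passed through unchanged; only the warning is added.
    return engine
```

Keeping the release in a single constant means the message only needs one change once a removal release is picked.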

Comment on lines 41 to 44
raise ValueError(
"engine='cudf_experimental' support has been removed, "
"use `engine='cudf'`"
)
Contributor

Should this be a hard error or a deprecation warning that replaces the value with "cudf"?

Contributor Author

Since it was experimental, I feel we have the flexibility to make this an error. What do you think?

Contributor

@bdice bdice Jan 20, 2023

I think that's justifiable -- but we'll want to delete this error at some point later. Therefore, it's the same amount of work for us as developers to deprecate it as to force an error. "Add warning, delete warning later" vs. "add error, delete error later."

No strong feelings here - resolve as you see fit.
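The two options weighed in this thread can be compared side by side. This is a sketch rather than the code under review: check_engine_strict and check_engine_lenient are hypothetical names, and the warning text in the lenient variant is invented. The error message in the strict variant is copied from the excerpt above.

```python
import warnings


def check_engine_strict(engine):
    """Option 1: hard error, matching the excerpt under review."""
    if engine == "cudf_experimental":
        raise ValueError(
            "engine='cudf_experimental' support has been removed, "
            "use `engine='cudf'`"
        )
    return engine


def check_engine_lenient(engine):
    """Option 2: warn, then replace the value with 'cudf'."""
    if engine == "cudf_experimental":
        warnings.warn(
            "engine='cudf_experimental' is deprecated; "
            "falling back to engine='cudf'.",
            FutureWarning,
            stacklevel=2,
        )
        return "cudf"
    return engine
```

Either branch has to be deleted later, which is the point made above: the maintenance cost is about the same, and the difference is whether existing callers break immediately or get a grace period.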

python/cudf/cudf/io/json.py
python/dask_cudf/dask_cudf/io/tests/test_json.py (outdated)
Contributor

@bdice bdice left a comment

Looks good to me, once conversations are resolved.

python/dask_cudf/dask_cudf/io/json.py (outdated)
Contributor

@vuule vuule left a comment

Looks good. Looking forward to future removal of engine='cudf's :)

@galipremsagar
Contributor Author

> Looks good. Looking forward to future removal of engine='cudf's :)

Which also means the removal of engine='pandas', actually the engine parameter entirely 😉

@galipremsagar galipremsagar changed the title from "[REVIEW] Switch default engine to the new JSON reader" to "[REVIEW] Switch engine=cudf to the new JSON reader" on Jan 20, 2023
Contributor

@karthikeyann karthikeyann left a comment

LGTM 👍
Looking forward to adding more orients and removing the engine parameter entirely.

Member

@rjzamora rjzamora left a comment

Thanks @galipremsagar - Looks good on the dask side!

@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team 4 - Needs Dask Reviewer labels Jan 23, 2023
@galipremsagar
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit ecef4e2 into rapidsai:branch-23.02 Jan 23, 2023
@karthikeyann
Contributor

🚀

Labels
5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]cudf fails to read the JSON string of an empty body
5 participants