Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use orjson instead of json, when available #17955

Merged
merged 5 commits into from
Oct 16, 2024

Conversation

hauntsaninja
Copy link
Collaborator

@hauntsaninja hauntsaninja commented Oct 15, 2024

For mypy -c 'import torch', the cache load time goes from 0.44s to 0.25s as measured by manager's data_json_load_time. If I time dump times specifically, I see a saving of 0.65s to 0.07s. Overall, a pretty reasonable perf win -- should we make it a required dependency?

I don't know if the sqlite cache path is used at all (what's the status?), but let me know if I need a cleverer migration than renaming the table

See also #3456

For `mypy -c 'import torch'`, the cache load time goes from 0.44s to
0.25s as measured by manager's data_json_load_time
If I time dump times specifically, I see a saving of 0.65s to 0.07s.
Overall, a pretty reasonable perf win -- should we make it a required
dependency?

I don't know if the sqlite cache path is used at all, but let me know if
I need a cleverer migration than renaming the table

This comment has been minimized.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 15, 2024

Sounds very promising! I can also perform some measurements.

should we make it a required dependency?

I wonder how well maintained orjson is, and does it ship binary wheels for all the platforms we care about? We might be adding ARM Linux wheels at some point, and it would be nice if all our dependencies would ship with binary wheels (though it's perhaps not essential for Linux, as long as there are x86-64 wheels).

I don't know if the sqlite cache path is used at all (what's the status?),

Sqlite caching is very much used, and I'm thinking of enabling it by default in the future. In certain use cases it's significantly faster than a file-per-module cache, and we use it at work.

return orjson.dumps(obj, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS) # type: ignore[no-any-return]
else:
# TODO: If we don't sort keys here, testIncrementalInternalScramble fails
# We should document exactly what is going on there
Copy link
Collaborator Author

@hauntsaninja hauntsaninja Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lmk if you know off the top of your head why sorting keys is important!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be just so that tests produce the keys in a predictable order on older Python 3 versions where dict didn't preserve insertion order.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Oct 15, 2024

I think orjson does ship wheels for all platforms we care about. It would be nice if Python packaging had a concept of a "default" extra for this kind of thing, though

This comment has been minimized.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 15, 2024

I'm seeing a 10-15% improvement to the performance of time mypy -c 'import torch' on Linux. This probably also helps incremental mypy runs in general.

Copy link
Collaborator Author

@hauntsaninja hauntsaninja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think this PR should be good to go.

Questions to resolve now:

  • Is just using files2 in sqlite a sufficient migration

Open questions that we can resolve later:

  • Documenting why sort_keys is important
  • Adding an extra that includes orjson (or relying on it by default)
  • Adding test coverage for the optional feature

This comment has been minimized.

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 16, 2024

Is just using files2 in sqlite a sufficient migration

This seems fine. It's an internal implementation detail, and caches aren't compatible between mypy versions.

Can you check if misc/convert-cache.py still works?

return orjson.dumps(obj, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS) # type: ignore[no-any-return]
else:
# TODO: If we don't sort keys here, testIncrementalInternalScramble fails
# We should document exactly what is going on there
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be just so that tests produce the keys in a predictable order on older Python 3 versions where dict didn't preserve insertion order.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Oct 16, 2024

Thanks, there was a missing spot where I'd forgotten to change to files2.

I'm a little confused at how cache-convert is meant to work, I think it might be a little broken on master? Looking...

Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@hauntsaninja
Copy link
Collaborator Author

Okay, fixed the cache convert problem on master in #17974

@hauntsaninja hauntsaninja merged commit c1f2db3 into python:master Oct 16, 2024
17 checks passed
@hauntsaninja hauntsaninja deleted the use-orjson branch October 16, 2024 21:03
JukkaL pushed a commit that referenced this pull request Oct 17, 2024
hauntsaninja added a commit that referenced this pull request Oct 20, 2024
For `mypy -c 'import torch'`, the cache load time goes from 0.44s to
0.25s as measured by manager's data_json_load_time. If I time dump times
specifically, I see a saving of 0.65s to 0.07s. Overall, a pretty
reasonable perf win -- should we make it a required dependency?

See also #3456
hauntsaninja added a commit that referenced this pull request Oct 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants