Use orjson instead of json, when available #17955

hauntsaninja · 2024-10-15T08:20:44Z

For `mypy -c 'import torch'`, the cache load time goes from 0.44s to 0.25s, as measured by the manager's `data_json_load_time`. If I time dumps specifically, I see a drop from 0.65s to 0.07s. Overall, a pretty reasonable perf win. Should we make it a required dependency?

I don't know if the sqlite cache path is used at all (what's the status?), but let me know if I need a cleverer migration than renaming the table.

See also #3456
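The pattern in the title can be sketched as an optional import with a stdlib fallback. This is a minimal sketch, not the code in `mypy/util.py`: the helper names `json_dumps` and `json_loads` are hypothetical here, and the only orjson API used is `orjson.dumps`, `orjson.loads`, and the `OPT_INDENT_2`/`OPT_SORT_KEYS` flags that appear in the diff under review.

```python
import json
from typing import Any

try:
    import orjson  # optional dependency; used only when installed
except ImportError:
    orjson = None


def json_dumps(obj: Any) -> bytes:
    """Serialize obj as indented JSON with sorted keys (hypothetical helper)."""
    if orjson is not None:
        # orjson returns bytes and takes bit flags instead of keyword arguments.
        return orjson.dumps(obj, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
    # Stdlib fallback: json.dumps returns str, so encode it to keep one return type.
    return json.dumps(obj, indent=2, sort_keys=True).encode("utf-8")


def json_loads(data: bytes) -> Any:
    """Parse bytes produced by json_dumps (hypothetical helper)."""
    if orjson is not None:
        return orjson.loads(data)
    # json.loads accepts UTF-8 encoded bytes directly.
    return json.loads(data)
```

Returning `bytes` from both branches means callers don't need to know which backend ran. The two backends are not guaranteed to be byte-identical, though: for example, `json.dumps` escapes non-ASCII characters by default, while orjson emits raw UTF-8.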


JukkaL · 2024-10-15T08:46:51Z

Sounds very promising! I can also perform some measurements.

> should we make it a required dependency?

I wonder how well maintained orjson is, and whether it ships binary wheels for all the platforms we care about. We might add ARM Linux wheels at some point, and it would be nice if all our dependencies shipped binary wheels (though that's perhaps not essential on Linux, as long as there are x86-64 wheels).

> I don't know if the sqlite cache path is used at all (what's the status?),

SQLite caching is very much in use, and I'm thinking of enabling it by default in the future. In certain use cases it's significantly faster than a file-per-module cache, and we use it at work.

hauntsaninja · 2024-10-15T08:52:42Z

mypy/util.py

```diff
+            return orjson.dumps(obj, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)  # type: ignore[no-any-return]
+        else:
+            # TODO: If we don't sort keys here, testIncrementalInternalScramble fails
+            # We should document exactly what is going on there
```

Lmk if you know off the top of your head why sorting keys is important!


hauntsaninja · 2024-10-15T08:54:10Z

I think orjson does ship wheels for all the platforms we care about. It would be nice if Python packaging had a concept of a "default" extra for this kind of thing, though.

JukkaL · 2024-10-15T09:30:57Z

I'm seeing a 10-15% improvement in the performance of `time mypy -c 'import torch'` on Linux. This probably also helps incremental mypy runs in general.
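For a rough local comparison of the two libraries outside mypy, a micro-benchmark along these lines may help. It is a sketch under stated assumptions: the payload is a made-up stand-in, not mypy's real cache format, and this is not how the numbers in this thread were collected (those came from `data_json_load_time` and `time`). Absolute results will vary by machine.

```python
import json
import timeit
from typing import Any, Callable, Dict

try:
    import orjson
except ImportError:
    orjson = None


def make_payload(n: int = 2000) -> Dict[str, Any]:
    # Hypothetical stand-in for a cache entry; real mypy cache data looks different.
    return {f"name{i}": {"kind": "Var", "line": i, "flags": ["is_ready"]} for i in range(n)}


def bench(fn: Callable[[], object], number: int = 20) -> float:
    # Best of 5 repeats, reported as seconds per call.
    return min(timeit.repeat(fn, number=number, repeat=5)) / number


payload = make_payload()
encoded = json.dumps(payload).encode("utf-8")

results = {
    "json.dumps": bench(lambda: json.dumps(payload, sort_keys=True)),
    "json.loads": bench(lambda: json.loads(encoded)),
}
if orjson is not None:
    results["orjson.dumps"] = bench(lambda: orjson.dumps(payload, option=orjson.OPT_SORT_KEYS))
    results["orjson.loads"] = bench(lambda: orjson.loads(encoded))

for name, seconds in results.items():
    print(f"{name:>13}: {seconds * 1e3:.3f} ms")
```

Taking the minimum of several repeats reduces noise from other processes, which matters when the per-call times are in the millisecond range.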

hauntsaninja

Okay, I think this PR should be good to go.

Questions to resolve now:

- Is just using `files2` in sqlite a sufficient migration?

Open questions that we can resolve later:

- Documenting why `sort_keys` is important
- Adding an extra that includes orjson (or relying on it by default)
- Adding test coverage for the optional feature
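One way to cover the optional path is a test that always exercises the stdlib serializer and, when orjson is importable, checks that both libraries agree on the parsed result. A hedged sketch using `unittest`; the class and helper names are hypothetical, and exercising both branches would still need CI jobs with and without orjson installed.

```python
import json
import unittest
from typing import Any

try:
    import orjson
except ImportError:
    orjson = None


def dumps_stdlib(obj: Any) -> bytes:
    # Hypothetical fallback serializer under test.
    return json.dumps(obj, sort_keys=True).encode("utf-8")


class SerializerParityTest(unittest.TestCase):
    sample = {"b": [1, 2, {"z": None}], "a": "text", "c": 1.5}

    def test_stdlib_round_trip(self) -> None:
        self.assertEqual(json.loads(dumps_stdlib(self.sample)), self.sample)

    @unittest.skipIf(orjson is None, "orjson not installed")
    def test_orjson_matches_stdlib(self) -> None:
        fast = orjson.dumps(self.sample, option=orjson.OPT_SORT_KEYS)
        # Compare parsed values, not raw bytes: json.dumps uses ", " and ": "
        # separators by default, while orjson output is compact.
        self.assertEqual(json.loads(fast), json.loads(dumps_stdlib(self.sample)))
        # Each library should be able to read the other's output.
        self.assertEqual(orjson.loads(dumps_stdlib(self.sample)), self.sample)
```

Run it with `python -m unittest`; without orjson the parity test is reported as skipped rather than failed.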

JukkaL · 2024-10-16T10:53:54Z

> Is just using files2 in sqlite a sufficient migration

This seems fine. It's an internal implementation detail, and caches aren't compatible between mypy versions.

Can you check if `misc/convert-cache.py` still works?
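Because caches aren't compatible between mypy versions, a table rename is enough: the new version never reads the old table, so every lookup is a cache miss and the entry gets rebuilt. A minimal sketch of that idea with the stdlib `sqlite3` module follows. The `files2` name comes from this PR; the column layout, and the assumption that the rename accompanies switching the payload from text to bytes (`orjson.dumps` returns bytes), are illustrative guesses, not mypy's actual metastore schema.

```python
import sqlite3
from typing import Optional

# Assumed schema for illustration; mypy's real metastore may differ.
# The table name is versioned, so rows written in an older layout are ignored.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files2 (
    path TEXT UNIQUE NOT NULL,
    mtime REAL,
    data BLOB
);
"""


def ensure_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def write_entry(conn: sqlite3.Connection, path: str, mtime: float, data: bytes) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO files2 (path, mtime, data) VALUES (?, ?, ?)",
        (path, mtime, data),
    )
    conn.commit()


def read_entry(conn: sqlite3.Connection, path: str) -> Optional[bytes]:
    row = conn.execute("SELECT data FROM files2 WHERE path = ?", (path,)).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
# Simulate a cache left behind by an older version that stored text in "files".
conn.execute("CREATE TABLE files (path TEXT UNIQUE NOT NULL, mtime REAL, data TEXT)")
conn.execute("INSERT INTO files VALUES ('a.meta.json', 1.0, '{}')")
conn.commit()

ensure_schema(conn)
stale = read_entry(conn, "a.meta.json")  # None: the old table is never consulted
write_entry(conn, "a.meta.json", 2.0, b'{"x":1}')
fresh = read_entry(conn, "a.meta.json")  # bytes come back as bytes from a BLOB column
```

In this sketch the old `files` table is left in place; dropping it would be optional cleanup, not something correctness depends on.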

JukkaL · 2024-10-16T10:58:17Z

mypy/util.py



It might just be so that tests produce keys in a predictable order on older Python 3 versions, where dict didn't preserve insertion order.
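Independent of what `testIncrementalInternalScramble` actually checks (the thread leaves that open), the general effect of sorting keys can be shown with the stdlib alone. On Python 3.7+ dicts keep insertion order, so two equal dicts built in different orders serialize to different text unless keys are sorted. A minimal illustration:

```python
import json

# Same contents, different insertion order (e.g. built by different code paths).
first = {"path": "a.py", "mtime": 1.0}
second = {"mtime": 1.0, "path": "a.py"}

# Without sort_keys the output follows insertion order, so the text differs.
unsorted_equal = json.dumps(first) == json.dumps(second)

# With sort_keys the output is canonical and can be compared byte for byte.
sorted_equal = json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)

print(unsorted_equal, sorted_equal)  # prints: False True
```

Any test that compares serialized cache files textually would therefore need sorted keys to be stable across runs that build the same data in a different order.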

hauntsaninja · 2024-10-16T20:35:12Z

Thanks. There was one spot where I'd forgotten to switch to `files2`.

I'm a little confused about how `misc/convert-cache.py` is meant to work; I think it might be a little broken on master? Looking...

github-actions · 2024-10-16T20:51:57Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

hauntsaninja · 2024-10-16T20:56:03Z

Okay, fixed the cache convert problem on master in #17974.


hauntsaninja force-pushed the use-orjson branch from f6f7cb9 to 82969b8 on October 15, 2024 08:21

hauntsaninja force-pushed the use-orjson branch from b0db034 to 4103892 on October 15, 2024 08:25


hauntsaninja added 2 commits on October 15, 2024 01:47:

- misc (a27b3d4)
- fix test (3f8ff75)


sort_keys (263cb10)


hauntsaninja mentioned this pull request on Oct 16, 2024 in 1.12/1.13 Release Tracking Issue #17815 (open)


JukkaL approved these changes Oct 16, 2024


fix missing spot (bd6536d)

hauntsaninja merged commit c1f2db3 into python:master Oct 16, 2024
17 checks passed

hauntsaninja deleted the use-orjson branch October 16, 2024 21:03

hauntsaninja mentioned this pull request on Oct 17, 2024 in Add faster-cache extra, test in CI #17978 (merged)

JukkaL pushed a commit that referenced this pull request on Oct 17, 2024:

Add faster-cache extra, test in CI (#17978) · 61ad5a4 · "Follow up to #17955"

hauntsaninja added a commit that referenced this pull request on Oct 20, 2024:

Add faster-cache extra, test in CI (#17978) · 5c4d2db · "Follow up to #17955"

hauntsaninja mentioned this pull request on Oct 21, 2024 in Changelog for 1.13 #18000 (merged)
