Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up (de)serialization of Base bits types in abstract containers #30221

Merged
merged 1 commit into from
Dec 4, 2018

Conversation

JeffBezanson
Copy link
Sponsor Member

helps #30148

Before:
Serialize array of Union{Missing,Int}:

  7.327029 seconds (100.00 M allocations: 1.990 GiB, 4.15% gc time)

Deserialize that:

 54.608122 seconds (400.37 M allocations: 8.810 GiB, 1.18% gc time)

Serialize array of Union{Missing,UInt}:

 22.448258 seconds (200.19 M allocations: 3.493 GiB, 3.03% gc time)

Deserialize that:

 62.783948 seconds (399.90 M allocations: 8.787 GiB, 0.94% gc time)

Same tests with this PR:

  5.504755 seconds (100.00 M allocations: 1.990 GiB, 12.72% gc time)
  6.396924 seconds (199.90 M allocations: 4.317 GiB, 5.81% gc time)
  4.939101 seconds (100.22 M allocations: 2.001 GiB, 8.17% gc time)
  6.633005 seconds (200.56 M allocations: 4.349 GiB, 4.78% gc time)

The times are a bit variable but the basic story is clear. I tested both Int and UInt, since Int already had some special-case code. Other Base number types (Int8, UInt8, ...) should have roughly the same performance characteristics as UInt.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 30, 2018

Isn't this breaking though (because of the new list of special objects), so can only be partially backported?

@KristofferC
Copy link
Sponsor Member

I thought we gave no guarantees about being able to deserialize data from any previous version of Julia?

@JeffBezanson
Copy link
Sponsor Member Author

No this should be 100% backwards compatible. It just uses if statements instead of dispatch, and additional specialized methods. There were already special tags for those types.

@@ -748,6 +756,9 @@ function handle_deserialize(s::AbstractSerializer, b::Int32)
return unwrap_unionall(tname.wrapper)
elseif b == OBJECT_TAG
t = deserialize(s)
if t === Missing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do the same for Nothing?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nothing already has a special representation; in a future version I'll add one for missing and then this code can be removed.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just do isdefined(t, :instance) && return t.instance here? Since deserialize is required to return that object, I don't think there's anything very meaningful that a deserializing implementation would be permitted to do after dispatch

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but I would consider that slightly breaking so I'll do it separately.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also inclined to say that custom deserialize methods aren't called for primitive types (and possibly more things). Of course that's even more breaking, but it's hard to get performance otherwise (e.g. #14678).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello
The example and the fix seems to be only focusing on vector{union{int, missing}}
How about other primitive types. Eg vector {union {float64,missing}} or float32 float16 bool char?
Thank you

@JeffBezanson JeffBezanson merged commit 11c5680 into master Dec 4, 2018
@JeffBezanson JeffBezanson deleted the jb/serspeedups branch December 4, 2018 19:17
KristofferC pushed a commit that referenced this pull request Dec 6, 2018
@KristofferC KristofferC mentioned this pull request Dec 6, 2018
61 tasks
KristofferC pushed a commit that referenced this pull request Dec 12, 2018
KristofferC pushed a commit that referenced this pull request Feb 11, 2019
KristofferC pushed a commit that referenced this pull request Feb 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants