Improve online deserialization latency #2164
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
@tsotnet Relevant for your benchmarking perhaps?
This is great and much cleaner to read @judahrand - I'm curious where the speedup is coming from. Is it just because we're avoiding the excess branching and if statements?
I think it is the branching and ifs, but also, for list types, the old implementation was looping over each element of the list and type casting it. This implementation avoids the need for that by not using From what I can tell
I'd also add that this relies much more on letting Protobuf sort out the conversions, which might mean more of this is being done in C++ extensions? Edit: actually I'm not sure this is true but I'll look into it.
Looks like most platforms' wheels from
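To make the list-handling point above concrete, here is a minimal sketch of the two shapes of conversion being discussed. It is not the code from this PR: `FakeValue`, `convert_branching` and `convert_direct` are hypothetical stand-ins written in plain Python, with no Protobuf dependency, and the field names are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeValue:
    """Hypothetical stand-in for a message with one populated one-of field."""

    kind: Optional[str]  # name of the populated field, e.g. "double_val"
    payload: Any = None  # the value stored in that field


def convert_branching(value: FakeValue) -> Any:
    """Branch per type and cast list elements one at a time."""
    if value.kind is None:
        return None
    if value.kind == "double_val":
        return float(value.payload)
    if value.kind == "int64_val":
        return int(value.payload)
    if value.kind == "float_list_val":
        return [float(x) for x in value.payload]  # per-element cast
    if value.kind == "int64_list_val":
        return [int(x) for x in value.payload]  # per-element cast
    raise ValueError(f"unsupported kind: {value.kind}")


def convert_direct(value: FakeValue) -> Any:
    """Read the populated field and copy lists in one step, with no casts."""
    if value.kind is None:
        return None
    if value.kind.endswith("_list_val"):
        return list(value.payload)  # bulk copy; elements are trusted as-is
    return value.payload


if __name__ == "__main__":
    sample = FakeValue("float_list_val", (1.0, 2.5, 4.0))
    print(convert_branching(sample), convert_direct(sample))
```

The direct version only works if the stored elements already have the right Python type, which is the trade-off the sketch is meant to highlight.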
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: achals, judahrand
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@achals Interestingly, there is the possibility of getting another small performance benefit by doing what is documented here: https://m.atthew.uk/posts/post.html This is probably not worth the deploy nightmare, however.
Thanks for the detailed info! Yeah, I'm inclined to opt for simplicity over speed in this particular case. I'm sure there are other avenues to speed things up that won't complicate our packaging and deploy story.
Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com>
What this PR does / why we need it:
For `_LIST` type features this implementation is up to 7x faster in my testing.
For serializing and deserializing 10k rows of 3 columns `DOUBLE`, 3 columns `FLOAT_LIST`.
Old implementation:
New implementation:
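A measurement of the shape described above (10k rows, 3 `DOUBLE` columns, 3 `FLOAT_LIST` columns) could be set up roughly as follows. This is only a sketch, not the harness used for the numbers in this PR: `make_rows` and `time_round_trip` are hypothetical helpers, the list length of 10 is an assumption, and `json` is used purely as a placeholder codec so that the example runs on the standard library alone.

```python
import json
import random
import timeit
from typing import Any, Callable, Dict, List

N_ROWS = 10_000  # row count from the description above
LIST_LEN = 10  # assumed; the list length is not stated


def make_rows(n_rows: int = N_ROWS, list_len: int = LIST_LEN, seed: int = 0) -> List[Dict[str, Any]]:
    """Build rows with 3 double columns and 3 float-list columns."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row: Dict[str, Any] = {f"double_{i}": rng.random() for i in range(3)}
        for i in range(3):
            row[f"float_list_{i}"] = [rng.random() for _ in range(list_len)]
        rows.append(row)
    return rows


def time_round_trip(
    serialize: Callable[[Any], Any],
    deserialize: Callable[[Any], Any],
    rows: List[Dict[str, Any]],
    repeat: int = 3,
) -> float:
    """Return the best wall-clock time (seconds) for one pass over all rows."""

    def run() -> None:
        for row in rows:
            deserialize(serialize(row))

    # Take the minimum over several runs to reduce noise from the machine.
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == "__main__":
    rows = make_rows()
    # json is a placeholder; swap in the old and new converters to compare them.
    print(f"round trip: {time_round_trip(json.dumps, json.loads, rows):.3f}s")
```

Passing the old and the new conversion functions through the same `time_round_trip` call on identical rows keeps the comparison like for like.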
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: