Reevaluate worker payload and serialization #7011

Closed · mourner opened this issue Jul 23, 2018 · 6 comments
Labels
performance ⚡ Speed, stability, CPU usage, memory usage, or power usage

Comments

@mourner (Member) commented Jul 23, 2018

Spent some time investigating noticeable jank (abnormally long frames) that occurs when browsing a typical Mapbox Streets map. Here's a typical flame chart for these frames:

[flame chart screenshot]

In my observation, a lot of the frames where jank occurs are heavily occupied by Actor receive. Most of it is reported as self time, so our own deserialization code isn't the bottleneck. Since nothing special happens in that method other than it being called when a response comes back from a worker thread, I assume most of the time is spent natively extracting the JSON data from the message, i.e. structured cloning. This is plausible given that we've seen very bad performance from structured cloning before.

I stringified a typical message that the main thread gets when a tile loads and cut out all values that are transferable (typed arrays) to get a sense of how much structured cloning actually happens. Here's the sample: https://gist.github.com/mourner/ae43942f402f8e5346974945da83a748
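For concreteness, here's a minimal sketch of that measurement (the `message` shape and function name are hypothetical, not the actual gl-js code): a `JSON.stringify` replacer drops anything transferable, so what remains is the portion that has to be structured-cloned.

```js
// Hypothetical sketch: measure how much of a worker message must be
// structured-cloned by stringifying it without its transferable parts.
function cloneablePayloadSize(message) {
    const json = JSON.stringify(message, (key, value) =>
        (value instanceof ArrayBuffer || ArrayBuffer.isView(value)) ? undefined : value);
    return json ? json.length : 0; // rough size in characters of minified JSON
}
```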

The sample is about 500kb of JSON (minified), with tons of nested objects. I think we have a very good opportunity to reduce the jank by looking more closely at the payload, with the following questions in mind:

  • What objects can we fully omit from the payload?
  • What objects can we recreate from scratch on the main thread without serialized properties from the worker?
  • Is there anything we can do in the serialization/deserialization implementation to make the payload smaller?
  • What objects can we pack into typed arrays for transferring? (e.g. feature state id map, symbol instance geometries; see the sketch below)
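As a rough illustration of the last item, here's a hedged sketch (field names like `anchorX` are hypothetical) of packing per-instance values into one flat Float32Array so that its buffer can go into a transfer list instead of being structured-cloned:

```js
// Hypothetical sketch: flatten symbol instance anchors into a single
// Float32Array whose buffer can be transferred between threads.
function packAnchors(symbolInstances) {
    const packed = new Float32Array(symbolInstances.length * 2);
    symbolInstances.forEach((instance, i) => {
        packed[i * 2] = instance.anchorX;
        packed[i * 2 + 1] = instance.anchorY;
    });
    return packed;
}

// On the worker side the buffer is then handed off without a deep copy:
// self.postMessage({type: 'tile', anchors: packed}, [packed.buffer]);
```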

cc @anandthakker @jfirebaugh @asheemmamoowala @ChrisLoer

@mourner added the performance ⚡ label Jul 23, 2018
@mourner (Member, Author) commented Jul 23, 2018

I should also note that we don't capture jank in our benchmarks — they either measure something isolated in a single thread (e.g. layout) or render a static view without many tile loads (the various paint benchmarks).

@ChrisLoer (Contributor) commented:

Yikes, I didn't realize we were serializing the whole SymbolFeature as part of SymbolInstance! I don't think there's any reason we need to do that: when we look up features during symbol querying, we just use an index into the "rawTileData" to get the feature. I quickly removed it from the serialization, and render/query/flow are still passing.

I don't know the full history there (presumably it was needed for something at some point), but it looks like when we added expression support we promoted an old list of feature properties to be an entire feature?

I suspect we can get rid of all the collision box serialization overhead just by switching to doing it in the foreground (#5474). It's a non-trivial refactoring, and without actually doing it I don't know what the foreground CPU impact would be; but since we're already doing the "tile-to-viewport" transform on every collision box in the foreground, I suspect the cost of doing the whole thing there would be pretty similar.
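(For context, the per-box foreground work mentioned above is essentially one matrix projection per box corner; a simplified sketch using gl-matrix, not the actual collision code:)

```js
import {vec4} from 'gl-matrix';

// Simplified sketch of the "tile-to-viewport" transform: project a collision
// box corner (in tile units) through a 4x4 tile-to-clip matrix `posMatrix`.
function projectCorner(posMatrix, x, y) {
    const p = vec4.transformMat4([], [x, y, 0, 1], posMatrix);
    return [p[0] / p[3], p[1] / p[3]]; // perspective divide into NDC
}
```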

@anandthakker (Contributor) commented:

Nice, @mourner!
In addition to the auditing you're suggesting, just a note that I think #4875 will likely set us up for further improvements here.

@ChrisLoer (Contributor) commented:

Another thing I'd note is that it's no longer necessary for "symbolInstances" to live on the worker after a layout result is delivered, since we no longer re-do placement on the worker side. So in theory they could be transferred instead of copied.

I'm confused by the results from #1504 (comment), though; they make it seem like avoiding the copy isn't actually the key difference. Am I understanding it right that doing your own deep copy and then transferring the result is way faster than letting Chrome do the deep copy for you? Why would that be the case?

@mourner (Member, Author) commented Jul 24, 2018

@ChrisLoer I suppose because "copy and then transfer" is only applicable to typed arrays, while structured cloning is universal.
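In other words (a minimal contrast sketch; `worker`, `nested`, and `coords` are placeholders): structured cloning accepts arbitrary object graphs but always deep-copies them natively, while a transfer list only accepts ArrayBuffers, so the explicit one-time copy into a typed array buys a zero-copy handoff afterwards.

```js
// Structured clone: the entire nested object graph is deep-copied natively.
worker.postMessage({features: nested});

// Copy-then-transfer: one explicit copy into a typed array, after which the
// buffer's ownership moves to the other thread with no further copying.
const packed = Float64Array.from(coords);
worker.postMessage({coords: packed}, [packed.buffer]);
```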

@mourner (Member, Author) commented Aug 28, 2018

This ticket should be sufficiently addressed with #7124, #7132 and #7127.
