
Optimize the GeoJSON ⇢ worker transfer #1504

Closed
jfirebaugh opened this issue Sep 18, 2015 · 26 comments
Labels
performance ⚡ Speed, stability, CPU usage, memory usage, or power usage

Comments

@jfirebaugh
Contributor

Of course, it's always best to load directly into the worker from an external URL, but many GeoJSON use cases require generating / filtering / sorting GeoJSON on the main thread and then passing it to a GeoJSON layer. We could try using blob URLs to see if that helps.

On the main thread:

var blob = new Blob([JSON.stringify(geoJSON)], { type: 'application/json' });
var url = (window.URL || window.webkitURL).createObjectURL(blob);

On the worker thread, do a normal Ajax request for that URL and then call revokeObjectURL.
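A minimal sketch of that worker side, assuming the blob URL arrives via postMessage (the names here are illustrative, not the actual mapbox-gl-js worker code):

// worker.js (sketch)
self.onmessage = function (e) {
    var url = e.data.geojsonURL;                // blob URL created on the main thread
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, true);
    xhr.onload = function () {
        var geoJSON = JSON.parse(xhr.responseText);
        (self.URL || self.webkitURL).revokeObjectURL(url); // free the blob once it's read
        // ...hand geoJSON off to tile processing here
    };
    xhr.send();
};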

My guess is this won't really help because you still have to stringify the GeoJSON, but it's worth a shot.

@buchanae
Contributor

Although the transferred object would be removed from the main thread? Not sure how you'd deal with that unless you want to deep clone the data before the transfer. Maybe that's just as slow as stringify.

I guess postMessage(object) is doing the fastest deep clone you could get :/

Maybe you could transfer into the worker and transfer back? heh
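For reference, this is roughly what an ownership transfer looks like with a raw ArrayBuffer (a sketch; the worker here is hypothetical):

var buffer = new ArrayBuffer(1024 * 1024);
worker.postMessage({ buf: buffer }, [buffer]); // ownership moves to the worker
console.log(buffer.byteLength);                // 0: the main-thread copy is now neutered

// inside the worker, the same buffer can be transferred back
self.onmessage = function (e) {
    var buf = e.data.buf;
    self.postMessage({ buf: buf }, [buf]);     // the main thread regains ownership
};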

@jfirebaugh
Contributor Author

That's another thing to try, although again, it would require a stringify + parse, plus additional overhead in copying bytes from a string to a typed array and back.

@buchanae
Contributor

Yeah. I was later thinking that maybe one way to ensure good performance would be to get your ad hoc code into the worker somehow.

@mcwhittemore
Contributor

Per a chat with @jfirebaugh, we should benchmark a few different ways of moving data to worker processes before we try to improve this. He recommended looking into:

  • Compare object
  • self-stringify
  • blob trick
  • Transferable object

I'd like to dive into this, but it will be a few weeks before I have the time to really dig into this problem.

@lucaswoj
Contributor

In #2001, @dcervelli provided a PR that accepts pre-stringified GeoJSON, citing a 6x perf boost.

We decided to take a step back and think about the best long term solution to the problem before including anything in the next release.

@mourner suggested two additional transfer techniques we should investigate:

  • stringified JSON as a transferable byte array
  • geobuf as a byte array

@mourner
Member

mourner commented Jan 28, 2016

@dcervelli @jfirebaugh @lucaswoj Just did a quick test of the stringified JSON approach. Performance on an 11 MB GeoJSON in Chrome:

stringify json: 244.846ms
convert to Uint16Array: 70.497ms

convert back to string: 558.237ms
parse json: 231.529ms

The first two would happen on the main thread, and the result would then be sent to the worker as a transferable. Converting back to a string on the worker side is expensive, but at least it doesn't block the main thread. Geobuf would probably cope much better in this regard, though. Also not sure how much less this blocks compared to a structured-clone transfer of the JSON; that needs a benchmark. Test code:

// 'json' is an already-parsed GeoJSON object (the 11 MB test file)
console.time('stringify');
var str = JSON.stringify(json);
console.timeEnd('stringify');

// pack each UTF-16 code unit of the string into a typed array (main thread)
console.time('convert to Uint16Array');
var bytes = new Uint16Array(str.length);
for (var i = 0; i < str.length; i++) bytes[i] = str.charCodeAt(i);
console.timeEnd('convert to Uint16Array');

// rebuild the string from the code units (worker side)
console.time('convert back to string');
var str2 = '';
for (var i = 0; i < bytes.length; i++) str2 += String.fromCharCode(bytes[i]);
console.timeEnd('convert back to string');

console.time('parse json');
var json2 = JSON.parse(str2);
console.timeEnd('parse json');
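The send step these timings leave out would be roughly (a sketch; the worker instance itself is not part of the snippet above):

// transfer ownership of the underlying buffer instead of copying it
worker.postMessage(bytes, [bytes.buffer]);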

@lucaswoj lucaswoj changed the title check if the Blob + createObjectURL trick lowers the cost of GeoJSON ⇢ worker transfer Optimize the GeoJSON ⇢ worker transfer Feb 3, 2016
@lucaswoj
Contributor

lucaswoj commented Feb 3, 2016

I'd like to add the "skipWorker" approach to the list of things we should test.

@mourner
Member

mourner commented Feb 4, 2016

Spent some time experimenting today. Results of my benchmarks for sending an 11 MB JSON sample to a worker in Chrome:

# structured cloning: 
send: 3351ms

# transferable string bytes hack
pack: 235ms
send: 401ms
unpack: 612ms
roundtrip: 1249ms

# stringify/parse
pack: 162ms
send: 412ms
unpack: 183ms
roundtrip: 757ms

# transferable geobuf
pack: 89ms
send: 366ms
unpack: 132ms
roundtrip: 587ms

Geobuf is the best option overall, with stringify/parse (without transferables) coming close. The big surprise here is that sending a huge string takes about the same time as doing an ownership transfer of an array buffer. Also, I really don't understand why structured cloning of JSON is so incredibly slow in Chrome. Now curious to check this on Firefox and Safari.
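For reference, the transferable geobuf variant looks roughly like this (a sketch built on the geobuf and pbf modules; worker and json are assumed to exist):

var Pbf = require('pbf');
var geobuf = require('geobuf');

// main thread: pack the GeoJSON and hand the underlying buffer to the worker
var data = geobuf.encode(json, new Pbf()); // returns a Uint8Array
worker.postMessage(data, [data.buffer]);   // ownership transfer, no copy

// worker: unpack
self.onmessage = function (e) {
    var json = geobuf.decode(new Pbf(e.data));
};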

@mourner
Member

mourner commented Feb 4, 2016

Firefox is even faster sending strings, and has proper transferable performance:

# stringify/parse
pack: 190ms
send: 150ms
unpack: 171ms

# geobuf
pack: 147ms
send: 24ms
unpack: 182ms

Now I wonder if there's a bug with transferable objects in the current Chrome.

@mourner
Member

mourner commented Feb 4, 2016

OK, figured out the Chrome issue: I was instantiating a worker and then immediately sending a message to it, but it wasn't ready to respond yet. If I run the benchmarks on a 1 s timeout after creating the worker, all browsers properly do the buffer transfer in ~0 ms.
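A sketch of the fix, with placeholder names for the worker script and the payload:

var worker = new Worker('worker.js');
setTimeout(function () {
    // by now the worker has booted, so the buffer transfer itself is near-instant
    worker.postMessage(data, [data.buffer]);
}, 1000);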

That brings us to the following results in Chrome:

# structured cloning: 
send: 2385ms

# stringify/parse
pack: 170ms
send: 14ms
unpack: 173ms
roundtrip: 357ms

# transferable geobuf
pack: 89ms
send: 0ms
unpack: 132ms
roundtrip: 221ms

@mourner
Member

mourner commented Feb 4, 2016

For the record: I also tried MessagePack and my own SAX-style, Protobuf-based JSON serialization. Both are worse than stringify/parse and geobuf:

# message pack
pack: 257ms
unpack: 200ms

# jsonbuf
pack: 341ms
unpack: 360ms

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2016

Round-trip performance on very small GeoJSON sources is just as important as performance on very large ones. We should try to benchmark both.

@mourner
Member

mourner commented Feb 4, 2016

@lucaswoj sure, although performance depends linearly on size in this case, so I'm certain the results will generally be similar.

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2016

The bottlenecks in processing an 11 MB GeoJSON file are very different from those for a GeoJSON file with a dozen vertices. Very fast processing of small GeoJSON files is important for gl-draw and other interactivity / animation use cases.

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2016

That is to say, for small inputs the linear-time costs might be dominated by constant-time overhead.

@mourner
Member

mourner commented Feb 4, 2016

@lucaswoj what would be a good test case for this? Sending a very simple GeoJSON (a few polys, lines, and points drawn in geojson.io) there and back 1000 times?

@lucaswoj
Contributor

lucaswoj commented Feb 4, 2016

That sounds like a good test case to me 👍

@mourner
Member

mourner commented Feb 5, 2016

@lucaswoj @jfirebaugh I made a benchmark for a relatively small (80 kB) GeoJSON. It runs as follows:

  1. The main thread encodes the GeoJSON and sends it to the worker.
  2. The worker decodes it and sends "ok" back to the main thread.
  3. The main thread receives the "ok" and runs step 1 again.
  4. Repeat 1000 times total.
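A minimal sketch of that loop for the geobuf case (worker setup and the geobuf/Pbf imports are assumed; the stringify/parse and structured-clone variants just swap the encode step):

var remaining = 1000;

function send() {
    var data = geobuf.encode(json, new Pbf());
    worker.postMessage(data, [data.buffer]);
}

console.time('roundtrip');
worker.onmessage = function () {  // the worker replies "ok" after decoding
    if (--remaining > 0) send();
    else console.timeEnd('roundtrip');
};
send();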

Results:

  • Structured cloning: 17,206ms
  • Stringify/parse: 3,276ms
  • Transferable Geobuf: 1,251ms

@mourner
Member

mourner commented Feb 5, 2016

Also tested a super-tiny GeoJSON, just a few simple features (5 kB). This time geobuf is pretty close to stringify/parse.

  • Structured cloning: 396ms
  • Stringify/parse: 169ms
  • Transferable Geobuf: 162ms

@lucaswoj
Contributor

#1584 might help too!

@mourner
Member

mourner commented Feb 11, 2016

@lucaswoj that one probably makes only a minor difference; I wanted to fix it mainly so that it doesn't swallow errors.

nickpeihl pushed a commit to nickpeihl/turf-async that referenced this issue Feb 19, 2016
According to mapbox/mapbox-gl-js#1504 the
transferring of GeoJSON files between main and worker threads can cause
a bottleneck. We should be able to reduce this bottleneck by converting
and passing the GeoJSON file created in the Worker as a geobuf and
decoding it on the main thread.

This commit is broken because of a possible bug with Webworkify not
pulling in correct dependencies. See
browserify/webworkify#14
mourner added a commit that referenced this issue Mar 4, 2016
This is a reapplication of #2001 without changing the API, and serves
as the middle ground for improving `setData` performance as discussed
in #1504.
@nrathi

nrathi commented May 31, 2016

Any progress on this?

@mourner
Member

mourner commented May 31, 2016

@nrathi do you have a particular problem with this? It's much faster with recent releases.

@nrathi

nrathi commented May 31, 2016

I am trying to "animate" a 6.67 MB GeoJSON and it's not great.

Every 500 ms, I create a new GeoJSON on the client side from a pre-loaded GeoJSON and JSON, which takes about 10-15 ms, and call setData. About 1 s (or more) later, the new GeoJSON is finally drawn. I even drop frames as the animation progresses. Do you have any recommendations? Thanks in advance.
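Roughly what I'm doing (illustrative sketch; buildFrame stands in for my own frame-generation code):

setInterval(function () {
    var frame = buildFrame(baseGeoJSON, extraJSON); // ~10-15 ms on the main thread
    map.getSource('animated').setData(frame);       // the redraw lags ~1 s behind
}, 500);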

@lucaswoj lucaswoj added the performance ⚡ Speed, stability, CPU usage, memory usage, or power usage label Jul 28, 2016
@lucaswoj
Contributor

Our GeoJSON transfer performance is much faster in recent releases. Let's ticket out future perf improvement ideas separately.
