[core] Blocking the Map Thread when destroying TileData objects #3724

tmpsantos · 2016-01-28T16:44:35Z

We create and destroy TileData objects constantly, specially when zooming in/out fast, and we might block and destroying them. This issue is noticeable on iOS (tested on a iPad Mini 2 Retina) and can be reproduced two ways:

Pinch the map fast, zooming in and out and you will see the map hanging sometimes.
Move the map to a place somewhat far from the current location and click on the top right button to trigger a "fly to" animation, that should hang a few times (can be made worse with the map on full tilt).

While the first scenario is hardly a real life use case, the second is. The root cause is that we cannot cancel worker tasks once they are started (see #2644).

tmpsantos · 2016-01-28T17:07:02Z

👀 @jfirebaugh @kkaefer

tmpsantos · 2016-01-28T17:10:53Z

Also easy to see using the "Start World Tour" option on the iOS app.

jfirebaugh · 2016-01-28T22:09:34Z

This adds additional complexity to an area where reasoning about threading behavior and object lifetimes is already difficult. I would much prefer we continue to investigate and eliminate the direct causes of blocking, rather than delaying TileData destructors.

tmpsantos · 2016-01-29T12:59:11Z

This adds additional complexity to an area where reasoning about threading behavior and object lifetimes is already difficult. I would much prefer we continue to investigate and eliminate the direct causes of blocking, rather than delaying TileData destructors.

The alternative I see would be change how we use workers. We would always transfer the ownership of what is pass on the parameters to the worker, so cancellation would still need a mutex for flipping a flag, but executing the task on the worker would not acquire that mutex. The worker would only acquire the mutex for posting back the callback to the Map Thread or not if it was cancelled.

Canceling an ongoing task would be a matter of let the task go on and self destruct at the end. Only destroying the worker itself would block.

Good thing is these objects momentarily owned by the worker would be destroyed on the worker thread in this case, offloading the destruction job from the Map Thread, but I don't think this will make a difference. The advantage I see is cancelling a task can be considered effectively non-blocking (not truly because the cancellation flag mutex).

Bad thing is we need to enforce a safe design pattern that I don't think it can be done programmatically like it is today. We would only pass as arguments to the worker std::unique_ptr or std::shared_ptr. The later only to immutable objects.

1ec5 · 2016-01-29T23:43:14Z

Pulling off the 3.1.0 milestone given #3731.

kkaefer · 2016-02-01T14:23:29Z

Thinking some more about this, I think we should actually implement what @tmpsantos is suggesting in #3724 (comment). Having a custom deleter will add unnecessary complexity and essentially add a custom deferred shared pointer destruction that could be very complicated to debug.

This is what our workers currently use:

layers from the stylesheet (these are already copied ✓)
actual tile data (they are moved ✓)
placement configuration (already copied ✓)
tile ID, source ID, map mode (already copied ✓)
SpriteStore, GlyphAtlas, GlyphStore, these are currently references and shared among all workers and the main thread. In JS, these live on the main thread, and web workers get a copy when initializing and communicate with the main thread to mutate them. We should do this in native as well, and make it easier to communicate across thread boundaries.

jfirebaugh · 2016-02-02T02:48:53Z

Agreed @kkaefer; that's been our long term goal since #667.

Add a `::tryCancel()` method to the WorkTask so objects that own a WorkTask can try to cancel and if it will block, postpone the cancellation. On debug mode, the RunLoop will print a warning when we are blocking trying to cancel tasks on the main thread which we should never do.

We all the time are creating and destroying TileData objects and when we try to destroy, we first cancel any ongoing WorkRequest. When canceling a WorkRequest, there are three scenarios: - We cancel before the RunLoop queue of the Worker Thread executes the task. This should not block the Map Thread. - We cancel after the work was already executed (but we didn't pick the callback yet). This should not block either, the bad thing is the work is wasted, but we can live with that. - We cancel but there is a work ongoing. This is the worse case scenario, because we block until the work is completed by the Worker Thread. This is a design choice because the Worker might share objects with the producer thread. It gets worse if for the particular device, parsing is slow. This new method `::tryCancel` will return false if canceling is not possible without blocking, so we can defer.

Emit notifications when the work is completed. This will be used by the next patch.

We cannot simply delete the TileData at will because it might block. The custom deleter makes sure that it will only delete the object when it is non-blocking.

jfirebaugh · 2016-04-15T19:20:11Z

Closing; I think we should investigate alternative solutions here per above comments.

jfirebaugh assigned tmpsantos Jan 28, 2016

jfirebaugh added the in progress label Jan 28, 2016

tmpsantos added the bug label Jan 28, 2016

1ec5 added this to the ios-v3.1.0 milestone Jan 28, 2016

1ec5 added the performance Speed, stability, CPU usage, memory usage, or power usage label Jan 28, 2016

tmpsantos mentioned this pull request Jan 28, 2016

Hang on each tile load — VectorTileData, TileWorker::parseLayer #3636

Closed

1ec5 removed this from the ios-v3.1.0 milestone Jan 29, 2016

jfirebaugh removed the in progress label Mar 24, 2016

tmpsantos and others added 5 commits April 14, 2016 18:14

[core] Make TileData observable

97b14d4

Emit notifications when the work is completed. This will be used by the next patch.

[core] Add a custom deleter for the TileData

12471d8

We cannot simply delete the TileData at will because it might block. The custom deleter makes sure that it will only delete the object when it is non-blocking.

Fix build

e3a4a03

jfirebaugh force-pushed the 3724-blocking_when_destroying_tiledata branch from a74e11b to e3a4a03 Compare April 15, 2016 01:34

jfirebaugh mentioned this pull request Apr 15, 2016

zooming is janky #4712

Merged

jfirebaugh closed this Apr 15, 2016

jfirebaugh deleted the 3724-blocking_when_destroying_tiledata branch April 15, 2016 19:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core] Blocking the Map Thread when destroying TileData objects #3724

[core] Blocking the Map Thread when destroying TileData objects #3724

tmpsantos commented Jan 28, 2016

tmpsantos commented Jan 28, 2016

tmpsantos commented Jan 28, 2016

jfirebaugh commented Jan 28, 2016

tmpsantos commented Jan 29, 2016

1ec5 commented Jan 29, 2016

kkaefer commented Feb 1, 2016

jfirebaugh commented Feb 2, 2016

jfirebaugh commented Apr 15, 2016

[core] Blocking the Map Thread when destroying TileData objects #3724

[core] Blocking the Map Thread when destroying TileData objects #3724

Conversation

tmpsantos commented Jan 28, 2016

tmpsantos commented Jan 28, 2016

tmpsantos commented Jan 28, 2016

jfirebaugh commented Jan 28, 2016

tmpsantos commented Jan 29, 2016

1ec5 commented Jan 29, 2016

kkaefer commented Feb 1, 2016

jfirebaugh commented Feb 2, 2016

jfirebaugh commented Apr 15, 2016