Async / Chunked Preloading #4925

kmeisthax · 2021-07-24T02:54:59Z

This PR implements an asynchronous version of the preload step of Ruffle's movie clip lifecycle. This allows very complicated SWFs to load, remain interactive, and not bottleneck the player as they are loading.

Background task suspension - this allows us to run the preload step in an async task and yield time back to the interactive event loop
Partial preloading (based on a tag count) on the root movie clip
Transition all movie loading paths (script-initiated, initial fetched, and initial provided) to async preload
Block movie playback on the preload step (frame granularity)
Partial preloading on movie clips defined in DefineSprite tags
Use execution time rather than tag count to determine when to suspend and resume preloading
Propagate preload step status to movie preloaders
Figure out how [S] Cascade's preloader works

Herschel · 2021-07-24T09:42:03Z

Thank you!

Future thoughts:

I'm not 100% convinced I want to use Rust async/await all the way through the stack -- my first attempt would be to wrap preload in a simple timed while loop. Will async/await code be simpler than this? Particularly things like SuspendFuture feel a little awkward -- it seems like a more-complex way to yield execution compared to wrapping preload with a dumb timed while loop that executed each frame on the main thread. Basically, I see three parts:

Network I/O and fetching. This should definitely be Rust async, as it is currently. The I/O task's job is to stream downloaded data into a pre-allocated buffer, and fire some callbacks. All downloadable data should eventually goes through this layer (SWF, JPG, MP3, ...). A buffer gets filled, someone gets notified as the downloads progresses, completes, or errors.
SWF preload parsing, the focus of this PR. This steps through new data in the buffer, updating things like frames_loaded as ShowFrame tags are encountered, and loads data from definition tags. This could be async, but this doesn't necessarily have to happen immediately in the same task as Part 1; we could ingest the data from the buffer at any point, such as the next frame loop. There is some benefit if we can offload this to a separate thread, but the crux of the work is:
Definition tag handling. These are CPU-bound tasks such as tessellating shapes, image decoding, etc. These involve third-party crates which are generally not async. We want to eventually offload this from the main thread. Using the JS event loop, a long task will still block everything unless we use workers.

Part 1 definitely benefits from using async/await. Parts 2 and 3, I'm not completely sure. Part 3 is CPU bound, so a long running tessellation will block everything on both web and the current desktop executor. We eventually want to toss Part 3 to different threads when possible (particularly on desktop). On destkop, we can switch to a thread pool executor. On web, perhaps someday we toss the tessellation to a worker, but I'm not sure this will mesh cleanly with Rust async (for one, the worker code has to be in a separate JS file). All that said, I'm not against relying on async, but just weighing the pros/cons -- I'd like to see the benefits of going one way over the other.

Another note:
Should we swap out our desktop executor for some crate? Our current executor is more or less futures::executor::LocalPool, so if it's worthwhile, it might be nice to chuck our custom executor for a community-maintained one. We also may be able to switch to futures::executor::ThreadPool or something like smol.

web/src/navigator.rs

kmeisthax · 2021-07-24T22:43:35Z

My original thought was to make the platform specific code handle scheduling the preload directly, by changing, say, player.preload to just say if it needs more preloading done, and then having all the Ruffle ports have to hit it in order to work. However I realized that this would mean changing every (Rust-side) user of Ruffle to handle that; in a way that wouldn't even trip compile errors if I, say, forgot to update the exporter.

Furthermore, we already do need an asynchronous process for preloading, because user-initiated movie loads exist and need to be able to send onLoadComplete when done. That needs a way to both background the preload and be notified when it completes... which means we need a future that suspends the task until it's appropriate to run background tasks. All the other ways movies get loaded into Ruffle are special cases of this, with only set_root_movie currently having no interaction with the async loader machinery.

I feel like I should probably rename suspend and SuspendFuture to background and BackgroundFuture at some point, though. The purpose is to allow some long-running task to live on the main thread while still being lower priority than interactive events. And the nomenclature would make a bit more sense: an async task can de-prioritize itself by awaiting context.navigator.background().

Moving to a proper executor on desktop probably would help; I wrote the existing ones (the null executor and desktop's GLUTIN-based executor) as a bare-minimum effort to get some asynchronous behavior working. One other thing I'd like to do is make some macros or something to make async code less painful to write with our context system, though I'm not sure exactly how to do that yet.

adrian17 · 2021-07-25T10:31:25Z

Furthermore, we already do need an asynchronous process for preloading, because user-initiated movie loads exist and need to be able to send onLoadComplete when done. That needs a way to both background the preload and be notified when it completes... which means we need a future that suspends the task until it's appropriate to run background tasks. All the other ways movies get loaded into Ruffle are special cases of this, with only set_root_movie currently having no interaction with the async loader machinery.

That still doesn't mean it needs the async/await machinery, right?

I'm not saying it's objectively worse - it definitely makes it simpler to manage state than if partial preload was manually interspersed with ticking. However, at least for me, it makes it harder to reason as to what's going on.

ousia · 2021-07-26T18:35:59Z

Partial preloading on movie clips defined in DefineSprite tags.

@kmeisthax, will the task above solve #1561 (comment)?

Many thanks for your help.

kmeisthax · 2021-07-27T22:13:16Z

@ousia No, as your problem appears to have to do with the way AVM1 handles display object name resolution. I've posted a question in that other issue and I may be able to fix it quickly if I'm right about the root cause. It will be a different PR.

ousia · 2021-07-28T06:18:26Z

@kmeisthax, many thanks for your reply and your help.

It would be great to have #1561 fixed.

kmeisthax · 2021-08-01T01:22:23Z

So, one thing I noticed early on is that [S] Cascade (the first movie I tried) doesn't preload properly. It does remain mostly interactive on desktop, at least. This appears to be a development oversight on the part of the original creator; all of the assets are on frame 1 before the preloader. This isn't a problem on Flash Player because it doesn't take nearly as long to scan through SWF data as we do in the preload step.

Other preloaders appear to be working. I don't think this PR is just a draft anymore...

I'm going to say that this patch is going to need extra QA before it merges, because it affects fundamental parts of the player. I'd specifically like to see which preloaders do and don't load correctly with this. There's probably other movies that have very heavy assets in frame 1, and will probably not preload correctly; but I'd like to know if there's any other problems. OTOH, any movie that skips it's preloader on this patch only would be a huge red flag that I missed some other API surface for getting loader progress.

I'd also like to know what people's feelings would be about me renaming MovieClipStatic to MovieClipShared and making it mutable. Right now it gets reallocated every preload and I don't think that's tenable anymore.

@adrian17 AVM1 has MovieClipLoader which lets you attach an event listener for onLoadCompleted, which is currently fired in the middle of the asynchronous path in loader.rs. That needs to be fired after all of the preload calls finish. So if I wanted to move preloading out of loader.rs, and into a Player.tick method, I'd need to have preload hold onto the MovieClipLoader and fire onLoadCompleted when done, instead of loader.rs. I don't think it's preload's job to do that.

(I should point out that everything I wrote about adding a new Player method breaking clients was only half-true. In fact, it turns out I wound up breaking tests because of how NullExecutor works...)

Part of the reason why it's difficult to understand what's going on in loader.rs is that our current player architecture and Rust async go together like Twizzlers and guacamole. There's a lot of annoying boilerplate involved with the async code triggering Player updates to get at 'a or 'gc data, because going the other way around would make all the futures tied to the current stack and thus not spawnable. I've actually brought this up to the Rust core team, but I'm not sure if we'll see new language features to fix this. It's entirely out of scope for this PR, but I'm considering adding macros of some kind to augment Rust async and make it easier to understand within this environment.

This technically works, but if multiple preload chunks are ran at once, tests start failing and Bad Things happen.

…onously.

Right now, synchronous preload contexts do not set a chunk limit, while asynchronous ones have a chunk size of one.

…nishes.

… other tasks without a particular event source.

…ases

…yet.

This massively improves the responsiveness of Ruffle when async preloading occurs.

…ground task suspension

…efore it can be called

…oding to stop

The preload frame counting logic starts from one and continues to the end of the file, which results in a completely preloaded movie having one more frame "loaded" than there is in the file. This fixes that.

If we don't preload every movie clip we make, then they'll indicate themselves as not having been fully loaded yet, even though they are.

…efore running frames.

An "empty clip" is any clip created by the `new` or `new_with_avm2` function, intended for dynamically-created movie clips with no association to a movie symbol.

This requires estimating a count of bytes loaded of the original compressed stream, even though we only have uncompressed byte totals at this point. I instead rescale the uncompressed bytes by the compressed count to roughly estimate what would be the `bytesLoaded` had we been actually streaming bytes in and preloading them synchronously.

…nd runtime. Actions are abstract; here we're using it to count bytes loaded (as a proxy for execution time). AVM code could potentially be adapted to count operations run instead.

…hecked after every 10000 bytes.

…rent movie's frame time.

Hexstream · 2021-08-16T16:58:07Z

Note that this is a serious usability issue for some of the heavier content, as it can otherwise take a few minutes to load.

On the web, users are known to abandon pages after just a few seconds...

kmeisthax · 2021-08-22T01:35:44Z

FWIW I tried implementing this without the background method on NavigatorBackend, as a prelude to removing async behavior on preload entirely. This didn't work out well, see chunked-preload-alt on my local fork to get a hint as to why.

So, the reason why I used async is because child movie loads need to still fire completion events in AVM1 (and AVM2 when we implement loading there). Yes, the root movie technically doesn't need that, and we can just make movie clips to remember to fire their own completion events instead of letting an async task do that. The problem with this asyncless approach is that the proposed alternative, Player.tick, does not actually get called by all of our Ruffle client code. Outside of desktop and web, all other clients just blindly call run_frame however many times they need to until they get what they want, and then exit.

For the record, that means:

Regression tests hung until I made the test harness repeatedly call LoadManager.tick until the executor ran out of tasks. In fact, I have to do that every frame, or the MovieClipLoader tests break.
Exporter needs to do the same, but I gave up before implementing this
The desktop timedemo is probably broken as well, when I made NullNavigatorBackend require an executor it got flagged

So, why isn't this a problem with chunked-preload as is? Well, simple enough: I made NullNavigatorBackend just not suspend any tasks. It's designed to work with executors being optional, so all the "async" methods it implements block and return instead of yielding. This makes async optional for clients that want everything to run deterministically.

We can't make all clients call Player.tick because the regression tests and exporter need to run a specific number of frames and have them run in lockstep with some other process, such as exporting frames. tick is intended for realtime playback, where we're trying to pace playback to a certain amount of execution time. Clients should be able to bypass this logic if they don't need it.

That being said, one thing I am going to do is have tick send feedback to LoadManager about what the current background time budget should be. As @adrian17 pointed out, if you load two child movies, they'll both consume 75% of the frametime, which is actually 150%, and you'll get stutter. For clients that don't call tick, we'll stick with the 75% default and rely on NullNavigatorBackend to just never actually suspend us.

geotsak · 2021-09-06T20:24:27Z

I tried using this PR in my private personal multifile flash project. The issue I faced was that loading stops midway, with only half the game's files loaded. This issue doesn't appear with the master build, it only appears when using this PR. I have privately sent the link to my project to kmeisthax.

CelestialAmber · 2021-12-15T05:28:07Z

Have you made any progress on fixing the issues? This PR has been open for a while now...

torokati44 · 2022-09-09T20:16:57Z

So, I guess this can be closed now?

CelestialAmber · 2022-09-09T21:40:57Z

finally

Hexstream · 2022-09-09T22:08:25Z

It seems that this and/or other improvements have cut down the Banned from Equestria 1.5 load time from 70 seconds to 33 seconds (on my machine), which is of course a large and welcome improvement, but that content is supposed to load instantly.

adrian17 · 2022-09-09T22:25:27Z

finally

This is being closed due to being superseded by #7858 ;) But hopefully that one is much closer to being complete.

It seems that this and/or other improvements have cut down the Banned from Equestria 1.5 load time from 70 seconds to 33 seconds (on my machine), which is of course a large and welcome improvement, but #5056.

This change wasn't merged yet.
The point of async preload is not to speed up loading (if anything, it might slow it down a couple %), but making the preloader frame actually show up during the loading process instead of hanging the entire page for the duration. So it'll still take it ~30s to load the entire thing, but the first frame will show much faster, after 1-2s.
EDIT: just checked, indeed takes roughly a second.

Hexstream · 2022-09-09T23:20:50Z

This change wasn't merged yet.

Right, of course, I was careless.

It seems that there has been large related improvements since Ruffle nightly 2022-06-25, then...

The point of async preload is not to speed up loading (if anything, it might slow it down a couple %), but making the preloader frame actually show up during the loading process instead of hanging the entire page for the duration. So it'll still take it ~30s to load the entire thing, but the first frame will show much faster, after 1-2s.

Right, of course what really matters here is the user-perceived loading time...

EDIT: just checked, indeed takes roughly a second.

🎉🎉🎉 GOD, then I can't wait to upgrade to this once it lands, then!!! 🎉🎉🎉

SN902 · 2022-09-09T23:48:51Z

It seems that this and/or other improvements have cut down the Banned from Equestria 1.5 load time from 70 seconds to 33 seconds (on my machine), which is of course a large and welcome improvement, but that content is supposed to load instantly.

On my i9 11900 load time 8 sec for this game

Hexstream · 2022-09-10T01:25:04Z

On my i9 11900 load time 8 sec for this game

Yes, I realize everything loads quickly on extremely powerful computers.

By the way, this used to take around 3:30 to load on my slow computer.

Herschel reviewed Jul 24, 2021

View reviewed changes

web/src/navigator.rs Show resolved Hide resolved

kmeisthax mentioned this pull request Aug 1, 2021

Extension: Sidestep CORS issues by offering to load the SWF in a tab #4665

Merged

kmeisthax marked this pull request as ready for review August 1, 2021 01:22

kmeisthax changed the title ~~[DRAFT] Async / Chunked Preloading~~ Async / Chunked Preloading Aug 1, 2021

kmeisthax force-pushed the chunked-preload branch from 8824c34 to ce41664 Compare August 9, 2021 22:11

kmeisthax added 18 commits August 9, 2021 20:47

core: Half-implement chunked decoding.

a00299e

This technically works, but if multiple preload chunks are ran at once, tests start failing and Bad Things happen.

core: When preloading other movie clips, forcibly preload them synchr…

7139df2

…onously.

core: Make the preload chunk size configurable per caller.

c5bc571

Right now, synchronous preload contexts do not set a chunk limit, while asynchronous ones have a chunk size of one.

core: Don't preload the sound stream end until preloading actually fi…

f93d141

…nishes.

core: Add a mechanism for CPU-bound async code to yield their time to…

5a61ced

… other tasks without a particular event source.

core: Run the asynchronous preload step for root movie loads in all c…

352cfe6

…ases

core: Prohibit playing movie clip frames that haven't been preloaded …

859dc7b

…yet.

desktop: Only unsuspend background tasks after redraw finishes.

db35de9

This massively improves the responsiveness of Ruffle when async preloading occurs.

chore: Reduce type complexity on the desktop's implementation of back…

ce097a8

…ground task suspension

web: Convert the callback into a JS value to avoid it being dropped b…

fdf6a8b

…efore it can be called

core: Don't attempt to await async preload *while the player is locked*

793d2ba

core: Allow callers of decode_tags to flag when they would like dec…

1872c25

…oding to stop

core: On unexpected EOF, mark the clip preload as completed

9690642

core: Make frame loaded count no longer be off by one

e3b8139

The preload frame counting logic starts from one and continues to the end of the file, which results in a completely preloaded movie having one more frame "loaded" than there is in the file. This fixes that.

avm2: Preload dynamically created movie clips

19ee7e7

If we don't preload every movie clip we make, then they'll indicate themselves as not having been fully loaded yet, even though they are.

tests: Test framework should block on all futures (e.g. preloading) b…

ecdd634

…efore running frames.

core: Ensure empty movie clips are born preloaded.

5c3214e

An "empty clip" is any clip created by the `new` or `new_with_avm2` function, intended for dynamically-created movie clips with no association to a movie symbol.

core: Allow chunked preloading within a DefineSprite tag

4a941f1

kmeisthax added 7 commits August 9, 2021 20:47

avm1: Expose preload progress to AVM1 via getBytesLoaded

b67d4a5

core: Add a concept of an ExecutionLimit that checks both actions a…

77ced35

…nd runtime. Actions are abstract; here we're using it to count bytes loaded (as a proxy for execution time). AVM code could potentially be adapted to count operations run instead.

core: Set some reasonable-sounding execution limits: 5ms per chunk, c…

e47dd7a

…hecked after every 10000 bytes.

core: Rename suspend to background where possible

4a49e0c

core: Actually decrease the current action limit correctly

c053da0

core: Dynamically scale the preload execution limit to 75% of the cur…

d10421c

…rent movie's frame time.

kmeisthax force-pushed the chunked-preload branch from ce41664 to d10421c Compare August 10, 2021 00:47

Bale001 mentioned this pull request Aug 13, 2021

Abnormally long load times #5056

Open

Toad06 mentioned this pull request Feb 4, 2022

Unable to create VAO #1905

Open

kmeisthax mentioned this pull request Sep 3, 2022

Chunked Preloading, too #7858

Merged

kmeisthax closed this in #7858 Sep 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Async / Chunked Preloading #4925

Async / Chunked Preloading #4925

kmeisthax commented Jul 24, 2021 •

edited

Loading

Herschel commented Jul 24, 2021 •

edited

Loading

kmeisthax commented Jul 24, 2021

adrian17 commented Jul 25, 2021

ousia commented Jul 26, 2021

kmeisthax commented Jul 27, 2021

ousia commented Jul 28, 2021

kmeisthax commented Aug 1, 2021

Hexstream commented Aug 16, 2021

kmeisthax commented Aug 22, 2021 •

edited

Loading

geotsak commented Sep 6, 2021

CelestialAmber commented Dec 15, 2021

torokati44 commented Sep 9, 2022

CelestialAmber commented Sep 9, 2022

Hexstream commented Sep 9, 2022

adrian17 commented Sep 9, 2022 •

edited

Loading

Hexstream commented Sep 9, 2022

SN902 commented Sep 9, 2022

Hexstream commented Sep 10, 2022

Async / Chunked Preloading #4925

Async / Chunked Preloading #4925

Conversation

kmeisthax commented Jul 24, 2021 • edited Loading

Herschel commented Jul 24, 2021 • edited Loading

kmeisthax commented Jul 24, 2021

adrian17 commented Jul 25, 2021

ousia commented Jul 26, 2021

kmeisthax commented Jul 27, 2021

ousia commented Jul 28, 2021

kmeisthax commented Aug 1, 2021

Hexstream commented Aug 16, 2021

kmeisthax commented Aug 22, 2021 • edited Loading

geotsak commented Sep 6, 2021

CelestialAmber commented Dec 15, 2021

torokati44 commented Sep 9, 2022

CelestialAmber commented Sep 9, 2022

Hexstream commented Sep 9, 2022

adrian17 commented Sep 9, 2022 • edited Loading

Hexstream commented Sep 9, 2022

SN902 commented Sep 9, 2022

Hexstream commented Sep 10, 2022

kmeisthax commented Jul 24, 2021 •

edited

Loading

Herschel commented Jul 24, 2021 •

edited

Loading

kmeisthax commented Aug 22, 2021 •

edited

Loading

adrian17 commented Sep 9, 2022 •

edited

Loading