Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async / Chunked Preloading #4925

Closed
wants to merge 25 commits into from
Closed

Conversation

kmeisthax
Copy link
Member

@kmeisthax kmeisthax commented Jul 24, 2021

image

This PR implements an asynchronous version of the preload step of Ruffle's movie clip lifecycle. This allows very complicated SWFs to load, remain interactive, and not bottleneck the player as they are loading.

  • Background task suspension - this allows us to run the preload step in an async task and yield time back to the interactive event loop
  • Partial preloading (based on a tag count) on the root movie clip
  • Transition all movie loading paths (script-initiated, initial fetched, and initial provided) to async preload
  • Block movie playback on the preload step (frame granularity)
  • Partial preloading on movie clips defined in DefineSprite tags
  • Use execution time rather than tag count to determine when to suspend and resume preloading
  • Propagate preload step status to movie preloaders
  • Figure out how [S] Cascade's preloader works

@Herschel
Copy link
Member

Herschel commented Jul 24, 2021

Thank you!

Future thoughts:

I'm not 100% convinced I want to use Rust async/await all the way through the stack -- my first attempt would be to wrap preload in a simple timed while loop. Will async/await code be simpler than this? Particularly things like SuspendFuture feel a little awkward -- it seems like a more-complex way to yield execution compared to wrapping preload with a dumb timed while loop that executed each frame on the main thread. Basically, I see three parts:

  1. Network I/O and fetching. This should definitely be Rust async, as it is currently. The I/O task's job is to stream downloaded data into a pre-allocated buffer, and fire some callbacks. All downloadable data should eventually goes through this layer (SWF, JPG, MP3, ...). A buffer gets filled, someone gets notified as the downloads progresses, completes, or errors.
  2. SWF preload parsing, the focus of this PR. This steps through new data in the buffer, updating things like frames_loaded as ShowFrame tags are encountered, and loads data from definition tags. This could be async, but this doesn't necessarily have to happen immediately in the same task as Part 1; we could ingest the data from the buffer at any point, such as the next frame loop. There is some benefit if we can offload this to a separate thread, but the crux of the work is:
  3. Definition tag handling. These are CPU-bound tasks such as tessellating shapes, image decoding, etc. These involve third-party crates which are generally not async. We want to eventually offload this from the main thread. Using the JS event loop, a long task will still block everything unless we use workers.

Part 1 definitely benefits from using async/await. Parts 2 and 3, I'm not completely sure. Part 3 is CPU bound, so a long running tessellation will block everything on both web and the current desktop executor. We eventually want to toss Part 3 to different threads when possible (particularly on desktop). On destkop, we can switch to a thread pool executor. On web, perhaps someday we toss the tessellation to a worker, but I'm not sure this will mesh cleanly with Rust async (for one, the worker code has to be in a separate JS file). All that said, I'm not against relying on async, but just weighing the pros/cons -- I'd like to see the benefits of going one way over the other.

Another note:
Should we swap out our desktop executor for some crate? Our current executor is more or less futures::executor::LocalPool, so if it's worthwhile, it might be nice to chuck our custom executor for a community-maintained one. We also may be able to switch to futures::executor::ThreadPool or something like smol.

@kmeisthax
Copy link
Member Author

My original thought was to make the platform specific code handle scheduling the preload directly, by changing, say, player.preload to just say if it needs more preloading done, and then having all the Ruffle ports have to hit it in order to work. However I realized that this would mean changing every (Rust-side) user of Ruffle to handle that; in a way that wouldn't even trip compile errors if I, say, forgot to update the exporter.

Furthermore, we already do need an asynchronous process for preloading, because user-initiated movie loads exist and need to be able to send onLoadComplete when done. That needs a way to both background the preload and be notified when it completes... which means we need a future that suspends the task until it's appropriate to run background tasks. All the other ways movies get loaded into Ruffle are special cases of this, with only set_root_movie currently having no interaction with the async loader machinery.

I feel like I should probably rename suspend and SuspendFuture to background and BackgroundFuture at some point, though. The purpose is to allow some long-running task to live on the main thread while still being lower priority than interactive events. And the nomenclature would make a bit more sense: an async task can de-prioritize itself by awaiting context.navigator.background().

Moving to a proper executor on desktop probably would help; I wrote the existing ones (the null executor and desktop's GLUTIN-based executor) as a bare-minimum effort to get some asynchronous behavior working. One other thing I'd like to do is make some macros or something to make async code less painful to write with our context system, though I'm not sure exactly how to do that yet.

@adrian17
Copy link
Collaborator

Furthermore, we already do need an asynchronous process for preloading, because user-initiated movie loads exist and need to be able to send onLoadComplete when done. That needs a way to both background the preload and be notified when it completes... which means we need a future that suspends the task until it's appropriate to run background tasks. All the other ways movies get loaded into Ruffle are special cases of this, with only set_root_movie currently having no interaction with the async loader machinery.

That still doesn't mean it needs the async/await machinery, right?

I'm not saying it's objectively worse - it definitely makes it simpler to manage state than if partial preload was manually interspersed with ticking. However, at least for me, it makes it harder to reason as to what's going on.

@ousia
Copy link
Contributor

ousia commented Jul 26, 2021

  • Partial preloading on movie clips defined in DefineSprite tags.

@kmeisthax, will the task above solve #1561 (comment)?

Many thanks for your help.

@kmeisthax
Copy link
Member Author

@ousia No, as your problem appears to have to do with the way AVM1 handles display object name resolution. I've posted a question in that other issue and I may be able to fix it quickly if I'm right about the root cause. It will be a different PR.

@ousia
Copy link
Contributor

ousia commented Jul 28, 2021

@kmeisthax, many thanks for your reply and your help.

It would be great to have #1561 fixed.

@kmeisthax
Copy link
Member Author

So, one thing I noticed early on is that [S] Cascade (the first movie I tried) doesn't preload properly. It does remain mostly interactive on desktop, at least. This appears to be a development oversight on the part of the original creator; all of the assets are on frame 1 before the preloader. This isn't a problem on Flash Player because it doesn't take nearly as long to scan through SWF data as we do in the preload step.

Other preloaders appear to be working. I don't think this PR is just a draft anymore...

I'm going to say that this patch is going to need extra QA before it merges, because it affects fundamental parts of the player. I'd specifically like to see which preloaders do and don't load correctly with this. There's probably other movies that have very heavy assets in frame 1, and will probably not preload correctly; but I'd like to know if there's any other problems. OTOH, any movie that skips it's preloader on this patch only would be a huge red flag that I missed some other API surface for getting loader progress.

I'd also like to know what people's feelings would be about me renaming MovieClipStatic to MovieClipShared and making it mutable. Right now it gets reallocated every preload and I don't think that's tenable anymore.

@adrian17 AVM1 has MovieClipLoader which lets you attach an event listener for onLoadCompleted, which is currently fired in the middle of the asynchronous path in loader.rs. That needs to be fired after all of the preload calls finish. So if I wanted to move preloading out of loader.rs, and into a Player.tick method, I'd need to have preload hold onto the MovieClipLoader and fire onLoadCompleted when done, instead of loader.rs. I don't think it's preload's job to do that.

(I should point out that everything I wrote about adding a new Player method breaking clients was only half-true. In fact, it turns out I wound up breaking tests because of how NullExecutor works...)

Part of the reason why it's difficult to understand what's going on in loader.rs is that our current player architecture and Rust async go together like Twizzlers and guacamole. There's a lot of annoying boilerplate involved with the async code triggering Player updates to get at 'a or 'gc data, because going the other way around would make all the futures tied to the current stack and thus not spawnable. I've actually brought this up to the Rust core team, but I'm not sure if we'll see new language features to fix this. It's entirely out of scope for this PR, but I'm considering adding macros of some kind to augment Rust async and make it easier to understand within this environment.

@kmeisthax kmeisthax marked this pull request as ready for review August 1, 2021 01:22
@kmeisthax kmeisthax changed the title [DRAFT] Async / Chunked Preloading Async / Chunked Preloading Aug 1, 2021
This technically works, but if multiple preload chunks are ran at once, tests start failing and Bad Things happen.
Right now, synchronous preload contexts do not set a chunk limit, while asynchronous ones have a chunk size of one.
… other tasks without a particular event source.
This massively improves the responsiveness of Ruffle when async preloading occurs.
The preload frame counting logic starts from one and continues to the end of the file, which results in a completely preloaded movie having one more frame "loaded" than there is in the file. This fixes that.
If we don't preload every movie clip we make, then they'll indicate themselves as not having been fully loaded yet, even though they are.
An "empty clip" is any clip created by the `new` or `new_with_avm2` function, intended for dynamically-created movie clips with no association to a movie symbol.
This requires estimating a count of bytes loaded of the original compressed stream, even though we only have uncompressed byte totals at this point. I instead rescale the uncompressed bytes by the compressed count to roughly estimate what would be the `bytesLoaded` had we been actually streaming bytes in and preloading them synchronously.
…nd runtime.

Actions are abstract; here we're using it to count bytes loaded (as a proxy for execution time). AVM code could potentially be adapted to count operations run instead.
@Hexstream
Copy link

Note that this is a serious usability issue for some of the heavier content, as it can otherwise take a few minutes to load.

On the web, users are known to abandon pages after just a few seconds...

@kmeisthax
Copy link
Member Author

kmeisthax commented Aug 22, 2021

FWIW I tried implementing this without the background method on NavigatorBackend, as a prelude to removing async behavior on preload entirely. This didn't work out well, see chunked-preload-alt on my local fork to get a hint as to why.

So, the reason why I used async is because child movie loads need to still fire completion events in AVM1 (and AVM2 when we implement loading there). Yes, the root movie technically doesn't need that, and we can just make movie clips to remember to fire their own completion events instead of letting an async task do that. The problem with this asyncless approach is that the proposed alternative, Player.tick, does not actually get called by all of our Ruffle client code. Outside of desktop and web, all other clients just blindly call run_frame however many times they need to until they get what they want, and then exit.

For the record, that means:

  • Regression tests hung until I made the test harness repeatedly call LoadManager.tick until the executor ran out of tasks. In fact, I have to do that every frame, or the MovieClipLoader tests break.
  • Exporter needs to do the same, but I gave up before implementing this
  • The desktop timedemo is probably broken as well, when I made NullNavigatorBackend require an executor it got flagged

So, why isn't this a problem with chunked-preload as is? Well, simple enough: I made NullNavigatorBackend just not suspend any tasks. It's designed to work with executors being optional, so all the "async" methods it implements block and return instead of yielding. This makes async optional for clients that want everything to run deterministically.

We can't make all clients call Player.tick because the regression tests and exporter need to run a specific number of frames and have them run in lockstep with some other process, such as exporting frames. tick is intended for realtime playback, where we're trying to pace playback to a certain amount of execution time. Clients should be able to bypass this logic if they don't need it.

That being said, one thing I am going to do is have tick send feedback to LoadManager about what the current background time budget should be. As @adrian17 pointed out, if you load two child movies, they'll both consume 75% of the frametime, which is actually 150%, and you'll get stutter. For clients that don't call tick, we'll stick with the 75% default and rely on NullNavigatorBackend to just never actually suspend us.

@geotsak
Copy link

geotsak commented Sep 6, 2021

I tried using this PR in my private personal multifile flash project. The issue I faced was that loading stops midway, with only half the game's files loaded. This issue doesn't appear with the master build, it only appears when using this PR. I have privately sent the link to my project to kmeisthax.

@CelestialAmber
Copy link

Have you made any progress on fixing the issues? This PR has been open for a while now...

@Toad06 Toad06 mentioned this pull request Feb 4, 2022
@torokati44
Copy link
Member

So, I guess this can be closed now?

@CelestialAmber
Copy link

finally

@Hexstream
Copy link

It seems that this and/or other improvements have cut down the Banned from Equestria 1.5 load time from 70 seconds to 33 seconds (on my machine), which is of course a large and welcome improvement, but that content is supposed to load instantly.

@adrian17
Copy link
Collaborator

adrian17 commented Sep 9, 2022

finally

This is being closed due to being superseded by #7858 ;) But hopefully that one is much closer to being complete.

It seems that this and/or other improvements have cut down the Banned from Equestria 1.5 load time from 70 seconds to 33 seconds (on my machine), which is of course a large and welcome improvement, but #5056.

This change wasn't merged yet.
The point of async preload is not to speed up loading (if anything, it might slow it down a couple %), but making the preloader frame actually show up during the loading process instead of hanging the entire page for the duration. So it'll still take it ~30s to load the entire thing, but the first frame will show much faster, after 1-2s.
EDIT: just checked, indeed takes roughly a second.

@Hexstream
Copy link

This change wasn't merged yet.

Right, of course, I was careless.

It seems that there has been large related improvements since Ruffle nightly 2022-06-25, then...

The point of async preload is not to speed up loading (if anything, it might slow it down a couple %), but making the preloader frame actually show up during the loading process instead of hanging the entire page for the duration. So it'll still take it ~30s to load the entire thing, but the first frame will show much faster, after 1-2s.

Right, of course what really matters here is the user-perceived loading time...

EDIT: just checked, indeed takes roughly a second.

🎉🎉🎉 GOD, then I can't wait to upgrade to this once it lands, then!!! 🎉🎉🎉

@SN902
Copy link

SN902 commented Sep 9, 2022

It seems that this and/or other improvements have cut down the Banned from Equestria 1.5 load time from 70 seconds to 33 seconds (on my machine), which is of course a large and welcome improvement, but that content is supposed to load instantly.

On my i9 11900 load time 8 sec for this game

@Hexstream
Copy link

On my i9 11900 load time 8 sec for this game

Yes, I realize everything loads quickly on extremely powerful computers.

By the way, this used to take around 3:30 to load on my slow computer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants