-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System sets and parallel executor v2 #1144
Conversation
Renamed `access::Access` to `ArchetypeAccess`, made it private. No particular reason. Removed `ThreadLocalExecution`, `System::thread_local_execution()` - this information is now encoded in `::archetype_component_access()`, `::resource_access()`, and new `::is_thread_local()`. Extended `TypeAccess` with support for "reads/writes everything" to facilitate the above. Renamed `ThreadLocalSystemFn` and the module `into_thread_local` to `ExclusiveSystemFn` and `into_exclusive` respectively, to differentiate systems with this access pattern from actually thread-local systems. Implemented `IntoSystem` for `FnMut(&mut World)` and `FnMut(&mut Resources)`, both result in `ExclusiveSystemFn`. Just because. Probably in vain. Rewrote `SerialSystemStageExecutor` to account for `ThreadLocalExecution` removal. Implemented `ThreadLocal` system parameter. Implemented mk1 label and dependencies methods on `SystemSet` and `SystemStage`. Added `ShouldRun::NoAndLoop`.
Renamed `System::update()` to `System::update_access()`, for clarity. Plumbed labels and dependencies into the stage executor, via `SystemStage::labels` (a stage-global map of label to system index), `SystemSet::system_label()`, and `SystemSet::system_dependencies()`. Rewrote `SerialSystemStageExecutor` to support exclusive systems at start of stage as well; unfinished.
Began implementation; doesn't do anything yet because the pathway from data it gets to data it wants is not implemented. Changed system sets to store systems in `NonNull` rather than `Box`. Requires auditing and a sanity check.
Minor cleanup. Partially implemented run criteria handling.
Minor refactor. Fixed a future bug in not yet implemented `can_start_now()`. Accounted for run criteria for parallel systems: if a system shouldn't run in this iteration, its task is not spawned and it's never queued to run.
Added `CondensedTypeAccess` and related machinery. Moved the dependency resolution burden from executor to the stage; verification is not yet implemented. Systems with no dependencies are immediately queued to run at the start of each iteration. Implemented rebuilding scheduling data and updating access information. Implemented `can_start_now()`. Running tests violates access, right now.
# Conflicts: # crates/bevy_app/src/app.rs # crates/bevy_app/src/app_builder.rs # crates/bevy_asset/src/lib.rs # crates/bevy_core/src/time/fixed_timestep.rs # crates/bevy_ecs/Cargo.toml # crates/bevy_ecs/src/lib.rs # crates/bevy_ecs/src/schedule/mod.rs # crates/bevy_ecs/src/schedule/stage.rs # crates/bevy_ecs/src/schedule/stage_executor.rs # crates/bevy_ecs/src/schedule/state.rs # crates/bevy_ecs/src/system/into_system.rs # crates/bevy_ecs/src/system/into_thread_local.rs # crates/bevy_ecs/src/system/system.rs # crates/bevy_ecs/src/system/system_chaining.rs # crates/bevy_ecs/src/system/system_param.rs # crates/bevy_input/src/lib.rs # crates/bevy_render/src/lib.rs # crates/bevy_render/src/render_graph/system.rs # crates/bevy_scene/src/lib.rs # crates/bevy_transform/src/hierarchy/hierarchy_maintenance_system.rs # crates/bevy_transform/src/transform_propagate_system.rs # crates/bevy_ui/src/lib.rs # crates/bevy_ui/src/update.rs # crates/bevy_winit/src/lib.rs # examples/2d/texture_atlas.rs # examples/ecs/ecs_guide.rs # examples/ecs/fixed_timestep.rs # examples/ecs/state.rs
Can confirm, before e23c8a8 the score text sometimes is not visible (strange that I didn't notice that before), after it's always visible (for N=25, anyway). |
Alrighty I think it is time to merge this. I do think its worth investing some time in optimization. Its a bit of a shame that most "simple" games like "breakout" (and "simple" benchmarks) will regress in perf. Breakout frame time regresses by ~0.1 milliseconds (0.6% of our total frame time budget at 60fps). |
Simple games should probably be using |
In theory: yes, but in practice people will use the default (which should almost certainly be parallel or we're doing something wrong). |
I've added more tests to ambiguity detection. Couldn't yet fix the bug we found earlier, so CI should fail. If someone wants to take a stab at it the relevant function is |
It should be fixed now - it's passing the tests (some are pretty convoluted). It's starting to remind me of matrix operations more and more; a path towards further optimisations, perhaps. |
System sets and parallel executor v2
It's a wall of text, I know - you don't have to read all of it, its purpose is to clarify the details should questions arise.
History
Prior discussion: this comment and on.
This PR builds on top of the incomplete
SystemSet
implementation branch that didn't make it into 0.4; it was split fromschedule-v2
branch before the merge.The branch introduced system sets:
SystemStage
is now made up of one or moreSystemSet
, each has its own run criterion and contains the systems whose execution said criterion specifies.It also moved
ShouldRun
to schedule's level in the file hierarchy, and implemented run criteria and related methods as a reusable struct.The rebase that was required to make this branch mergeable was messy beyond belief, so I opted to merge in the master instead. It's still rather messy, but at least the commits are coherent.
Glossary
!Send
storage).Collateral changes
2021-01-23: this is not exhaustive, and outdated - the list does not cover commits made since original post.
Necessary (and not so much) changes that aren't directly related to the implementation:
access::Access
toArchetypeAccess
, made it private. No particular reason.ThreadLocalExecution
,System::thread_local_execution()
- this information is now encoded in::archetype_component_access()
,::resource_access()
, and new::is_thread_local()
.TypeAccess
with support for "reads/writes everything" to facilitate the above.CondensedTypeAccess
.thread_local_func
(command buffer merging) and similar fields from parallelizableSystem
implementors into theirrun_exclusive()
implementation.ThreadLocalSystemFn
and the moduleinto_thread_local
toExclusiveSystemFn
andinto_exclusive
respectively, to differentiate systems with this access pattern from actually thread-local systems.IntoSystem
forFnMut(&mut World)
andFnMut(&mut Resources)
, both result inExclusiveSystemFn
. No particular reason.ThreadLocal
system parameter.SystemSet
andSystemStage
.ShouldRun::NoAndLoop
.System::update()
toSystem::update_access()
, for clarity.NonNull
rather thanBox
. Requires auditing, and a sanity check.Algorithm
2021-01-23: this is slightly outdated now, due to changes to exclusive systems and availability of local-to-thread tasks.
Reading is not required. It's here because people were curious and picking apart someone else's code is unpleasant, annotated or not.
The idea is similar to those I used in
yaks
and this prototype. I also wrote an entire long thing about the topic.Abridged:
Full(ish):
<ParallelSystemStageExecutor as SystemStageExecutor>::execute_stage()
with&mut [SystemSet]
that contains the systems with their uncondensed access sets,&HashMap<SystemIndex, Vec<SystemIndex>>
that encodes the dependency tree,&mut World
,&mut Resources
.SystemIndex
to the respective system'susize
index in the parallel systems vector is populated.Parallelization check
System's condensed resource access is checked against active resource access, ditto for archetype-component access. "Checked against" here means "if one reads all, other can't write anything, otherwise check if bitsets of writes are disjoint with bitsets of reads and writes". That's it.
Notes
SystemIndex
is a pair ofusize
indices: system's parent set and its position in it.SystemIndex
mapping.usize
indices (corresponding to systems' position in the vector of parallel scheduling data) and bitsets.Things to do
Numbered for discussion convenience:
ThreadLocal
system parameter implementation,NonNull
use inSystemSet
.Dependency tree verification. I think the best place to do it is inmostly done, still refactoringSystemStage::run_once()
, where all the necessary information is first seen in one place.Exclusive systems dependencies and sorting. Should be done at the same time as point 3. There are stubs in the implementation that allow executing exclusive systems at either start of stage or end of stage, but no way to specify when such system would like to be ran. I think this could be handled by dependencies: exclusive systems that depend on parallel systems are automatically shoved to the end of stage, and vice versa; conflicts result in a panic during tree verification.partially irrelevant due to full exclusive/parallel schism, sorting is tolopogical now but willSerial stage executor. As of now, it's barely functional(?) and does no dependency-based sorting whatsoever; this should probably be handled after point 4.mostly done, still refactoringDecide if we want the safety bit. I've never seen that check fail, but it's about as lightweight as it can be, so I say we keep it.removed, it became completely trivial with introduction of local-to-thread tasksShouldRun::No
andShouldRun::NoAndLoop
). Current strategy is to panic, could use a more meaningful panic message.Decide if we want to sort parallel systems' command buffer merging in any way. Currently, it's not defined, but should be ordered by set then insertion order into set.it's topologically sorted now, exploiting a side-effect of validating the dependency graphDecide if we want to merge in parallel systems' command buffers before or after end-of-stage exclusives are run. Currently, it's before.we have both options now, thanks to point 12Consider "optional dependencies". Right now, if systemall dependencies are now "soft", disabling a system does not disable its dependants; considering supporting "hard" dependenciesA
depends on systemB
andB
gets disabled by a run criterion of its parent system set,A
will not run and there is no way to make it run without runningB
. This should not be the case: execution order dependencies should only specify execution order and never execution fact.Consider simplifying thedone in c4e8261TypeAccess
implementation by mergingAccessSet
into it. SeeCondensedTypeAccess
for comparison.Consider better API for system insertion into stage/set. Related: Improved system insertion syntax #1060.adopted the "system descriptor" builder pattern&'static str
.Plumb the labels/dependencies API throughout the hierarchy; best done after point 12.StateStage
, etc.This should cover vast majority of things, but I'm bound to have missed something. I will slowly start on some of the items in the todo list, however, most of them will benefit from discussion at this point, and some can be tackled by someone else and/or in a different PR.
Happy holidays!