Networked Replication #19

**implementation_details.md** (new file, +166 lines)

# Implementation Details
## Delta Compression
TBD

## Interest Management
TBD

## RPC
RPCs are best for sending global alerts and any gameplay mechanics you explicitly want modeled as request-reply (or one-way) interactions. They can be reliable or unreliable.

TBD

## Clients are not players...
I know I've been using the terms somewhat interchangeably, but `Player` and `Connection` should be separate concepts. There's no reason to force one player per connection in the engine API, and having `Player` be its own thing makes it easier to do things like replacing leaving players with bots.
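
As a rough sketch, the two could simply be separate components that link to each other. The names and fields below are illustrative, not an existing Bevy API:

```rust
use bevy::prelude::Entity;
use std::net::SocketAddr;

// Illustrative only: a transport-level endpoint that exists while a socket is open...
struct Connection {
    address: SocketAddr,
}

// ...and a gameplay-level participant that can outlive it, so a leaving player's
// entity can be handed to a bot instead of being despawned.
struct Player {
    name: String,
    connection: Option<Entity>,
}
```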

## "Clock" Synchronization
Ideally, clients predict ahead by just enough to have their inputs reach the server right before they're needed. For some reason, people frequently arrive at the idea that clients should estimate the clock time on the server (with some SNTP handshake) and use that to schedule the next simulation step.

That's overcomplicating it. What we really care about is: How much time passes between when the server receives my input and when that input is consumed? If the server simply tells clients how long their inputs are waiting in its buffer, the clients can use that information to converge on the correct lead.

```rust
if received_newer_server_update {
    // an exponential moving average is a simple smoothing filter
    smoothed_age = (31.0 / 32.0) * smoothed_age + (1.0 / 32.0) * age;

    // too late -> positive error -> speed up
    // too early -> negative error -> slow down
    error = target_age - smoothed_age;

    // reset accumulator
    accumulated_correction = 0.0;
}

time_dilation = remap(error + accumulated_correction, -max_error, max_error, -0.1, 0.1);
accumulated_correction += time_dilation * simulation_timestep;

tick_cost = (1.0 + time_dilation) * fixed_delta_time;
```
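
`remap` isn't defined in this document; the sketch below assumes it's a clamped linear map from one range onto another, so the time dilation stays within ±10%:

```rust
// Assumed behavior of `remap`: linearly map `x` from [in_min, in_max] onto
// [out_min, out_max], clamping to the output range.
fn remap(x: f64, in_min: f64, in_max: f64, out_min: f64, out_max: f64) -> f64 {
    let t = ((x - in_min) / (in_max - in_min)).clamp(0.0, 1.0);
    out_min + t * (out_max - out_min)
}
```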

If its inputs are arriving too early, a client can temporarily run fewer ticks each second to relax its lead. For example, a client simulating 10% slower shrinks its lead by one tick for every ten it runs.
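
One way to apply that pacing (a sketch with hypothetical names, assuming positive `time_dilation` means "run faster", as in the comments above): let the accumulator that drives the fixed-timestep loop fill slightly faster or slower than real time, so the client runs a few extra or a few fewer ticks per second.

```rust
// Sketch: one frame of a dilated fixed-timestep loop. `time_dilation` comes from
// the filter above; positive values run extra ticks, negative values skip some.
fn advance_simulation(
    accumulator: &mut f64,
    frame_delta_time: f64,
    fixed_delta_time: f64,
    time_dilation: f64,
    mut run_tick: impl FnMut(),
) {
    // The simulation clock runs up to 10% faster or slower than real time.
    *accumulator += (1.0 + time_dilation) * frame_delta_time;
    while *accumulator >= fixed_delta_time {
        *accumulator -= fixed_delta_time;
        run_tick();
    }
}
```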

Interpolation is the same. All that matters is the interval between received packets and how it varies. You want the interpolation delay to be as small as possible.

```rust
if received_newer_server_update {
    // an exponential moving average is a simple smoothing filter
    smoothed_delay = (31.0 / 32.0) * smoothed_delay + (1.0 / 32.0) * delay;
    smoothed_jitter = (31.0 / 32.0) * smoothed_jitter + (1.0 / 32.0) * (smoothed_delay - delay).abs();

    target_interp_delay = smoothed_delay + (2.0 * smoothed_jitter);
    smoothed_interp_delay =
        (31.0 / 32.0) * smoothed_interp_delay + (1.0 / 32.0) * (latest_snapshot_time - interp_time);

    // too early -> positive error -> slow down
    // too late -> negative error -> speed up
    error = -(target_interp_delay - smoothed_interp_delay);

    // reset accumulator
    accumulated_correction = 0.0;
}

time_dilation = remap(error + accumulated_correction, -max_error, max_error, -0.1, 0.1);
accumulated_correction += time_dilation * delta_time;

interp_time += (1.0 + time_dilation) * delta_time;
interp_time = interp_time.max(predicted_time - max_lag_comp);
```
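
Once `interp_time` is known, displaying a remote entity is just a blend between the two snapshots that bracket it. A minimal sketch with hypothetical types:

```rust
// Hypothetical snapshot entry for one entity.
struct Snapshot {
    time: f64,
    position: [f32; 3],
}

// Returns the position to display at `interp_time`, blending the two snapshots that
// bracket it. Assumes `snapshots` is sorted by time and non-empty.
fn sample(snapshots: &[Snapshot], interp_time: f64) -> [f32; 3] {
    // Find the first snapshot at or after interp_time (fall back to the newest one).
    let next = snapshots
        .iter()
        .position(|s| s.time >= interp_time)
        .unwrap_or(snapshots.len() - 1);
    let prev = next.saturating_sub(1);
    let (a, b) = (&snapshots[prev], &snapshots[next]);
    let span = (b.time - a.time).max(f64::EPSILON);
    let t = ((interp_time - a.time) / span).clamp(0.0, 1.0) as f32;
    [
        a.position[0] + t * (b.position[0] - a.position[0]),
        a.position[1] + t * (b.position[1] - a.position[1]),
        a.position[2] + t * (b.position[2] - a.position[2]),
    ]
}
```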

The key idea here is that simplifying the client-server relationship makes the problem easier. You *could* have the server apply inputs whenever they arrive, rolling back if necessary, but that would only complicate things. If the server never accepts late inputs and never changes its pace, no one needs to coordinate.

## Lag Compensation
Lag compensation mainly deals with colliders. To avoid weird outcomes, lag compensation needs to run after all motion and physics systems.

Here too, people get the idea that the server should estimate which interpolated state each client was looking at based on its RTT. Again, that kind of guesswork is unnecessary.

Clients can just tell the server what they were looking at by bundling the interpolated tick numbers and the blend value inside the input payloads.

```
<packet header>
tick number (predicted)
tick number (interpolated from)
tick number (interpolated to)
interpolation blend value
<rest of payload>
```
With this information, the server can reconstruct *exactly* what each client saw.
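
That header maps directly onto a small struct in the input payload (a sketch; the field names are illustrative):

```rust
// Sketch of the per-input header described above.
struct InputHeader {
    /// Tick this input is predicted for.
    predicted_tick: u32,
    /// The pair of ticks the client was interpolating between when it sampled this input.
    interp_from_tick: u32,
    interp_to_tick: u32,
    /// Blend factor between the two interpolated ticks, in [0, 1].
    interp_blend: f32,
}
```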

Lag compensation goes like this:
1. Queue projectile spawns, tagged with their shooter's interpolation data.
2. Restore all colliders to the earliest interpolated moment.
3. Replay forward to the current tick, spawning the projectiles at the appropriate times and registering hits.

After that's done, any surviving projectiles will exist in the correct time. The process is the same for raycast weapons.

There's a lot to learn from *Overwatch* here.

*Overwatch* [allows defensive abilities to mitigate lag-compensated shots](https://youtu.be/W3aieHjyNvw?t=2492). AFAIK this is simple to do. If a player activates any defensive bonus, just apply it to all their buffered hitboxes.

*Overwatch* also [finds the movement envelope of each entity](https://youtu.be/W3aieHjyNvw?t=2226), the "sum" of its bounding volumes over the full lag compensation window, to reduce the number of intersection tests, only rewinding characters whose movement envelopes intersect projectiles.
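
A sketch of that envelope idea, assuming simple axis-aligned bounding boxes: union each entity's bounds across the buffered window, then only rewind entities whose envelope the projectile could plausibly have hit.

```rust
// Hypothetical minimal collider type.
#[derive(Clone, Copy)]
struct Aabb {
    min: [f32; 3],
    max: [f32; 3],
}

impl Aabb {
    // The smallest box containing both inputs.
    fn union(self, other: Aabb) -> Aabb {
        let mut min = self.min;
        let mut max = self.max;
        for i in 0..3 {
            min[i] = min[i].min(other.min[i]);
            max[i] = max[i].max(other.max[i]);
        }
        Aabb { min, max }
    }
}

// The "sum" of an entity's bounds over the lag compensation window.
fn movement_envelope(buffered_bounds: &[Aabb]) -> Option<Aabb> {
    buffered_bounds.iter().copied().reduce(Aabb::union)
}
```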

For clients with very high ping, their interpolated time will lag too far behind their predicted time. You generally don't want to favor the shooter past a certain limit (e.g. 250ms), so [those clients have to extrapolate the difference](https://youtu.be/W3aieHjyNvw?t=2347). Not extrapolating is also valid, but then lagging clients would abruptly have to start leading their targets.

This limit is the only relation between the predicted time and the interpolated time. They're otherwise decoupled.

## Smooth Rendering
Whenever clients receive an update with new remote entities, those entities shouldn't be rendered until that update is interpolated.

Cameras need a little special treatment. Inputs to the view rotation need to be accumulated at the render rate and re-applied just before rendering.

Is an exponential decay enough for smooth error correction or are there better algorithms?
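
For reference, the exponential decay mentioned above usually amounts to something like the sketch below: capture the visual offset once when a correction lands, then shrink it a little every frame and add it on top of the simulated transform.

```rust
// Sketch: frame-rate-independent exponential decay of a visual error offset.
// `error` is the offset between the previously displayed and corrected positions,
// captured once when a correction arrives; `half_life` is how long it takes to halve.
fn decay_error(error: &mut f32, delta_time: f32, half_life: f32) {
    *error *= 0.5f32.powf(delta_time / half_life);
}

// Each frame: render_position = simulated_position + error (per axis).
```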

## Prediction ⟷ Interpolation
Clients can't directly modify the authoritative state, but they should be able to predict whatever they want locally. One obvious implementation is to literally fork the latest authoritative state. If copying the full state ends up being too expensive, we can probably use a copy-on-write layer.

Clients should predict the entities driven by their input, the entities they spawn (until confirmed), and any entities mutated as a result of the first two. I think that should cover it. Predicting *everything* would be a compile-time choice.

I said entities, but we can predict with component granularity. The million-dollar question is how to shift things between prediction and interpolation. My current idea is for everything to default to interpolation (reset upon receiving a server update) and then use specialized change detection `DerefMut` magic to flag local predictions.

```
Predicted<T>
PredictAdded<T>
PredictRemoved<T>
Confirmed<T>
ConfirmAdded<T>
ConfirmRemoved<T>
Cancelled<T>
CancelAdded<T>
CancelRemoved<T>
```

With these, we can explicitly opt out of funneling non-predicted components through expensive systems. We can also generate events that only trigger on authoritative changes, and events that trigger on predicted changes and are confirmed or cancelled later. The latter are necessary for handling sounds and particle effects. Those shouldn't be duplicated during rollbacks and should be faded out if mispredicted.
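
As for the `DerefMut` magic mentioned earlier, here is a minimal sketch of how mutable access could flag a component as locally predicted. The surrounding bookkeeping (when the flag is cleared, how the marker types above are produced) is hypothetical:

```rust
use std::ops::{Deref, DerefMut};

// Sketch of the "DerefMut magic": mutable access marks the component as predicted.
struct Predicted<T> {
    value: T,
    predicted: bool,
}

impl<T> Deref for Predicted<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T> DerefMut for Predicted<T> {
    fn deref_mut(&mut self) -> &mut T {
        // Any mutable access flags this component as locally predicted.
        // The flag would be cleared when an authoritative update resets the value.
        self.predicted = true;
        &mut self.value
    }
}
```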

All systems that handle "predictable" interactions (pushing a button, putting an item in your inventory) should probably run *before* the expensive stuff (physics, path-planning). Rendering should come after `NetworkFixedUpdate`.

Should UI be allowed to reference predicted state or only verified state?

## Predicting Entity Creation
This requires some special consideration.

The naive solution is to have clients spawn dummy entities. When an update that confirms the result arrives, clients can simply destroy the dummy and spawn the true entity. IMO this is a poor solution because it prevents clients from smoothly blending these entities from predicted time into interpolated time. It won't look right.

A better solution is for the server to assign each networked entity a global ID (`NetworkID`) that the spawning client can predict and map to its local instance.

- The simplest form of this would be an incrementing generational index whose upper bits are fixed to match the spawning player's ID. This is my recommendation. Basically, reuse `Entity` and reserve some of the upper bits in the ID (a sketch follows this list).

- Alternatively, PRNGs could be used to generate shared keys (called "prediction keys" in some places) for pairing global and local IDs. Rather than predict the global ID, the client would predict the shared key. Server updates that confirm the predicted entity would include both its global ID and the shared key, which the client can then use to pair the IDs. This method adds complexity but bypasses the previous method's implicit entity limit.

- A more extreme solution would be to somehow bake global IDs directly into the memory allocation. If memory layouts are mirrored, relative pointers become global IDs, which don't need to be explicitly written into packets. This would save 4-8 bytes per entity before compression.
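
A sketch of the first option, with an arbitrary 8/24 bit split and the generation bits omitted for brevity:

```rust
// Sketch: a network ID whose upper bits identify the spawning player, so clients can
// allocate IDs for predicted spawns without coordination. The bit split is arbitrary.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct NetworkId(u32);

impl NetworkId {
    const PLAYER_BITS: u32 = 8;
    const INDEX_BITS: u32 = 32 - Self::PLAYER_BITS;

    fn new(player_id: u8, index: u32) -> Self {
        debug_assert!(index < (1 << Self::INDEX_BITS));
        NetworkId(((player_id as u32) << Self::INDEX_BITS) | index)
    }

    fn player_id(self) -> u8 {
        (self.0 >> Self::INDEX_BITS) as u8
    }

    fn index(self) -> u32 {
        self.0 & ((1 << Self::INDEX_BITS) - 1)
    }
}
```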

## Unconditional Rollbacks
Every article on "rollback netcode" and "client-side prediction and server reconciliation" encourages having clients compare their predicted state to the authoritative state and reconcile *if* they mispredicted. But how do you actually detect a mispredict?

I thought of two methods while I was writing this:

1. Unordered scan looking for first difference.
2. Ordered scan to compute checksum and compare.

The first option has an unpredictable speed. The second option requires a fixed walk of the game state (checksums *are* probably worth having even if only for debugging non-determinism). There may be options I didn't consider, but the point I'm trying to make is that detecting changes among large numbers of entities isn't cheap.

Let's consider a simpler default:

3. Always rollback and re-simulate.

Now, if you're thinking that's wasteful: the "if mispredicted" only gives you a false sense of security. If I make a game and claim it can rollback 250ms, that should mean *any* 250ms, with no stuttering. If clients *always* rollback and re-sim, it'll be easier to profile and optimize for that. As a bonus, clients never need to store old predicted states.
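
Under that default, each client update is the same fixed amount of work: fork the newest authoritative state and replay the buffered local inputs up to the predicted tick. A sketch with hypothetical types:

```rust
// Sketch of unconditional rollback: always restore the newest authoritative state
// and re-simulate up to the predicted tick. Types and names are hypothetical.
fn reconcile<State: Clone, Input>(
    authoritative_state: &State,
    authoritative_tick: u32,
    predicted_tick: u32,
    buffered_inputs: impl Fn(u32) -> Input,
    mut step: impl FnMut(&mut State, &Input),
) -> State {
    // No mispredict detection: just fork the authoritative state...
    let mut state = authoritative_state.clone();
    // ...and replay the locally buffered inputs for every tick we're predicting ahead.
    for tick in (authoritative_tick + 1)..=predicted_tick {
        let input = buffered_inputs(tick);
        step(&mut state, &input);
    }
    state
}
```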

---

**@vladbat00** (Apr 27, 2021):

Good point.

I just wanted to clarify this bit:

> As a bonus, clients never need to store old predicted states.

It means that we still store the history of authoritative updates, but clients can simply avoid adding predicted states on top of those, correct? (I.e. they can store just the latest one.) And we still need to store a buffer of local players' commands, to be able to re-sim the predicted states.

**@maniwani** (Author):

Yes, that's exactly it.

---


Constant rollbacks may sound expensive, but there were games with rollback running on the original Playstation over 20 years ago.