docs: libp2p resource management (#9468)
Co-authored-by: Antonio Navarro Perez <antnavper@gmail.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
3 people authored Dec 8, 2022
1 parent 5e5d15a commit 01e0bfa
Showing 5 changed files with 181 additions and 69 deletions.
5 changes: 4 additions & 1 deletion core/commands/swarm.go
@@ -337,12 +337,15 @@ The scope can be one of the following:
- all -- reports the resource usage for all currently active scopes.
The output of this command is JSON.
+ To see all resources that are close to hitting their respective limit, one can do something like:
+ ipfs swarm stats --min-used-limit-perc=90 all
`},
Arguments: []cmds.Argument{
cmds.StringArg("scope", true, false, "scope of the stat report"),
},
Options: []cmds.Option{
- cmds.IntOption(swarmUsedResourcesPercentageName, "Display only resources that are using above the specified percentage"),
+ cmds.IntOption(swarmUsedResourcesPercentageName, "Only display resources that are using above the specified percentage of their respective limit"),
},
Run: func(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) error {
node, err := cmdenv.GetNode(env)
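The `--min-used-limit-perc` option narrows the report to resources whose usage is close to their limit. The following minimal Go sketch shows that filtering. It assumes a hypothetical flattened `scopeStat` record (the real command works on libp2p's JSON stat output, which is not shown here). It also assumes that "above" means at or above the threshold.

```go
package main

import (
	"fmt"
	"sort"
)

// scopeStat is a hypothetical, flattened view of one resource in one scope.
// It is not Kubo's or go-libp2p's type.
type scopeStat struct {
	Scope    string
	Resource string
	Used     int64
	Limit    int64
}

// usedPercent returns used as an integer percentage of limit.
// A non-positive limit is treated as "unlimited" and reports 0%.
func usedPercent(used, limit int64) int64 {
	if limit <= 0 {
		return 0
	}
	return used * 100 / limit
}

// filterAbove keeps entries at or above minPerc percent of their limit,
// sorted by scope then resource so the output is stable.
func filterAbove(stats []scopeStat, minPerc int64) []scopeStat {
	var out []scopeStat
	for _, s := range stats {
		if usedPercent(s.Used, s.Limit) >= minPerc {
			out = append(out, s)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Scope != out[j].Scope {
			return out[i].Scope < out[j].Scope
		}
		return out[i].Resource < out[j].Resource
	})
	return out
}

func main() {
	stats := []scopeStat{
		{"system", "ConnsInbound", 460, 500},
		{"system", "Memory", 100 << 20, 1 << 30},
		{"transient", "StreamsInbound", 95, 100},
	}
	// Roughly what --min-used-limit-perc=90 asks for.
	for _, s := range filterAbove(stats, 90) {
		fmt.Printf("%s %s: %d/%d (%d%%)\n", s.Scope, s.Resource, s.Used, s.Limit, usedPercent(s.Used, s.Limit))
	}
}
```

With a threshold of 90, the memory entry (about 9% used) is dropped. The two entries at 92% and 95% are kept.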
4 changes: 2 additions & 2 deletions core/node/libp2p/rcmgr_logging.go
@@ -50,11 +50,11 @@ func (n *loggingResourceManager) start(ctx context.Context) {
n.limitExceededErrs = make(map[string]int)

for e, count := range errs {
- n.logger.Warnf("Protected from exceeding resource limits %d times: %q.", count, e)
+ n.logger.Warnf("Protected from exceeding resource limits %d times. libp2p message: %q.", count, e)
}

if len(errs) != 0 {
- n.logger.Warnf("Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr")
+ n.logger.Warnf("Learn more about potential actions to take at: https://github.com/ipfs/kubo/blob/master/docs/libp2p-resource-management.md")
}

n.mut.Unlock()
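The logging change keeps the existing aggregation and only rewords the warning. Limit errors are still counted per distinct message and reported in batches. Below is a minimal Go sketch of that pattern. A hypothetical `limitErrCounter` type stands in for `loggingResourceManager`; it is not the real implementation. The sketch reproduces the exact message that the updated test expects.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// limitErrCounter is a hypothetical stand-in for the aggregation in
// rcmgr_logging.go: identical resource-limit errors are counted and
// reported together instead of being logged one by one.
type limitErrCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newLimitErrCounter() *limitErrCounter {
	return &limitErrCounter{counts: make(map[string]int)}
}

// record counts one occurrence of a libp2p error message.
func (c *limitErrCounter) record(msg string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[msg]++
}

// flush swaps in a fresh map (holding the lock only briefly) and returns one
// warning per distinct message, in the wording introduced by this commit.
func (c *limitErrCounter) flush() []string {
	c.mu.Lock()
	errs := c.counts
	c.counts = make(map[string]int)
	c.mu.Unlock()

	lines := make([]string, 0, len(errs))
	for e, count := range errs {
		lines = append(lines, fmt.Sprintf("Protected from exceeding resource limits %d times. libp2p message: %q.", count, e))
	}
	sort.Strings(lines) // map iteration order is random; sort for deterministic output
	return lines
}

func main() {
	c := newLimitErrCounter()
	for i := 0; i < 2; i++ {
		c.record("system: cannot reserve inbound connection: resource limit exceeded")
	}
	for _, l := range c.flush() {
		fmt.Println(l)
	}
}
```

The `%q` verb supplies the surrounding double quotes that appear in the expected test string.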
2 changes: 1 addition & 1 deletion core/node/libp2p/rcmgr_logging_test.go
@@ -55,7 +55,7 @@ func TestLoggingResourceManager(t *testing.T) {
if oLogs.Len() == 0 {
continue
}
- require.Equal(t, "Protected from exceeding resource limits 2 times: \"system: cannot reserve inbound connection: resource limit exceeded\".", oLogs.All()[0].Message)
+ require.Equal(t, "Protected from exceeding resource limits 2 times. libp2p message: \"system: cannot reserve inbound connection: resource limit exceeded\".", oLogs.All()[0].Message)
return
}
}
72 changes: 7 additions & 65 deletions docs/config.md
@@ -141,10 +141,6 @@ config file at runtime.
- [`Swarm.ConnMgr.HighWater`](#swarmconnmgrhighwater)
- [`Swarm.ConnMgr.GracePeriod`](#swarmconnmgrgraceperiod)
- [`Swarm.ResourceMgr`](#swarmresourcemgr)
- - [Levels of Configuration](#levels-of-configuration)
- - [Default Limits](#default-limits)
- - [Active Limits](#active-limits)
- - [libp2p resource monitoring](#libp2p-resource-monitoring)
- [`Swarm.ResourceMgr.Enabled`](#swarmresourcemgrenabled)
- [`Swarm.ResourceMgr.MaxMemory`](#swarmresourcemgrmaxmemory)
- [`Swarm.ResourceMgr.MaxFileDescriptors`](#swarmresourcemgrmaxfiledescriptors)
@@ -1803,66 +1799,12 @@ Type: `optionalDuration`

### `Swarm.ResourceMgr`

- The [libp2p Network Resource Manager](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#readme) allows setting limits per [Resource Scope](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#resource-scopes),
- and tracking recource usage over time.
-
- ##### Levels of Configuration
-
- libp2p's resource manager provides tremendous flexibility but also adds a lot of complexity.
- There are these levels of limit configuration for resource management protection:
- 1. "The user who does nothing" - In this case they get some sane defaults discussed below
- based on the amount of memory and file descriptors their system has.
- This should protect the node from many attacks.
- 2. "Slightly more advanced user" - They can tweak the default limits discussed below.
- Where the defaults aren't good enough, a good set of higher-level "knobs" are exposed to satisfy most use cases
- without requiring users to wade into all the intricacies of libp2p's resource manager.
- The "knobs"/inputs are `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` as described below.
- 3. "Power user" - They specify all the default limits from below they want override via `Swarm.ResourceMgr.Limits`;
-
- ##### Default Limits
-
- With these inputs defined, [resource manager limits](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#limits) are created at the
- [system](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-system-scope),
- [transient](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-transient-scope),
- and [peer](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#peer-scopes) scopes.
- Other scopes are ignored (by being set to "~infinity".
-
- The reason these scopes are chosen is because:
- - system - This gives us the coarse-grained control we want so we can reason about the system as a whole.
- It is the backstop, and allows us to reason about resource consumption more easily
- since don't have think about the interaction of many other scopes.
- - transient - Limiting connections that are in process of being established provides backpressure so not too much work queues up.
- - peer - The peer scope doesn't protect us against intentional DoS attacks.
- It's just as easy for an attacker to send 100 requests/second with 1 peerId vs. 10 requests/second with 10 peers.
- We are reliant on the system scope for protection here in the malicious case.
- The reason for having a peer scope is to protect against unintentional DoS attacks
- (e.g., bug in a peer which is causing it to "misbehave").
- In the unintional case, we want to make sure a "misbehaving" node doesn't consume more resources than necessary.
-
- Within these scopes, limits are just set on
- [memory](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#memory),
- [file descriptors (FD)](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#file-descriptors), [*inbound* connections](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#connections),
- and [*inbound* streams](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#streams).
- Limits are set based on the inputs above.
- We trust this node to behave properly and thus don't limit *outbound* connection/stream limits.
- We apply any limits that libp2p has for its protocols/services
- since we assume libp2p knows best here.
-
- ##### Active Limits
- A dump of what limits were computed and are actually being used by the resource manager
- can be obtained by `ipfs swarm limit all`.
-
- ##### libp2p resource monitoring
- For [monitoring libp2p resource usage](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#monitoring),
- various `*rcmgr_*` metrics can be accessed as the prometheus endpoint at `{Addresses.API}/debug/metrics/prometheus` (default: `http://127.0.0.1:5001/debug/metrics/prometheus`).
- There are also [pre-built Grafana dashboards](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager/obs/grafana-dashboards) that can be added to a Grafana instance.
-
- A textual view of current resource usage and a list of services, protocols, and peers can be
- obtained via `ipfs swarm stats --help`
+ Learn more about Kubo's usage of libp2p Network Resource Manager
+ in the [dedicated resource management docs](./libp2p-resource-management.md).

#### `Swarm.ResourceMgr.Enabled`

- Enables the libp2p Resource Manager using limits based on the defaults and/or other configuration as discussed above.
+ Enables the libp2p Resource Manager using limits based on the defaults and/or other configuration as discussed in [libp2p resource management](./libp2p-resource-management.md).

Default: `true`
Type: `flag`
@@ -1872,7 +1814,7 @@ Type: `flag`
This is the max amount of memory to allow libp2p to use.
libp2p's resource manager will prevent additional resource creation while this limit is reached.
This value is also used to scale the limit on various resources at various scopes
- when the default limits (discuseed above) are used.
+ when the default limits (discussed in [libp2p resource management](./libp2p-resource-management.md)) are used.
For example, increasing this value will increase the default limit for incoming connections.

Default: `[TOTAL_SYSTEM_MEMORY]/4`
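The documented default for `Swarm.ResourceMgr.MaxMemory` is a quarter of total system memory. The Go sketch below works this through in numbers. It assumes the value is handled in bytes and that 0 means "not configured"; Kubo's actual config type and parsing are not part of this diff.

```go
package main

import "fmt"

// defaultMaxMemory applies the documented default for
// Swarm.ResourceMgr.MaxMemory: one quarter of total system memory.
func defaultMaxMemory(totalSystemMemory uint64) uint64 {
	return totalSystemMemory / 4
}

// effectiveMaxMemory returns the configured value if set, else the default.
// Using 0 to mean "not configured" is an assumption of this sketch.
func effectiveMaxMemory(configured, totalSystemMemory uint64) uint64 {
	if configured != 0 {
		return configured
	}
	return defaultMaxMemory(totalSystemMemory)
}

func main() {
	const gib = 1 << 30
	fmt.Println(effectiveMaxMemory(0, 16*gib) / gib)     // 4
	fmt.Println(effectiveMaxMemory(2*gib, 16*gib) / gib) // 2
}
```

On a 16 GiB machine the default is therefore 4 GiB. The scaling of other default limits from this value is not reproduced here.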
@@ -1898,7 +1840,7 @@ The map supports fields from the [`LimitConfig` struct](https://github.com/libp2

The `Swarm.ResourceMgr.Limits` override the default limits described above.
Any override `BaseLimits` or limit <key,value>s from `Swarm.ResourceMgr.Limits`
- that aren't specified will use the default limits.
+ that aren't specified will use the [computed default limits](./libp2p-resource-management.md#computed-default-limits).

Example #1: setting limits for a specific scope
```json
@@ -1937,10 +1879,10 @@ Example #2: setting a specific <key,value> limit
}
```

- It is also possible to adjust some runtime limits via `ipfs swarm limit --help`.
+ It is also possible to inspect and adjust some runtime limits via `ipfs swarm stats --help` and `ipfs swarm limit --help`.
Changes made via `ipfs swarm limit` are persisted in `Swarm.ResourceMgr.Limits`.

- Default: `{}` (use the safe implicit defaults described above)
+ Default: `{}` (use the [computed defaults](./libp2p-resource-management.md#computed-default-limits))

Type: `object[string->object]`
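The override rule for `Swarm.ResourceMgr.Limits` amounts to a field-by-field merge: anything not specified falls back to the computed default. The minimal Go sketch below shows that merge. It assumes a hypothetical `baseLimit` struct loosely modeled on go-libp2p's per-scope limit fields, and it assumes a zero value means "not specified". The default numbers are made up for illustration.

```go
package main

import "fmt"

// baseLimit is a hypothetical subset of one scope's limits, loosely modeled on
// go-libp2p's per-scope limit fields. It is not the library's type.
// A zero field means "not specified" in this sketch.
type baseLimit struct {
	ConnsInbound   int
	StreamsInbound int
	FD             int
	Memory         int64
}

// applyOverrides starts from the computed defaults and replaces only the
// fields the user actually set (non-zero).
func applyOverrides(def, override baseLimit) baseLimit {
	out := def
	if override.ConnsInbound != 0 {
		out.ConnsInbound = override.ConnsInbound
	}
	if override.StreamsInbound != 0 {
		out.StreamsInbound = override.StreamsInbound
	}
	if override.FD != 0 {
		out.FD = override.FD
	}
	if override.Memory != 0 {
		out.Memory = override.Memory
	}
	return out
}

// mergeLimits applies per-scope overrides (keyed by scope name, e.g. "System")
// on top of the defaults. A scope with no override keeps its defaults;
// overrides for scopes without defaults are ignored here for simplicity.
func mergeLimits(defaults, overrides map[string]baseLimit) map[string]baseLimit {
	out := make(map[string]baseLimit, len(defaults))
	for name, def := range defaults {
		out[name] = applyOverrides(def, overrides[name]) // missing key yields the zero value
	}
	return out
}

func main() {
	// Illustrative numbers only, not Kubo's computed defaults.
	defaults := map[string]baseLimit{
		"System": {ConnsInbound: 512, StreamsInbound: 4096, FD: 1024, Memory: 1 << 30},
	}
	overrides := map[string]baseLimit{
		"System": {ConnsInbound: 1024},
	}
	fmt.Printf("%+v\n", mergeLimits(defaults, overrides)["System"])
	// {ConnsInbound:1024 StreamsInbound:4096 FD:1024 Memory:1073741824}
}
```

Treating zero as "unset" keeps the merge simple. The trade-off is that this sketch cannot express an explicit limit of 0; that would need a separate "unset" marker.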
