docs: libp2p resource management #9468

Merged
merged 8 commits into from
Dec 8, 2022
5 changes: 4 additions & 1 deletion core/commands/swarm.go
@@ -337,12 +337,15 @@ The scope can be one of the following:
- all -- reports the resource usage for all currently active scopes.

The output of this command is JSON.

To see all resources that are close to hitting their respective limit, one can do something like:
ipfs swarm stats --min-used-limit-perc=90 all
`},
Arguments: []cmds.Argument{
cmds.StringArg("scope", true, false, "scope of the stat report"),
},
Options: []cmds.Option{
cmds.IntOption(swarmUsedResourcesPercentageName, "Display only resources that are using above the specified percentage"),
cmds.IntOption(swarmUsedResourcesPercentageName, "Only display resources that are using above the specified percentage of their respective limit"),
},
Run: func(req *cmds.Request, res cmds.ResponseEmitter, env cmds.Environment) error {
node, err := cmdenv.GetNode(env)
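
The help text above describes filtering the stats report down to resources that are close to their limit. A minimal sketch of that idea follows. It is not Kubo's implementation: the `resourceUsage` type, the helper names, and the sample values are hypothetical, and treating the threshold as inclusive is an assumption.

```go
package main

import "fmt"

// resourceUsage is a hypothetical, simplified stand-in for one entry of a
// stats report. It is not the type Kubo or go-libp2p uses.
type resourceUsage struct {
	Name  string
	Used  int64
	Limit int64
}

// usedPercent returns usage as an integer percentage of the limit.
// A non-positive limit is treated as "no limit" and reports 0.
func usedPercent(r resourceUsage) int64 {
	if r.Limit <= 0 {
		return 0
	}
	return r.Used * 100 / r.Limit
}

// filterByMinUsedPercent keeps the entries whose usage is at or above minPerc.
// Assumption: the comparison is inclusive; the real command may differ.
func filterByMinUsedPercent(in []resourceUsage, minPerc int64) []resourceUsage {
	var out []resourceUsage
	for _, r := range in {
		if usedPercent(r) >= minPerc {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Made-up sample data, for illustration only.
	report := []resourceUsage{
		{Name: "System.ConnsInbound", Used: 95, Limit: 100},
		{Name: "System.StreamsInbound", Used: 10, Limit: 1000},
		{Name: "Transient.FD", Used: 9, Limit: 10},
	}
	for _, r := range filterByMinUsedPercent(report, 90) {
		fmt.Printf("%s: %d%% of limit\n", r.Name, usedPercent(r))
	}
}
```

Comparing against the limit rather than an absolute count keeps one threshold meaningful across resources with very different scales.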
4 changes: 2 additions & 2 deletions core/node/libp2p/rcmgr_logging.go
@@ -50,11 +50,11 @@ func (n *loggingResourceManager) start(ctx context.Context) {
n.limitExceededErrs = make(map[string]int)

for e, count := range errs {
n.logger.Warnf("Protected from exceeding resource limits %d times: %q.", count, e)
n.logger.Warnf("Protected from exceeding resource limits %d times. libp2p message: %q.", count, e)
}

if len(errs) != 0 {
n.logger.Warnf("Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr")
n.logger.Warnf("Learn more about potential actions to take at: https://github.com/ipfs/kubo/blob/master/docs/libp2p-resource-management.md")
}

n.mut.Unlock()
2 changes: 1 addition & 1 deletion core/node/libp2p/rcmgr_logging_test.go
@@ -55,7 +55,7 @@ func TestLoggingResourceManager(t *testing.T) {
if oLogs.Len() == 0 {
continue
}
require.Equal(t, "Protected from exceeding resource limits 2 times: \"system: cannot reserve inbound connection: resource limit exceeded\".", oLogs.All()[0].Message)
require.Equal(t, "Protected from exceeding resource limits 2 times. libp2p message: \"system: cannot reserve inbound connection: resource limit exceeded\".", oLogs.All()[0].Message)
return
}
}
72 changes: 7 additions & 65 deletions docs/config.md
@@ -141,10 +141,6 @@ config file at runtime.
- [`Swarm.ConnMgr.HighWater`](#swarmconnmgrhighwater)
- [`Swarm.ConnMgr.GracePeriod`](#swarmconnmgrgraceperiod)
- [`Swarm.ResourceMgr`](#swarmresourcemgr)
- [Levels of Configuration](#levels-of-configuration)
- [Default Limits](#default-limits)
- [Active Limits](#active-limits)
- [libp2p resource monitoring](#libp2p-resource-monitoring)
- [`Swarm.ResourceMgr.Enabled`](#swarmresourcemgrenabled)
- [`Swarm.ResourceMgr.MaxMemory`](#swarmresourcemgrmaxmemory)
- [`Swarm.ResourceMgr.MaxFileDescriptors`](#swarmresourcemgrmaxfiledescriptors)
@@ -1803,66 +1799,12 @@ Type: `optionalDuration`

### `Swarm.ResourceMgr`

The [libp2p Network Resource Manager](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#readme) allows setting limits per [Resource Scope](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#resource-scopes),
and tracking resource usage over time.

##### Levels of Configuration

libp2p's resource manager provides tremendous flexibility but also adds a lot of complexity.
There are these levels of limit configuration for resource management protection:
1. "The user who does nothing" - In this case they get some sane defaults discussed below
based on the amount of memory and file descriptors their system has.
This should protect the node from many attacks.
2. "Slightly more advanced user" - They can tweak the default limits discussed below.
Where the defaults aren't good enough, a good set of higher-level "knobs" are exposed to satisfy most use cases
without requiring users to wade into all the intricacies of libp2p's resource manager.
The "knobs"/inputs are `Swarm.ResourceMgr.MaxMemory` and `Swarm.ResourceMgr.MaxFileDescriptors` as described below.
3. "Power user" - They specify all the default limits from below that they want to override via `Swarm.ResourceMgr.Limits`.

##### Default Limits

With these inputs defined, [resource manager limits](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#limits) are created at the
[system](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-system-scope),
[transient](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#the-transient-scope),
and [peer](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#peer-scopes) scopes.
Other scopes are ignored (by being set to "~infinity").

The reason these scopes are chosen is because:
- system - This gives us the coarse-grained control we want so we can reason about the system as a whole.
It is the backstop, and allows us to reason about resource consumption more easily
since we don't have to think about the interaction of many other scopes.
- transient - Limiting connections that are in process of being established provides backpressure so not too much work queues up.
- peer - The peer scope doesn't protect us against intentional DoS attacks.
It's just as easy for an attacker to send 100 requests/second with 1 peerId vs. 10 requests/second with 10 peers.
We are reliant on the system scope for protection here in the malicious case.
The reason for having a peer scope is to protect against unintentional DoS attacks
(e.g., bug in a peer which is causing it to "misbehave").
In the unintentional case, we want to make sure a "misbehaving" node doesn't consume more resources than necessary.

Within these scopes, limits are just set on
[memory](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#memory),
[file descriptors (FD)](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#file-descriptors), [*inbound* connections](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#connections),
and [*inbound* streams](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#streams).
Limits are set based on the inputs above.
We trust this node to behave properly and thus don't limit *outbound* connection/stream limits.
We apply any limits that libp2p has for its protocols/services
since we assume libp2p knows best here.

##### Active Limits
A dump of what limits were computed and are actually being used by the resource manager
can be obtained by `ipfs swarm limit all`.

##### libp2p resource monitoring
For [monitoring libp2p resource usage](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#monitoring),
various `*rcmgr_*` metrics are exposed at the Prometheus endpoint `{Addresses.API}/debug/metrics/prometheus` (default: `http://127.0.0.1:5001/debug/metrics/prometheus`).
There are also [pre-built Grafana dashboards](https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager/obs/grafana-dashboards) that can be added to a Grafana instance.

A textual view of current resource usage and a list of services, protocols, and peers can be
obtained via `ipfs swarm stats --help`.
Learn more about Kubo's usage of libp2p Network Resource Manager
in the [dedicated resource management docs](./libp2p-resource-management.md).

#### `Swarm.ResourceMgr.Enabled`

Enables the libp2p Resource Manager using limits based on the defaults and/or other configuration as discussed above.
Enables the libp2p Resource Manager using limits based on the defaults and/or other configuration as discussed in [libp2p resource management](./libp2p-resource-management.md).

Default: `true`
Type: `flag`
@@ -1872,7 +1814,7 @@ Type: `flag`
This is the max amount of memory to allow libp2p to use.
libp2p's resource manager will prevent additional resource creation while this limit is reached.
This value is also used to scale the limit on various resources at various scopes
when the default limits (discuseed above) are used.
when the default limits (discussed in [libp2p resource management](./libp2p-resource-management.md)) are used.
For example, increasing this value will increase the default limit for incoming connections.

Default: `[TOTAL_SYSTEM_MEMORY]/4`
@@ -1898,7 +1840,7 @@ The map supports fields from the [`LimitConfig` struct](https://github.com/libp2

The `Swarm.ResourceMgr.Limits` override the default limits described above.
Any override `BaseLimits` or limit <key,value>s from `Swarm.ResourceMgr.Limits`
that aren't specified will use the default limits.
that aren't specified will use the [computed default limits](./libp2p-resource-management.md#computed-default-limits).

Example #1: setting limits for a specific scope
```json
@@ -1937,10 +1879,10 @@ Example #2: setting a specific <key,value> limit
}
```

It is also possible to adjust some runtime limits via `ipfs swarm limit --help`.
It is also possible to inspect and adjust some runtime limits via `ipfs swarm stats --help` and `ipfs swarm limit --help`.
Changes made via `ipfs swarm limit` are persisted in `Swarm.ResourceMgr.Limits`.

Default: `{}` (use the safe implicit defaults described above)
Default: `{}` (use the [computed defaults](./libp2p-resource-management.md#computed-default-limits))

Type: `object[string->object]`
