-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: opt-in Swarm.ResourceMgr (go-libp2p v0.18) #8680
Conversation
013048b
to
7bd8191
Compare
6f9c3c9
to
caba218
Compare
Do you have plans to publish metrics when a RM req is rejected due to exceeded scope limits? That would be very useful operationally to help know when to scale gateway fleets. |
The resource manager has a tracer integration (see https://github.com/libp2p/go-libp2p-resource-manager/blob/master/trace.go), so it would be possible to hook this up with Prometheus in one way or the other. |
travis asked for the same thing so i will add metrics support to the
reource manager.
…On Fri, Feb 11, 2022, 14:50 Marten Seemann ***@***.***> wrote:
The resource manager has a tracer integration (see
https://github.com/libp2p/go-libp2p-resource-manager/blob/master/trace.go),
so it would be possible to hook this up with Prometheus in one way or the
other.
—
Reply to this email directly, view it on GitHub
<#8680 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAI4SREFOVU7UPT2MVGGMTU2UA2XANCNFSM5MBCYZ7Q>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
caba218
to
c3be8e9
Compare
"exchange files" → "cat file" interop test always gets stuck after Quick way to reproduce/debug is to run specific test with: Given this PR adds resource manager, I guess some limit is hit? Update: set |
c3be8e9
to
3bb2ecf
Compare
Revealed connection being dropped by go-libp2p-resource-manager:
|
There's no connection being dropped. The resource manager is blocking a memory with prio 128, which is probably a yamux stream window increase: https://github.com/libp2p/go-yamux/blob/cd22a37b789cf6e1fe4f2c2cff6da5df7a49dfac/stream.go#L229. This won't cause any problems. Do we have logs from the other node? |
These logs I decided to remove 👉 With this setup you can tweak Node A (source)One terminal: $ export IPFS_PATH=/tmp/node-a
$ ipfs daemon --init --init-profile server,randomports,test Second terminal: $ ipfs id | jq '.Addresses[0]'
{addrA}
$ ipfs swarm connect {addrB}
$ head -c 65M /dev/urandom | ipfs add -Q
{cid-OK}
$ head -c 8M /dev/urandom | ipfs add -Q
{cid-FAIL} Node B (destination)One terminal: $ export IPFS_PATH=/tmp/node-b
$ ipfs daemon --init --init-profile server,randomports,test Second terminal: $ ipfs id | jq '.Addresses[0]'
{addrB}
$ ipfs swarm connect {addrA}
$ ipfs cat {cid-OK} > /tmp/8M # finished instantly
$ ipfs cat {cid-FAIL} > /tmp/65M # stops in the middle (around 20-50%)
30.50 MiB / 65.00 MiB [==================================>--------------------------------------] 46.92% |
@marten-seemann @lidel I'm not totally sure, but am wondering if we're running into a combination of go-bitswap not gracefully handling the lack of streams and go-bitswap using more streams than it should (ipfs/boxo#80). |
or yamux not handling properly the refusal yo increase the window from the
other side, that also a possibility.
…On Thu, Feb 24, 2022, 16:17 Adin Schmahmann ***@***.***> wrote:
@marten-seemann <https://github.com/marten-seemann> @lidel
<https://github.com/lidel> I'm not totally sure, but am wondering if
we're running into a combination of go-bitswap not gracefully handling the
lack of streams and go-bitswap using more streams than it should (
ipfs/boxo#80 <ipfs/boxo#80>).
—
Reply to this email directly, view it on GitHub
<#8680 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAI4SXHGGPWMEJAWSY2AHLU4Y4ZBANCNFSM5MBCYZ7Q>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
6ddf5bf
to
0c9ddba
Compare
fc84c9a
to
c4fb623
Compare
2022-03-03 conversation:
|
* add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org>
This includes CI fix for go-ipfs-http-client
This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases.
Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense.
Applied changes based on feedback from stewards sync:
Before this is merged, we should make a decision if we are ok with
|
* update go-libp2p to v0.18.0 * initialize the resource manager * add resource manager stats/limit commands * load limit file when building resource manager * log absent limit file * write rcmgr to file when IPFS_DEBUG_RCMGR is set * fix: mark swarm limit|stats as experimental * feat(cfg): opt-in Swarm.ResourceMgr This ensures we can safely test the resource manager without impacting default behavior. - Resource manager is disabled by default - Default for Swarm.ResourceMgr.Enabled is false for now - Swarm.ResourceMgr.Limits allows user to tweak limits per specific scope in a way that is persisted across restarts - 'ipfs swarm limit system' outputs human-readable json - 'ipfs swarm limit system new-limits.json' sets new runtime limits (but does not change Swarm.ResourceMgr.Limits in the config) Conventions to make libp2p devs life easier: - 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager - 'limit.json' overrides implicit defaults from libp2p (if present) * docs(config): small tweaks * fix: skip libp2p.ResourceManager if disabled This ensures 'ipfs swarm limit|stats' work only when enabled. * fix: use NullResourceManager when disabled This reverts commit b19f7c9eca4cee4187f8cba3389dc2c930258512. after clarification feedback from ipfs/kubo#8680 (comment) * style: rename IPFS_RCMGR to LIBP2P_RCMGR preexisting libp2p toggles use LIBP2P_ prefix * test: Swarm.ResourceMgr * fix: location of opt-in limit.json and rcmgr.json.gz Places these files inside of IPFS_PATH * Update docs/config.md * feat: expose rcmgr metrics when enabled (#8785) * add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org> * refactor: rcmgr_metrics.go * refactor: rcmgr_defaults.go This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases. * refactor: adjustedDefaultLimits Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense. * chore: cleanup after a review * fix: restore go-ipld-prime v0.14.2 * fix: restore go-ds-flatfs v0.5.1 Co-authored-by: Lucas Molas <schomatis@gmail.com> Co-authored-by: Marcin Rataj <lidel@lidel.org>
* update go-libp2p to v0.18.0 * initialize the resource manager * add resource manager stats/limit commands * load limit file when building resource manager * log absent limit file * write rcmgr to file when IPFS_DEBUG_RCMGR is set * fix: mark swarm limit|stats as experimental * feat(cfg): opt-in Swarm.ResourceMgr This ensures we can safely test the resource manager without impacting default behavior. - Resource manager is disabled by default - Default for Swarm.ResourceMgr.Enabled is false for now - Swarm.ResourceMgr.Limits allows user to tweak limits per specific scope in a way that is persisted across restarts - 'ipfs swarm limit system' outputs human-readable json - 'ipfs swarm limit system new-limits.json' sets new runtime limits (but does not change Swarm.ResourceMgr.Limits in the config) Conventions to make libp2p devs life easier: - 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager - 'limit.json' overrides implicit defaults from libp2p (if present) * docs(config): small tweaks * fix: skip libp2p.ResourceManager if disabled This ensures 'ipfs swarm limit|stats' work only when enabled. * fix: use NullResourceManager when disabled This reverts commit b19f7c9eca4cee4187f8cba3389dc2c930258512. after clarification feedback from ipfs/kubo#8680 (comment) * style: rename IPFS_RCMGR to LIBP2P_RCMGR preexisting libp2p toggles use LIBP2P_ prefix * test: Swarm.ResourceMgr * fix: location of opt-in limit.json and rcmgr.json.gz Places these files inside of IPFS_PATH * Update docs/config.md * feat: expose rcmgr metrics when enabled (#8785) * add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org> * refactor: rcmgr_metrics.go * refactor: rcmgr_defaults.go This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases. * refactor: adjustedDefaultLimits Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense. * chore: cleanup after a review * fix: restore go-ipld-prime v0.14.2 * fix: restore go-ds-flatfs v0.5.1 Co-authored-by: Lucas Molas <schomatis@gmail.com> Co-authored-by: Marcin Rataj <lidel@lidel.org> This commit was moved from ipfs/kubo@514411b
* update go-libp2p to v0.18.0 * initialize the resource manager * add resource manager stats/limit commands * load limit file when building resource manager * log absent limit file * write rcmgr to file when IPFS_DEBUG_RCMGR is set * fix: mark swarm limit|stats as experimental * feat(cfg): opt-in Swarm.ResourceMgr This ensures we can safely test the resource manager without impacting default behavior. - Resource manager is disabled by default - Default for Swarm.ResourceMgr.Enabled is false for now - Swarm.ResourceMgr.Limits allows user to tweak limits per specific scope in a way that is persisted across restarts - 'ipfs swarm limit system' outputs human-readable json - 'ipfs swarm limit system new-limits.json' sets new runtime limits (but does not change Swarm.ResourceMgr.Limits in the config) Conventions to make libp2p devs life easier: - 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager - 'limit.json' overrides implicit defaults from libp2p (if present) * docs(config): small tweaks * fix: skip libp2p.ResourceManager if disabled This ensures 'ipfs swarm limit|stats' work only when enabled. * fix: use NullResourceManager when disabled This reverts commit b19f7c9eca4cee4187f8cba3389dc2c930258512. after clarification feedback from ipfs/kubo#8680 (comment) * style: rename IPFS_RCMGR to LIBP2P_RCMGR preexisting libp2p toggles use LIBP2P_ prefix * test: Swarm.ResourceMgr * fix: location of opt-in limit.json and rcmgr.json.gz Places these files inside of IPFS_PATH * Update docs/config.md * feat: expose rcmgr metrics when enabled (#8785) * add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org> * refactor: rcmgr_metrics.go * refactor: rcmgr_defaults.go This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases. * refactor: adjustedDefaultLimits Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense. * chore: cleanup after a review * fix: restore go-ipld-prime v0.14.2 * fix: restore go-ds-flatfs v0.5.1 Co-authored-by: Lucas Molas <schomatis@gmail.com> Co-authored-by: Marcin Rataj <lidel@lidel.org> This commit was moved from ipfs/kubo@514411b
* update go-libp2p to v0.18.0 * initialize the resource manager * add resource manager stats/limit commands * load limit file when building resource manager * log absent limit file * write rcmgr to file when IPFS_DEBUG_RCMGR is set * fix: mark swarm limit|stats as experimental * feat(cfg): opt-in Swarm.ResourceMgr This ensures we can safely test the resource manager without impacting default behavior. - Resource manager is disabled by default - Default for Swarm.ResourceMgr.Enabled is false for now - Swarm.ResourceMgr.Limits allows user to tweak limits per specific scope in a way that is persisted across restarts - 'ipfs swarm limit system' outputs human-readable json - 'ipfs swarm limit system new-limits.json' sets new runtime limits (but does not change Swarm.ResourceMgr.Limits in the config) Conventions to make libp2p devs life easier: - 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager - 'limit.json' overrides implicit defaults from libp2p (if present) * docs(config): small tweaks * fix: skip libp2p.ResourceManager if disabled This ensures 'ipfs swarm limit|stats' work only when enabled. * fix: use NullResourceManager when disabled This reverts commit b19f7c9eca4cee4187f8cba3389dc2c930258512. after clarification feedback from ipfs/kubo#8680 (comment) * style: rename IPFS_RCMGR to LIBP2P_RCMGR preexisting libp2p toggles use LIBP2P_ prefix * test: Swarm.ResourceMgr * fix: location of opt-in limit.json and rcmgr.json.gz Places these files inside of IPFS_PATH * Update docs/config.md * feat: expose rcmgr metrics when enabled (#8785) * add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org> * refactor: rcmgr_metrics.go * refactor: rcmgr_defaults.go This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases. * refactor: adjustedDefaultLimits Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense. * chore: cleanup after a review * fix: restore go-ipld-prime v0.14.2 * fix: restore go-ds-flatfs v0.5.1 Co-authored-by: Lucas Molas <schomatis@gmail.com> Co-authored-by: Marcin Rataj <lidel@lidel.org> This commit was moved from ipfs/kubo@514411b
* update go-libp2p to v0.18.0 * initialize the resource manager * add resource manager stats/limit commands * load limit file when building resource manager * log absent limit file * write rcmgr to file when IPFS_DEBUG_RCMGR is set * fix: mark swarm limit|stats as experimental * feat(cfg): opt-in Swarm.ResourceMgr This ensures we can safely test the resource manager without impacting default behavior. - Resource manager is disabled by default - Default for Swarm.ResourceMgr.Enabled is false for now - Swarm.ResourceMgr.Limits allows user to tweak limits per specific scope in a way that is persisted across restarts - 'ipfs swarm limit system' outputs human-readable json - 'ipfs swarm limit system new-limits.json' sets new runtime limits (but does not change Swarm.ResourceMgr.Limits in the config) Conventions to make libp2p devs life easier: - 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager - 'limit.json' overrides implicit defaults from libp2p (if present) * docs(config): small tweaks * fix: skip libp2p.ResourceManager if disabled This ensures 'ipfs swarm limit|stats' work only when enabled. * fix: use NullResourceManager when disabled This reverts commit b19f7c9eca4cee4187f8cba3389dc2c930258512. after clarification feedback from ipfs/kubo#8680 (comment) * style: rename IPFS_RCMGR to LIBP2P_RCMGR preexisting libp2p toggles use LIBP2P_ prefix * test: Swarm.ResourceMgr * fix: location of opt-in limit.json and rcmgr.json.gz Places these files inside of IPFS_PATH * Update docs/config.md * feat: expose rcmgr metrics when enabled (#8785) * add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org> * refactor: rcmgr_metrics.go * refactor: rcmgr_defaults.go This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases. * refactor: adjustedDefaultLimits Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense. * chore: cleanup after a review * fix: restore go-ipld-prime v0.14.2 * fix: restore go-ds-flatfs v0.5.1 Co-authored-by: Lucas Molas <schomatis@gmail.com> Co-authored-by: Marcin Rataj <lidel@lidel.org> This commit was moved from ipfs/kubo@514411b
* update go-libp2p to v0.18.0 * initialize the resource manager * add resource manager stats/limit commands * load limit file when building resource manager * log absent limit file * write rcmgr to file when IPFS_DEBUG_RCMGR is set * fix: mark swarm limit|stats as experimental * feat(cfg): opt-in Swarm.ResourceMgr This ensures we can safely test the resource manager without impacting default behavior. - Resource manager is disabled by default - Default for Swarm.ResourceMgr.Enabled is false for now - Swarm.ResourceMgr.Limits allows user to tweak limits per specific scope in a way that is persisted across restarts - 'ipfs swarm limit system' outputs human-readable json - 'ipfs swarm limit system new-limits.json' sets new runtime limits (but does not change Swarm.ResourceMgr.Limits in the config) Conventions to make libp2p devs life easier: - 'IPFS_RCMGR=1 ipfs daemon' overrides the config and enables resource manager - 'limit.json' overrides implicit defaults from libp2p (if present) * docs(config): small tweaks * fix: skip libp2p.ResourceManager if disabled This ensures 'ipfs swarm limit|stats' work only when enabled. * fix: use NullResourceManager when disabled This reverts commit b19f7c9eca4cee4187f8cba3389dc2c930258512. after clarification feedback from ipfs/kubo#8680 (comment) * style: rename IPFS_RCMGR to LIBP2P_RCMGR preexisting libp2p toggles use LIBP2P_ prefix * test: Swarm.ResourceMgr * fix: location of opt-in limit.json and rcmgr.json.gz Places these files inside of IPFS_PATH * Update docs/config.md * feat: expose rcmgr metrics when enabled (#8785) * add metrics for the resource manager * export protocol and service name in Prometheus metrics * fix: expose rcmgr metrics only when enabled Co-authored-by: Marcin Rataj <lidel@lidel.org> * refactor: rcmgr_metrics.go * refactor: rcmgr_defaults.go This file defines implicit limit defaults used when Swarm.ResourceMgr.Enabled We keep vendored copy to ensure go-ipfs is not impacted when go-libp2p decides to change defaults in any of the future releases. * refactor: adjustedDefaultLimits Cleans up the way we initialize defaults and adds a fix for case when connection manager runs with high limits. It also hides `Swarm.ResourceMgr.Limits` until we have a better understanding what syntax makes sense. * chore: cleanup after a review * fix: restore go-ipld-prime v0.14.2 * fix: restore go-ds-flatfs v0.5.1 Co-authored-by: Lucas Molas <schomatis@gmail.com> Co-authored-by: Marcin Rataj <lidel@lidel.org> This commit was moved from ipfs/kubo@514411b
Part of #8761
ipfs swarm limit --help
ipfs swarm stats --help
Swarm.ResourceMgr
Swarm.ResourceMgr.Enabled
is a flag, disabled by defaultCloses #8722
Closes #1482 🧙♂️