
panic in "ipfs bitswap reprovide" #9418

Closed

lidel opened this issue Nov 17, 2022 · 6 comments
Labels
kind/bug (A bug in existing code, including security flaws) · need/triage (Needs initial labeling and prioritization) · P1 (High: Likely tackled by core team if no one steps up)

Comments


lidel commented Nov 17, 2022

This issue was identified by @AnnaArchivist in #9416 – extracting it to its own ticket so we can prioritize it.

Problem

The panic manifests only when Reprovider.Strategy is set to roots and/or some of the blocks are missing.

Version

0.16.0

Config

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/54957",
      "/ip6/::/tcp/54957",
      "/ip4/0.0.0.0/udp/54957/quic",
      "/ip6/::/udp/54957/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": true,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "<redacted>"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "roots"
  },
  "Routing": {
    "Methods": null,
    "Routers": null,
    "Type": "dhtclient"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 900,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

Description

Sometimes, but not always, the ipfs bitswap reprovide step mentioned above crashes the daemon with the following message:

Daemon is ready
panic: close of closed channel

goroutine 535 [running]:
github.com/ipfs/go-ipfs-provider/simple.(*Reprovider).Run(0xc002071630)
	github.com/ipfs/go-ipfs-provider@v0.7.1/simple/reprovide.go:116 +0x3c8
created by github.com/ipfs/go-ipfs-provider.(*system).Run
	github.com/ipfs/go-ipfs-provider@v0.7.1/system.go:30 +0xd5
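
For context, Go panics with exactly this message whenever close() is called a second time on the same channel. Below is a minimal sketch of that failure mode; it is illustrative only, not the actual go-ipfs-provider code, and the error stands in for a missing-block error surfacing from the roots key provider:

```go
package main

import "errors"

// errMissingBlock stands in for the kind of error the reprovider can hit
// when Reprovider.Strategy is "roots" and a block is missing locally.
var errMissingBlock = errors.New("block not found")

func doWork() error {
	return errMissingBlock
}

func main() {
	done := make(chan struct{})

	if err := doWork(); err != nil {
		close(done) // error path signals completion...
	}
	close(done) // ...and the normal path closes again: panic: close of closed channel
}
```

The usual guard is to funnel the close through sync.Once or give the channel a single owning goroutine, so only one code path can ever close it.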
lidel added the kind/bug, need/triage, and P1 labels on Nov 17, 2022
lidel moved this to 🥞 Todo in IPFS Shipyard Team on Nov 17, 2022

BigLep commented Dec 13, 2022

2022-12-13 conversation:

  • we need to be able to create a reproduction case
  • as far as we know, this is not happening with the default settings


BigLep commented Jan 3, 2023

2023-01-03 conversation: we suspect this isn't happening in the default configuration.

After we have 0.18 RC2, we'll ask the original reporters to see if the issue still manifests.


BigLep commented Jan 20, 2023

@AnnaArchivist: is this issue still happening with 0.18-rc2?


BigLep commented Apr 19, 2023

Closing since we don't have a reproducible case and haven't heard of it happening in 0.18 and beyond.

BigLep closed this as completed on Apr 19, 2023
github-project-automation bot moved this from 🥞 Todo to 🎉 Done in IPFS Shipyard Team on Apr 19, 2023

LeDechaine commented Sep 12, 2024

"ipfs bitswap reprovide" needs to be fixed, I got a similar problem with "Reprovider.Strategy" not set. (The default is "all", I belive).

"ipfs bitswap reprovide" apparently "triggers reprovider to announce our data to network" and is a recommended way to make IPFS work better on forums. Having only a 20mb website (about 100 files) on IPFS, I set a cron job on two different servers, to do "ipfs bitswap reprovide" every hour on two different VPS's. Doing the command manually appeared to just hang, no info whatsoever, and I had to "ctrl+C" out of it, but tried it anyway. TL,DR: Don't.

My website was always online when I visited it, but "Uptimia" reported a "503 Service Unavailable" error multiple times per day, i.e. IPFS was crashing and restarting. So I investigated.

"journalctl -u ipfs" on server 1

Sep 12 01:50:42 server systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 12 01:50:42 server systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Consumed 1h 9min 48.690s CPU time.
Sep 12 01:50:42 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 12.
(...)
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 16.
Sep 12 11:18:06 server systemd[1]: Stopped ipfs.service - IPFS daemon.
Sep 12 11:18:06 server systemd[1]: ipfs.service: Consumed 1h 32min 30.556s CPU time.

"journalctl -u ipfs" on server 2

Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 1.
Sep 09 00:09:18 server2 systemd[1]: Stopped IPFS daemon.
Sep 09 00:09:18 server2 systemd[1]: ipfs.service: Consumed 9min 10.547s CPU time.
Sep 09 00:09:18 server2 systemd[1]: Started IPFS daemon.
Sep 09 02:45:53 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 3.
(...)
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Main process exited, code=killed, status=9/KILL
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Failed with result 'signal'.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Consumed 16min 24.878s CPU time.
Sep 09 13:35:12 server2 systemd[1]: ipfs.service: Scheduled restart job, restart counter is at 5.

But there have been no restarts on server2 in the last 4 days?

"ps aux | grep ipfs" on server2:

ledecha+ 16088 2.5 12.8 2294940 56084 ? Ssl Sep11 40:41 ipfs daemon --migrate=true --enable-gc --routing=dhtclient
ledecha+ 16194 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16195 0.0 1.6 1659356 7276 ? Sl Sep11 0:29 ipfs bitswap reprovide
ledecha+ 16479 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16480 0.0 0.0 1733088 0 ? Sl Sep11 0:31 ipfs bitswap reprovide
ledecha+ 16730 0.0 0.0 2480 0 ? Ss Sep11 0:00 /bin/sh -c ipfs bitswap reprovide
ledecha+ 16731 0.0 0.0 1659356 4 ? Sl Sep11 0:24 ipfs bitswap reprovide

...gave me 26 instances of "ipfs bitswap reprovide" running

Long story short: executing "ipfs bitswap reprovide", even for 20 MB (about 200 files), is too much for 512 MB or 1 GB of RAM, and will systematically crash your IPFS daemon every time. I can't try it on servers with multiple cores and multiple GB of RAM, but big server or not, this is definitely not the intended result. IPFS worked fine, stable, with no crashes, for multiple months without "ipfs bitswap reprovide" as a cron job.
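
For what it's worth, a wrapper that skips a new run while the previous one is still in flight, and kills a hung run after a deadline, would avoid the pile-up shown above. A rough sketch (the lock-file path and the 45-minute timeout are arbitrary choices, and the ipfs binary is assumed to be on PATH):

```go
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
	"time"
)

// lockPath is a hypothetical location; any writable path works.
const lockPath = "/tmp/ipfs-reprovide.lock"

func main() {
	// O_EXCL makes creation fail if the file already exists, so a second
	// invocation exits instead of piling up behind the first one.
	lock, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		log.Println("previous reprovide still running, skipping")
		return
	}
	defer os.Remove(lockPath)
	defer lock.Close()

	// Kill the command if it hangs instead of letting it run forever.
	ctx, cancel := context.WithTimeout(context.Background(), 45*time.Minute)
	defer cancel()

	cmd := exec.CommandContext(ctx, "ipfs", "bitswap", "reprovide")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Printf("reprovide failed: %v", err)
	}
}
```

A crash would leave a stale lock behind (flock(2)-style locking would handle that more gracefully), so treat this as a sketch rather than production tooling.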


lidel commented Sep 12, 2024

Hi @LeDechaine, thank you for the feedback, but this is unrelated to this already-closed bug.

Please see https://github.com/ipfs/kubo#minimal-system-requirements
If you experience issues on a box that meets the minimal requirements, please file a new issue; this one is closed.
