Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.18.1 panics on startup with certain routing config #9682

Closed
guseggert opened this issue Mar 1, 2023 · 7 comments · Fixed by #9708
Closed

v0.18.1 panics on startup with certain routing config #9682

guseggert opened this issue Mar 1, 2023 · 7 comments · Fixed by #9708
Assignees
Labels
kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP status/in-progress In progress status/WIP This a Work In Progress

Comments

@guseggert
Copy link
Contributor

guseggert commented Mar 1, 2023

Version

v0.18.1

Config

"Routing": {
    "Methods": {
      "find-peers": {
        "RouterName": "DHTs"
      },
      "find-providers": {
        "RouterName": "DHTsAndIndexers"
      },
      "get-ipns": {
        "RouterName": "DHTs"
      },
      "provide": {
        "RouterName": "DHTs"
      },
      "put-ipns": {
        "RouterName": "DHTs"
      }
    },
    "Routers": {
      "CidContact": {
        "Parameters": {
          "Endpoint": "https://cid.contact"
        },
        "Type": "http"
      },
      "DHTs": {
        "Parameters": {
          "Routers": [
            {
              "IgnoreErrors": false,
              "RouterName": "WanDHT",
              "Timeout": "5m"
            },
            {
              "IgnoreErrors": false,
              "RouterName": "LanDHT",
              "Timeout": "1m"
            }
          ]
        },
        "Type": "parallel"
      },
      "DHTsAndIndexers": {
        "Parameters": {
          "Routers": [
            {
              "IgnoreErrors": true,
              "RouterName": "CidContact",
              "Timeout": "3s"
            },
            {
              "IgnoreErrors": false,
              "RouterName": "DHTs",
              "Timeout": "5m"
            }
          ]
        },
        "Type": "parallel"
      },
      "LanDHT": {
        "Parameters": {
          "AcceleratedDHTClient": false,
          "Mode": "auto",
          "PublicIPNetwork": false
        },
        "Type": "dht"
      },
      "WanDHT": {
        "Parameters": {
          "AcceleratedDHTClient": true,
          "Mode": "auto",
          "PublicIPNetwork": true
        },
        "Type": "dht"
      }
    },
    "Type": "custom"
  },

Description

Kubo panics on preload nodes with the above config:

{"level":"error","ts":"2023-02-23T17:47:01.556Z","logger":"cmds/http","caller":"http/handler.go:94","msg":"a panic has occurred in the commands handler!"}                                                                                                                        
{"level":"error","ts":"2023-02-23T17:47:01.556Z","logger":"cmds/http","caller":"http/handler.go:95","msg":"runtime error: invalid memory address or nil pointer dereference"}                                                                                                     
{"level":"error","ts":"2023-02-23T17:47:01.556Z","logger":"cmds/http","caller":"http/handler.go:96","msg":"stack trace:\ngoroutine 2087244 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x65\ngh.neting.cc/ipfs/go-ipfs-cmds/http.(*handler).ServeHTTP.func1()\n\
tgh.neting.cc/ipfs/go-ipfs-cmds@v0.8.2/http/handler.go:96 +0xfa\npanic({0x2395380, 0x399af20})\n\truntime/panic.go:890 +0x262\ngh.neting.cc/ipfs/kubo/core/commands.glob..func50(0xc026e76d90, {0x7f79b1115f08, 0xc026e76e00}, {0x248df80?, 0xc002a89cc0?})\n\tgh.neting.cc/ipfs/kubo/c$
re/commands/dht.go:122 +0x16b\ngh.neting.cc/ipfs/go-ipfs-cmds.(*Command).call(0xc0367d5190?, 0xc026e76d90, {0x7f79b1115f08, 0xc026e76e00}, {0x248df80, 0xc002a89cc0})\n\tgh.neting.cc/ipfs/go-ipfs-cmds@v0.8.2/command.go:189 +0x1ab\ngh.neting.cc/ipfs/go-ipfs-cmds.(*Command).Call(0xc
002a89cc0?, 0xc026e76d90?, {0x7f79b1115f08, 0xc026e76e00}, {0x248df80?, 0xc002a89cc0?})\n\tgh.neting.cc/ipfs/go-ipfs-cmds@v0.8.2/command.go:159 +0x4b\ngh.neting.cc/ipfs/go-ipfs-cmds/http.(*handler).ServeHTTP(0xc0021ee060, {0x7f79b1115df8, 0xc01ab3e0f0}, 0xc021b4b300)\n\tgithub.
com/ipfs/go-ipfs-cmds@v0.8.2/http/handler.go:192 +0xa75\ngh.neting.cc/ipfs/go-ipfs-cmds/http.prefixHandler.ServeHTTP({{0x26827a8?, 0xc049660005?}, {0x2babdc0?, 0xc0021ee060?}}, {0x7f79b1115df8, 0xc01ab3e0f0}, 0xc021b4b300)\n\tgh.neting.cc/ipfs/go-ipfs-cmds@v0.8.2/http/apiprefix
.go:24 +0x142\ngh.neting.cc/rs/cors.(*Cors).Handler.func1({0x7f79b1115df8, 0xc01ab3e0f0}, 0xc021b4b300)\n\tgh.neting.cc/rs/cors@v1.7.0/cors.go:219 +0x1bd\nnet/http.HandlerFunc.ServeHTTP(0x70?, {0x7f79b1115df8?, 0xc01ab3e0f0?}, 0xfa21c5?)\n\tnet/http/server.go:2109 +0x2f\nnet/ht
tp.(*ServeMux).ServeHTTP(0xc049641681?, {0x7f79b1115df8, 0xc01ab3e0f0}, 0xc021b4b300)\n\tnet/http/server.go:2487 +0x149\ngh.neting.cc/ipfs/kubo/core/corehttp.CheckVersionOption.func1.1({0x7f79b1115df8, 0xc01ab3e0f0}, 0xc021b4b300)\n\tgh.neting.cc/ipfs/kubo/core/corehttp/command
s.go:188 +0x2d8\nnet/http.HandlerFunc.ServeHTTP(0x7f79b1115df8?, {0x7f79b1115df8?, 0xc01ab3e0f0?}, 0xc04d52f718?)\n\tnet/http/server.go:2109 +0x2f\nnet/http.(*ServeMux).ServeHTTP(0x7f79b1115df8?, {0x7f79b1115df8, 0xc01ab3e0f0}, 0xc021b4b300)\n\tnet/http/server.go:2487 +0x14
9\ngh.neting.cc/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1({0x7f79b1115df8?, 0xc01ab3e0a0?}, 0xc021b4b300)\n\tgh.neting.cc/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:288 +0xc5\nnet/http.HandlerFunc.ServeHTTP(0
xe0a1bf?, {0x7f79b1115df8?, 0xc01ab3e0a0?}, 0xc04d52f8c8?)\n\tnet/http/server.go:2109 +0x2f\ngh.neting.cc/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func2({0x7f79b1115df8?, 0xc01ab3e0a0?}, 0xc021b4b300)\n\tgh.neting.cc/prometheus/client_golang@v1.
14.0/prometheus/promhttp/instrument_server.go:249 +0x77\nnet/http.HandlerFunc.ServeHTTP(0xc04d52f900?, {0x7f79b1115df8?, 0xc01ab3e0a0?}, 0x30?)\n\tnet/http/server.go:2109 +0x2f\ngh.neting.cc/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x7f79b
1115df8, 0xc01ab3e0a0}, 0xc021b4b300)\n\tgh.neting.cc/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:108 +0xbf\nnet/http.HandlerFunc.ServeHTTP(0x2bbe930?, {0x7f79b1115df8?, 0xc01ab3e0a0?}, 0x24f1d40?)\n\tnet/http/server.go:2109 +0x2f\ngh.neting.cc/pro
metheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x2bbe930?, 0xc03e6177a0?}, 0xc021b4b300)\n\tgh.neting.cc/prometheus/client_golang@v1.14.0/prometheus/promhttp/instrument_server.go:146 +0xb8\nnet/http.HandlerFunc.ServeHTTP(0x100c625d8?, {0x2bbe930?, 0
xc03e6177a0?}, 0xf18109?)\n\tnet/http/server.go:2109 +0x2f\nnet/http.(*ServeMux).ServeHTTP(0x0?, {0x2bbe930, 0xc03e6177a0}, 0xc021b4b300)\n\tnet/http/server.go:2487 +0x149\ngh.neting.cc/ipfs/kubo/core/corehttp.makeHandler.func1({0x2bbe930?, 0xc03e6177a0?}, 0x7f79b86b9728?)\n\
tgh.neting.cc/ipfs/kubo/core/corehttp/corehttp.go:54 +0x6b\nnet/http.HandlerFunc.ServeHTTP(0xc049660017?, {0x2bbe930?, 0xc03e6177a0?}, 0xe6c7ce?)\n\tnet/http/server.go:2109 +0x2f\nnet/http.serverHandler.ServeHTTP({0xc03960b950?}, {0x2bbe930, 0xc03e6177a0}, 0xc021b4b300)\n\tne
t/http/server.go:2947 +0x30c\nnet/http.(*conn).serve(0xc04bafa820, {0x2bc0a40, 0xc01b788570})\n\tnet/http/server.go:1991 +0x607\ncreated by net/http.(*Server).Serve\n\tnet/http/server.go:3102 +0x4db\n"} 

For more context: https://github.com/protocol/bifrost-infra/pull/2356

@guseggert guseggert added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Mar 1, 2023
@guseggert guseggert self-assigned this Mar 1, 2023
@guseggert guseggert added P0 Critical: Tackled by core team ASAP status/in-progress In progress status/WIP This a Work In Progress and removed need/triage Needs initial labeling and prioritization labels Mar 1, 2023
@guseggert guseggert moved this to 🏃‍♀️ In Progress in IPFS Shipyard Team Mar 1, 2023
@BigLep BigLep mentioned this issue Mar 1, 2023
@guseggert
Copy link
Contributor Author

Basically the ipfs dht query command assumes that Kubo is using a DHT, which is not always the case anymore due to the addition of delegated routing. It also has some special logic to favor the WAN DHT and fall back to the LAN DHT if the WAN DHT doesn't have any peers. When custom routing config is specified, this code has no knowledge of whether the content router is a DHT nor a dual DHT (and thus has no knowledge of the WAN and LAN DHTs). The panic that is happening here is due to this--it is assuming a WAN DHT is present, but it's not because custom routing config has been used.

A quick hack that we can use here is to search around in the config for a content router that supports FindClosestPeers and use the first one we stumble on. I think that would unblock bifrost and most users. It's not much extra effort though to just add "query" as a method to the routing config, which lets users specify which router they want to use for "query". Then the equivalent behavior for the existing "query" function (fall back to LAN if WAN is offline) would be to just query both in parallel using a parallel router, which is usually already configured.

@guseggert
Copy link
Contributor Author

This also leads to the question of whether to add "query" or "find-closest-peers" to the delegated routing API. That's something we can add later, just mentioning it because I don't think it's been considered for delegated routing yet.

@BigLep
Copy link
Contributor

BigLep commented Mar 7, 2023

For anyone watching this issue, this is "P0" since it is blocking the rollout of "HTTP Delegated Routing" https://github.com/protocol/bifrost-infra/issues/2183 .

@guseggert : please update here with next steps as they are known.

@guseggert
Copy link
Contributor Author

I think the best way to quickly unblock here is to add a new routing type called "autoclient", which is the same as "auto" except it disables DHT servers. This is similar as "dht" and "dhtclient", so we would now have "auto" and "autoclient". Then bifrost would switch to use "autoclient" with Experimental.AcceleratedDHTClient enabled. I think this would meet their requirements, and IIRC this is something that Brave et al would like to use too.

This does not resolve the problem of "custom" routing not working correctly with ipfs dht query and ipfs stats dht. That will require more significant work and is something we need to do, because they are both broken now with Routing.Type=custom, but we can at least unblock bifrost with this and it does not add significant amounts of tech debt.

I have already tested this and it works, I am writing tests now for it.

@guseggert
Copy link
Contributor Author

#9708

@BigLep
Copy link
Contributor

BigLep commented Mar 8, 2023

Thanks @guseggert . A couple of things:

  1. I'll let @lidel comment on the utility of this beyond Brave contexts (ideally linking to any issues - I thought we had one but am struggling to find it.)
  2. Do we need to open up separate issues around "custom routing not working correctly with ipfs dht query and ipfs stats dht"? I assume there are comments from chore: deprecate remaining dht query command #9690 we should extract?

@guseggert
Copy link
Contributor Author

There is already one open for ipfs stats dht, we can just use that since it's the same root cause: #9482

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP status/in-progress In progress status/WIP This a Work In Progress
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants