Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic on start-up in 2.3.0-rc4 #7207

Closed
clambin opened this issue Aug 23, 2020 · 10 comments
Closed

Panic on start-up in 2.3.0-rc4 #7207

clambin opened this issue Aug 23, 2020 · 10 comments
Assignees
Milestone

Comments

@clambin
Copy link

clambin commented Aug 23, 2020

What did you do?

Since I moved to 2.3.0-rc4, traefik panics during start-up with the following call stack:

Stack: goroutine 59 [running]: 
runtime/debug.Stack(0x4d20140, 0x2662937, 0x17)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x78
github.com/containous/traefik/v2/pkg/safe.defaultRecoverGoroutine(0x2232840, 0x41a7570)
        /go/src/github.com/containous/traefik/pkg/safe/routine.go:66 +0x90
github.com/containous/traefik/v2/pkg/safe.GoWithRecover.func1.1(0x276cc34)
        /go/src/github.com/containous/traefik/pkg/safe/routine.go:56 +0x44
panic(0x2232840, 0x41a7570)
        /usr/local/go/src/runtime/panic.go:969 +0x118
runtime/internal/atomic.goLoad64(0x4fbab54, 0x51b6920, 0x2350f01)
        /usr/local/go/src/runtime/internal/atomic/atomic_arm.go:131 +0x1c
github.com/go-kit/kit/metrics/generic.(*Counter).Add(0x4fbab40, 0x0, 0x3ff00000)
        /go/pkg/mod/github.com/go-kit/kit@v0.9.0/metrics/generic/generic.go:46 +0x28
github.com/containous/traefik/v2/pkg/metrics.(*pilotCounter).Add(0x4c0ea70, 0x0, 0x3ff00000)
        /go/src/github.com/containous/traefik/pkg/metrics/pilot.go:229 +0xa4
github.com/go-kit/kit/metrics/multi.Counter.Add(0x4d11940, 0x2, 0x2, 0x0, 0x3ff00000)
        /go/pkg/mod/github.com/go-kit/kit@v0.9.0/metrics/multi/multi.go:20 +0x40
main.setupServer.func3(0x4fcd810, 0x51b68f8, 0x51b6900, 0x51a1f00)
        /go/src/github.com/containous/traefik/cmd/traefik/traefik.go:258 +0x4c
github.com/containous/traefik/v2/pkg/server.(*ConfigurationWatcher).loadMessage(0x51265a0, 0x2626245, 0x4, 0x4fcd710)
        /go/src/github.com/containous/traefik/pkg/server/configurationwatcher.go:150 +0x14c
github.com/containous/traefik/v2/pkg/server.(*ConfigurationWatcher).listenConfigurations(0x51265a0, 0x2ae2d18, 0x4fba0c0)
        /go/src/github.com/containous/traefik/pkg/server/configurationwatcher.go:132 +0x2c
github.com/containous/traefik/v2/pkg/safe.(*Pool).GoCtx.func1()
        /go/src/github.com/containous/traefik/pkg/safe/routine.go:36 +0x5c
github.com/containous/traefik/v2/pkg/safe.GoWithRecover.func1(0x276cc34, 0x51c8010)
        /go/src/github.com/containous/traefik/pkg/safe/routine.go:59 +0x48
created by github.com/containous/traefik/v2/pkg/safe.GoWithRecover
        /go/src/github.com/containous/traefik/pkg/safe/routine.go:53 +0x34

The offending call is in src/github.com/containous/traefik/pkg/metrics/pilot.go:229

     pc.(*pilotCounter).c.Add(delta)

Looks like it's related to the new Pilot functionality. And indeed, if I remove my pilot token from my static configuration, the problem doesn't occur.

Output of traefik version: (What version of Traefik are you using?)

Version:      2.3.0-rc4
Codename:     picodon
Go version:   go1.15
Built:        2020-08-19T16:02:03Z
OS/Arch:      linux/arm

What is your environment & configuration (arguments, toml, provider, platform, ...)?

providers:
  file:
    directory: /etc/traefik/dynamic
  docker:
    exposedbydefault: false
    network: traefik_public
    swarmMode: true
    watch: true

entryPoints:
  web:
    address: ":80"
    http:
      redirections:
        entryPoint:
          to: websecure
  websecure:
    address: ":443"
    http:
      tls:
        certResolver: letsencrypt
  metrics:
    address: ":8082"

certificatesResolvers:
  letsencrypt:
    acme:
      email: me@example.com
      storage: /letsencrypt/acme.json
      httpChallenge:
        entryPoint: web

api:
  insecure: true

metrics:
  prometheus:
    entryPoint: metrics

accessLog:
  filePath: /logs/access.log
  fields:
    defaultMode: keep
    headers:
      defaultMode: keep

log:
  filePath: /logs/traefik.log
  level: DEBUG

experimental:
  pilot:
    token: ********-****-****-****-************

If applicable, please paste the log output in DEBUG level (--log.level=DEBUG switch)

Nothing exceptional in the logs, except for the stack trace mentioned above:

time="2020-08-23T16:08:44Z" level=info msg="Traefik version 2.3.0-rc4 built on 2020-08-19T16:02:03Z"
[...]
time="2020-08-23T16:08:46Z" level=error msg="Error in Go routine: runtime error: invalid memory address or nil pointer dereference"
time="2020-08-23T16:08:46Z" level=error msg="Stack: goroutine 77 [running]:\nruntime/debug.Stack(0x487c180, 0x2662937, 0x17)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x78\ngithub.com/containous/traefik/v2/pkg/safe.defaultRecoverGoroutine(0x2232840, 0x41a7570)\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:66 +0x90\ngithub.com/containous/traefik/v2/pkg/safe.GoWithRecover.func1.1(0x276cc34)\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:56 +0x44\npanic(0x2232840, 0x41a7570)\n\t/usr/local/go/src/runtime/panic.go:969 +0x118\nruntime/internal/atomic.goLoad64(0x4a5be14, 0x4c9e840, 0x2350f01)\n\t/usr/local/go/src/runtime/internal/atomic/atomic_arm.go:131 +0x1c\ngithub.com/go-kit/kit/metrics/generic.(*Counter).Add(0x4a5be00, 0x0, 0x3ff00000)\n\t/go/pkg/mod/github.com/go-kit/kit@v0.9.0/metrics/generic/generic.go:46 +0x28\ngithub.com/containous/traefik/v2/pkg/metrics.(*pilotCounter).Add(0x4c9e430, 0x0, 0x3ff00000)\n\t/go/src/github.com/containous/traefik/pkg/metrics/pilot.go:229 +0xa4\ngithub.com/go-kit/kit/metrics/multi.Counter.Add(0x4a92b00, 0x2, 0x2, 0x0, 0x3ff00000)\n\t/go/pkg/mod/github.com/go-kit/kit@v0.9.0/metrics/multi/multi.go:20 +0x40\nmain.setupServer.func3(0x4fae2f0, 0x4b69930, 0x4b69938, 0x4a2dba0)\n\t/go/src/github.com/containous/traefik/cmd/traefik/traefik.go:258 +0x4c\ngithub.com/containous/traefik/v2/pkg/server.(*ConfigurationWatcher).loadMessage(0x4adc780, 0x4b38060, 0x10, 0x4fae2a0)\n\t/go/src/github.com/containous/traefik/pkg/server/configurationwatcher.go:150 +0x14c\ngithub.com/containous/traefik/v2/pkg/server.(*ConfigurationWatcher).listenConfigurations(0x4adc780, 0x2ae2d18, 0x4a5b380)\n\t/go/src/github.com/containous/traefik/pkg/server/configurationwatcher.go:132 +0x2c\ngithub.com/containous/traefik/v2/pkg/safe.(*Pool).GoCtx.func1()\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:36 +0x5c\ngithub.com/containous/traefik/v2/pkg/safe.GoWithRecover.func1(0x276cc34, 0x4a92e10)\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:59 +0x48\ncreated by github.com/containous/traefik/v2/pkg/safe.GoWithRecover\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:53 +0x34\n"
@ldez
Copy link
Contributor

ldez commented Aug 23, 2020

Hello,

could you provide the full logs because I don't reproduce, at startup, with your configuration.

thanks.

@clambin
Copy link
Author

clambin commented Aug 23, 2020

Here you go.

traefik.log

@ldez
Copy link
Contributor

ldez commented Aug 23, 2020

I still not able to reproduce could you create a minimal reproducible example?

@ldez ldez added kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. and removed kind/bug/confirmed a confirmed bug (reproducible). labels Aug 23, 2020
@ldez
Copy link
Contributor

ldez commented Aug 23, 2020

Maybe it's related to the architecture: I'm using amd64, and you are using arm and the nil pointer comes from:

/usr/local/go/src/runtime/internal/atomic/atomic_arm.go:131

🤔

@clambin
Copy link
Author

clambin commented Aug 23, 2020

I'm actually on ARM32 (not 64). Even with the following minimum config, I'm getting this error:

docker-compose.yml:

version: "2"

services:
  proxy:
    image: traefik:v2.3.0-rc4
    ports:
      - 81:80
    volumes:
      - ./traefik.yml:/etc/traefik/traefik.yml
      - ./logs:/logs

Note: was running in docker swarm, but this example is just straight docker.

traefik.yml:

log:
  filePath: /logs/traefik.log
  level: DEBUG

experimental:
  pilot:
    token: ********-****-****-****-***********

traefik.log:

time="2020-08-23T20:26:14Z" level=info msg="Traefik version 2.3.0-rc4 built on 2020-08-19T16:02:03Z"
time="2020-08-23T20:26:14Z" level=debug msg="Static configuration loaded {\"global\":{\"checkNewVersion\":true},\"serversTransport\":{\"maxIdleConnsPerHost\":200},\"entryPoints\":{\"http\":{\"address\":\":80\",\"transport\":{\"lifeCycle\":{\"graceTimeOut\":10000000000},\"respondingTimeouts\":{\"idleTimeout\":180000000000}},\"forwardedHeaders\":{},\"http\":{}}},\"providers\":{\"providersThrottleDuration\":2000000000},\"log\":{\"level\":\"DEBUG\",\"filePath\":\"/logs/traefik.log\",\"format\":\"common\"},\"experimental\":{\"pilot\":{\"token\":\"********-****-****-****-***********\"}}}"
time="2020-08-23T20:26:14Z" level=info msg="\nStats collection is disabled.\nHelp us improve Traefik by turning this feature on :)\nMore details on: https://docs.traefik.io/contributing/data-collection/\n"
time="2020-08-23T20:26:14Z" level=info msg="Starting provider aggregator.ProviderAggregator {}"
time="2020-08-23T20:26:14Z" level=debug msg="Start TCP Server" entryPointName=http
time="2020-08-23T20:26:14Z" level=info msg="Starting provider *traefik.Provider {}"
time="2020-08-23T20:26:14Z" level=debug msg="Configuration received from provider internal: {\"http\":{\"services\":{\"noop\":{}}},\"tcp\":{},\"tls\":{}}" providerName=internal
time="2020-08-23T20:26:14Z" level=error msg="Error in Go routine: runtime error: invalid memory address or nil pointer dereference"
time="2020-08-23T20:26:14Z" level=error msg="Stack: goroutine 38 [running]:\nruntime/debug.Stack(0x4c7c180, 0x2662937, 0x17)\n\t/usr/local/go/src/runtime/debug/stack.go:24 +0x78\ngithub.com/containous/traefik/v2/pkg/safe.defaultRecoverGoroutine(0x2232840, 0x41a7570)\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:66 +0x90\ngithub.com/containous/traefik/v2/pkg/safe.GoWithRecover.func1.1(0x276cc34)\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:56 +0x44\npanic(0x2232840, 0x41a7570)\n\t/usr/local/go/src/runtime/panic.go:969 +0x118\nruntime/internal/atomic.goLoad64(0x4f5ce74, 0x4db2968, 0x0)\n\t/usr/local/go/src/runtime/internal/atomic/atomic_arm.go:131 +0x1c\ngithub.com/go-kit/kit/metrics/generic.(*Counter).Add(0x4f5ce60, 0x0, 0x3ff00000)\n\t/go/pkg/mod/github.com/go-kit/kit@v0.9.0/metrics/generic/generic.go:46 +0x28\ngithub.com/containous/traefik/v2/pkg/metrics.(*pilotCounter).Add(0x4db2728, 0x0, 0x3ff00000)\n\t/go/src/github.com/containous/traefik/pkg/metrics/pilot.go:229 +0xa4\ngithub.com/go-kit/kit/metrics/multi.Counter.Add(0x4db27f0, 0x1, 0x1, 0x0, 0x3ff00000)\n\t/go/pkg/mod/github.com/go-kit/kit@v0.9.0/metrics/multi/multi.go:20 +0x40\nmain.setupServer.func3(0x4e04960, 0x4db2940, 0x4db2948, 0x4f5d5e0)\n\t/go/src/github.com/containous/traefik/cmd/traefik/traefik.go:258 +0x4c\ngithub.com/containous/traefik/v2/pkg/server.(*ConfigurationWatcher).loadMessage(0x4d1ec00, 0x263056f, 0x8, 0x4e4a3e0)\n\t/go/src/github.com/containous/traefik/pkg/server/configurationwatcher.go:150 +0x14c\ngithub.com/containous/traefik/v2/pkg/server.(*ConfigurationWatcher).listenConfigurations(0x4d1ec00, 0x2ae2d18, 0x4f5cdc0)\n\t/go/src/github.com/containous/traefik/pkg/server/configurationwatcher.go:132 +0x2c\ngithub.com/containous/traefik/v2/pkg/safe.(*Pool).GoCtx.func1()\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:36 +0x5c\ngithub.com/containous/traefik/v2/pkg/safe.GoWithRecover.func1(0x276cc34, 0x5148bb0)\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:59 +0x48\ncreated by github.com/containous/traefik/v2/pkg/safe.GoWithRecover\n\t/go/src/github.com/containous/traefik/pkg/safe/routine.go:53 +0x34\n"

@clambin
Copy link
Author

clambin commented Aug 23, 2020

Not very familiar with the traefik code base (or golang for that matter), but going down the stack trace, this seems to originate from this call in /cmd/traefik/traefik.go:

	watcher.AddListener(func(_ dynamic.Configuration) {
		metricsRegistry.ConfigReloadsCounter().Add(1)
		metricsRegistry.LastConfigReloadSuccessGauge().Set(float64(time.Now().Unix()))
	})

How that ends up in the metrics/pilot.go code, I do not understand ...

@ldez
Copy link
Contributor

ldez commented Aug 23, 2020

I know the source of the bug, but currently I'm not able to reproduce, I have to find an arm device.

//go:nosplit
func goLoad64(addr *uint64) uint64 {
	if uintptr(unsafe.Pointer(addr))&7 != 0 {
		*(*int)(nil) = 0 // crash on unaligned uint64 <----------------------------------------
	}
	_ = *addr // if nil, fault before taking the lock
	var r uint64
	addrLock(addr).lock()
	r = *addr
	addrLock(addr).unlock()
	return r
}

golang/go#599

@clambin
Copy link
Author

clambin commented Aug 23, 2020

sure. if I can help, let me know.

@ldez ldez self-assigned this Aug 23, 2020
@ldez ldez added kind/bug/confirmed a confirmed bug (reproducible). and removed kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. labels Aug 24, 2020
@ldez
Copy link
Contributor

ldez commented Aug 24, 2020

I was able to reproduce with my old Pi2-B.

The bug is due to alignment issues in 32-bit systems related to structures within the go-kit library.

I did a PR in order to fix this problem: go-kit/kit#1007

@traefiker traefiker added this to the 2.3 milestone Sep 15, 2020
@traefiker
Copy link
Contributor

Closed by #7289.

@traefik traefik locked and limited conversation to collaborators Oct 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants