HTTP 500 while accessing the status API #827

Closed
alexanderfefelov opened this issue Oct 13, 2020 · 3 comments · Fixed by #829

@alexanderfefelov

First server with --bootstrap-expect 1

Everything works fine.

$ http http://dkron-server-1.backpack.test:8900/v1:

HTTP/1.1 200 OK
Content-Length: 1309
Content-Type: application/json; charset=utf-8
Date: Tue, 13 Oct 2020 15:26:17 GMT

{
    "agent": {
        "name": "dkron-server-1.backpack.test",
        "version": "3.0.5"
    },
    "scheduler": {
        "1": "\"Job: dkron-server-1_backpack_test___backup-dkron-jobs, scheduled at: @every 4h, tags:map[host:dkron-server-1.backpack.test]\"",
        "2": "\"Job: dkron-server-1_backpack_test___remove-old-dkron-backups, scheduled at: @every 8h, tags:map[host:dkron-server-1.backpack.test]\"",
        "3": "\"Job: dkron-server-1_backpack_test___remove-old-dkron-logs, scheduled at: @every 8h, tags:map[host:dkron-server-1.backpack.test]\"",
        "4": "\"Job: dkron-server-2_backpack_test___backup-dkron-jobs, scheduled at: @every 4h, tags:map[host:dkron-server-2.backpack.test]\"",
        "5": "\"Job: dkron-server-2_backpack_test___remove-old-dkron-backups, scheduled at: @every 8h, tags:map[host:dkron-server-2.backpack.test]\"",
        "6": "\"Job: dkron-server-2_backpack_test___remove-old-dkron-logs, scheduled at: @every 8h, tags:map[host:dkron-server-2.backpack.test]\""
    },
    "serf": {
        "coordinate_resets": "0",
        "encrypted": "false",
        "event_queue": "0",
        "event_time": "1",
        "failed": "0",
        "health_score": "0",
        "intent_queue": "0",
        "left": "0",
        "member_time": "4",
        "members": "4",
        "query_queue": "0",
        "query_time": "1"
    },
    "tags": {
        "dc": "backpack",
        "expect": "1",
        "host": "dkron-server-1.backpack.test",
        "port": "6868",
        "region": "test",
        "role": "dkron",
        "rpc_addr": "172.17.0.12:6868",
        "server": "true",
        "version": "3.0.5"
    }
}

Second server with no bootstrap specified

$ http http://dkron-server-2.backpack.test:8903/v1:

HTTP/1.1 500 Internal Server Error
Content-Length: 0
Date: Tue, 13 Oct 2020 15:30:32 GMT

and the following appears in the log:

2020/10/13 18:41:45 [Recovery] 2020/10/13 - 18:41:45 panic recovered:
runtime error: invalid memory address or nil pointer dereference
/opt/hostedtoolcache/go/1.14.7/x64/src/runtime/panic.go:212 (0x449fc9)
/opt/hostedtoolcache/go/1.14.7/x64/src/runtime/signal_unix.go:695 (0x449e18)
/home/runner/go/pkg/mod/github.com/robfig/cron/v3@v3.0.1/cron.go:178 (0xdb0981)
/home/runner/work/dkron/dkron/dkron/api.go:121 (0x208732b)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/context.go:161 (0xc00b7a)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/recovery.go:83 (0xc1430f)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/context.go:161 (0xc00b7a)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/logger.go:241 (0xc13440)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/context.go:161 (0xc00b7a)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/gin.go:409 (0xc0a955)
/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.6.3/gin.go:367 (0xc0a06c)
/opt/hostedtoolcache/go/1.14.7/x64/src/net/http/server.go:2836 (0x711ea2)
/opt/hostedtoolcache/go/1.14.7/x64/src/net/http/server.go:1924 (0x70d80b)
/opt/hostedtoolcache/go/1.14.7/x64/src/runtime/asm_amd64.s:1373 (0x463e30)

Environment

alexanderfefelov added a commit to alexanderfefelov/docker-backpack that referenced this issue Oct 13, 2020
@yvanoers
Collaborator

yvanoers commented Oct 14, 2020

This is the same bug as reported in #822.
It happens when the /v1 endpoint is requested on a node that is not the leader. /v1 returns some server info and recently was made to return a summary of scheduled jobs. Only the leader has the scheduler though, causing this call to fail on any other node.
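
The failure mode can be reproduced in isolation. The following is a minimal sketch, not dkron's actual code: the `scheduler` and `agent` types, the handler, and the recovery middleware are all hypothetical stand-ins, and only the standard library is used. It assumes the scheduler pointer is simply nil on a non-leader, which is consistent with the nil pointer dereference in the stack trace.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// scheduler is a hypothetical stand-in for the real scheduler type.
type scheduler struct {
	entries []string
}

// Entries reads a field through the receiver, so calling it on a nil
// *scheduler panics with a nil pointer dereference.
func (s *scheduler) Entries() []string { return s.entries }

// agent is a hypothetical node; sched is assumed to be set only on the leader.
type agent struct {
	sched *scheduler
}

// statusHandler mimics a status endpoint that lists scheduled jobs
// without checking whether this node has a scheduler.
func (a *agent) statusHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "%d scheduled jobs\n", len(a.sched.Entries()))
}

// recoverMiddleware turns a panic in the handler into an empty 500 response,
// similar in spirit to what a recovery middleware does.
func recoverMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if err := recover(); err != nil {
				w.WriteHeader(http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// statusCode issues GET /v1 against the given agent and returns the HTTP status.
func statusCode(a *agent) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1", nil)
	recoverMiddleware(http.HandlerFunc(a.statusHandler)).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	leader := &agent{sched: &scheduler{entries: []string{"backup-dkron-jobs"}}}
	follower := &agent{} // no scheduler on a non-leader

	fmt.Println("leader:", statusCode(leader))     // leader: 200
	fmt.Println("follower:", statusCode(follower)) // follower: 500
}
```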

This isn't hard to fix, but there are a couple of ways to go about it:

  1. Remove the listing of scheduled jobs from /v1 altogether
  2. Do not return the listing of scheduled jobs from /v1 on non-leaders
  3. Return null as the value for 'scheduler' on non-leaders
  4. Properly return the listing of scheduled jobs in the output by consulting the leader if necessary
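
Options 2 and 3 both come down to a nil check before touching the scheduler. A rough sketch under stated assumptions: `buildStatus`, `render`, the `scheduler` type, and the `nullOnFollower` flag are hypothetical names for illustration, and the job listing is simplified to a list of strings rather than the map shown in the real response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scheduler is a hypothetical stand-in; it is assumed to be nil on non-leaders.
type scheduler struct {
	entries []string
}

func (s *scheduler) Entries() []string { return s.entries }

// buildStatus assembles a simplified /v1 payload.
// With nullOnFollower == false the "scheduler" key is omitted on
// non-leaders (option 2); with true it is present but null (option 3).
func buildStatus(name string, sched *scheduler, nullOnFollower bool) map[string]interface{} {
	status := map[string]interface{}{
		"agent": map[string]string{"name": name},
	}
	switch {
	case sched != nil:
		status["scheduler"] = sched.Entries()
	case nullOnFollower:
		status["scheduler"] = nil
	}
	return status
}

// render serializes the payload; json.Marshal emits map keys in sorted order.
func render(status map[string]interface{}) string {
	b, err := json.Marshal(status)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	leader := &scheduler{entries: []string{"backup-dkron-jobs"}}

	fmt.Println(render(buildStatus("server-1", leader, false)))
	// {"agent":{"name":"server-1"},"scheduler":["backup-dkron-jobs"]}
	fmt.Println(render(buildStatus("server-2", nil, false)))
	// {"agent":{"name":"server-2"}}
	fmt.Println(render(buildStatus("server-2", nil, true)))
	// {"agent":{"name":"server-2"},"scheduler":null}
}
```

Either variant keeps /v1 from panicking on a non-leader, but clients then have to cope with a response whose shape depends on which node answered.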

IMHO, it is not this endpoint's responsibility to return the scheduled jobs. And especially considering feature requests like pagination and reported problems like slow response times, I don't think it is a good idea to keep the schedule summary in here.

My vote goes to option no. 1.
Perhaps it is worth considering a new endpoint that can return a summary of all scheduled jobs, or introducing a request parameter that enables this behavior.

@Victorcoder What say you?

@alexanderfefelov
Author

+1 for "Remove the listing of scheduled jobs from /v1 altogether"

vcastellm added the bug label Oct 15, 2020
@vcastellm
Member

The reason for including scheduled jobs here is to have an endpoint that returns a snapshot of the jobs that are scheduled, which helps to monitor what is really running or prepared to run at all times.

Clearly I introduced the bug, and you are right, it's not this endpoint's responsibility to return the job list. I will find a better way to implement this.

Reverting the PR.

Thanks for all your suggestions!
