Cost analyzer and grafana are using stale service account tokens #1428

Closed
michelesr opened this issue May 18, 2022 · 19 comments
Labels
bug, stale

Comments

@michelesr

michelesr commented May 18, 2022

Describe the bug
Kubernetes 1.21 graduated the BoundServiceAccountTokenVolume feature to beta and enabled it by default. This feature improves the security of service account tokens by giving them a one-hour expiry by default, where previously they never expired. As a result, applications that do not periodically re-read their service account token will receive HTTP 401 Unauthorized responses from the Kubernetes API server once the token has expired.

kubernetes/enhancements#542

Looking at the apiserver audit logs, it can be seen that the cost-analyzer and grafana pods are using stale tokens. For grafana, the fix should simply be an upgrade, since the grafana from the kube-prometheus-stack chart does not produce this warning. For cost-analyzer, it might involve some application changes.

There are warnings such as this in the apiserver log:

 "authentication.k8s.io/stale-token": "subject: system:serviceaccount:kube-cost:kubecost-grafana, seconds after warning threshold: 10351"

Reproduced on EKS 1.22, which apparently only warns you instead of rejecting stale tokens, extending expiration to 90 days instead of 1 hour.
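
For readers unfamiliar with what "refetching the token periodically" means in practice, here is a minimal sketch, not taken from Kubecost or Grafana. It assumes the token is mounted at the default projected path and that the kubelet rewrites that file before the token expires. The class name `ReloadingToken` and the 60-second `max_age` are hypothetical choices for illustration.

```python
import time
from typing import Callable, Dict, Optional

# Default mount path of the service account token inside a pod.
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


class ReloadingToken:
    """Hypothetical helper: caches the token and re-reads the file once the cache is too old."""

    def __init__(
        self,
        path: str = TOKEN_PATH,
        max_age: float = 60.0,  # assumed refresh interval in seconds, not a Kubernetes default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._path = path
        self._max_age = max_age
        self._clock = clock
        self._token: Optional[str] = None
        self._read_at = 0.0

    def get(self) -> str:
        """Return the cached token, re-reading the file if the cache is older than max_age."""
        now = self._clock()
        if self._token is None or now - self._read_at >= self._max_age:
            with open(self._path, encoding="utf-8") as f:
                self._token = f.read().strip()
            self._read_at = now
        return self._token

    def auth_header(self) -> Dict[str, str]:
        """Build the Authorization header from the current token."""
        return {"Authorization": f"Bearer {self.get()}"}
```

A client that reads the token once at startup and keeps it in memory forever is the pattern that produces the stale-token annotation. Calling `auth_header()` per request instead keeps the token at most `max_age` seconds behind the file on disk.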

To Reproduce
Steps to reproduce the behavior:

  1. Install this chart on Kubernetes >= 1.21
  2. Enable apiserver audit log
  3. Grep for messages with authentication.k8s.io/stale-token
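
Step 3 can also be done with a small script instead of grep. This is a sketch only, assuming the audit log is available as one JSON event per line (as in the event excerpts in this thread). The function name `stale_token_events` is hypothetical.

```python
import json
from typing import Iterable, Iterator

# Annotation key the apiserver adds to audit events for stale tokens.
STALE_KEY = "authentication.k8s.io/stale-token"


def stale_token_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield audit events (one JSON object per line) that carry the stale-token annotation."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(event, dict):
            continue
        annotations = event.get("annotations") or {}
        if STALE_KEY in annotations:
            yield event
```

Feeding it an open log file (or `sys.stdin`) and printing `event["user"]["username"]` with `event["annotations"][STALE_KEY]` gives roughly the same information as the grep, already parsed.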

Expected behavior
No warnings in the apiserver log

gz#1834

(related to Zendesk ticket #1834)


@michelesr michelesr added the bug label May 18, 2022
@jcharcalla
Contributor

Thank you for reporting this!

Adding @wolfeaustin

Additional documentation.

@AjayTripathy
Contributor

Hi @michelesr could you clarify what version of kubecost you are using? And which pod specifically is producing this warning? We should already be supporting this in later versions of kubecost and in the cost-model container (the cost-analyzer-server container has been deprecated)

@michelesr
Author

> Hi @michelesr could you clarify what version of kubecost you are using? And which pod specifically is producing this warning? We should already be supporting this in later versions of kubecost and in the cost-model container (the cost-analyzer-server container has been deprecated)

Chart name: cost-analyzer
Chart version: 1.87.3
App version: 1.87.3

The pods are kubecost-cost-analyzer and kubecost-grafana

@AjayTripathy
Contributor

Do you mind upgrading and letting us know if this persists?

@michelesr
Author

michelesr commented Jun 1, 2022 via email

@kirbsauce
Contributor

I believe @AjayTripathy is referring to upgrading your entire kubecost installation from 1.87 to the latest version of 1.93.

@michelesr
Author

michelesr commented Jun 1, 2022

I've installed 1.93.2 now. I'll check later, as it takes 1 hour before the warnings start popping up.

@michelesr
Author

Right now I can still see warnings coming from grafana, but not from cost-analyzer. I'll let you know if I spot warnings from cost-analyzer in the following days.

@michelesr
Author

Here's the event:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "e7dafa4e-8611-4114-a1ec-dd7ae91695e9",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/kube-cost/configmaps?labelSelector=grafana_dashboard&resourceVersion=369263796&watch=True",
  "verb": "watch",
  "user": {
    "username": "system:serviceaccount:kube-cost:kubecost-grafana",
    "uid": "b96db416-49c7-11ea-9635-06c8dc20265a",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:kube-cost",
      "system:authenticated"
    ],
    "extra": {
      "authentication.kubernetes.io/pod-name": [
        "kubecost-grafana-66fd8c5d55-jdkz5"
      ],
      "authentication.kubernetes.io/pod-uid": [
        "0a7fd5ed-e98e-4df5-a087-dd4af75696c1"
      ]
    }
  },
  "sourceIPs": [
    "10.145.24.195"
  ],
  "userAgent": "OpenAPI-Generator/11.0.0/python",
  "objectRef": {
    "resource": "configmaps",
    "namespace": "kube-cost",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "status": "Success",
    "message": "Connection closed early",
    "code": 200
  },
  "requestReceivedTimestamp": "2022-06-01T22:34:24.151653Z",
  "stageTimestamp": "2022-06-01T22:34:24.157361Z",
  "annotations": {
    "authentication.k8s.io/stale-token": "subject: system:serviceaccount:kube-cost:kubecost-grafana, seconds after warning threshold: 4027",
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"kubecost-grafana-clusterrolebinding\" of ClusterRole \"kubecost-grafana-clusterrole\" to ServiceAccount \"kubecost-grafana/kube-cost\""
  }
}
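
The annotation value in events like the one above is a plain string, so comparing warnings is easier once it is split into its parts. The sketch below assumes the value always has the shape seen in this thread ("subject: ..., seconds after warning threshold: N"). `parse_stale_annotation` and `StaleToken` are hypothetical names.

```python
import re
from datetime import timedelta
from typing import NamedTuple

# Shape of the annotation value as it appears in the events quoted in this issue.
_PATTERN = re.compile(
    r"^subject: (?P<subject>.+), seconds after warning threshold: (?P<seconds>\d+)$"
)


class StaleToken(NamedTuple):
    subject: str
    past_threshold: timedelta  # how long past the warning threshold the token was


def parse_stale_annotation(value: str) -> StaleToken:
    """Split a stale-token annotation value into its subject and elapsed time."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized stale-token annotation: {value!r}")
    return StaleToken(
        subject=match.group("subject"),
        past_threshold=timedelta(seconds=int(match.group("seconds"))),
    )
```

For scale, the 4027 seconds in the event above is about 67 minutes past the threshold. A value in the millions of seconds means the pod has been reusing the same token for weeks.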

@jcharcalla
Contributor

@Adam-Stack-PM Do you know what the current status of this is? I have a customer (ZD:2412) reporting this warning in their API server logs.

"authentication.k8s.io/stale-token": "subject: system:serviceaccount:{REDACTED}:kubecost-cost-analyzer, seconds after warning threshold: 789369"

@Adam-Stack-PM

@AjayTripathy, do we need any additional info to assign this a priority status?

@jcharcalla, I have it marked v1.97 for tracking only. I would not promise this will be fixed in v1.97 yet.

@Adam-Stack-PM

@AjayTripathy, Friendly nudge

@MrJW27

MrJW27 commented Oct 20, 2022

@AjayTripathy & @Adam-Stack-PM - Do we have an update on this?

@AjayTripathy
Contributor

@jcharcalla can you check what version your customer is running on? This is fixed in kubecost itself, just not in the bundled grafana. We may want to upgrade our bundled grafana to get rid of this altogether.

@jcharcalla
Contributor

Their latest bug report has them at 1.97. I've asked if they are still seeing the error in relation to Grafana, and will update when they reply.

@jcharcalla
Contributor

Here is an excerpt from the customer cloudtrail log. It looks to be related to the network-costs pod.

@logStream: kube-apiserver-audit-<REDACTED>
@message:
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Request","auditID":"<REDACTED>","stage":"ResponseStarted","requestURI":"/api/v1/pods?resourceVersion=2801283761\u0026timeoutSeconds=361\u0026watch=true","verb":"watch","user":{"username":"system:serviceaccount:<REDACTED>:kubecost-cost-analyzer","uid":"<REDACTED>","groups":["system:serviceaccounts","system:serviceaccounts:<REDACTED>","system:authenticated"],"extra":{"authentication.kubernetes.io/pod-name":["kubecost-network-costs-<REDACTED>"],"authentication.kubernetes.io/pod-uid":["<REDACTED>"]}},"sourceIPs":["<REDACTED>"],"userAgent":"app/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"pods","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Success","message":"Connection closed early","code":200},"requestReceivedTimestamp":"2022-11-02T21:35:21.278927Z","stageTimestamp":"2022-11-02T21:41:22.281513Z","annotations":{"authentication.k8s.io/stale-token":"subject: system:serviceaccount:<REDACTED>:kubecost-cost-analyzer, seconds after warning threshold: 3530500","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"kubecost\" of ClusterRole \"kubecost\" to ServiceAccount \"kubecost-cost-analyzer/<REDACTED>\""}}
@timestamp: 1667425282499
annotations.authentication.k8s.io/stale-token: subject: system:serviceaccount:<REDACTED>:kubecost-cost-analyzer, seconds after warning threshold: 3530500
annotations.authorization.k8s.io/decision: allow
annotations.authorization.k8s.io/reason: RBAC: allowed by ClusterRoleBinding "kubecost" of ClusterRole "kubecost" to ServiceAccount "kubecost-cost-analyzer/<REDACTED>"
apiVersion: audit.k8s.io/v1
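
In the excerpt above the username is the kubecost-cost-analyzer service account, while the pod name under `user.extra` points at network-costs. When several pods share a service account, grouping by pod name is what identifies the culprit. A rough sketch follows, assuming events are already parsed into dictionaries with the same fields as the excerpts in this thread. `count_stale_by_pod` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable

STALE_KEY = "authentication.k8s.io/stale-token"
POD_NAME_KEY = "authentication.kubernetes.io/pod-name"


def count_stale_by_pod(events: Iterable[dict]) -> Counter:
    """Count stale-token audit events per (service account username, pod name)."""
    counts: Counter = Counter()
    for event in events:
        if STALE_KEY not in (event.get("annotations") or {}):
            continue  # not a stale-token event
        user = event.get("user") or {}
        username = user.get("username", "<unknown user>")
        # The pod name is only present when the token is bound to a pod.
        pods = (user.get("extra") or {}).get(POD_NAME_KEY) or ["<unknown pod>"]
        for pod in pods:
            counts[(username, pod)] += 1
    return counts
```

Calling `count_stale_by_pod(events).most_common()` lists the noisiest pods first.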

Looking at the AWS docs, I see that the Kubernetes client SDK for "Go version 0.15.7 and later" is required. Cost-model is at v1.16.2, but it looks like OpenCost is at v1.13.0. I do not see the AWS SDK in network-costs.

@AjayTripathy
Contributor

AjayTripathy commented Nov 9, 2022

#1569 related -- once that's closed, this should also be closed, since it's the optional kubecost-grafana that still seems to have this issue


This issue has been marked as stale because it has been open for 360 days with no activity. Please remove the stale label or comment or this issue will be closed in 5 days.

@github-actions github-actions bot added the stale label Dec 29, 2023

github-actions bot commented Jan 3, 2024

This issue was closed because it has been inactive for 365 days with no activity.

@github-actions github-actions bot closed this as not planned Jan 3, 2024