Fix gpu-exporter and prometheus demo (#1087)
Signed-off-by: Syulin7 <735122171@qq.com>
Syulin7 authored May 29, 2024
1 parent 37d8ab4 commit 64808b6
Showing 3 changed files with 7 additions and 15 deletions.
2 changes: 0 additions & 2 deletions docs/top/prometheus.md
@@ -32,8 +32,6 @@ $ kubectl apply -f kubernetes-artifacts/prometheus/gpu-exporter.yaml

  !!! note

- * the prometheus and gpu-exporter components should be deployed in namespace ``kube-system``, and so that ``arena top job <job name>`` can work.
-
  * if the your prometheus has been existed in cluster,please make sure the k8s service whose port is 9090 has the label `kubernetes.io/service-name=prometheus-server`

  3\. You can check the GPU metrics by prometheus SQL request
14 changes: 4 additions & 10 deletions kubernetes-artifacts/prometheus/gpu-exporter.yaml
@@ -21,17 +21,13 @@ spec:
  operator: Exists
  hostPID: true
  volumes:
- - hostPath:
- path: /var/run/docker.sock
- type: FileOrCreate
- name: docker-sock
  - hostPath:
  path: /run/containerd/containerd.sock
- type: FileOrCreate
+ type: Socket
  name: containerd-sock
  containers:
  - name: node-gpu-exporter
- image: registry.cn-hangzhou.aliyuncs.com/acs/gpu-prometheus-exporter:0.1-0e21b28
+ image: registry.cn-hangzhou.aliyuncs.com/acs/gpu-prometheus-exporter:v1.0.1-b2c2f9b
  imagePullPolicy: Always
  ports:
  - containerPort: 9445
@@ -40,11 +36,9 @@ spec:
  memory: 30Mi
  cpu: 100m
  limits:
- memory: 50Mi
- cpu: 200m
+ memory: 2000Mi
+ cpu: 1000m
  volumeMounts:
- - mountPath: /var/run/docker.sock
- name: docker-sock
  - mountPath: /run/containerd/containerd.sock
  name: containerd-sock
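
For readability, the fragment below sketches how the changed parts of `gpu-exporter.yaml` would look once this commit is applied. The indentation and nesting are assumptions, since the diff above is shown flattened, and lines that fall between the two hunks are left out rather than guessed.

```yaml
# Sketch only: nesting is assumed, not taken from the diff.
volumes:
  - hostPath:
      path: /run/containerd/containerd.sock
      type: Socket        # requires an existing UNIX socket at this path
    name: containerd-sock
containers:
  - name: node-gpu-exporter
    image: registry.cn-hangzhou.aliyuncs.com/acs/gpu-prometheus-exporter:v1.0.1-b2c2f9b
    imagePullPolicy: Always
    ports:
      - containerPort: 9445
    # ... lines between the two hunks are not shown in the diff ...
    volumeMounts:
      - mountPath: /run/containerd/containerd.sock
        name: containerd-sock
```

With `type: Socket`, Kubernetes checks that a UNIX socket already exists at the host path, whereas `FileOrCreate` would create an empty file there if nothing existed. The Docker socket volume and its mount are dropped entirely, so only the containerd socket is mounted into the exporter.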

6 changes: 3 additions & 3 deletions kubernetes-artifacts/prometheus/prometheus.yaml
@@ -7,13 +7,13 @@ data:
  storage-retention: 360h
  ---

- apiVersion: rbac.authorization.k8s.io/v1beta1
+ apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
  name: prometheus
  namespace: arena-system
  rules:
- - apiGroups: ["", "extensions", "apps"]
+ - apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
@@ -32,7 +32,7 @@ metadata:
  name: prometheus
  namespace: arena-system
  ---
- apiVersion: rbac.authorization.k8s.io/v1beta1
+ apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
  name: prometheus