This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Diawang/dockercleaner #2119

Merged
merged 33 commits into from
Feb 13, 2019
33 commits
12bf7c4
Cleaner logic
wangdian Jan 31, 2019
3fbc7e8
Fix bug
wangdian Jan 31, 2019
dae2b65
1. Remove worker
wangdian Feb 3, 2019
2e9482b
Fix '\t'
wangdian Feb 3, 2019
faa18f0
Bug fix
wangdian Feb 3, 2019
f88c51e
Import re in common
wangdian Feb 3, 2019
04b67df
Update openjdk build version
wangdian Feb 11, 2019
f0c3c19
[JDK] Remove JDK version hardcode and add print in pai_build for debu…
ydye Feb 11, 2019
3398ab1
Change kill signal to 10 (SIGUSR1)
wangdian Feb 11, 2019
6b2842f
[Pod Eviction] Disable kubernetes's pod eviction (#2124)
ydye Feb 11, 2019
c7f8261
Add k8s_POD to white list
wangdian Feb 11, 2019
1f05b00
Minor bug fix
wangdian Feb 11, 2019
d45083d
Cleaner logic
wangdian Jan 31, 2019
467e4dd
Fix bug
wangdian Jan 31, 2019
0290770
1. Remove worker
wangdian Feb 3, 2019
b64da73
Fix '\t'
wangdian Feb 3, 2019
b6e2d04
Bug fix
wangdian Feb 3, 2019
bd6dc38
Import re in common
wangdian Feb 3, 2019
43598a0
Update openjdk build version
wangdian Feb 11, 2019
ad36c98
Change kill signal to 10 (SIGUSR1)
wangdian Feb 11, 2019
130f811
Add k8s_POD to white list
wangdian Feb 11, 2019
ceec020
Merge branch 'diawang/dockercleaner' of https://github.com/Microsoft/…
wangdian Feb 12, 2019
2db1cd0
Cleaner logic
wangdian Jan 31, 2019
39f4cff
Fix bug
wangdian Jan 31, 2019
75a4a82
1. Remove worker
wangdian Feb 3, 2019
4f481b5
Fix '\t'
wangdian Feb 3, 2019
c21f1c3
Bug fix
wangdian Feb 3, 2019
a93baf4
Import re in common
wangdian Feb 3, 2019
2a56c41
Update openjdk build version
wangdian Feb 11, 2019
7c94642
Change kill signal to 10 (SIGUSR1)
wangdian Feb 11, 2019
3871fd5
Add k8s_POD to white list
wangdian Feb 11, 2019
fc0af72
Merge branch 'diawang/dockercleaner' of https://github.com/Microsoft/…
wangdian Feb 12, 2019
1353fc9
Add comment on cleaner kill handler in rest-server
wangdian Feb 13, 2019
4 changes: 4 additions & 0 deletions .gitignore
@@ -23,3 +23,7 @@ __pycache__
*.swp
*.swo
.DS_Store
*.sln
*.pyproj.user
*.pyproj
*.vs
8 changes: 3 additions & 5 deletions build/core/build_center.py
@@ -23,6 +23,7 @@

import os
import sys
import traceback
import logging
import logging.config

@@ -122,8 +123,9 @@ def build_center(self):
build_worker.build_single_component(self.graph.services[item])
self.logger.info("Build all components succeed")

except:
except Exception, err:
self.logger.error("Build all components failed")
traceback.print_exc()
sys.exit(1)

finally:
@@ -156,7 +158,3 @@ def push_center(self):
self.docker_cli.docker_image_tag(image,self.build_config['dockerRegistryInfo']['dockerTag'])
self.docker_cli.docker_image_push(image,self.build_config['dockerRegistryInfo']['dockerTag'])
self.logger.info("Push image:{0} successfully".format(image))




@@ -18,3 +18,4 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

cp etcd-reconfiguration-restart/etcd.yaml /etc/kubernetes/manifests/

4 changes: 3 additions & 1 deletion deployment/k8sPaiLibrary/template/kubelet.sh.template
@@ -66,7 +66,9 @@ docker run \
--image-pull-progress-deadline=10m \
--docker-root=${DOCKER_ROOT_DIR_FOR_KUBELET} \
--system-reserved=memory=3Gi \
--eviction-hard="memory.available<5%,nodefs.available<5%,imagefs.available<5%,nodefs.inodesFree<5%,imagefs.inodesFree<5%" \
--eviction-hard= \
--image-gc-high-threshold=100 \
--image-gc-low-threshold=95 \
--healthz-bind-address="0.0.0.0" \
--healthz-port="10248" \
--feature-gates="DevicePlugins=true,TaintBasedEvictions=true" \
5 changes: 5 additions & 0 deletions deployment/quick-start/services-configuration.yaml.template
@@ -128,3 +128,8 @@ rest-server:
# uncomment following section if you want to customize the port of pylon
# pylon:
#   port: 80

# uncomment following section if you want to customize the threshold of cleaner
# cleaner:
#   threshold: 94
#   interval: 60
5 changes: 5 additions & 0 deletions examples/cluster-configuration/services-configuration.yaml
@@ -128,3 +128,8 @@ rest-server:
# uncomment following section if you want to customize the port of pylon
# pylon:
#   port: 80

# uncomment following section if you want to customize the threshold of cleaner
# cleaner:
#   threshold: 94
#   interval: 60
1 change: 0 additions & 1 deletion paictl.py
@@ -99,4 +99,3 @@ def main(args):

setup_logging()
main(sys.argv[1:])

4 changes: 2 additions & 2 deletions src/base-image/build/base-image.dockerfile
@@ -38,8 +38,8 @@ RUN apt-get -y update && \
python-dev \
python-pip \
python-mysqldb \
openjdk-8-jre=8u191-b12-0ubuntu0.16.04.1 \
openjdk-8-jdk=8u191-b12-0ubuntu0.16.04.1 \
openjdk-8-jre \
openjdk-8-jdk \
openssh-server \
openssh-client \
git \
28 changes: 8 additions & 20 deletions src/cleaner/cleaner_main.py
@@ -19,7 +19,7 @@
import argparse
import os
from datetime import timedelta
from cleaner.scripts import clean_docker_cache, check_deleted_files
from cleaner.scripts.clean_docker import DockerCleaner
from cleaner.worker import Worker
from cleaner.utils.logger import LoggerMixin
from cleaner.utils import common
@@ -76,33 +76,21 @@ def sync(self):
time.sleep(1)


def get_worker(arg):
if arg == "docker_cache":
worker = Worker(clean_docker_cache.check_and_clean, 50, timeout=timedelta(minutes=10), cool_down_time=1800)
elif arg == "deleted_files":
worker = Worker(check_deleted_files.list_and_check_files, None, timeout=timedelta(minutes=10), cool_down_time=1800)
else:
raise ValueError("arguments %s is not supported.", arg)
return worker


liveness_files = {
"docker_cache": "docker-cache-cleaner-healthy",
"deleted_files": "deleted-files-cleaner-healthy"
}
def get_worker(threshold):
worker = Worker(clean_docker.check_and_clean, threshold, timeout=timedelta(minutes=10), cool_down_time=60)
return worker


def main():
parser = argparse.ArgumentParser()
parser.add_argument("option", help="the functions currently supported: [docker_cache | deleted_files]")
parser.add_argument("-t", "--threshold", help="the disk usage percent to start cleaner")
parser.add_argument("-i", "--interval", help="the base interval to check disk usage")
args = parser.parse_args()

common.setup_logging()

cleaner = Cleaner(liveness_files[args.option])
cleaner.add_worker(args.option, get_worker(args.option))
cleaner.start()
cleaner.sync()
cleaner = DockerCleaner(args.threshold, args.interval, timedelta(minutes=10))
cleaner.run()


if __name__ == "__main__":
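The `DockerCleaner` class that receives `threshold` and `interval` is not included in this excerpt, so the following is only a sketch of what a threshold check of this kind might look like, not the PR's implementation. It assumes Python 3 (`shutil.disk_usage`); `build_parser`, `disk_usage_percent`, and `should_clean` are hypothetical names, and the `type=int` and default values are assumptions added here. The parser shown in the diff sets no `type`, so argparse would pass the values through as strings.

```python
import argparse
import shutil


def build_parser():
    """Parser with the -t/-i options from the diff; type=int and defaults are assumed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-t", "--threshold", type=int, default=94,
                        help="the disk usage percent to start cleaner")
    parser.add_argument("-i", "--interval", type=int, default=60,
                        help="the base interval to check disk usage")
    return parser


def disk_usage_percent(path="/"):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_clean(used_percent, threshold):
    """Hypothetical rule: start cleaning once usage reaches the threshold."""
    return used_percent >= threshold


if __name__ == "__main__":
    args = build_parser().parse_args(["-t", "90", "-i", "30"])
    used = disk_usage_percent("/")
    print("used=%.1f%% threshold=%d clean=%s"
          % (used, args.threshold, should_clean(used, args.threshold)))
```

Converting the options to integers at parse time keeps the comparison against the measured percentage numeric rather than string-based.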
54 changes: 54 additions & 0 deletions src/cleaner/config/cleaner.md
@@ -0,0 +1,54 @@
## Cleaner section parser

- [Default Configuration](#D_Config)
- [How to Configure](#HT_Config)
- [Generated Configuration](#G_Config)
- [Data Table](#T_Config)

#### Default configuration <a name="D_Config"></a>

[cleaner default configuration](cleaner.yaml)

#### How to configure the cleaner section in services-configuration.yaml <a name="HT_Config"></a>

All settings in this section are optional. To customize these values, set them in services-configuration.yaml.

For example, to use a threshold other than the default value of 94, add the following to your services-configuration.yaml:
```yaml
cleaner:
  threshold: new-value
  interval: new-value
```
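
The override behaviour described above can be sketched in Python. This is a minimal illustration, assuming the defaults from `cleaner.yaml` (threshold 94, interval 60) are copied and then overlaid with whatever keys the user supplies, in the spirit of the deepcopy-then-update step visible in `src/cleaner/config/cleaner.py` in this diff. `DEFAULT_CLEANER_CONF` and `merge_cleaner_conf` are hypothetical names used only for this example.

```python
import copy

# Defaults mirroring src/cleaner/config/cleaner.yaml in this PR.
DEFAULT_CLEANER_CONF = {"threshold": 94, "interval": 60}


def merge_cleaner_conf(default_conf, user_conf):
    """Return a new dict: a copy of the defaults overlaid with user-supplied keys."""
    merged = copy.deepcopy(default_conf)
    # A missing cleaner section (None) leaves the defaults unchanged.
    merged.update(user_conf or {})
    return merged


# Overriding only the threshold keeps the default interval.
print(merge_cleaner_conf(DEFAULT_CLEANER_CONF, {"threshold": 90}))
```

Copying the defaults first means the shared default dictionary is never mutated by a user override.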

#### Generated Configuration <a name="G_Config"></a>

After parsing, the object model looks like:
```yaml
cleaner:
  threshold: 94
  interval: 60
```


#### Data Table <a name="T_Config"></a>

<table>
<tr>
<td>Data in Configuration File</td>
<td>Data in Cluster Object Model</td>
<td>Data in Jinja2 Template</td>
<td>Data type</td>
</tr>
<tr>
<td>cleaner.threshold</td>
<td>com["cleaner"]["threshold"]</td>
<td>cluster_cfg["cleaner"]["threshold"]</td>
<td>Int</td>
</tr>
<tr>
<td>cleaner.interval</td>
<td>com["cleaner"]["interval"]</td>
<td>cluster_cfg["cleaner"]["interval"]</td>
<td>Int</td>
</tr>
</table>
37 changes: 27 additions & 10 deletions src/cleaner/config/cleaner.py
@@ -18,23 +18,40 @@

import logging
import logging.config
import copy

class Cleaner(object):

class Cleaner:

def __init__(self, cluster_configuration, service_configuration, default_service_configuraiton):
def __init__(self, cluster_conf, service_conf, default_service_conf):
self.logger = logging.getLogger(__name__)

self.cluster_configuration = cluster_configuration
self.cluster_conf = cluster_conf
self.service_conf = service_conf
self.default_service_conf = default_service_conf

def validation_pre(self):
return True, None

def run(self):
com = {}

return com

def validation_post(self, cluster_object_model):
result = copy.deepcopy(self.default_service_conf)
result.update(self.service_conf)
return result

def validation_post(self, conf):
threshold = conf["cleaner"].get("threshold")
if type(threshold) != int:
msg = "expect threshold in cleaner to be int but get %s with type %s" % \
(threshold, type(threshold))
return False, msg
else:
if threshold < 0 or threshold > 100:
msg = "expect threshold in [0, 100]"
return False, msg

interval = conf["cleaner"].get("interval")
if type(interval) != int:
msg = "expect interval in cleaner to be int but get %s with type %s" % \
(interval, type(interval))
return False, msg

return True, None
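
The checks in `validation_post` can be restated as a standalone function for experimentation. This is a sketch rather than the PR's code: `validate_cleaner_conf` is a hypothetical name, it uses `isinstance` instead of the `type(...) != int` comparison in the diff, and it additionally rejects booleans, which is an assumption added here because `bool` is a subclass of `int` in Python.

```python
def _is_plain_int(value):
    """True for ints but not bools (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_cleaner_conf(conf):
    """Return (ok, message) for a configuration dict with a 'cleaner' section."""
    cleaner = conf.get("cleaner", {})

    threshold = cleaner.get("threshold")
    if not _is_plain_int(threshold):
        return False, "expect threshold in cleaner to be int but get %r with type %s" % (
            threshold, type(threshold))
    if threshold < 0 or threshold > 100:
        return False, "expect threshold in [0, 100]"

    interval = cleaner.get("interval")
    if not _is_plain_int(interval):
        return False, "expect interval in cleaner to be int but get %r with type %s" % (
            interval, type(interval))

    return True, None


print(validate_cleaner_conf({"cleaner": {"threshold": 94, "interval": 60}}))
print(validate_cleaner_conf({"cleaner": {"threshold": 120, "interval": 60}}))
```

Returning a `(bool, message)` pair follows the convention the diff already uses for `validation_pre` and `validation_post`.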

2 changes: 2 additions & 0 deletions src/cleaner/config/cleaner.yaml
@@ -15,3 +15,5 @@
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

threshold: 94
Review comment (Contributor):

A Markdown document is needed to describe this service's configuration, like this one: https://github.com/Microsoft/pai/blob/pai-0.9.y/src/cluster/config/cluster.md

Reply (Member, Author):

Sure, will add it.

interval: 60
36 changes: 8 additions & 28 deletions src/cleaner/deploy/cleaner.yaml.template
@@ -31,42 +31,17 @@ spec:
hostPID: true
hostNetwork: true
containers:
- name: docker-cache-cleaner
- name: docker-cleaner
image: {{ cluster_cfg["cluster"]["docker-registry"]["prefix"] }}cleaner:{{ cluster_cfg["cluster"]["docker-registry"]["tag"] }}
args:
- 'docker_cache'
- -t {{ cluster_cfg["cleaner"]["threshold"] }}
- -i {{ cluster_cfg["cleaner"]["interval"] }}
imagePullPolicy: Always
securityContext:
privileged: True
volumeMounts:
- mountPath: /var/run/docker.sock
name: docker-socket
livenessProbe:
exec:
command:
- test
- '`find /tmp/docker-cache-cleaner-healthy -mmin -1`'
initialDelaySeconds: 60
periodSeconds: 30
{%- if cluster_cfg['cluster']['common']['qos-switch'] == "true" %}
resources:
limits:
memory: "1Gi"
{%- endif %}
- name: deleted-files-cleaner
image: {{ cluster_cfg["cluster"]["docker-registry"]["prefix"] }}cleaner:{{ cluster_cfg["cluster"]["docker-registry"]["tag"] }}
args:
- 'deleted_files'
imagePullPolicy: Always
securityContext:
privileged: True
livenessProbe:
exec:
command:
- test
- '`find /tmp/deleted-files-cleaner-healthy -mmin -1`'
initialDelaySeconds: 60
periodSeconds: 30
{%- if cluster_cfg['cluster']['common']['qos-switch'] == "true" %}
resources:
limits:
@@ -78,3 +53,8 @@ spec:
- name: docker-socket
hostPath:
path: /var/run/docker.sock
tolerations:
- key: node.kubernetes.io/memory-pressure
operator: "Exists"
- key: node.kubernetes.io/disk-pressure
operator: "Exists"