This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

docker cache deploy #5290

Merged: 80 commits, Feb 22, 2021
Changes from 53 commits
Commits
727dfa8
Add basic docker-cache folder
SwordFaith Jan 15, 2021
e95e05b
Update docker-cache configs and scripts
SwordFaith Jan 18, 2021
d25c6d1
Add uri in docker-cache config
SwordFaith Jan 18, 2021
4ef4cb4
Add pylon env and location for docker cache
SwordFaith Jan 18, 2021
241307e
Add basic docker_cache.py
SwordFaith Jan 19, 2021
10ca21b
Add service.yaml
SwordFaith Jan 19, 2021
efc4230
Update docker-cache.yaml.template
SwordFaith Jan 19, 2021
facd854
Fix missing docker_cache bug in docker-cache.yaml.template
SwordFaith Jan 19, 2021
4fe370c
Seperate docker-cache.yaml.template to multiple template files
SwordFaith Jan 19, 2021
29bfb12
Remove docker-cache namespace
SwordFaith Jan 19, 2021
798f141
Try to fix containerPort type error
SwordFaith Jan 19, 2021
784a373
Fix stop.sh file not find error and docker-cache-service namespace error
SwordFaith Jan 19, 2021
75a1a77
Modify default service nodeport to valid range
SwordFaith Jan 19, 2021
2eadc5c
Merge branch 'master' into swordfaith/docker-cache-deploy
SwordFaith Jan 22, 2021
c17ca58
Add docker-cache option in role docker-runtime
SwordFaith Jan 27, 2021
d010e92
Add docker-cache related config in kubespray
SwordFaith Jan 27, 2021
23f7293
Add test play book
SwordFaith Jan 27, 2021
1518147
Move docker-cache-dist to parent dir
SwordFaith Jan 27, 2021
6d5f47c
Add enable docker-cache info in role tasks
SwordFaith Jan 27, 2021
d9ad362
Disable docker-cache htpasswd auth
SwordFaith Jan 27, 2021
6986826
Change daemon.json mirror to default nodeport conf
SwordFaith Jan 28, 2021
fbd696a
Change legacy daemon.json path to default
SwordFaith Jan 28, 2021
4bef655
Add systemctl reload task
SwordFaith Jan 28, 2021
8de7c3a
Avoid install nvidia-runtime by default
SwordFaith Jan 28, 2021
9d2619c
Fix bug in config dist load config command
SwordFaith Jan 28, 2021
f351c45
Revert docker-runtime role as it is
SwordFaith Jan 28, 2021
ad794c5
Move docker-cache related op to docker-cache role
SwordFaith Jan 28, 2021
fc8b753
Update docker-cache daemon.json
SwordFaith Jan 28, 2021
a25a04a
Update a version works for job submit cache
SwordFaith Jan 29, 2021
eb81c5f
Add sample deploy config and script
SwordFaith Jan 29, 2021
bf83b5d
Add process logic if undefine other mirrors and insecure registries
SwordFaith Jan 29, 2021
3731d46
Add docker-cache config to service-conf
SwordFaith Jan 29, 2021
699a599
Fix bug in get docker config func name
SwordFaith Jan 29, 2021
4307571
Fix typo in docker cache config keys
SwordFaith Jan 29, 2021
4520d3e
Fix bug in docker-cache template list
SwordFaith Jan 29, 2021
e0b46a4
Modify docker-cache start.sh.template
SwordFaith Jan 29, 2021
97d93f7
Fix start.sh.template condition
SwordFaith Jan 29, 2021
f580371
Test another template condition to docker-cache start.sh
SwordFaith Jan 29, 2021
1428fd0
Cancel start.sh.template to start.sh.
SwordFaith Jan 29, 2021
9bbbf81
update
SwordFaith Jan 29, 2021
665f845
Test start.sh.template
SwordFaith Jan 29, 2021
f7b9f13
update
SwordFaith Jan 29, 2021
d2cbe5a
update
SwordFaith Jan 29, 2021
936db62
Add docker-cache as prerequisite of most job
SwordFaith Feb 4, 2021
a1ec787
Merge branch 'master' into swordfaith/docker-cache-deploy
SwordFaith Feb 4, 2021
e5dd6b2
Add docker-cache config to kubespray config
SwordFaith Feb 5, 2021
50d0119
Add insecure_registry and mirror diff logic
SwordFaith Feb 5, 2021
21e1b6b
Fix docker insecure registry and mirror format error
SwordFaith Feb 5, 2021
8c79630
Merge branch 'master' into swordfaith/docker-cache-deploy
SwordFaith Feb 5, 2021
624b366
Fix pylint errors
SwordFaith Feb 7, 2021
6157910
Fix deployment delete error
SwordFaith Feb 7, 2021
b253761
Add enable option in docker-cache to support start.sh.template
SwordFaith Feb 7, 2021
d55f4be
Fix service conf template for enabled in docker-cache
SwordFaith Feb 7, 2021
e2cdfdb
Merge branch 'master' into swordfaith/docker-cache-deploy
yiyione Feb 9, 2021
8d8fd94
Add fs storage backend support
SwordFaith Feb 9, 2021
7a9b41b
Fix typo in docker-cache and add trailing new line
SwordFaith Feb 9, 2021
82a3e25
Fix missing quote in service conf template
SwordFaith Feb 9, 2021
6bf0386
Change docker-cache config update logic
SwordFaith Feb 9, 2021
1f21b60
Fix typo
SwordFaith Feb 9, 2021
e8cac8f
Use py script to change docker config
SwordFaith Feb 9, 2021
2e1898c
Fix typo in docker-cache role
SwordFaith Feb 9, 2021
88c22b0
Change arg format in ansible builtin command
SwordFaith Feb 9, 2021
431dac8
Change add docker config task format
SwordFaith Feb 9, 2021
cdc7bc2
Change format of add docker cache config task
SwordFaith Feb 9, 2021
4641920
Remove unused files in docker-cache role
SwordFaith Feb 9, 2021
f6a6d7d
Rename docker-cache-dist to docker-cache-config-distribute
SwordFaith Feb 19, 2021
1b86ff6
Remove redundant function do generate specific docker-cache-config
SwordFaith Feb 19, 2021
9d9f878
Add parameter to docker-cache role
SwordFaith Feb 19, 2021
aa1a651
Update
SwordFaith Feb 19, 2021
ec34eed
Change docker_cache_host pass by environ
SwordFaith Feb 19, 2021
66eba51
Revert to distribute localhost node port only
SwordFaith Feb 19, 2021
03fd7ce
Add support to dynamic docker-cache config dist host
SwordFaith Feb 20, 2021
61f7ff9
Delete cluster type: yarn use docker-cache
SwordFaith Feb 20, 2021
d5e5f37
Add trailing new line
SwordFaith Feb 20, 2021
0934044
Change auth htpasswd enabled when user specify htpasswd secret
SwordFaith Feb 20, 2021
6b882dc
Add explaination to registry-htpasswd parameter
SwordFaith Feb 20, 2021
e062992
Add trainling new lin to service.yaml
SwordFaith Feb 20, 2021
7dd0359
Squeeze service-conf by jinja2 whitespace control
SwordFaith Feb 22, 2021
e6c439a
Squeeze docker-cache-config template by jinja2 whitespace control
SwordFaith Feb 22, 2021
97ca25a
Remove redundant sudo in add-docker-cache-config task
SwordFaith Feb 22, 2021
6 changes: 6 additions & 0 deletions contrib/kubespray/config/config.yaml
@@ -8,6 +8,12 @@ docker_image_tag: v1.5.0
# OpenPAI Customized Settings #
#######################################################################
# enable_hived_scheduler: true
# enable_docker_cache: false
# docker_cache_azure_account_name: ""
# docker_cache_azure_account_key: ""
# docker_cache_azure_container_name: "dockerregistry"
# docker_cache_remote_url: "https://registry-1.docker.io"
# docker_cache_htpasswd: ""
# enable_marketplace: false

#############################################
11 changes: 11 additions & 0 deletions contrib/kubespray/docker-cache-dist.yml
@@ -0,0 +1,11 @@
- hosts: all
  become: true
  become_user: root
  gather_facts: true
  roles:
    - { role: '../roles/docker-cache/install', enable_docker_cache: true }
  tasks:
    - name: Reload the docker service after /etc/docker/daemon.json is updated
      ansible.builtin.systemd:
        name: docker
        state: reloaded
10 changes: 10 additions & 0 deletions contrib/kubespray/quick-start/services-configuration.yaml.template
@@ -34,6 +34,16 @@ cluster:
# Must be lower case, e.g., regsecret.
secret-name: pai-secret

{% if env["cfg"]["enable_docker_cache"]|default(false) is sameas true %}
docker-cache:
  enabled: {{ env["cfg"]["enable_docker_cache"] }}
  azure_account_name: {{ env["docker_cache"]["azure_account_name"] }}
  azure_account_key: {{ env["docker_cache"]["azure_account_key"] }}
  azure_container_name: {{ env["docker_cache"]["azure_container_name"] }}
  remote_url: {{ env["docker_cache"]["remote_url"] }}
  registry-htpasswd: {{ env["docker_cache"]["registry_htpasswd"] }}
{% else %}
{% endif %}

rest-server:
# # launcher type. k8s or yarn
@@ -0,0 +1 @@
enable_docker_cache: false
@@ -0,0 +1,9 @@
{
  "storage-driver": "overlay2",
  "registry-mirrors": [
    "http://localhost:30500"
  ],
  "insecure-registries": [
    "http://localhost:30500"
  ]
}
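
The file above replaces daemon.json wholesale. As a minimal sketch of the alternative, merging the mirror into whatever configuration a node already has, the following Python uses a hypothetical helper named `add_registry_mirror` (an assumption for illustration, not code from this PR) together with the default node port 30500 shown above. The `https://mirror.example.com` entry is likewise a made-up pre-existing mirror.

```python
import json


def add_registry_mirror(daemon_conf, mirror):
    """Return a copy of a daemon.json dict with `mirror` registered.

    Hypothetical helper for illustration only; not part of this PR.
    """
    merged = dict(daemon_conf)
    for key in ("registry-mirrors", "insecure-registries"):
        entries = list(merged.get(key, []))  # copy so the input is not mutated
        if mirror not in entries:            # keeps the operation idempotent
            entries.append(mirror)
        merged[key] = entries
    return merged


# Assumed pre-existing node configuration (made up for the example).
existing = {
    "storage-driver": "overlay2",
    "registry-mirrors": ["https://mirror.example.com"],
}
updated = add_registry_mirror(existing, "http://localhost:30500")
print(json.dumps(updated, indent=2))
```

Running the helper twice with the same mirror leaves the result unchanged, which matters when a playbook is re-applied to the same hosts.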
@@ -0,0 +1,3 @@
{
  "storage-driver": "overlay2"
}
@@ -0,0 +1,12 @@

---
- name: create docker in /etc
  file:
    path: /etc/docker
    state: directory
    recurse: yes

- name: copy default runtime configuration file into /etc/docker
  copy:
    src: daemon-openpai-default-runtime-docker-cache.json
    dest: /etc/docker/daemon.json
@@ -0,0 +1,11 @@
---
- name: create docker in /etc
  file:
    path: /etc/docker
    state: directory
    recurse: yes

- name: copy default runtime configuration file into /etc/docker
  copy:
    src: daemon-openpai-default-runtime.json
    dest: /etc/docker/daemon.json
10 changes: 10 additions & 0 deletions contrib/kubespray/roles/docker-cache/install/tasks/main.yml
@@ -0,0 +1,10 @@
---
- name: "docker-cache is disabled, include default docker runtime task"
  include_tasks: default-runtime.yml
  when:
    - not enable_docker_cache

- name: "docker-cache is enabled, include default docker runtime task with docker-cache mirror"
  include_tasks: default-runtime-with-docker-cache.yml
  when:
    - enable_docker_cache
@@ -1 +1 @@
-install_run_time: true
+install_run_time: false
66 changes: 66 additions & 0 deletions contrib/kubespray/script/k8s_generator.py
@@ -9,6 +9,54 @@
logger = get_logger(__name__)


def get_docker_cache_config_and_mirrors(layout, cluster_config):

Contributor: I think this function can be removed. During installation, using config.yml + services-configuration.yaml.template is enough.

Contributor Author: Removed

    """
    generate hived config from layout.yaml and config.yaml

Contributor: These comments are incorrect.

Contributor Author: Removed

    Resources (gpu/cpu/mem) specified in layout.yaml is considered as the total resources.

    Parameters:
    -----------
    layout: dict
        layout
    cluster_config: dict
        cluster config

    Returns:
    --------
    dict, list
        docker-cache config and docker-cache mirrors, used to render the docker-cache templates
        Example:
        {
            "azure_account_name": "",
            "azure_account_key": "",
            "azure_container_name": "dockerregistry",
            "remote_url": "",
            "registry_htpasswd": "",
        }, [mirror_list]
    """
    pai_master_ips = []
    for machine in layout['machine-list']:
        if 'pai-master' in machine and machine['pai-master'] == 'true':
            pai_master_ips.append(machine['hostip'])
    docker_cache_mirrors = ["http://{}:30500".format(ip) for ip in pai_master_ips]

    if "docker_cache_azure_container_name" not in cluster_config:
        cluster_config['docker_cache_azure_container_name'] = "dockerregistry"
    if "docker_cache_remote_url" not in cluster_config:
        cluster_config['docker_cache_remote_url'] = "https://registry-1.docker.io"
    # Default to an empty htpasswd only when the user has not set one.
    if "docker_cache_htpasswd" not in cluster_config:
        cluster_config["docker_cache_htpasswd"] = ""
    docker_cache_config = {
        "azure_account_name": cluster_config['docker_cache_azure_account_name'],
        "azure_account_key": cluster_config['docker_cache_azure_account_key'],
        "azure_container_name": cluster_config['docker_cache_azure_container_name'],
        "remote_url": cluster_config['docker_cache_remote_url'],
        "registry_htpasswd": cluster_config['docker_cache_htpasswd'],
    }

    return docker_cache_config, docker_cache_mirrors
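
As a rough illustration of the default handling this function aims for, here is a standalone sketch that reads defaults with `dict.get` instead of writing them back into `cluster_config`. The name `build_docker_cache_settings`, the `nodeport` parameter, the empty-string defaults for the Azure account fields, and the sample IP addresses are all assumptions made for the example, not code from this PR.

```python
def build_docker_cache_settings(layout, cluster_config, nodeport=30500):
    """Illustrative variant: leaves cluster_config untouched.

    Hypothetical helper; names and the empty-string Azure defaults
    are assumptions for the sake of the example.
    """
    # One mirror URL per machine marked as pai-master in the layout.
    master_ips = [
        machine["hostip"]
        for machine in layout.get("machine-list", [])
        if machine.get("pai-master") == "true"
    ]
    mirrors = ["http://{}:{}".format(ip, nodeport) for ip in master_ips]

    # dict.get supplies a default only when the key is absent.
    config = {
        "azure_account_name": cluster_config.get("docker_cache_azure_account_name", ""),
        "azure_account_key": cluster_config.get("docker_cache_azure_account_key", ""),
        "azure_container_name": cluster_config.get("docker_cache_azure_container_name", "dockerregistry"),
        "remote_url": cluster_config.get("docker_cache_remote_url", "https://registry-1.docker.io"),
        "registry_htpasswd": cluster_config.get("docker_cache_htpasswd", ""),
    }
    return config, mirrors


sample_layout = {
    "machine-list": [
        {"hostip": "10.0.0.1", "pai-master": "true"},
        {"hostip": "10.0.0.2", "pai-worker": "true"},
    ]
}
sample_config, sample_mirrors = build_docker_cache_settings(
    sample_layout, {"docker_cache_htpasswd": "abc"}
)
print(sample_mirrors)
print(sample_config["registry_htpasswd"])
```

A user-supplied `docker_cache_htpasswd` survives in this sketch, while missing keys fall back to their defaults.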


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', '--layout', dest="layout", required=True,
@@ -46,6 +94,24 @@ def main():
logger.warning("https://docs.projectcalico.org/reference/public-cloud/azure#why-doesnt-azure-support-calico-networking")
sys.exit(1)

    # Docker-cache is disabled by default.
    # But if the user sets enable_docker_cache to true manually,
    # we should enable it.
    if 'enable_docker_cache' in cluster_config and cluster_config['enable_docker_cache'] is True:
        _, docker_cache_mirrors = get_docker_cache_config_and_mirrors(layout, cluster_config)
    else:
        _, docker_cache_mirrors = {}, []
        cluster_config['enable_docker_cache'] = False

    if "openpai_docker_registry_mirrors" in cluster_config:
        cluster_config["openpai_docker_registry_mirrors"] += docker_cache_mirrors
    else:
        cluster_config["openpai_docker_registry_mirrors"] = docker_cache_mirrors
    if "openpai_docker_insecure_registries" in cluster_config:
        cluster_config["openpai_docker_insecure_registries"] += docker_cache_mirrors
    else:
        cluster_config["openpai_docker_insecure_registries"] = docker_cache_mirrors

    environment = {
        'masters': masters,
        'workers': workers,
69 changes: 68 additions & 1 deletion contrib/kubespray/script/openpai_generator.py
@@ -233,6 +233,54 @@ def get_hived_config(layout, cluster_config):
return { "skus": skus }


def get_docker_cache_config_and_mirrors(layout, cluster_config):

Contributor: I think this function can be removed. During installation, using config.yml + services-configuration.yaml.template is enough.

Contributor Author: Removed

    """
    Generate the docker-cache config and registry mirror list from layout.yaml and config.yaml.

    Parameters:
    -----------
    layout: dict
        layout
    cluster_config: dict
        cluster config

    Returns:
    --------
    dict, list
        docker-cache config and docker-cache mirrors, used to render the docker-cache templates
        Example:
        {
            "azure_account_name": "",
            "azure_account_key": "",
            "azure_container_name": "dockerregistry",
            "remote_url": "",
            "registry_htpasswd": "",
        }, [mirror_list]
    """
    pai_master_ips = []

Contributor Author: Change with storage backend type

    for machine in layout['machine-list']:
        if 'pai-master' in machine and machine['pai-master'] == 'true':
            pai_master_ips.append(machine['hostip'])
    docker_cache_mirrors = ["http://{}:30500".format(ip) for ip in pai_master_ips]

    if "docker_cache_azure_container_name" not in cluster_config:
        cluster_config['docker_cache_azure_container_name'] = "dockerregistry"
    if "docker_cache_remote_url" not in cluster_config:
        cluster_config['docker_cache_remote_url'] = "https://registry-1.docker.io"
    # Default to an empty htpasswd only when the user has not set one.
    if "docker_cache_htpasswd" not in cluster_config:
        cluster_config["docker_cache_htpasswd"] = ""
    docker_cache_config = {
        "azure_account_name": cluster_config['docker_cache_azure_account_name'],
        "azure_account_key": cluster_config['docker_cache_azure_account_key'],
        "azure_container_name": cluster_config['docker_cache_azure_container_name'],
        "remote_url": cluster_config['docker_cache_remote_url'],
        "registry_htpasswd": cluster_config['docker_cache_htpasswd'],
    }

    return docker_cache_config, docker_cache_mirrors


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', '--layout', dest="layout", required=True,
@@ -258,12 +306,31 @@ def main():
    else:
        hived_config = get_hived_config(layout, cluster_config)

    # Docker-cache is disabled by default.
    # But if the user sets enable_docker_cache to true manually,
    # we should enable it.
    if 'enable_docker_cache' in cluster_config and cluster_config['enable_docker_cache'] is True:
        docker_cache_config, docker_cache_mirrors = get_docker_cache_config_and_mirrors(layout, cluster_config)
    else:
        docker_cache_config, docker_cache_mirrors = {}, []
        cluster_config['enable_docker_cache'] = False

    if "openpai_docker_registry_mirrors" in cluster_config:
        cluster_config["openpai_docker_registry_mirrors"] += docker_cache_mirrors
    else:
        cluster_config["openpai_docker_registry_mirrors"] = docker_cache_mirrors
    if "openpai_docker_insecure_registries" in cluster_config:
        cluster_config["openpai_docker_insecure_registries"] += docker_cache_mirrors
    else:
        cluster_config["openpai_docker_insecure_registries"] = docker_cache_mirrors

    environment = {
        'masters': masters,
        'workers': workers,
        'cfg': cluster_config,
        'head_node': head_node,
-        'hived': hived_config
+        'hived': hived_config,
+        "docker_cache": docker_cache_config,
    }

    map_table = {
1 change: 1 addition & 0 deletions src/device-plugin/deploy/service.yaml
@@ -20,6 +20,7 @@ cluster-type:

prerequisite:
  - cluster-configuration
  - docker-cache

template-list:
  - start.sh
13 changes: 13 additions & 0 deletions src/docker-cache/config/docker-cache.yaml
@@ -0,0 +1,13 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

enabled: false
azure_account_name: ""
azure_account_key: ""
azure_container_name: dockerregistry
registry_listener: ":5000"
container_port: 5000
service_port: 5000
service_nodeport: 30500
remote_url: https://registry-1.docker.io
registry-htpasswd: dGVzdDokMnkkMDUkRUZiaWphaHovMHl4UC5xMFk1VW52TzljU2hHMThCRzM3QzBHNFhoRmFtTXdTUXdQLjBqQi4KCg==

Contributor: Is it a default passwd? What is the exact meaning of this string?

Contributor: Please check this.

Contributor Author: It is the user/password pair test/test, processed by the htpasswd tool and then base64-encoded. It is commented out in a new commit.
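
To make the structure of that default value concrete, the short Python sketch below decodes it with the standard library. It only inspects the string from docker-cache.yaml above; the variable names are chosen for the example.

```python
import base64

# Default registry-htpasswd value copied from docker-cache.yaml above.
encoded = (
    "dGVzdDokMnkkMDUkRUZiaWphaHovMHl4UC5xMFk1VW52TzljU2hHMThCRzM3QzBH"
    "NFhoRmFtTXdTUXdQLjBqQi4KCg=="
)

# An htpasswd entry has the form "<user>:<password hash>".
decoded = base64.b64decode(encoded).decode("ascii").strip()
user, hashed = decoded.split(":", 1)

print(user)        # the user name stored in the entry
print(hashed[:4])  # "$2y$" marks a bcrypt hash, as written by `htpasswd -B`
```

The decoded entry contains a user name and a bcrypt hash, not the plain-text password.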

26 changes: 26 additions & 0 deletions src/docker-cache/config/docker_cache.py
@@ -0,0 +1,26 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

import copy

class DockerCache(object):
    def __init__(self, cluster_conf, service_conf, default_service_conf):
        self.cluster_conf = cluster_conf
        self.service_conf = dict(default_service_conf, **service_conf)

    def validation_pre(self):
        machine_list = self.cluster_conf['machine-list']
        if len([host for host in machine_list if host.get('pai-master') == 'true']) < 1:
            return False, '"pai-master=true" machine is required to deploy the docker-cache service'
        return True, None

    def run(self):
        result = copy.deepcopy(self.service_conf)
        machine_list = self.cluster_conf['machine-list']
        server_port = self.service_conf['service_nodeport']
        master_ip = [host['hostip'] for host in machine_list if host.get('pai-master') == 'true'][0]
        result['uri'] = 'http://{0}:{1}'.format(master_ip, server_port)
        return result

    def validation_post(self, conf):
        return True, None
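
The two ideas in this class, user settings overriding defaults and the cache URI being built from the first pai-master host, can be tried in isolation. The sketch below is a standalone approximation; the function name `resolve_cache_uri` and the sample addresses and port are assumptions for illustration, not part of the PR.

```python
def resolve_cache_uri(machine_list, default_service_conf, service_conf):
    """Hypothetical standalone version of the merge and URI logic."""
    # Keys in service_conf override keys in default_service_conf.
    merged = dict(default_service_conf, **service_conf)
    masters = [
        host["hostip"] for host in machine_list
        if host.get("pai-master") == "true"
    ]
    if not masters:
        raise ValueError("a pai-master machine is required")
    # Only the first master is used, mirroring the [0] index above.
    return "http://{0}:{1}".format(masters[0], merged["service_nodeport"])


machines = [
    {"hostip": "10.0.0.5", "pai-worker": "true"},
    {"hostip": "10.0.0.1", "pai-master": "true"},
]
default_uri = resolve_cache_uri(machines, {"service_nodeport": 30500}, {})
custom_uri = resolve_cache_uri(machines, {"service_nodeport": 30500}, {"service_nodeport": 30600})
print(default_uri)
print(custom_uri)
```

Raising on a missing master here stands in for `validation_pre`, which reports the same condition before `run` is called.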
10 changes: 10 additions & 0 deletions src/docker-cache/deploy/delete.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

pushd $(dirname "$0") > /dev/null

/bin/bash stop.sh || exit $?

popd > /dev/null
39 changes: 39 additions & 0 deletions src/docker-cache/deploy/docker-cache-config.yaml.template
@@ -0,0 +1,39 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License

apiVersion: v1
kind: ConfigMap
metadata:
  name: registry-config
  namespace: default
data:
  config.yml: |
    version: 0.1
    log:
      fields:
        service: registry
    storage:
      cache:
        blobdescriptor: inmemory
      delete:
        enabled: true
      azure:
        accountname: {{ cluster_cfg["docker-cache"]["azure_account_name"] }}
        accountkey: {{ cluster_cfg["docker-cache"]["azure_account_key"] }}
        container: {{ cluster_cfg["docker-cache"]["azure_container_name"] }}
        # realm: core.windows.net
    http:
      addr: {{ cluster_cfg["docker-cache"]["registry_listener"] }}
      headers:
        X-Content-Type-Options: [nosniff]
    # auth:
    #   htpasswd:
    #     realm: basic-realm
    #     path: /auth/htpasswd # /etc/registry
    proxy:
      remoteurl: {{ cluster_cfg["docker-cache"]["remote_url"] }}
    health:
      storagedriver:
        enabled: true
        interval: 10s
        threshold: 3
11 changes: 11 additions & 0 deletions src/docker-cache/deploy/docker-cache-secret.yaml.template
@@ -0,0 +1,11 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License

apiVersion: v1
kind: Secret
metadata:
  name: registry-htpasswd
  namespace: default
data:
  htpasswd: | # defaults to user "test" with password "test"; to generate an htpasswd value, please refer to the README
    {{ cluster_cfg["docker-cache"]["registry-htpasswd"] }}