
Slow IO performance inside container compared with the host. #21485

Closed
alkmim opened this issue Mar 24, 2016 · 62 comments

Comments

@alkmim

alkmim commented Mar 24, 2016

Output of docker version:

docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

docker info
Containers: 2
Images: 3
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-254:1-4458480-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem:
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.52 GB
 Data Space Total: 107.4 GB
 Data Space Available: 96.38 GB
 Metadata Space Used: 2.081 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.03.01 (2011-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.6-2-desktop
Operating System: openSUSE 13.2 (Harlequin) (x86_64)
CPUs: 4
Total Memory: 7.722 GiB
Name: gustavo-host
ID: MRRI:5WIP:JOYH:4KZT:BVMU:HMMR:4BL6:6NKP:VM5H:36AN:6LFR:YHK7
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Physical Environment (8GB RAM, Core i5-4590, ext4)

Steps to reproduce the issue:

  1. Install openSuSE 13.2
  2. Install docker
  3. run Iometer benchmark on the host
  4. run Iometer benchmark on a container based on SuSE. A data volume was used as the partition where the benchmark was executed.
  5. Compare results

Describe the results you received:
Below is a table showing the results. For all tests, the performance inside the container was around 40% of the performance of the host.
[image: table of Iometer results]

Describe the results you expected:
I expected the performance of the container to be closer to that of the host.

@MHBauer
Contributor

MHBauer commented Mar 24, 2016

Is this above the performance degradation described in the docs?

@HackToday
Contributor

Hi @alkmim, it seems you are using a loop device rather than a real device, is that right? Loop devices are slow.

@thaJeztah
Member

@alkmim with "A data volume was used as the partition were to execute the benchmark." do you mean, that a volume is used, e.g. -v /some/path ?

@alkmim
Author

alkmim commented Mar 28, 2016

Hi.

@thaJeztah: Yes.
@MHBauer: The docs do not specify how much the performance degradation will be.
@HackToday: Although I'm not using a real device for the LVM, the benchmark was executed inside a data volume. According to the docs: "One final point, data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you should place heavy write workloads on data volumes."

Considering I am using a data volume (-v /some/path) to execute the benchmark, a performance degradation of 60% is too much.
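
For reference, a minimal sketch of the comparison being discussed, reusing the opensuse image and the /datavolume path from this thread (the exact dd parameters are illustrative): the first run writes through the storage driver, the second writes to a bind-mounted host directory that bypasses it.

# write via the storage driver (the container's own filesystem)
docker run --rm opensuse dd if=/dev/zero of=/tmp/test.img bs=1M count=100 oflag=dsync
# write via a bind-mounted host directory (bypasses the storage driver)
docker run --rm -v /datavolume:/datavolume opensuse dd if=/dev/zero of=/datavolume/test.img bs=1M count=100 oflag=dsync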

@thaJeztah
Member

@alkmim what is the backing filesystem that /var/lib/docker is on? I see docker is unable to detect it (Backing Filesystem:)

@alkmim
Author

alkmim commented Mar 29, 2016

@thaJeztah: It is ext4 on LVM. I'm not sure why Docker did not detect it.
Output of mount:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=4039308k,nr_inodes=1009827,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
/dev/mapper/system-root on / type ext4 (rw,relatime,data=ordered)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=100)

@unclejack
Contributor

@alkmim Please provide the exact commands you've used to run the container and the benchmarks.

@alkmim
Author

alkmim commented Apr 26, 2016

Hello @unclejack

The container was started running: docker run -v /datavolume:/datavolume opensuse /bin/bash

The benchmark used was Iometer.
Command to start the server (Windows): double-click the Iometer application.
Command to start the target (Linux being tested): ./dynamo -i server_ip -m target_ip
A good tutorial about Iometer can be found here: http://greg.porter.name/wiki/HowTo:iometer
The Iometer configuration file is attached.

Iometer.zip

@unclejack
Contributor

unclejack commented Apr 30, 2016

@alkmim You're using devicemapper with loopback mounted block devices. This is known to have poor performance and it's also potentially unsafe.

You've mentioned that bind mounts were being used with -v, but there's no path argument in any of the commands you've mentioned. The config you've provided also seems to mention Target / [ext4] as the path it's going to use for testing. This root directory is found on the devicemapper block device created for the container on the loopback mounted devicemapper block device. Your test was actually testing the performance of this storage, not the bind mounted directory from the host, based on what you've provided as configuration for iometer.

Loopback mounted block devices have poor performance. This is something to be expected. Docker makes use of the exact same loopback mounted block devices by default for devicemapper. There's nothing Docker itself can do to gain back the loss in performance when using the loopback mounted storage over using the host's disk directly.

You might want to take a look at the official docs on storage drivers to figure out how to get a setup which uses devicemapper on real block devices. You might also want to try to make use of the bind mounted storage for benchmarks.
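
For illustration only (not from the original discussion), a rough sketch of a devicemapper direct-lvm setup for this Docker generation; the device /dev/xvdf and the volume group name are placeholders that would need to match your system:

pvcreate /dev/xvdf
vgcreate docker /dev/xvdf
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
# start the daemon against the thin pool instead of the loopback files
docker daemon --storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool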

There's nothing more to investigate for this issue. I'll close it now. Please feel free to comment.

@alkmim
Author

alkmim commented May 2, 2016

@unclejack: I posted the wrong command and forgot to add the configuration file for the tests on the container. The attachment had just the configuration file for the execution on the host. This is the reason why you saw the tests running on the "/". I'm sorry for this mistake.

I did use the "-v" option followed by the path. The container was started running:
docker run -v /datavolume:/build opensuse /bin/bash

The configuration file I sent was for the test on the host. I forgot to send you the configuration file for the test on the container. I'm sorry for this. Attached are both configuration files. I just changed the "Test Description" field to protect confidential information.

Sorry for the confusion. Could you please re-open the issue?

IometerConfFiles.zip

@unclejack unclejack reopened this May 2, 2016
@rodrigooshiro

I am also noticing that when my containers write data on my host with "docker run -v /workspace:/home/workspace ...", it is much slower than when I run my services directly on the host.

Is this bug already being fixed in Docker?

@rodrigooshiro

rodrigooshiro commented May 17, 2016

This does not happen only on Docker 1.9.1 as the original author posted; it happens on 1.11.1 as well... It's still very easy to replicate the results on my environment and to verify the performance gap between Docker and the host. Here is the information from my environment for comparison:

  1. Docker's Info:
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: btrfs
 Build Version: Btrfs v3.16.2+20141003
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.16.7-35-default
Operating System: openSUSE 13.2 (Harlequin) (x86_64)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 7.814 GiB
ID: PVBJ:ORUD:ERVF:CO3L:GBQU:LQQ3:J6YL:KWKF:E353:7E3N:RLQE:2NTV
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
  2. Running the test 3 times on my host:
# time dd if=/dev/zero of=/workspace/test_host1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.02334 s, 127 kB/s

real    0m4.025s
user    0m0.000s
sys     0m0.588s
# time dd if=/dev/zero of=/workspace/test_host2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.15106 s, 123 kB/s

real    0m4.153s
user    0m0.008s
sys     0m0.140s
# time dd if=/dev/zero of=/workspace/test_host3.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.67524 s, 139 kB/s

real    0m3.677s
user    0m0.008s
sys     0m0.140s
  3. Running the test 3 times on my container:
# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 18.7555 s, 27.3 kB/s

real    0m18.761s
user    0m0.000s
sys     0m0.096s
# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container2.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.9839 s, 28.5 kB/s

real    0m17.985s
user    0m0.004s
sys     0m0.092s

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container3.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.4889 s, 29.3 kB/s

real    0m17.490s
user    0m0.024s
sys     0m0.060s
  4. List of files created on my workspace
# ls -l
total 2102232
-rw-r--r-- 1 root root     512000 May 17 09:24 test_container1.img
-rw-r--r-- 1 root root     512000 May 17 09:25 test_container2.img
-rw-r--r-- 1 root root     512000 May 17 09:24 test_container3.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host1.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host2.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host3.img

@rodrigooshiro

My kernel is: 3.16.7-35
My docker is: 1.11.1-107.1

As the tests above show, the drop from ~100 kB/s to ~30 kB/s impacts my applications when I use Docker containers. I have installed all the latest components available to test this.

@thaJeztah
Member

I don't see these differences (although timing between runs can differ quite a bit);

On the host;

[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.11208 s, 460 kB/s

real    0m1.114s
user    0m0.000s
sys 0m0.075s
[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.05731 s, 484 kB/s

real    0m1.059s
user    0m0.003s
sys 0m0.071s
[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.743486 s, 689 kB/s

real    0m0.745s
user    0m0.005s
sys 0m0.048s

In a container (I added "read-only", to verify that nothing is written to the container's filesystem);

[root@fedora-2gb-ams3-01 ~]# docker run --rm --read-only -it --net=host -v "/workspace:/workspace" opensuse bash


bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.789156 s, 649 kB/s

real    0m0.790s
user    0m0.000s
sys 0m0.048s
bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.901782 s, 568 kB/s

real    0m0.903s
user    0m0.006s
sys 0m0.048s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.04446 s, 490 kB/s

real    0m1.047s
user    0m0.002s
sys 0m0.071s

@rodrigooshiro

Running it exactly like you did, it is still slow:

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.56118 s, 144 kB/s

real 0m3.562s
user 0m0.000s
sys 0m0.140s

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.26732 s, 157 kB/s

real 0m3.268s
user 0m0.000s
sys 0m0.148s

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.57667 s, 112 kB/s

real 0m4.578s
user 0m0.000s
sys 0m0.152s

docker run --rm --read-only -it --net=host -v "/workspace:/workspace" opensuse bash

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 19.3217 s, 26.5 kB/s

real 0m19.324s
user 0m0.000s
sys 0m0.104s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.2857 s, 29.6 kB/s

real 0m17.287s
user 0m0.000s
sys 0m0.092s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.348 s, 29.5 kB/s

real 0m17.349s
user 0m0.000s
sys 0m0.092s

@rodrigooshiro

If I mount /workspace on RAM (tmpfs) or use SSD storage it gets better, but it's expensive to have this for every application. What hardware/infrastructure are you validating this on?

@thaJeztah
Member

This is simply a fresh install of docker on a DigitalOcean droplet, nothing fancy; Fedora 23 (was preparing to reproduce another issue), 2 GB Memory / 40 GB Disk

@rodrigooshiro

That's why I mentioned that using RAM or SSD might give different results. If I look at the plans offered by DigitalOcean, they are selling SSD cloud server plans...

Currently I am running virtual machines with regular disks and physical ones all running on openSUSE.

@thaJeztah
Member

But SSD or not, it's about the difference between running in a container and outside.

@cyphar can you reproduce this on OpenSUSE?

@rodrigooshiro

That's why I am worried... Even if I set it up fine in my environment, it's not like I could ask everybody to adopt an SSD-based setup just to have the same hardware I recommend... I am investigating this on Ubuntu as well, just to make sure it's not openSUSE-specific... But it also happens on SLES 12.

@rodrigooshiro

rodrigooshiro commented May 17, 2016

So, on Ubuntu I got results closer to yours, @thaJeztah, thanks for looking... There was not much impact when running on the host vs. in a container.

Unfortunately there is no aufs driver in Docker for openSUSE/SLES... It looks like something related to the storage driver: either the btrfs one I am experiencing this with, or the devicemapper one @alkmim was using when this issue was reported.

Here is another VM (Ubuntu 14-04LTS) I have set on the same hardware as the openSUSE 13.2's:

  1. Environment
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 2
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.19.0-25-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.954 GiB
ID: KVRL:JOHJ:FJLT:EWZ5:5QSK:LO3E:C52E:ROIX:GGPN:OGO2:PXEB:JWC6
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
  2. Tests
mkdir -p /docker_workspace
rm -f /docker_workspace/*

time dd if=/dev/zero of=/docker_workspace/test_host1.img bs=512 count=1000 oflag=dsync
test1: 512000 bytes (512 kB) copied, 5.3491 s, 95.7 kB/s
test2: 512000 bytes (512 kB) copied, 8.01828 s, 63.9 kB/s
test3: 512000 bytes (512 kB) copied, 8.3278 s, 61.5 kB/s

docker run --rm --net=host --read-only -v "/docker_workspace:/docker_workspace" opensuse bash -c "time dd if=/dev/zero of=/docker_workspace/test_container1.img bs=512 count=1000 oflag=dsync"
test1: 512000 bytes (512 kB) copied, 7.24514 s, 70.7 kB/s
test2: 512000 bytes (512 kB) copied, 8.99948 s, 56.9 kB/s
test3: 512000 bytes (512 kB) copied, 6.52957 s, 78.4 kB/s

@thaJeztah
Member

There was not much impact when running on a host vs a container.

Basically, there should be no impact: when using a bind-mounted directory or a volume, there's nothing between the process and the disk, it's just a mounted directory. The only thing Docker can do is set a constraint (but these are disabled by default), such as:

--device-read-bps=[]          Limit read rate (bytes per second) from a device (e.g., --device-read-bps=/dev/sda:1mb)
--device-read-iops=[]         Limit read rate (IO per second) from a device (e.g., --device-read-iops=/dev/sda:1000)
--device-write-bps=[]         Limit write rate (bytes per second) to a device (e.g., --device-write-bps=/dev/sda:1mb)
--device-write-iops=[]        Limit write rate (IO per second) to a device (e.g., --device-write-bps=/dev/sda:1000) 
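
A hedged usage example of one of these flags (the device path and values are examples only, not taken from this thread):

docker run --rm --device-write-bps /dev/sda:1mb opensuse dd if=/dev/zero of=/test.img bs=1M count=64 oflag=direct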

@rodrigooshiro

There is nothing special about the directory I am using as a shared volume, it's just a regular folder on the host. The only difference I could think of was the storage driver that Docker was using, so I am clueless about how to get this fixed on SuSE. I tried using the constraints, but there was no difference in the transfer rates:

  1. Container running on the openSUSE 13.2 host:
# docker run --rm --net=host --device-read-bps=/dev/sda:1mb --device-write-bps=/dev/sda:1mb --device-read-iops=/dev/sda:1000 -v "/docker_workspace:/docker_workspace" opensuse bash -c "time dd if=/dev/zero of=/docker_workspace/test_container2.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.6024 s, 30.8 kB/s

real    0m16.604s
user    0m0.032s
sys     0m0.060s
  2. openSUSE 13.2 host:
# time dd if=/dev/zero of=/docker_workspace/test_container2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 2.17687 s, 235 kB/s

real    0m2.179s
user    0m0.000s
sys     0m0.128s

Changing "--device-write-bps", etc. makes it hang, so I could not test this one.

@rodrigooshiro

Mounting the shared folder (/workspace) on RAM makes it run instantaneously, but then again, it's an expensive resource, like SSD. I'm still having performance issues when running applications on HDD storage... What puzzles me is how Docker, which is just like a regular process writing to my folder, can be so much slower?

# mount -t tmpfs -o size=1G tmpfs /workspace
# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00166943 s, 307 MB/s

real    0m0.004s
user    0m0.000s
sys     0m0.000s

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00108018 s, 474 MB/s

real    0m0.002s
user    0m0.000s
sys     0m0.000s

# umount /workspace

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.4395 s, 31.1 kB/s

real    0m16.445s
user    0m0.000s
sys     0m0.128s

# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 2.10887 s, 243 kB/s

real    0m2.111s
user    0m0.000s
sys     0m0.128s

@cyphar
Contributor

cyphar commented May 18, 2016

@ipeoshir I can't reproduce this on Tumbleweed. I believe it's probably a kernel issue, since Docker doesn't do anything special with bindmounts. Can you try to reproduce this on an openSUSE distribution with a newer kernel (for example openSUSE Leap or Tumbleweed)? openSUSE 13.2 has very old packages.

% uname -a
Linux gondor 4.5.3-1-default #1 SMP PREEMPT Thu May 5 05:03:39 UTC 2016 (d29747f) x86_64 x86_64 x86_64 GNU/Linux
% lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE project
Description:    openSUSE Tumbleweed (20160514) (x86_64)
Release:        20160514
Codename:       n/a
% docker run --rm --read-only -it --net=host -v /workspace:/workspace opensuse/amd64:tumbleweed sh -c "time dd if=/dev/zero of=/workspace/test bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.06674 s, 167 kB/s

real    0m3.068s
user    0m0.000s
sys     0m0.136s
% sh -c "time dd if=/dev/zero of=/workspace/test bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.07103 s, 167 kB/s

real    0m3.072s
user    0m0.000s
sys     0m0.164s

@cyphar
Contributor

cyphar commented May 31, 2016

Can you run blktrace -d <the device> while the dd is running?
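
For example (assuming /dev/sda is the device backing /workspace; adjust to your system), something along these lines run alongside the dd should show where the time is going:

blktrace -d /dev/sda -o - | blkparse -i -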

@cyphar
Contributor

cyphar commented Jun 1, 2016

@ipeoshir I have some proposed fixes from our kernel team, which were mirrored in the internal ticket. Basically it boils down to three options that can help the performance:

  1. Switch the IO scheduler on the underlying disk to 'deadline' - by that
    you'll completely lose proportional IO weighting between blkio cgroups and
    also some other features of the CFQ IO scheduler. But it may work fine.
    You can do the switch by doing:

echo deadline >/sys/block/<device>/queue/scheduler

  2. A less drastic option - turn off CFQ scheduler idling by:

echo 0 >/sys/block/<device>/queue/iosched/slice_idle
echo 0 >/sys/block/<device>/queue/iosched/group_idle

After that, the CFQ IO scheduler will not wait before switching to serving
another process / blkio cgroup. So performance will not suffer when using
blkio cgroups, but an "IO hungry" cgroup / process can get a disproportionate
amount of IO time compared to a cgroup that does not always have IO ready.

  3. Switch the underlying filesystem to btrfs or XFS.

Using data=journal mode of ext4 as mentioned in <previous comment> has other performance implications (in general the performance is going to be much worse because all the writes happen twice - once to the journal and once to the final location on disk) so I would not consider that an ideal solution.
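
A small sketch (not from the original comment) of how one might check the active scheduler and make option 1 persistent; "sda" is a placeholder device name:

cat /sys/block/sda/queue/scheduler     # the active scheduler is shown in brackets, e.g. "noop deadline [cfq]"
echo deadline > /sys/block/sda/queue/scheduler     # temporary switch (option 1 above)
# to persist across reboots, add elevator=deadline to the kernel boot parameters
# (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the grub config)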

@rodrigooshiro

@cyphar, thanks!
I will try to follow 1) and 2) and see if they fit as solutions for my use cases. Suggestion 3) will not work; we have already tried every filesystem out there to improve the performance, to no avail. I will see if I can have 980615 updated with that in mind; the issue is about the performance in general. The "dd" example was just one point we could find that needed a fix.

@rodrigooshiro

Suggestions for:

  1. works for "dd", I switched to noop, deadline and cfq and could see that only cfq reduces the transfer rates.
  2. it only works when its on cfq, otherwise it outputs "Permission denied", but it has the same results.

However the overall performance is still slower, I can noticed that as described in #23137.

In bug id 980615 we expect to fix the performance in general; I believe the root cause is not addressed in this case. Perhaps we can update it to not mention "ext3 and ext4 journaled filesystems", but then we would have to open another bugzilla anyway... So let's try to fix the root cause and figure out why it's impacting Docker on SuSE.

@cyphar
Contributor

cyphar commented Jun 1, 2016

The problem, @ipeoshir, is that the only solution on the Docker (runC + libcontainer) side is non-trivial (I describe it in opencontainers/runc#861). And on the kernel side, as far as I understand, it's a fundamental aspect of how the CFQ IO scheduler works when weighting different cgroups (this is an upstream kernel thing). I'm not sure why it doesn't appear to happen on Ubuntu, but I'm going to look into why that's the case.

To be clear, the filesystem isn't the cause. It's because of how IO scheduling works in the kernel.

@rodrigooshiro

Well, we'd better wait for a good fix then. I have a more detailed spreadsheet with the same use case running on ext2 in containers, ext3 in containers and ext3 on the host, and during a certain step (package installation) the times measured were 6812, 7206 and 887 respectively, so it's really a bottleneck to proceed using Docker.

@rodrigooshiro

@cyphar, @flavio, I added more detailed information in bugzilla number 983015.

Tracing the versions of docker that can be installed on SLES12, this bug first appeared in "docker-1.5.0-23.1" while version "docker-1.5.0-20.1" and older had no issue with performance lags.

Installing just 34 packages on containers is taking "1m12.571s" (1.5.0-23.1 and newer) against "0m3.614s" (1.5.0-20.1 and older) when this issue is not there. Version 1.5.0-20.1 has the same performance when running on the host, so there could be some way to fix this on docker as this was working fine at some point.

@cyphar
Contributor

cyphar commented Jun 3, 2016

This appears to be something we changed in the SUSE package (my guess is that we backported a patch that caused this). I'm trying to pin down what revision caused this.

@rodrigooshiro

Not sure if it was a patch... I downloaded the binaries directly from Docker, extracted them, and they also had the issue.

@cyphar
Contributor

cyphar commented Jun 3, 2016

@ipeoshir But the -23.1 and -20.1 are the SLE package versions right? You didn't take them from somewhere else?

@rodrigooshiro

rodrigooshiro commented Jun 3, 2016

Yes, on the bugzilla I used solely packages from SLES12 to get the results for you.

But to test the binaries I downloaded from:

** And replaced the RPM binary (in this case on an openSUSE 13.2 system) with those non-patched binaries, just to make sure it was not something from the distribution... So it's probably in Docker's source code. There (openSUSE) we can detect the latency problems between 1.5.0-21.1 and 1.6.0-25.1, available on the update repo (http://download.opensuse.org/repositories/openSUSE:/13.2:/Update/standard).

On SLES12 the difference is between those two package versions I pointed out...

@cyphar
Contributor

cyphar commented Jul 8, 2016

This issue resulted in #24307 (which fixes the odd regression between 1.5.0 packages). In addition, we discovered that the reason Ubuntu doesn't have this problem is that Ubuntu uses the deadline IO scheduler by default (SUSE distributions use CFQ), which doesn't suffer as badly from the blkio cgroup performance issue.

In any case, this issue can be closed (everything has been done on the Docker side that is possible).

/cc @thaJeztah

@LK4D4
Contributor

LK4D4 commented Jul 8, 2016

@cyphar yeah, and this issue is already too big. Thanks for the investigation and fix!
@alkmim @ipeoshir feel free to report another issue if the problem is still there with Docker master.

@LK4D4 LK4D4 closed this as completed Jul 8, 2016
@thaJeztah
Member

thanks @cyphar, and thanks for doing the research

@mimizone

Why is the test outside the container/cgroup not impacted by the bug/slow performance? Isn't the I/O scheduler always in the data path, cgroup or not?
Like described in this great diagram: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram#Diagram_for_Linux_Kernel_4.0

@cyphar
Contributor

cyphar commented Sep 29, 2016

@mimizone According to the kernel guys at SUSE, the reason is that the CFQ IO scheduler will add latency to requests in order to make sure that two racing cgroups don't starve one another. I don't fully understand the code behind it, but that's what they told me and the experimental results back this up (deadline scheduler would never dream of adding latency).
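
As an illustration of that explanation (an assumed reproduction, not something done in this thread), the same dd can be run from inside a cgroup-v1 blkio cgroup without Docker involved at all; under CFQ this should show a similar slowdown:

mkdir -p /sys/fs/cgroup/blkio/iotest
echo $$ > /sys/fs/cgroup/blkio/iotest/tasks     # move the current shell into the new blkio cgroup
time dd if=/dev/zero of=/workspace/test.img bs=512 count=1000 oflag=dsync
echo $$ > /sys/fs/cgroup/blkio/tasks            # move the shell back to the root cgroup
rmdir /sys/fs/cgroup/blkio/iotest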
