
Slow IO performance inside container compared with the host. #21485

Closed
alkmim opened this issue Mar 24, 2016 · 62 comments

Comments

@alkmim

alkmim commented Mar 24, 2016

Output of docker version:

docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   a34a1d5
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

docker info
Containers: 2
Images: 3
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-254:1-4458480-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem:
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 2.52 GB
 Data Space Total: 107.4 GB
 Data Space Available: 96.38 GB
 Metadata Space Used: 2.081 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.145 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.03.01 (2011-10-15)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.6-2-desktop
Operating System: openSUSE 13.2 (Harlequin) (x86_64)
CPUs: 4
Total Memory: 7.722 GiB
Name: gustavo-host
ID: MRRI:5WIP:JOYH:4KZT:BVMU:HMMR:4BL6:6NKP:VM5H:36AN:6LFR:YHK7
WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

  • Physical Environment (8GB RAM, Core i5-4590, ext4)

Steps to reproduce the issue:

  1. Install openSuSE 13.2
  2. Install docker
  3. run Iometer benchmark on the host
  4. run Iometer benchmark on a container based on SuSE. A data volume was used as the partition where the benchmark was executed.
  5. Compare results

Describe the results you received:
Below is a table showing the results. For all tests, the performance inside the container was around 40% of the performance of the host.
[image: table of Iometer results]

Describe the results you expected:
I expected the performance of the container to be closer to that of the host.

@MHBauer
Contributor

MHBauer commented Mar 24, 2016

Is this above the performance degradation described in the docs?

@HackToday
Contributor

Hi @alkmim, it seems you are using a loop device rather than a real device, is that right? Loop devices are slow.

@thaJeztah
Member

@alkmim with "A data volume was used as the partition were to execute the benchmark." do you mean, that a volume is used, e.g. -v /some/path ?

@alkmim
Author

alkmim commented Mar 28, 2016

Hi.

@thaJeztah: Yes.
@MHBauer: The docs do not specify how much the performance degradation will be.
@HackToday: Although I'm not using a real device for the LVM, the benchmark was executed inside a data volume. According to the docs: "One final point, data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you should place heavy write workloads on data volumes."

Considering I am using a data volume (-v /some/path) to execute the benchmark, a performance degradation of 60% is too much.
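
For reference, a minimal sketch of the comparison being discussed, reusing the opensuse image and the /datavolume path from this thread (the exact dd parameters are illustrative): the first run writes through the storage driver, the second writes to a bind-mounted host directory that bypasses it.

# write via the storage driver (the container's own filesystem)
docker run --rm opensuse dd if=/dev/zero of=/tmp/test.img bs=1M count=100 oflag=dsync
# write via a bind-mounted host directory (bypasses the storage driver)
docker run --rm -v /datavolume:/datavolume opensuse dd if=/dev/zero of=/datavolume/test.img bs=1M count=100 oflag=dsync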

@thaJeztah
Member

@alkmim what is the backing filesystem that /var/lib/docker is on? I see docker is unable to detect it (Backing Filesystem:)

@alkmim
Author

alkmim commented Mar 29, 2016

@thaJeztah: It is ext4 on LVM. I'm not sure why Docker did not detect it.
Output of mount:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=4039308k,nr_inodes=1009827,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
/dev/mapper/system-root on / type ext4 (rw,relatime,data=ordered)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=100)

@unclejack
Contributor

@alkmim Please provide the exact commands you've used to run the container and the benchmarks.

@alkmim
Author

alkmim commented Apr 26, 2016

Hello @unclejack

The container was started running: docker run -v /datavolume:/datavolume opensuse /bin/bash

The benchmark used was Iometer.
Command to start the server (Windows): double-click the Iometer application.
Command to start the target (Linux being tested): ./dynamo -i server_ip -m target_ip
A good tutorial about Iometer can be found here: http://greg.porter.name/wiki/HowTo:iometer
The Iometer configuration file is attached.

Iometer.zip

@unclejack
Contributor

unclejack commented Apr 30, 2016

@alkmim You're using devicemapper with loopback mounted block devices. This is known to have poor performance and it's also potentially unsafe.

You've mentioned that bind mounts were being used with -v, but there's no path argument in any of the commands you've mentioned. The config you've provided also seems to mention Target / [ext4] as the path it's going to use for testing. This root directory is found on the devicemapper block device created for the container on the loopback mounted devicemapper block device. Your test was actually testing the performance of this storage, not the bind mounted directory from the host, based on what you've provided as configuration for iometer.

Loopback mounted block devices have poor performance. This is something to be expected. Docker makes use of the exact same loopback mounted block devices by default for devicemapper. There's nothing Docker itself can do to gain back the loss in performance when using the loopback mounted storage over using the host's disk directly.

You might want to take a look at the official docs on storage drivers to figure out how to get a setup which uses devicemapper on real block devices. You might also want to try to make use of the bind mounted storage for benchmarks.
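
For illustration only (not from the original discussion), a rough sketch of a devicemapper direct-lvm setup for this Docker generation; the device /dev/xvdf and the volume group name are placeholders that would need to match your system:

pvcreate /dev/xvdf
vgcreate docker /dev/xvdf
lvcreate --wipesignatures y -n thinpool docker -l 95%VG
lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
# start the daemon against the thin pool instead of the loopback files
docker daemon --storage-driver=devicemapper --storage-opt dm.thinpooldev=/dev/mapper/docker-thinpool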

There's nothing more to investigate for this issue. I'll close it now. Please feel free to comment.

@alkmim
Author

alkmim commented May 2, 2016

@unclejack: I posted the wrong command and forgot to add the configuration file for the tests on the container. The attachment had just the configuration file for the execution on the host. This is the reason why you saw the tests running on the "/". I'm sorry for this mistake.

I did use the "-v" option followed by the path. The container was started running:
docker run -v /datavolume:/build opensuse /bin/bash

The configuration file I sent was for the test on the host. I forgot to send you the configuration file for the test on the container. I'm sorry for this. Attached are both configuration files. I just changed the "Test Description" field to protect confidential information.

Sorry for the confusion. Could you please re-open the issue?

IometerConfFiles.zip

@unclejack unclejack reopened this May 2, 2016
@rodrigooshiro

I am also noticing that when my containers write data on my host with "docker run -v /workspace:/home/workspace ...", it is much slower than when I run my services directly on the host.

Is this bug already being fixed in Docker?

@rodrigooshiro

rodrigooshiro commented May 17, 2016

This does not happen only on Docker 1.9.1 as the original author posted; it happens on 1.11.1 as well... It's still very easy to replicate the results on my environment and to verify the performance gap between Docker and the host. Here is the information from my environment for comparison:

  1. Docker's Info:
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: btrfs
 Build Version: Btrfs v3.16.2+20141003
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 3.16.7-35-default
Operating System: openSUSE 13.2 (Harlequin) (x86_64)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 7.814 GiB
ID: PVBJ:ORUD:ERVF:CO3L:GBQU:LQQ3:J6YL:KWKF:E353:7E3N:RLQE:2NTV
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
WARNING: No kernel memory limit support
  2. Running the test 3 times on my host:
# time dd if=/dev/zero of=/workspace/test_host1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.02334 s, 127 kB/s

real    0m4.025s
user    0m0.000s
sys     0m0.588s
# time dd if=/dev/zero of=/workspace/test_host2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.15106 s, 123 kB/s

real    0m4.153s
user    0m0.008s
sys     0m0.140s
# time dd if=/dev/zero of=/workspace/test_host3.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.67524 s, 139 kB/s

real    0m3.677s
user    0m0.008s
sys     0m0.140s
  3. Running the test 3 times on my container:
# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 18.7555 s, 27.3 kB/s

real    0m18.761s
user    0m0.000s
sys     0m0.096s
# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container2.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.9839 s, 28.5 kB/s

real    0m17.985s
user    0m0.004s
sys     0m0.092s

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/test_container3.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.4889 s, 29.3 kB/s

real    0m17.490s
user    0m0.024s
sys     0m0.060s
  4. List of files created on my workspace
# ls -l
total 2102232
-rw-r--r-- 1 root root     512000 May 17 09:24 test_container1.img
-rw-r--r-- 1 root root     512000 May 17 09:25 test_container2.img
-rw-r--r-- 1 root root     512000 May 17 09:24 test_container3.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host1.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host2.img
-rw-r--r-- 1 root root     512000 May 17 09:23 test_host3.img

@rodrigooshiro

My kernel is: 3.16.7-35
My docker is: 1.11.1-107.1

As the tests above show, the drop from ~100 kB/s to ~30 kB/s impacts my applications when I use Docker containers. I have installed all the latest components available to test this.

@thaJeztah
Member

I don't see these differences (although timing between runs can differ quite a bit);

On the host;

[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.11208 s, 460 kB/s

real    0m1.114s
user    0m0.000s
sys 0m0.075s
[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.05731 s, 484 kB/s

real    0m1.059s
user    0m0.003s
sys 0m0.071s
[root@fedora-2gb-ams3-01 ~]# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.743486 s, 689 kB/s

real    0m0.745s
user    0m0.005s
sys 0m0.048s

In a container (I added "read-only", to verify that nothing is written to the container's filesystem);

[root@fedora-2gb-ams3-01 ~]# docker run --rm --read-only -it --net=host -v "/workspace:/workspace" opensuse bash


bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.789156 s, 649 kB/s

real    0m0.790s
user    0m0.000s
sys 0m0.048s
bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.901782 s, 568 kB/s

real    0m0.903s
user    0m0.006s
sys 0m0.048s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 1.04446 s, 490 kB/s

real    0m1.047s
user    0m0.002s
sys 0m0.071s

@rodrigooshiro

Running it exactly like you did, it is still slow:

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.56118 s, 144 kB/s

real 0m3.562s
user 0m0.000s
sys 0m0.140s

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 3.26732 s, 157 kB/s

real 0m3.268s
user 0m0.000s
sys 0m0.148s

rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync

1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 4.57667 s, 112 kB/s

real 0m4.578s
user 0m0.000s
sys 0m0.152s

docker run --rm --read-only -it --net=host -v "/workspace:/workspace" opensuse bash

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 19.3217 s, 26.5 kB/s

real 0m19.324s
user 0m0.000s
sys 0m0.104s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.2857 s, 29.6 kB/s

real 0m17.287s
user 0m0.000s
sys 0m0.092s

bash-4.2# rm -f /workspace/* && time dd if=/dev/zero of=/workspace/test_container1.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 17.348 s, 29.5 kB/s

real 0m17.349s
user 0m0.000s
sys 0m0.092s

@rodrigooshiro

If I mount /workspace on RAM (tmpfs) or use SSD storage it gets better, but it's expensive to have this for every application. What hardware/infrastructure are you validating this on?

@thaJeztah
Member

This is simply a fresh install of docker on a DigitalOcean droplet, nothing fancy; Fedora 23 (was preparing to reproduce another issue), 2 GB Memory / 40 GB Disk

@rodrigooshiro

That's why I mentioned that using RAM or SSD might give different results. If I look at the plans offered by DigitalOcean, they are selling SSD cloud server plans...

Currently I am running virtual machines with regular disks and physical ones all running on openSUSE.

@thaJeztah
Member

But SSD or not, it's about the difference between running in a container and outside.

@cyphar can you reproduce this on OpenSUSE?

@rodrigooshiro

That's why I am worried... Even if I set it up fine in my environment, it's not like I could ask everybody to adopt an SSD-based setup just to have the same hardware I recommend... I am investigating this on Ubuntu as well, just to make sure it's not openSUSE-specific... But it also happens on SLES 12.

@rodrigooshiro

rodrigooshiro commented May 17, 2016

So, on Ubuntu I got results closer to yours, @thaJeztah, thanks for looking... There was not much impact when running on the host vs. in a container.

Unfortunately there is no aufs driver in Docker for openSUSE/SLES... It looks like something related to the storage driver: either the btrfs one I am experiencing this with, or the devicemapper one @alkmim was using when this issue was reported.

Here is another VM (Ubuntu 14-04LTS) I have set on the same hardware as the openSUSE 13.2's:

  1. Environment
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 2
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 3.19.0-25-generic
Operating System: Ubuntu 14.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.954 GiB
ID: KVRL:JOHJ:FJLT:EWZ5:5QSK:LO3E:C52E:ROIX:GGPN:OGO2:PXEB:JWC6
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
  2. Tests
mkdir -p /docker_workspace
rm -f /docker_workspace/*

time dd if=/dev/zero of=/docker_workspace/test_host1.img bs=512 count=1000 oflag=dsync
test1: 512000 bytes (512 kB) copied, 5.3491 s, 95.7 kB/s
test2: 512000 bytes (512 kB) copied, 8.01828 s, 63.9 kB/s
test3: 512000 bytes (512 kB) copied, 8.3278 s, 61.5 kB/s

docker run --rm --net=host --read-only -v "/docker_workspace:/docker_workspace" opensuse bash -c "time dd if=/dev/zero of=/docker_workspace/test_container1.img bs=512 count=1000 oflag=dsync"
test1: 512000 bytes (512 kB) copied, 7.24514 s, 70.7 kB/s
test2: 512000 bytes (512 kB) copied, 8.99948 s, 56.9 kB/s
test3: 512000 bytes (512 kB) copied, 6.52957 s, 78.4 kB/s

@thaJeztah
Member

There was not much impact when running on a host vs a container.

Basically, there should be no impact: when using a bind-mounted directory or a volume, there's nothing between the process and the disk, it's just a mounted directory. The only thing Docker can do is set a constraint (but these are disabled by default), such as:

--device-read-bps=[]          Limit read rate (bytes per second) from a device (e.g., --device-read-bps=/dev/sda:1mb)
--device-read-iops=[]         Limit read rate (IO per second) from a device (e.g., --device-read-iops=/dev/sda:1000)
--device-write-bps=[]         Limit write rate (bytes per second) to a device (e.g., --device-write-bps=/dev/sda:1mb)
--device-write-iops=[]        Limit write rate (IO per second) to a device (e.g., --device-write-bps=/dev/sda:1000) 
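
A hedged usage example of one of these flags (the device path and values are examples only, not taken from this thread):

docker run --rm --device-write-bps /dev/sda:1mb opensuse dd if=/dev/zero of=/test.img bs=1M count=64 oflag=direct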

@rodrigooshiro

There is nothing special about the directory I am using as a shared volume, it's just a regular folder on the host. The only difference I could think of was the storage driver that Docker was using, so I am clueless about how to get this fixed on SuSE. I tried using the constraints, but there was no difference in the transfer rates:

  1. Container running on the openSUSE 13.2 host:
# docker run --rm --net=host --device-read-bps=/dev/sda:1mb --device-write-bps=/dev/sda:1mb --device-read-iops=/dev/sda:1000 -v "/docker_workspace:/docker_workspace" opensuse bash -c "time dd if=/dev/zero of=/docker_workspace/test_container2.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.6024 s, 30.8 kB/s

real    0m16.604s
user    0m0.032s
sys     0m0.060s
  2. openSUSE 13.2 host:
# time dd if=/dev/zero of=/docker_workspace/test_container2.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 2.17687 s, 235 kB/s

real    0m2.179s
user    0m0.000s
sys     0m0.128s

Changing "--device-write-bps", etc. makes it hang, so I could not test this one.

@rodrigooshiro

Mounting the shared folder (/workspace) on RAM makes it run instantaneously, but then again, it's an expensive resource, like SSD. I'm still having performance issues when running applications on HDD storage... What puzzles me is how Docker, which is just like a regular process writing to my folder, can be so much slower?

# mount -t tmpfs -o size=1G tmpfs /workspace
# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00166943 s, 307 MB/s

real    0m0.004s
user    0m0.000s
sys     0m0.000s

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 0.00108018 s, 474 MB/s

real    0m0.002s
user    0m0.000s
sys     0m0.000s

# umount /workspace

# docker run --rm --net=host -v "/workspace:/workspace" opensuse bash -c "time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 16.4395 s, 31.1 kB/s

real    0m16.445s
user    0m0.000s
sys     0m0.128s

# time dd if=/dev/zero of=/workspace/image.img bs=512 count=1000 oflag=dsync
1000+0 records in
1000+0 records out
512000 bytes (512 kB) copied, 2.10887 s, 243 kB/s

real    0m2.111s
user    0m0.000s
sys     0m0.128s

@cyphar
Contributor

cyphar commented May 18, 2016

@ipeoshir I can't reproduce this on Tumbleweed. I believe it's probably a kernel issue, since Docker doesn't do anything special with bindmounts. Can you try to reproduce this on an openSUSE distribution with a newer kernel (for example openSUSE Leap or Tumbleweed)? openSUSE 13.2 has very old packages.

% uname -a
Linux gondor 4.5.3-1-default #1 SMP PREEMPT Thu May 5 05:03:39 UTC 2016 (d29747f) x86_64 x86_64 x86_64 GNU/Linux
% lsb_release -a
LSB Version:    n/a
Distributor ID: openSUSE project
Description:    openSUSE Tumbleweed (20160514) (x86_64)
Release:        20160514
Codename:       n/a
% docker run --rm --read-only -it --net=host -v /workspace:/workspace opensuse/amd64:tumbleweed sh -c "time dd if=/dev/zero of=/workspace/test bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.06674 s, 167 kB/s

real    0m3.068s
user    0m0.000s
sys     0m0.136s
% sh -c "time dd if=/dev/zero of=/workspace/test bs=512 count=1000 oflag=dsync"
1000+0 records in
1000+0 records out
512000 bytes (512 kB, 500 KiB) copied, 3.07103 s, 167 kB/s

real    0m3.072s
user    0m0.000s
sys     0m0.164s

@cyphar
Contributor

cyphar commented May 31, 2016

Can you run blktrace -d <the device> while the dd is running?
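
For example (assuming /dev/sda is the device backing /workspace; adjust to your system), something along these lines run alongside the dd should show where the time is going:

blktrace -d /dev/sda -o - | blkparse -i -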

@cyphar
Contributor

cyphar commented Jun 1, 2016

@ipeoshir I have some proposed fixes from our kernel team, which were mirrored in the internal ticket. Basically it boils down to three options that can help the performance:

  1. Switch the IO scheduler on the underlying disk to 'deadline' - by that
    you'll completely lose proportional IO weighting between blkio cgroups and
    also some other features of the CFQ IO scheduler. But it may work fine.
    You can do the switch by doing:

echo deadline >/sys/block/<device>/queue/scheduler

  2. A less drastic option - turn off CFQ scheduler idling by:

echo 0 >/sys/block/<device>/queue/iosched/slice_idle
echo 0 >/sys/block/<device>/queue/iosched/group_idle

After that, the CFQ IO scheduler will not wait before switching to serving
another process / blkio cgroup. So performance will not suffer when using
blkio cgroups, but an "IO hungry" cgroup / process can get a disproportionate
amount of IO time compared to a cgroup that does not always have IO ready.

  3. Switch the underlying filesystem to btrfs or XFS.

Using data=journal mode of ext4 as mentioned in <previous comment> has other performance implications (in general the performance is going to be much worse because all the writes happen twice - once to the journal and once to the final location on disk) so I would not consider that an ideal solution.
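
A small sketch (not from the original comment) of how one might check the active scheduler and make option 1 persistent; "sda" is a placeholder device name:

cat /sys/block/sda/queue/scheduler     # the active scheduler is shown in brackets, e.g. "noop deadline [cfq]"
echo deadline > /sys/block/sda/queue/scheduler     # temporary switch (option 1 above)
# to persist across reboots, add elevator=deadline to the kernel boot parameters
# (e.g. GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the grub config)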

@rodrigooshiro

@cyphar, thanks!
I will try to follow 1) and 2) and see if they fit as solutions for my use cases. Suggestion 3) will not work; we have already tried every filesystem out there to improve the performance, to no avail. I will see if I can have 980615 updated with that in mind; the issue is about the performance in general. The "dd" example was just one point we could find that needed a fix.

@rodrigooshiro

Suggestions for:

  1. works for "dd", I switched to noop, deadline and cfq and could see that only cfq reduces the transfer rates.
  2. it only works when its on cfq, otherwise it outputs "Permission denied", but it has the same results.

However the overall performance is still slower, I can noticed that as described in #23137.

In bug id 980615 we expect to fix the performance in general; I believe the root cause is not addressed in this case. Perhaps we can update it to not mention "ext3 and ext4 journaled filesystems", but then we would have to open another bugzilla anyway... So let's try to fix the root cause and figure out why it's impacting Docker on SuSE.

@cyphar
Contributor

cyphar commented Jun 1, 2016

The problem, @ipeoshir, is that the only solution on the Docker (runC + libcontainer) side is non-trivial (I describe it in opencontainers/runc#861). And on the kernel side, as far as I understand, it's a fundamental aspect of how the CFQ IO scheduler works when weighting different cgroups (this is an upstream kernel thing). I'm not sure why it doesn't appear to happen on Ubuntu, but I'm going to look into why that's the case.

To be clear, the filesystem isn't the cause. It's because of how IO scheduling works in the kernel.

@rodrigooshiro

Well, we'd better wait for a good fix then. I have a more detailed spreadsheet with the same use case running on ext2 in containers, ext3 in containers and ext3 on the host, and during a certain step (package installation) the times measured were 6812, 7206 and 887 respectively, so it's really a bottleneck to proceed using Docker.

@rodrigooshiro

@cyphar, @flavio, I added more detailed information in bugzilla number 983015.

Tracing the versions of docker that can be installed on SLES12, this bug first appeared in "docker-1.5.0-23.1" while version "docker-1.5.0-20.1" and older had no issue with performance lags.

Installing just 34 packages on containers is taking "1m12.571s" (1.5.0-23.1 and newer) against "0m3.614s" (1.5.0-20.1 and older) when this issue is not there. Version 1.5.0-20.1 has the same performance when running on the host, so there could be some way to fix this on docker as this was working fine at some point.

@cyphar
Contributor

cyphar commented Jun 3, 2016

This appears to be something we changed in the SUSE package (my guess is that we backported a patch that caused this). I'm trying to pin down what revision caused this.

@rodrigooshiro

Not sure if it was a patch... I downloaded the binaries directly from Docker, extracted them, and they also had the issue.

@cyphar
Contributor

cyphar commented Jun 3, 2016

@ipeoshir But the -23.1 and -20.1 are the SLE package versions right? You didn't take them from somewhere else?

@rodrigooshiro

rodrigooshiro commented Jun 3, 2016

Yes, on the bugzilla I used solely packages from SLES12 to get the results for you.

But to test the binaries I downloaded from:

** And replaced the RPM binary (in this case on an openSUSE 13.2 system) with those non-patched binaries, just to make sure it was not something from the distribution... So it's probably in Docker's source code. There (openSUSE) we can detect the latency problems between 1.5.0-21.1 and 1.6.0-25.1, available on the update repo (http://download.opensuse.org/repositories/openSUSE:/13.2:/Update/standard).

On SLES12 the difference is between those two package versions I pointed out...

@cyphar
Contributor

cyphar commented Jul 8, 2016

This issue resulted in #24307 (which fixes the odd regression between 1.5.0 packages). In addition, we discovered that the reason Ubuntu doesn't have this problem is that Ubuntu uses the deadline IO scheduler by default (SUSE distributions use CFQ), which doesn't suffer as badly from the blkio cgroup performance issue.

In any case, this issue can be closed (everything has been done on the Docker side that is possible).

/cc @thaJeztah

@LK4D4
Contributor

LK4D4 commented Jul 8, 2016

@cyphar yeah, and this issue is already too big. Thanks for the investigation and fix!
@alkmim @ipeoshir feel free to report another issue if the problem is still there with Docker master.

@LK4D4 LK4D4 closed this as completed Jul 8, 2016
@thaJeztah
Member

thanks @cyphar, and thanks for doing the research

@mimizone

Why is the test outside the container/cgroup not impacted by the bug/slow performance? Isn't the I/O scheduler always in the data path, cgroup or not?
Like described in this great diagram: https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram#Diagram_for_Linux_Kernel_4.0

@cyphar
Contributor

cyphar commented Sep 29, 2016

@mimizone According to the kernel guys at SUSE, the reason is that the CFQ IO scheduler will add latency to requests in order to make sure that two racing cgroups don't starve one another. I don't fully understand the code behind it, but that's what they told me and the experimental results back this up (deadline scheduler would never dream of adding latency).
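
As an illustration of that explanation (an assumed reproduction, not something done in this thread), the same dd can be run from inside a cgroup-v1 blkio cgroup without Docker involved at all; under CFQ this should show a similar slowdown:

mkdir -p /sys/fs/cgroup/blkio/iotest
echo $$ > /sys/fs/cgroup/blkio/iotest/tasks     # move the current shell into the new blkio cgroup
time dd if=/dev/zero of=/workspace/test.img bs=512 count=1000 oflag=dsync
echo $$ > /sys/fs/cgroup/blkio/tasks            # move the shell back to the root cgroup
rmdir /sys/fs/cgroup/blkio/iotest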
