Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance, incomplete file content, broken filesystem events with gRPC FUSE #4999

Closed
2 tasks done
jgonera opened this issue Oct 16, 2020 · 16 comments
Closed
2 tasks done

Comments

@jgonera
Copy link

jgonera commented Oct 16, 2020

I have a project that runs 5 containers, 3 of them have mounted volumes. It's all in TypeScript and using filesystem events (CRA, Gatsby, ts-node-dev). When using gRPC FUSE performance is noticeably slower when restarting any of the servers but most importantly file syncing and filesystem events randomly stop working. Very often it happens after modifying one file just a few times.

My apps either stop picking up file changes or crash because somehow they get incomplete content for that modified file (as if the filesystem event fired before the file finished syncing?). The only remedy seems to be restarting the container (docker-compose down && docker-compose up). Sometimes even that doesn't help and I have to restart the Docker daemon.

For the diagnostics ID pasted below I started docker-compose, changed the file, saved, the server picked up a change and restarted. Then I undid the change, saved, the server picked it up. Then I redid the change, the server no longer picked up the change. From now on no change caused filesystem events to fire. Doing docker-compose exec XYZ cat /path/to/file showed that the file was updated after each change though.

This is just one of the problems. Sometimes it seems like the file is "stuck" with incomplete content and no matter how many times I save it, the server crashes when receiving the filesystem event and trying to reload.

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: D1FBCD8E-C774-4CC8-9AD3-5FC01B1984F6/20201016093818

I first mentioned this problem in docker/roadmap#7 (comment)

Expected behavior

File changes are picked up, file content is complete and up-to-date, filesystem events fire. Performance is on par or better than osxfs.

Actual behavior

None of the above happens. Performance is worse than osxfs.

Information

  • macOS Version: 10.15.7

Diagnostic logs

Docker for Mac: 2.4.0.0

There's no "Diagnose & Feedback" in the menu so I'm not sure what else to provide in this section.

Steps to reproduce the behavior

Since the problem occurs randomly it's hard for me to find a minimal case to reproduce. I can provide my Dockerfiles but they're basically pretty small Alpine images with a few packages installed and npm ci.

@stephen-turner
Copy link
Contributor

Thanks for the report, @jgonera.
@djs55 PTAL.

@partydrone
Copy link

partydrone commented Oct 19, 2020

I have a Rails project experiencing the same issue—code changes on the host are not reflected inside the container.

Information

  • Latest MacBook Pro running macOS 10.15.7
  • Docker Desktop for Mac 2.4.0.0
  • Diagnostics ID D2127B6D-D216-46C2-9362-86F15716D446/20201019074249

@djs55
Copy link
Contributor

djs55 commented Oct 25, 2020

Thanks for the report. Regarding incomplete content do you mean one single file is truncated or corrupted or do you mean some files are current while some files are out of date? When the server crashes, what kind of crash is it? Is the server getting a file I/O error or something else?

If it's the case that a single file is truncated or corrupted, I wonder if the event injection itself is causing incomplete data to be cached in the VM. In case you have time to experiment I've created a build which only injects ATTRIB events when files are modified which should help rule out this possibility at least: https://desktop-stage.docker.com/mac/edge/49158/Docker.dmg

To help me better understand how I could reproduce this, could you let me know

  • what editor you use to change the files
  • what the directory structure of your project looks like and how this is represented in the docker-compose.yml?

Thanks for your help!

@zorac
Copy link

zorac commented Oct 28, 2020

Another anecdotal data point: I've seen occasional (on the order of daily) cases where individual files on a volume mount become inaccessible with the error operation not permitted, open; the rest of the volume is fine, but the same file is inaccessible across multiple containers with the same mount. The container which picks up the error is running webpack-dev-server with hot reloading, so filesystem scanning and automated rebuilding. The files the error is triggering on don't appear to be ones I'm manually editing, but might be due to a git branch change. I'm switching back to osxfs to see if that clears the problem.

@jlefley
Copy link

jlefley commented Oct 30, 2020

I also experienced an OS Error: Operation not permitted, errno = 1 with Dart build_runner running in a Dart container on MacOS. I never saw this error before the new version of Docker for Mac. I restarted Docker and the services and the error went away.

I also sometimes notice a long delay when starting a container. Previously, docker-compose run would always start running the command right away. After upgrading, it sometimes will take 10s before the actual command starts running in the container. After a restart of Docker, this issue also went away.

@arusahni
Copy link

arusahni commented Nov 4, 2020

Another piece of anecdata:

I've had this same issue with Webpack + tsc in a container watching source files on my host. I'll frequently get errors about files already being open, etc. I found that saving the files again would let me get past the error (as if it were finally detecting the appropriate state).

I was hoping 2.5.0.0 would address the issue. It didn't. Since disabling the gRPC FUSE experiment, I haven't experienced this.

@airdrummingfool
Copy link

I'm also seeing some very weird file behavior in a container with gRPC FUSE turned on. Files "disappear" and can't be listed or deleted, but they can be stated, sometimes cated, and can prevent e.g. rm -rf from completing.

I can reproduce this fairly reliably by trying to remove my PHP project's Composer vendor directory with rm -rf vendor/ from within the container.

Here's an example of a disappearing file.

(container) Try to rm -rf a vendor directory, which fails to remove some files:

root@d11c3b24b44e:/project# rm -rf vendor/
rm: cannot remove 'vendor/mustache/mustache/src/Mustache/Exception': Directory not empty
rm: cannot remove 'vendor/mustache/mustache/src/Mustache/Loader': Directory not empty

(container) Trying to list what files remain shows an empty directory:

root@d11c3b24b44e:/project# ls -hal vendor/mustache/mustache/src/Mustache/Exception
total 0
drwxr-xr-x 9 root root 288 Nov 23  2019 .
drwxr-xr-x 6 root root 192 Nov  7 07:56 ..

(host) from the host, I can see that a few files remain:

Tommy@Tommys-MBP-2 Zend-Backend % ls -hal vendor/mustache/mustache/src/Mustache/Exception
total 56
drwxr-xr-x  9 Tommy  staff   288B Nov 23  2019 .
drwxr-xr-x  6 Tommy  staff   192B Nov  7 01:56 ..
-rw-r--r--  1 Tommy  staff   418B Nov 23  2019 InvalidArgumentException.php
-rw-r--r--  1 Tommy  staff   387B Nov 23  2019 LogicException.php
-rw-r--r--  1 Tommy  staff   393B Nov 23  2019 RuntimeException.php
-rw-r--r--  1 Tommy  staff   921B Nov 23  2019 SyntaxException.php
-rw-r--r--  1 Tommy  staff   965B Nov 23  2019 UnknownFilterException.php
-rw-r--r--  1 Tommy  staff   965B Nov 23  2019 UnknownHelperException.php
-rw-r--r--  1 Tommy  staff   987B Nov 23  2019 UnknownTemplateException.php

(container) if I pick one of those files and stat it in the container, I get a normal result:

root@d11c3b24b44e:/project# stat vendor/mustache/mustache/src/Mustache/Exception/InvalidArgumentException.php
  File: vendor/mustache/mustache/src/Mustache/Exception/InvalidArgumentException.php
  Size: 418             Blocks: 8          IO Block: 4096   regular file
Device: 4ah/74d Inode: 29984449    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-11-23 21:40:31.000000000 +0000
Modify: 2019-11-23 21:40:31.000000000 +0000
Change: 2020-11-07 07:46:42.133150556 +0000
 Birth: -

(container) if I try to cat the file, it's a tossup whether it will succeed or fail. I've seen it succeed once and then fail 2 seconds later, and I've also seen it succeed 10 times in a row.

If I turn off gRPC FUSE, I cannot reproduce this behavior.

MacOS 10.15.7 on a 2019 MBP. Docker Desktop 2.5.0.0.

@djs55
Copy link
Contributor

djs55 commented Nov 15, 2020

I have a new experiment build which may help with file truncation problems: https://desktop-stage.docker.com/mac/edge/49788/Docker.dmg Unfortunately it hasn't been notarized.

Regarding other issues, if anyone could provide a repro case that would be very helpful.

@stephen-turner
Copy link
Contributor

Has anyone tried this experimental build? We'd love to know if it fixed the problem.

@USSliberty
Copy link

Has anyone tried this experimental build? We'd love to know if it fixed the problem.

Installed today. Let's see if something comes along ;)

@USSliberty
Copy link

Sorry but I have still the same problems that I have with the stable version.

What is happening to me is:

  1. I enter into my container using the command: docker start container-name && docker exec -it container-name /bin/bash
  2. For a while everything looking good, no sync problems.
  3. Out of blue it starts to losing sync, and for getting the updated files I need to exit from the container's shell and re-enter with command in point 1). From now on I need to exit and re-enter if I want to see the files updated.

Here it is an example of my docker-compose file:

 container-name:
    image: image_name
    container_name: container-name
    build:
      context: .
    volumes:
      - ../:/opt/acme/code/:delegated
      - /opt/acme/code/node_modules/
      - ./aliases.sh:/opt/acme/aliases.sh
    # Overrides default command so things don't shut down after the process ends.
    command: sleep infinity
    ports:
    - 8080:8080
    - 9229:9229
    - 9615:9615
    networks:
    - acme
    depends_on:
    - container-name-2

I uploaded a Diagnostic ID: 98D8C14C-888F-4302-A2D2-E27FEC5B717C/20201117171248

I have a macOS BigSur 11.0.1 (20B29) and Docker Desktop 2.5.1 (49788)

I hope this helps, let me know if you need some other infos.

@Overbryd
Copy link

Overbryd commented Dec 8, 2020

Turning gRPC fuse off, and switching back to osxfs solves the problem for now.

@lraphael
Copy link

lraphael commented Dec 11, 2020

I have the same problem that often after startup and a few seconds, mounted files appear empty. It seems that it recognizes the file as a folder. I have tried the following variations:

volumes:
  - ${DEV_SSH_PATH:-~/.ssh}:/app/.ssh
  - ./asix/backend/:/app/
  - ${DEV_DB_DOWNLOADS:-./dbs}:/_downloads:delegated
  - ./media/asix:/app/media
  - ./constants.backend.py:/app/project/app_constants/constants.py

volumes:
  - ${DEV_SSH_PATH:-~/.ssh}:/app/.ssh
  - ./asix/backend/:/app/
  - ${DEV_DB_DOWNLOADS:-./dbs}:/_downloads:delegated
  - ./media/asix:/app/media
  - type: bind
    source: ${PWD}/constants.backend.py
    target: /app/project/app_constants/constants.py
    read_only: true

Docker versions: 2.5.1, 3.0.0, 3.0.1
Mac Version: 10.15.7

log:

2020-12-11 17:10:10.121323+0100  localhost com.docker.backend[4351]: external: GET /forwards/list 200 "Go-http-client/1.1" ""
2020-12-11 17:10:10.121507+0100  localhost com.docker.driver.amd64-linux[4437]: grpcfuseClient.Add(2813c024184a7b2710308e6c8594009ca65b4c3198ffc408dabf010b31f532e2, /backend.asix.test, [host=/Users/ra/dev/asix-dev-stack/media/asix,VM=/host_mnt/Users/ra/dev/asix-dev-stack/media/asix,dst=/app/media,option=rw host=/Users/ra/dev/asix-dev-stack/asix/backend,VM=/host_mnt/Users/ra/dev/asix-dev-stack/asix/backend,dst=/app,option=rw host=/Users/ra/.ssh,VM=/host_mnt/Users/ra/.ssh,dst=/app/.ssh,option=rw host=/Users/ra/dev/_downloads,VM=/host_mnt/Users/ra/dev/_downloads,dst=/_downloads,option=delegated host=/Users/ra/dev/asix-dev-stack/constants.backend.py,VM=/host_mnt/Users/ra/dev/asix-dev-stack/constants.backend.py,dst=/app/project/app_constants/constants.py,option=])
2020-12-11 17:10:10.121859+0100  localhost com.docker.backend[4351]: watching path /Users/ra/.ssh/
2020-12-11 17:10:10.122359+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/.ssh/
2020-12-11 17:10:10.343512+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/.ssh/ took 221.062237ms
2020-12-11 17:10:10.343601+0100  localhost com.docker.backend[4351]: watching path /Users/ra/dev/_downloads/
2020-12-11 17:10:10.344082+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/dev/_downloads/
2020-12-11 17:10:10.554295+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/dev/_downloads/ took 210.100777ms
2020-12-11 17:10:10.554378+0100  localhost com.docker.backend[4351]: watching path /Users/ra/dev/asix-dev-stack/asix/backend/
2020-12-11 17:10:10.554808+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/dev/asix-dev-stack/asix/backend/
2020-12-11 17:10:10.759113+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/dev/asix-dev-stack/asix/backend/ took 204.21518ms
2020-12-11 17:10:10.759190+0100  localhost com.docker.backend[4351]: watching path /Users/ra/dev/asix-dev-stack/constants.backend.py/
2020-12-11 17:10:10.759747+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/dev/asix-dev-stack/constants.backend.py/
2020-12-11 17:10:10.973464+0100  localhost com.docker.backend[4351]: invalidating caches for /Users/ra/dev/asix-dev-stack/constants.backend.py/ took 213.606792ms
2020-12-11 17:10:10.973548+0100  localhost com.docker.backend[4351]: unwatching path /Users/ra/dev/asix-dev-stack/asix/backend/docker/templatefiles/psql/
2020-12-11 17:10:10.973822+0100  localhost com.docker.backend[4351]: unwatching path /Users/ra/dev/asix-dev-stack/asix/backend/log/nginx/
2020-12-11 17:10:10.974030+0100  localhost com.docker.backend[4351]: unwatching path /Users/ra/dev/asix-dev-stack/asix/backend/static/

If I turn off gRPC, then it works.

@jgonera
Copy link
Author

jgonera commented Dec 29, 2020

I'm sorry I don't have more time for debugging. I can only say that this is still happening for me in Docker Desktop 3.0.3. Additionally, osxfuse stopped propagating file system events as well for some reason.

Out of desperation I set up https://mutagen.io/documentation/orchestration/compose (see also mutagen-io/mutagen#235) which seems to be working so far. I have to run my apps as root in development because of mutagen-io/mutagen#224 but for now it's a small price to pay for a working dev environment.

@docker-robott
Copy link
Collaborator

Issues go stale after 90 days of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@docker-robott
Copy link
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators May 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests