Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file operations fail randomly when many (thousands) files are copied #2780

Closed
heldchen opened this issue Dec 22, 2017 · 18 comments
Closed

file operations fail randomly when many (thousands) files are copied #2780

heldchen opened this issue Dec 22, 2017 · 18 comments

Comments

@heldchen
Copy link

heldchen commented Dec 22, 2017

starting in recent builds, actions involving many (possibly concurrent) file create/copy/move IO randomly fail.

  • Your Windows build number: (Type ver at a Windows Command Prompt)

17040 + 17063

  • What you're doing and what's happening: (Copy&paste specific commands and their output, or include screen shots)

observed in the following cases:

  • composer install on large projects (f.e. Magento 2.2, involves 42'000 files)
  • bin/magento setup:static-content:deploy (recompiles magento static files, deletes & generates 19'000 files)
  • npm install
  • yarn install (proprietary project, involves 20'000 files)

in all cases the errors are looking similar, but happen randomly (each time with different files/folders) and not always. in all cases, the files actually exist, despite what the error messages say. a 2nd run of the same command sometimes succeeds, sometimes fails at a different place. I've never seen failure at the same place.

some examples:

error An unexpected error occurred: "ENOENT: no such file or directory, copyfile '/usr/local/share/.cache/yarn/v1/npm-ckeditor-dev-4.8.0-ab350a8a3e2282ed769b5c9413fcf468d98b5163/plugins/devtools/lang/pt-br.js' -> '/mnt/t/someproject/src/node_modules/ckeditor-dev/plugins/devtools/lang/pt-br.js'".
error An unexpected error occurred: "ENOENT: no such file or directory, copyfile '/usr/local/share/.cache/yarn/v1/npm-lodash-4.17.4-78203a4d1c328ae1d86dca6460e369b57f4055ae/fp/before.js' -> '/mnt/t/someproject/src/nodeodules/sifter/node_modules/lodash/fp/before.js'".
error An unexpected error occurred: "ENOENT: no such file or directory, copyfile '/usr/local/share/.cache/yarn/v1/npm-ckeditor-dev-4.8.0-ab350a8a3e2282ed769b5c9413fcf468d98b5163/plugins/clipboard/lang/ka.js' -> '/mnt/t/someproject/src/node_modules/ckeditor-dev/plugins/clipboard/lang/ka.js'".
  [Magento\Framework\Exception\FileSystemException]
  Cannot read contents from file "/mnt/t/anotherproject/src/www/pub/static/adminhtml/Magento/backend/de_CH/Yireo_EmailTester2/less/emailtester.css" Warning!file_get_contents(/mnt/t/anotherproject/
  src/www/pub/static/adminhtml/Magento/backend/de_CH/Yireo_EmailTester2/less/emailtester.css): failed to open stream: No such file or directory
 [Magento\Framework\Exception\FileSystemException]
  The path "/mnt/t/anotherproject/src/www/var/view_preprocessed/pub/static/" is not writable

the error also consistently happens when all actions are done on lxfs (i.e. it is not DrvFs related):

  [Magento\Framework\Exception\FileSystemException]
  Cannot read contents from file "/tmp/test/www/pub/static/frontend/MyTheme/default/fr_CH/My_Ext/css/source/edit.css" Warning!file_get_contents(/tmp/test/www/pub/static/frontend/MyTheme/default/fr_CH/My_Ext/c
  ss/source/edit.css): failed to open stream: No such file or directory

disabling windows defender does not make a difference. /mnt/t is a veracrypt container, but this has worked flawlessly on earlier builds. I cannot remember the first build that showed these errors, but they did happen in the current and last insider fast builds.

it seems to be related to some concurrency issues - for bin/magento setup:static-content:deploy one can specify the parallel jobs to use. the more parallel jobs, the higher the chances the error happens: with 6 parallel jobs almost always, with 1 job "never" (at least never observed in 10 test-runs).

the errors are not happening on real metal ubuntu.

  • What's wrong / what should be happening instead:

files should be copied/created :-)

  • Strace of the failing command, if applicable: (If some_command is failing, then run strace -o some_command.strace -f some_command some_args, and link the contents of some_command.strace in a gist here)

I have not yet managed to catch this in a strace: when using strace, everything is slowed down significantly and all file actions succeed. any pointers on how to narrow this down or even produce a reliable repo are appreciated! using lxfs instead of DrvFs I was able to catch the following error in strace:

  [Magento\Framework\Exception\FileSystemException]
  Cannot read contents from file "/tmp/test/www/pub/static/frontend/Magento/blank/de_CH/mage/gallery/gallery.css" Warning!file_get_contents(/tmp/test/www/pub/static/frontend/Magento/blank/de_CH/mage/gallery/gallery.css): failed to open stream: No such file or directory

unfortunately it's not showing anything interesting:

...
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  <... access resumed> )            = -1 ENOENT (No such file or directory)
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_Customer/web/css", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  time( <unfinished ...>
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_Downloadable/web/css",  <unfinished ...>
7885  <... time resumed> NULL)          = 1513968596
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  lstat("/tmp/test/www/pub/static/frontend/Magento/blank/de_CH/mage/gallery/gallery.css",  <unfinished ...>
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_GiftCard/web/css",  <unfinished ...>
7885  <... lstat resumed> 0x7fffed975890) = -1 ENOENT (No such file or directory)
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  lstat("/tmp/test/www/pub/static/frontend/Magento/blank/de_CH/mage/gallery",  <unfinished ...>
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_GiftCardAccount/web/css",  <unfinished ...>
7885  <... lstat resumed> {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  openat(AT_FDCWD, "/tmp/test/www/pub/static/frontend/Magento/blank/de_CH/mage/gallery/gallery.css", O_RDONLY <unfinished ...>
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_GiftMessage/web/css",  <unfinished ...>
7885  <... openat resumed> )            = -1 ENOENT (No such file or directory)
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_GiftRegistry/web/css", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_GiftWrapping/web/css",  <unfinished ...>
7885  write(4, "\n", 1 <unfinished ...>
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  <... write resumed> )             = 1
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_GroupedProduct/web/css",  <unfinished ...>
7885  pipe( <unfinished ...>
7886  <... stat resumed> {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0
7885  <... pipe resumed> [7, 8])        = 0
7886  stat("/tmp/test/www/vendor/magento/theme-frontend-blank/Magento_Invitation/web/css",  <unfinished ...>
...

full strace

@heldchen heldchen changed the title file operations fail randomy when many (thousands) files are copied file operations fail randomly when many (thousands) files are copied Dec 22, 2017
@Warblefly
Copy link

I wonder if this is related to bug #2712 which has, as its penultimate comment just now, from DHowett-MSFT:

"A few members of my team (myself, @rajsesh@yiyang-msft) are seeing this as well, on builds in the 1705x-1706x range."

@heldchen
Copy link
Author

indeed, that very much looks like the same problem, thanks for pointing to that issue!

@donaldpipowitch
Copy link

I just hit this after switching to the Insiders build for the first time (17074). Is there a workaround for this? :/ Worked fine (even though slow) before.

@heldchen
Copy link
Author

Is there a workaround for this?

not really... as it seems to happen rather randomly you might be lucky just running the same command again. also helping sometimes (but not always): a) reduce/remove parallelism b) use a non-DrvFs storage location

@donaldpipowitch
Copy link

Thank you for the quick response. Damn :( I wonder what to do now 🤔

@heldchen
Copy link
Author

if you have an easily reproducible use case that reliably triggers the problem, I'm sure @sunilmut would appreciate hearing about it. there's more talk about the problem in #2712

@donaldpipowitch
Copy link

Sadly nothing reproducible. It just happens on every "bigger" project where I try it, but always on different files and commands (scandir, copyfile, and so on). :(

@donaldpipowitch
Copy link

Strange... I tried this workaround and now installation finishes successfully and a node_modules directory is generated with the correct folders in it, but without any file. 🤔

@tmcdonnell87
Copy link

I also get this problem with large yarn installs. Unfortunately it's also with proprietary projects but I'm happy to spend the time to track it down if someone wants to give directions.

[4/5] Linking dependencies...
error An unexpected error occurred: "ENOENT: no such file or directory, lstat '/home/terry/.cache/yarn/v1/npm-chai-http-3.0.0-5460d8036e1f1a12b0b5b5cbd529e6dc1d31eb4b/lib/net.js'".
info If you think this is a bug, please open a bug report with the information provided in "/mnt/c/Users/terry/Agathos/web/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/upgrade for documentation about this command.

@heldchen
Copy link
Author

all my use cases are fixed in the 176xx builds.

@levrik
Copy link

levrik commented Sep 26, 2018

Facing the same issue as @tmcdonnell87 on Windows build 17134.285
It makes WSL useless to me.

@noinkling
Copy link

@levrik It's frustrating for sure, but it will usually work with enough repeated attempts. The update that fixes this is supposed to hit the stable channel within a month I believe.

@dhedey
Copy link

dhedey commented Dec 4, 2018

I'm still getting reasonably regular repros on 17763.134 when running npm install on large(ish) dependency trees - does this seem like the same bug?

(npm version: 6.4.1)

...
[npm install] npm ERR! path /mnt/c/REDACTED/node_modules/acorn
[npm install] npm ERR! code ENOENT
[npm install] npm ERR! errno
[npm install] -2
npm ERR! syscall rename
[npm install] npm ERR! enoent ENOENT: no such file or directory, rename '/mnt/c/REDACTED/node_modules/acorn' -> '/mnt/c/REDACTED/node_modules/.acorn.DELETE'
[npm install] npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent

Has anyone else seen this on versions >= 176xx?

@therealkenc
Copy link
Collaborator

^--- that's #1529

@dhedey
Copy link

dhedey commented Dec 4, 2018

Ahh cheers @therealkenc!

(For anyone else who also got here incorrectly - this is a good comment by therealkenc explaining the issue elsewhere: #14 (comment) )

@dhedey
Copy link

dhedey commented Dec 5, 2018

Actually, it's an ENOENT, not a EACCES so I'm not sure it is that issue? But I agree, I don't think it is a recurrence of this WSL issue (and indeed I'm not sure it is even a WSL issue, and could indeed be an npm issue instead - I'll dig a bit more soon).

EDIT: Ah no, it looks like you are right - whilst it appears as ENOENT, this npm issue is perhaps hiding the real reason:
npm/npm#19469
In particular, the move from .staging might fail due to (say) an EACCES, but this is silently swallowed, and then the later rename fails to find the file to rename because the previous move didn't succeed.

@therealkenc
Copy link
Collaborator

the later rename fails to find the file to rename because the previous move didn't succeed

Yep. You can confirm by doing an strace -f -o npm.strace npm ... and then grep for the EACCES. But I don't recommend it. The sequence of events leading to the fail can be nontrivial depending on what npm spawns and what else is happening in your Node ecosystem (either directly or indirectly), and going through the exercise mostly leads to a feeling of sadness inside. This is a well trodden road, and the probability of finding a new WSL actionable is (let's call it) "low".

@JeetChaudhari
Copy link

I am still facing the same issue, this should be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants