"BrokenPipeError" on Windows #293
Comments
@yardencsGitHub what happens if you set

I see similar errors on this issue. I'm guessing it has to do with some interaction of low-level Torch code, Windows, and a custom dataloader:
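One quick thing to rule out for the dataloader guess: on Windows, Python's `multiprocessing` starts worker processes with the "spawn" method, so anything handed to a worker has to survive pickling. Below is a minimal sketch of such a check using only the standard library. `is_picklable` is a hypothetical helper, not part of vak or torch, and the objects passed to it are stand-ins for a real dataset and transform.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle, False otherwise.

    Hypothetical helper for illustration; not part of vak or torch.
    """
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    # Stand-ins for a real dataset object and a transform.
    print(is_picklable([1, 2, 3]))    # plain data
    print(is_picklable(lambda x: x))  # a lambda, which pickle rejects
```

If this returns False for the dataset or one of its transforms, worker startup under spawn is a plausible suspect. A True result does not rule out other causes, such as a library failing to load inside the worker.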
Can you please also run
and reply with those files as an attachment?
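For anyone who wants a quick summary alongside those files, here is a rough sketch of collecting the same kind of information from inside Python. It is not the command requested above. The function name `collect_env_report` and the default package list are assumptions made for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def collect_env_report(packages=("torch", "torchvision", "vak")):
    """Gather platform details and installed versions of the given packages.

    Hypothetical helper for illustration; the package list is an assumption.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "packages": versions,
    }


if __name__ == "__main__":
    # Print the report as JSON so it can be pasted into a comment.
    print(json.dumps(collect_env_report(), indent=2))
```

This only reports what `importlib.metadata` can see, so it will not capture conda channel or build information the way an environment export does.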
Attaching files
@yardencsGitHub I'm getting a similar error on Windows 10 (see traceback below), but on the

I created a Python 3.6 conda environment, installed pytorch and torchvision from the pytorch channel, and then let pip install everything else when I did

I'm going to try setting up a development environment on Windows to see whether it's something conda-specific or not.

It might be the case that I can relax the restrictions on which versions of our dependencies we use, so that on Windows we use previous versions that we already know worked before. I will also test that out using a dev environment.

Here's the full traceback:
I tried to set up a dev environment on Windows using poetry, but as far as I can tell it's not possible right now to declare torch + torchvision as dependencies using poetry, mainly for reasons having to do with how pytorch provides its distributions.

I even tried a workaround like this, but no dice:

Even just creating a new project with poetry and declaring torch and torchvision as the only dependencies fails at the
I think for now the fix might be to add back setup.py, build Windows-specific wheels, and then upload those to PyPI.

Related discussion here, for future reference:
@yardencsGitHub can you please also reply with

On Windows I was able to build a source distribution from a setup.py file and successfully install it, but I haven't had a chance to test whether it runs yet.
After fighting with this more, I think the bug is not due to any change in how we build the distribution package.

Ways that I get a similar bug on Windows:

So basically there's some sort of DLL bug in the current conda binaries for torch/torchvision on Windows?

Ways that work on Windows (do not give me the bug):

Edit: tried again with conda python==3.6 and dask is hanging weirdly. Seems like python==3.8 works better, so for conda do

for venv do
I installed newer versions of vak + tweetynet in that environment.

This worked for me in the beginning but crashed in the first validation attempt:

training TweetyNet
Epoch 1, batch 249. Loss: 0.4351. Global step: 249: 3%|▊ | 248/8520 [01:40<15:03, 9.15it/s]
Traceback (most recent call last):
Thank you @yardencsGitHub for the update and for attaching the zip file with

From the spec-file, it looks like both

I'm pretty sure it's the

Can you please try creating a new

and see if you still get the DLL error or any other error about "the paging file"?
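A quick way to check a fresh environment before launching a full training run is to import the packages in isolation, since DLL loading problems on Windows typically surface at import time as an `ImportError` or `OSError`. The sketch below assumes that behavior. `try_import` is a hypothetical helper, not part of vak.

```python
import importlib


def try_import(module_name):
    """Try to import a module and return (ok, error_message).

    Hypothetical helper for illustration; not part of vak.
    """
    try:
        importlib.import_module(module_name)
    except (ImportError, OSError) as err:
        # Keep the exception type so DLL and paging-file errors are easy to spot.
        return False, f"{type(err).__name__}: {err}"
    return True, None


if __name__ == "__main__":
    for name in ("torch", "torchvision"):
        ok, message = try_import(name)
        print(name, "OK" if ok else message)
```

A clean import here does not guarantee that dataloader worker processes will also start cleanly, so it is only a first filter.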
Traceback (most recent call last):
@yardencsGitHub in case it helps, here are the two conda envs I tested on Windows (not Ubuntu WSL2!), attached as a zip.

The one that is just called

the one called

Attaching them both in case we want to do more forensics at some point.
So far running with WSL was smooth. Since I'm repeating experiments, it's impossible to repeat just a part of the experiment (the original and new training set durations need to be identical). This means that if such errors happen I need to rerun the whole experiment.

Traceback:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
installing `torch` on windows is complicated and there's no simple fix using `poetry`, see discussion on #293
Closing this as stale. We can search through closed issues if we end up having to deal with this again. I think the conda-forge package may help, as stated in #381.
@yardencsGitHub reports the following error on Windows when running `vak learncurve`. This occurs right when training starts, on the first batch after the `tqdm` progress bar pops up.

Full error attached as a .txt file: err_msg_output.txt