PyTorch 2.0 not working on Windows #90768

Open
Jerry-Master opened this issue Dec 13, 2022 · 48 comments
Labels
module: windows Windows support for PyTorch oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


Jerry-Master commented Dec 13, 2022

🐛 Describe the bug

When I try installing the nightly build with the following command:

pip3 install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

It gives the following warning

WARNING: torch 2.0.0.dev20221213+cu117 does not provide the extra 'dynamo'

And when I try to check the installation with

git clone https://github.com/pytorch/pytorch
cd pytorch/tools/dynamo
python verify_dynamo.py

This results in the following error

Traceback (most recent call last):
  File ".\verify_dynamo.py", line 167, in <module>
    main()
  File ".\verify_dynamo.py", line 155, in main
    cuda_ver = check_cuda()
  File ".\verify_dynamo.py", line 64, in check_cuda
    cuda_ver = get_cuda_version()
  File ".\verify_dynamo.py", line 39, in get_cuda_version
    raise VerifyDynamoError(cpp_extension.CUDA_NOT_FOUND_MESSAGE)
__main__.VerifyDynamoError:
CUDA was not found on the system, please set the CUDA_HOME or the CUDA_PATH
environment variable or add NVCC to your system PATH. The extension compilation will fail.
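For context, the lookup that produces this error checks the `CUDA_HOME` and `CUDA_PATH` environment variables and then falls back to searching for `nvcc` on `PATH`. A loose sketch of that logic (`find_cuda_home` is a hypothetical helper, not PyTorch's actual code):

```python
import os
import shutil

def find_cuda_home():
    """Loose sketch of how torch.utils.cpp_extension locates CUDA:
    CUDA_HOME first, then CUDA_PATH, then nvcc on PATH."""
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home is None:
        nvcc = shutil.which("nvcc")
        if nvcc is not None:
            # nvcc lives in <cuda_home>/bin/nvcc(.exe)
            cuda_home = os.path.dirname(os.path.dirname(nvcc))
    return cuda_home  # None corresponds to the error message above
```

Setting either variable to a valid CUDA install (e.g. `set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7`) silences this particular check, though it does not make Dynamo work on Windows.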

The torch installation is working, but Dynamo does not seem to work. When I run the benchmarks from https://gist.github.com/Chillee/f86675147366a7a0c6e244eaa78660f7 I get the following:

C:\Users\user\anaconda3\envs\auxiliar\lib\site-packages\torch\_dynamo\eval_frame.py:428: UserWarning: Windows is not currently supported, torch._dynamo.optimize() will do nothing
  warnings.warn(
eager: 1386.2895965576172us
PT 2.0: 1406.266689300537us
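Timings like these (near-identical eager vs. "PT 2.0" numbers when compilation is a no-op) can be produced with a tiny wall-clock harness; this is a sketch in the spirit of the gist above, not its exact code:

```python
import time

def bench_us(fn, *args, iters=100):
    """Average wall-clock microseconds per call, with one warm-up call
    so any one-time compilation cost is excluded from the measurement."""
    fn(*args)  # warm-up; for a torch.compile'd fn this triggers compilation
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters * 1e6
```

When `torch._dynamo.optimize()` does nothing (as the warning above says on Windows), the compiled and eager variants measure essentially the same.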

Which again shows that Dynamo is not working. If I try installing torchdynamo separately, it gives this error:

Installing collected packages: torchdynamo
  Running setup.py install for torchdynamo ... error
  error: subprocess-exited-with-error

  × Running setup.py install for torchdynamo did not run successfully.
  │ exit code: 1
  ╰─> [108 lines of output]
      running install
      C:\Users\user\anaconda3\envs\auxiliar\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-38
      creating build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\allowed_functions.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\bytecode_analysis.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\bytecode_transformation.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\codegen.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\config.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\convert_frame.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\debug_utils.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\eval_frame.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\exc.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\guards.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\logging.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\mutation_guard.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\output_graph.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\profiler.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\replay_record.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\resume_execution.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\side_effects.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\skipfiles.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\source.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\symbolic_convert.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\testing.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\utils.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\__init__.py -> build\lib.win-amd64-cpython-38\torchdynamo
      creating build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\codecache.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\compile_fx.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\config.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\debug.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\decomposition.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\dependencies.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\exc.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\graph.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\ir.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\lowering.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\metrics.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\overrides.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\scheduler.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\sizevars.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\utils.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\virtualized.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\__init__.py -> build\lib.win-amd64-cpython-38\torchinductor
      creating build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\analysis.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\backends.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\distributed.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\inference.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\log_args.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\normalize.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\subgraph.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\training.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\__init__.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      creating build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\base.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\builder.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\builtin.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\constant.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\dicts.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\functions.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\lists.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\misc.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\nn_module.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\tensor.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\torch.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\user_defined.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\__init__.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      creating build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\autotuner.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\common.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\cpp.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_template.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\wrapper.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\__init__.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      creating build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\autotune.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\batched_matmul.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\conv.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\conv1x1.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\conv_perf_model.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\matmul.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\mm_perf_model.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\utils.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\__init__.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\codegen\cpp_prefix.h -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_conv_delta_x.j2 -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_conv_delta_x_hwc.j2 -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_mm.j2 -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      running build_ext
      building 'torchdynamo._eval_frame' extension
      creating build\temp.win-amd64-cpython-38
      creating build\temp.win-amd64-cpython-38\Release
      creating build\temp.win-amd64-cpython-38\Release\torchdynamo
      "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\user\anaconda3\envs\auxiliar\include -IC:\Users\user\anaconda3\envs\auxiliar\Include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" /Tctorchdynamo/_eval_frame.c /Fobuild\temp.win-amd64-cpython-38\Release\torchdynamo/_eval_frame.obj -Wall
      _eval_frame.c
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(49): warning C4820: '_finddata32i64_t': '4' bytes of padding added after data member 'name'
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(54): warning C4820: '_finddata64i32_t': '4' bytes of padding added after data member 'attrib'
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(64): warning C4820: '__finddata64_t': '4' bytes of padding added after data member 'attrib'
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(69): warning C4820: '__finddata64_t': '4' bytes of padding added after data member 'name'
      c:\users\user\anaconda3\envs\auxiliar\include\pyconfig.h(205): fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> torchdynamo

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

I am using Windows 10 with AMD64 processors (the wheels for Python end in win_amd64). I also use two NVIDIA RTX 3090 GPUs with CUDA 12.0.

Versions

Collecting environment information...
PyTorch version: 2.0.0.dev20221213+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.15 (default, Nov 24 2022, 14:38:14) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 526.86
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.0rc2
[pip3] torch==2.0.0.dev20221213+cu117
[conda] numpy 1.24.0rc2 pypi_0 pypi
[conda] torch 2.0.0.dev20221213+cu117 pypi_0 pypi

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh

@malfet malfet added the module: windows Windows support for PyTorch label Dec 13, 2022
Contributor

malfet commented Dec 13, 2022

The [dynamo] extra requirement is redundant at the moment, but on the other hand I think neither the CPU nor the Triton backend is available for Windows at the moment

@msaroufim
Member

@malfet speaking of the redundant [dynamo] extra, I removed it from our main getting-started blog here: pytorch/pytorch.github.io#1241. For whatever reason, though, the doc build did not update @svekars


T-Atlas commented Jan 5, 2023

PyTorch version: 2.0.0.dev20230104+cu117
The same problem reproduces on Windows 11.

@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 10, 2023
@PleezDeez

py-cpuinfo works on Windows 11 and removes the need for the lscpu flag when trying to install DeepSpeed on Windows via pip. There is also a hacked-together build of Triton 2.0.0 that a Russian developer put together, which I was able to use to produce a working wheel.

@g-i-o-r-g-i-o

Can't install it on Windows:

PS D:\> pip3 install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117
Looking in indexes: https://download.pytorch.org/whl/nightly/cu117
ERROR: Could not find a version that satisfies the requirement numpy (from versions: none)
ERROR: No matching distribution found for numpy

Python version 3.9 (I can't upgrade it, since it is the maximum version that works with the latest PyTorch)

@inboxedshoe

I think you can install numpy separately and then repeat the command without numpy in it. If you look at that URL, there are no Windows binaries provided for numpy.

@Jerry-Master
Author

After researching a bit: the real problem is that Triton, the main compiler backend in torch 2.0, has no Windows support. It is under development; previous versions had support, but current ones do not, so I'll just wait.

@SlowFeather

When I configured everything in Windows, I received this error:

raise RuntimeError("Windows not yet supported for torch.compile")
RuntimeError: Windows not yet supported for torch.compile

I want to know if it can run on Windows 11 + torch 2.0 + cuda118 🤯
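Until Windows is supported, one workaround seen in practice is to fall back to eager mode whenever `torch.compile` raises, so the same script runs everywhere. A minimal sketch (`maybe_compile` is a hypothetical helper, not a PyTorch API):

```python
def maybe_compile(fn):
    """Return torch.compile(fn) where supported, else fn unchanged,
    so the script degrades gracefully to eager mode on Windows."""
    try:
        import torch
        return torch.compile(fn)
    except Exception:
        # e.g. RuntimeError("Windows not yet supported for torch.compile"),
        # or torch not installed at all
        return fn
```

The function behaves identically either way; on Windows you simply lose the compilation speedup rather than crashing.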

@Eduard6421

Greetings!
Any updates on this issue?

@hajlyx

hajlyx commented Jun 20, 2023

Any plans to make torch.compile available on Windows?

@iremyux iremyux self-assigned this Jun 30, 2023
Collaborator

iremyux commented Jun 30, 2023

Hi @Jerry-Master, I attempted to reproduce this issue but was unable to encounter it. It appears that the problem may have been resolved. If you are still encountering the same problem, please provide me with more details or steps to reproduce.

Contributor

d-kleine commented Jun 30, 2023

I have the same issue using this code:

import torch
import warnings

gpu_ok = False
if torch.cuda.is_available():
    device_cap = torch.cuda.get_device_capability()
    if device_cap in ((7, 0), (8, 0), (9, 0)):
        gpu_ok = True

if not gpu_ok:
    warnings.warn(
        "GPU is not NVIDIA V100, A100, or H100. Speedup numbers may be lower "
        "than expected."
    )
def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

Cell In[2], line 5
3 b = torch.cos(y)
4 return a + b
----> 5 opt_foo1 = torch.compile(foo)
6 print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

File ...\anaconda3\envs\myenv\Lib\site-packages\torch\__init__.py:1441, in compile(model, fullgraph, dynamic, backend, mode, options, disable)
1439 if backend == "inductor":
1440 backend = _TorchCompileInductorWrapper(mode, options, dynamic)
-> 1441 return torch._dynamo.optimize(backend=backend, nopython=fullgraph, dynamic=dynamic, disable=disable)(model)

File ...\anaconda3\envs\myenv\Lib\site-packages\torch\_dynamo\eval_frame.py:413, in optimize(backend, nopython, guard_export_fn, guard_fail_fn, disable, dynamic)
380 def optimize(
381 backend="inductor",
382 *,
(...)
387 dynamic=False,
388 ):
389 """
390 The main entrypoint of TorchDynamo. Do graph capture and call
391 backend() to optimize extracted graphs.
(...)
411 ...
...
--> 375 raise RuntimeError("Windows not yet supported for torch.compile")
376 if sys.version_info >= (3, 11):
377 raise RuntimeError("Python 3.11+ not yet supported for torch.compile")

RuntimeError: Windows not yet supported for torch.compile

Torch version:

torch.__version__

'2.0.1+cpu'

Python version:

from platform import python_version
print(python_version())

3.11.3

OS version + machine:

import platform
print(platform.platform())
print(platform.machine())

Windows-10-10.0.19045-SP0
AMD64

I have already tried different Python versions (3.10.9, 3.11.4) and also PyTorch with CUDA 11.8 (installed via both pip and conda); the same issue persists.

Also linking this issue: #86566
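The traceback above shows two separate guards: Windows is rejected, and then Python 3.11+ is rejected. Those conditions can be mirrored in a small preflight check before ever calling `torch.compile`; this is a sketch of the two guards, not PyTorch's internal code (`torch_compile_supported` is a hypothetical helper):

```python
import sys
import platform

def torch_compile_supported(system=None, pyver=None):
    """Mirror the two guards seen in the traceback above:
    torch 2.0.x rejects Windows and Python 3.11+."""
    system = system or platform.system()
    pyver = pyver or sys.version_info[:2]
    if system == "Windows":
        return False  # "Windows not yet supported for torch.compile"
    if pyver >= (3, 11):
        return False  # "Python 3.11+ not yet supported for torch.compile"
    return True
```

Note that d-kleine's environment (Windows 10, Python 3.11.3) trips both guards, so switching Python versions alone would not have helped.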

@Artyom17

Artyom17 commented Jul 3, 2023

Yep, torch.compile still doesn't seem to work on 2.0.1+cu118 / Windows. Any updates?

@d-kleine d-kleine mentioned this issue Jul 5, 2023
@devangaggarwal
Contributor

Are there any updates from the PyTorch team on this? When can we expect Windows support?

@d-kleine
Contributor

d-kleine commented Sep 16, 2023

Referring to this discussion about the Triton Windows-support issue as well: triton-lang/triton#1640

@wkpark

wkpark commented Jan 1, 2024

PyTorch also needs to be fixed to work under Windows + Triton (this is WIP, not tested much): wkpark#1. See also openai/triton#2738

Can it work with torch.compile?

The main branch has not been tested yet, but v2.1.0 and v2.1.1 have been tested.
This is v2.1.1: wkpark@2716adf (you can cherry-pick it and apply it on top of v2.1.0)

@Ken1256

Ken1256 commented Jan 1, 2024

pytorch also need to be fixed to work under windows+triton (this is WIP. not tested much) wkpark#1 See also openai/triton#2738

Can it work with torch.compile?

main branch has not been tested yet but, v210, v211 have been tested this is v2.1.1 - wkpark@2716adf (you can cherry-pick and can be applied on top of v210)

pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu121
After removing check_if_dynamo_supported, I got:
ImportError: cannot import name 'get_cuda_stream' from 'triton.runtime.jit' (C:\Program Files\Python311\Lib\site-packages\triton\runtime\jit.py)
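An ImportError like this means the installed Triton build does not expose the symbol Inductor expects. A generic import probe can confirm the mismatch before digging into patched wheels; `exposes` below is a hypothetical helper, not part of Triton or PyTorch:

```python
import importlib

def exposes(module_name, attr):
    """True if `module_name` imports cleanly and defines `attr`."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

# The ImportError above implies this is False for the installed build:
# exposes("triton.runtime.jit", "get_cuda_stream")
```

As noted below, this is expected: recent Triton moved/removed that symbol, which is why the WIP branch has to track Triton's API.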

@jensdraht1999

@Ken1256 This is just a draft, it has not been merged.

@wkpark Do you have a step-by-step guide for how to make this happen and how to use it via torch.compile? Just a simple decorator example, so we might test it out. Or will it perhaps get merged in a few weeks?

@BeyondYourself

BeyondYourself commented Jan 10, 2024

The same problem on Windows 11 + torch 2.1.1 + cuda118 + py3.10: Windows not yet supported for torch.compile

@wkpark

wkpark commented Jan 11, 2024

pytorch also need to be fixed to work under windows+triton (this is WIP. not tested much) wkpark#1 See also openai/triton#2738

Can it work with torch.compile?

wkpark#1 has been updated to work with the latest Triton (+ with win32 fix).
(A recent Triton update breaks torch.compile() compatibility: please see triton-lang/triton#2701 and triton-lang/triton@72c9833)

F:\src\pytorch\tools\dynamo>python verify_dynamo.py
Python version: 3.10.11
`torch` version: 2.3.0.dev20240109+cu118
CUDA version: 11.8
ROCM version: None

Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31937 (x64)
Copyright (c) Microsoft Corporation. All rights reserved.


...
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'triton__0d1d2' for 'sm_89'
ptxas info    : Function properties for triton__0d1d2
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 372 bytes cmem[0]
In file included from C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:4:
In file included from C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\Python.h:118:
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:120:59: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  120 | PyAPI_FUNC(int) _PyTime_FromTimeval(_PyTime_t *tp, struct timeval *tv);
      |                                                           ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:127:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  127 |     struct timeval *tv,
      |            ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:132:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  132 |     struct timeval *tv,
      |            ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:20:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   20 |       strcat(err, prefix);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:21:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   21 |       strcat(err, str);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
5 warnings generated.
   Creating library C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.lib and object C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.exp
All required checks passed

@Ken1256

Ken1256 commented Jan 12, 2024

pytorch also need to be fixed to work under windows+triton (this is WIP. not tested much) wkpark#1 See also openai/triton#2738

Can it work with torch.compile?

wkpark#1 has been updated to work with the latest triton(+with win32 fix) (recent triton update breaks totch.compile() compatibility: Please see openai/triton#2701 openai/triton@72c9833 )

F:\src\pytorch\tools\dynamo>python verify_dynamo.py
Python version: 3.10.11
`torch` version: 2.3.0.dev20240109+cu118
CUDA version: 11.8
ROCM version: None

Microsoft (R) C/C++ 최적화 컴파일러 버전 19.34.31937(x64)
Copyright (c) Microsoft Corporation. All rights reserved.


...
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'triton__0d1d2' for 'sm_89'
ptxas info    : Function properties for triton__0d1d2
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 372 bytes cmem[0]
In file included from C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:4:
In file included from C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\Python.h:118:
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:120:59: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  120 | PyAPI_FUNC(int) _PyTime_FromTimeval(_PyTime_t *tp, struct timeval *tv);
      |                                                           ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:127:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  127 |     struct timeval *tv,
      |            ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:132:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  132 |     struct timeval *tv,
      |            ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:20:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   20 |       strcat(err, prefix);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:21:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   21 |       strcat(err, str);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
5 warnings generated.
   Creating library C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.lib and object C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.exp
All required checks passed

Can you upload wheels for python 3.11 + cuda121?

@wkpark

wkpark commented Jan 12, 2024

Can you upload wheels for python 3.11 + cuda121?

If you have some time you can test the latest triton (python 3.11+cuda121) + pytorch

There are no direct wheel links, but you can download the patched Triton wheels at https://github.com/wkpark/triton/actions/runs/7246431088 (also found at triton-lang/triton#2738 )

  • Download the Triton build and install it using pip.
  • Install the latest nightly build of PyTorch (see https://pytorch.org/get-started/locally/ and select the Nightly build).
    • For example, you can install torch-2.3.0.dev with pip3 install --pre torch==2.3.0.dev20240110+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118
  • Apply the PyTorch fix from MSVC fixes wkpark/pytorch#1 (https://github.com/wkpark/pytorch/pull/1/files ).
    • Only 2 or 3 files need to be fixed (a simple copy will work).
  • Edit your_python_site-packages/torch/_dynamo/eval_frame.py to toggle torch.compile() on and off (I just patched it manually):
  • Before testing PyTorch + Triton, your cmd console environment has to be properly set up to work with MSVC (e.g. the Visual Studio 2022 x64 Native Tools command prompt, plus nvcc for CUDA).
diff --git a/torch/_dynamo/eval_frame.py b/torch/_dynamo/eval_frame.py
index 251dd6d1c32..9349fdc62a5 100644
--- a/torch/_dynamo/eval_frame.py
+++ b/torch/_dynamo/eval_frame.py
@@ -531,8 +531,8 @@ class _NullDecorator(contextlib.nullcontext):  # type: ignore[type-arg]


 def check_if_dynamo_supported():
-    if sys.platform == "win32":
-        raise RuntimeError("Windows not yet supported for torch.compile")
+    #if sys.platform == "win32":
+    #    raise RuntimeError("Windows not yet supported for torch.compile")
     if sys.version_info >= (3, 12):
         raise RuntimeError("Python 3.12+ not yet supported for torch.compile")
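Rather than commenting the guard out in site-packages, a less invasive variant (purely a sketch, not part of the actual PR; the environment variable name is made up) would gate the Windows check behind an opt-in flag:

```python
import os
import sys


def check_if_dynamo_supported(platform=sys.platform, version=sys.version_info):
    """Sketch of the guard with an opt-in escape hatch for Windows.

    TORCH_COMPILE_FORCE_WINDOWS is a hypothetical variable, not a real
    torch setting; it just illustrates gating instead of deleting the check.
    """
    if platform == "win32" and os.environ.get("TORCH_COMPILE_FORCE_WINDOWS") != "1":
        raise RuntimeError("Windows not yet supported for torch.compile")
    if version >= (3, 12):
        raise RuntimeError("Python 3.12+ not yet supported for torch.compile")
```

This keeps stock behavior unless you explicitly opt in with `set TORCH_COMPILE_FORCE_WINDOWS=1`, so an upgrade that restores eval_frame.py fails loudly instead of silently re-enabling the check.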

@Ken1256

Ken1256 commented Jan 13, 2024

Can you upload wheels for python 3.11 + cuda121?

If you have some time you can test the latest triton (python 3.11+cuda121) + pytorch […]

Without the x64 Native Tools Command Prompt for VS 2022:
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

With the x64 Native Tools Command Prompt for VS 2022:

File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codegen\cpp.py", line 3316, in lines
    elif not self.reduction_var_map and codecache.is_gcc():
                                        ^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 962, in is_gcc
    return bool(re.search(r"(gcc|g\+\+)", cpp_compiler()))
                                          ^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 902, in cpp_compiler
    return cpp_compiler_search(search)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 932, in cpp_compiler_search
    raise exc.InvalidCxxCompiler()
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')

@wkpark

wkpark commented Jan 13, 2024

Can you upload wheels for python 3.11 + cuda121? […]

with x64 Native Tools Command Prompt for VS 2022

InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')

I guess, I've missed some fix.

Please set CXX=cl (you can set the CXX env var at the cmd prompt: set CXX=cl)

or add wkpark@ee55add fix.

@Ken1256

Ken1256 commented Jan 14, 2024

I guess, I've missed some fix.

Please set CXX=cl (you can set CXX env at cmd prompt like set CXX=cl)

or add wkpark@ee55add fix.

Still getting RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

import sys
import os

import torch
import torch._dynamo
# torch._dynamo.config.suppress_errors = True

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))

if __name__ == '__main__':
    os.environ["TORCH_CUDNN_V8_API_ENABLED"] = "1"
    os.environ["CUDA_MODULE_LOADING"] = "LAZY"
    os.environ["CUDA_PATH"] = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
    os.environ["CXX"] = "cl"

    device = 'cpu'
    device = 'cuda'
    # device = 'cuda:1'
    dtype = torch.float32
    # dtype = torch.float16
    dtype = torch.bfloat16

    mod = MyModule().to(device).to(dtype)
    opt_mod = torch.compile(mod)
    # opt_mod = torch.compile(mod, mode="reduce-overhead")
    x = torch.randn(10, 100).to(device).to(dtype)
    print(opt_mod(x))
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.8.4
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'

C:\Program Files\Microsoft Visual Studio\2022\Community>python "C:\Temp\torch_compile_t01.py"
Traceback (most recent call last):
  File "C:\Temp\torch_compile_t01.py", line 33, in <module>
    print(opt_mod(x))
          ^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\eval_frame.py", line 417, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\eval_frame.py", line 580, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 741, in _convert_frame
    result = inner_convert(frame, cache_entry, hooks, frame_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 384, in _convert_frame_assert
    return _compile(
           ^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 643, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 524, in compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\bytecode_transformation.py", line 1033, in transform_code_object
    transformations(instructions, code_options)
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 151, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 489, in transform
    tracer.run()
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2098, in run
    super().run()
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 780, in run
    and self.step()
        ^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 743, in step
    getattr(self, inst.opname)(inst)
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2216, in RETURN_VALUE
    self.output.compile_subgraph(
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 914, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "C:\Program Files\Python311\Lib\contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 1085, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 1157, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 1138, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\repro\after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\__init__.py", line 1697, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 1177, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\backends\common.py", line 55, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\aot_autograd.py", line 889, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\aot_autograd.py", line 602, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\_aot_autograd\runtime_wrappers.py", line 427, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\_aot_autograd\runtime_wrappers.py", line 632, in aot_wrapper_synthetic_base
    return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\_aot_autograd\jit_compile_runtime_wrappers.py", line 295, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 1105, in fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\repro\after_aot.py", line 83, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\debug.py", line 304, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 317, in compile_fx_inner
    compiled_graph = fx_codegen_and_compile(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 551, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\graph.py", line 1157, in compile_to_fn
    return self.compile_to_module().call
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\graph.py", line 1109, in compile_to_module
    mod = PyCodeCache.load_by_key_path(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 1933, in load_by_key_path
    exec(code, mod.__dict__, mod.__dict__)
  File "C:/Users/X/AppData/Local/Temp/torchinductor_X/rb/crbmdpyo62vuap2itjdzvmmwn6vi6ttrq6ljeirwfz6xok6fqe57.py", line 28, in <module>
    triton_poi_fused_relu_threshold_backward_0 = async_compile.triton('triton_', '''
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 2480, in triton
    return _load_kernel(kernel_name, source_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 2331, in _load_kernel
    kernel.precompile()
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\triton_heuristics.py", line 195, in precompile
    compiled_binary, launcher = self._precompile_config(
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\triton_heuristics.py", line 349, in _precompile_config
    binary = triton.compile(*compile_args, **compile_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\compiler\compiler.py", line 178, in compile
    so_path = backend.make_launcher_stub(src, metadata)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\compiler\backends\cuda.py", line 250, in make_launcher_stub
    return make_stub(src.name, src.signature, constants, ids, enable_warp_specialization=enable_warp_specialization)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\compiler\make_launcher.py", line 37, in make_stub
    so = _build(name, src_path, tmpdir)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\common\build.py", line 101, in _build
    raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
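One thing worth checking: triton's `_build` reads the CC/CXX variables from the process environment, so setting them with `os.environ` inside the script may be too late if anything has already cached its compiler search. A safer pattern (just a sketch; the `-c` snippet stands in for the real script path) is a tiny launcher that fixes the environment before Python starts the workload:

```python
import os
import subprocess
import sys

# Set the compiler env vars first, then run the real script in a child
# process that inherits them; replace the -c snippet with your script path.
env = dict(os.environ)
env["CC"] = "cl"
env["CXX"] = "cl"

child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CC'], os.environ['CXX'])"],
    env=env,
    capture_output=True,
    text=True,
)
print(child.stdout.strip())  # -> cl cl
```

Equivalently, `set CC=cl` and `set CXX=cl` at the VS developer prompt before launching Python achieves the same thing.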

@wkpark

wkpark commented Jan 14, 2024

still RuntimeError: Failed to find C compiler. Please specify via CC environment variable. […]

Did you succeed in running tools/dynamo/verify_dynamo.py before testing?
(You can check your compiler with the cl command at your cmd prompt, and/or override the cc command with set CC=cl.)
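For reference, this kind of compiler search ultimately comes down to looking candidate names up on PATH; a quick stdlib check (the candidate list here is my own guess, not torch's exact list) shows which compilers your console session actually exposes:

```python
import shutil


def find_compiler(candidates=("cl", "clang-cl", "clang", "gcc", "g++")):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    found = find_compiler()
    print(found if found else "no compiler on PATH; set CC/CXX manually")
```

If this prints the fallback message inside the VS developer prompt, vcvarsall.bat did not run for that session and inductor/triton will fail the same way.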

@wkpark

wkpark commented Jan 14, 2024

Some tests under pytorch/test:

>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31937 for x64
Copyright (c) Microsoft Corporation. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]
>clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project 5e5a22caf88ac1ccfa8dc5720295fdeba0ad9372)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
>clang-cl --version
clang version 18.0.0 (https://github.com/llvm/llvm-project 5e5a22caf88ac1ccfa8dc5720295fdeba0ad9372)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
>pip install hypothesis expecttest pytest # install some modules
>cd pytorch\test
>python -m pytest distributed\_tensor\test_dtensor_compile.py
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.10.11, pytest-7.4.2, pluggy-1.3.0
rootdir: D:\src\pytorch
configfile: pytest.ini
plugins: anyio-3.7.1, hydra-core-1.3.2, hypothesis-6.93.0, xdist-3.5.0
collected 10 items

distributed\_tensor\test_dtensor_compile.py ....In file included from C:\Users\WK\AppData\Local\Temp\tmp7uz995_y\main.c:4:
In file included from C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\Python.h:118:
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:120:59: warning: declaration of 'struct timeval' will not be
      visible outside of this function [-Wvisibility]
  120 | PyAPI_FUNC(int) _PyTime_FromTimeval(_PyTime_t *tp, struct timeval *tv);
      |                                                           ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:127:12: warning: declaration of 'struct timeval' will not be
      visible outside of this function [-Wvisibility]
  127 |     struct timeval *tv,
      |            ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:132:12: warning: declaration of 'struct timeval' will not be
      visible outside of this function [-Wvisibility]
  132 |     struct timeval *tv,
      |            ^
C:\Users\WK\AppData\Local\Temp\tmp7uz995_y\main.c:20:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation,
      use _CRT_SECURE_NO_WARNINGS. See online help for details. [-Wdeprecated-declarations]
....(snip)
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
[rank1]:[2024-01-14 15:09:25,284] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank3]:[2024-01-14 15:09:25,284] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank2]:[2024-01-14 15:09:25,285] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank0]:[2024-01-14 15:09:25,286] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
.                                                                                                                              [100%]

======================================================================== 7 passed, 3 skipped in 77.75s (0:01:17) =========================================================================

Setting CC and CXX to clang:

>set CC=clang
>set CXX=clang++
>python -m pytest distributed\_tensor\test_dtensor_compile.py
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.10.11, pytest-7.4.2, pluggy-1.3.0
rootdir: D:\src\pytorch
configfile: pytest.ini
plugins: anyio-3.7.1, hydra-core-1.3.2, hypothesis-6.93.0, xdist-3.5.0
collected 10 items

distributed\_tensor\test_dtensor_compile.py .....sss
...
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
[rank1]:[2024-01-14 15:11:08,763] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank0]:[2024-01-14 15:11:08,763] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank3]:[2024-01-14 15:11:08,763] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank2]:[2024-01-14 15:11:08,764] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
.                                                                                                                              [100%]

======================================================================== 7 passed, 3 skipped in 75.91s (0:01:15) =========================================================================

Setting CC and CXX to clang-cl also works:

>set CC=clang-cl
>set CXX=clang-cl
>python -m pytest distributed\_tensor\test_dtensor_compile.py
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.10.11, pytest-7.4.2, pluggy-1.3.0
rootdir: D:\src\pytorch
configfile: pytest.ini
plugins: anyio-3.7.1, hydra-core-1.3.2, hypothesis-6.93.0, xdist-3.5.0
collected 10 items
distributed\_tensor\test_dtensor_compile.py .....sss
...
[rank3]:[2024-01-14 15:14:50,463] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank2]:[2024-01-14 15:14:50,465] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank1]:[2024-01-14 15:14:50,467] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank0]:[2024-01-14 15:14:50,467] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
.                                                                                                                              [100%]

======================================================================== 7 passed, 3 skipped in 104.95s (0:01:44) ========================================================================
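The runs above suggest that inductor's C++ codegen only needs some working host compiler, selected via the `CC`/`CXX` environment variables. A minimal sketch of that kind of selection logic (`pick_compiler` and its candidate order are illustrative assumptions, not PyTorch's actual implementation):

```python
import os
import shutil

def pick_compiler():
    """Prefer an explicit CC (as in `set CC=clang` above), else probe PATH."""
    cc = os.environ.get("CC")
    if cc and shutil.which(cc):
        return cc
    # Plausible Windows-first candidate order; purely illustrative.
    for candidate in ("cl", "clang-cl", "clang", "gcc"):
        if shutil.which(candidate):
            return candidate
    return None  # no usable toolchain found
```

This is why the three runs above behave the same: as long as the chosen compiler can build the generated C, the backend works regardless of which toolchain it is.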

@iperov

iperov commented Feb 4, 2024

PyTorch is trying to reinvent the wheel that was originally invented in TensorFlow 1.x?

@sako-ranj

Can someone explain this to me? I'm a bit new to this stuff. Does Windows not support torch.compile yet?

@alinpahontu2912
Collaborator

Hello @Jerry-Master and everyone interested in torch.compile. I believe we can close this issue and follow this thread instead: #122094. I'd like you to know that in the latest nightly there is CPU-only torch.compile support.

@xuhancn
Collaborator

xuhancn commented Aug 16, 2024

Currently, PyTorch on Windows is beginning to support torch.compile in the nightly build, but it only supports the CPU device.
Please check #124245.
You can install the nightly build:

pip install torch  --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

@d-kleine
Contributor

I believe we can close this issue and follow this thread instead: #122094. I'd like you to know that in the latest nightly there is CPU-only torch.compile support.

This issue was posted months before the one you referenced. Actually, #124245 should be closed, as it is a duplicate of the original issue reported here.

@d-kleine
Contributor

Currently, PyTorch on Windows is beginning to support torch.compile in the nightly build, but it only supports the CPU device. Please check #124245. You can install the nightly build:

pip install torch  --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

As long as this does not work for CUDA, it's of limited use, since you cannot take advantage of torch.compile to speed up training. Triton is still not supported on Windows.

W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] RuntimeError: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at https://github.com/openai/triton
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 
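The `BackendCompilerFailed` above only surfaces at the first call of the compiled function. Until Triton ships for Windows, one workaround is to fall back to eager mode when the backend fails; a minimal sketch (the `compile_or_eager` helper is hypothetical, not a PyTorch API, with `torch.compile` as the intended `compiler` argument):

```python
# Hypothetical helper: try the compiled path once, fall back to eager on
# any backend failure (e.g. "Cannot find a working triton installation").
def compile_or_eager(fn, sample_args, compiler):
    """Return compiler(fn) if it survives one trial call, else fn unchanged."""
    try:
        compiled = compiler(fn)
        compiled(*sample_args)  # backend errors surface at the first call
        return compiled
    except Exception:
        return fn  # eager fallback

# With torch this would be used roughly as:
#   model = compile_or_eager(model, (example_input,), torch.compile)
```

PyTorch also exposes `torch._dynamo.config.suppress_errors = True`, which (if I understand it correctly) performs this eager fallback globally instead of raising.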

@xuhancn
Collaborator

xuhancn commented Aug 19, 2024

I believe we can close this issue and follow this thread instead: #122094. I'd like you to know that in the latest nightly there is CPU-only torch.compile support.

This issue was posted months before the one you referenced. Actually, #124245 should be closed, as it is a duplicate of the original issue reported here.

No, #124245 is used to track my progress.

@xuhancn
Collaborator

xuhancn commented Aug 19, 2024

Currently, PyTorch on Windows is beginning to support torch.compile in the nightly build, but it only supports the CPU device. Please check #124245. You can install the nightly build:

pip install torch  --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

As long as this does not work for CUDA, it's of limited use, since you cannot take advantage of torch.compile to speed up training. Triton is still not supported on Windows.

W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] RuntimeError: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at https://github.com/openai/triton
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 

For Triton status, see triton-lang/triton#4045 (comment).
