PyTorch 2.0 not working on Windows #90768

Open
Jerry-Master opened this issue Dec 13, 2022 · 48 comments
Labels
module: windows Windows support for PyTorch oncall: pt2 triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


Jerry-Master commented Dec 13, 2022

🐛 Describe the bug

When I try installing the nightly build with the following command:

pip3 install numpy --pre torch[dynamo] --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

It gives the following warning

WARNING: torch 2.0.0.dev20221213+cu117 does not provide the extra 'dynamo'

And when I try to check the installation with

git clone https://github.com/pytorch/pytorch
cd pytorch/tools/dynamo
python verify_dynamo.py

This results in the following error

Traceback (most recent call last):
  File ".\verify_dynamo.py", line 167, in <module>
    main()
  File ".\verify_dynamo.py", line 155, in main
    cuda_ver = check_cuda()
  File ".\verify_dynamo.py", line 64, in check_cuda
    cuda_ver = get_cuda_version()
  File ".\verify_dynamo.py", line 39, in get_cuda_version
    raise VerifyDynamoError(cpp_extension.CUDA_NOT_FOUND_MESSAGE)
__main__.VerifyDynamoError:
CUDA was not found on the system, please set the CUDA_HOME or the CUDA_PATH
environment variable or add NVCC to your system PATH. The extension compilation will fail.
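For context, the lookup that produces this error checks the `CUDA_HOME` and `CUDA_PATH` environment variables and then falls back to searching for `nvcc` on `PATH`. A loose sketch of that logic (`find_cuda_home` is a hypothetical helper, not PyTorch's actual code):

```python
import os
import shutil

def find_cuda_home():
    """Loose sketch of how torch.utils.cpp_extension locates CUDA:
    CUDA_HOME first, then CUDA_PATH, then nvcc on PATH."""
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home is None:
        nvcc = shutil.which("nvcc")
        if nvcc is not None:
            # nvcc lives in <cuda_home>/bin/nvcc(.exe)
            cuda_home = os.path.dirname(os.path.dirname(nvcc))
    return cuda_home  # None corresponds to the error message above
```

Setting either variable to a valid CUDA install (e.g. `set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7`) silences this particular check, though it does not make Dynamo work on Windows.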

The torch installation is working, but Dynamo does not seem to work. When I run the benchmarks from https://gist.github.com/Chillee/f86675147366a7a0c6e244eaa78660f7 I get the following:

C:\Users\user\anaconda3\envs\auxiliar\lib\site-packages\torch\_dynamo\eval_frame.py:428: UserWarning: Windows is not currently supported, torch._dynamo.optimize() will do nothing
  warnings.warn(
eager: 1386.2895965576172us
PT 2.0: 1406.266689300537us
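Timings like these (near-identical eager vs. "PT 2.0" numbers when compilation is a no-op) can be produced with a tiny wall-clock harness; this is a sketch in the spirit of the gist above, not its exact code:

```python
import time

def bench_us(fn, *args, iters=100):
    """Average wall-clock microseconds per call, with one warm-up call
    so any one-time compilation cost is excluded from the measurement."""
    fn(*args)  # warm-up; for a torch.compile'd fn this triggers compilation
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters * 1e6
```

When `torch._dynamo.optimize()` does nothing (as the warning above says on Windows), the compiled and eager variants measure essentially the same.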

Which again shows that Dynamo is not working. If I try installing torchdynamo separately, it gives this error:

Installing collected packages: torchdynamo
  Running setup.py install for torchdynamo ... error
  error: subprocess-exited-with-error

  × Running setup.py install for torchdynamo did not run successfully.
  │ exit code: 1
  ╰─> [108 lines of output]
      running install
      C:\Users\user\anaconda3\envs\auxiliar\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-38
      creating build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\allowed_functions.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\bytecode_analysis.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\bytecode_transformation.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\codegen.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\config.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\convert_frame.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\debug_utils.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\eval_frame.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\exc.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\guards.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\logging.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\mutation_guard.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\output_graph.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\profiler.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\replay_record.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\resume_execution.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\side_effects.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\skipfiles.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\source.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\symbolic_convert.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\testing.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\utils.py -> build\lib.win-amd64-cpython-38\torchdynamo
      copying torchdynamo\__init__.py -> build\lib.win-amd64-cpython-38\torchdynamo
      creating build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\codecache.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\compile_fx.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\config.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\debug.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\decomposition.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\dependencies.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\exc.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\graph.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\ir.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\lowering.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\metrics.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\overrides.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\scheduler.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\sizevars.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\utils.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\virtualized.py -> build\lib.win-amd64-cpython-38\torchinductor
      copying torchinductor\__init__.py -> build\lib.win-amd64-cpython-38\torchinductor
      creating build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\analysis.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\backends.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\distributed.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\inference.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\log_args.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\normalize.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\subgraph.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\training.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      copying torchdynamo\optimizations\__init__.py -> build\lib.win-amd64-cpython-38\torchdynamo\optimizations
      creating build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\base.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\builder.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\builtin.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\constant.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\dicts.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\functions.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\lists.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\misc.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\nn_module.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\tensor.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\torch.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\user_defined.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      copying torchdynamo\variables\__init__.py -> build\lib.win-amd64-cpython-38\torchdynamo\variables
      creating build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\autotuner.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\common.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\cpp.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_template.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\wrapper.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\__init__.py -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      creating build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\autotune.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\batched_matmul.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\conv.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\conv1x1.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\conv_perf_model.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\matmul.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\mm_perf_model.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\utils.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\triton_ops\__init__.py -> build\lib.win-amd64-cpython-38\torchinductor\triton_ops
      copying torchinductor\codegen\cpp_prefix.h -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_conv_delta_x.j2 -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_conv_delta_x_hwc.j2 -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      copying torchinductor\codegen\triton_mm.j2 -> build\lib.win-amd64-cpython-38\torchinductor\codegen
      running build_ext
      building 'torchdynamo._eval_frame' extension
      creating build\temp.win-amd64-cpython-38
      creating build\temp.win-amd64-cpython-38\Release
      creating build\temp.win-amd64-cpython-38\Release\torchdynamo
      "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\user\anaconda3\envs\auxiliar\include -IC:\Users\user\anaconda3\envs\auxiliar\Include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" /Tctorchdynamo/_eval_frame.c /Fobuild\temp.win-amd64-cpython-38\Release\torchdynamo/_eval_frame.obj -Wall
      _eval_frame.c
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(49): warning C4820: '_finddata32i64_t': '4' bytes of padding added after data member 'name'
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(54): warning C4820: '_finddata64i32_t': '4' bytes of padding added after data member 'attrib'
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(64): warning C4820: '__finddata64_t': '4' bytes of padding added after data member 'attrib'
      C:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt\corecrt_io.h(69): warning C4820: '__finddata64_t': '4' bytes of padding added after data member 'name'
      c:\users\user\anaconda3\envs\auxiliar\include\pyconfig.h(205): fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
      error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' failed with exit code 2
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> torchdynamo

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

I am using Windows 10 with AMD64 processors (the wheels for Python end in win_amd64). I also use two NVIDIA RTX 3090 GPUs with CUDA 12.0.

Versions

Collecting environment information...
PyTorch version: 2.0.0.dev20221213+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.15 (default, Nov 24 2022, 14:38:14) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 526.86
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.24.0rc2
[pip3] torch==2.0.0.dev20221213+cu117
[conda] numpy 1.24.0rc2 pypi_0 pypi
[conda] torch 2.0.0.dev20221213+cu117 pypi_0 pypi

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @ezyang @soumith @msaroufim @wconstab @ngimel @bdhirsh

@malfet malfet added the module: windows Windows support for PyTorch label Dec 13, 2022
Contributor

malfet commented Dec 13, 2022

The [dynamo] extra requirement is redundant at the moment, but on the other hand I think neither the CPU nor the Triton backend is available for Windows at the moment

@msaroufim
Member

@malfet speaking of the redundant [dynamo] extra, I removed it from our main getting-started blog here: pytorch/pytorch.github.io#1241. For whatever reason, though, the doc build did not update @svekars


T-Atlas commented Jan 5, 2023

PyTorch version: 2.0.0.dev20230104+cu117
The same problem reproduces on Windows 11.

@ngimel ngimel added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 10, 2023
@PleezDeez

py-cpuinfo works on Windows 11 and removes the need for the lscpu flag when trying to install DeepSpeed on Windows via pip. There is also a hacked-together build of Triton 2.0.0 that a Russian developer put together, which I was able to use to produce a working wheel.

@g-i-o-r-g-i-o

Can't install it on Windows:

PS D:\> pip3 install numpy --pre torch --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117
Looking in indexes: https://download.pytorch.org/whl/nightly/cu117
ERROR: Could not find a version that satisfies the requirement numpy (from versions: none)
ERROR: No matching distribution found for numpy

Python version 3.9 (I can't upgrade it, since it is the maximum version that works with the latest PyTorch)

@inboxedshoe

I think you can install numpy separately and then repeat the command without numpy in it. If you look at that URL, there are no Windows binaries provided for numpy.

@Jerry-Master
Author

After researching a bit: the real problem is that Triton, the main compiler backend in torch 2.0, has no Windows support. It is under development; previous versions had support, but current ones do not, so I'll just wait.

@SlowFeather

When I configured everything in Windows, I received this error:

raise RuntimeError("Windows not yet supported for torch.compile")
RuntimeError: Windows not yet supported for torch.compile

I want to know if it can run on Windows 11 + torch 2.0 + cuda118 🤯
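Until Windows is supported, one workaround seen in practice is to fall back to eager mode whenever `torch.compile` raises, so the same script runs everywhere. A minimal sketch (`maybe_compile` is a hypothetical helper, not a PyTorch API):

```python
def maybe_compile(fn):
    """Return torch.compile(fn) where supported, else fn unchanged,
    so the script degrades gracefully to eager mode on Windows."""
    try:
        import torch
        return torch.compile(fn)
    except Exception:
        # e.g. RuntimeError("Windows not yet supported for torch.compile"),
        # or torch not installed at all
        return fn
```

The function behaves identically either way; on Windows you simply lose the compilation speedup rather than crashing.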

@Eduard6421

Greetings!
Any updates on this issue?

@hajlyx

hajlyx commented Jun 20, 2023

Any plans to make torch.compile available on Windows?

@iremyux iremyux self-assigned this Jun 30, 2023
Collaborator

iremyux commented Jun 30, 2023

Hi @Jerry-Master, I attempted to reproduce this issue but was unable to encounter it. It appears that the problem may have been resolved. If you are still encountering the same problem, please provide me with more details or steps to reproduce.

Contributor

d-kleine commented Jun 30, 2023

I have the same issue using this code:

import torch
import warnings

gpu_ok = False
if torch.cuda.is_available():
    device_cap = torch.cuda.get_device_capability()
    if device_cap in ((7, 0), (8, 0), (9, 0)):
        gpu_ok = True

if not gpu_ok:
    warnings.warn(
        "GPU is not NVIDIA V100, A100, or H100. Speedup numbers may be lower "
        "than expected."
    )
def foo(x, y):
    a = torch.sin(x)
    b = torch.cos(y)
    return a + b
opt_foo1 = torch.compile(foo)
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

Cell In[2], line 5
3 b = torch.cos(y)
4 return a + b
----> 5 opt_foo1 = torch.compile(foo)
6 print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10)))

File ...\anaconda3\envs\myenv\Lib\site-packages\torch\__init__.py:1441, in compile(model, fullgraph, dynamic, backend, mode, options, disable)
1439 if backend == "inductor":
1440 backend = _TorchCompileInductorWrapper(mode, options, dynamic)
-> 1441 return torch._dynamo.optimize(backend=backend, nopython=fullgraph, dynamic=dynamic, disable=disable)(model)

File ...\anaconda3\envs\myenv\Lib\site-packages\torch\_dynamo\eval_frame.py:413, in optimize(backend, nopython, guard_export_fn, guard_fail_fn, disable, dynamic)
380 def optimize(
381 backend="inductor",
382 *,
(...)
387 dynamic=False,
388 ):
389 """
390 The main entrypoint of TorchDynamo. Do graph capture and call
391 backend() to optimize extracted graphs.
(...)
411 ...
...
--> 375 raise RuntimeError("Windows not yet supported for torch.compile")
376 if sys.version_info >= (3, 11):
377 raise RuntimeError("Python 3.11+ not yet supported for torch.compile")

RuntimeError: Windows not yet supported for torch.compile

Torch version:

torch.__version__

'2.0.1+cpu'

Python version:

from platform import python_version
print(python_version())

3.11.3

OS version + machine:

import platform
print(platform.platform())
print(platform.machine())

Windows-10-10.0.19045-SP0
AMD64

I have already tried different Python versions (3.10.9, 3.11.4) and also PyTorch with CUDA 11.8 (installed via both pip and conda); the same issue persists.

Also linking this issue: #86566
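The traceback above shows two separate guards: Windows is rejected, and then Python 3.11+ is rejected. Those conditions can be mirrored in a small preflight check before ever calling `torch.compile`; this is a sketch of the two guards, not PyTorch's internal code (`torch_compile_supported` is a hypothetical helper):

```python
import sys
import platform

def torch_compile_supported(system=None, pyver=None):
    """Mirror the two guards seen in the traceback above:
    torch 2.0.x rejects Windows and Python 3.11+."""
    system = system or platform.system()
    pyver = pyver or sys.version_info[:2]
    if system == "Windows":
        return False  # "Windows not yet supported for torch.compile"
    if pyver >= (3, 11):
        return False  # "Python 3.11+ not yet supported for torch.compile"
    return True
```

Note that d-kleine's environment (Windows 10, Python 3.11.3) trips both guards, so switching Python versions alone would not have helped.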

@Artyom17

Artyom17 commented Jul 3, 2023

Yep, torch.compile still doesn't seem to work on 2.0.1+cu118 / Windows. Any updates?

@d-kleine d-kleine mentioned this issue Jul 5, 2023
@devangaggarwal
Contributor

Are there any updates from the PyTorch team on this? When can we expect Windows support?

@d-kleine
Contributor

d-kleine commented Sep 16, 2023

Referring to this discussion about the Triton Windows-support issue as well: triton-lang/triton#1640

@wkpark

wkpark commented Jan 1, 2024

PyTorch also needs to be fixed to work under Windows + Triton (this is WIP, not tested much): wkpark#1. See also openai/triton#2738

Can it work with torch.compile?

The main branch has not been tested yet, but v2.1.0 and v2.1.1 have been tested.
This is v2.1.1: wkpark@2716adf (you can cherry-pick it and apply it on top of v2.1.0)

@Ken1256

Ken1256 commented Jan 1, 2024

pytorch also need to be fixed to work under windows+triton (this is WIP. not tested much) wkpark#1 See also openai/triton#2738

Can it work with torch.compile?

main branch has not been tested yet but, v210, v211 have been tested this is v2.1.1 - wkpark@2716adf (you can cherry-pick and can be applied on top of v210)

pip install --upgrade torch torchvision --index-url https://download.pytorch.org/whl/cu121
After removing check_if_dynamo_supported, I got:
ImportError: cannot import name 'get_cuda_stream' from 'triton.runtime.jit' (C:\Program Files\Python311\Lib\site-packages\triton\runtime\jit.py)
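An ImportError like this means the installed Triton build does not expose the symbol Inductor expects. A generic import probe can confirm the mismatch before digging into patched wheels; `exposes` below is a hypothetical helper, not part of Triton or PyTorch:

```python
import importlib

def exposes(module_name, attr):
    """True if `module_name` imports cleanly and defines `attr`."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

# The ImportError above implies this is False for the installed build:
# exposes("triton.runtime.jit", "get_cuda_stream")
```

As noted below, this is expected: recent Triton moved/removed that symbol, which is why the WIP branch has to track Triton's API.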

@jensdraht1999

@Ken1256 This is just a draft, it has not been merged.

@wkpark Do you have a step-by-step guide for how to make this happen and how to use it via torch.compile? Just a simple decorator example, so we might test it out. Or will it perhaps get merged in a few weeks?

@BeyondYourself

BeyondYourself commented Jan 10, 2024

The same problem on Windows 11 + torch 2.1.1 + cuda118 + py3.10: Windows not yet supported for torch.compile

@wkpark

wkpark commented Jan 11, 2024

pytorch also need to be fixed to work under windows+triton (this is WIP. not tested much) wkpark#1 See also openai/triton#2738

Can it work with torch.compile?

wkpark#1 has been updated to work with the latest Triton (+ with win32 fix).
(A recent Triton update breaks torch.compile() compatibility: please see triton-lang/triton#2701 and triton-lang/triton@72c9833)

F:\src\pytorch\tools\dynamo>python verify_dynamo.py
Python version: 3.10.11
`torch` version: 2.3.0.dev20240109+cu118
CUDA version: 11.8
ROCM version: None

Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31937 (x64)
Copyright (c) Microsoft Corporation. All rights reserved.


...
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'triton__0d1d2' for 'sm_89'
ptxas info    : Function properties for triton__0d1d2
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 372 bytes cmem[0]
In file included from C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:4:
In file included from C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\Python.h:118:
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:120:59: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  120 | PyAPI_FUNC(int) _PyTime_FromTimeval(_PyTime_t *tp, struct timeval *tv);
      |                                                           ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:127:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  127 |     struct timeval *tv,
      |            ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:132:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  132 |     struct timeval *tv,
      |            ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:20:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   20 |       strcat(err, prefix);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:21:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   21 |       strcat(err, str);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
5 warnings generated.
   Creating library C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.lib and object C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.exp
All required checks passed

@Ken1256

Ken1256 commented Jan 12, 2024

pytorch also need to be fixed to work under windows+triton (this is WIP. not tested much) wkpark#1 See also openai/triton#2738

Can it work with torch.compile?

wkpark#1 has been updated to work with the latest triton(+with win32 fix) (recent triton update breaks totch.compile() compatibility: Please see openai/triton#2701 openai/triton@72c9833 )

F:\src\pytorch\tools\dynamo>python verify_dynamo.py
Python version: 3.10.11
`torch` version: 2.3.0.dev20240109+cu118
CUDA version: 11.8
ROCM version: None

Microsoft (R) C/C++ 최적화 컴파일러 버전 19.34.31937(x64)
Copyright (c) Microsoft Corporation. All rights reserved.


...
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function 'triton__0d1d2' for 'sm_89'
ptxas info    : Function properties for triton__0d1d2
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 14 registers, 372 bytes cmem[0]
In file included from C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:4:
In file included from C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\Python.h:118:
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:120:59: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  120 | PyAPI_FUNC(int) _PyTime_FromTimeval(_PyTime_t *tp, struct timeval *tv);
      |                                                           ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:127:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  127 |     struct timeval *tv,
      |            ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:132:12: warning: declaration of 'struct timeval' will not be visible outside of this function [-Wvisibility]
  132 |     struct timeval *tv,
      |            ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:20:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   20 |       strcat(err, prefix);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\main.c:21:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
      [-Wdeprecated-declarations]
   21 |       strcat(err, str);
      |       ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\string.h:91:5: note: 'strcat' has been explicitly marked deprecated here
   91 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1(
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:835:5: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1'
  835 |     __DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX(_ReturnType, _ReturnPolicy, _DeclSpec, _FuncName, _FuncName##_s, _DstType, _SalAttributeDst, _DstType, _Dst, _TType1, _TArg1)
      |     ^
C:\Program Files (x86)\Windows Kits\10\include\10.0.20348.0\ucrt\corecrt.h:1894:17: note: expanded from macro '__DEFINE_CPP_OVERLOAD_STANDARD_FUNC_0_1_EX'
 1894 |                 _CRT_INSECURE_DEPRECATE(_SecureFuncName) _DeclSpec _ReturnType __cdecl _FuncName(_SalAttributeDst _DstType *_Dst, _TType1 _TArg1);
      |                 ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:355:55: note: expanded from macro '_CRT_INSECURE_DEPRECATE'
  355 |         #define _CRT_INSECURE_DEPRECATE(_Replacement) _CRT_DEPRECATE_TEXT(    \
      |                                                       ^
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\vcruntime.h:345:47: note: expanded from macro '_CRT_DEPRECATE_TEXT'
  345 | #define _CRT_DEPRECATE_TEXT(_Text) __declspec(deprecated(_Text))
      |                                               ^
5 warnings generated.
   Creating library C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.lib and object C:\Users\WK\AppData\Local\Temp\tmpdi5qu2w_\triton_.cp310-win_amd64.exp
All required checks passed

Can you upload wheels for python 3.11 + cuda121?

@wkpark

wkpark commented Jan 12, 2024

Can you upload wheels for python 3.11 + cuda121?

If you have some time you can test the latest triton (python 3.11+cuda121) + pytorch

There are no direct wheel links, but you can download the patched Triton wheels at https://github.com/wkpark/triton/actions/runs/7246431088 (also found at triton-lang/triton#2738 )

  • Download the Triton build and install it using pip.
  • Install the latest nightly build of PyTorch (see https://pytorch.org/get-started/locally/ and select the Nightly build).
    • For example, you can install torch-2.3.0.dev with pip3 install --pre torch==2.3.0.dev20240110+cu118 --index-url https://download.pytorch.org/whl/nightly/cu118
  • Apply the PyTorch fix from MSVC fixes wkpark/pytorch#1 (https://github.com/wkpark/pytorch/pull/1/files ).
    • Only 2 or 3 files need to be fixed (a simple copy will work).
  • Edit your_python_site-packages/torch/_dynamo/eval_frame.py to toggle torch.compile() on and off (I just patched it manually):
  • Before testing PyTorch + Triton, your cmd console environment has to be properly set up to work with MSVC (e.g. the Visual Studio 2022 x64 Native Tools command prompt, plus nvcc for CUDA).
diff --git a/torch/_dynamo/eval_frame.py b/torch/_dynamo/eval_frame.py
index 251dd6d1c32..9349fdc62a5 100644
--- a/torch/_dynamo/eval_frame.py
+++ b/torch/_dynamo/eval_frame.py
@@ -531,8 +531,8 @@ class _NullDecorator(contextlib.nullcontext):  # type: ignore[type-arg]


 def check_if_dynamo_supported():
-    if sys.platform == "win32":
-        raise RuntimeError("Windows not yet supported for torch.compile")
+    #if sys.platform == "win32":
+    #    raise RuntimeError("Windows not yet supported for torch.compile")
     if sys.version_info >= (3, 12):
         raise RuntimeError("Python 3.12+ not yet supported for torch.compile")
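Rather than commenting the guard out in site-packages, a less invasive variant (purely a sketch, not part of the actual PR; the environment variable name is made up) would gate the Windows check behind an opt-in flag:

```python
import os
import sys


def check_if_dynamo_supported(platform=sys.platform, version=sys.version_info):
    """Sketch of the guard with an opt-in escape hatch for Windows.

    TORCH_COMPILE_FORCE_WINDOWS is a hypothetical variable, not a real
    torch setting; it just illustrates gating instead of deleting the check.
    """
    if platform == "win32" and os.environ.get("TORCH_COMPILE_FORCE_WINDOWS") != "1":
        raise RuntimeError("Windows not yet supported for torch.compile")
    if version >= (3, 12):
        raise RuntimeError("Python 3.12+ not yet supported for torch.compile")
```

This keeps stock behavior unless you explicitly opt in with `set TORCH_COMPILE_FORCE_WINDOWS=1`, so an upgrade that restores eval_frame.py fails loudly instead of silently re-enabling the check.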

@Ken1256

Ken1256 commented Jan 13, 2024

Can you upload wheels for python 3.11 + cuda121?

If you have some time you can test the latest triton (python 3.11+cuda121) + pytorch […]

Without the x64 Native Tools Command Prompt for VS 2022:
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

With the x64 Native Tools Command Prompt for VS 2022:

File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codegen\cpp.py", line 3316, in lines
    elif not self.reduction_var_map and codecache.is_gcc():
                                        ^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 962, in is_gcc
    return bool(re.search(r"(gcc|g\+\+)", cpp_compiler()))
                                          ^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 902, in cpp_compiler
    return cpp_compiler_search(search)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 932, in cpp_compiler_search
    raise exc.InvalidCxxCompiler()
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')

@wkpark

wkpark commented Jan 13, 2024

Can you upload wheels for python 3.11 + cuda121? […]

with x64 Native Tools Command Prompt for VS 2022

InvalidCxxCompiler: No working C++ compiler found in torch._inductor.config.cpp.cxx: (None, 'g++')

I guess, I've missed some fix.

Please set CXX=cl (you can set the CXX env var at the cmd prompt: set CXX=cl)

or add wkpark@ee55add fix.

@Ken1256

Ken1256 commented Jan 14, 2024

I guess, I've missed some fix.

Please set CXX=cl (you can set CXX env at cmd prompt like set CXX=cl)

or add wkpark@ee55add fix.

Still getting RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

import sys
import os

import torch
import torch._dynamo
# torch._dynamo.config.suppress_errors = True

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(100, 10)

    def forward(self, x):
        return torch.nn.functional.relu(self.lin(x))

if __name__ == '__main__':
    os.environ["TORCH_CUDNN_V8_API_ENABLED"] = "1"
    os.environ["CUDA_MODULE_LOADING"] = "LAZY"
    os.environ["CUDA_PATH"] = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"
    os.environ["CXX"] = "cl"

    device = 'cpu'
    device = 'cuda'
    # device = 'cuda:1'
    dtype = torch.float32
    # dtype = torch.float16
    dtype = torch.bfloat16

    mod = MyModule().to(device).to(dtype)
    opt_mod = torch.compile(mod)
    # opt_mod = torch.compile(mod, mode="reduce-overhead")
    x = torch.randn(10, 100).to(device).to(dtype)
    print(opt_mod(x))
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.8.4
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'

C:\Program Files\Microsoft Visual Studio\2022\Community>python "C:\Temp\torch_compile_t01.py"
Traceback (most recent call last):
  File "C:\Temp\torch_compile_t01.py", line 33, in <module>
    print(opt_mod(x))
          ^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\eval_frame.py", line 417, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\eval_frame.py", line 580, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 741, in _convert_frame
    result = inner_convert(frame, cache_entry, hooks, frame_state)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 384, in _convert_frame_assert
    return _compile(
           ^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 643, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 524, in compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\bytecode_transformation.py", line 1033, in transform_code_object
    transformations(instructions, code_options)
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 151, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\convert_frame.py", line 489, in transform
    tracer.run()
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2098, in run
    super().run()
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 780, in run
    and self.step()
        ^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 743, in step
    getattr(self, inst.opname)(inst)
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\symbolic_convert.py", line 2216, in RETURN_VALUE
    self.output.compile_subgraph(
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 914, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "C:\Program Files\Python311\Lib\contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 1085, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 1157, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\output_graph.py", line 1138, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\repro\after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\__init__.py", line 1697, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 1177, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\backends\common.py", line 55, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\aot_autograd.py", line 889, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\aot_autograd.py", line 602, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\_aot_autograd\runtime_wrappers.py", line 427, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\_aot_autograd\runtime_wrappers.py", line 632, in aot_wrapper_synthetic_base
    return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_functorch\_aot_autograd\jit_compile_runtime_wrappers.py", line 295, in aot_dispatch_autograd
    compiled_fw_func = aot_config.fw_compiler(fw_module, adjusted_flat_args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 1105, in fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\repro\after_aot.py", line 83, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\debug.py", line 304, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 317, in compile_fx_inner
    compiled_graph = fx_codegen_and_compile(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\compile_fx.py", line 551, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\graph.py", line 1157, in compile_to_fn
    return self.compile_to_module().call
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_dynamo\utils.py", line 247, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\graph.py", line 1109, in compile_to_module
    mod = PyCodeCache.load_by_key_path(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 1933, in load_by_key_path
    exec(code, mod.__dict__, mod.__dict__)
  File "C:/Users/X/AppData/Local/Temp/torchinductor_X/rb/crbmdpyo62vuap2itjdzvmmwn6vi6ttrq6ljeirwfz6xok6fqe57.py", line 28, in <module>
    triton_poi_fused_relu_threshold_backward_0 = async_compile.triton('triton_', '''
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 2480, in triton
    return _load_kernel(kernel_name, source_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\codecache.py", line 2331, in _load_kernel
    kernel.precompile()
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\triton_heuristics.py", line 195, in precompile
    compiled_binary, launcher = self._precompile_config(
                                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\torch\_inductor\triton_heuristics.py", line 349, in _precompile_config
    binary = triton.compile(*compile_args, **compile_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\compiler\compiler.py", line 178, in compile
    so_path = backend.make_launcher_stub(src, metadata)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\compiler\backends\cuda.py", line 250, in make_launcher_stub
    return make_stub(src.name, src.signature, constants, ids, enable_warp_specialization=enable_warp_specialization)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\compiler\make_launcher.py", line 37, in make_stub
    so = _build(name, src_path, tmpdir)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\triton\common\build.py", line 101, in _build
    raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
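One thing worth checking: triton's `_build` reads the CC/CXX variables from the process environment, so setting them with `os.environ` inside the script may be too late if anything has already cached its compiler search. A safer pattern (just a sketch; the `-c` snippet stands in for the real script path) is a tiny launcher that fixes the environment before Python starts the workload:

```python
import os
import subprocess
import sys

# Set the compiler env vars first, then run the real script in a child
# process that inherits them; replace the -c snippet with your script path.
env = dict(os.environ)
env["CC"] = "cl"
env["CXX"] = "cl"

child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CC'], os.environ['CXX'])"],
    env=env,
    capture_output=True,
    text=True,
)
print(child.stdout.strip())  # -> cl cl
```

Equivalently, `set CC=cl` and `set CXX=cl` at the VS developer prompt before launching Python achieves the same thing.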

@wkpark

wkpark commented Jan 14, 2024

still RuntimeError: Failed to find C compiler. Please specify via CC environment variable. […]

Did you succeed in running tools/dynamo/verify_dynamo.py before testing?
(You can check your compiler with the cl command at your cmd prompt, and/or override the cc command with set CC=cl.)
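For reference, this kind of compiler search ultimately comes down to looking candidate names up on PATH; a quick stdlib check (the candidate list here is my own guess, not torch's exact list) shows which compilers your console session actually exposes:

```python
import shutil


def find_compiler(candidates=("cl", "clang-cl", "clang", "gcc", "g++")):
    """Return (name, path) for the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    found = find_compiler()
    print(found if found else "no compiler on PATH; set CC/CXX manually")
```

If this prints the fallback message inside the VS developer prompt, vcvarsall.bat did not run for that session and inductor/triton will fail the same way.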

@wkpark

wkpark commented Jan 14, 2024

Some tests under pytorch/test:

>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.34.31937 for x64
Copyright (c) Microsoft Corporation. All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]
>clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project 5e5a22caf88ac1ccfa8dc5720295fdeba0ad9372)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
>clang-cl --version
clang version 18.0.0 (https://github.com/llvm/llvm-project 5e5a22caf88ac1ccfa8dc5720295fdeba0ad9372)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
>pip install hypothesis expecttest pytest # install some modules
>cd pytorch\test
>python -m pytest distributed\_tensor\test_dtensor_compile.py
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.10.11, pytest-7.4.2, pluggy-1.3.0
rootdir: D:\src\pytorch
configfile: pytest.ini
plugins: anyio-3.7.1, hydra-core-1.3.2, hypothesis-6.93.0, xdist-3.5.0
collected 10 items

distributed\_tensor\test_dtensor_compile.py ....In file included from C:\Users\WK\AppData\Local\Temp\tmp7uz995_y\main.c:4:
In file included from C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\Python.h:118:
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:120:59: warning: declaration of 'struct timeval' will not be
      visible outside of this function [-Wvisibility]
  120 | PyAPI_FUNC(int) _PyTime_FromTimeval(_PyTime_t *tp, struct timeval *tv);
      |                                                           ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:127:12: warning: declaration of 'struct timeval' will not be
      visible outside of this function [-Wvisibility]
  127 |     struct timeval *tv,
      |            ^
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\Include\cpython/pytime.h:132:12: warning: declaration of 'struct timeval' will not be
      visible outside of this function [-Wvisibility]
  132 |     struct timeval *tv,
      |            ^
C:\Users\WK\AppData\Local\Temp\tmp7uz995_y\main.c:20:7: warning: 'strcat' is deprecated: This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation,
      use _CRT_SECURE_NO_WARNINGS. See online help for details. [-Wdeprecated-declarations]
....(snip)
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
[rank1]:[2024-01-14 15:09:25,284] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank3]:[2024-01-14 15:09:25,284] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank2]:[2024-01-14 15:09:25,285] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank0]:[2024-01-14 15:09:25,286] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
.                                                                                                                              [100%]

======================================================================== 7 passed, 3 skipped in 77.75s (0:01:17) =========================================================================

Setting CC and CXX to clang:

>set CC=clang
>set CXX=clang++
>python -m pytest distributed\_tensor\test_dtensor_compile.py
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.10.11, pytest-7.4.2, pluggy-1.3.0
rootdir: D:\src\pytorch
configfile: pytest.ini
plugins: anyio-3.7.1, hydra-core-1.3.2, hypothesis-6.93.0, xdist-3.5.0
collected 10 items

distributed\_tensor\test_dtensor_compile.py .....sss
...
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
F:\webui\webui\stable-diffusion-webui\venv\lib\site-packages\torch\distributed\_functional_collectives_impl.py:101: UserWarning: Trying to register finalizer to AsyncCollectiveTensor but the inner tensor is already gone
  warnings.warn(
[rank1]:[2024-01-14 15:11:08,763] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank0]:[2024-01-14 15:11:08,763] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank3]:[2024-01-14 15:11:08,763] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank2]:[2024-01-14 15:11:08,764] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
.                                                                                                                              [100%]

======================================================================== 7 passed, 3 skipped in 75.91s (0:01:15) =========================================================================

Setting CC and CXX to clang-cl also works:

>set CC=clang-cl
>set CXX=clang-cl
>python -m pytest distributed\_tensor\test_dtensor_compile.py
================================================================================== test session starts ===================================================================================
platform win32 -- Python 3.10.11, pytest-7.4.2, pluggy-1.3.0
rootdir: D:\src\pytorch
configfile: pytest.ini
plugins: anyio-3.7.1, hydra-core-1.3.2, hypothesis-6.93.0, xdist-3.5.0
collected 10 items
distributed\_tensor\test_dtensor_compile.py .....sss
...
[rank3]:[2024-01-14 15:14:50,463] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank2]:[2024-01-14 15:14:50,465] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank1]:[2024-01-14 15:14:50,467] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
[rank0]:[2024-01-14 15:14:50,467] torch.distributed._functional_collectives_impl: [WARNING] ProcessGroupGloo does not support reduce_scatter, falling back with all reduce!
.                                                                                                                              [100%]

======================================================================== 7 passed, 3 skipped in 104.95s (0:01:44) ========================================================================
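The runs above suggest that inductor's C++ codegen only needs some working host compiler, selected via the `CC`/`CXX` environment variables. A minimal sketch of that kind of selection logic (`pick_compiler` and its candidate order are illustrative assumptions, not PyTorch's actual implementation):

```python
import os
import shutil

def pick_compiler():
    """Prefer an explicit CC (as in `set CC=clang` above), else probe PATH."""
    cc = os.environ.get("CC")
    if cc and shutil.which(cc):
        return cc
    # Plausible Windows-first candidate order; purely illustrative.
    for candidate in ("cl", "clang-cl", "clang", "gcc"):
        if shutil.which(candidate):
            return candidate
    return None  # no usable toolchain found
```

This is why the three runs above behave the same: as long as the chosen compiler can build the generated C, the backend works regardless of which toolchain it is.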

@iperov

iperov commented Feb 4, 2024

PyTorch is trying to reinvent the wheel that was originally invented in TensorFlow 1.x?

@sako-ranj

Can someone explain this to me? I'm a bit new to this stuff. Does Windows not support torch.compile yet?

@alinpahontu2912
Collaborator

Hello @Jerry-Master and everyone interested in torch.compile. I believe we can close this issue and follow this thread instead: #122094. I'd like you to know that in the latest nightly there is CPU-only torch.compile support.

@xuhancn
Collaborator

xuhancn commented Aug 16, 2024

Currently, PyTorch on Windows is beginning to support torch.compile in the nightly build, but it only supports the CPU device.
Please check #124245.
You can install the nightly build:

pip install torch  --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

@d-kleine
Contributor

I believe we can close this issue and follow this thread instead: #122094. I'd like you to know that in the latest nightly there is CPU-only torch.compile support.

This issue was posted months before the one you referenced. Actually, #124245 should be closed, as it is a duplicate of the original issue reported here.

@d-kleine
Contributor

Currently, PyTorch on Windows is beginning to support torch.compile in the nightly build, but it only supports the CPU device. Please check #124245. You can install the nightly build:

pip install torch  --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

As long as this does not work for CUDA, it's of limited use, since you cannot take advantage of torch.compile to speed up training. Triton is still not supported on Windows.

W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] RuntimeError: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at https://github.com/openai/triton
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 
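The `BackendCompilerFailed` above only surfaces at the first call of the compiled function. Until Triton ships for Windows, one workaround is to fall back to eager mode when the backend fails; a minimal sketch (the `compile_or_eager` helper is hypothetical, not a PyTorch API, with `torch.compile` as the intended `compiler` argument):

```python
# Hypothetical helper: try the compiled path once, fall back to eager on
# any backend failure (e.g. "Cannot find a working triton installation").
def compile_or_eager(fn, sample_args, compiler):
    """Return compiler(fn) if it survives one trial call, else fn unchanged."""
    try:
        compiled = compiler(fn)
        compiled(*sample_args)  # backend errors surface at the first call
        return compiled
    except Exception:
        return fn  # eager fallback

# With torch this would be used roughly as:
#   model = compile_or_eager(model, (example_input,), torch.compile)
```

PyTorch also exposes `torch._dynamo.config.suppress_errors = True`, which (if I understand it correctly) performs this eager fallback globally instead of raising.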

@xuhancn
Collaborator

xuhancn commented Aug 19, 2024

I believe we can close this issue and follow this thread instead: #122094. I'd like you to know that in the latest nightly there is CPU-only torch.compile support.

This issue was posted months before the one you referenced. Actually, #124245 should be closed, as it is a duplicate of the original issue reported here.

No, #124245 is used to track my progress.

@xuhancn
Collaborator

xuhancn commented Aug 19, 2024

Currently, PyTorch on Windows is beginning to support torch.compile in the nightly build, but it only supports the CPU device. Please check #124245. You can install the nightly build:

pip install torch  --index-url https://download.pytorch.org/whl/nightly/cu121 --force-reinstall

As long as this does not work for CUDA, it's of limited use, since you cannot take advantage of torch.compile to speed up training. Triton is still not supported on Windows.

W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] RuntimeError: Cannot find a working triton installation. Either the package is not installed or it is too old. More information on installing Triton can be found at https://github.com/openai/triton
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
W0819 15:18:58.469000 36800 torch\_dynamo\convert_frame.py:1100] 

For Triton status, see triton-lang/triton#4045 (comment).
