Here is a brief example, adapted from the README.md file:
import torch
import torch.nn as nn
import torch.optim as optim
from flashfftconv import FlashDepthWiseConv1d

B = 4
L = 26000
d = 512
k = 3
padding = k - 1
dtype = torch.bfloat16
device = "cuda:4"

# set up PyTorch equivalent to get the weights
# in_channels = out_channels, and kernel size must be odd
x = torch.randn((B, d, L), device=device, dtype=dtype)

conv1d_torch = nn.Conv1d(
    in_channels=d,
    out_channels=d,
    kernel_size=k,
    groups=d,
    padding=padding,
    dtype=dtype,
    device=device
)

flash_conv1d = FlashDepthWiseConv1d(
    channels=d,
    kernel_size=k,
    padding=padding,
    weights=conv1d_torch.weight,
    bias=conv1d_torch.bias,
    dtype=dtype  # this should be the dtype of the weights
).to(device=device)

out_torch = conv1d_torch(x)  # x is B, d, L
out_flash = flash_conv1d(x)  # x can be a different dtype than weights
# out_torch and out_flash should be the same!
out_flash.sum().backward()   # Got an error!
out_torch.sum().backward()   # It's OK
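For what it's worth, the forward pass itself succeeds, so the two outputs can be compared directly. A quick sanity check along these lines (the atol value is an arbitrary choice for bfloat16, not something taken from the library):

# both outputs have shape (B, d, L + 2*padding - k + 1) with this padding
print(out_torch.shape, out_flash.shape)
print(torch.allclose(out_torch.float(), out_flash.float(), atol=1e-2))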
When I ran this sample program, I encountered the following error message:
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 1
----> 1 out_flash.sum().backward()

File ~/miniconda3/lib/python3.9/site-packages/torch/_tensor.py:525, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    515 if has_torch_function_unary(self):
    516     return handle_torch_function(
    517         Tensor.backward,
    518         (self,),
   (...)
    523         inputs=inputs,
    524     )
--> 525 torch.autograd.backward(
    526     self, gradient, retain_graph, create_graph, inputs=inputs
    527 )

File ~/miniconda3/lib/python3.9/site-packages/torch/autograd/__init__.py:267, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    262 retain_graph = create_graph
    264 # The reason we repeat the same comment below is that
    265 # some Python versions print out the first line of a multi-line function
    266 # calls in the traceback and some print out the last line
--> 267 _engine_run_backward(
    268     tensors,
    269     grad_tensors_,
    270     retain_graph,
    271     create_graph,
    272     inputs,
    273     allow_unreachable=True,
    274     accumulate_grad=True,
    275 )

File ~/miniconda3/lib/python3.9/site-packages/torch/autograd/graph.py:744, in _engine_run_backward(t_outputs, *args, **kwargs)
    742 unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
    743 try:
--> 744     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    745         t_outputs, *args, **kwargs
    746     )  # Calls into the C++ engine to run the backward pass
    747 finally:
    748     if attach_logging_hooks:

File ~/miniconda3/lib/python3.9/site-packages/torch/autograd/function.py:301, in BackwardCFunction.apply(self, *args)
    295 raise RuntimeError(
    296     "Implementing both 'backward' and 'vjp' for a custom "
    297     "Function is not allowed. You should only implement one "
    298     "of them."
    299 )
    300 user_fn = vjp_fn if vjp_fn is not Function.vjp else backward_fn
--> 301 return user_fn(self, *args)

File ~/miniconda3/lib/python3.9/site-packages/flashfftconv-0.0.0-py3.9.egg/flashfftconv/depthwise_1d.py:20, in conv1dFunc.backward(ctx, dout)
     18 input, weight, bias = ctx.saved_tensors
     19 dout = dout.contiguous()
---> 20 du, dk, dbias = conv1d_backward(dout, input, weight, bias, ctx.padding, ctx.is_bhl)
     21 return du, dk, dbias, None, None

RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [2048, 26000] but got: [2048, 26002].
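For context, the shapes in the error seem to line up with the standard Conv1d output-length arithmetic (my reading, not something confirmed in the flashfftconv source): 2048 is B * d, 26000 is the input length L that the backward kernel apparently expects, and 26002 is the actual forward output length when padding = k - 1.

B, d, L, k = 4, 512, 26000, 3

padding = k - 1                        # padding used in the failing example
L_out = L + 2 * padding - (k - 1)      # Conv1d output length for stride=1, dilation=1
print(B * d, L_out)                    # 2048 26002 -> the "got" shape in the error

padding_same = (k - 1) // 2
print(L + 2 * padding_same - (k - 1))  # 26000 -> matches the expected shape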
Interestingly, this code works if the padding is set to (kernel-1)//2, regardless of whether dtype is float16, float32, or bfloat16. Another example, copied from test_conv1d.py, triggered the same error.
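For reference, here is a minimal sketch of the padding = (kernel-1)//2 variant that does not hit the error (same B, L, d, k, dtype, device, and x as above; only the padding changes):

padding = (k - 1) // 2   # = 1 for k = 3, i.e. "same" padding

conv1d_torch = nn.Conv1d(in_channels=d, out_channels=d, kernel_size=k,
                         groups=d, padding=padding, dtype=dtype, device=device)
flash_conv1d = FlashDepthWiseConv1d(channels=d, kernel_size=k, padding=padding,
                                    weights=conv1d_torch.weight, bias=conv1d_torch.bias,
                                    dtype=dtype).to(device=device)

out_flash = flash_conv1d(x)
out_flash.sum().backward()   # backward completes with this padding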
I believe there might be an error in the implementation of the backward method. Could you please suggest possible corrections or point me to relevant references?
P.S. Tested with an NVIDIA A800 80GB device, Driver Version: 525.85.12, CUDA Version: 12.0, Python 3.9.19, torch==2.3.1, g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0.
Looks like a bug - feel free to look through the outputs and file a PR to fix it if you have the chance. We are (slowly) working to rewrite this library in a more modern framework like ThunderKittens.