
Timeout Error Depending On Tensor Sizes #56

Open
jhlee508 opened this issue Sep 12, 2024 · 7 comments
Assignees
milank94
Labels
bug Something isn't working

Comments


jhlee508 commented Sep 12, 2024

This single linear layer test code causes the error below. It works fine if the input_tensor shape is [1, 32] or [1, 32, 32]; however, when the input is [32, 32], it hits a timeout while reading the output queue (linear.output_add_2).
Could you help me find a solution?

Test code

import pybuda
import torch
import time

class Linear(torch.nn.Module):
	def __init__(self):
		super(Linear, self).__init__()
		self.linear = torch.nn.Linear(32, 32)

	def forward(self, x):
		x = self.linear(x)
		return x

if __name__ == '__main__':
	# Create a TT device
	tt0 = pybuda.TTDevice("tt0", num_chips=1)

	# Create a PyTorch module with PyBuda Wrapper
	tt0.place_module(pybuda.PyTorchModule("linear", Linear()))

	# Create an input tensor
	input_tensor = torch.randn(32, 32)

	# Compile and run inference
	start = time.time()
	output_queue = pybuda.run_inference(inputs=[input_tensor])
	outputs = output_queue.get()
	print(">>> Inference time: ", time.time() - start)
	print(outputs)

Error Log

2024-09-12 17:12:28.179 | INFO     | Runtime         - running: '/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 2)                                                                                                                                                                                                                                                                 
2024-09-12 17:12:28.346 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:28.380 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:31.103 | INFO     | Backend         - initialize_child_process called on pid 6405
/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use `register_pytree_with_keys()` instead.
  jax.tree_util.register_keypaths(data_clz, keypaths)
/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use `register_pytree_with_keys()` instead.
  jax.tree_util.register_keypaths(data_clz, keypaths)
2024-09-12 17:12:34.873 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1713 - Node 'linear.weight' does not have a framework dtype specified. Using TVM generated dtype.
2024-09-12 17:12:34.873 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1713 - Node 'linear.bias' does not have a framework dtype specified. Using TVM generated dtype.
2024-09-12 17:12:34.916 | DEBUG    | pybuda.ttdevice:_create_input_queue_device_connector:1408 - Creating input queue connector on TTDevice 'tt0'
2024-09-12 17:12:34.916 | DEBUG    | pybuda.ttdevice:_create_intermediates_queue_device_connector:1418 - Creating fwd intermediates queue connector on TTDevice 'tt0'
2024-09-12 17:12:34.916 | DEBUG    | pybuda.ttdevice:_create_forward_output_queue_device_connector:1398 - Creating forward output queue connector on TTDevice 'tt0'
2024-09-12 17:12:39.053 | INFO     | Runtime         - running: '/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 2)                                                                                                                                                                                                                                                                 
2024-09-12 17:12:39.192 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:39.227 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:40.648 | INFO     | pybuda.device_connector:pusher_thread_main:148 - Pusher thread on <pybuda.device_connector.InputQueueDirectPusherDeviceConnector object at 0x7f1e3cfce070> starting
2024-09-12 17:12:40.649 | INFO     | Backend         - initialize_child_process called on pid 6618
2024-09-12 17:12:40.650 | DEBUG    | pybuda.device:run_next_command:455 - Received COMPILE command on TTDevice 'tt0' / 6618
2024-09-12 17:12:40.650 | DEBUG    | pybuda.ttdevice:compile_for:785 - Compiling for Inference mode on TTDevice 'tt0'
2024-09-12 17:12:40.710 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-09-12 17:12:40.725 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:40.741 | INFO     | Runtime         - Found cluster descriptor file at path=/tmp/jaehwan/3ab2f8d6c3b9/cluster_desc.yaml
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:ethernet_connections
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:ethernet_connections
2024-09-12 17:12:40.743 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage init_compile
2024-09-12 17:12:40.747 | INFO     | pybuda.ci:initialize_output_build_directory:98 - Pybuda output build directory for compiled artifacts: /tmp/jaehwan/3ab2f8d6c3b9
2024-09-12 17:12:40.758 | INFO     | pybuda.ci:create_symlink:89 - Symlink created from /home/n4/jaehwan/research/tenstorrent/buda-tests/torch-module/tt_build/test_out to /tmp/jaehwan/3ab2f8d6c3b9
2024-09-12 17:12:40.794 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-09-12 17:12:40.794 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-09-12 17:12:40.794 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:ethernet_connections
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:init_compile:511 - Device architecutre: wormhole_b0
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:init_compile:512 - Device grid size: r = 8, c = 8
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:init_compile:522 - Using chips: [0]
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage generate_initial_graph
2024-09-12 17:12:40.816 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage post_initial_graph_pass
2024-09-12 17:12:40.872 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage consteval_graph
2024-09-12 17:12:40.905 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage optimized_graph
2024-09-12 17:12:40.975 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage post_autograd_pass
2024-09-12 17:12:40.993 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage pre_lowering_pass
2024-09-12 17:12:41.011 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage buda_graph_pre_placer
2024-09-12 17:12:41.015 | INFO     | GraphCompiler   - Running with Automatic Mixed Precision Level = 0.
2024-09-12 17:12:41.033 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage balancer_pass
2024-09-12 17:12:41.033 | INFO     | Always          - Running Balancer with Policy: PolicyType::NLP
2024-09-12 17:12:41.052 | INFO     | Balancer        - Based on NLP matmul analysis, target cycle count is set to 45000
2024-09-12 17:12:41.052 | INFO     | Balancer        - Balancing 100% completed!
2024-09-12 17:12:41.053 | INFO     | Balancer        - Balancer perf score : 2314814.8
2024-09-12 17:12:41.053 | INFO     | Backend         - Lookup contexts -- arch:system scope:device0 name:harvesting_mask
2024-09-12 17:12:41.053 | INFO     | Placer          - Running DRAM allocator for device 0
2024-09-12 17:12:41.061 | INFO     | PerfModel       - Running performance model...
2024-09-12 17:12:41.079 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage pre_netlist_pass
2024-09-12 17:12:41.097 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage generate_netlist
2024-09-12 17:12:41.097 | INFO     | pybuda.compile:generate_netlist:1075 - Generating Netlist
2024-09-12 17:12:41.165 | INFO     | pybuda.ci:create_symlink:89 - Symlink created from /home/n4/jaehwan/research/tenstorrent/buda-tests/torch-module/linear_netlist.yaml to /tmp/jaehwan/3ab2f8d6c3b9/linear_netlist.yaml
2024-09-12 17:12:41.206 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage backend_golden_verify
2024-09-12 17:12:41.207 | DEBUG    | pybuda.tensor:consteval_tensor:1233 - ConstEval graph: linear.weight
2024-09-12 17:12:41.208 | INFO     | Runtime         - Running tt_runtime on host: 'c1'
2024-09-12 17:12:41.208 | INFO     | PerfInfra       - Backend profiler is disabled
2024-09-12 17:12:41.208 | INFO     | PerfInfra       - Memory profiler is enabled
2024-09-12 17:12:41.212 | WARNING  | Runtime         - Config.soc_descriptor_path='/tmp/jaehwan/3ab2f8d6c3b9/device_descs/wormhole_b0_2064_0x0.yaml' doesn't exist, defaulting to '/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend/device/wormhole_b0_8x10.yaml'
2024-09-12 17:12:41.231 | INFO     | SiliconDriver   - Detected 1 PCI device : {0}
2024-09-12 17:12:41.233 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:41.370 | INFO     | Runtime         - Compiling Firmware for TT device
2024-09-12 17:12:42.118 | INFO     | SiliconDriver   - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 0)
2024-09-12 17:12:42.230 | INFO     | Runtime         - Starting device status monitor with TIMEOUT=500s
2024-09-12 17:12:42.230 | INFO     | Loader          - Waiting for 30 seconds for NCRISC Firmware to start running on 1 device(s)
2024-09-12 17:12:42.243 | INFO     | pybuda.backend:feeder_thread_main:149 - Feeder thread on <pybuda.backend.BackendAPI object at 0x7f1e3cfce1f0> starting
2024-09-12 17:12:42.243 | DEBUG    | pybuda.backend:push_constants_and_parameters:491 - Pushing to parameter linear.weight
2024-09-12 17:12:42.244 | DEBUG    | pybuda.backend:push_constants_and_parameters:491 - Pushing to parameter linear.bias
2024-09-12 17:12:42.273 | INFO     | SiliconDriver   - Detected 1 PCI device : {0}
2024-09-12 17:12:42.274 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:42.302 | DEBUG    | pybuda.run.impl:_run_forward:644 - Running concurrent device forward: TTDevice 'tt0'
2024-09-12 17:12:42.304 | DEBUG    | pybuda.device:run_next_command:429 - Received RUN_FORWARD command on TTDevice 'tt0' / 6618
2024-09-12 17:12:42.305 | DEBUG    | pybuda.ttdevice:forward:906 - Starting forward on TTDevice 'tt0'
2024-09-12 17:12:42.305 | DEBUG    | pybuda.backend:feeder_thread_main:171 - Run feeder thread cmd: fwd
2024-09-12 17:12:42.306 | INFO     | Runtime         - Running program 'run_fwd_0' with params [("$p_loop_count", "1")]
2024-09-12 17:12:42.307 | DEBUG    | pybuda.backend:read_queues:345 - Reading output queue linear.output_add_2
2024-09-12 17:12:42.308 | DEBUG    | pybuda.device_connector:pusher_thread_main:163 - Pusher thread pushing tensors
2024-09-12 17:12:42.309 | DEBUG    | pybuda.backend:push_to_queues:452 - Pushing to queue input
2024-09-12 17:12:43.309 | DEBUG    | pybuda.backend:read_queues:362 - 0 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:44.310 | DEBUG    | pybuda.backend:read_queues:362 - 1 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:45.311 | DEBUG    | pybuda.backend:read_queues:362 - 2 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:46.312 | DEBUG    | pybuda.backend:read_queues:362 - 3 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:47.313 | DEBUG    | pybuda.backend:read_queues:362 - 4 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:48.315 | DEBUG    | pybuda.backend:read_queues:362 - 5 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:49.316 | DEBUG    | pybuda.backend:read_queues:362 - 6 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:50.317 | DEBUG    | pybuda.backend:read_queues:362 - 7 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:51.318 | DEBUG    | pybuda.backend:read_queues:362 - 8 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:52.319 | DEBUG    | pybuda.backend:read_queues:362 - 9 Reading output queue linear.output_add_2 timed out after 1
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device.py", line 577, in dc_transfer_thread
    self.dc_transfer(direction)
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device.py", line 591, in dc_transfer
    self.forward_dc.transfer(blocking=True)
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device_connector.py", line 441, in transfer
2024-09-12 17:12:52.321 | DEBUG    | pybuda.device:dc_transfer_thread:581 - Ending dc transfer thread intermediates on TTDevice 'tt0' due to shutdown event
    data = self.read()
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device_connector.py", line 348, in read
    ret = BackendAPI.read_queues(self.direct_pop_queues, self.original_shapes, self.runtime_tensor_transforms, requires_grad=self.requires_grad, single_output=False, shutdown_event=self.shutdown_event, clone=False)
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/backend.py", line 369, in read_queues
2024-09-12 17:12:52.322 | DEBUG    | pybuda.device:dc_transfer_thread:581 - Ending dc transfer thread forward_input on TTDevice 'tt0' due to shutdown event
    raise RuntimeError("Timeout while reading " + outq.name)
RuntimeError: Timeout while reading linear.output_add_2
2024-09-12 17:12:52.324 | DEBUG    | pybuda.device_connector:pusher_thread_main:156 - Ending pusher thread on <pybuda.device_connector.InputQueueDirectPusherDeviceConnector object at 0x7f1e3cfce070> due to shutdown event
2024-09-12 17:12:53.319 | DEBUG    | pybuda.device:get_next_command:360 - Ending process on TTDevice 'tt0' due to shutdown event
milank94 self-assigned this Sep 12, 2024
milank94 added the bug label Sep 12, 2024
@milank94

Based on this line, RuntimeError: Timeout while reading linear.output_add_2, it looks like the device is in a bad state.

Can you do a card reset using the tt-smi tool, or just reboot the system? Then try again.

@jhlee508

Both tt-smi -r 0 and sudo reboot didn't help. I don't understand why input_tensor = torch.randn(32, 32) causes an error when input_tensor = torch.randn(1, 32, 32) works fine. Should I just use the latter as a workaround? (Even though this problem should still be fixed.)

@milank94

pybuda expects a batch dimension; that's why the extra leading 1 in (1, 32, 32) is required.
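
For reference, a minimal sketch of the workaround being discussed, assuming the hang is avoided simply by making the batch dimension explicit (plain torch; pybuda usage stays exactly as in the test code above):

import torch

# Hypothetical workaround sketch: add an explicit leading batch dimension
# so the [32, 32] input becomes [1, 32, 32], the shape reported to work.
input_tensor = torch.randn(32, 32)     # shape [32, 32] -- times out above
batched = input_tensor.unsqueeze(0)    # shape [1, 32, 32] -- reported OK
assert batched.shape == (1, 32, 32)

The batched tensor can then be passed as inputs=[batched] to pybuda.run_inference exactly as in the original script.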

@milank94

@jhlee508 any follow-up on the above?

@jhlee508

It makes sense that pybuda expects a batch dimension, but it still seems odd how pybuda knows which dimension is the batch dimension. For example, if an input with shape [3, 128, 128] is provided, how does the model determine whether the 3 represents the batch size or not?

I have tried a few input shapes and couldn't figure out how pybuda determines the batch dimension, since it doesn't always seem to be the same dimension in every case (a repro sketch follows this list).

  • Inputs with these shapes work fine: [1, 32], [4, 32], [1, 32, 32]
  • Inputs with these shapes do not work: [32, 32]
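
To make this pass/fail pattern easy to reproduce, here is an untested sketch of a shape sweep; the 300-second timeout and the per-shape subprocess isolation are my assumptions, not part of the original report. It reruns the single-layer test from this issue in a fresh interpreter per shape, so one hang cannot leave the device in a bad state for the next run:

import subprocess
import sys
import textwrap

# Shapes taken from the observations above.
SHAPES = ["(1, 32)", "(4, 32)", "(1, 32, 32)", "(32, 32)"]

# Child script: the same single linear layer test as in this issue.
SCRIPT = textwrap.dedent("""
    import pybuda
    import torch

    class Linear(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(32, 32)

        def forward(self, x):
            return self.linear(x)

    tt0 = pybuda.TTDevice("tt0", num_chips=1)
    tt0.place_module(pybuda.PyTorchModule("linear", Linear()))
    print(pybuda.run_inference(inputs=[torch.randn{shape}]).get())
""")

for shape in SHAPES:
    try:
        # Fresh interpreter per shape: a hang in one run cannot poison
        # the device state seen by the next run.
        proc = subprocess.run(
            [sys.executable, "-c", SCRIPT.format(shape=shape)],
            timeout=300,  # assumed generous bound; tune for your setup
        )
        status = "PASS" if proc.returncode == 0 else "FAIL"
    except subprocess.TimeoutExpired:
        status = "HANG"
    print(shape, "->", status)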

@milank94

Thanks for the additional context @jhlee508. What seems to be happening, then, is that this particular input shape [32, 32] causes a hang in the software. Fair to label this as a bug.

We've just released a new version of Buda: https://github.com/tenstorrent/tt-buda/releases/tag/v0.19.3

I would suggest that you try that one to see if there is any change. Otherwise, let's keep this bug open and we'll take a look at it.

@jhlee508

Thank you for your support. Please refer to this comment: #57 (comment).
