I am running DeepSpeed-MII on a system with two NVIDIA A100X GPUs, using the following simple latency benchmark script for inference:
import math
import time

import mii

batch_size = 16
inputs = [
    "DeepSpeed is a machine learning framework",
    "He is working on",
    "He has a",
    "He got all",
    "Everyone is happy and I can",
    "The new movie that got Oscar this year",
    "In the far far distance from our galaxy,",
    "Peace is the only way"
]
inputs *= math.ceil(batch_size / len(inputs))

pipe = mii.pipeline("facebook/opt-2.7b", max_length=50)

times = []
for i in range(30):
    start = time.time()
    outputs = pipe(inputs)
    end = time.time()
    times.append(end - start)

print(f"latency: {sum(times[3:]) / len(times[3:])}")
This code works fine if I run it with 1 GPU:
deepspeed --num_gpus 1 mii-bench.py
But it throws the error mentioned in the title when I run it with more than one GPU:
deepspeed --num_gpus 2 mii-bench.py
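For context, under this launch mode each rank runs the whole script: the launcher spawns one mii-bench.py process per GPU and passes it a --local_rank argument (visible in the spawn commands in the log below), so both ranks call pipe(inputs). Below is a minimal sketch of the same benchmark that only reports latency from rank 0; the --local_rank parsing is my assumption based on the launcher command line, and this is not a confirmed fix for the assertion raised on rank 1:

import argparse
import math
import time

import mii

# The deepspeed launcher passes --local_rank=<n> to every spawned process
# (see the spawn commands in the log below), so parse it to identify rank 0.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

batch_size = 16
prompts = ["DeepSpeed is a machine learning framework", "He is working on"]
prompts *= math.ceil(batch_size / len(prompts))

pipe = mii.pipeline("facebook/opt-2.7b", max_length=50)

times = []
for _ in range(30):
    start = time.time()
    outputs = pipe(prompts)  # every rank participates in the forward pass
    times.append(time.time() - start)

# Report only from rank 0; the first three iterations are treated as warm-up.
if args.local_rank == 0:
    print(f"latency: {sum(times[3:]) / len(times[3:])}")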
Full execution log:
[2024-06-24 16:04:09,839] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-24 16:04:11,349] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-06-24 16:04:11,350] [INFO] [runner.py:568:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None mii-bench.py
[2024-06-24 16:04:12,992] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-24 16:04:14,619] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-06-24 16:04:14,619] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-06-24 16:04:14,619] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-06-24 16:04:14,619] [INFO] [launch.py:164:main] dist_world_size=2
[2024-06-24 16:04:14,619] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-06-24 16:04:14,620] [INFO] [launch.py:256:main] process 766235 spawned with command: ['/usr/bin/python3', '-u', 'mii-bench.py', '--local_rank=0']
[2024-06-24 16:04:14,621] [INFO] [launch.py:256:main] process 766236 spawned with command: ['/usr/bin/python3', '-u', 'mii-bench.py', '--local_rank=1']
[2024-06-24 16:04:16,232] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-24 16:04:16,324] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] using untested triton version (2.3.0), only 1.0.0 is known to be compatible
[2024-06-24 16:04:17,824] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-24 16:04:17,953] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-24 16:04:17,953] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/home/rahamanm/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
/home/rahamanm/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 90200.09it/s]
Fetching 6 files: 0%| | 0/6 [00:00<?, ?it/s]
Fetching 6 files: 100%|██████████| 6/6 [00:00<00:00, 79387.46it/s]
[2024-06-24 16:04:19,472] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/rahamanm/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rahamanm/.cache/torch_extensions/py310_cu121/inference_core_ops/build.ninja...
/home/rahamanm/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module inference_core_ops...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module inference_core_ops...
Time to load inference_core_ops op: 0.10766816139221191 seconds
[2024-06-24 16:04:19,763] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/rahamanm/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rahamanm/.cache/torch_extensions/py310_cu121/inference_core_ops/build.ninja...
/home/rahamanm/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module inference_core_ops...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/rahamanm/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
ninja: no work to do.
Loading extension module inference_core_ops...
Time to load inference_core_ops op: 0.11362981796264648 seconds
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rahamanm/.cache/torch_extensions/py310_cu121/ragged_device_ops/build.ninja...
/home/rahamanm/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module ragged_device_ops...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/rahamanm/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
ninja: no work to do.
Loading extension module ragged_device_ops...
Time to load ragged_device_ops op: 0.09276843070983887 seconds
Using /home/rahamanm/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/rahamanm/.cache/torch_extensions/py310_cu121/ragged_ops/build.ninja...
/home/rahamanm/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
Building extension module ragged_ops...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module ragged_device_ops...
Time to load ragged_device_ops op: 0.10586166381835938 seconds
Using /home/rahamanm/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
ninja: no work to do.
Loading extension module ragged_ops...
Time to load ragged_ops op: 0.08362889289855957 seconds
[2024-06-24 16:04:20,046] [INFO] [huggingface_engine.py:109:parameters] Loading checkpoint: /home/rahamanm/.cache/huggingface/hub/models--facebook--opt-2.7b/snapshots/905a4b602cda5c501f1b3a2650a4152680238254/pytorch_model.bin
Loading extension module ragged_ops...
Time to load ragged_ops op: 0.10570263862609863 seconds
[2024-06-24 16:04:20,133] [INFO] [huggingface_engine.py:109:parameters] Loading checkpoint: /home/rahamanm/.cache/huggingface/hub/models--facebook--opt-2.7b/snapshots/905a4b602cda5c501f1b3a2650a4152680238254/pytorch_model.bin
[2024-06-24 16:04:23,615] [INFO] [engine_v2.py:84:__init__] Model built.
[2024-06-24 16:04:23,810] [INFO] [engine_v2.py:84:__init__] Model built.
[2024-06-24 16:04:24,213] [INFO] [kv_cache.py:135:__init__] Allocating KV-cache 0 with shape: (32, 7647, 64, 2, 16, 80) consisting of 7647 blocks.
[2024-06-24 16:04:24,213] [INFO] [kv_cache.py:135:__init__] Allocating KV-cache 0 with shape: (32, 7647, 64, 2, 16, 80) consisting of 7647 blocks.
[2024-06-24 16:04:26,561] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:27,269] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:27,950] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:28,504] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:28,505] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:29,055] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:29,055] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:29,606] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:29,606] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:30,148] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:30,688] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:31,227] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:31,770] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:32,312] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:32,863] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:32,863] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 2 which does not exist.
[2024-06-24 16:04:32,863] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:32,863] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 10 which does not exist.
[2024-06-24 16:04:33,407] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:33,954] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:33,954] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 3 which does not exist.
[2024-06-24 16:04:34,496] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:34,496] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:35,034] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:35,573] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:35,574] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:36,125] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:36,669] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:37,208] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:37,208] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:37,747] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:38,830] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:38,831] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:39,370] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:39,909] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:40,982] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:40,982] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
[2024-06-24 16:04:41,522] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:42,061] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 1 which does not exist.
[2024-06-24 16:04:42,061] [WARNING] [ragged_manager.py:115:flush_sequence] Attempting to flush sequence 9 which does not exist.
latency: 0.542697650414926
[2024-06-24 16:04:44,653] [INFO] [launch.py:351:main] Process 766235 exits successfully.
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/rahamanm/repos/DeepSpeedExamples/inference/huggingface/text-generation/mii-bench.py", line 26, in <module>
[rank1]: outputs = pipe(inputs)
[rank1]: File "/home/rahamanm/.local/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 570, in __call__
[rank1]: self.schedule_requests()
[rank1]: File "/home/rahamanm/.local/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 335, in schedule_requests
[rank1]: self.reset_request_status()
[rank1]: File "/home/rahamanm/.local/lib/python3.10/site-packages/mii/batching/ragged_batching.py", line 360, in reset_request_status
[rank1]: assert last_r is not None, "Function to clear the KV cache is invoked, but no request consumes KV cache"
[rank1]: AssertionError: Function to clear the KV cache is invoked, but no request consumes KV cache
[2024-06-24 16:05:19,690] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 766235
[2024-06-24 16:05:19,691] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 766236
[2024-06-24 16:05:19,691] [ERROR] [launch.py:325:sigkill_handler] ['/usr/bin/python3', '-u', 'mii-bench.py', '--local_rank=1'] exits with return code = 1