DDP Spawn for object detection #1219

aisensiy · 2022-03-06T15:05:01Z

🐛 Bug

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Traceback (most recent call last):
  File "train.py", line 21, in <module>
    trainer.finetune(model, datamodule=datamodule, strategy="freeze")
  File "/usr/local/lib/python3.8/site-packages/flash/core/trainer.py", line 161, in finetune
    return super().fit(model, train_dataloader, val_dataloaders, datamodule)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 173, in start_training
    self.spawn(self.new_process, trainer, self.mp_queue, return_result=False)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 201, in spawn
    mp.spawn(self._wrapped_function, args=(function, args, kwargs, return_queue), nprocs=self.num_processes)
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
    process.start()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_icevision_effdet_model_adapter.<locals>.IceVisionEffdetModelAdapter'

To Reproduce

Follow the code from the Object Detection example. It works with 1 GPU but raises the error above with 2 GPUs.

Code sample

import flash
from flash.core.data.utils import download_data
from flash.image import ObjectDetectionData, ObjectDetector

# 1. Create the DataModule
# Dataset Credit: https://www.kaggle.com/ultralytics/coco128
download_data("https://github.com/zhiqwang/yolov5-rt-stack/releases/download/v0.3.0/coco128.zip", "data/")

datamodule = ObjectDetectionData.from_coco(
    train_folder="data/coco128/images/train2017/",
    train_ann_file="data/coco128/annotations/instances_train2017.json",
    val_split=0.1,
    transform_kwargs={"image_size": 128},
    batch_size=4,
)

# 2. Build the task
model = ObjectDetector(head="efficientdet", backbone="d0", num_classes=datamodule.num_classes, image_size=128)

# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=1, gpus=2)                                   # <------------------ set gpus=2
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

# 4. Detect objects in a few images!
datamodule = ObjectDetectionData.from_files(
    predict_files=[
        "data/coco128/images/train2017/000000000625.jpg",
        "data/coco128/images/train2017/000000000626.jpg",
        "data/coco128/images/train2017/000000000629.jpg",
    ],
    transform_kwargs={"image_size": 128},
    batch_size=4,
)
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)

# 5. Save the model!
trainer.save_checkpoint("object_detection_model.pt")

Expected behavior

Environment

OS (e.g., Linux): Linux
Python version: 3.8
PyTorch/Lightning/Flash Version (e.g., 1.10/1.5/0.7): 1.10.2/1.5.10/0.7.1
GPU models and configuration: efficientdet d0
Any other relevant information:

Additional context

A quick Google search turned up this Stack Overflow question:

https://stackoverflow.com/questions/70422581/python-multiprocessing-basic-cant-pickle-local-object-and-ran-out-of-input

Hope this is helpful.
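For readers hitting the same traceback: the last frames show `multiprocessing` pickling the process object under the `spawn` start method, and pickle refusing a class defined inside a function. Below is a minimal sketch of that failure mode in isolation, using only the standard library. The names `make_local_adapter`, `LocalAdapter`, `ModuleLevelAdapter`, and `can_pickle` are hypothetical and are not taken from Flash or IceVision; this only illustrates the pickling rule, not how the library was actually fixed.

```python
import pickle


def make_local_adapter():
    """Hypothetical factory: the class is defined inside the function,
    similar in shape to '_icevision_effdet_model_adapter.<locals>...'."""

    class LocalAdapter:
        def __init__(self):
            self.value = 1

    return LocalAdapter()


class ModuleLevelAdapter:
    """Same shape of class, but defined at module level, so pickle can
    refer to it by its importable name."""

    def __init__(self):
        self.value = 1


def can_pickle(obj):
    """Return True if pickle can serialize obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Python 3.8 raises AttributeError ("Can't pickle local object ...");
        # PicklingError is caught as well in case other versions differ.
        return False


if __name__ == "__main__":
    print("local class picklable:", can_pickle(make_local_adapter()))
    print("module-level class picklable:", can_pickle(ModuleLevelAdapter()))
```

When run as a script, the first check should report that the locally defined class cannot be pickled, while a class defined at module level normally can be. This is consistent with the rest of the thread, where switching to `strategy="ddp"` avoided the error, presumably because that strategy does not go through this pickling step.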


joowon-dm-snu · 2022-03-07T04:04:19Z

Adding the strategy argument might help, I think, @aisensiy:

trainer = flash.Trainer(max_epochs=300, gpus=[0, 1], strategy="ddp")
trainer.finetune(...)

aisensiy · 2022-03-07T05:52:11Z

Thanks. This works.

BTW, I followed the docs to try multi-GPU training, so maybe the docs could add some tips for the multi-GPU case. :-) That would be very helpful.

https://lightning-flash.readthedocs.io/en/latest/general/training.html?highlight=multi#training-options

ethanwharris · 2022-03-07T17:20:25Z

Thanks for reporting this @aisensiy 😃 Glad that @joowon-dm-snu was able to help out! I will leave this open as a bug report for Object Detection for DDP Spawn.

aisensiy added the "bug / fix" and "help wanted" labels Mar 6, 2022

aisensiy closed this as completed Mar 7, 2022

ethanwharris reopened this Mar 7, 2022

ethanwharris changed the title ~~Multi-gpu training for object detection~~ DDP Spawn for object detection Mar 7, 2022

ethanwharris added this to the 0.7.x milestone Mar 7, 2022

ethanwharris mentioned this issue Mar 7, 2022 in "Fix detection ddp spawn" #1222 (merged)

ethanwharris closed this as completed in #1222 Mar 22, 2022
