This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

DDP Spawn for object detection #1219

Closed
aisensiy opened this issue Mar 6, 2022 · 3 comments · Fixed by #1222
Labels
bug / fix (Something isn't working), help wanted (Extra attention is needed)
Milestone

Comments

@aisensiy
Contributor

aisensiy commented Mar 6, 2022

🐛 Bug

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
Traceback (most recent call last):
  File "train.py", line 21, in <module>
    trainer.finetune(model, datamodule=datamodule, strategy="freeze")
  File "/usr/local/lib/python3.8/site-packages/flash/core/trainer.py", line 161, in finetune
    return super().fit(model, train_dataloader, val_dataloaders, datamodule)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 173, in start_training
    self.spawn(self.new_process, trainer, self.mp_queue, return_result=False)
  File "/usr/local/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 201, in spawn
    mp.spawn(self._wrapped_function, args=(function, args, kwargs, return_queue), nprocs=self.num_processes)
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
    process.start()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_icevision_effdet_model_adapter.<locals>.IceVisionEffdetModelAdapter'

To Reproduce

Follow the code from the Object Detection example. It works when the GPU count is 1, but fails with the error above when the GPU count is 2.

Code sample

import flash
from flash.core.data.utils import download_data
from flash.image import ObjectDetectionData, ObjectDetector

# 1. Create the DataModule
# Dataset Credit: https://www.kaggle.com/ultralytics/coco128
download_data("https://github.com/zhiqwang/yolov5-rt-stack/releases/download/v0.3.0/coco128.zip", "data/")

datamodule = ObjectDetectionData.from_coco(
    train_folder="data/coco128/images/train2017/",
    train_ann_file="data/coco128/annotations/instances_train2017.json",
    val_split=0.1,
    transform_kwargs={"image_size": 128},
    batch_size=4,
)

# 2. Build the task
model = ObjectDetector(head="efficientdet", backbone="d0", num_classes=datamodule.num_classes, image_size=128)

# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=1, gpus=2)  # <-- set gpus=2
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

# 4. Detect objects in a few images!
datamodule = ObjectDetectionData.from_files(
    predict_files=[
        "data/coco128/images/train2017/000000000625.jpg",
        "data/coco128/images/train2017/000000000626.jpg",
        "data/coco128/images/train2017/000000000629.jpg",
    ],
    transform_kwargs={"image_size": 128},
    batch_size=4,
)
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)

# 5. Save the model!
trainer.save_checkpoint("object_detection_model.pt")

Expected behavior

Environment

  • OS (e.g., Linux): Linux
  • Python version: 3.8
  • PyTorch/Lightning/Flash Version (e.g., 1.10/1.5/0.7): 1.10.2/1.5.10/0.7.1
  • GPU models and configuration: efficientdet d0
  • Any other relevant information:

Additional context

A quick Google search turned up this Stack Overflow question:

https://stackoverflow.com/questions/70422581/python-multiprocessing-basic-cant-pickle-local-object-and-ran-out-of-input

Hope this is helpful.
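For reference, the last line of the traceback can probably be reproduced without Flash or any GPUs. The sketch below is only an illustration of the general pickle limitation, not the actual Flash or IceVision code: `ModuleLevelAdapter`, `make_local_adapter`, `LocalAdapter`, and `can_pickle` are hypothetical names made up for this example. It assumes the root cause is that the adapter class is defined inside a function body, which is what the `<locals>` part of the error message suggests.

```python
import pickle


class ModuleLevelAdapter:
    """Defined at module scope, so pickle can look it up by its qualified name."""


def make_local_adapter():
    # Hypothetical stand-in for a factory that defines a class in its own body.
    # The class's qualified name contains "<locals>", so pickle cannot import it.
    class LocalAdapter:
        pass

    return LocalAdapter()


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, otherwise report the error."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError) as err:
        # The exception type differs between Python versions, so catch both.
        print(f"pickle failed: {err}")
        return False


module_ok = can_pickle(ModuleLevelAdapter())
local_ok = can_pickle(make_local_adapter())
print("module-level class picklable:", module_ok)
print("function-local class picklable:", local_ok)
```

Since the spawn start method pickles the process object before launching each worker (the `reduction.dump(process_obj, fp)` frame in the traceback), any object reachable from the trainer that is an instance of such a local class would be expected to fail in the same way.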

@aisensiy added the "bug / fix" and "help wanted" labels Mar 6, 2022
@joowon-dm-snu

Adding the strategy argument might help, I think, @aisensiy:

trainer = flash.Trainer(max_epochs=300, gpus=[0, 1], strategy="ddp")
trainer.finetune(...)

@aisensiy
Contributor Author

aisensiy commented Mar 7, 2022

Thanks. This works.

BTW, I followed the docs to try multi-GPU training, so maybe the docs could add some tips for the multi-GPU case :-). That would be very helpful.

https://lightning-flash.readthedocs.io/en/latest/general/training.html?highlight=multi#training-options

@aisensiy closed this as completed Mar 7, 2022
@ethanwharris
Collaborator

Thanks for reporting this @aisensiy 😃 Glad that @joowon-dm-snu was able to help out! I will leave this open as a bug report for Object Detection for DDP Spawn.

@ethanwharris reopened this Mar 7, 2022
@ethanwharris changed the title from "Multi-gpu training for object detection" to "DDP Spawn for object detection" Mar 7, 2022
@ethanwharris added this to the 0.7.x milestone Mar 7, 2022