Retrain with new images #7

Closed
timday23 opened this issue Sep 20, 2023 · 10 comments

Comments

@timday23

How can I correctly upload images to retrain the model? The new mask images are binary 1-bit images, and they all have the same dimensions as the original training images, but when I try to use my new training/validation data in the training demo, I get an index error: IndexError: index 2 is out of bounds for dimension 0 with size 2.

I have followed the structure for the image file paths and names, so I don't know why I am getting this error.

@maxfrei750
Owner

It's hard to diagnose this without the data you're using, but dimension 0 should be the channel dimension. Could you please check how many dimensions your images have? They might be binary but still have 3 dimensions (i.e. be RGB images).
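
For example, something along these lines should print the shape and the distinct values of a mask (a quick sketch assuming Pillow and NumPy are installed; the path is a placeholder):

```python
import numpy as np
from PIL import Image

# "path/to/mask.png" is a placeholder; point it at one of your mask files.
mask = np.array(Image.open("path/to/mask.png"))
print(mask.shape)       # (H, W) for a single-channel mask, (H, W, 3) for RGB
print(np.unique(mask))  # the distinct values stored in the mask
```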

@timday23
Author

Do you mean whether they are 1-bit, 8-bit, or 24-bit?

@timday23
Author

Right now, the full images are 8-bit grayscale and the mask images are 1-bit black/white.

@maxfrei750
Owner

Please post the complete error stack that you receive.

@timday23
Author

timday23 commented Sep 21, 2023

My methodology

  1. Generate images in Blender (1024x768, 96 dpi, 8-bit, 256 colors)
  2. Generate masks in Blender (1024x768, 96 dpi, 8-bit, 256 colors)
  3. Convert masks to binary images with a Python script (1-bit, 2 colors); see the sketch below the list
  4. Split up and upload the images to the "train2" and "valid2" folders
  5. Upload the corresponding binary masks to the "particle" folder within "train2" and "valid2"
  6. Change the config to train_subset = train2 and val_subset = train2
  7. Run 02_train_model in a Jupyter notebook
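
For reference, the conversion in step 3 is roughly the following (a hypothetical sketch assuming Pillow, not the exact script I used; the file names are placeholders):

```python
from PIL import Image

# Hypothetical sketch of the step-3 mask conversion; adjust paths to your data.
mask = Image.open("mask_8bit.png").convert("L")     # 8-bit grayscale mask from Blender
binary = mask.point(lambda v: 255 if v > 0 else 0)  # threshold: anything non-zero becomes foreground
binary.convert("1").save("mask_1bit.png")           # save as a 1-bit (2-color) image
```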

Am I missing something in these steps?

Here is the complete error stack:


IndexError Traceback (most recent call last)
File ~/Desktop/paddle/train_model.py:4
1 from paddle.training import train_mask_rcnn
3 if __name__ == "__main__":
----> 4 train_mask_rcnn()

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/main.py:33, in main.<locals>.main_decorator.<locals>.decorated_main(cfg_passthrough)
30 args = get_args_parser()
31 # no return value from run_hydra() as it may sometime actually run the task_function
32 # multiple times (--multirun)
---> 33 _run_hydra(
34 args_parser=args,
35 task_function=task_function,
36 config_path=config_path,
37 config_name=config_name,
38 strict=strict,
39 )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:364, in _run_hydra(args_parser, task_function, config_path, config_name, strict)
362 args.run = True
363 if args.run:
--> 364 run_and_report(
365 lambda: hydra.run(
366 config_name=config_name,
367 task_function=task_function,
368 overrides=args.overrides,
369 )
370 )
371 elif args.multirun:
372 run_and_report(
373 lambda: hydra.multirun(
374 config_name=config_name,
(...)
377 )
378 )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:215, in run_and_report(func)
213 except Exception as ex:
214 if _is_env_set("HYDRA_FULL_ERROR") or is_under_debugger():
--> 215 raise ex
216 else:
217 if isinstance(ex, CompactHydraException):

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:212, in run_and_report(func)
210 def run_and_report(func: Any) -> Any:
211 try:
--> 212 return func()
213 except Exception as ex:
214 if _is_env_set("HYDRA_FULL_ERROR") or is_under_debugger():

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/utils.py:365, in _run_hydra.<locals>.<lambda>()
362 args.run = True
363 if args.run:
364 run_and_report(
--> 365 lambda: hydra.run(
366 config_name=config_name,
367 task_function=task_function,
368 overrides=args.overrides,
369 )
370 )
371 elif args.multirun:
372 run_and_report(
373 lambda: hydra.multirun(
374 config_name=config_name,
(...)
377 )
378 )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/_internal/hydra.py:109, in Hydra.run(self, config_name, task_function, overrides, with_log_configuration)
101 cfg = self.compose_config(
102 config_name=config_name,
103 overrides=overrides,
104 with_log_configuration=with_log_configuration,
105 run_mode=RunMode.RUN,
106 )
107 HydraConfig.instance().set_config(cfg)
--> 109 return run_job(
110 config=cfg,
111 task_function=task_function,
112 job_dir_key="hydra.run.dir",
113 job_subdir_key=None,
114 configure_logging=with_log_configuration,
115 )

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/hydra/core/utils.py:129, in run_job(config, task_function, job_dir_key, job_subdir_key, configure_logging)
126 _save_config(config.hydra.overrides.task, "overrides.yaml", hydra_output)
128 with env_override(hydra_cfg.hydra.job.env_set):
--> 129 ret.return_value = task_function(task_cfg)
130 ret.task_name = JobRuntime.instance().get("name")
132 _flush_loggers()

File ~/Desktop/paddle/paddle/training.py:70, in train_mask_rcnn(config)
57 callbacks = [
58 ModelCheckpoint(**config.callbacks.model_checkpoint),
59 EarlyStopping(**config.callbacks.early_stopping),
60 LearningRateMonitor(),
61 ExampleDetectionMonitor(**config.callbacks.example_detection_monitor),
62 ]
64 trainer = Trainer(
65 callbacks=callbacks,
66 logger=TensorBoardLogger(save_dir=str(log_root), name="", version=version),
67 **config.trainer,
68 )
---> 70 trainer.fit(model, datamodule=data_module)

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:510, in Trainer.fit(self, model, train_dataloader, val_dataloaders, datamodule)
504 # ----------------------------
505 # TRAIN
506 # ----------------------------
507 # hook
508 self.call_hook('on_fit_start')
--> 510 results = self.accelerator_backend.train()
511 self.accelerator_backend.teardown()
513 # ----------------------------
514 # POST-Training CLEAN UP
515 # ----------------------------
516 # hook

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py:57, in Accelerator.train(self)
55 def train(self):
56 self.trainer.setup_trainer(self.trainer.model)
---> 57 return self.train_or_test()

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py:74, in Accelerator.train_or_test(self)
72 else:
73 self.trainer.train_loop.setup_training()
---> 74 results = self.trainer.train()
75 return results

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:532, in Trainer.train(self)
531 def train(self):
--> 532 self.run_sanity_check(self.get_model())
534 # set stage for logging
535 self.logger_connector.set_stage("train")

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:731, in Trainer.run_sanity_check(self, ref_model)
728 self.on_sanity_check_start()
730 # run eval step
--> 731 _, eval_results = self.run_evaluation(max_batches=self.num_sanity_val_batches)
733 # allow no returns from eval
734 if eval_results is not None and len(eval_results) > 0:
735 # when we get a list back, used only the last item

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:644, in Trainer.run_evaluation(self, max_batches, on_epoch)
642 with self.profiler.profile("evaluation_step_and_end"):
643 output = self.evaluation_loop.evaluation_step(batch, batch_idx, dataloader_idx)
--> 644 output = self.evaluation_loop.evaluation_step_end(output)
646 # hook + store predictions
647 self.evaluation_loop.on_evaluation_batch_end(output, batch, batch_idx, dataloader_idx)

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py:191, in EvaluationLoop.evaluation_step_end(self, *args, **kwargs)
189 output = self.trainer.call_hook('test_step_end', *args, **kwargs)
190 else:
--> 191 output = self.trainer.call_hook('validation_step_end', *args, **kwargs)
192 return output

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:933, in Trainer.call_hook(self, hook_name, *args, **kwargs)
931 if is_overridden(hook_name, model_ref):
932 hook_fx = getattr(model_ref, hook_name)
--> 933 output = hook_fx(*args, **kwargs)
935 # if the PL module doesn't have the hook then call the accelator
936 # used to auto-reduce things for the user with Results obj
937 elif hasattr(self.accelerator_backend, hook_name):

File ~/Desktop/paddle/paddle/lightning_modules.py:167, in LightningMaskRCNN.validation_step_end(self, output)
162 """Calculate and log the validation_metrics.
163
164 :param output: Outputs of the validation step.
165 """
166 for metric_name, metric in self.validation_metrics.items():
--> 167 metric(output["predictions"], output["targets"])
169 tag = f"validation/{metric_name}"
171 if isinstance(metric, ConfusionMatrix):
172 # TODO: Replace when Lightning-AI/pytorch-lightning#6227
173 # has been merged.

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/torch/nn/modules/module.py:727, in Module._call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
730 self._forward_hooks.values()):
731 hook_result = hook(self, input, result)

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py:154, in Metric.forward(self, *args, **kwargs)
152 # add current step
153 with torch.no_grad():
--> 154 self.update(*args, **kwargs)
155 self._forward_cache = None
157 if self.compute_on_step:

File ~/micromamba/envs/paddle/lib/python3.8/site-packages/pytorch_lightning/metrics/metric.py:200, in Metric._wrap_update.<locals>.wrapped_func(*args, **kwargs)
197 @functools.wraps(update)
198 def wrapped_func(*args, **kwargs):
199 self._computed = None
--> 200 return update(*args, **kwargs)

File ~/Desktop/paddle/paddle/metrics/confusion_matrix.py:67, in ConfusionMatrix.update(self, predictions, targets)
60 """Updates the confusion matrix based on the supplied targets and predictions.
61
62 :param predictions: List of dictionaries with prediction data, such as boxes and masks.
63 :param targets: Tuple of dictionaries with target data, such as boxes and masks.
64 """
66 for prediction, target in zip(predictions, targets):
---> 67 confusion_matrix = self._evaluate_image(prediction, target)
69 self.confmat += confusion_matrix

File ~/Desktop/paddle/paddle/metrics/confusion_matrix.py:139, in ConfusionMatrix._evaluate_image(self, prediction, target)
137 label_pred = 0 # background class
138 for label_gt in labels_gt[~is_assigned_gt]:
--> 139 confusion_matrix[label_gt, label_pred] += 1
141 return confusion_matrix

IndexError: index 2 is out of bounds for dimension 0 with size 2

@maxfrei750
Owner

maxfrei750 commented Sep 21, 2023

Thanks for posting the stack. I think that helped a lot...

Forget what I said about the image format.

--> 139 confusion_matrix[label_gt, label_pred] += 1
141 return confusion_matrix

IndexError: index 2 is out of bounds for dimension 0 with size 2

This happens during the evaluation, where the code computes the confusion matrix for the different classes. The error says that the confusion matrix was initialized with 2 classes, which is the default: the first class is the background class (0) and the second class is the particle class (1). However, your ground truth data has more than two classes. The number of classes in the ground truth data is defined by the number of subfolders (e.g. "particle"). Do you happen to have more than one subfolder in your "train2" and/or "valid2" folder?
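
As a quick sanity check, you could list the class subfolders of each subset (a sketch, not part of paddle; the "data/..." paths are placeholders for wherever your subsets actually live):

```python
from pathlib import Path

# Placeholder paths; replace with the actual locations of your subsets.
for subset in ["data/train2", "data/valid2"]:
    class_folders = [p.name for p in Path(subset).iterdir() if p.is_dir()]
    print(subset, "->", class_folders)  # expected: exactly one class folder, e.g. ["particle"]
```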

@maxfrei750
Owner

While we're at it: Unfortunately, I can no longer provide much support for paddle, since the associated project has ended. Fortunately, there has been a lot of progress in the meantime with regard to the usability of Mask R-CNN for custom applications. Therefore, I'd recommend switching to the mmdetection framework, which has a large community and therefore much more detailed documentation.

@maxfrei750
Owner

This is a good place to get started: https://mmdetection.readthedocs.io/en/latest/user_guides/train.html#train-with-customized-datasets

I understand if you don't want to make the switch right away, since you might already be close to getting paddle to run properly, so I can still try to help you sort this issue out. However, if you encounter further problems, I'd definitely advise making the switch.

@timday23
Author

Thanks for the help, we were able to get it working.

@maxfrei750
Owner

Great! Then please give a short explanation of what caused the problem, in case others run into the same issue. 👍
