Checking training model is overfit or underfit #11193

Open
1 task done
sangyo1 opened this issue Apr 22, 2024 · 1 comment
Labels
models:research models that come under research directory type:feature

Comments


sangyo1 commented Apr 22, 2024

Prerequisites

Please answer the following question for yourself before submitting an issue.

  • I checked to make sure that this feature has not been requested already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config

2. Describe the feature you request

I am training my own model from scratch with SSD MobileNet v2 320x320. I have about 1,000 images for training and 100 for validation. However, when I try to check whether the model is overfitting or underfitting with tensorboard --logdir, it only shows the training loss, even though I have configured the validation set as well. How can I check whether my model is overfitting or underfitting?

3. Additional context

Here is my model.config

# SSD with Mobilenet v2 FPN-lite (go/fpn-lite) feature extractor, shared box
# predictor and focal loss (a mobile version of Retinanet).
# Retinanet: see Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint
# Train on TPU-8
#
# Achieves 22.2 mAP on COCO17 Val

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 7
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 2
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 320
        width: 320
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 128
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true,
            decay: 0.997,
            epsilon: 0.001,
          }
        }
        num_layers_before_predictor: 4
        share_prediction_tower: true
        use_depthwise: true
        kernel_size: 3
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_fpn_keras'
      use_depthwise: true
      fpn {
        min_level: 3
        max_level: 7
        additional_layer_depth: 128
      }
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          random_normal_initializer {
            stddev: 0.01
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 120000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .08
          total_steps: 50000
          warmup_learning_rate: .026666
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "/home/ubuntu/ssl/workspace/dataset/support_post.pbtxt"
  tf_record_input_reader {
    input_path: "/home/ubuntu/ssl/workspace/dataset/train_posts_apr19.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

eval_input_reader: {
  label_map_path: "/home/ubuntu/ssl/workspace/dataset/support_post.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/home/ubuntu/ssl/workspace/dataset/val_posts_apr19.tfrecord"
  }
}

4. Are you willing to contribute it? (Yes or No)

@sangyo1 sangyo1 added models:research models that come under research directory type:feature labels Apr 22, 2024

Sam-Seaberry commented Jun 11, 2024

First off, your learning rate decayed to zero before the model finished training, so the model did not change after 50K steps. You can see this in your pipeline.config, where total_steps is 50000 while num_steps is 120000:
```
cosine_decay_learning_rate {
  learning_rate_base: .08
  total_steps: 50000  # This should equal num_steps, the total number of training steps
  warmup_learning_rate: .026666
  warmup_steps: 1000
}
```
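To make the effect of that mismatch concrete, here is a minimal pure-Python sketch of a cosine schedule with linear warmup, using the values from the config. It is an illustration rather than the library's own code: the function name is made up, and it assumes the rate is held at zero once `total_steps` has passed, which is the behaviour described in this comment.

```python
import math


def cosine_decay_with_warmup(step, base_lr=0.08, total_steps=50_000,
                             warmup_lr=0.026666, warmup_steps=1_000):
    """Learning rate at `step` for a cosine schedule with linear warmup.

    Illustrative re-implementation; defaults are copied from the config above.
    """
    if step > total_steps:
        # Assumption: past the end of the schedule the rate stays at zero.
        return 0.0
    if step < warmup_steps:
        # Linear ramp from warmup_lr up to base_lr.
        slope = (base_lr - warmup_lr) / warmup_steps
        return warmup_lr + slope * step
    # Half-cosine from base_lr down to zero over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few points, including steps beyond total_steps.
for s in (0, 1_000, 25_500, 50_000, 120_000):
    print(f"step {s:>7}: lr = {cosine_decay_with_warmup(s):.6f}")
```

With `num_steps: 120000`, every step after 50,000 in this sketch runs with a learning rate of zero, so the remaining 70,000 steps cannot update the weights. Setting `total_steps` to the same value as `num_steps` stretches the decay over the whole run.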

The best way to determine overfitting is to test your model on images it has not been trained on. If the model shows low loss values during training but is unable to classify any, or only very few, objects in images it has never seen, the model is overfitted.

In the case of underfitting, your loss will stay much larger during training and will not decrease. Using cosine decay should significantly reduce the possibility of underfitting, but selecting an appropriate base learning rate is still important.
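As a rough illustration of those two signals, the sketch below labels a pair of loss curves recorded at the same steps. The function name `diagnose_fit` and the thresholds (`gap_ratio`, `high_loss`, and the 10% margins) are arbitrary assumptions chosen for the example, not values from the Object Detection API, and would need tuning for a real loss scale.

```python
def diagnose_fit(train_losses, val_losses, gap_ratio=1.5, high_loss=1.0):
    """Return a rough label for two loss curves sampled at the same steps.

    Hypothetical heuristic: all thresholds are illustrative assumptions.
    """
    if len(train_losses) != len(val_losses) or len(train_losses) < 2:
        raise ValueError("need two equally long curves with at least 2 points")

    train_end, val_end = train_losses[-1], val_losses[-1]
    # Did the training loss drop by at least 10% from its starting value?
    train_improved = train_end < 0.9 * train_losses[0]
    # Has the validation loss climbed at least 10% above its best value?
    val_rising = val_end > 1.1 * min(val_losses)

    if not train_improved and train_end > high_loss:
        # Training loss is large and has barely moved.
        return "underfitting"
    if val_rising and val_end > gap_ratio * train_end:
        # Training loss keeps falling while validation loss turns upward.
        return "overfitting"
    return "no clear sign"


print(diagnose_fit([2.0, 1.0, 0.3, 0.1], [2.1, 1.2, 1.0, 1.4]))    # overfitting
print(diagnose_fit([2.0, 1.95, 1.9, 1.9], [2.1, 2.0, 2.0, 2.0]))   # underfitting
print(diagnose_fit([2.0, 1.0, 0.5, 0.3], [2.1, 1.1, 0.6, 0.4]))    # no clear sign
```

A check like this only makes sense once validation loss is actually being logged, and it does not replace looking at detections on unseen images.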

Also see here for details on how to evaluate a model: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
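For reference, the evaluation job described in that guide is started by running `model_main_tf2.py` with a `--checkpoint_dir` flag in addition to the usual training flags. The sketch below only assembles that command in Python; the helper name `build_eval_command` and the two paths are placeholders, and the script path assumes you launch from the `research` directory.

```python
import sys


def build_eval_command(pipeline_config_path, model_dir,
                       script="object_detection/model_main_tf2.py"):
    """Assemble the argument list for an evaluation-only run.

    Hypothetical helper: the paths passed in are placeholders to replace.
    """
    return [
        sys.executable, script,
        f"--pipeline_config_path={pipeline_config_path}",
        f"--model_dir={model_dir}",
        # Supplying checkpoint_dir switches the script to evaluation mode.
        f"--checkpoint_dir={model_dir}",
        "--alsologtostderr",
    ]


cmd = build_eval_command("/path/to/model.config", "/path/to/model_dir")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` in a second terminal while training runs. As far as I know, the evaluation job writes its summaries to a subdirectory of `model_dir`, so pointing `tensorboard --logdir` at `model_dir` should then show validation loss and COCO metrics next to the training loss.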
