What is the lowest loss value that can be reached? #9
Comments
What batch size are you using? Without the batch size, the step number says nothing about how far you've gone. According to the author of YOLO, he used a pretty powerful machine, and training has two stages, with the first stage (training the convolutional layers with average pooling) taking about a week. So be patient if you're not far from the beginning.

Training a deep net is more of an art than a science. My suggestion is to first train your model on a small dataset to see whether it can overfit the training set; if it can't, there is a problem to solve before proceeding. Note that due to the data augmentation built into the code, the loss can't really reach 0.0.

I've trained a few configs with my code, and the loss shrinks well from > 10.0 to around 0.5 or below (the parameters C, B, S are not relevant, since the loss is averaged across the output tensor). I usually start with the default learning rate 1e-5 and a batch size of 16 or even 8 to speed up the loss until it stops decreasing and becomes unstable. Then I decrease the learning rate to 1e-6 and increase the batch size to 32 and then 64 whenever the loss gets stuck (and testing still does not give good results).

You can switch to another adaptive learning-rate algorithm (e.g. Adadelta, Adam, etc.) if you are familiar with them by editing the relevant part of the code. You can also look at the learning-rate policy the YOLO author used, inside the .cfg files. Best of luck
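The staged schedule described above can be sketched as a small helper. This is purely illustrative: the stage boundaries and values mirror the numbers in the comment, not anything in the darkflow code itself.

```python
def training_stage(stage):
    """Return (learning_rate, batch_size) for a staged schedule:
    start fast with a small batch, then stabilize with a lower
    learning rate and larger batches once the loss plateaus.
    Values are the ones mentioned in the comment above."""
    schedule = [
        (1e-5, 16),  # stage 0: default LR, small batch to move fast
        (1e-6, 32),  # stage 1: loss plateaus / unstable -> lower LR
        (1e-6, 64),  # stage 2: loss stuck again -> larger batch
    ]
    # Clamp to the last stage once the schedule is exhausted
    return schedule[min(stage, len(schedule) - 1)]

print(training_stage(0))  # (1e-05, 16)
print(training_stage(5))  # (1e-06, 64)
```

In practice you would advance the stage manually (or on a plateau-detection criterion) and rebuild the optimizer with the new learning rate.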
@thtrieu What a nice suggestion! I encountered similar issues too, and found that pre-trained weights can really help. Also, the quality and quantity of the data itself is really important, especially when training a YOLO-style network; it is just too hard to converge well otherwise. I am still struggling with this.
@thtrieu Thank you~ In my first round of training, the batch size was 12. I get your point about being patient. My final goal is to find the bounding boxes of objects that are not in ImageNet, so I am training without a pre-trained model. Thanks again!
Just a friendly ping. I've finished training a YOLO model for 4 classes; if you are interested, I will write some notes about the training process.
@thtrieu Yes, I am looking forward to it. |
I have updated the code through many cycles since then, which will affect the scaling of the loss value, but the mechanism is the same. Here are my notes:
Good luck, I'd love to hear updates from your training.
@thtrieu I ran fine-tuning on the tiny-yolo-voc model, but the loss value is approximately 6, not 1.5–1.7.
I don't have much experience with YOLOv2; maybe @ryansun1900 does. Here is why YOLOv2's loss is much higher than that of v1: the output volume of v2 is much larger than that of v1.
So far, I don't have much experience training on large datasets either.
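The size difference can be checked with quick arithmetic. The numbers below are the standard published configurations for YOLOv1 and YOLOv2 on PASCAL VOC (they are not taken from this thread); since the loss is averaged over the output tensor, a much larger output volume changes the loss scale.

```python
# YOLOv1 on VOC: 7x7 grid, 2 boxes per cell, 20 classes
S1, B1, C = 7, 2, 20
v1_volume = S1 * S1 * (B1 * 5 + C)   # 7 * 7 * 30

# YOLOv2 on VOC: 13x13 grid, 5 anchor boxes, 20 classes,
# each anchor predicts (x, y, w, h, objectness) + class scores
S2, A = 13, 5
v2_volume = S2 * S2 * A * (5 + C)    # 13 * 13 * 125

print(v1_volume, v2_volume)          # 1470 21125
```

So v2's output tensor is roughly 14x larger than v1's, which is consistent with seeing a higher raw loss for the same quality of fit.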
thanks for the good tips :) |
Hi @thtrieu, can you please explain what you mean by increasing the depth? How do we do it, by changing something in the cfg file? I am training for 9 classes with YOLOv2 and have created a cfg file called yolov2-tiny-9c.cfg. So do I make changes in this file or in the original yolov2-tiny.cfg file?
I'm training a model for 1 class with yolov3-tiny.cfg. The training set is 6800 JPEGs with 1 to 24 objects each, normalized to 720 pixels in height but variable width. Batch size 24, subdivisions 2, image size 512x512, learning rate 0.0015, max batches 450000. Although mAP is high (about 98%), the average loss is still above 0.5. I guess the model is fully trained at iteration 31500, because beyond that point mAP is stable at 0.98 (98%). My doubt is: do I think the model is overfit because it does not generalize well, or does it not generalize well because the average loss is still high?
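One way to frame the question above: overfitting shows up as a gap between training and held-out metrics, not as a high absolute loss value. A minimal illustrative check (the threshold and function name are mine, purely for illustration):

```python
def looks_overfit(train_map, val_map, gap_threshold=0.05):
    """Heuristic: a model that scores much better on training data
    than on held-out data is overfitting, regardless of how high
    or low the raw loss is. The 0.05 gap threshold is arbitrary."""
    return (train_map - val_map) > gap_threshold

print(looks_overfit(0.98, 0.75))  # True: large train/val gap
print(looks_overfit(0.98, 0.96))  # False: generalizes fine
```

So the direction of reasoning should be: evaluate mAP on a held-out set, compare it to training mAP, and judge overfitting from that gap rather than from the absolute loss.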
Hey, can you tell me how to print a chart like this while training your model?
I think he's using AlexeyAB's repo, which has GUI support.
I want the complete loss-function computation, as I am having trouble understanding it.
Do not include the dont_show parameter in the training command.
Hi, I have trained a yolo-small model to step 4648, but most of the loss values are greater than 1.0, and the test results are not very good. I want to know how low the loss value can get, and could you please share some key training parameters, e.g. learning rate, training time, final loss value, and so on?
I train the model on an iMac (4 GHz Intel Core i7, 16 GB memory), in CPU mode.
Thank you!