Hi @Peetee06, thanks for reaching out! It's interesting that you are intentionally trying to get your model to overfit, and it looks like you're onto the right idea with bumping up early stopping (which, by the way, you can also set to -1 to disable it entirely). Some questions on my end:
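For reference, a minimal sketch of disabling early stopping through Ludwig's Python API. The feature definitions are placeholders, not the actual config from this thread, and this assumes a Ludwig version where the section is named `trainer` (older releases used `training`):

```python
from ludwig.api import LudwigModel

# Sketch config -- feature names and types are placeholders,
# not the configuration discussed in this thread.
config = {
    "input_features": [{"name": "audio_path", "type": "audio"}],
    "output_features": [{"name": "label", "type": "category"}],
    "trainer": {
        "epochs": 300,     # train for a long time
        "early_stop": -1,  # -1 disables early stopping entirely
    },
}

model = LudwigModel(config)
# train_stats is a dict of per-split statistics; its exact layout
# varies by Ludwig version, so check the docs for your release.
train_stats, _, _ = model.train(dataset="train.csv")
```

With `early_stop: -1` the trainer runs all 300 epochs regardless of what the validation loss does.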
---
@Peetee06 thanks for sharing the info. Happy to assist in getting your model to overfit. In general, overfitting depends on two factors:

1. Does your model have sufficient capacity to fit the dataset? Training for longer and disabling early stopping are good starting points. Additionally, you'd want to make sure that your model has enough capacity (i.e., parameters) to overfit on the data. To figure out whether the model is overfitting or underfitting, can you share the training curves for your model? You should be able to generate them by setting up TensorBoard (instructions here). Ideally, when overfitting, your training loss should go to zero while the validation and test loss continue to increase. If you aren't seeing that, can you try a larger model? (A sketch of plotting these curves follows after this list.)
2. Does your dataset have enough signal to train on? To figure this out, I'd recommend trying to visualize your data if possible. However, visualizing audio data points is not as straightforward and requires something like t-SNE projections of embeddings of your data, so I would recommend starting with the suggestions listed above (a t-SNE sketch also follows below).

Happy to follow up when you can share your training curves or the output of trying larger models.
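As a rough sketch of plotting those learning curves directly from the statistics Ludwig returns (the nested dictionary keys here are an assumption based on Ludwig's `training_statistics.json` layout; verify them for your version, or use TensorBoard as suggested above):

```python
import matplotlib.pyplot as plt

def plot_learning_curves(train_stats, out_path="learning_curves.png"):
    """Plot per-epoch combined loss for each data split.

    `train_stats` is the first element returned by LudwigModel.train()
    (see the earlier sketch). The ["combined"]["loss"] keys are an
    assumption about the statistics layout, not a guaranteed API.
    """
    for split in ("training", "validation", "test"):
        if split in train_stats:
            plt.plot(train_stats[split]["combined"]["loss"], label=split)
    plt.xlabel("epoch")
    plt.ylabel("combined loss")
    plt.legend()
    plt.savefig(out_path)
```

If the training curve plateaus well above zero instead of heading toward it, that points to underfitting, i.e., too little capacity.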
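And a minimal sketch of the t-SNE visualization mentioned in point 2, assuming you have already extracted an array of audio embeddings (the file names and shapes are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical inputs: an (n_samples, n_dims) array of audio embeddings
# (e.g. encoder outputs) and one class label per sample.
embeddings = np.load("audio_embeddings.npy")
labels = np.load("labels.npy")

# Project to 2-D. Note: perplexity must be smaller than n_samples.
projected = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)

plt.scatter(projected[:, 0], projected[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE of audio embeddings")
plt.savefig("tsne_embeddings.png")
```

Clearly separated clusters per class would suggest the dataset carries learnable signal; a single undifferentiated blob would suggest it does not.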
---
I am currently working on my master's thesis on the topic of technical debt in ML. I am implementing the Infrastructure Tests proposed in "The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction" by Breck et al. using Ludwig. For the test "Infra 2: Model specification code is unit tested", they propose training a model to overfit on a test dataset as an indicator of whether the model can actually learn from the given data.
I have not yet been able to get a model to overfit using Ludwig. What I have tried so far is "disabling" early stopping (setting it to the number of epochs) and setting the number of epochs to a high value (300). The model converges to a loss of 0.38 ± 0.04 and an accuracy of 0.87 ± 0.02 (train/validation/test).
This is the configuration I used:
From what I understand, if the model were overfitting, the validation and test accuracy should drop significantly compared to the training accuracy.
How do I get the model to overfit so I can implement the Infra 2 test?
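A hedged sketch of what such an Infra 2 overfit test could look like with Ludwig, using a toy tabular dataset for brevity (the same pattern applies to an audio config). The statistics keys, feature types, and thresholds are assumptions to verify against your Ludwig version, not Breck et al.'s or Ludwig's prescribed implementation:

```python
import pandas as pd
from ludwig.api import LudwigModel

def test_model_can_overfit_tiny_dataset():
    # Tiny, trivially learnable toy dataset standing in for the real test data.
    df = pd.DataFrame({
        "x": [0.0, 0.1, 0.9, 1.0] * 5,
        "label": ["a", "a", "b", "b"] * 5,
    })
    config = {
        # "number" is the type name in recent Ludwig; older versions used "numerical".
        "input_features": [{"name": "x", "type": "number"}],
        "output_features": [{"name": "label", "type": "category"}],
        "trainer": {"epochs": 200, "early_stop": -1},
    }
    model = LudwigModel(config)
    train_stats, _, _ = model.train(dataset=df)
    # Assumed statistics layout; verify against your Ludwig version.
    final_loss = train_stats["training"]["combined"]["loss"][-1]
    assert final_loss < 0.05, f"model failed to overfit (loss={final_loss})"
```

If an assertion like this keeps failing even with more epochs or a larger model, that points to a bug in the model specification code, which is exactly what the Infra 2 test is meant to catch.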