
A question about training the neural networks #14

Open
zhoujz10 opened this issue Aug 7, 2018 · 8 comments

Comments


zhoujz10 commented Aug 7, 2018

I tried to implement DeepStack in Python and generated 4M training samples for the turn network. I'm using exactly the same network architecture as the author.

But I found that after hundreds of epochs of training, the Huber loss is about 0.2 on the training samples, which is far larger than the author's (0.016). Do you have any suggestions for training the network?

Thank you!
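(For readers comparing loss numbers: a minimal NumPy sketch of the Huber loss being discussed, assuming the standard delta = 1 form. The function name and delta are assumptions for illustration, not taken from the repo.)

```python
import numpy as np

def huber_loss(pred, target, delta=1.0):
    # Huber loss: quadratic for small errors, linear for large ones,
    # so outliers in the counterfactual-value targets are penalized less
    err = np.abs(pred - target)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return np.where(err <= delta, quadratic, linear).mean()
```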


zhoujz10 commented Aug 8, 2018

@happypepper

@happypepper (Owner)

What was your river loss? Or did you solve two streets?

@zhoujz10 (Author)

@happypepper Hi, thank you for your reply. I solve two streets instead of using a river network. I also calculated the exploitability of a turn case; it is around 2 mbb, so I guess my re-solving process is right. Maybe there are bugs in my bucketing?

How many epochs did you use to train your network? I have trained for thousands of epochs, but my training loss is still very high.

@happypepper (Owner)

After around 80 epochs, it stopped improving. Validation loss after the first epoch was already 0.08.

How did you do the bucketing? k-means + EMD?


zhoujz10 commented Aug 11, 2018

@happypepper I use k-means on the river round and EMD on the other rounds, the same bucketing used in the reference papers.
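(Editorial sketch of the k-means + EMD bucketing mentioned here, for readers following along; this is a minimal NumPy version, not the code either poster used. Hands are represented as equity histograms, 1-D EMD reduces to the L1 distance between cumulative distributions, and cluster centers are updated as mean histograms, which only approximates the true EMD barycenter.)

```python
import numpy as np

def emd_1d(h1, h2):
    # Earth mover's distance between two 1-D equity histograms:
    # the L1 distance between their cumulative distributions
    return np.abs(np.cumsum(h1) - np.cumsum(h2)).sum()

def kmeans_emd(histograms, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = histograms[rng.choice(len(histograms), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each hand's histogram to the nearest center under EMD
        d = np.array([[emd_1d(h, c) for c in centers] for h in histograms])
        labels = d.argmin(axis=1)
        # update each center as the mean histogram of its cluster
        for j in range(k):
            if (labels == j).any():
                centers[j] = histograms[labels == j].mean(axis=0)
    return labels, centers
```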

I noticed that in your code you made a change when calculating the loss.

On line 64 of masked_huber_loss.lua, your code is:

local loss_multiplier = (batch_size * feature_size) / self.mask_sum:sum()

This means you average the loss over the valid buckets only, not over all 1000 buckets. I think this makes sense, and the author's repo has a bug here.
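(The normalization quoted above can be written equivalently as a direct average over valid buckets. A minimal NumPy sketch, assuming a per-cell mask that is 1 for buckets possible on the board and 0 otherwise; the names are illustrative, not from the repo.)

```python
import numpy as np

def masked_huber_loss(pred, target, mask, delta=1.0):
    # Zero out impossible buckets before computing the per-cell Huber loss
    err = np.abs(pred - target) * mask
    cell_loss = np.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta))
    # Dividing by mask.sum() is the same as taking the mean over all
    # batch_size * feature_size cells and then multiplying by
    # (batch_size * feature_size) / mask_sum, as in the quoted Lua line
    return cell_loss.sum() / mask.sum()
```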

Is there any way to debug my bucketing? I'm at the end of my rope.

@happypepper (Owner)

How is it possible to use k-means for the river? There is only one number (the hand's equity) instead of a distribution. EMD is usually used in combination with k-means.

You can email me and we can communicate outside of GitHub somehow; it's easier.
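(On the river question above: a single scalar equity per hand can still be clustered with plain 1-D k-means, which may be what "k-means on the river round" meant. A hedged sketch; the initialization scheme and iteration count are arbitrary choices.)

```python
import numpy as np

def bucket_river_equities(equities, k, iters=20):
    # 1-D k-means over scalar win probabilities; initialize centers by
    # spreading them across the sorted equities
    eq = np.asarray(equities, dtype=float)
    centers = np.sort(eq)[np.linspace(0, len(eq) - 1, k).astype(int)]
    for _ in range(iters):
        labels = np.abs(eq[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = eq[labels == j].mean()
    return labels, centers
```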

@zhoujz10 (Author)

@happypepper Hi, I just sent you an email describing my method for generating river clusters.

@aligatorblood

Hi, can you send me this email as well?
