main_train does still nothing #40

lBlitzdl opened this issue Jun 18, 2020 · 4 comments

@lBlitzdl

This is essentially the same problem as #38, but since no solution was ever provided there, I am reopening it.

I am using Google Colab's free GPU to generate data and train models. Data generation and data conversion both worked, but I can't get the model to train.
Whenever I run the training code, the script gets stuck at line 60 of Training/train.lua, at the call:
M.network(inputs)
The call never returns and gives no error or warning.
I have tried different batch sizes, on both CPU and GPU.

@airdine

airdine commented Jun 28, 2020

Hello,

Take a look at this issue: aikupoker#15

I solved it by using a fresh OS install.
By the way, if you use CUDA > 9.2, I recommend cloning the Torch source from this repo:
https://github.com/nagadomi/distro.git

Hope it helps!

@lBlitzdl

Thank you. Can you give me more information on how you did it?
Ubuntu version?
Torch version?
Lua version?
CUDA version?

Thank you very much :)
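
For anyone comparing setups, the versions asked about above can be collected in one go. The following is only a sketch: the `report` helper is a hypothetical name, and the `~/torch` checkout location is an assumption (the default used by the Torch distro installer), so adjust it to your install. Tools that are not on the PATH are reported as missing instead of aborting the script.

```shell
#!/usr/bin/env bash
# Sketch: print the Ubuntu, CUDA, Lua, and Torch versions in use.

# Run a command if its binary exists; otherwise say it is missing.
# "report" is a hypothetical helper name, not part of any tool.
report() {
  local label="$1"; shift
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $label =="
    "$@" 2>&1 || true
  else
    echo "== $label == not found: $1"
  fi
}

report "Ubuntu" lsb_release -ds
report "CUDA toolkit" nvcc --version
report "GPU driver" nvidia-smi --query-gpu=driver_version --format=csv,noheader
report "Lua (LuaJIT)" luajit -v
# Torch has no single version number; the checkout's commit is used instead.
# Assumption: Torch was cloned to ~/torch.
report "Torch checkout" git -C "$HOME/torch" rev-parse --short HEAD
```

Pasting the output into the issue makes it easier to spot a CUDA and Torch combination that differs from a working setup.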

@airdine

airdine commented Jul 1, 2020

The simplest way was to use this Docker image:
nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04

You can take inspiration from the aikupoker repo.

Good luck
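
As a rough sketch of the Docker route described above, the script below prints the commands so they can be reviewed before running them by hand. The image and the Torch fork URL are the ones named in this thread; the `/root/torch` target directory and the function names are assumptions made for illustration. The `--gpus all` flag assumes Docker 19.03 or newer with the NVIDIA Container Toolkit installed on the host. The build steps after cloning are deliberately left out; follow the fork's README for those.

```shell
#!/usr/bin/env bash
# Sketch: print the commands for starting the CUDA 10.0 / Ubuntu 16.04
# container and fetching the Torch fork inside it. Nothing is executed.
set -eu

IMAGE="nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04"    # image named in this thread
TORCH_REPO="https://github.com/nagadomi/distro.git"  # fork named in this thread
TORCH_DIR="/root/torch"                              # assumed clone location

# Command to run on the host to get a shell inside the container.
host_command() {
  echo "docker run --gpus all -it --rm $IMAGE bash"
}

# Commands to run inside the container; build steps follow the fork's README.
container_setup() {
  cat <<EOF
apt-get update && apt-get install -y git
git clone --recursive $TORCH_REPO $TORCH_DIR
EOF
}

host_command
container_setup
```

The image ships its own Ubuntu 16.04 userland, so the build happens inside the container rather than against the host's packages; the host still needs a working NVIDIA driver.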

@Sohaibb98

For this, does the host OS need to be Ubuntu 16.04 or Ubuntu 18? On Ubuntu 18, it gives a CMake error.
