Cuda Error: an illegal memory access was encountered #182
Comments
Please share more details, such as the command line, Paddle version, and model config file.
@backyes cmd line:
hi @jamestang0219
@hedaoyuan Without a GPU it works fine, but it is too slow. I want to use 4 GPUs to speed up training.
There may be a bug with this
@hedaoyuan The same error occurred. But when I change the batch size, the problem does not occur. I don't think the batch size should cause a CUDA error; maybe there are some bugs in
@jamestang0219
@hedaoyuan
There may be a problem with the input values, but that does not cause a memory error every time.
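One way to test the input-value hypothesis is to scan the training data for word ids that fall outside the dictionary range before feeding it to Paddle, since an out-of-range index into an embedding table is a common cause of illegal memory accesses on GPU. The following is only a sketch: `find_out_of_range_ids`, `dict_size`, and the sample layout (lists of integer word ids) are hypothetical names and assumptions, not part of Paddle's API or of the reporter's data provider.

```python
def find_out_of_range_ids(samples, dict_size):
    """Return (sample_index, position, word_id) for every id outside [0, dict_size).

    `samples` is assumed to be an iterable of integer id sequences and
    `dict_size` the size of the word dictionary used by the model config.
    """
    bad = []
    for i, sample in enumerate(samples):
        for j, word_id in enumerate(sample):
            if not 0 <= word_id < dict_size:
                bad.append((i, j, word_id))
    return bad


if __name__ == "__main__":
    # Toy data: id 10 is too large for a 10-word dictionary, -1 is negative.
    toy_samples = [[1, 5, 3], [2, 10, 0], [4, -1]]
    print(find_out_of_range_ids(toy_samples, dict_size=10))
    # prints [(1, 1, 10), (2, 1, -1)]
```

If this reports any entries, fixing or filtering those samples is worth trying before looking for a bug in the GPU kernels; an empty result does not rule out other causes.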
@hedaoyuan
@jamestang0219 Maybe.
@hedaoyuan
No response for a long time; please reopen if there are still problems here.
When I train the LSTM model, the error occurs. Here is the partial training log:
I also checked nvidia source manager before the error occurred:
The model initializes successfully, but the error occurs once Paddle starts training on samples.
I've tried several times and get the same error each time.