Training requires too much RAM #285

Open
stweil opened this issue Oct 2, 2021 · 6 comments
Labels: performance (Concerns the computational efficiency)

stweil (Contributor) commented Oct 2, 2021

Training with a large number of lines requires a huge amount of RAM (52 GiB for 375,000 lines). Loading the samples into memory contributes only a small part of this. Most of the memory is used by the data processors (CenterNormalizerProcessor, FinalPreparation, BidiTextProcessor, ...). It looks like the memory used by these processors is not released in later steps. Maybe that can be changed by throwing away data which is no longer used.

Reducing the memory requirements is especially important for calamari-cross-fold-train which currently has to be restricted to a subset of folds even on a large server with 128 GiB of RAM.
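
Here is a minimal sketch of how the per-step memory growth can be attributed (psutil is assumed to be available; the wrapped callables are placeholders, not Calamari's API):

import os
import psutil

def rss_mib():
    # resident set size of the current process, in MiB
    return psutil.Process(os.getpid()).memory_info().rss / 2**20

def run_step(name, fn, *args, **kwargs):
    # run one pipeline step and report how much the RSS grew while it ran
    before = rss_mib()
    result = fn(*args, **kwargs)
    print(f"{name}: +{rss_mib() - before:.0f} MiB (now {rss_mib():.0f} MiB in total)")
    return result

Wrapping each data processor call like this shows which step still holds on to memory after it has finished.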

ChWick (Member) commented Oct 2, 2021

Do you use data augmentation?

There is a preload flag that can disable preloading the files into RAM:

--train.preload=False
--val.preload=False
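
For example (with the rest of the training arguments left out):

calamari-train ... --train.preload=False --val.preload=False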

stweil (Contributor, Author) commented Oct 2, 2021

No, I don't use data augmentation at the moment. I saw the preload flag, but I think that preloading is normally a good thing, as otherwise the data would have to be loaded again and again for each epoch. Am I wrong?

I think the central point is not preloading, but the intermediate data kept by the data processors. But of course I still don't know the internals of Calamari, so I might be wrong with that assumption.
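
Just to make that trade-off explicit, a minimal sketch with a hypothetical load_line() reader (not Calamari's API): preloading pays the loading cost once but keeps every sample in RAM, while lazy loading re-reads the files on every epoch.

def preloaded_dataset(paths, load_line):
    samples = [load_line(p) for p in paths]   # loaded once, held in RAM for the whole run
    def epoch():
        yield from samples
    return epoch

def lazy_dataset(paths, load_line):
    def epoch():
        for p in paths:                       # re-read (and re-processed) on every epoch
            yield load_line(p)
    return epoch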

jacoborrje commented
I have had the same issue as @stweil. I tried running calamari-cross-fold-train with multiple folds on a larger dataset, but with only one process in parallel. While monitoring the available free RAM, I could see it decrease noticeably with each fold that was trained. Eventually the computer ran out of free RAM and the process froze. Is this expected behaviour, or could it (as stweil mentions) be because the memory used by the data processors is not released?

As mentioned, reducing the memory requirements would be highly useful for training multi-fold models on larger datasets.
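
For what it's worth, this is how the decreasing RAM can be logged alongside the training (a sketch assuming psutil is installed; it is not part of Calamari):

import threading
import psutil

def log_free_ram(stop_event, interval_s=30.0):
    # print the available system RAM every interval_s seconds
    while not stop_event.is_set():
        free_gib = psutil.virtual_memory().available / 2**30
        print(f"available RAM: {free_gib:.1f} GiB")
        stop_event.wait(interval_s)

stop = threading.Event()
threading.Thread(target=log_free_ram, args=(stop,), daemon=True).start()
# ... start calamari-cross-fold-train here, then call stop.set()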

andbue (Member) commented Oct 26, 2021

Since I have not run any larger trainings recently, I have no experience with this memory leak myself. Digging through the code of cross-fold-train, I found that the trainings are run in separate processes, even if they are started from the same thread when max_parallel_models == 1. Could it be that the OS somehow does not free the memory even after a training process ends?
A workaround could be to put process.kill() at the end of utils.multiprocessing.run. Since I've seen the "Error: Process finished with code..." message quite often, the kill() should probably come before that line.
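
Roughly what I mean, as a sketch (the surrounding code only paraphrases the structure of utils.multiprocessing.run, it is not a copy; the suggestion is just the placement of kill()):

import subprocess

def run(command, verbose=False):
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    for line in process.stdout:
        if verbose:
            print(line.decode(), end="")
    returncode = process.wait()
    process.kill()   # suggested workaround: kill the child before reporting its exit code
    if returncode != 0:
        raise Exception(f"Error: Process finished with code {returncode}")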

stweil (Contributor, Author) commented Oct 26, 2021

I don't think that there is a memory leak. The data processors require a lot of memory, and I wonder whether that memory is still needed after a processor has done its job. Maybe it is sufficient to reset some Python variables which hold the data of a processor.
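
A sketch of what "resetting some Python variables" could look like (the apply() call is a placeholder, not Calamari's actual interface):

import gc

def run_pipeline(processors, samples):
    data = samples
    for i, proc in enumerate(processors):
        data = proc.apply(data)    # placeholder for the real processing call
        processors[i] = None       # drop the reference so the processor's buffers can be collected
        gc.collect()               # give the garbage collector a chance to return the memory
    return data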

andbue (Member) commented Oct 26, 2021

You're right, killing the processes would only remedy the additional problems @jacoborrje encountered.

bertsky added the performance label on Oct 2, 2024