This repository has been archived by the owner on May 4, 2021. It is now read-only.

langstat2candidates.py requires large amounts of RAM #8

Open
achimr opened this issue Jan 24, 2017 · 1 comment

achimr commented Jan 24, 2017

langstat2candidates.py, particularly when used with the -candidates parameter, uses large amounts of RAM (32-64 GB for large language pairs). This is because it reads the entire candidates file into memory as a dictionary, with the URLs as keys and the full candidates file lines as values. Retaining all of this data seems unnecessary.
The high memory use limits how many instances can run in parallel and leads to crashes.
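
One possible way to avoid holding every line in memory is to keep only a byte offset per URL and re-read a line from disk when it is needed. The following is a minimal sketch of that idea, not the script's actual code: the function names `index_offsets` and `read_line_at` are hypothetical, and it assumes the candidates file is tab-separated with the URL in the first field.

```python
def index_offsets(path):
    """Map each URL to the byte offset of its line instead of the line itself.

    Assumption: the URL is the first tab-separated field of every line.
    """
    offsets = {}
    with open(path, "rb") as f:
        while True:
            pos = f.tell()          # offset of the line about to be read
            line = f.readline()
            if not line:            # end of file
                break
            url = line.split(b"\t", 1)[0].rstrip(b"\r\n").decode("utf-8")
            offsets[url] = pos
    return offsets


def read_line_at(path, offset):
    """Fetch a single line from disk by seeking to a stored offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline().decode("utf-8").rstrip("\r\n")
```

The URLs still live in memory as keys, so this only helps when the lines are much longer than the URLs. If lookups by URL are not needed at all, iterating over the file once and processing each line as it is read would use even less memory.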

achimr self-assigned this Jan 24, 2017

achimr commented Oct 2, 2017

Matching candidates from some language into English with a recent CommonCrawl (2016_50) requires 60+ GB of RAM.