Memory map file loading #50

Open
bbqsrc opened this issue Jan 12, 2022 · 10 comments

@bbqsrc
Member

bbqsrc commented Jan 12, 2022

Grammar checking is using quite a lot of RAM on our Divvun API server:

[image: screenshot of memory usage on the server]

We've mitigated this for the spellchecking in DivvunSpell by using mmap instead of loading data into RAM, with minimal performance penalty in our use cases. Is this something that can be implemented for these grammar checking pipelines?
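To illustrate the difference being asked about, here is a minimal Python sketch contrasting a full read into RAM with a read-only memory mapping. It is not code from DivvunSpell or libdivvun; the helper names and the temporary stand-in file are hypothetical.

```python
import mmap
import os
import tempfile


def load_into_ram(path):
    """Read the whole file into a private bytes object held by this process."""
    with open(path, "rb") as f:
        return f.read()


def open_mapped(path):
    """Map the file read-only; pages are loaded lazily on first access."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def make_demo_file(size=1 << 20):
    """Write a hypothetical stand-in for a large model file; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"HDR!" + bytes(size))
    return path


path = make_demo_file()
copied = load_into_ram(path)
mapped = open_mapped(path)
# Both views expose the same bytes; only the mapping avoids a private copy.
same_header = mapped[:4] == copied[:4]
same_size = len(mapped) == len(copied)
mapped.close()
os.remove(path)
```

With a mapping, the kernel can drop clean pages under memory pressure and reload them from the file, and processes that map the same file share those pages.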

@unhammer
Member

Is that 1313M RES on startup, or could there be a leak? (I'm seeing about half when I test with se.zcheck -n smegramrelease)
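One way to tell a high startup baseline from a leak is to sample the process's resident set size over time. A rough Python sketch, assuming the Linux /proc/<pid>/status format; the function names and the 1 MiB growth threshold are illustrative assumptions, not part of any Divvun tooling.

```python
import re


def parse_vm_rss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of Linux's /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_kib(f.read())


def keeps_growing(samples_kib, min_step_kib=1024):
    """True if each sample exceeds the previous one by at least min_step_kib."""
    if len(samples_kib) < 2:
        return False
    pairs = zip(samples_kib, samples_kib[1:])
    return all(later - earlier >= min_step_kib for earlier, later in pairs)
```

Sampling `read_rss_kib(pid)` once right after startup and then after each batch of requests gives the list to pass to `keeps_growing`. A flat series points to a large baseline, a steadily rising one to a leak or an unbounded cache.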

@TinoDidriksen
Contributor

Isn't that a persistent pipe using CG-3's libcg3 API as part of the process? 'cause if so then GrammarSoft/cg3#74

@unhammer
Member

Hm, could perhaps reload the data every so often as a workaround, though it might be easier to just restart the divvun-checker process in that case ;-)

@flammie
Contributor

flammie commented Jan 22, 2022

I was profiling a bit for fun, and at least my version that uses hfst-ospell didn't really have memory leaks, but it used an increasing amount of memory on some cache. I disabled that cache in the latest version, so I hope you can test that again? I guess we are planning to replace the hfst-ospell stuff with divvunspell, especially if it continues to be the bottleneck?
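The cache in question is not shown in this thread. As a generic illustration of why a cache can look like a leak, this Python sketch compares an unbounded memo table with a size-capped one. `expensive_lookup` is a hypothetical stand-in for a speller or analyser call, and the cap of 1000 entries is arbitrary.

```python
from functools import lru_cache


def expensive_lookup(word):
    """Hypothetical stand-in for a costly speller or analyser call."""
    return word.upper()


unbounded = {}


def cached_unbounded(word):
    """Memoize forever: memory grows with every distinct input seen."""
    if word not in unbounded:
        unbounded[word] = expensive_lookup(word)
    return unbounded[word]


@lru_cache(maxsize=1000)
def cached_bounded(word):
    """Memoize at most 1000 entries; the least recently used is evicted."""
    return expensive_lookup(word)


# Feed both caches 5000 distinct inputs, as a long-running server might see.
for i in range(5000):
    cached_unbounded(f"w{i}")
    cached_bounded(f"w{i}")

unbounded_size = len(unbounded)
bounded_size = cached_bounded.cache_info().currsize
```

On a long-running server that sees many distinct words, the first variant grows without limit even though nothing is technically leaked.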

@bbqsrc
Member Author

bbqsrc commented Jan 22, 2022

ah, is it using hfst-ospell? hehe, well we need to fix that then.

@bbqsrc
Member Author

bbqsrc commented Jan 22, 2022

If you could give me a list of the functionality that is used by libdivvun from hfst-ospell, I can inventory anything missing for it to be ported across.

If there's not much, I can publish a stable C API header somewhere (basically leeched straight from divvunspell-sdk-swift without the Swift ;) )

@flammie
Contributor

flammie commented Jan 22, 2022

Mmh, I can't remember anymore whether I wrote this stuff, but the main use of hfst_ospell seems to be in speller::Spell here: https://github.com/divvun/libdivvun/blob/master/src/cgspell.cpp#L136
Maybe @unhammer remembers?

@snomos
Member

snomos commented Jan 22, 2022

The main additions to standard hfst-ospell are:

  • return all analyses of the suggestions
  • tag all analyses so one can separate speller suggestions from regular analyses

@unhammer would know more 😊

@bbqsrc
Member Author

bbqsrc commented Jan 22, 2022

oh god analysis, nooooo. Someone else can port that across, hahaha

@unhammer
Member

Yeah as the code shows, we just use ZHfstOspeller::suggest and ZHfstOspeller::analyseSymbols from hfst-ospell.

You can easily make a pipeline without the speller step and check if that takes the pain away (just edit the pipespec.xml in your zcheck zip and remove that one element).
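The edit described above could be scripted roughly as follows. This Python sketch assumes the zcheck file is an ordinary zip archive and that the speller step is a `cgspell` element in pipespec.xml; that element name is an assumption, so check your own file. The function names and the demo archive are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def strip_element(xml_bytes, tag):
    """Return the XML with every <tag> element removed, plus the removal count.

    Note: ElementTree drops comments and any DOCTYPE when re-serializing.
    """
    root = ET.fromstring(xml_bytes)
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="utf-8", xml_declaration=True), removed


def copy_without_step(src_zip, dst_zip, tag="cgspell", member="pipespec.xml"):
    """Copy an archive, removing <tag> elements from its pipespec member."""
    removed = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                data, removed = strip_element(data, tag)
            dst.writestr(info.filename, data)
    return removed


# Demo on an in-memory archive with a made-up pipeline.
demo_xml = (b"<pipespec><pipeline name='demo'><cg/>"
            b"<cgspell><lexicon n='a.hfst'/></cgspell>"
            b"<suggest/></pipeline></pipespec>")
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as z:
    z.writestr("pipespec.xml", demo_xml)
    z.writestr("a.hfst", b"\x00\x01")
dst = io.BytesIO()
removed = copy_without_step(src, dst)
with zipfile.ZipFile(dst) as z:
    new_xml = z.read("pipespec.xml")
    other = z.read("a.hfst")
```

The copy is written to a new archive rather than modified in place, so the original pipeline stays available for comparing memory use with and without the speller step.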
