mmap for library(pio) with partial strings #251

Open
UWN opened this issue Dec 3, 2019 · 3 comments
Comments

UWN commented Dec 3, 2019

(This is for a later moment, after #24 and #95 are done.)

For UTF-8 files not containing a zero-byte (the majority of files to be parsed), phrase_from_file(Phrase__0, File) could avoid incremental copying altogether by using mmap(3). The file is mapped in one step into a fitting memory area of the heap, up to the last page. That last page is written anew, with a terminating zero-byte and a nil ([]) at the end.

UWN commented Dec 4, 2019

... which means that in the worst case two pages are needed: the file ends one byte before the next page, so the terminating zero-byte still fits, and an additional page is needed just for the [].
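
The worst case above comes down to page arithmetic, which can be sketched as follows. This is only an illustration, not how any Prolog system lays out its heap: the helper names `pages_needed` and `extra_pages` are hypothetical, and the 9-byte trailer (one zero-byte plus an assumed 8-byte cell for `[]`, ignoring alignment) is an assumption.

```python
import mmap

PAGE = mmap.PAGESIZE  # page size of the running system

def pages_needed(file_size, trailer_size, page=PAGE):
    """Pages required to hold file_size bytes followed by trailer_size bytes."""
    total = file_size + trailer_size
    return (total + page - 1) // page  # ceiling division

def extra_pages(file_size, trailer_size, page=PAGE):
    """Pages needed beyond those occupied by the file content alone."""
    file_pages = (file_size + page - 1) // page
    return pages_needed(file_size, trailer_size, page) - file_pages

# Assumed trailer: 1 zero-byte + 8 bytes for the [] cell (hypothetical layout).
TRAILER = 1 + 8
```

With 4096-byte pages, a 4095-byte file leaves room for the zero-byte only, so `extra_pages(4095, TRAILER, 4096)` is 1, while a 100-byte file needs no extra page.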

UWN commented Aug 17, 2020

Just to be sure: phrase_from_file/3 would need to first open the file, mmap it, and scan it for a zero-byte and malformed UTF-8 encodings, reporting them immediately. Further, the number of characters can be determined this way, should this be of use somehow.

Note that even in the presence of a zero-byte, mmap can still be used, at least for the sequence up to that zero-byte.
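
A minimal sketch of such a scan, written in Python purely for illustration: the function name `scan_mapped` and its return shape are hypothetical. It maps the file read-only, looks for the first zero-byte, and strictly decodes the part before it, so malformed UTF-8 is reported at once and the character count falls out as a by-product. Note that the slice `m[:end]` copies the bytes, so this shows only the checks, not the copy-free layout proposed in this issue.

```python
import mmap
import os

def scan_mapped(path):
    """Return (usable_bytes, char_count, zero_offset) for a file read via mmap.

    zero_offset is the offset of the first zero-byte, or -1 if there is none.
    Raises UnicodeDecodeError if the part before the first zero-byte is
    malformed UTF-8.
    """
    if os.path.getsize(path) == 0:
        return 0, 0, -1  # an empty file cannot be mapped
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        zero = m.find(b"\x00")                # first zero-byte, if any
        end = len(m) if zero == -1 else zero  # usable prefix length
        text = m[:end].decode("utf-8")        # strict decoding; copies the prefix
        return end, len(text), zero
```

For a file containing the bytes `ab\0cd`, this returns `(2, 2, 2)`: only the two bytes before the zero-byte are usable.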

UWN commented Jan 15, 2021

Tiny moral update: mmap is faster than syscalls, because it uses AVX instructions.
