Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File offsets should be u64, not usize #65

Closed
wants to merge 2 commits into from
Closed

File offsets should be u64, not usize #65

wants to merge 2 commits into from

Conversation

Dr-Emann
Copy link
Contributor

This is a breaking change, but both mmap and MapViewOfFile support 64 bit offsets, even on 32bit programs.

@Dr-Emann
Copy link
Contributor Author

The appveyor failure doesn't look related to anything I touched.

@dekellum
Copy link
Contributor

dekellum commented Apr 30, 2018

I was similarly wondering why I'm converting offset/len to usize (and asserting values are not beyond usize::max_value()) in order to use this API. How can we get this merged? If the breaking change aspect is the issue (though that should really be OK in an 0.7 release, given its <1.0), the internals could be changed to u64 and alternative MmapOptions:: offset64(u64) and len64(u64) could be added instead?

@dekellum
Copy link
Contributor

Well, its understandable that mapping length is limited to usize because it is memory addressable. However it would be more ergonomic, and I wonder if internally cleaner, if all offsets and sizes were u64 and the lengths checked and truncated before making the OS calls? I imagine most people are starting with u64 file offset values, to compute offset and length.

@BurntSushi
Copy link
Collaborator

@dekellum How do you index a &[u8] larger than 2^32 on a system whose usize is 32 bits?

@dekellum
Copy link
Contributor

You don't and I understand that, thanks. I still suspect most people mapping a File at least, will be computing both offset and length using u64 values. Anyway, if that isn't agreeable as part of this, then my compatibly suggestion above could just be limited to a new offset64(u64) method and preserving offset(usize) so this isn't a breaking change.

@oconnor663
Copy link

oconnor663 commented Aug 24, 2018

MmapInner::new makes the following call to libc:

let ptr = libc::mmap(
    ptr::null_mut(),
    aligned_len as libc::size_t,
    prot,
    flags,
    file,
    aligned_offset as libc::off_t,
);

Note that the offset is cast there into a libc::off_t, which I believe is 32 bits on (some?) 32 bit platforms. If this PR is passing a 64 bit offset all the way down to here, but ultimately truncating it, does it actually work?

@dekellum
Copy link
Contributor

Some Linux man page extracts related to large files on 32-bit platforms below.

This is clearly complex enough that it would be helpful if an memmap crate abstracted over it, rather than every application. If the rust API consistently used u64, and explicit checked for exceeding each platforms actual limit before truncating, raising a clear error that could be handled by the application, would that not be an improvement in the majority of use cases?

Rust "std::fs::Metadata::len" returns u64, including on 32-bit platforms. See also std::io::Seek. Wouldn't doing as per above be more symmetric, then, for mmap?

Old rust discussion with many additional links:

https://internals.rust-lang.org/t/pre-rfc-rust-and-large-file-support-lfs/2777 (2015-10)


Linux man open(2)

O_LARGEFILE
(LFS) Allow files whose sizes cannot be represented in an off_t (but can be represented in an off64_t) to be opened. The _LARGEFILE64_SOURCE macro must be defined (before including any header files) in order to obtain this definition. Setting the _FILE_OFFSET_BITS feature test macro to 64 (rather than using O_LARGEFILE) is the preferred method of accessing large files on 32-bit systems (see feature_test_macros(7)).

Linux man feature_test_macros(7)

_FILE_OFFSET_BITS
Defining this macro with the value 64 automatically converts references to 32-bit functions and data types related to file I/O and filesystem operations into references to their 64-bit counterparts. This is useful for performing I/O on large files (> 2 Gigabytes) on 32-bit systems. (Defining this macro permits correctly written programs to use large files with only a recompilation being required.)
64-bit systems naturally permit file sizes greater than 2 Gigabytes, and on those systems this macro has no effect.

LFS was the Large File Summit of X/Open. See also mmap64(3).

@oconnor663
Copy link

oconnor663 commented Aug 24, 2018

Yeah papering over these details would be nice. It looks like libc exposes mmap64 on Linux at least, so this PR could be expanded to call that where possible, and to do a checked cast otherwise. That said, I don't know how to ask for "the maximum value of whatever this number type is" in Rust without some kind of crazy hack. (Edit: Maybe it's safe to assume it's something like u128::max_value() as T?)

@danburkert
Copy link
Owner

I've merged the PR with light edits as part of #71, since I think u64 is the correct API to expose, based on how std::fs exposes file offsets. It is interesting that off64_t is defined to be a signed type, though.

@dekellum and @oconnor663, I agree there are potential issues lurking here, especially on 32-bit systems and/or those where mmap is not the same as mmap64. I'd happily take PR's to close these holes. The CI for memmap has a pretty wide range of targets, so it should be possible to test this out. Please feel free to open an additional issue/PR based on this conversation.

@danburkert danburkert closed this Sep 20, 2018
oconnor663 added a commit to oconnor663/memmap-rs that referenced this pull request Sep 20, 2018
This should address one of the concerns that came up in
danburkert#65.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants