Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize std::io::copy for files, sockets, or pipes on Linux #74426

Closed
ghost opened this issue Jul 17, 2020 · 8 comments · Fixed by #75272
Closed

Specialize std::io::copy for files, sockets, or pipes on Linux #74426

ghost opened this issue Jul 17, 2020 · 8 comments · Fixed by #75272
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. O-linux Operating system: Linux T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ghost
Copy link

ghost commented Jul 17, 2020

Currently std::fs::copy is using std::io::copy to copy files - it reads the file content into the memory, and then write it to the output file. std::fs::copy or (edit: it tries copy_file_range(2) first and has already benefited from the file cloning feature) std::io::copy can benefit from the file cloning feature in Btrfs and XFS by using FICLONE (and fallbacking to read-write loop if the filesystem does not support it).
I think is easy to use FICLONE in std::fs::copy, but it's less general (it requires file paths). std::io::copy is more general, but some additional methods may be needed to be added to std::io::Read and std::io::Write in order to use FICLONE.
libuv's support for FICLONE: libuv/libuv#1491
The ioctl_ficlonerange(2) man page: https://www.man7.org/linux/man-pages/man2/ioctl_ficlonerange.2.html

@jonas-schievink jonas-schievink added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs Relevant to the library team, which will review and decide on the PR/issue. O-linux Operating system: Linux labels Jul 17, 2020
@cuviper
Copy link
Member

cuviper commented Jul 17, 2020

The Linux fs::copy at least tries to use copy_file_range to avoid the trip through userspace memory, but it will still end up with duplicate data on the device.

GNU coreutils also tries to use FICLONE for cp --reflink, but that's an explicit opt-in.
https://github.com/coreutils/coreutils/blob/6a3d2883fed853ee01079477020091068074e12d/src/copy.c#L408

@the8472
Copy link
Member

the8472 commented Jul 17, 2020

The Linux fs::copy at least tries to use copy_file_range to avoid the trip through userspace memory, but it will still end up with duplicate data on the device.

My understanding is that copy_file_range already uses reflink copies if supported. So all you need is a sufficiently recent kernel.
https://github.com/torvalds/linux/blob/b1b11d0063aab28aaf65f63cff56470bc01dc290/fs/read_write.c#L1728-L1744

std::fs::copy or std::io::copy can benefit from the file cloning feature in Btrfs and XFS by using FICLONE

io::copy would require specialization to make this work and this would be an unreliable optimization since wrapper types such as BufWriter could obscur the underlying files from the specialization. As mentioned above fs::copy should already be using reflinks.

@cuviper
Copy link
Member

cuviper commented Jul 17, 2020

My understanding is that copy_file_range already uses reflink copies if supported.

Thanks, I didn't realize it did that!

@ghost
Copy link
Author

ghost commented Jul 18, 2020

My understanding is that copy_file_range already uses reflink copies if supported.

Thanks, I didn't realize it did that!

Oops, I didn't realize that either. The copy_file_range(2) man page says nothing about it.

std::fs::copy or std::io::copy can benefit from the file cloning feature in Btrfs and XFS by using FICLONE

io::copy would require specialization to make this work and this would be an unreliable optimization since wrapper types such as BufWriter could obscur the underlying files from the specialization. As mentioned above fs::copy should already be using reflinks.

Then, if you think there is nothing to do, feel free to close this issue.

@the8472
Copy link
Member

the8472 commented Jul 18, 2020

Specializing io::copy may still be useful for generic code operating on Read/Write traits or already opened files.

@joshtriplett joshtriplett added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 20, 2020
@joshtriplett
Copy link
Member

Tagging T-libs, as it seems like a libs team question whether it would be valid to change the underlying semantics of these copy operations.

It sounds like std::fs::copy is already doing the right thing. std::io::copy might be able to use more specialized operations in various cases, such as splice or FICLONE, or at the very least, sendfile. (Or, long-term, io_uring.)

@ghost
Copy link
Author

ghost commented Jul 20, 2020

Specializing std::io::copy is also useful to clone only a part of a file to only a part of another file - for example, when creating a tar archive.
Also, #60689 may get resolved if this is resolved.

@ghost ghost changed the title Use FICLONE for std::fs::copy or std::io::copy on Linux Specialize std::io::copy for files, sockets, or pipes on Linux Jul 20, 2020
@the8472
Copy link
Member

the8472 commented Jul 22, 2020

Since this would require specializing on the input and output type the naive approach would lead to quadratic blowup of types. So it's probably better to have an internal highly parameterized copy helper and two separate sets of specializations that each prepare one half of the parameter set.
That would allow extracting the file descriptors and some ancillary things from a large set of concrete wrapper types, i.e. it could also cover #49921 while avoiding the concerns in #71091 at the cost of having to spell out each specialized type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. O-linux Operating system: Linux T-libs Relevant to the library team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants