Fuse OverlayFs implementation #156

Merged
merged 5 commits into from
Mar 11, 2024
41 changes: 41 additions & 0 deletions .github/workflows/xfstests.yml
@@ -32,3 +32,44 @@ jobs:
          cd $GITHUB_WORKSPACE
          sudo ./tests/scripts/xfstests_pathr.sh

  xfstests_on_overlayfs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          profile: minimal
          toolchain: stable
          override: true
      - name: Build overlay binary
        run: |
          cd tests/overlay
          cargo build --release
          sudo install -t /usr/sbin/ -m 700 ./target/release/overlay
      - name: Setup and run xfstest
        run: |
          cd $GITHUB_WORKSPACE
          sudo ./tests/scripts/xfstests_overlay.sh

  unionmount_testsuite_on_overlayfs:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          profile: minimal
          toolchain: stable
          override: true
      - name: Build overlay binary
        run: |
          cd tests/overlay
          cargo build --release
          sudo install -t /usr/sbin/ -m 700 ./target/release/overlay
      - name: Setup and run unionmount testsuite
        run: |
          cd $GITHUB_WORKSPACE
          sudo ./tests/scripts/unionmount_test_overlay.sh
1 change: 1 addition & 0 deletions Cargo.toml
@@ -27,6 +27,7 @@ libc = "0.2.68"
log = "0.4.6"
mio = { version = "0.8", features = ["os-poll", "os-ext"] }
nix = "0.24"
radix_trie = "0.2.1"
tokio = { version = "1", optional = true }
tokio-uring = { version = "0.4.0", optional = true }
vmm-sys-util = { version = "0.11", optional = true }
7 changes: 7 additions & 0 deletions Makefile
@@ -54,3 +54,10 @@ smoke-macos: check-macos

docker-smoke:
	docker run --env RUST_BACKTRACE=1 --rm --privileged --volume ${current_dir}:/fuse-rs rust:1.68 sh -c "rustup component add clippy rustfmt; cd /fuse-rs; make smoke-all"

testoverlay:
	cd tests/testoverlay && cargo build

# Setup xfstests env and run.
xfstests:
	./tests/scripts/xfstests.sh
242 changes: 242 additions & 0 deletions docs/images/overlayfs.drawio

Large diffs are not rendered by default.

Binary file added docs/images/overlayfs_dir.png
Binary file added docs/images/overlayfs_non_dir_file.png
Binary file added docs/images/overlayfs_structs.png
60 changes: 60 additions & 0 deletions docs/overlayfs.md
@@ -0,0 +1,60 @@
# Architecture of Overlay FS

The implementation of userspace Overlay FS follows [the design of the kernel](https://docs.kernel.org/filesystems/overlayfs.html),
but it is not a direct port.
Because of FUSE limitations, the userspace implementation differs from the kernel implementation in some respects.
It is under heavy development to make it more stable and more compatible.

## Basic Struct Definitions

The implementation of Overlay FS is built around a few important structs:

* `OverlayFs`: the main struct of the union FS. It is composed of multiple layers, normally one optional writable upper layer and many read-only lower layers.
* `OverlayInode`: the inode struct in OverlayFs. One `OverlayInode` is composed of `RealInode`s from the individual layers.
* `RealInode`: a wrapper for the backend `inode` in a single layer.
* `HandleData`: an opened file handle in OverlayFs. One `HandleData` refers to one `OverlayInode` and one optional `RealHandle` in some layer.
* `RealHandle`: a wrapper for the backend opened file handle in a single layer.

A trait named `Layer` is also introduced to represent a single layer in OverlayFs. Only filesystems that implement this trait can be used as a layer in OverlayFs.

The relationships between these structs are illustrated in the following figure:

![OverlayFs Structs](./images/overlayfs_structs.png)
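
As a rough illustration of how these structs relate, here is a minimal sketch. All type and field names ending in `Sketch` are hypothetical simplifications made up for this example; they are not the definitions used in the crate, and the real structs carry far more state (locking, lookup counts, child maps and so on).

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a filesystem implementing the `Layer` trait.
#[derive(Debug)]
struct LayerSketch {
    name: String,
    writable: bool,
}

/// Wrapper for a backend inode in one single layer.
#[derive(Debug)]
struct RealInodeSketch {
    layer: Arc<LayerSketch>,
    backend_ino: u64,
}

/// One overlay inode, holding its real inodes ordered from the topmost layer down.
#[derive(Debug)]
struct OverlayInodeSketch {
    ino: u64,
    real_inodes: Vec<RealInodeSketch>,
}

/// An opened handle: one overlay inode plus an optional backend handle.
#[derive(Debug)]
struct HandleDataSketch {
    node: Arc<OverlayInodeSketch>,
    real_handle: Option<u64>,
}

impl OverlayInodeSketch {
    /// The real inode that is visible to users: the topmost one.
    fn top(&self) -> Option<&RealInodeSketch> {
        self.real_inodes.first()
    }

    /// True when the visible real inode lives in a writable (upper) layer.
    fn in_upper(&self) -> bool {
        self.top().map_or(false, |r| r.layer.writable)
    }
}
```

In this sketch, an inode whose `in_upper()` returns `false` would need a copy-up before it can be modified.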

## Non-Directory File

Following kernel Overlay semantics, OverlayFs uses the following rules to handle non-directory files:

* If a file with the same name exists in multiple layers, the topmost file is chosen, and any files with the same name in lower layers are hidden.
* If a file in a lower filesystem is accessed in a way that requires write access, such as opening it for writing or changing its metadata,
  the file is first copied from the lower filesystem to the upper filesystem (copy_up).

![OverlayFs Non-Directory File](./images/overlayfs_non_dir_file.png)
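
The two rules above can be sketched with a toy in-memory model. `MemLayer` and `Overlay` are hypothetical types invented for this example, and "write" simply appends bytes; the real implementation works through the `Layer` trait and FUSE requests instead.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory layer: maps file names to contents.
struct MemLayer {
    files: HashMap<String, Vec<u8>>,
}

impl MemLayer {
    fn from_pairs(pairs: &[(&str, &[u8])]) -> Self {
        let files = pairs
            .iter()
            .map(|(name, data)| (name.to_string(), data.to_vec()))
            .collect();
        MemLayer { files }
    }
}

/// Hypothetical overlay: one writable upper layer, lowers ordered top to bottom.
struct Overlay {
    upper: MemLayer,
    lowers: Vec<MemLayer>,
}

impl Overlay {
    /// Rule 1: the topmost layer that contains the name wins.
    fn read(&self, name: &str) -> Option<&[u8]> {
        std::iter::once(&self.upper)
            .chain(self.lowers.iter())
            .find_map(|layer| layer.files.get(name).map(|v| v.as_slice()))
    }

    /// Rule 2: a write to a file that only exists in a lower layer
    /// first copies it up, then modifies the upper copy only.
    /// Returns false when the file does not exist in any layer.
    fn append(&mut self, name: &str, data: &[u8]) -> bool {
        if !self.upper.files.contains_key(name) {
            let copied = self.lowers.iter().find_map(|l| l.files.get(name).cloned());
            match copied {
                Some(content) => {
                    self.upper.files.insert(name.to_string(), content); // copy_up
                }
                None => return false,
            }
        }
        if let Some(content) = self.upper.files.get_mut(name) {
            content.extend_from_slice(data);
        }
        true
    }
}
```

After a copy-up the lower file is left untouched; it is merely hidden by the new upper file.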

## Directory

Following kernel Overlay semantics, OverlayFs uses the following rules to handle directories:

* If a directory with the same name exists in multiple layers, the union directory merges the entries of that directory from all layers.
* If a directory is set as opaque, all of its entries in lower layers are hidden.
* The copy-up logic is similar to that for non-directory files: any write access to a directory triggers a copy-up.

![OverlayFs Directory](./images/overlayfs_dir.png)
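
A minimal sketch of the merge rule follows, assuming a hypothetical `DirInLayer` type that records one layer's view of a directory. Whiteout handling is deliberately left out to keep the example short, so this is not a complete readdir implementation.

```rust
use std::collections::BTreeSet;

/// Hypothetical view of one directory as seen in a single layer.
struct DirInLayer {
    entries: Vec<String>,
    opaque: bool,
}

/// Small helper to build a `DirInLayer` from string literals.
fn dir(names: &[&str], opaque: bool) -> DirInLayer {
    DirInLayer {
        entries: names.iter().map(|n| n.to_string()).collect(),
        opaque,
    }
}

/// Merge the entries of a directory across layers, ordered topmost first.
/// Entries are deduplicated, and everything below an opaque directory is ignored.
fn merge_dir(layers: &[DirInLayer]) -> Vec<String> {
    let mut merged = BTreeSet::new();
    for layer_dir in layers {
        merged.extend(layer_dir.entries.iter().cloned());
        if layer_dir.opaque {
            // An opaque directory hides all entries from the layers below it.
            break;
        }
    }
    merged.into_iter().collect()
}
```

The `BTreeSet` is only a convenience here to get deduplicated, sorted output; it says nothing about the ordering the real filesystem returns.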

## Whiteout

A whiteout is a special file in OverlayFs that indicates the deletion of a file or directory in a lower layer.
A whiteout is a character device file with major number 0 and minor number 0,
and its name is the name of the file or directory to be deleted.
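
The whiteout test can be sketched with the standard library alone. The function names below are hypothetical, the `S_IFMT` and `S_IFCHR` constants are written out by hand with their usual Linux values instead of coming from `libc`, and the check relies on the assumption that a device number of 0 corresponds to major 0 and minor 0, as on Linux.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

// File-type bit mask and the character-device type, as defined on Linux.
const S_IFMT: u32 = 0o170000;
const S_IFCHR: u32 = 0o020000;

/// A whiteout is a character device whose device number is 0/0.
fn is_whiteout_mode(mode: u32, rdev: u64) -> bool {
    (mode & S_IFMT) == S_IFCHR && rdev == 0
}

/// Check a path on a backing filesystem without following symlinks.
fn path_is_whiteout(path: &Path) -> io::Result<bool> {
    let meta = fs::symlink_metadata(path)?;
    Ok(is_whiteout_mode(meta.mode(), meta.rdev()))
}
```

Creating such a device node on a real filesystem normally requires privileges, which is one reason the sketch only covers detection.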

## Opaque

Opaque is a special flag for a directory in OverlayFs. It indicates that all entries of that directory in lower layers are ignored.
A directory is marked opaque by setting one of these xattrs to 'y':

* `trusted.overlay.opaque`
* `user.overlay.opaque`
* `user.fuseoverlayfs.opaque`

`user.fuseoverlayfs.opaque` is a customized flag for our fuse-overlayfs.
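
The opaque check can be sketched as a pure function over an xattr lookup. The `get_xattr` closure is a hypothetical stand-in for whatever the backing layer uses to read an xattr, returning `None` when the attribute is absent. The comparison accepts `y` case-insensitively, which matches the check in `src/api/filesystem/overlay.rs`.

```rust
/// The xattr names that can mark a directory as opaque.
const OPAQUE_XATTRS: [&str; 3] = [
    "user.fuseoverlayfs.opaque",
    "trusted.overlay.opaque",
    "user.overlay.opaque",
];

/// Returns true if any of the known xattrs holds the single byte 'y' (or 'Y').
fn is_opaque<F>(get_xattr: F) -> bool
where
    F: Fn(&str) -> Option<Vec<u8>>,
{
    OPAQUE_XATTRS.iter().copied().any(|name| {
        matches!(
            get_xattr(name).as_deref(),
            Some([b]) if b.eq_ignore_ascii_case(&b'y')
        )
    })
}
```

Passing the lookup as a closure keeps the sketch independent of any particular xattr API.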

5 changes: 5 additions & 0 deletions src/api/filesystem/mod.rs
@@ -30,6 +30,11 @@ pub use async_io::{AsyncFileSystem, AsyncZeroCopyReader, AsyncZeroCopyWriter};
mod sync_io;
pub use sync_io::FileSystem;

#[cfg(all(any(feature = "fusedev", feature = "virtiofs"), target_os = "linux"))]
Contributor
It would be preferable to mount overlayfs inside the guest, so is there any need to turn on overlayfs for virtiofs?

Collaborator Author
The idea of Fuse OverlayFS is to take over the container RootFs entirely through Nydus and Fuse-backend-rs, so I suppose mounting overlayfs on the host is a strong requirement.

Besides, OverlayFs currently relies heavily on Passthroughfs, so I copied the #[cfg] for 'passthroughfs' from src/lib.rs:

#[cfg(all(any(feature = "fusedev", feature = "virtiofs"), target_os = "linux"))]
pub mod passthrough;

mod overlay;
#[cfg(all(any(feature = "fusedev", feature = "virtiofs"), target_os = "linux"))]
pub use overlay::Layer;

/// Information about a path in the filesystem.
#[derive(Copy, Clone, Debug)]
pub struct Entry {
205 changes: 205 additions & 0 deletions src/api/filesystem/overlay.rs
@@ -0,0 +1,205 @@
// Copyright (C) 2023 Ant Group. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE-BSD-3-Clause file.

#![allow(missing_docs)]

use std::ffi::{CStr, CString};
use std::io::{Error, ErrorKind, Result};

use super::{Context, Entry, FileSystem, GetxattrReply};
use crate::abi::fuse_abi::stat64;

pub const OPAQUE_XATTR_LEN: u32 = 16;
pub const OPAQUE_XATTR: &str = "user.fuseoverlayfs.opaque";
pub const UNPRIVILEGED_OPAQUE_XATTR: &str = "user.overlay.opaque";
pub const PRIVILEGED_OPAQUE_XATTR: &str = "trusted.overlay.opaque";

/// A filesystem must implement Layer trait, or it cannot be used as an OverlayFS layer.
pub trait Layer: FileSystem {
    /// Return the root inode number
    fn root_inode(&self) -> Self::Inode;

    /// Create whiteout file with name <name>.
    ///
    /// If this call is successful then the lookup count of the `Inode` associated with the returned
    /// `Entry` must be increased by 1.
    fn create_whiteout(&self, ctx: &Context, parent: Self::Inode, name: &CStr) -> Result<Entry> {
        // Use temp value to avoid moved 'parent'.
        let ino: u64 = parent.into();
        match self.lookup(ctx, ino.into(), name) {
            Ok(v) => {
                // Find whiteout char dev.
                if is_whiteout(v.attr) {
                    return Ok(v);
                }
                // Non-negative entry with inode larger than 0 indicates file exists.
                if v.inode != 0 {
                    // Decrease the refcount.
                    self.forget(ctx, v.inode.into(), 1);
                    // File exists with same name, create whiteout file is not allowed.
                    return Err(Error::from_raw_os_error(libc::EEXIST));
                }
            }
            Err(e) => match e.raw_os_error() {
                Some(raw_error) => {
                    // We expect ENOENT error.
                    if raw_error != libc::ENOENT {
                        return Err(e);
                    }
                }
                None => return Err(e),
            },
        }

        // Try to create whiteout char device with 0/0 device number.
        let dev = libc::makedev(0, 0);
        let mode = libc::S_IFCHR | 0o777;
        self.mknod(ctx, ino.into(), name, mode, dev as u32, 0)
    }

    /// Delete whiteout file with name <name>.
    fn delete_whiteout(&self, ctx: &Context, parent: Self::Inode, name: &CStr) -> Result<()> {
        // Use temp value to avoid moved 'parent'.
        let ino: u64 = parent.into();
        match self.lookup(ctx, ino.into(), name) {
            Ok(v) => {
                if v.inode != 0 {
                    // Decrease the refcount since we make a lookup call.
                    self.forget(ctx, v.inode.into(), 1);
                }

                // Find whiteout so we can safely delete it.
                if is_whiteout(v.attr) {
                    return self.unlink(ctx, ino.into(), name);
                }
                // Non-negative entry with inode larger than 0 indicates file exists.
                if v.inode != 0 {
                    // File exists but not whiteout file.
                    return Err(Error::from_raw_os_error(libc::EINVAL));
                }
            }
            Err(e) => match e.raw_os_error() {
                Some(raw_error) => {
                    // ENOENT is acceptable.
                    if raw_error != libc::ENOENT {
                        return Err(e);
                    }
                }
                None => return Err(e),
            },
        }
        Ok(())
    }

    /// Check if the Inode is a whiteout file
    fn is_whiteout(&self, ctx: &Context, inode: Self::Inode) -> Result<bool> {
        let (st, _) = self.getattr(ctx, inode, None)?;

        // Check attributes of the inode to see if it's a whiteout char device.
        Ok(is_whiteout(st))
    }

    /// Set the directory to opaque.
    fn set_opaque(&self, ctx: &Context, inode: Self::Inode) -> Result<()> {
        // Use temp value to avoid moved 'inode'.
        let ino: u64 = inode.into();

        // Get attributes and check if it's directory.
        let (st, _d) = self.getattr(ctx, ino.into(), None)?;
        if !is_dir(st) {
            // Only directory can be set to opaque.
            return Err(Error::from_raw_os_error(libc::ENOTDIR));
        }
        // The kernel makes a directory opaque by setting the xattr "trusted.overlay.opaque" to "y".
        // Here we set our customized xattr "user.fuseoverlayfs.opaque" (OPAQUE_XATTR) to "y" instead.
        // See ref: https://docs.kernel.org/filesystems/overlayfs.html#whiteouts-and-opaque-directories
        self.setxattr(
            ctx,
            ino.into(),
            to_cstring(OPAQUE_XATTR)?.as_c_str(),
            b"y",
            0,
        )
    }

    /// Check if the directory is opaque.
    fn is_opaque(&self, ctx: &Context, inode: Self::Inode) -> Result<bool> {
        // Use temp value to avoid moved 'inode'.
        let ino: u64 = inode.into();

        // Get attributes of the directory.
        let (st, _d) = self.getattr(ctx, ino.into(), None)?;
        if !is_dir(st) {
            return Err(Error::from_raw_os_error(libc::ENOTDIR));
        }

        // Return Result<is_opaque>.
        let check_attr = |inode: Self::Inode, attr_name: &str, attr_size: u32| -> Result<bool> {
            let cname = CString::new(attr_name)?;
            match self.getxattr(ctx, inode, cname.as_c_str(), attr_size) {
                Ok(v) => {
                    // xattr name exists and we get value.
                    if let GetxattrReply::Value(buf) = v {
                        if buf.len() == 1 && buf[0].to_ascii_lowercase() == b'y' {
                            return Ok(true);
                        }
                    }
                    // No value found, go on to next check.
                    Ok(false)
                }
                Err(e) => {
                    if let Some(raw_error) = e.raw_os_error() {
                        if raw_error == libc::ENODATA {
                            return Ok(false);
                        }
                    }

                    Err(e)
                }
            }
        };

        // A directory is made opaque by setting some specific xattr to "y".
        // See ref: https://docs.kernel.org/filesystems/overlayfs.html#whiteouts-and-opaque-directories

        // Check our customized version of the xattr "user.fuseoverlayfs.opaque".
        let is_opaque = check_attr(ino.into(), OPAQUE_XATTR, OPAQUE_XATTR_LEN)?;
        if is_opaque {
            return Ok(true);
        }

        // Also check for the privileged version of the xattr "trusted.overlay.opaque".
        let is_opaque = check_attr(ino.into(), PRIVILEGED_OPAQUE_XATTR, OPAQUE_XATTR_LEN)?;
        if is_opaque {
            return Ok(true);
        }

        // Also check for the unprivileged version of the xattr "user.overlay.opaque".
        let is_opaque = check_attr(ino.into(), UNPRIVILEGED_OPAQUE_XATTR, OPAQUE_XATTR_LEN)?;
        if is_opaque {
            return Ok(true);
        }

        Ok(false)
    }
}

pub(crate) fn is_dir(st: stat64) -> bool {
    st.st_mode & libc::S_IFMT == libc::S_IFDIR
}

pub(crate) fn is_chardev(st: stat64) -> bool {
    st.st_mode & libc::S_IFMT == libc::S_IFCHR
}

pub(crate) fn is_whiteout(st: stat64) -> bool {
    // A whiteout is created as a character device with 0/0 device number.
    // See ref: https://docs.kernel.org/filesystems/overlayfs.html#whiteouts-and-opaque-directories
    let major = unsafe { libc::major(st.st_rdev) };
    let minor = unsafe { libc::minor(st.st_rdev) };
    is_chardev(st) && major == 0 && minor == 0
}

pub(crate) fn to_cstring(name: &str) -> Result<CString> {
    CString::new(name).map_err(|e| Error::new(ErrorKind::InvalidData, e))
}
2 changes: 2 additions & 0 deletions src/lib.rs
@@ -117,6 +117,8 @@ pub type Result<T> = ::std::result::Result<T, Error>;
pub mod abi;
pub mod api;

#[cfg(all(any(feature = "fusedev", feature = "virtiofs"), target_os = "linux"))]
pub mod overlayfs;
#[cfg(all(any(feature = "fusedev", feature = "virtiofs"), target_os = "linux"))]
pub mod passthrough;
pub mod transport;
45 changes: 45 additions & 0 deletions src/overlayfs/config.rs
@@ -0,0 +1,45 @@
// Copyright (C) 2023 Ant Group. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

use self::super::CachePolicy;
use std::fmt;
use std::time::Duration;

#[derive(Default, Clone, Debug)]
pub struct Config {
    pub mountpoint: String,
    pub work: String,
    pub do_import: bool,
    // Filesystem options.
    pub writeback: bool,
    pub no_open: bool,
    pub no_opendir: bool,
    pub killpriv_v2: bool,
    pub no_readdir: bool,
    pub perfile_dax: bool,
    pub cache_policy: CachePolicy,
    pub attr_timeout: Duration,
    pub entry_timeout: Duration,
}

impl Clone for CachePolicy {
    fn clone(&self) -> Self {
        match *self {
            CachePolicy::Never => CachePolicy::Never,
            CachePolicy::Always => CachePolicy::Always,
            CachePolicy::Auto => CachePolicy::Auto,
        }
    }
}

impl fmt::Debug for CachePolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let policy = match *self {
            CachePolicy::Never => "Never",
            CachePolicy::Always => "Always",
            CachePolicy::Auto => "Auto",
        };

        write!(f, "CachePolicy: {}", policy)
    }
}