Run machines directly from backing storage #281

@edubart edubart commented Oct 10, 2024

This PR is still a work in progress and under research; it's my take on #277.

This PR completely reworks how memory mapping is done. It also introduces many routines for handling file operations that depend on the host OS and filesystem.

Progress

  • Allocate memory using mmap() for all memory PMAs
  • Use file locks (through flock()) to make sure backing files are not changed while in use by a machine
  • Use ftruncate() to initialize a sparse file for a new machine's backing storage
  • Use the backing file as much as possible, speeding up machine startup by skipping memory copies (even for the kernel image)
  • Support read-only mappings using O_RDONLY and PROT_READ only
  • Add backing_storage runtime option
  • Copy backing files using cp --reflink on COW filesystems
  • Shrink newly created image files by skipping zeroed blocks
  • Handle possible SIGBUS on disk errors.
  • Disallow fork() on machines using shared images
  • Windows support
  • Test on macOS
  • Fallback support (no OS, for WASM)
  • Benchmark on BTRFS/EXT4 filesystems with large flash drives
  • Add backing storage option to the cli
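The first few items above (sparse file creation, locking, and mapping) can be sketched roughly as follows. This is a minimal Python illustration of the POSIX calls involved, not this PR's code: `open_backing_file` and `close_backing_file` are hypothetical names, and the locking policy (shared lock for read-only users, exclusive lock for writers) is an assumption.

```python
import fcntl
import mmap
import os


def open_backing_file(path, length, read_only=False):
    """Open, lock, and map `length` bytes of a backing file.

    Hypothetical helper (not the PR's API). Returns (fd, mapping).
    """
    flags = os.O_RDONLY if read_only else os.O_RDWR | os.O_CREAT
    fd = os.open(path, flags, 0o644)
    try:
        # Assumed policy: readers share the lock, a writer holds it exclusively.
        # LOCK_NB makes a conflict raise BlockingIOError instead of waiting.
        lock = fcntl.LOCK_SH if read_only else fcntl.LOCK_EX
        fcntl.flock(fd, lock | fcntl.LOCK_NB)
        if not read_only and os.fstat(fd).st_size < length:
            # Growing with ftruncate() writes no data; on filesystems with
            # sparse file support the new range is a hole.
            os.ftruncate(fd, length)
        prot = mmap.PROT_READ if read_only else mmap.PROT_READ | mmap.PROT_WRITE
        # MAP_SHARED: the file itself backs the memory, no copy is made.
        mapping = mmap.mmap(fd, length, flags=mmap.MAP_SHARED, prot=prot)
    except BaseException:
        os.close(fd)
        raise
    return fd, mapping


def close_backing_file(fd, mapping):
    """Unmap first, then close the descriptor, which releases the lock."""
    mapping.close()
    os.close(fd)
```

While a writer holds the file, a second `open_backing_file()` on the same path fails with `BlockingIOError`, which is the "not changed while in use" guarantee in miniature. Note that flock() locks are advisory: they only protect against other processes that also take the lock.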

Throughput benchmark

| benchmark | ext4 (before) | ext4 | btrfs | xfs |
| --- | --- | --- | --- | --- |
| load config + memory storage + discard | 436.18 | 3407.30 | 3512.25 | 3481.82 |
| load config + memory storage + store | 4.82 | 16.05 | 14.73 | 17.83 |
| load config + backing storage + discard | N/A | 11.44 | 577.46 | 1155.42 |
| load config + backing storage + store | N/A | 12.49 | 535.93 | 1038.09 |
| load snapshot + memory storage + discard | 99.10 | 13668.00 | 14463.61 | 14639.84 |
| load snapshot + memory storage + store | 4.66 | 13.76 | 12.35 | 14.74 |
| load snapshot + backing storage + discard | N/A | 10.48 | 866.25 | 1812.44 |
| load snapshot + backing storage + store | N/A | 11.39 | 885.68 | 1467.20 |
| roll snapshot + memory storage + discard | 223289.05 | 130662.59 | 211550.67 | 131551.25 |
| roll snapshot + memory storage + store | 5.26 | 19.44 | 15.86 | 20.14 |
| roll snapshot + backing storage + discard | N/A | 478350.80 | 199130.24 | 328092.06 |
| roll snapshot + backing storage + store | N/A | 17.24 | 4476.19 | 2311.65 |

Benchmark notes:

  • The numbers are operations per second; every operation advances the machine by 1000 cycles.
  • The machine used has a 512MB root flash drive and 64MB of RAM; its total snapshot size is 583MB.
  • The machine used just boots Linux and does nothing else.
  • Root hash computation was disabled for load/store.
  • "backing storage" means the machine runs on a memory map of a shared file.
  • "memory storage" means the machine runs on a private memory map.
  • "load config" means it creates a machine from scratch from a new config.
  • "load snapshot" means it creates a machine from previously stored snapshot.
  • "roll snapshot" means it rolls forward the machine state from a snapshot.
  • "discard" means the machine is discarded after advancing machine state.
  • "store" means a machine snapshot is stored to a new directory after every advance.
  • "ext4 (before)" is ext4 before this PR.
  • "ext4" is not a copy-on-write filesystem, so it performs a slow full file copy.
  • "btrfs" and "xfs" are copy-on-write filesystems, so they perform a fast copy with reflink.
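The difference between "backing storage" and "memory storage" in the notes above comes down to a shared versus a private memory map. A minimal sketch, assuming for illustration that both map the same file (the helper name `write_through_mapping` is hypothetical, and whether "memory storage" maps a file or anonymous memory in this PR is not stated here):

```python
import mmap
import os


def write_through_mapping(path, shared):
    """Write b"dirty" through a mapping, then return the file's first 5 bytes."""
    fd = os.open(path, os.O_RDWR)
    try:
        # MAP_SHARED: stores reach the file ("backing storage").
        # MAP_PRIVATE: stores stay in copy-on-write pages of this process
        # and never reach the file ("memory storage").
        flags = mmap.MAP_SHARED if shared else mmap.MAP_PRIVATE
        prot = mmap.PROT_READ | mmap.PROT_WRITE
        with mmap.mmap(fd, 0, flags=flags, prot=prot) as mem:  # 0 = whole file
            mem[0:5] = b"dirty"
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read(5)
```

On a zero-filled file, the private variant leaves the file untouched while the shared variant changes it. This is why a machine on backing storage needs no separate "store" copy of its memory, only a (possibly reflinked) copy of the file.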

Special benchmarks:

  • "roll snapshot + memory storage + store" is the strategy a rollups advancer node would use if it had to save a machine snapshot to disk after every advance, to be used externally.
  • "roll snapshot + backing storage + store" is the strategy a rollups advancer node could use to make its entire state available externally (e.g., give it to a reader node).
  • "load snapshot + memory storage + discard" is the strategy a reader node could use to serve inspects from a machine state available on disk.
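To put the two advancer strategies side by side, the short calculation below divides the "roll snapshot + backing storage + store" throughput by the "roll snapshot + memory storage + store" throughput on each filesystem. The numbers are copied from the throughput table; the variable and function names are only for illustration.

```python
# ops/s from the throughput table, "roll snapshot + ... + store" rows
memory_store = {"ext4": 19.44, "btrfs": 15.86, "xfs": 20.14}
backing_store = {"ext4": 17.24, "btrfs": 4476.19, "xfs": 2311.65}


def speedup(fs):
    """Throughput ratio of backing storage over memory storage on one filesystem."""
    return backing_store[fs] / memory_store[fs]


for fs in memory_store:
    print(f"{fs}: {speedup(fs):.1f}x")
```

The ratio is roughly 280x on btrfs and 115x on xfs, while on ext4 it is slightly below 1, consistent with the note that ext4 has to perform a full file copy.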

Disk usage benchmark

| benchmark | ext4 (before) | ext4 | btrfs | xfs |
| --- | --- | --- | --- | --- |
| load config + memory storage + store | 583.03 MB | 123.64 MB | 123.90 MB | 123.65 MB |
| load config + backing storage + store | | 124.64 MB | 1.14 MB | 1.38 MB |
| load snapshot + memory storage + store | 583.03 MB | 123.64 MB | 123.89 MB | 123.65 MB |
| load snapshot + backing storage + store | | 123.64 MB | 0.05 MB | 0.16 MB |
| roll snapshot + memory storage + store | 583.03 MB | 123.65 MB | 123.91 MB | 123.65 MB |
| roll snapshot + backing storage + store | | 123.65 MB | 0.05 MB | 0.05 MB |

Notes:

  • The numbers are the disk space, in MB, used in the filesystem per machine snapshot stored.
  • "btrfs" and "xfs" use less disk space with "backing storage" because copies are made with reflink.
  • "ext4" before this PR uses more disk space because it doesn't take advantage of sparse files.
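The two effects in these notes (reflink copies on copy-on-write filesystems, sparse files elsewhere) can be sketched as below. This is a hedged Python illustration, not this PR's implementation: the function names are hypothetical, the 4096-byte block size is an assumption, and `st_blocks` is assumed to count 512-byte units, as it does on Linux.

```python
import os
import subprocess


def sparse_copy(src, dst, block=4096):
    """Copy src to dst, seeking over all-zero blocks so they become holes."""
    zeros = bytes(block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                fout.seek(len(chunk), os.SEEK_CUR)  # skip instead of writing
            else:
                fout.write(chunk)
        fout.truncate()  # fix the final size when the file ends in a hole


def copy_backing_file(src, dst):
    """Try a reflink copy first; fall back to a sparse copy."""
    try:
        # GNU cp: --reflink=always fails rather than doing a full copy.
        ok = subprocess.run(["cp", "--reflink=always", src, dst],
                            stderr=subprocess.DEVNULL).returncode == 0
    except OSError:  # e.g. no cp binary available
        ok = False
    if not ok:
        sparse_copy(src, dst)
    return "reflink" if ok else "sparse"


def disk_usage(path):
    """Return (apparent size, allocated bytes); assumes 512-byte st_blocks units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Comparing the two values returned by `disk_usage()` on a stored snapshot shows how much of the apparent size is actually allocated, which is the quantity reported in the table above.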

Larger machines

The following benchmarks are for machines with 256GB and 1TB of RAM, running on a host filesystem with just 8GB of space.

| benchmark (BTRFS - 256GB of RAM) | ops/sec | MB/snapshot |
| --- | --- | --- |
| load config + backing storage + discard | 517.80 | |
| load config + backing storage + store | 489.64 | 1.14 MB |
| load snapshot + backing storage + discard | 790.93 | |
| load snapshot + backing storage + store | 790.24 | 0.05 MB |
| roll snapshot + backing storage + discard | 274723.31 | |
| roll snapshot + backing storage + store | 4799.38 | 0.05 MB |

| benchmark (BTRFS - 1TB of RAM) | ops/sec | MB/snapshot |
| --- | --- | --- |
| load config + backing storage + discard | 204.35 | |
| load config + backing storage + store | 193.15 | 1.14 MB |
| load snapshot + backing storage + discard | 234.92 | |
| load snapshot + backing storage + store | 222.32 | 0.05 MB |
| roll snapshot + backing storage + discard | 275655.38 | |
| roll snapshot + backing storage + store | 4737.27 | 0.05 MB |

@edubart edubart added the enhancement New feature or request label Oct 10, 2024
@edubart edubart added this to the v0.20.0 milestone Oct 10, 2024
@edubart edubart self-assigned this Oct 10, 2024
@edubart edubart changed the base branch from main to refactor/c-api October 10, 2024 22:55
@edubart edubart marked this pull request as draft October 10, 2024 22:59
@edubart edubart force-pushed the feature/backing-storage branch from a99ed4d to d03411e Compare October 11, 2024 18:22
@edubart edubart force-pushed the feature/backing-storage branch from d03411e to 8619295 Compare October 13, 2024 19:52
Base automatically changed from refactor/c-api to main December 10, 2024 16:50